Abstract

We are overwhelmed by the growing availability of information sources in this information age. It becomes harder for people to extract relevant and useful information from information sources such as the Web. It also becomes harder for people to distribute their information through the Internet to the relevant people who will find it useful. Information retrieval and information filtering are commonly used techniques to address the information overload problem. This thesis deals with a community-based multi-agent information retrieval and provision system. It focuses on the development of a keyphrase-based information sharing system for a community of mobile agents.

In this thesis we first review the concept of software agents and their applications.

Subsequently, we describe the information sharing activities in our daily life and the requirements that our proposed system should possess. The agents are classified into three

categories and their data structures are given. The proposed information sharing system is

implemented using the Java programming language and its multithreading technique. Finally,

we incorporate the cosine measure method into the Agent-based Community-Oriented

Routing Network system (ACORN). We test and evaluate our system by using actual data,

which verifies that this system has fulfilled the design requirements. We incorporate the

cosine measure method into ACORN and compare the test results of ACORN before and

after the incorporation of the cosine measure method. The results indicate that the ACORN

information sharing is improved by the incorporation of the cosine measure method.

Acknowledgments

I would like to express my special thanks to my supervisors Dr. V.C. Bhavsar,

Dr. A. Ghorbani, and Dr. S. Marsh for their guidance and assistance throughout this thesis research and writing.

I would also like to thank my parents, and my daughter Angela Mengyang Yu for their constant encouragement, patience, understanding, and support.

Table of Contents

Abstract
Acknowledgments
List of Tables
List of Figures

Chapter 1 Introduction
1.1 Background and Motivation
1.2 Thesis Objectives
1.3 Thesis Outline

Chapter 2 Software Agents
2.1 Definition
2.2 Classification
2.3 Applications
2.3.1 Electronic Mail
2.3.2 Usenet Netnews
2.3.3 Web Browsing
2.3.4 Electronic Commerce
2.3.5 Entertainment
2.3.6 Other Activities
2.4 Information Retrieval and Information Filtering
2.4.1 Information Retrieval
2.4.2 Information Filtering
2.5 Community-based Agent Systems
2.5.1 ACORN
2.5.1.1 Architecture
2.5.1.2 Migration and Interactions
2.5.2 Yenta
2.5.3 Kasbah
2.6 Concluding Remarks

Chapter 3 Keyphrase-based Information Sharing
3.1 Introduction
3.2 Scenarios
3.3 Concept of the Keyphrase-based Information Sharing
3.4 System Architecture
3.4.1 Agent Categories
3.4.2 The Café
3.5 Privacy
3.6 Concluding Remarks

Chapter 4 Agents and the Café
4.1 Overview
4.2 Agent Representation
4.2.1 Agent Data Structure
4.2.2 User Profile
4.2.3 Metadata
4.3 The Café
4.3.1 Similarity Measures
4.3.2 Information Sharing
4.4 Summary

Chapter 5 Implementation, Test and Evaluation
5.1 Introduction
5.2 Implementation of Agents
5.2.1 Agent Class
5.2.2 AgentCore Class
5.2.3 KeyPhraseVector Class
5.2.4 DublinCore Class
5.3 The Café
5.3.1 LetAgentIn Thread
5.3.2 IntoSharing Thread
5.3.3 TimeChecker Thread
5.3.4 RemoveAgentOut Thread
5.3.5 Cafe Class
5.4 Test and Evaluation
5.4.1 Comparison of Similarity Measure Methods
5.4.2 System Test
5.4.3 System Evaluation
5.5 Conclusion

Chapter 6 ACORN
6.1 Introduction
6.2 Cosine Measure Method
6.3 Simulations
6.3.1 Test Data
6.3.2 Results
6.4 Concluding Remarks

Chapter 7 Conclusions and Future Work
7.1 Summary of the Thesis Work
7.2 Future Work

References
Appendix A: Test Data
Appendix B: Mingling of Agents

List of Tables

Table 5.1 Relationship between relevant and non-relevant documents
Table 5.2 Average precision and recall under various similarity thresholds
Table 6.1 Mingle and processing time of ACORN

List of Figures

4.1 Data structure for agents
4.2 Information sharing between two SearchAgents S1 and S2
4.3 Information sharing between a SearchAgent S1 and an InfoAgent I1
4.4 Information sharing between a SearchAgent S1 and a Search/InfoAgent SI1
4.5 Information sharing between two InfoAgents I1 and I2
4.6 Information sharing between an InfoAgent I1 and a Search/InfoAgent SI1
4.7 Information sharing between two Search/InfoAgents SI1 and SI2
5.1 Class card for Agent
5.2 Class card for AgentCore
5.3 Class card for KeyPhrases
5.4 Class card for KeyPhraseVector
5.5 Class card for DublinCore
5.6 Class card for Cafe
5.7 Similarities of article 3 from category 1 with all articles
5.8 Similarities of article 12 from category 2 with all articles
5.9 Similarities of article 23 from category 3 with all articles
5.10 Similarities of article 35 from category 4 with all articles
5.11 Similarities of article 46 from category 5 with all articles
5.12 The sequence of agents entering and leaving the Café
5.13 The sequence of agents entering and leaving the Café
5.14 The sequence of agents entering and leaving the Café
5.15 Average recall-precision graph
6.1 Example of agents created by machine
6.2 The execution time for the whole process and information sharing in ACORN with and without the cosine measure method
6.3 Mingle results of ACORN with and without the cosine measure method

Chapter 1 Introduction

1.1 Background and Motivation

Information is playing an increasingly important role in our lives and has become an instrument that can be used to solve many problems. Great changes are taking place in the area of information supply and demand due to the widespread application of computers and the exponential increase of computer networks such as the Internet. The first big change is related to the form information is available in. In the past, paper was the most frequently used medium for information. It is still very popular. However, more and more information is becoming available through electronic media.

The second big change involves the amount of information, the number of information sources, and the ease with which information can be obtained. In recent years, we have seen an explosive growth in the sheer volume of information. Davies and Weeks (1995) report that by 1982, the volume of scientific, corporate, and technical information was doubling every 5 years. Six years later, i.e., 1988, it was doubling every 2.2 years, and by 1992, every

1.6 years. This trend suggests that it should now be doubling every year. What is more, much of this information is now accessible electronically on the Internet and the World Wide Web, which are themselves growing explosively. All of these changes have made the information distribution and demand process much easier, but at the same time these changes have made the process of demanding and distributing information more complicated and difficult. The exponential increase of computer network systems has resulted in a corresponding exponential increase in the amount of information available on-line. It is true that the more information is available, the harder it is to get what you want. Moreover, the on-line information is distributed among heterogeneous sources, is often unstructured, and is continuously changing. The problems we are facing right now are how to extract relevant, useful, and interesting information from sources such as the Web and how to distribute our information to the relevant people through the Internet.

The problem we are facing today with on-line information overload is similar to that of print libraries. In the past, as the first libraries were built and books became available to larger groups of people, a problem arose. Once buildings could be filled with more piles of books than any human could possibly read in a lifetime, how could people find books on the topics they wanted? The solution to that problem evolved into an entire field called Library

Science. Similarly, new theories or tools are needed to assist people with the online information overload.

Currently, two different technologies are commonly developed and used to address the

information demand problem, but fewer systems exist for distributing information. The

existing solutions for the information demand problem can be either information retrieval

or information filtering. Information retrieval (IR) systems are concerned with locating documents relevant to the information needs of users. Internet search engines, which use indexing databases to store an efficient and compact representation of a large number of Web documents, are popular IR systems. Several search engines for the Web, such as Alta Vista, Yahoo, and Electronic

Library, are available to the public to help users in their search. Search engines differ in the way they build their databases (either using Web spiders or by subscription) and how they keep them up-to-date. Some search engines try to keep a record of the entire Web, while others try to download just a fraction of the whole Web but with more up-to-date information. The main limitation of standard search engines is that they assume that the

environment remains unchanged, which is obviously not true since the Web is continually being updated. Therefore, some index tables may contain a portion of incorrect information.

Lawrence and Giles (1998) estimate that main search engines return, as an answer to a query,

a significant percentage of invalid documents (between 2% and 9% of dead links). In

addition, the same study shows that only a fraction of the total number of documents are

indexed. Although the coverage varies with different search engines, no one indexes more

than one third of the indexable Web. Evidence also suggests that the coverage is growing more slowly than the size of the Web.

Information filtering (IF) systems deal with the delivery of information that is relevant to the

user in a timely manner. They assist users by filtering the data stream and by delivering the

relevant information to the users based on their interest profiles. The information filtering

systems use many of the same techniques as IR systems, but IF systems are optimized for long-term information needs from a stream of incoming documents. IF systems build user profiles to describe the documents that should (or should not) be presented to the users.

Simple examples of IF systems include "kill files" that are used to filter out advertising and e-mail filtering software that sorts e-mail into priority categories based on the sender and the subject, and whether the message is personal or sent to a list. More complex IF systems provide periodic personalized digests of material from sources such as wires, discussion lists, and web pages.

One embodiment of the IF technique is the software agent. These software agents exhibit a degree of autonomous behavior, and attempt to act intelligently on behalf of the user for whom they are working. Agents maintain user interest profiles by updating them based on feedback on whether the user likes the items selected by the current profile.

1.2 Thesis Objectives

The main objective of this thesis is to design and implement an information sharing system for the community of mobile agents based on keyphrases. To fulfil this objective, the most important task is to determine the similarity between two agents since only relevant agents share information with each other.

Another objective is to incorporate an appropriate similarity measure method into ACORN

(Agent-based Community-Oriented Routing Network) to improve its information sharing

ability. The current ACORN uses a simple boolean keyphrase matching method.

1.3 Thesis Outline

The rest of this thesis is organized as follows.

Chapter 2 introduces the concept of software agents and their applications. It also introduces some community-based agent systems related to this project.

Chapter 3 presents several scenarios of information sharing activities. It classifies agents into three categories based on the role of the agents in the information sharing activities. It also describes all of the functions that the system should have.

Chapter 4 describes the design of the keyphrase-based information sharing system. It examines all aspects of the agents, including the agent data structure, the representations of users' interests, and the contents of the documents to be distributed. The design principles of the Café and the method used to determine the degree of relevancy between agents are given. Finally, it presents the schemes used by agents to exchange information.

Chapter 5 describes the implementation of the keyphrase-based information sharing system.

It presents the detailed implementation of agents as well as the Café. Finally, the results of testing and evaluating the system with real data are presented.

Chapter 6 first describes the incorporation of the cosine measure method into ACORN. It then

presents the comparison results of ACORN before and after the incorporation of the similarity measure method by using real and machine-created data.

Chapter 7 contains the concluding remarks and presents some directions for future work.

Chapter 2 Software Agents

Software agents have become increasingly popular in a variety of research projects and applications. However, to many, it is unclear what agents are and what they can do. In this chapter, we present concepts of software agents and their classifications and applications.

Then, we describe some current and recent software agent systems related to our work.

2.1 Definition

The term "software agent" has become popular both in the press and in industry, but its definition is quite controversial. It is getting increasingly difficult to differentiate a system which is a software agent from a system which is not, because almost every system that performs a function and is described as a black box can be labeled as a software agent

(Moukas, 1991). Based on Lieberman (1997), a software agent is a program that can be considered by the user to be acting as an assistant or helper, rather than as a tool in the manner of a conventional direct-manipulation interface. A software agent should display some of the characteristics that we associate with human intelligence: learning, inference, adaptability, independence, creativity, etc. In other words, a software agent is a program that its users can delegate tasks to, rather than commanding it to perform the tasks (Maes, 1997).

Any software agent must have the following characteristics: It must be autonomous, which means that it can operate on its own without a direct

human command. Software agents have individual internal states and goals, and they

act in such a manner as to meet their goals on behalf of their users.

It must be personalized; that is, it acquires the user's interests and adapts as it evolves

over time.

It must be persistent, either running continuously or saving its state, so that the user can see the

agent as a stable entity and develop a trust relationship with it.

2.2 Classification

As to classification, all existing software agents can be put into different agent classes based on what classification rules are used. For example, agents can be classified as either deliberative or reactive. Deliberative agents are derived from the deliberative thinking paradigm. They possess an internal symbolic reasoning model, and they engage in planning and negotiation in order to achieve coordination with other agents. Reactive agents, on the contrary, do not have any internal symbolic models of their environment, and they act using a stimulus/response type of behavior by responding to the present state of the environment in which they are embedded (Ferber, 1994).

Nwana (1996) uses three primary attributes (autonomy, learning, cooperation), which he thinks any agent should exhibit, plus agent mobility and role, etc., to classify agents. He identifies seven types of agents: Collaborative agents

Interface agents

Mobile agents

Information/Internet agents

Reactive agents

Hybrid agents

Smart agents

We use software agent mobility as a classification rule to classify all software agents into

stationary or mobile. The stationary agent does not or cannot move. It can execute only on the system on which it begins execution. If it needs information that is not on that system or needs to interact with an agent on another system, it typically uses a communication

mechanism such as remote procedure calling. On the other hand, a mobile agent is not bound

to the system on which it begins execution. It is free to travel among the hosts in the network.

Created in one execution environment, it can transport its state and code with it to another

execution environment in the network where it resumes execution. The term 'state' typically

means the attribute values of the agent that help it to determine what to do when it resumes

execution at its destination. Code in an object-oriented context means the class code

necessary for an agent to execute.

A mobile agent has the unique ability to transport itself from one system in a network to

another in the same network. This ability allows it to move to a system containing an object with which it wants to interact and then to take advantage of being in the same host or network as the object (Lang and Oshima, 1999).

2.3 Applications

Software agents have been widely researched and used in many areas. In the following, we summarize their applications.

2.3.1 Electronic Mail

Software agent systems have been developed to help users with their e-mail. Maxims (Maes,

1977) is an agent that assists the user with e-mail. It communicates with the commercial e-mail package Eudora using Apple Events and learns to prioritize, delete, forward, sort, and archive mail messages on behalf of the user. Memory-based reasoning is used as the main learning technique, and the agent continuously 'looks over the shoulder' of the user as the user deals with e-mail. The agent memorizes all of the situation-action pairs generated by the user as he or she performs actions. When a new situation occurs, which can be due to the user taking an action or due to some external event such as a message arriving, the agent will try to predict the action of the user based on the examples stored in its memory.

The agent compares the new situation with the memorized situations and tries to find the set of nearest neighbors. The closest situations contribute to the decision of which action to suggest or take in the current situation.

2.3.2 Usenet Netnews

Handling Usenet news is also a large application area for software agents. For example,

GroupLens, developed by Konstan et al. (1997), is one sample application of software agents in this area. GroupLens is a collaborative filtering system that provides an open architecture wherein people can rate articles and distribute the ratings to other users' agents on the Net through a specialized newsgroup that only carries article ratings.

Another example is NewT, developed by Sheth (1994) to help users filter and select articles from a continuous stream of Usenet Netnews. The basic idea of this system is to have the user create one or many new agents (e.g., one agent for sports news, one for financial news, etc.) and train them by example (i.e., by presenting to them positive and negative examples of what should or should not be retrieved). It is message-content and keyword-based, but it also exploits other information such as author and source.

2.3.3 Web Browsing

Probably one of the more widely useful applications of software agents is helping users

browse the Web. Since more and more information is becoming available on the World Wide

Web, users are becoming more and more desperate for tools that will help them find relevant,

interesting information on the Web.

Letizia (Lieberman, 1995) is a user interface agent system that assists users in browsing the

Web. As a user operates a conventional Web browser such as Netscape or Mosaic, the agent

tracks the user's behavior and attempts to anticipate what items may be of interest to the user.

It uses a simple set of heuristics to model the user's browsing behavior. Upon request, it can display a page containing its current recommendations, which the user can follow.

Let's Browse (Lieberman et al., 1999) is a collaborative web browsing agent system designed to assist a group of people in browsing the Web. It features automatic detection of the presence of users, automated 'channel surfing' browsing, and dynamic display of the user profiles and explanation of recommendations.

Armstrong et al. (1995) have developed an agent system called WebWatcher to assist users in locating information on the Web. The system can easily be installed on any HTML page by inserting a hyperlink to the WebWatcher server. A user enters a WebWatcher server by clicking on a hyperlink to the WebWatcher server and can specify his or her interest by giving keywords. Then, WebWatcher takes the user back to the page from which he or she entered the system. From then on, WebWatcher follows the user's actions and suggests hyperlinks using its learned knowledge. It also offers other useful functions, such as adding new hyperlinks to the current page based on the user's interests, suggesting pages related to the current page, and sending e-mail messages to the user whenever specified pages change.

2.3.4 Electronic Commerce

The Internet and the Web represent an increasingly important channel for retail commerce as well as business-to-business transactions. More recently, agents have been applied to assist people in conducting their electronic commerce activities, such as buying, selling, or other transactions on the Web (Maes et al., 1999).

AuctionBot (auction.eecs.umich.edu) is a multi-purpose Internet auction server developed at the University of Michigan. Its users create new auctions by choosing from a selection of auction types and then specifying its parameters (such as clearing time, method for resolving tie bids, and number of sellers permitted). Buyers and sellers can then bid according to the auction's multilateral distributive negotiation protocols. In a typical scenario, a seller bids a reservation price after creating the auction and lets AuctionBot manage and enforce buyer bidding according to the auction's proposals and parameters.

Firefly (www.firefly.com) helps consumers find products. It recommends products through

an automated "word-of-mouth" recommendation mechanism. The system first compares a

shopper's product ratings with those of other shoppers. After identifying the shopper's

"nearest neighbors," or users with similar taste, the system recomrnends the products the

neighbors rated highly but which the shopper may not yet have rated, possibly resulting in

serendipitous finds. It uses the opinions of like-minded people to offer recommendations of

commodity products such as music and books, as well as more-difficult-to-characterize

products such as Web pages and restaurants.

2.3.5 Entertainment

Software agent systems have been developed to help people select movies, books, and

television and radio shows based on their personal tastes. Ringo (Shardanand, 1991) is a personalized music recommendation system. The agents in Ringo use a "social filtering" technique. They do not attempt to correlate the user's interests with the contents of the items recommended. Instead, they rely solely on correlations between different users.

In Ringo, every user has an agent to memorize which music albums its user has evaluated and how much the user liked them. Then, agents compare themselves with other agents. An agent finds other agents that are correlated, that is, agents that have values for similar items and whose values are positively correlated to the values of this agent. Agents accept recommendations from other related agents.

2.3.6 Other Activities

There are also other systems that help computer users in different activities. Kozierok and

Maes (1993) describe an interface agent, Calendar Agent, for scheduling meetings; it is attachable to any application provided it is descriptable and recordable. Calendar Agent assists its user in scheduling meetings, which involves accepting, rejecting, scheduling, negotiating, and rescheduling meeting times. It really comes into its element because it can learn, over time, the preferences and commitments of its user, e.g., she does not like to attend meetings on Friday afternoons, he prefers meetings in the morning, etc. The learning techniques employed are memory-based learning and reinforcement learning.

2.4 Information Retrieval and Information Filtering

In Chapter 1, we mentioned that information retrieval (IR) and information filtering (IF) are the two technologies that are currently used to deal with the on-line information overload problem. Each focuses primarily on a particular set of tasks or questions. Based on Belkin and Croft (1992), information retrieval and filtering systems are the same in that both try to return useful and relevant information to the users. However, IR systems usually focus on one-time queries, while IF systems try to fulfill long-term user goals. In this section, we compare these two technologies.

2.4.1 Information Retrieval

Information retrieval is the process of identifying and retrieving relevant documents based on a user's query. An IR system consists of three basic elements: a document representation, a query representation, and a measure of similarity between queries and documents. The document representation provides a formal description of the information contained in the documents, the query representation provides a formal description of a user's information need, and the similarity measures define the rules and procedures for matching the query and relevant documents. These three elements collectively define a retrieval model. The most common models include the Boolean model, the vector space model, and the probabilistic model.

In a Boolean retrieval model, a document is represented as a set of terms, d_j = {t_1, ..., t_m}, where each t_i is a term that appears in the j-th document d_j. A query is represented as a boolean expression of terms using the standard boolean operators and, or, and not. A document matches the query if the set of terms associated with the document satisfies the Boolean expression that represents the query. The result of the query is the set of matching documents. Boolean retrieval systems are also called "exact-match" systems because an exact match is needed between the textual elements of the query and the contents of the database elements that will be retrieved.
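
As a brief illustration (the documents, terms, and query below are invented for this example and are not drawn from the thesis test data), consider two documents and one query:

\[
d_1 = \{\text{agent},\ \text{information},\ \text{retrieval}\}, \qquad
d_2 = \{\text{agent},\ \text{commerce}\},
\]
\[
q = \text{agent} \wedge (\text{information} \vee \text{retrieval}) \wedge \neg\,\text{commerce}.
\]

Document d_1 satisfies the expression and is retrieved; d_2 fails the "not commerce" clause and is not. Both outcomes are simple matches or non-matches, with no ranking among the retrieved documents.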

The vector space model enhances the document and query representations of the boolean retrieval model by assigning a weight to each term that appears in a document and query. A document and a query are then represented as vectors of term weights. The number of dimensions in the vector space is equal to the number of terms used in the overall document collection. One of the advantages of the vector space model is that, unlike Boolean retrieval systems, the retrieved documents can be ranked based on their relevance.
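
A widely used ranking function of this kind is the cosine of the angle between the document and query vectors; this is also the general form behind the cosine measure method discussed later in the thesis, although the precise weighting scheme used there is defined in later chapters. Writing w_{i,j} for the weight of term i in document d_j and w_{i,q} for its weight in query q, with n terms in the collection:

\[
\mathrm{sim}(d_j, q) = \frac{\sum_{i=1}^{n} w_{i,j}\, w_{i,q}}
{\sqrt{\sum_{i=1}^{n} w_{i,j}^{2}}\ \sqrt{\sum_{i=1}^{n} w_{i,q}^{2}}}.
\]

A common, though not the only, choice for the weights is a tf-idf scheme, in which a term's weight grows with its frequency in the document and shrinks with the number of documents in the collection that contain it.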

The probabilistic retrieval model is based on the probability ranking principle, which states that an information retrieval system is most effective when it responds to an information need with a list of documents ranked in decreasing order of probability of relevance, and the probabilities are estimated as accurately as possible given all the available information. In this model, the answer to a query is generated by estimating P(relevant|d) (the probability of the information need being satisfied given document d) for every document, and ranking the documents according to these estimates. Using Bayes' theorem and a set of independence assumptions about the distribution of terms in documents and queries, P(relevant|d) can be expressed as a function of the probabilities of the terms in d appearing in relevant and non-relevant documents. Different independence assumptions lead to different forms of the function.
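
To make this concrete, one common instantiation is the binary independence model; the formulas below are the standard textbook form and are not specific to this thesis. By Bayes' theorem,

\[
P(\text{relevant} \mid d) = \frac{P(d \mid \text{relevant})\, P(\text{relevant})}{P(d)},
\]

and with binary term occurrences and the independence assumption, ranking by this probability is equivalent to ranking by the retrieval status value

\[
\mathrm{RSV}(d) = \sum_{t_i \in d \,\cap\, q} \log \frac{p_i\,(1 - u_i)}{u_i\,(1 - p_i)},
\]

where p_i is the probability that term t_i occurs in a relevant document and u_i is the probability that it occurs in a non-relevant one.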

2.4.2 Information Filtering

Information filtering systems usually share the following characteristics (Belkin and Croft,

1992):

They deal with large incoming streams of data

They primarily deal with unstructured or semi-structured textual data

They are based on some sort of predefined filter

Their objective is to prune data that does not match the filter, rather than to locate data

Several different architectures have been proposed for building effective and efficient filtering systems. They can all, however, be classified under two broad categories:

Content-based filtering, in which the system actually processes a document and tries to

extract useful information about its content. The techniques used in content-based

filtering can vary greatly in complexity. Keyword-based searching involves matching

different combinations of keywords. It is one of the simplest techniques. The

NewsWeeder system (Lang, 1995), for example, was designed for filtering in USENET

newsgroups. Statistical keyword analysis represents a more advanced form of filtering,

in which the stop-words are removed from the document, and the rest of the words are

stemmed, vectorized, and given a weight based on their significance. This form of

representation was first used in the SMART system and has become the most popular

representation form.

Collaborative filtering, in which the system recommends documents to a user based on

the opinions of other users. In their purest form, collaborative filtering systems do not

consider the content of the documents at all. Instead, they rely exclusively on the

judgement of humans as to whether the document is valuable. In this way, collaborative

filtering attempts to recapture the cross-topic recommendations that are common in

communities of people.

Tapestry (Goldberg et al., 1991) is one of the first computer-based information filtering systems. It accepts the ratings or annotations of users for documents such as e-mail and

Netnews. As users read documents, they may attach annotations to them. The filters that search the annotations for interesting articles, however, are constructed by the end user using a query language. The query may involve many different criteria, including keywords, subjects, authors, and the like, and annotations given to the document by other people.

Therefore, they make it possible to request documents approved by others. However, should users want to retrieve documents approved by people similar to themselves, they must know who these similar people are and specifically request documents annotated by those people.

That is, in this system, users still need to know those people who have tastes similar to their own. Thus, collaborative filtering is left to the user: Tapestry only provides an on-line architecture for facilitating the "word of mouth" process.

Collaborative filtering systems for large communities have used statistical techniques to provide personal recommendations of documents by finding a group of other users, known as neighbors, that have a history of agreeing with the target user. Once a neighborhood of users is found, particular documents can be evaluated by forming a weighted composite of the neighbors' opinions of that document. Similarly, when a user requests recommendations for a set of documents to read, the system can return a set of documents that is popular within the neighborhood. These statistical approaches, known as automated collaborative filtering, typically rely upon ratings as numerical expressions of user preference.
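
One standard way of forming such a weighted composite (shown here as a generic illustration, not as the exact scheme of GroupLens, Ringo, or the system built in this thesis) predicts user u's opinion of document i from the ratings r_{v,i} of the neighbors v in u's neighborhood N(u):

\[
\mathrm{pred}(u, i) = \bar{r}_u + \frac{\sum_{v \in N(u)} \mathrm{sim}(u, v)\,\bigl(r_{v,i} - \bar{r}_v\bigr)}
{\sum_{v \in N(u)} \bigl|\mathrm{sim}(u, v)\bigr|},
\]

where \bar{r}_u and \bar{r}_v are the users' mean ratings and sim(u, v) is a user-to-user similarity such as the Pearson correlation of their common ratings.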

Several ratings-based automated collaborative filtering systems have been developed. The

GroupLens system (Konstan et al., 1997) provides a pseudonymous collaborative filtering solution for Usenet news and movies. Ringo is an e-mail system that generates recommendations on music using collaborative filtering technology.

2.5 Community-based Agent Systems

2.5.1 ACORN

The system ACORN (Marsh, 1997) provides an agent-based architecture using community-based approaches for information retrieval and provision across networks. It is based on the assumption that a mixture of consumer pull and producer push, coupled with a tight control of information spread, will allow people to keep up-to-date with topics, and will allow the producers of information to get their information in a timely fashion to those who will find it relevant. The agents in the system are autonomous. They make their own decisions about what to do based on information they receive from their creators and from the data they get

from other agents of the community.

2.5.1.1 Architecture

ACORN is a multi-agent-based system. It contains both static and mobile agents that work

together to fulfil the information search and distribution task for their users. The static agents

in ACORN include server, client, and café, and the mobile agents are composed of InfoAgent

and SearchAgent. A server plays the role that controls when agents get into and get out of

the site. The café exists to help mobile agents find other like-minded agents and share their

information with each other.

A. Static ACORN Agents

The static agents in the ACORN architecture are used to control mobile agent migration,

communication, information sharing, and so forth. A brief description of each static agent

in ACORN is given below.

Client

The Client of ACORN has two roles. The first role acts as a user interface to the ACORN architecture when agents are created or examined, and the second role acts as a filter of incoming information.

The first role of the Client involves the creation of an InfoAgent or SearchAgent for distribution on the ACORN network. The creation of both mobile agents is similar and involves getting

information from both the user and the Client. When a user wants to search for some kind of information, he/she informs the Client in the form of a query. The Client creates a

SearchAgent based on the query. Similarly, InfoAgents are created for new pieces of

information to be distributed.

The second role of the Client is as a buffer between a user and the system. Any mobile agent

wishing to communicate with a user first contacts the Client. The Client filters the incoming

agents based on the user's interest. It presents only the relevant agents to the user.

Server

Much like a Web server, the ACORN Server resides at the point of entry from a network to

a site. All mobile agents must enter the site through the Server. While the Client acts as a

user-controlled information filter, the ACORN Server is a site filter whose primary task is to

protect the site and decide what kind of mobile agent is allowed in. It also controls mobile

agent migration carried out via server-to-server communication. In addition, it acts as a permanent repository for Client and resident mobile agents at a site in the following way.

When a mobile agent arrives at a site, the Server saves its state. Clients communicate their state to the Server whenever they are started up and at specified intervals while they are running. The Server stores the Client data and can augment it with messages for the user if any are sent.

Café

The Café is a virtual meeting place in which mobile agents coexist. In the Café, all agents give their community and personal data to the Café manager, which compares all data and lets agents share their relevant community data. After completing the information sharing tasks, the Café sends all agents out. The net result is that, on exiting, agents may well know more about the community than when they came into the Café.

B. Mobile ACORN Agents

The ACORN system currently has two types of mobile agents: InfoAgent and SearchAgent.

These agents perform the task of moving information or queries from site to site to relevant

readers.

The InfoAgent

The InfoAgent is generated when a document is created for distribution. The InfoAgent

carries metadata information stored in a Dublin Core metadata element set and migrates

around the ACORN network to look for people who would want to read that document. In fact, for the ACORN system, the code of the agents is not mobile. The only movable thing is the information it carries. This information migration process is completed by creating a new InfoAgent at a new site to handle the information.
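
To make the metadata concrete, the sketch below is a minimal Java holder for a few Dublin Core elements that an InfoAgent might carry. The field names follow the Dublin Core element set, but the class itself (its name, constructor, and accessors) is illustrative only and is not the DublinCore class implemented in Chapter 5.

// Illustrative sketch only: a minimal holder for a few Dublin Core elements.
public class DublinCoreRecord {
    private final String title;        // DC "Title": name of the document
    private final String creator;      // DC "Creator": author of the document
    private final String subject;      // DC "Subject": topic keyphrases
    private final String description;  // DC "Description": abstract or summary
    private final String identifier;   // DC "Identifier": e.g., a URL to the document

    public DublinCoreRecord(String title, String creator, String subject,
                            String description, String identifier) {
        this.title = title;
        this.creator = creator;
        this.subject = subject;
        this.description = description;
        this.identifier = identifier;
    }

    public String getSubject()    { return subject; }
    public String getIdentifier() { return identifier; }
}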

The SearchAgent

The SearchAgent is created to search for information for its user. The user inputs a query through prompts from the Client. The Client then generates a Dublin Core representation of the query and hands it, together with relevant community data, to a new mobile agent. The

SearchAgent moves from site to site, gathers information relevant to the query, and then sends it back when possible.

2.5.1.2 Migration and Interactions

A. Migration

Migration in ACORN is very important to the system. Without migration, one agent cannot reach other agents to interact with them. Hence, information cannot be shared. In fact, in the system, only information or queries carried by agents migrate. The agents themselves do not migrate.

When a mobile agent wants to migrate to another site, it informs the Server at the present site of this fact. The site's Server first ascertains that the mobile agent is allowed to move from this site. If so, the agent's state is passed to the remote Server via the ACORN Server-to-Server protocol. At the other site, the remote Server decides whether or not the migration can take place. If not, it informs the mobile agent, and the agent reorganizes its migration strategy.

B. Interactions

When an agent arrives at an ACORN site, two types of interactions between agents take place. The first interaction is that the mobile agent may have a specific person (his/her

Client) to contact. In this way, information can be shared between the person and the mobile agent. The second type of interaction takes place between mobile agents in a Café. Once interactions are completed, we expect that each mobile agent will get more information than before.

2.5.2 Yenta

Yenta (Foner, 1996 and 1997) is a multi-agent matchmaking system. In this system, agents representing users' interest or interests are created and sent out on the network to communicate with other agents. In this way, an agent can discover like-minded agents and introduce them to other such agents. The result is clusters of like-minded agents, which can then share relevant information amongst themselves. In addition, when a person is looking for someone (an expert), the clusters can use "word of mouth" to direct the searcher to the proper agent or agents to talk to.

In Yenta, one user has just one agent to represent herself. The agent, therefore, usually contains more than one interest, which is obtained by clustering. Before a Yenta agent starts running, it must determine its user's interests. It does this by first collecting some subset of

the user's e-mail, newsgroup articles, and files, then classifying them into different interest

clusters. During a Yenta agent's life time, it is expected that the user will constantly add new

interests or new documents as new messages come in or new files are created. This means that the user's interests change dynamically.

After completing interest classification, Yenta requires that its agent find at least one other

Yenta agent with which to communicate. After this first communication, it is much easier for

the Yenta agent to find other agents to talk to since during the first communication, it is quite

possible that the Yenta agent gets some information about other like-minded agents which

it can go to talk to. Yenta uses a bootstrapping technique to help an agent find the first other

agent. The bootstrapping includes broadcasting and directed multicasts on local network

systems to find other agents in the same organization, asking a central registry which

contains a partial list of other known agents, and asking the user for suggestions.

Through communication, Yenta agents build up community clusters with each other based

on their interests due to direct communication and referral. Agents in the same cluster then

share their information. In this way, users can give their new documents to other users in the

same community by one-to-one communication or by broadcasting. On the other hand, users

can get new messages from other community users.

2.5.3 Kasbah

Kasbah (Chavez and Maes, 1996; Moukas et al., 1999; Maes et al., 1999) is a Web-based

multi-agent consumer-to-consumer transaction system in which users create buying and selling agents to buy and sell goods on their behalf. The system consists of selling agents, buying agents, and a marketplace.

When a person wants to sell something, he/she creates a selling agent and gives the selling agent a description of the item he/she wants it to sell. He/she also sets several parameters, such as the desired date to sell the item by, the desired price, and the lowest acceptable price, to guide the selling agent as it tries to sell the specified item. Then he/she releases the selling agent into the marketplace. Once the selling agent enters the marketplace, it contacts interested buying agents and negotiates with them to find the best deal.

Symmetrically, the job of a buying agent is to buy goods on behalf of its user. When a user wants to buy some item, what he/she needs to do is create a buying agent, describe the item of interest, and set parameters such as the date to buy the item by, the desired price, and the highest acceptable price, to direct the buying agent as it tries to buy the item. The user can also specify a set of selling agents already in the marketplace and require his/her buying agent to buy from one of them. Like a selling agent, the buying agent is also pro-active. Once created, it is released into the marketplace, where it negotiates with selling agents automatically.

The marketplace's job is to facilitate interaction between buying and selling agents. The role

that Kasbah's marketplace plays is to match agents interested in buying and selling the same kinds of things. When a selling agent goes into the marketplace, the marketplace asks what the agent is interested in selling. Then, it sends the agent a list of all potential buyers of that particular item. It also sends messages to all of these potential buyers, informing them of the existence of the new selling agent. The same thing happens when a buying agent is created.

When an agent leaves the marketplace, the marketplace notifies all of its potential customers.

2.6 Concluding Remarks

The concept of software agents and their applications were given in the first part of this chapter. After that, we presented information on the agent systems ACORN, Yenta, and

Kasbah, which are related to this thesis. In addition, we have also described the two popular information research technologies, information retrieval and information filtering, and compared their similarities and differences.

Chapter 3 Keyphrase-based Information Sharing

3.1 Introduction

Information sharing among people exists everywhere. In our daily life, we need to exchange information with other people for various reasons. But no matter what the reason or reasons for such information exchanges, we do believe that this information exchange should happen among people who share similar interests. People having different interests should not share their information with each other. If they are forced to exchange information with each other, the shared information is meaningless to all of them. This constraint indicates that before people exchange information with each other, they have to determine if all of them share similar interests. Only after they confirm that they have similar interests can they pass their information to other people. They can also take other people's information at the same time.

This chapter explains the basic concepts and constraints of the keyphrase-based information sharing system. Several scenarios that include information sharing activities are described in Section 3.2. Section 3.3 briefly presents the concept of the keyphrase-based information sharing. Section 3.4 describes the architecture of the information sharing system, and explains agent classification as well as the functions the system should possess in order to carry out the information sharing among agents. The privacy of the system is discussed in

Section 3.5. Finally, a few concluding remarks are given.

3.2 Scenarios

Three scenarios will be presented in this section. These scenarios include several examples of the information sharing activities in our daily life that can be carried out using agent techniques.

Scenario 1 - Information Search and Provision

This scenario is about receiving and providing up-to-date information. John is a computer science student. He plans to choose parallel processing as his honors thesis topic. Therefore, he needs to collect up-to-date information about parallel processing. On the other hand, Peter is a professor whose research area is parallel processing. Every year, he publishes several papers on his research work, and he wishes that people with an interest in parallel processing can read his papers as quickly as possible.

Let us assume that John creates an agent to find people who have information about parallel processing and who are willing to share this information with other people. In the same vein,

Peter also creates an agent to find people who are interested in his research. John and Peter send their agents out on the network, and the two agents meet. They communicate with each other. If John's agent finds out that John shares similar interests with Peter, it adds Peter into

John's distribution list. Once John's agent puts Peter in John's distribution list, Peter's agent also puts John into Peter's receiving agent list. After that, John's agent sends Peter's address and information to John, and John can contact Peter directly. Similarly, Peter's agent also sends John's address and information to Peter. Next time, when Peter produces a new

document about parallel processing, he can send it directly to John. In this way, John can get

up-to-date parallel processing information, and Peter can make sure that his new document

is distributed quickly to the people who will find it useful.

Scenario 2 - Novice and Expert

The second scenario is about a novice and an expert. Jean is a novice Java programmer. She

has difficulty in learning the language and in understanding all the existing packages and

classes. She needs to parse some text. She spends some time going through the manuals,

online documentation, and mailing list archives. She tries to post to a newsgroup. But she

does not get any response. While talking to a co-worker, Mark, she mentions her problem.

He replies that using the StringTokenizer class would make it easier and shows her a piece

of code he had previously written that uses the class.

Now, imagine that Jean has an agent and that she asks it to find someone who is able and

willing to help her. The agent can go out on the network and find people who can solve her

problem. It returns a list of people she might want to talk to. When she contacts them, they

tell her what class to use and show her some code to do it.

Scenario 3 - E-commerce

This scenario is about business transactions. Mary is a banana buyer. She buys bananas from

banana farmers. Jason is a banana farmer looking for banana buyers. Mary is looking for banana sellers and Jason for banana buyers. They do not have any idea about where to find their potential customers. Hence, both of them create their own agents and send them out on the network. Somewhere on the network, the two agents meet and communicate. They find that they have similar interests. They start negotiations with each other based on information provided by their users. After they are satisfied with their deal, they inform their creators, who make the final decision about the transaction.

The above scenarios are just three examples of information sharing cases. Note that the information sharing in the above scenarios takes place between information providers and information receivers. In addition to this type of information sharing, an agent can also access another agent's information through referrals from other agents. For example, agent

A represents a banana buyer, agent B a banana seller, and agent C another banana buyer.

When the three agents meet and interact, agents A and C may share information with agent

B. Also, if agent C knows another banana seller agent, say D, it can introduce agent D to agent A. Agent A decides if agent D is relevant to it. If it is, then agent A will put agent D on its providers list. Note that the above three scenarios include just receivers (i.e., John in the first scenario) and providers (i.e., Peter in the first scenario). There is, however, another type of agent whose owner, unlike the above two types of agent owners, has the goal of both providing and receiving information. For example, Peter can create agents, not just for distributing his research papers to interested people, but also for searching for new information from other people who share similar interests with him.

3.3 Concept of the Keyphrase-based Information Sharing

The scenarios in the previous section indicate that people can use agents to retrieve information from other agents and provide their information to relevant ones. One of the important aspects of this information retrieval and provision is to share information among agents. Through information sharing, an agent can receive information from and provide information to other relevant agents. In keyphrase-based information sharing, keyphrases and their weights are used to represent user interests. Agents use these keyphrase and weight pairs to find other relevant agents and exchange information with them.
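
As a minimal sketch of this idea (the class and method names below are hypothetical and do not correspond to the KeyPhraseVector class implemented later in the thesis), a user interest profile can be held as (keyphrase, weight) pairs, and the relevance of two agents can be estimated with a cosine similarity over their profiles:

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: an interest profile as (keyphrase, weight) pairs.
public class KeyphraseProfile {
    private final Map<String, Double> weights = new HashMap<>();

    public void put(String keyphrase, double weight) {
        weights.put(keyphrase.toLowerCase(), weight);
    }

    // Cosine similarity between this profile and another one.
    public double cosineSimilarity(KeyphraseProfile other) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = other.weights.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        for (double w : other.weights.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        KeyphraseProfile a = new KeyphraseProfile();
        a.put("parallel processing", 0.8);
        a.put("load balancing", 0.4);

        KeyphraseProfile b = new KeyphraseProfile();
        b.put("parallel processing", 0.9);
        b.put("java threads", 0.3);

        System.out.println("similarity = " + a.cosineSimilarity(b));
    }
}

Two agents whose similarity exceeds a chosen threshold would then be treated as relevant to each other and allowed to exchange information; the keyphrases, weights, and any threshold in the example above are arbitrary.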

3.4 System Architecture

One of the goals of this thesis is to build an information sharing system which can help one agent to locate other relevant agents and exchange information with them. People create agents to represent and work for them, but their purposes may not be the same. For example, some people may create agents for searching for information, while others may create agents for distributing information to relevant people. Therefore, agents can be classified into several categories.

3.4.1 Agent Categories

Agents can be classified into three categories based on their functions: search agents

(SearchAgent), information agents (InfoAgent), and search/information agents

(Search/InfoAgent). Each category of agents has its own task. As the names suggest, SearchAgent is used to look for relevant information for its owner; InfoAgent is used by its creator to distribute information to relevant people; and Search/InfoAgent is used to both search for and distribute information.

Since each type of agent has a specific purpose during its life cycle, the information each type of agent is looking for may vary. For example, John creates an agent for searching for information about parallel processing, not for distributing information about parallel processing. Due to this need, he wants his agent to bring him a list of people who have information about parallel processing only. He does not want to get any information about people who, like him, also want to find information about parallel processing. As another example, Peter creates an agent for distributing information about parallel processing to relevant people. He does not want to receive similar information from other people.

Therefore, Peter does not want his agent to bring back a list of people who, like him, are distributing similar information. What Peter wants his agent to do is to bring him a list of people who are interested in his information. For those people who want to not only distribute information to but also search for information from other people, it is necessary for their agents to bring back information about all relevant people.
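
The three categories and the reporting rules just described can be summarized in a small sketch; the enum and method below are hypothetical helpers written for illustration and are not classes from the thesis implementation.

// Illustrative sketch of the three agent categories and their reporting rule.
public enum AgentCategory {
    SEARCH_AGENT,        // looks for information on behalf of its owner
    INFO_AGENT,          // distributes its owner's information to relevant people
    SEARCH_INFO_AGENT;   // both searches for and distributes information

    // Should an agent of this category report a relevant agent of category
    // 'other' back to its owner as a useful contact?
    public boolean isUsefulContact(AgentCategory other) {
        switch (this) {
            case SEARCH_AGENT:
                // A pure searcher only wants providers of information.
                return other == INFO_AGENT || other == SEARCH_INFO_AGENT;
            case INFO_AGENT:
                // A pure provider only wants people looking for information.
                return other == SEARCH_AGENT || other == SEARCH_INFO_AGENT;
            default:
                // A Search/InfoAgent wants information about all relevant people.
                return true;
        }
    }
}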

3.4.2 The Café

In the keyphrase-based information sharing system, all three types of agents meet in a virtual

meeting place, called the Café. The job of a Café is to facilitate information exchange and

interaction among relevant agents. A Café can have many roles. At the minimum, the Café

ensures that the agents entering the Café 'speak' a common language and have compatible data structures. The Café manager performs a number of functions.

Agent enters the Café: The Café has a capacity that determines the maximum number

of agents in it at the same time. The system administrator can control this capacity. Due

to the limited capacity of the Café, when an agent arrives, the Café manager has to

check if it has room for the newly-arrived agent. The agent has to wait if the Café is full.

Agent leaves the Café: An agent cannot stay in the Café forever. It has to leave the Café

after it has stayed there for a given period of time. Many factors determine when an

agent has to leave the Café. Currently, the Café determines this action based on time and

whether the interactions between this agent and other agents in the Café have been

completed. Each agent stays in the Café for a certain amount of time. After that time,

and after it has finished interactions with other agents in the Café, the agent will be

allowed to leave the Café. If the predefined visiting time is over, but the interactions

with other agents are still in progress, the agent has to stay in the Café until its

interactions with other agents are completed.

Blackboard: The blackboard approach is a powerful means of flexibly combining

individually developed software systems and modules into a single, integrated

application. It has been used by people to solve problems cooperatively. The blackboard

approach can be described as follows: A group of human specialists are in a room with

a large blackboard. The specialists are working as a team to brainstorm a solution to a

problem, using the blackboard as the workplace for cooperatively developing the solution. The session begins when the problem specifications are written onto the blackboard. The specialists watch the blackboard, looking for an opportunity to apply their expertise to the developing solution. When someone writes something on the blackboard that allows another specialist to apply her expertise, she records her

contribution on the blackboard, enabling other specialists to apply their expertise. This process of adding conûibutions to the blackboard continues until the problem has been

solved.

In this thesis, the blackboard is used to keep the information of a certain number of agents after they have left the Café. In other words, the blackboard is simply a buffer with a specific storage capacity inside the Café. When an agent is about to leave, the Café manager should be able to keep its information for a while. The blackboard allows other agents to get this agent's information even though they could not meet in the Café. After a while, the Café manager deletes this agent's information to make space for information from other agents. The order of deletion is based on the first-in first-out (FIFO) principle (a minimal sketch of such a buffer is given at the end of this section).

Agents' interaction: When an agent enters the Café, it first visits the blackboard. After that, it interacts with the other agents. Through these interactions it can exchange information with other agents. The information exchange happens between agents that share similar interests. There are several information sharing cases, all of which can be grouped into two categories, and each of these two categories can be further divided. In the first case, both agents belong to the same category. Say that both agent A and agent B are searching for information. In this situation, agent A and agent B do not share their own information, but it is possible for agent A to get some information about other agents carried by agent B, and agent B can take information about other agents carried by agent A. In the second case, the two agents belong to different categories. In this situation, one agent may take the other agent's own information. For example, when agent A and agent B are of different categories and they meet, agent A may take the information of agent B as well as the information of other agents carried by agent B. A similar information exchange also happens for agent B.
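To make the blackboard mechanism concrete, the following is a minimal Java sketch of a bounded FIFO buffer of the kind described above. The class and method names (Blackboard, post, read) are illustrative only; in the actual system this functionality is part of the Cafe class presented in Chapter 5.

import java.util.Vector;

class Blackboard {
    private final int capacity;                  // maximum number of entries kept
    private final Vector entries = new Vector(); // oldest entry sits at index 0

    Blackboard(int capacity) {
        this.capacity = capacity;
    }

    // Called when an agent leaves the Cafe: keep its information for a while.
    synchronized void post(Agent leavingAgent) {
        if (entries.size() == capacity) {
            entries.removeElementAt(0);          // first in, first out
        }
        entries.addElement(leavingAgent);
    }

    // Called when a newly arrived agent inspects the blackboard.
    synchronized Vector read() {
        return (Vector) entries.clone();
    }
}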

3.5 Privacy

The issue of privacy, though important, is not considered in the proposed information sharing system. We assume that the users of this system are willing to share their profiles with other people. However, we acknowledge that a person may not want to share information with certain other people. The agent allows users to determine the information it shares with others and to correct the interest profile if they think it is incorrect. In this fashion, a person is able to establish the image he or she wants others to have of him or her. We are not particularly worried about people who pretend to have information to distribute when they do not, since they would be contacted by other people and would soon be discouraged.

3.6 Concluding Remarks

Agents are classified into three categories based on their functions: SearchAgent, InfoAgent, and Search/InfoAgent. The SearchAgent represents agents for searching information, the InfoAgent represents agents for distributing information, and the Search/InfoAgent represents agents for both searching and distributing information. We have also introduced the concept of keyphrase-based information sharing and defined the functions of the Café based on the requirements of the system. The Café is a virtual meeting place where agents coexist and share their information. Generally speaking, the Café should have functions that allow agents to enter and leave the Café, keep some agents' information for a period of time, and carry out the information exchange when possible.

Chapter 4 Agents and the Café

4.1 Overview

Some of the basic concepts of keyphrase-based information sharing were described in the previous chapter. The basic idea of the keyphrase-based information sharing scheme is as follows. When we need particular information, or when we want to send our information to relevant people, we create agents and send them out on the network. These agents represent our interests in the form of simple keyphrase vectors. They go either directly to visit people recommended by their creators or to a virtual meeting place, the Café, to meet other agents. This thesis does not consider the first situation, agents visiting the people recommended by their creators; it considers only the second situation, agents visiting the Café. In the Café, agents meet and share information with relevant ones. In our proposed information sharing system, agents acquire new information by direct communication with other agents, from the recommendations of other agents, or by visiting the blackboard. Agents may carry more information when they leave the Café than when they came into it. An agent returns to its owner with a list of people who might be able to help her. The owner can then retrieve information from or provide information to these selected people.

This chapter presents the design of the keyphrase-based information sharing system. In Section 4.2, we examine different aspects of the agents, including the agent data structures, the representation of users' interests, and the information about the documents to be distributed. Section 4.3 gives the structure and various design aspects of the Café, which are mainly concerned with how to determine the relevance between agents and how agents exchange information with each other. The conclusions of this chapter are presented in Section 4.4.

4.2 Agent Representation

This thesis is related to a community-based, multi-agent information retrieval and provision system and is aimed at developing an information sharing system for a community of agents. In the proposed information sharing system, agents representing their owners' interests are created and sent out on the network. They visit a Café to meet other agents. While in a Café, agents visit the blackboard and communicate with each other. Through this communication, agents exchange relevant and useful information with each other. Agents also get other relevant agents' information through the recommendation of another agent during their communication or by visiting the blackboard in the Café. In order to fulfil this task, we need to determine what information an agent should carry and what information it should bring back to its owner; that is, what information should be shared. In the following, we explain the agents' data structure and the representation scheme used to represent documents and users' interests.

4.2.1 Agent Data Structure

In Chapter 3, we classified agents into three categories based on their goals and tasks. The three types of agents are SearchAgent, InfoAgent, and Search/InfoAgent. The main task of a SearchAgent is to search for information for its owner. Agents of this category are created and dispatched onto the network when their creators have some information demand on a particular topic. During their life, the agents travel on the network, meet other agents, find agents with similar interests, and share information with them. They then return a list of relevant people to their owners, who can use this information to contact those people and get the information they want. It is assumed that the creators of a SearchAgent are interested only in those people who have the information the creators are looking for. They definitely do not wish their agents to provide them with a list of people who, like themselves, are also looking for information.

Agents whose task is to distribute information for their owners are called InfoAgents. This type of agent was first introduced and used in ACORN, and it is this type of agent that distinguishes ACORN from other similar community-based software agent systems (Marsh, 1997). The InfoAgent is the embodiment of a piece of information. When a user decides to distribute a piece of information, he or she can create an InfoAgent to carry out this task. The InfoAgent makes use of the community of agents and their knowledge to navigate the system; it presents itself to those agents it believes will be interested in the information it carries. The InfoAgent not only distributes the current piece of information, it also helps its owner distribute similar information to relevant people later. For example, John has just finished a research paper and has decided to distribute it to relevant people. He can create an InfoAgent to carry out this task. If John's InfoAgent believes that Mark is a relevant person, it will pass John's address and the information to Mark. At the same time, John's InfoAgent will add Mark's address to its distribution list, so that John can later forward new information on the same subject to Mark. Therefore, the owners of InfoAgents expect their agents to bring back a list of people who are interested in their information. The owners can use this information to forward new information to these people directly. They are not interested in people who, like themselves, want to distribute information but do not want to get any from other people.

Agents responsible for both information searching and distribution fall into the Search/InfoAgent category. This type of agent makes our system different from other similar systems, including ACORN. We believe that some users may wish not only to distribute their own information to other relevant people, but also to get relevant information from other people. Hence, the owners of this type of agent want to know not just the people who are interested in their information, but also the people who distribute information and have interests similar to theirs.

A data structure is used to store data for the agents. Agents use this data to guide their information sharing. Based on the above descriptions of the three categories of agents, a global data structure is used to describe them, as shown in Figure 4.1. The data structure contains the following fields:

Figure 4.1: Data structure for agents (Email Address: user e-mail address; Web Address: user Web address; Type: agent category; Date: the date the agent is created; Topic: the topic of the agent; Threshold: similarity threshold; Interest: the user's interests; Metadata: information about a document to be distributed; a list of other relevant agents whose task is searching information; a list of other relevant agents whose task is distributing information; a list of other relevant agents whose task is searching and distributing information)

E-mail Address: This field represents the owner's electronic mail address, which is unique and available to almost everyone. This field is used to differentiate agents from each other.

Web Address: This field represents the owner's URL. Interested people can connect to the web page by using this information.

Type: This field stores the type of an agent. Its content must be one of the three agent categories; that is, it can only be SearchAgent, InfoAgent, or Search/InfoAgent. This field is one of the fields that determine the agent's behavior while it contacts other agents in the Café.

Date: This field shows the date the agent was created.

Topic: This field is the agent's topic and is used by the agent creator to organize his or her own agents. For example, the owner of one agent may assign 'Java' to this field and 'basketball' to that of another agent.

User Profile: This field represents the user's interests in some computer-recognizable form, such as a vector space representation. The details of how to obtain the user's interests are presented in Section 4.2.2.

Threshold: This field is used in the SearchAgent and the Search/InfoAgent. The threshold defines the tolerance the agent creator allows the agent to possess. For any two agents, a similarity value can be calculated based on their interests. Information sharing takes place between them only when their similarity value is greater than or equal to this threshold. The details of how to obtain the similarity value between agents are provided in Section 4.3.1.

Metadata: This field is used to store information about the documents to be distributed. It is non-empty only for the InfoAgent and the Search/InfoAgent; for the SearchAgent it is empty. We present the details of this field in Section 4.2.3.

SearchAgents: This field contains a list of the agents belonging to the SearchAgent category and is meaningful for both the InfoAgent and the Search/InfoAgent.

InfoAgents: This field contains a list of the agents whose type is InfoAgent. It is meaningful only for agents belonging to the SearchAgent or Search/InfoAgent categories.

Search/InfoAgents: This field contains a list of the agents belonging to the Search/InfoAgent category. It is meaningful to agents of all three categories.

4.2.2 User Profile

Before agents can exchange information, they have to determine whether they share similar interests. Agents do not exchange information if they do not share similar interests. Therefore, matching people according to their interests is an important step in the information sharing process. An important prerequisite for matching people based on their interests is to represent those interests in a computer-recognizable form. The users' interests are used to estimate the degree of similarity between two agents.

In this thesis, we use the vector space model to represent the user profile. The vector space model is the most widely used representation model in both the information retrieval (IR) and information filtering (IF) communities. This model deals mainly with text documents.

Although many features of a document are useful (for example, visual appeal, complexity of writing, and, of course, the subject), the vector space model is usually used to represent the subject of a document. This thesis deals with text documents and assumes that a user's interest profile is extracted from a text document.

The computer is not likely to store the complete text of each document in the natural language in which it was written. Instead, it will have a document representative, which may have been produced from the documents either manually or automatically. The starting point of the text analysis process may be the complete document text, an abstract, the title, or perhaps a list of keywords. This process must produce a machine-readable document representation.

There are two conflicting ways of looking at the problem of indexing documents in the information retrieval community. One is to characterize a document through a representation of its contents, regardless of the way in which other documents may be described. The other is to insist that in characterizing a document, one is discriminating it from all, or potentially all, other documents in the collection. The first method is called 'representation without discrimination' and the second 'discrimination without representation' (Rijsbergen, 1979). No matter which method is used to build the vector space representation of a document, we first have to process the document in three steps: removing high-frequency words, stripping suffixes, and detecting equivalent stems.

The removal of high-frequency words, 'stop' words such as 'and', 'are', 'but', 'is', 'the', etc., is normally done by comparing the input text with a 'stop list' of words that are to be removed. The advantages of this process are not only that the non-significant words are removed and will therefore not interfere during retrieval, but also that the size of the total document file can be reduced by 30 to 50 percent (Rijsbergen, 1979).

The second step consists of stemming the words that remain from the previous processing, that is, removing word suffixes and leaving just the roots by using a suffix-stripping algorithm (Porter, 1980). The process is very complicated, and the standard approach is to have a complete list of suffixes and to remove the longest possible one.

The final step is to determine whether words that are equivalent have the same stem. The simplest way to deal with this is to construct a list of equivalent stem endings. For two stems to be equivalent, they must match except for their endings, which must appear in the list as equivalent. It is assumed in the context of IR that if two words have the same underlying stem, then they refer to the same concept and should be indexed as such.
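As an illustration of these three steps, the following is a highly simplified preprocessing sketch in Java. The stop list and suffix list are tiny samples chosen for the example, the naive suffix stripping stands in for a full suffix-stripping algorithm such as Porter's, and the class and method names are not taken from the thesis implementation.

import java.util.*;

class TextPreprocessor {
    // Step 1: a (very small) sample stop list of high-frequency words.
    static final Set STOP_WORDS =
            new HashSet(Arrays.asList(new String[] {"and", "are", "but", "is", "the"}));
    // Step 2: a few sample suffixes; a real system would use a complete suffix list.
    static final String[] SUFFIXES = {"ing", "ed", "es", "s"};

    // Produces a table of stem frequencies for the given text.
    static Map process(String text) {
        Map stemFrequencies = new HashMap();
        StringTokenizer tokens = new StringTokenizer(text.toLowerCase(), " \t\n\r.,;:!?()\"'");
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            if (STOP_WORDS.contains(word)) continue;    // remove stop words
            String stem = strip(word);                   // strip a matching suffix
            Integer count = (Integer) stemFrequencies.get(stem);
            stemFrequencies.put(stem, new Integer(count == null ? 1 : count.intValue() + 1));
        }
        // Step 3 (detecting equivalent stems via a stem-ending table) is not modelled here:
        // words are conflated only when the naive stripping yields the same stem.
        return stemFrequencies;
    }

    static String strip(String word) {
        for (int i = 0; i < SUFFIXES.length; i++) {
            if (word.endsWith(SUFFIXES[i]) && word.length() > SUFFIXES[i].length() + 2) {
                return word.substring(0, word.length() - SUFFIXES[i].length());
            }
        }
        return word;
    }
}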

After the above processing, the final output is a vector of keywords and their frequencies. If we consider all the other documents in a collection in which the document occurs, the vector space representation of a document can be obtained by weighting the previous keyword frequency vector using a term frequency/inverse document frequency (TFIDF) scheme, defined as

$$ W_k = \frac{f_k}{F}\,\log\frac{N}{n_k} \qquad (4.1) $$

where $W_k$ is the weight of term k in a document d, $f_k$ is the number of times the term k appears in the document d, $n_k$ is the number of documents in the corpus which contain k (the document frequency), N is the number of documents in the corpus, and F is the maximum term frequency over all words in d. This vector space model is widely used in IR.
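As a small worked illustration of equation 4.1, the following Java method computes the weight of a term from the quantities defined above; the class, method, and parameter names are illustrative only.

class TfIdf {
    // fk: frequency of term k in document d; maxFreq: maximum term frequency F in d;
    // nk: number of documents containing k; numDocs: number of documents N in the corpus.
    static double weight(int fk, int maxFreq, int nk, int numDocs) {
        return ((double) fk / maxFreq) * Math.log((double) numDocs / nk);
    }
    // Example: weight(3, 10, 5, 100) = 0.3 * ln(20), which is approximately 0.90.
}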

For our user interest representation, it is impossible to get the document frequency of the TFIDF scheme, since we usually have just a single document. For this reason, we have decided to use Extractor (a document indexing software package) to extract users' interests. Extractor was developed by the Interactive Information Group of the National Research Council of Canada (Turney, 1999). It uses machine learning methods to extract keyphrases from a text document. A keyphrase is defined as a sequence of one, two, or three words that appear consecutively in a text document with no intervening stop words or punctuation. The frequencies and the positions at which the keyphrases first appear are considered by Extractor. See Turney (1999) for more details about Extractor.

4.2.3 Metadata

Agents that distribute information for their users do not carry the whole document to be distributed. Instead, they carry just the metadata elements of the document. In this work, similarly to ACORN, Dublin Core is used to store the information about the document. Dublin Core is a set of a minimum number of metadata elements required to facilitate the discovery of a document. The semantics of the Dublin Core elements are intended to be clear enough to be understood by a wide range of users. The following are the metadata elements of a document in the Dublin Core set, taken from Weibel et al. (1995).

Subject: The topic addressed by the work
Title: The name of the object
Author: The person(s) primarily responsible for the intellectual content of the object
Publisher: The agent or agency responsible for making the object available
OtherAgent: The person(s), such as editors and transcribers, who have made other significant intellectual contributions to the work
Date: The date of publication
ObjectType: The genre of the object, such as novel, poem, or dictionary
Form: The data representation of the object, such as a PostScript file or a Windows executable file
Identifier: The string or number used to uniquely identify the object
Relation: The relationship to other objects
Source: The objects, either print or electronic, from which this object is derived, if applicable
Language: The language of the intellectual content
Coverage: The spatial locations and temporal durations characteristic of the object

4.3 The Café

The Café is a virtual place where agents meet each other, find relevant ones, and share information with them. We stated in Chapter 3 that the Café manager has several functions in conducting agent information sharing. These functions include regulating agents coming into and leaving the Café, keeping agent information for a period of time, and conducting the information sharing. The mechanisms used by the agents to find and share information with other relevant agents are explained in the following subsections.

4.3.1 Similarity Measures

Similarity measures are widely used in the IR community and are sometimes referred to as matching functions, correlation coefficients, or selection algorithms. In IR, similarity measures are the mechanisms through which the retrieval software makes a comparison between document and query representations to effect retrieval (Geme, 1983).

A similarity measure is any function that assigns a matching-coefficient value to a pair of vectors. Each vector comprises a set of attributes that characterizes an entity. Similarity measures have been used to cluster documents and to determine the similarity between a query and a document. When similarity measures are used to cluster documents, they are designed to quantify the likeness between documents. If one assumes that it is possible to group documents in such a way that a document in one group is more like the other members of that group than it is like any object outside that group, then a cluster method enables such a group structure to be discovered. When these measures are applied to determine the similarity between a query and a document, they serve in matching or ranking.

There are a number of matching-coefficient techniques that can be used to measure the similarity between two documents. Two of them are explained in the following. The first is the cosine measure (Salton and McGill, 1983):

$$ \mathrm{sim}_{\cos}(d_1, d_2) = \frac{\sum_{t} W_{t,d_1}\, W_{t,d_2}}{\sqrt{\sum_{t} W_{t,d_1}^{2}}\;\sqrt{\sum_{t} W_{t,d_2}^{2}}} \qquad (4.2) $$

where $W_{t,d_i}$ represents the weight of the term t occurring in document $d_i$. The cosine measure finds the cosine of the angle between two vectors. From the above equation, it is clear that the cosine measure value is obtained by computing the dot product of the two vectors and dividing it by the product of their magnitudes. $\mathrm{sim}_{\cos}(d_1,d_2)$, the similarity between documents d1 and d2, is 1.0 for identical documents and 0.0 for documents with no common terms.

The cosine measure takes its name from a trigonometrical analogy. The angle between any two vectors defined in two or more dimensions can be measured in a two-dimensional plane. In this plane the angle, say $\theta$, between the two vectors can be measured. When the vectors point in the same direction, $\theta = 0$; when they are perpendicular, $\theta = \pi/2$. Hence, the cosine of the angle between any two vectors falls into the following range:

$$ 0 \le \cos\theta \le 1. $$

In the vector space model, the desired result is directly available by taking the dot product of the vectors. For texts, each document or query is represented by a vector of weighted terms (keywords or keyphrases):

$$ d = \langle (t_1, w_1), (t_2, w_2), \ldots, (t_n, w_n) \rangle, $$

where, in the pair $(t_i, w_i)$, $t_i$ represents the i-th term and $w_i$ its corresponding numerical weight. The term is some component feature of the document, usually a string containing a word or a phrase, e.g., 'computer' or 'concept graphs'. In addition, the term may be any boolean feature of the document. In the case of term weights, the least weight of any term is zero, so the maximum angle between any two vectors is 90 degrees.

The second similarity measure we have used is simply the proportion of common terms in two documents d1 and d2 (Morita and Shinoda, 1993):

$$ \mathrm{sim}_{\mathrm{prop}}(d_1, d_2) = \frac{2\,|V_{d_1} \cap V_{d_2}|}{|V_{d_1}| + |V_{d_2}|} \qquad (4.3) $$

where $V_{d_1}$ and $V_{d_2}$ represent the sets of keywords of d1 and d2, respectively, $|V_{d_1}|$ is the total number of keyphrases in d1, $|V_{d_2}|$ is the total number of keyphrases in d2, and $|V_{d_1} \cap V_{d_2}|$ is the number of keywords shared by both documents. $\mathrm{sim}_{\mathrm{prop}}(d_1,d_2)$, the similarity between d1 and d2, is 1.0 for identical documents and 0.0 for documents with no common terms.
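To make the use of the cosine measure concrete, the following is a minimal Java sketch that computes equation 4.2 for two keyphrase vectors, each represented here as a Map from keyphrase (String) to weight (Double). The representation and class name are illustrative and are not the thesis implementation; Chapter 5 stores keyphrase-weight pairs in a KeyPhraseVector instead.

import java.util.Iterator;
import java.util.Map;

class CosineMeasure {
    static double similarity(Map d1, Map d2) {
        double dot = 0.0, norm1 = 0.0, norm2 = 0.0;
        for (Iterator it = d1.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry e = (Map.Entry) it.next();
            double w1 = ((Double) e.getValue()).doubleValue();
            norm1 += w1 * w1;                          // accumulate |d1| squared
            Double w2 = (Double) d2.get(e.getKey());
            if (w2 != null) {
                dot += w1 * w2.doubleValue();          // term occurs in both documents
            }
        }
        for (Iterator it = d2.values().iterator(); it.hasNext(); ) {
            double w2 = ((Double) it.next()).doubleValue();
            norm2 += w2 * w2;                          // accumulate |d2| squared
        }
        if (norm1 == 0.0 || norm2 == 0.0) return 0.0;  // an empty vector shares no terms
        return dot / (Math.sqrt(norm1) * Math.sqrt(norm2));
    }
}

Two agents would then exchange information only when the value returned by similarity is greater than or equal to the similarity threshold described in Section 4.2.1.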

4.3.2 Information Sharing

As soon as an agent enters a Café, it immediately starts interacting with the other agents in the Café. The aim of interacting with other agents is to find agents with similar interests and exchange information with them. The agent may also receive relevant agent information by inspecting the blackboard in the Café or through other agents' recommendations. Since agents have their own specific tasks, they each behave differently while looking for other relevant agents to share information with. All possible information sharing cases are described in the following. We examine these cases by assuming that one agent is, in turn, a SearchAgent, an InfoAgent, or a Search/InfoAgent.

A. SearchAgent

When a SearchAgent enters the Café, it meets other agents that can be of type SearchAgent, InfoAgent, or Search/InfoAgent. Let us assume the following scenarios:

* A SearchAgent S1 meets another SearchAgent, say, S2. In this case, neither of them shares its own information. What they share is the information about the other agents they carry. S1 will provide all agents in its InfoAgents and Search/InfoAgents lists to S2, which will add the relevant agents to its own InfoAgents and Search/InfoAgents lists, respectively. In the same vein, S2 will provide all agents in its InfoAgents and Search/InfoAgents lists to S1, which will add the relevant ones to its corresponding agent lists. Figure 4.2 shows the information sharing between S1 and S2: S2 receives information about info_4 and searchInfo_3 from S1, and S1 receives information about info_1 and searchInfo_1 from S2. The information obtained from the other agent is shown in bold in Figure 4.2; this notation is also used in the following cases.

* The SearchAgent S1 meets an InfoAgent, say I1. In this case, I1 first provides its own information to agent S1. If S1 finds that I1 has similar interests, it puts I1 into its InfoAgents list. Once this happens, I1 also puts S1 into its SearchAgents list. S1 will provide the other agents in its Search/InfoAgents list to I1, which may take the relevant ones and add them to its Search/InfoAgents list. Similarly, agent I1 will recommend to S1 all agents in its Search/InfoAgents list, and S1 may add the relevant ones to its corresponding agent list. Figure 4.3 shows the information sharing between S1 and I1.

* The SearchAgent S1 meets a Search/InfoAgent SI1. In this case, SI1 first provides its own information to S1. Once S1 decides to take SI1's information because it is interested in SI1, SI1 will add S1 to its SearchAgents list. If agent S1 does not want SI1, SI1 does not take S1's own information either; in other words, they do not share their own information with each other. But they can still recommend each other to the other agents they know. S1 will recommend all agents in its InfoAgents and Search/InfoAgents lists to SI1, which will then take the relevant ones and put them into the corresponding lists. In the same vein, agent SI1 will recommend all agents in its InfoAgents and Search/InfoAgents lists to S1, which will take the relevant ones and put them into the corresponding lists. Figure 4.4 shows the information sharing between S1 and SI1. The shared information is shown in bold.

Figure 4.2: Information sharing between two SearchAgents S1 and S2

Figure 4.3: Information sharing between a SearchAgent S1 and an InfoAgent I1

B. InfoAgent

When an InfoAgent enters a Café, it meets other agents, which may be of types SearchAgent, InfoAgent, or Search/InfoAgent. Therefore, there are three information sharing possibilities between this agent and the other agents currently residing in the Café. The information sharing between a SearchAgent and an InfoAgent has been described above. In the following, we describe the other two cases.

In the first case, the InfoAgent contacts another InfoAgent. In this situation, neither of them shares its own information with the other. What they share are the agents they carry in their SearchAgents and Search/InfoAgents lists. They provide all agents in both lists to each other and put the relevant ones into their corresponding lists. Figure 4.5 shows the information sharing between the two agents.

In the second case, an InfoAgent, say I1, meets a Search/InfoAgent, say SI1. In this situation, both of them first provide their own information to each other. If SI1 decides to take I1, then I1 also takes SI1; otherwise, neither of them shares its own information. No matter what happens between them, they do recommend to each other all the agents they are carrying. I1 provides all agents in its SearchAgents and Search/InfoAgents lists to SI1, and SI1 takes the relevant ones. In the same way, SI1 provides I1 all the agents in its SearchAgents and Search/InfoAgents lists, and I1 takes the relevant ones. The information sharing between the two agents is shown in Figure 4.6.

C. Search/InfoAgent

After the descriptions of the above five information sharing cases, there is only one information sharing scheme left to be explained: a Search/InfoAgent, SI1, comes into a Café and meets another Search/InfoAgent, SI2. In this case, they first share their own information. They then recommend to each other all the agents in their three agent lists. The information sharing between SI1 and SI2 is shown in Figure 4.7.

Figure 4.4: Information sharing between a SearchAgent S1 and a Search/InfoAgent SI1

Figure 4.5: Information sharing between two InfoAgents I1 and I2

Figure 4.6: Information sharing between an InfoAgent I1 and a Search/InfoAgent SI1

Figure 4.7: Information sharing between two Search/InfoAgents SI1 and SI2

Six information sharing cases have been described above. From these descriptions, the following three conclusions about information sharing among agents can be drawn:

* When a SearchAgent meets another SearchAgent, or an InfoAgent meets another InfoAgent, the two agents do not share their own information with each other; however, they recommend to each other all the other agents they carry;

* When an agent from one category, say a SearchAgent, meets an agent from another category, say an InfoAgent, they share their own information with each other. They also recommend to each other all the other agents they carry in the appropriate categories;

* If both agents are Search/InfoAgents, they not only share their own information with each other, they also recommend to each other all the agents they carry.
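The three rules above can be captured in a few lines of code. The following Java sketch decides whether two agents exchange their own information, given their type strings ("search", "info", or "searchInfo", as in Section 5.2.2); it ignores the direction of the exchange and the similarity-threshold check, and the class and method names are illustrative rather than part of the thesis implementation.

class SharingRules {
    // Returns true if two agents of the given types share their *own* information.
    // Recommendations of carried agents happen in every case and are not modelled here.
    static boolean shareOwnInfo(String typeA, String typeB) {
        if (typeA.equals(typeB)) {
            // Same category: only two Search/InfoAgents exchange their own information.
            return typeA.equals("searchInfo");
        }
        // Different categories: the agents share their own information.
        return true;
    }
}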

4.4 Summary

In this chapter, the data structure of agents has been described based on their functions. We have also explained how to obtain and represent users' interests, as well as how to represent the information of a document to be distributed. We have presented two methods to determine the similarity between agents. We have also described how agents share their information with each other, both by direct communication and by recommendation.

Chapter 5: Implementation, Test and Evaluation

5.1 Introduction

This chapter describes the implementation of the proposed keyphrase-based information sharing system outlined in Chapter 4. The implementation was carried out using the Java programming language, version 1.1.4. Section 5.2 presents the implementation of agents. Following the description of the implementation of the Cafe class in Section 5.3, we present the system test and evaluation results in Section 5.4.

5.2 Implementation of Agents

This section examines the implementation of an agent, that is, how to represent agent information, including both its own information and the information about other relevant agents carried by it. A top-down scheme is adopted to explain the implementation. First, the Agent class is examined; then the classes used to define the fields of the Agent class are discussed.

5.2.1 Agent Class

The data structure for all three types of agents was given in Chapter 4. The implementation of the data structure is carried out by the Agent class shown in Figure 5.1.

CLASS NAME: Agent
ATTRIBUTES: { AgentCore agentInfo; Vector searchAgents, infoAgents, search_infoAgents; }
SERVICES (prototypes): setAgentInfo(), getAgentInfo(), setSearchAgents(), getSearchAgents(), setInfoAgents(), getInfoAgents(), setSearch_InfoAgents(), getSearch_InfoAgents()

Figure 5.1: Class card for Agent

The Agent class contains four fields, explained below:

agentInfo: This field is used to represent each agent's own information. This variable is an object of the class AgentCore, which is described in Section 5.2.2.

searchAgents: This is a vector field used to store search agents that are relevant to the object agentInfo of the class AgentCore. This field is meaningful only when the agent is an infoAgent or a search/infoAgent, i.e., only info and search/info agents use this field to hold information about other relevant search agents.

infoAgents: This vector field is used to store the information of the distribution agents. The agents stored in this field are those whose information could be relevant to the object agentInfo of the AgentCore class. This field is used only when the agent is a searchAgent or a search/infoAgent; it is empty for an infoAgent.

search_InfoAgents: This vector field is used to store information about search/info agents. The agents stored in this vector field are those which are relevant to the object agentInfo of the AgentCore class. This variable exists in all types of agents.

The methods of the Agent class are used to create an object of the class, carry out the information sharing activities, and present information to the users.

5.2.2 AgentCore Class

The AgentCore class is used to represent the information of an agent. Based on the data structure of an agent given in Chapter 4, the AgentCore class is defined in Figure 5.2. There are nine fields in the AgentCore class; however, some fields may be empty for some types of agents. For example, the metadata field is not used by searchAgents and should therefore be kept empty. Similarly, the similarityThreshold field should be empty for information distributing agents. A detailed description of the data fields of the AgentCore class follows:

CLASS NAME: AgentCore
ATTRIBUTES: { String emailAddress, webAddress, type, date, topic; double similarityThreshold, dist; KeyPhraseVector interest; Vector metadata; }
SERVICES (prototypes): setEmailAddress(String emailAddress), getEmailAddress(), setWebAddress(String webAddress), getWebAddress(), setType(String type), getType(), setTopic(String topic), getTopic(), setDate(String date), getDate(), setSimilarityThreshold(double threshold), getSimilarityThreshold(), setDist(double dist), getDist(), setKeyPhrases(KeyPhraseVector interest), getKeyPhrases(), setTopicFile(Vector metadata), getTopicFile(), clone()

Figure 5.2: Class card for AgentCore

emailAddress: This string variable is used to store the e-mail address of the agent creator. This attribute can be used to distinguish one user's agents from other users' agents, since the e-mail address is unique to each person. It can also be used by other people to contact the owner of the agent.

webAddress: This string variable represents the URL of the agent owner. A non-empty webAddress field means that the creator of the agent wishes other people to visit her web site; otherwise, this field is empty.

type: This string variable specifies the category of the agent. The value of this attribute can only be one of the three string values "search", "info", or "searchInfo". The value of this variable is used when information sharing activities take place among agents in the Café.

date: This string variable represents the date when the agent was created.

topic: This string variable is used by the agent creator to define what the agent is about. For example, it stores "computer vision" if the agent has been created for searching/distributing information about computer vision; similarly, it stores "software" if it is created for searching/distributing information about software.

similarityThreshold: This variable of type double is used to store the similarity threshold. The similarity between any two agents is determined by the selected similarity measure method (see Section 4.3.1). Two agents can share information only if the similarity between them is greater than the value of the similarityThreshold. The similarityThreshold is initialized to a value by the user at the time the agent is created and can be changed by the agent user whenever necessary.

dist: This variable of type double represents the Euclidean distance between this agent and the agent which carries it, and is used to order the agents in the relevant agent list.

interest: This vector field gives the agent owner's interest profile, which is represented by keyphrases with their associated weights. It is an object of the class KeyPhraseVector, which is examined in Section 5.2.3.

metadata: This vector field is used to provide all the related information of the document to be distributed. It is used only in agents whose function is to distribute information. Each element of this vector is an object of the DublinCore class, which is presented in Section 5.2.4.

5.2.3 KeyPhraseVector Class

The KeyPhraseVector class is used to represent the user interest profile. A user's interest is expressed by a set of keyphrases and their associated weights. We use the KeyPhrases class to define a paired keyphrase and its weight. The KeyPhrases class is given in Figure 5.3.

The string variable key and the double variable weight in the KeyPhrases class are used to represent a keyphrase and its associated weight, respectively. By using the KeyPhrases class, the KeyPhraseVector class is declared in Figure 5.4.

The KeyPhraseVector class has only one field, phrase, which is used to store all the keyphrase-weight pairs. The method insertKeyPhrases is used to insert each keyphrase-weight pair into the phrase vector; this method keeps all keyphrases in alphabetical order. The method getKeyPhrases returns the phrase vector.

CLASS NAME: KeyPhrases
ATTRIBUTES: { String key; double weight; }
SERVICES (prototypes): KeyPhrases(String key, double weight), setKey(String key), getKey(), setWeight(double weight), getWeight()

Figure 5.3: Class card for KeyPhrases

CLASS NAME: KeyPhraseVector
ATTRIBUTES: { Vector phrase; }
SERVICES (prototypes): KeyPhraseVector(Vector keyphrases), insertKeyPhrases(KeyPhrases key_weight), getKeyPhrases()

Figure 5.4: Class card for KeyPhraseVector
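As an illustration of how insertKeyPhrases can keep the phrase vector in alphabetical order, the following sketch inserts each new pair at its sorted position. It assumes the KeyPhrases class of Figure 5.3; the method body shown here is illustrative and not the thesis code.

import java.util.Vector;

class KeyPhraseVector {
    Vector phrase = new Vector();   // keyphrase-weight pairs, kept in alphabetical order

    void insertKeyPhrases(KeyPhrases keyWeight) {
        int i = 0;
        // advance past every entry whose key alphabetically precedes the new key
        while (i < phrase.size()
               && ((KeyPhrases) phrase.elementAt(i)).getKey().compareTo(keyWeight.getKey()) < 0) {
            i++;
        }
        phrase.insertElementAt(keyWeight, i);   // insert at the sorted position
    }

    Vector getKeyPhrases() {
        return phrase;
    }
}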

5.2.4 DublinCore Class

The DublinCore class is used to represent the related information of the document to be distributed. All the fields of the class collectively provide the information of the document. The DublinCore class is declared in Figure 5.5. The various fields of the DublinCore class have already been described in Section 4.2.3.

5.3 The Café

This section presents the implementation of the Café, a virtual meeting place for agents. In Chapter 4, we stated that the Café manager carries out the following tasks: (a) deciding whether or not an agent is allowed to enter the Café; (b) facilitating information sharing activities among agents; (c) removing agents from the Café when necessary; and (d) keeping the information of an outgoing agent for a while.

We have implemented the Café by using the multithreading technique of the Java language. The normal flow of control through a program is sequential: one statement is executed, and when it is done, the next statement is executed, and so on. Unlike this sequential programming technique, the multithreading programming technique deals with a set of threads. A thread is a programming mechanism that allows more than one task to be executed at the same time.

The Café is implemented by using one class, Cafe, and four threads: LetAgentIn, InfoSharing, TimeChecker, and RemoveAgentOut. All four threads share the Cafe class. These four threads and the Cafe class are described in the following subsections.

CLASS NAME: DublinCore
ATTRIBUTES: { String title, author, publisher, other_agent, date, object_type, form, identifier, relation, source, language, coverage; }
SERVICES (prototypes): DublinCore(), setTitle(String title), getTitle(), setAuthor(String author), getAuthor(), setPublisher(String publisher), getPublisher(), setOtherAgent(String other_agent), getOtherAgent(), setDate(String date), getDate(), setObjectType(String object_type), getObjectType(), setForm(String form), getForm(), setIdentifier(String identifier), getIdentifier(), setRelation(String relation), getRelation(), setSource(String source), getSource(), setLanguage(String language), getLanguage(), setCoverage(String coverage), getCoverage()

Figure 5.5: Class card for DublinCore

5.3.1 LetAgentIn Thread

The job of the LetAgentIn thread is to determine whether an agent can enter the Café or not. The decision is made based on the space available in the Café. If there is room in the Café, the agent is allowed to enter; otherwise, it has to wait outside the Café until space becomes available. The code for the LetAgentIn thread is relatively straightforward, and is given below:

class LetAgentIn extends Thread {
    // attributes
    Cafe meetingPlace;
    Agent[] agents;                // the agents waiting to enter the Cafe

    // methods
    LetAgentIn(Cafe meetingPlace) {
        this.meetingPlace = meetingPlace;
    }

    public void run() {
        int next = 0;
        while (true) {
            // hand the next waiting agent to the Cafe; getNewAgent blocks while the Cafe is full
            meetingPlace.getNewAgent(agents[next]);
            next = (next + 1) % agents.length;
            try {
                sleep((int) (Math.random() * 1000));   // pause for a short random interval
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}

5.3.2 InfoSharing Thread

The task of the InfoSharing thread is to manage the communication and information sharing among agents. When an agent enters a Café, it first visits the blackboard to check whether there is any interesting information there. If there is information which it finds interesting, it copies it. Subsequently, the agent communicates and possibly shares information with the other agents in the Café. The code for the InfoSharing thread is as follows:

class InfoSharing extends Thread {
    // attributes
    Cafe meetingPlace;

    // methods
    InfoSharing(Cafe meetingPlace) {
        this.meetingPlace = meetingPlace;
    }

    public void run() {
        while (true) {
            meetingPlace.infoShare();   // carry out information exchange among the agents in the Cafe
            yield();                    // give the other Cafe threads a chance to run
        }
    }
}

5.3.3 TimeChecker Thread

The job of the TimeChecker thread is to check whether any agent has stayed in the Café longer than the predefined time. Each Café manager has a limit on how long agents may stay. If an agent stays longer, that agent should leave the Café. The code for the TimeChecker thread is as follows:

class TimeChecker extends Thread {
    // attributes
    Cafe meetingPlace;

    // methods
    TimeChecker(Cafe meetingPlace) {
        this.meetingPlace = meetingPlace;
    }

    public void run() {
        while (true) {
            meetingPlace.stayEnoughTime();   // flag agents whose visiting time is over
            yield();
        }
    }
}

5.3.4 RemoveAgentOut Thread

The RemoveAgentOut thread is used to remove an agent from the Café. An agent can be removed from the Café if its time is over and it is not in the middle of exchanging information with another agent. The code for this thread is as follows:

class RemoveAgentOut extends Thread {
    // attributes
    Cafe meetingPlace;

    // methods
    RemoveAgentOut(Cafe meetingPlace) {
        this.meetingPlace = meetingPlace;
    }

    public void run() {
        while (true) {
            meetingPlace.removeAgent();   // evict agents that have finished their visit
            yield();
        }
    }
}

5.3.5 Cafe Class

As the name suggests, the Cafe class implements the Café. It uses synchronized methods to avoid race conditions and is declared in Figure 5.6.

CLASS NAME: Cafe
ATTRIBUTES: { static final int CAFE_SIZE, BLACKBOARD_SIZE; Vector cafe, blackboard, timeSlice; long patience; int count; boolean mingle, mingleDone, enoughStay; }
SERVICES (prototypes): getNewAgent(Object x), infoShare(), stayEnoughTime(), removeAgent()

Figure 5.6: Class card for Cafe

The constant integer variables CAFE_SIZE and BLACKBOARD_SIZE represent the maximum number of agents allowed in the Café and on the blackboard, respectively. The vector field cafe is used to store the agents that are in the Café. The vector field blackboard is used to temporarily keep the information of the agents that have already left the Café. The vector field timeSlice is used to record when each agent enters the Café. The long variable patience is used to determine how long one agent can stay in the Café. The integer variable count is used to keep track of the number of agents in the Café. The three boolean variables mingle, mingleDone, and enoughStay are used to store information such as whether the information exchange activity can start, whether the information exchange activity has been completed, and whether an agent has stayed long enough in the Café, respectively.

The Cafe class also has several methods. These methods are used to carry out the particular tasks of the information sharing system and are described below.

The role of the getNewAgent method is to decide whether an agent will be allowed to enter the Café. When an agent is allowed to enter the Café, the getNewAgent method increases the variable count by one and records the time at which the agent enters the Café. The algorithm for the getNewAgent method is as follows:

procedure getNewAgent(newAgent)
begin
    mingle ← false;
    if count = CAFE_SIZE then
        wait;
    else begin
        count++;
        let the newly arriving agent enter the café;
        record the entering time of the new agent;
        mingle ← true;
    end;
    notifyAll();
end;

The infoShare method is responsible for carrying out the information exchange among agents.

The algorithm for the infoShare method is as follows:

procedure infoShare(Agent newAgent)
begin
    mingleDone ← false;
    if mingle = false or (count < 1 and blackboard.size() = 0) then
        wait;
    else begin
        for each agent, thatAgent, in blackboard do
            newAgent and thatAgent share info if possible;
        for each agent, thatAgent, in café do
            newAgent and thatAgent share info if possible;
    end;
    mingleDone ← true;
    notifyAll();
end;

The stayEnoughTime method checks whether an agent has been in the Café too long. The algorithm for the stayEnoughTime method is given below:

procedure stayEnoughTime()
begin
    enoughStay ← false;
    if timeSlice.size() = 0 then
        return;
    if the time period of an agent in the Café > patience then
        enoughStay ← true;
    else
        enoughStay ← false;
    notifyAll();
end;

The removeAgent method is responsible for removing agents from the Café, and for adding agent information to and removing agent information from the blackboard. The algorithm for the removeAgent method is as follows:

procedure removeAgent()
begin
    if mingleDone = false or enoughStay = false or count = 0 then
        wait;
    else begin
        count--;
        if the blackboard is full of agents then
            remove the first agent from the blackboard;
        add the agent being removed from the café to the end of the blackboard;
        remove the first agent from the café;
        remove the first time element from timeSlice;
    end;
    notifyAll();
end;
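For completeness, the following sketch shows one way in which the shared Cafe object and the four threads could be created and started together. The CafeMain class, the startCafe method, and the no-argument Cafe constructor are assumptions made for this example and are not part of the thesis implementation.

class CafeMain {
    // Starts the Cafe and its four service threads for a given set of agents.
    static void startCafe(Agent[] waitingAgents) {
        Cafe meetingPlace = new Cafe();                 // the shared, synchronized meeting place

        LetAgentIn doorman = new LetAgentIn(meetingPlace);
        doorman.agents = waitingAgents;                 // agents queued to visit the Cafe
        doorman.start();                                // admits agents while space is available

        new InfoSharing(meetingPlace).start();          // drives blackboard visits and information exchange
        new TimeChecker(meetingPlace).start();          // flags agents whose visiting time is over
        new RemoveAgentOut(meetingPlace).start();       // evicts agents and updates the blackboard
    }
}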

5.4 Test and Evaluation

The implementation of the keyphrase-based information sharing system was presented in the previous two sections. We compare the two similarity measure methods described in Chapter 4 by using actual text articles as test data. We test the information sharing system based on agents' direct communication, on recommendations from other agents, and on visits to the blackboard.

5.4.1 Comparison of Similarity Measure Methods

A number of similarity measure methods have been used in the area of information retrieval to measure the similarity between a query and a collection of documents in a database (Brüninghaus and Ashley, 1998). For the purpose of information sharing among software agents, these methods appear to be good starting points in the selection of a measure that provides a numerical distance between agents. This section describes the evaluation of the two similarity measures explained in Chapter 4: the cosine measure method and the substring indexing method (the proportion-of-common-terms measure of Section 4.3.1).

A sample of 49 article abstracts from technical magazines and web pages was taken as the evaluation data. These 49 abstracts were classified into five categories based on their content; each category represented one specific topic. The text document indexing software Extractor, developed by the Interactive Information Group of the National Research Council of Canada (Turney, 1999), was used to index the abstracts and obtain their vector space representations. The source information of these article abstracts and their keyphrase vector space representations are listed in Appendix A. One article from each category was selected. The similarities between the selected article and all the others, including the selected one itself, were calculated by using equations 4.2 and 4.3. It is assumed in the thesis that the similarity between two articles is symmetric, which means that if the similarity of article 3 to article 12 is 0.06, then the similarity of article 12 to article 3 is likewise 0.06.

Figures 5.7 to 5.11 show the similarity measure results of one article from each category against all the articles of the five categories, including itself, using the cosine measure method and the substring indexing method. In all these figures, the numbers on the horizontal axis represent article sequence numbers. The first eleven articles represent the first category; articles 12 to 22 are from the second category; articles 23 to 32 represent the third category; articles 33 to 42 are from the fourth category; and the last seven articles are from the fifth category.

All five figures show the same thing: the similarity results obtained from the cosine measure method are much better than those obtained from the substring indexing method, both for distinguishing articles in one category from those in another category and for distinguishing between articles in the same category. In addition, the five figures also show that the similarity values obtained by the cosine measure method for one article against other articles in the same category are much higher than the corresponding substring indexing measurement results.

Figure 5.7: Similarities of article 3 from category 1 with all articles. (a) cosine measure method results; (b) substring indexing method results

Figure 5.8: Similarities of article 12 from category 2 with all articles. (a) cosine measure method results; (b) substring indexing method results

Figure 5.9: Similarities of article 23 from category 3 with all articles. (a) cosine measure method results; (b) substring indexing method results

Figure 5.10: Similarities of article 35 from category 4 with all articles. (a) cosine measure method results; (b) substring indexing method results

Figure 5.11: Similarities of article 46 from category 5 with all articles. (a) cosine measure method results; (b) substring indexing method results

(In each figure, the horizontal axis lists the articles: category 1 is articles 1-11, category 2 is articles 12-22, category 3 is articles 23-32, category 4 is articles 33-42, and category 5 is articles 43-49.)

The above similarity differences between the cosine measure method and the substring indexing method are due mainly to the fact that each treats a keyphrase of an article differently. The cosine measure method does not consider each keyphrase in a keyphrase vector equally: some keyphrases have higher weights than others. Therefore, a keyphrase with a higher weight has much more influence on the resulting similarity value than one whose weight is lower. The substring indexing method, however, treats each keyphrase in a keyphrase vector equally.

5.4.2 System Test

The above comparisons indicate that, for determining the relevance of agents, the similarity results obtained by using the cosine measure method are much better and more useful than those obtained by using the substring indexing method. Therefore, the cosine measure method was used in the implementation of the keyphrase-based information sharing system.

Experiments were conducted to test the system. The vector space representations of the above 49 articles are used to represent the interests of agents, and 49 agents are created correspondingly. The tests have been designed to find out whether:

1. agents can find other relevant agents and share information with them through direct communication;

2. agents can get information through recommendations from other agents; and

3. agents can get information by visiting the blackboard in the Café.

The test results are presented below.

A. Information Sharing by Direct Communication

This test is designed to discover whether the system can make agents communicate with each other and share information with the relevant ones. We defined each agent's type and arranged the order in which the agents enter the Café in such a way that an agent could not get information by recommendation from other agents or by visiting the blackboard. Information sharing therefore happens only through direct communication between agents.

Several tests were conducted for each of the five interest-category agents. Figure 5.12 displays one of these tests. It shows the order in which each agent entered and left the Café. Agents cluster_1 to cluster_11 were created from articles of the same category. Agents cluster_1 to cluster_10 were defined as info agents, and agent cluster_11 as a search agent with a similarity threshold of 0.3. Agents cluster_1 to cluster_10 do not share information among themselves, since they are agents of the same type, but agent cluster_11 could possibly share information with some of them. Figure 5.12 shows that agents cluster_1, cluster_2, and cluster_3 had already left the Café before agent cluster_11 entered it.

Therefore, agent cluster_11 could not share information with them. The agents with which agent cluster_11 could share information are those in which cluster_11 is interested and which are in the Café while cluster_11 is also there. The test results show that, after leaving the Café, agent cluster_11 carried the information of agents cluster_4, cluster_5, cluster_6, cluster_7, cluster_9, and cluster_10, and each of these also carried the information about agent cluster_11. The test results verify that the system can indeed help agents share information with the relevant ones through direct communication.

B. Getting Information by Recommendation

In order to test whether an agent can get information from the recommendations of other agents, numerous experiments were conducted. In one experiment, agent cluster_1 was defined as a search/info agent, agents cluster_2 to cluster_10 as info agents, and agent cluster_11 as a search agent. It was impossible for agent cluster_1 to meet agent cluster_11 in the Café.

Figure 5.12: The sequence of agents entering and leaving the Café

In this situation, agent cluster-11 could obtain the information of agent cluster-1 only by recommendation from one of the agents cluster-2 to cluster-10. Figure 5.13 shows the order of agents entering and leaving the Café. It shows that before agent cluster-11

[Figure 5.13 reproduces the console log: each line records an agent entering the Café or being removed from it.]

Figure 5.13: The sequence of agents entering and leaving the Café

entered the Café, agents cluster-1 to cluster-6 had already left. Therefore, cluster-11 could not come into contact with agent cluster-1 and share information with it directly. Nevertheless, agent cluster-11 had obtained the information of agent cluster-1 after it left the Café. This verifies that the system can make agents obtain other agents' information by recommendation.

C. Getting Information by Visiting the Blackboard

This test was designed to verify that agents can get other agents' information by visiting the blackboard in the Café. We conducted this type of experiment by specifying that, when one agent got into the Café, all other relevant agents had already left, and that their information could not be carried by other agents still in the Café. Therefore, the agent could get other relevant agents' information only by visiting the blackboard. Figure 5.14 shows the order in which each agent entered and left the Café and the blackboard. In this experiment, agents cluster-1 to cluster-9 had already left the Café when agent cluster-11 got there. But agent cluster-11 carried the information of cluster-6, cluster-7, and cluster-9 after it left the Café. It is obvious that cluster-11 got this information by visiting the blackboard.

5.4.3 System Evaluation

The performance measures used are precision and recall, which are well known and widely used to measure retrieval system performance in the information retrieval (IR) community. Recall, measuring the ability of the system to retrieve useful documents, is defined as the proportion of relevant material retrieved. Precision, conversely measuring the ability to reject useless material, is defined as the proportion of retrieved material that is relevant to a query. In IR, the documents in a collection and the documents retrieved for a query from that collection can be divided into relevant and non-relevant ones, as shown in Table 5.1. The recall and precision can be obtained by using these variables.

[Figure 5.14 reproduces the console log: each line records an agent entering or being removed from the Café or the blackboard.]

Figure 5.14: The sequence of agents entering and leaving the Café

In Table 5.1, a represents the number of relevant and retrieved references, b the number of relevant and non-retrieved references, c the number of non-relevant and retrieved references, and d the number of non-relevant and non-retrieved references.

Table 5.1: Relationship between relevant and non-relevant documents

                  Relevant    Non-relevant
  Retrieved          a             c
  Non-retrieved      b             d

We can use the following equations to obtain the precision and recall measures:

    Precision:   P = a / (a + c)        (5.1)

    Recall:      R = a / (a + b)        (5.2)
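For concreteness, the two measures can be computed directly from the counts a, b, and c of Table 5.1. The following Java sketch is only an illustration (it is not part of the thesis implementation, and the example counts are made up):

    // Sketch: precision and recall from the contingency counts of Table 5.1.
    // a = relevant and retrieved, b = relevant but not retrieved,
    // c = non-relevant but retrieved.
    public final class RetrievalMeasures {

        // Equation 5.1: fraction of retrieved references that are relevant.
        public static double precision(int a, int c) {
            return (a + c) == 0 ? 0.0 : (double) a / (a + c);
        }

        // Equation 5.2: fraction of relevant references that are retrieved.
        public static double recall(int a, int b) {
            return (a + b) == 0 ? 0.0 : (double) a / (a + b);
        }

        public static void main(String[] args) {
            int a = 8, b = 2, c = 4;   // hypothetical counts, for illustration only
            System.out.printf("precision = %.2f, recall = %.2f%n",
                    precision(a, c), recall(a, b));
        }
    }

With these example counts the sketch prints precision = 0.67 and recall = 0.80.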

The above precision and recall definitions were sufficient and appropriate for the early IR systems, which were merely capable of boolean searching. In the early IR systems, a user's query was expressed as a boolean combination of keywords, and the systems retrieved the

documents matching the constraints represented by the query (Brüninghaus and Ashley,

1998). Unlike the early IR systems, our information sharing system uses a vector-space model of keyphrases and their weights, which can be used to calculate the degree of similarity between one agent and another. A creator can assign a threshold (or cut-off point) to her agent to determine whether it shares information with all other agents or just with some of them.

Therefore, the choice of threshold will greatly influence the precision and recall of the system. For example, when one agent communicates with others, if it shares information with more relevant agents, the system's recall will increase; if at the same time it shares information with more irrelevant agents, the precision will decrease. To take this trade-off into consideration, we evaluate the keyphrase-based information sharing system using average precision and recall values at different thresholds.

We obtain the average precision and recall values as follows: (1) in each experiment we chose one agent from each agent category and defined it as a search agent, with the remaining agents as info agents; (2) we assigned various thresholds to the search agent; (3) we used Equations 5.1 and 5.2 to calculate the precision and recall values corresponding to the various similarity thresholds, based on the information that each search agent carries after it finishes communicating with all other info agents; and (4) we averaged the precision and recall values of the various search agents (a sketch of this averaging is given below). The information that each agent carries corresponding to the various similarity thresholds is given in Appendix B.
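The averaging in steps 1 to 4 can be sketched as follows; the class, the method, and the count arrays are hypothetical and stand in for the per-agent values tabulated in Appendix B:

    // Sketch: average precision and recall over the search agents at one
    // similarity threshold. relevantCarried[i] is the number of relevant agents
    // carried by search agent i after mingling, totalCarried[i] the number of
    // all agents it carried, and relevantTotal[i] the number of relevant agents
    // it could have found.
    final class ThresholdEvaluation {

        static double[] averagePrecisionRecall(int[] relevantCarried,
                                               int[] totalCarried,
                                               int[] relevantTotal) {
            double pSum = 0.0, rSum = 0.0;
            int n = relevantCarried.length;
            for (int i = 0; i < n; i++) {
                pSum += totalCarried[i] == 0
                        ? 0.0 : (double) relevantCarried[i] / totalCarried[i];   // Eq. 5.1
                rSum += (double) relevantCarried[i] / relevantTotal[i];          // Eq. 5.2
            }
            return new double[] { pSum / n, rSum / n };
        }
    }

Repeating this computation for each threshold yields one (precision, recall) pair per threshold, which is what Table 5.2 reports.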

Table 5.2 lists the average precision and recall results based on Appendix B. Figure 5.15 depicts the recall-precision curve at different similarity thresholds. It is obvious from both the table and the figure that the precision value increases with the similarity threshold, while the recall decreases. This result shows that the higher the similarity threshold an agent has, the higher the relevance level required when it shares information with other agents, and the smaller the number of other agents with which it can share information. The result also shows that this keyphrase-based information sharing system can guarantee that an agent brings useful information back to its user once the user assigns a reasonable threshold.

Table 5.2: Average precision and recall under various similarity thresholds

  Similarity Threshold   0.05   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
  Precision              0.82   0.9    0.98   1.0    1.0    1.0    1.0    1.0    1.0    1.0
  Recall                 0.95   0.93   0.93   0.83   0.75   0.64   0.41   0.38   0.21   0.1

[Figure 5.15 plots average precision (vertical axis) against recall (horizontal axis, 0 to 1) for the similarity thresholds (ST) of Table 5.2.]

Figure 5.15: Average recall-precision graph

5.5 Conclusion

This chapter described the implementation of the proposed design of the keyphrase-based information sharing system. We first explained how the agents are implemented. After completing the description of the implementation of the Café, we tested and evaluated the system by creating agents from data extracted from scientific articles. Both the tests and the evaluation show that the system has achieved the primary design goals: an agent can share information with other relevant agents by direct communication, by recommendation, or by visiting the blackboard in the Café. In the next chapter, we compare the information sharing results of ACORN with and without the cosine measure method.

Chapter 6 ACORN

6.1 Introduction

One of the objectives of this thesis is to incorporate an appropriate similarity measure method into ACORN to improve its information sharing. In this chapter, we first describe the incorporation of the cosine measure method into ACORN. Subsequently, we compare the test results of ACORN before and after the incorporation of the similarity measure method by using actual and machine-created data. The test results will show whether we can fulfil our goal of improving ACORN's information sharing by adding the similarity measure method.

6.2 Cosine Measure Method

ACORN is a multi-agent-based information retrieval and provision system (Marsh, 1997). It uses static and mobile agents to carry out information search and distribution tasks for its creators. The static agents, including Client, Server, and Café, are responsible for mobile agent creation, migration, information sharing, and so forth. The mobile agents, including InfoAgent and SearchAgent, are used to move information and queries from site to site to relevant users. In ACORN, both a query and a piece of information to be distributed are represented as boolean keyphrase vectors, which means that a keyphrase vector contains only a set of keyphrases, without corresponding weights. There are two notions of how people relevant to a query or piece of information can be found. The first type of relevance is given by human users, and agents do not reason about it. The second is realized by agents through interaction with other agents. ACORN uses an exact keyphrase matching method for agents to build their interest-relevant community and to share information with each other in that community (Marsh and Masrour, 1997). Exact keyphrase matching is clearly not an efficient way of searching for and distributing information: this matching method makes it almost impossible for people to extract relevant, useful, and interesting information from information sources, and likewise to distribute their information to other relevant people.

We have incorporated the cosine measure method into ACORN. The boolean keyphrase matching method has been replaced with the cosine measure method in the class Café.
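A minimal sketch of this similarity computation is given below. The map-based keyphrase representation and the shouldShare helper are our own illustration of the technique, not the actual code of the Café class:

    import java.util.Map;

    // Sketch: cosine similarity between two weighted keyphrase vectors,
    // each represented as a map from keyphrase to weight.
    final class CosineMeasure {

        static double cosine(Map<String, Double> a, Map<String, Double> b) {
            double dot = 0.0;
            for (Map.Entry<String, Double> e : a.entrySet()) {
                Double w = b.get(e.getKey());
                if (w != null) {
                    dot += e.getValue() * w;     // only shared keyphrases contribute
                }
            }
            return dot == 0.0 ? 0.0 : dot / (norm(a) * norm(b));
        }

        private static double norm(Map<String, Double> v) {
            double sum = 0.0;
            for (double w : v.values()) {
                sum += w * w;
            }
            return Math.sqrt(sum);
        }

        // Two agents share information when their similarity reaches the
        // creator-assigned threshold (for example, 0.3 in the tests of Chapter 5).
        static boolean shouldShare(Map<String, Double> a, Map<String, Double> b,
                                   double threshold) {
            return cosine(a, b) >= threshold;
        }
    }

Unlike exact boolean matching, this measure gives a graded similarity between 0 and 1, so two agents can be judged relevant even when their keyphrase sets overlap only partially.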

6.3 Simulations

Simulation studies were carried out to measure the effectiveness of the incorporation of the cosine measure method. Both machine-created data and actual article data were used in the simulation experiments in order to compare ACORN with the boolean keyphrase matching method to ACORN with the cosine measure method. The experiments were carried out to verify whether the cosine measure method affects the execution speed of ACORN and improves its information sharing capacity. That is, do agents get more relevant information while interacting with other agents by using the cosine measure method? The simulation data used for this work and the experimental results are provided below.

6.3.1 Test Data

Simulations were carried out using both machine-created data and actual article data. The difference between these two types of data is how the agents are derived. All agents in the machine-created data file were randomly produced by computer from a set of predefined terms, whereas agents in the actual article data file were obtained from processing articles of some magazines. The data format for each agent in a data file can be explained using the sample shown in Figure 6.1.

[Figure 6.1 reproduces a machine-created agent data file containing two agents.]

Figure 6.1: Example of agents created by machine

Figure 6.1 shows a machine-created agent file that includes only two agents. Based on this figure, the data format can be described as follows:

NUM-SERVERS: integer. This first line defines the number of servers to start at this site. The line is optional and can be omitted.

IS-NETWORKED: integer. This is an optional field that specifies whether to open TCP/IP ports to listen on. The integer can be either 0 or 1; 0 means do not open TCP/IP ports, and 1 means open TCP/IP ports.

AGENT-DETAILS: This field indicates that the agent details start on the following line; each agent occupies one line, and agents are separated by a % sign.

SERVERNO: integer. This gives the server number. Note that items are comma-separated.

EMAILADDRESS: the agent owner's e-mail address.

RECOMMENDED: a list of e-mail addresses and interests whose format is address;server ID number (integer);keyphrase|weight;keyphrase|weight; etc. These details are separated by + from other recommended people, and a vertical bar '|' is used to separate a keyphrase from its weight.

KEYWORDS: a list of semicolon-separated keyphrase|weight pairs for the agent, where a vertical bar '|' separates a keyphrase from its weight.

SUBJECT: a string used to represent an ID for the agent.

USER-KEYWORDS: keyphrase|weight;keyphrase|weight. The vertical bar '|' is used to separate a keyphrase from its weight.
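As an illustration of this format, the sketch below parses a KEYWORDS (or USER-KEYWORDS) field into a keyphrase-to-weight map. The class and method names are ours and the error handling is simplified; this is not the ACORN parser:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch: parse a semicolon-separated list of keyphrase|weight pairs,
    // e.g. "cluster|11.56;scheme|15.43".
    final class KeyphraseFieldParser {

        static Map<String, Double> parseKeywords(String field) {
            Map<String, Double> vector = new LinkedHashMap<>();
            for (String pair : field.split(";")) {
                if (pair.isBlank()) {
                    continue;                        // tolerate trailing separators
                }
                String[] parts = pair.split("\\|");  // '|' separates phrase and weight
                String keyphrase = parts[0].trim();
                double weight = parts.length > 1 ? Double.parseDouble(parts[1].trim()) : 0.0;
                vector.put(keyphrase, weight);
            }
            return vector;
        }
    }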

6.3.2 Results

Machine-generated agent data were used to measure the execution time of ACORN, since this type of data is easy to create by computer. For each agent, its interest keyphrases are created with an associated random weight ranging from 0 to 1. We then parse the interest keyphrases in the class Café. The execution time measured includes the time needed for agents to complete their information sharing in the Café, called the mingle time, and the time for ACORN to process all of the work, called the processing time.

Table 6.1: Mingle and processing time of ACORN

  No. of      ACORN with cosine method                 Original ACORN
  agents      Mingle time (s)   Processing time (s)    Mingle time (s)   Processing time (s)
  [table entries omitted]

Ten experiments with agent numbers ranging from 10 to 100 were conducted for ACORN with and without the incorporation of the cosine measure method, respectively. Each experiment was executed three times, and the average execution time results are listed in Table 6.1.

Figure 6.2 shows the experimental results. The horizontal axis represents the number of agents, and the vertical axis the execution time in seconds. It is obvious from Table 6.1 and Figure 6.2 that both the mingle time and the processing time increase with the number of agents. However, the mingle time and processing time are lower for ACORN with the cosine measure method than for ACORN with its original boolean keyphrase matching method. This processing time difference may be due to the mingle time difference between the two methods. We attribute this mingle time difference to the fact that the original ACORN did not sort the interest keyphrase vectors, whereas the keyphrases were sorted when the cosine measure method was incorporated.

To investigate whether the incorporation of the cosine measure method improves ACORN's information sharing ability, agents were created from actual article data following the agent data file format. Figure 6.3 shows five agents and the other relevant agent information they included before they entered the Café, and after they met each other, shared their information, and left the Café, with and without the cosine measure method. The five agents and their included agents belong to three interest categories. Agents with the same prefix are from the same interest category. For example, agents Cluster-3, Cluster-4, Cluster-7 and Cluster-10 share the same interest, whereas agents with different prefixes, such as Cluster-3 and Hough-7, do not share relevant interests. Only a method that helps agents obtain relevant information through information sharing can be said to be practical. Otherwise, the method is not practical, since agents cannot get their relevant information, and some or most of the information they do get might not be useful to their users.

[Figure 6.2 plots mingle time and total execution time (in seconds) against the number of agents (0 to 100), for ACORN with and without the cosine measure method.]

Figure 6.2: The execution time for the whole process and information sharing of ACORN with and without the cosine measure method

[Figure 6.3 lists, for each of the five agents, the agents whose information it carried after leaving the Café, under AWCM (ACORN with the cosine measure method) and AWTCM (ACORN without the cosine measure method).]

Figure 6.3: Mingle results of ACORN with and without the cosine measure method

Figure 6.3 clearly indicates that the incorporation of the cosine measure method really does improve information sharing for ACORN. After the incorporation of the cosine measure method, all five agents get relevant and useful information through information sharing, whereas agents in ACORN with its original boolean keyphrase matching method get some irrelevant information in addition to some relevant information.

6.4 Concluding Remarks

In this chapter, the incorporation of the cosine measure method into ACORN has been presented. We used machine-generated agent data to obtain the execution time of ACORN with and without the incorporation of the cosine measure method. We also created agents from actual article data to test whether the incorporation of the cosine measure method really improves ACORN's information sharing. Both experiments indicate that the incorporation of the cosine measure method not only reduces ACORN's information sharing execution time, it also makes its information sharing more efficient, more practical, and more useful.

Chapter 7 Conclusions and Future Work

This chapter contains the concluding remarks of this thesis work. It also identifies possible areas of future work.

7.1 Summary of the Thesis Work

This thesis aimed at designing and implementing a keyphrase-based information sharing system for a community of mobile agents. The work carried out in this thesis can be summarized as follows:

Investigation of the concept of software agents and their applications: This overview presented a basic idea of what software agents are and what they can do to handle information overload and other computer-related work for humans.

Descriptions of information sharing in our daily life: Based on these descriptions, we have divided agents into three categories: searchAgent, infoAgent, and search/infoAgent. Each of these agents has its own task. As the names suggest, a searchAgent is used to look for relevant information for its user; an infoAgent is used to distribute information to relevant people; and a search/infoAgent is used both to search for and to distribute information.

Design and implementation of agents: For each category of agents, we have defined the information it must carry with it to fulfil its task, and we have implemented the various types of agents.

Design and implementation of the virtual meeting place, the Café, using the multi-thread technique: The Café determines whether an agent will be allowed to enter, based on whether there is an empty space for the agent; it saves agents' information for a while after they leave the Café; it makes agents find other relevant agents and exchange information with each other; and finally, it forces agents to leave the Café once they have stayed long enough and have finished sharing information with other agents.

Comparison of two similarity measure methods: We have implemented the cosine measure method and the substring measure method and applied them to actual articles. The results indicate that the cosine measure method is better than the substring measure method at differentiating articles in one category from those in another category, and articles within the same category.

Incorporation of the cosine measure method into ACORN: We have incorporated the cosine measure method into ACORN. Further, we have carried out experiments using both machine-created data and real data. We have found that the incorporation of the cosine measure method makes the information sharing in ACORN faster and more efficient.

7.2 Future Work

This work is just one part of a community-based information system. In order to make it useful for information distribution and provision, adding it to a community-based information system is necessary. ACORN is one of the ideal candidates into which to add this community-based information sharing system. However, due to the agent category differences and some other differences between ACORN and this system, we need to modify ACORN so that this system can be incorporated into it. In addition, we can also develop our own community-based information system based on our current work.

Another task is to explore its applications in other domains, such as E-commerce and expert-finding agent systems. We believe that there are other application areas that have the same characteristics as the information retrieval and provision area.

Finally, we need to automate the process of obtaining keyphrases from documents. In this way, a user's interest representation in an agent can be obtained once the user inputs relevant documents.

References

Armstrong, R., Freitag, D., Joachims, T. and Mitchell, T. "WebWatcher: A Learning Apprentice for the World Wide Web", in AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, March 1995; URL: http//www .cs.cmu.edu/afs/cs/project/theo6/web-agent/~v!~vebagent-plus.ps.Z.

Belkin, N. and Croft, B. "Information Filtering and Information Retrieval", Communications of the ACM, Vol. 35, No. 12, December 1992, pp. 29-37.

Brüninghaus, S. and Ashley, K.D. "Evaluation of Textual CBR Approaches", in Proceedings of the AAAI-98 Workshop on Textual Case-Based Reasoning (AAAI Technical Report WS-98-12), Madison, WI, 1998, pp. 30-34.

Chavez, A. and Maes, P. "Kasbah: An Agent Marketplace for Buying and Selling Goods", in Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi-Agent Technology (PAAM'96), London, UK, April 1996; URL: http:!/agents.~nv.media.mit.edu/groups/agents/publications!

Davies, N.J. and Weeks, R. "Jasper: Communicating Information Agents", in Proceedings of the 4th International Conference on the World Wide Web, Boston, USA, December 1995, pp. 28-37.

Ferber, J. "Simulating with Reactive Agents", in Many Agent Simulation and Artificial Life, Hillebrand, E. and Stender, J. (Eds.), IOS Press, Amsterdam, 1994, pp. 8-28.

Foner, L.N. "Yenta: A Multi-Agent, Referral Based Matchmaking System", in Proceedings of the First Conference on Autonomous Agents (Agents '97), Marina del Rey, California, February 1997.

Foner, L.N. "A Multi-Agent Referral System for Matchmaking", in Proceedings of the First International Conference on the Practical Applications of Intelligent Agents and Multi-Agent Technology, London, UK, April 1996; URL: http:/ifoner.~nvw.media.mit.edu/people/f~ner/yenta-brief.html

Gerrie, B. "Online Information Systems", Information Resources Press, Arlington, VA, 1983.

Goldberg, D., Nichols, D., Oki, B.M. and Terry, D. "Using Collaborative Filtering to Weave an Information Tapestry", Communications of the ACM, Vol. 35, No. 12, December 1992, pp. 61-70.

Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R. and Riedl, J. "GroupLens: Applying Collaborative Filtering to Usenet News", Communications of the ACM, Vol. 40, No. 3, March 1997, pp. 77-87.

Kozierok, R. and Maes, P. "A Learning Interface Agent for Scheduling Meetings", in Proceedings of the ACM-SIGCHI International Workshop on Intelligent User Interfaces, ACM Press, New York, 1993, pp. 81-93.

Lange, D.B. and Oshima, M. "Seven Good Reasons for Mobile Agents", Communications of the ACM, Vol. 42, No. 3, March 1999, pp. 88-89.

Lang, K. "NewsWeeder: Learning to Filter Netnews", Technical Report, School of Computer Science, Carnegie Mellon University, 1995; URL: http://anther.learning.cs.cmu.edu~mI95/ps.

Lawrence, S. and Giles, C. "Searching the World Wide Web", Science, Vol. 280, April 1998, pp. 98-100.

Lieberman, H. "Letizia: An Agent that Assists Web Browsing", in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 95), Montreal, August 1995; URL: http;/llieber.www.media.mit.edu/peopleilieber/Liebera~/LetizidLetizia-Intro.h~l

Lieberman, H. "Autonomous Interface Agents", in Proceedings of the ACM Conference on Computers and Huinan Interface, CHI-97, Atlanta, Georgia, March 1997: URL: http:/~lieber.~n~~v.media.mit.edulpeople~ieber/Liebera~/LetizidLetizia-Intro.h~l

Liebeman, H., Van Dyke, N.W., and Vivacqua, AS. "Let's Browse: A Collaborative Web Browsing Agent", in Proceedings of International Conference on Intelligent User Interface, Redondo Beach, CA, USA, January 1999; URL: http://agents.w~wv.media.mit.edul~oups/agents/publications/

Maes, P. "Agents that Reduce Work and Information Overload". Software Agents, AA.41 PressTîhe MIT Press. Bradshaw, J.M.(Ed.), 1997, pp. 145-164.

Maes, P., Guttman, R.H., and Moukas, A.G. "Agents that Buy and SeIl", Communication of the ACM, Vol. 42, No, 3, March 1999, pp.8 1-9 I.

Marsh, S. "A Different Approach to Information Provision and Retrieval", in proceedings CAIS'97, Canadian Association for Information Science, Workshop on Communication and Information in Context, 1997, Learned Societies, St Johns, Newfoundland, June 1997; URL: http://ai.iit.nrc.ca/-stevePublications.h~l

Marsh, S. and Masrour, Y. "Agent Augrnented Community Information - The ACORN Architecture", in Proceedings CASCONP7, Meeting of Minds. November 1997; URL: http://ai.iit.nrc.ca/-stevdPublications.html Morita, M. and Shinoda, Y. "Information Filtenng Based on User Behavior Analysis and Best Match Text Retrieval", in Proceeedings of the Seventeenth Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Springer-Verlag, 1994, pp.272-28 1.

Moukas A. "Information Discovery and Filtering using a Multiagent Evolving Ecosystem", MS Thesis, Media Laboratory, Massachusetts Institute of Technology, 1997. URL: http:llmoux.w\nv.media.mit.edu~people/mouxi hloukas, A., Guttman, R., Zacharia, G. and Maes, P. "Agent-mediated Electronic Commerce: An MIT Media Laboratory Perspective", 1999; URL: http://ecommerce.media.mit.edu

Nwana, S. H. "Software Agents: An Overview", Knowledge Engineering Review, Vol. 1 1, No. 3, September 1996. - Cambridge University Press, 1996, pp. 1-40.

Porter, M. "An Algorithm for Suffix Stripping", Program, Vol. 14, No. 3,1980, pp. 130-138.

Rijsbergen, C.J. "Information Retrieval", Buttenvorths, London, 1979.

Salton, G. and McGi11, M. "Introduction to Modem Information Retrieval", McGraw-Hill, New York, 1983.

Shardanand, U. "Social Information Filtering for Music Recommendation", MS Thesis, Media Laboratory, Massachusetts Institute of Technology, 1994. URL: http://agents,www.media.mit.edu~groups/alications/

Sheth, 3."A Learning Approach to Personalized Information Filtering", MS Thesis, Media Laboratory, Massachusetts Institute of Technology, 1994. URL: http://agents.~,v.media.mit.edu~groups/agents/publications/

Tumey, P.D. "Learning to Extract Keyphrases fiom Text", NRC, Technical Report ERB- 1057, National Research Council Canada, 1999; URL: http://extractor.iit.nrc.ca~cgi-bin/ex~p03A~/extractor.iit.nrc.ca/tool bar.htm1

Weibel, S., Godby, J., Miller, E. and Daniel, R. "OCLCNCSA Metadata Workshop Report", 1995; URL: http:il~~~~v.oclc.org:5046/oclc~researchlconfaencesimetadata~dublin~core~report.html Appendix A

Test Data

The following lists the source information and the keyphrase vector information for the articles used as test data in this thesis.

1. Article 1 (Clusterl): TITLE: Knowledge-based Clustering Scheme for Collection Management and Retrieval of Library Books. AUTHOR: MN.,Murty and A.K., Jain JOLXNAL: Pattern Recognition Vol. 78, No.7, pp.949-963, 1995 PUBLISHER: Elsevier Science Ltd. KEYPHR\SES: scheme, 15.43; cluster, 11.56; similarity, 3.85; hierarchy, 3.85: Computing Review, 3.85; domain knowledge, 3.85; unifom classification scheme, 2.00; experiment, 1.00; cornparison. 1.00; design, 1.00; assign, 1.00;

2. Article 2 (Cluster-2): TITLE: Divisive Clustering of Syrnbolic Object C'sing the Concepts of Both Similarity and Dissimilarity. AUTHOR: K.C., Gowda and T.V.Ravi JOURNAL: Pattern Recognition Vol, 38, No.8, pp. 1277- I28L 1995 PUBLISHER: Elsevier Science Ltd. KEYPHRASES: similarity, 43.5 1 : symbolic object, 22.75; cluster, 22.75; method, 15.56; span, 1 1.33; divisive clustering algorithm, 7.93; advantage, 7.93;

3. hrticle 3 (Cluster-3): TITLE: A Tabu Search Approach to the Clustering Problem. AUTHOR: K.S., Al-sultan JOURNAL: Pattern Recognition Vol. 25, No.9, pp. 1443- 143 I, 1995 PUBLISHER: Elsevier Science Ltd. KEYPHRASES: cluster, 15.42; algorithm, 5.00; classify, 3.85; Euclideanspace, 3.85; k-means, 2.00; simulated annealing algorithm, 1.00; encourage, 1 .OO; Preliminary computational experience, 1.00; tabu search technique, 1.00; local minima, 1.00; nonconvex program, 1.00; distance, 1.00;

4. Article 4 (Cluster-4): TITLE: A Combined AIgorithm for Weighting the Variables and clustering in the Clustering Problem. AUTWOR: V. 1. tumelsky JOURNAL: Pattern Recognition Vol. 15, No.2, pp.53-60, 1982 PUBLISHER: Elsevier Science Ltd. KEYPHRASES: cIuster, 30.84; transformation. 23.13; classify, 19.27; algorithm, 1 1.56; original variable, 1 1.56; weigiiting, t 1.00; weighting procedure, 4.00; interpret, 7.00; Mahalanobis distance method, 1.00; equal variance, 1.00; 5. Article 5 (Cluster-5): TITLE: Cluster Validity Profiles. AUTHOR: Thomas A. Bailey, etc. JOURNAL: Pattern Recognition Vol. 15, No.2, pp.61-83, 1982 PUBLISHER: Pergamon Press Ltd. KEYPHRASES: cluster, 46.26; profile, 26.98; probability, 19.27; rank-order proximity, 7.71; quantifi, 7.71; probability profile furnish, 3.85; judge, 3.85; clustering algorithm, 3.85; quantitative evaluation, 3.55; literature, 2.00; randomly chosen graph, 2.00; isolation, 2.00;

6. Article 6 (Cluste-): TITLE: Clustering Based on Multiple Paths. AUTHOR: S. Tamura. JOURNAL: Pattem Recognition Vol. 15, No,6, pp.477-483, 1982 PUBLISHER: Pergamon Press Ltd. KEYPHRASES: cluster, 15.42; classify, 7.7 1 ;pattern, 7.7 1; n-connectedness, 7.7 1 ; algorithm, 5.00; pph, 3.85; pattern pair, 3.35; strongIy touching cluster, 3.85; separation, 3.85; ordinary connectedness algorithm, 1.00; proposed algorithm. 1.00; classification method, 1.00; transitivity, 1.00;

7. -4rticle 7 (Cluster-7): TITLE: Clustering by competitive agglomeration. AUTHOR: Frigui, H.; Krishnapuram, R. JOURNAL: Pattern Recognition Vol. 10, No.7. pp. 1109- 1 1 19, 1997 PUBLISHER: Pergamon Press Ltd. KEYPHARSE: cluster, 30.84; algorithm, 15.42; partition, 1 1.56; incorporate; 7.71: objective function, 7.71; sequence, 3.85; advantage, 3.85; rninimize, 3.85; shape, 1.00; unknown, 1.00; environment, 1.00; update equation, 1.00;

S. Article S (Cluster-8): TITLE: A feature point clustering approach to the recognition of form documents. AUTHOR: Fan, K.-C.; Lu, S.-M.; Chen, G.-D. JOURNAL: Pattern Recognition Vol. 10, No.7, pp. 1 109- 1 1 19, 1997 PLJBLISHER: Pergamon Press Ltd. KEYPHRASES: form document, 26.98; cluster, 11.56; automation, 7.71; experiment, 2.00; novel method, 2.00; feasibility, 2.00; paaern, 2,00; character, 2.00; gaph matching problem, 1.00; distinct group, 1.00;

9. Article 9 (Cluster-9): TITLE: A New Clustering Aigorithm With Multiple Runs of Iterative Procedures. AUTHOR: Q. Zhang and R.D.Boyle. JOURNAL: Pattern Recognition Vol. 74, No.9, pp.835-848, 199 1 PUBLISHER: Pergarnon Press Ltd. KEYPHRASES: pattern, 7.7 1; cluster, 7.7 1; k-means, 7.7 1; algorithm, 7.? 1; pattern recognition, 3.85; scene analysis, 3.85; pattern classification, 3.85; chosen iterative procedure. 2.00; poor local minima, 2.00; comprehensive experiment, 2.00; random initialization, 1.00; controlled configuration change, 1.00; 10. Article 10 (Cluster-10): TITLE: A Simulated Annealing Algorithm for the Clustering Problem. AUTHOR: S.Z. Selim and K.Alsultan. JOURNAL: Pattern Recognition Vol. 24, No. 10, pp. 1003- 1008, 199 1 PUBLISHER: Pergamon Press Ltd. KEYPHRASES: cluster, 19.27; algorithm, 15.42; k-means, 7.71; simulated annealing, 3.85; solving optimization problem, 2.00; disadvantage, 1.00; advantage, 1.00; general data set, 1.00; global solution, 1.00; algorithm converge, 1.00;

1 1. Article 11 (Cluster-1 1): TTTLE: A clustering algorithm using an evolutionary progamming-based approach. AUTHOR: M. Sarkar, B. Yegianarayana and D. Khemani JOURNAL: Pattern Recognition Letter, Vol. 18, No.10 (1997) pp.975-986 PUBLISHER: Pergarnon Press Ltd. KEYPHRASES: cluster, 26.98; algorithm, 15.42; clustering task, 3.55; proposed tncthod, 3.85; optimum, 3.85; algorithm effectively group, 3.55; evolutionary programming-bascd cluster, 3.85; locally optimal solution, 1.00;

12. Article 12 (Hough-1): TITLE: Fast Generalized Hough Tmnsform AUTHOR: Shen-Ching Jeng and Wen-Hsiang Tsai JOURNAL: Pattem Recognition Letters 1 l(1990) 725-733 PUBLISHER: North-Holland KEYPHRASES: Hough transform, 15.42; image portion, 3.83; reduce, 3.85; generalized Hough operation, 3.85; inverse generalized Hou&, 3.85; hierarchical processing scheme, 3.85; algorithm, 3.55; pyrarnid machine, 1.00; processing elernent, 1.00; computation time, 1.00;

13. Article 13 (Hough-2): TITLE: Generalizing The Hou& Transform to Detect Arbitrary Shapes AUTHOR: D.H. Ballard JOURNAL: Pattern Recogiition Vol. 13, No.2, pp. 1 1 1-122, 198 1 PUBLISHER: Pergamon Press Ltd. KEYPHRASES: curve, 23.13; detection, 19.27; Hough transform, 19.27; exploit, 11.56; analytical curve, 7.71; map, 6.00; shape, 6.00; image, 3.00; duality, 3.85; complex shape, 2.00; Houghtransform space, 2.00; arbitrary non-analytical shpe2.00; generalized Hough transform, 2.00; universal transform, 1.00; component shape, 1.00;

14. Article 14 (Hough-3): TITLE: Block Decornposition and Segmentation for Fast Houghtransform evaluation. AUTHOR: S.J. Perantonis, B. Gatos and N. Papamarkos JOURNAL: Pattern Recognition Vol. 32, No.5, pp.8 11-824, 1999 PUBLISHER: Elsevier Scierice Ltd. KEYPHRASES: decomposition, 19.27; Hough transform, 15.42; block, 15.42; binary image, 11.56; complexity, 7.71; rectangular block, 7.71; evaluate, 3.85; fast method, 3.85; primitive, 3.55; foregound pixel, 3.85; algorithm, 3.00; image processing experiment, 1.00; linear feature, 1.00; significant acceleratioti, 1.00; 15. Article 15 (Hough-4): TITLE: Hierarchical Generalized Hough transforms and Line-Segment based Generalized Hough Transfomis. AWHOR: L.S.Davis JOüRîYAL: Pattern Recognition Vol. 15, No.4, pp.277-285, 1952 PUBLISHER: Pergarnon Press Ltd. KEYPHRASES: pattern, 19.27; match, 11 S6; extension, 7.7 1; Hough transform, 7.7 1; image processing, 3.85; pattern matching algorithrn, 3.85; segment, 1.00; geometric object, 1.00;

16. Article 16 ('Hou&-5): TITLE: Circular Kough Transfom for Roundness Measurement of Objects. AUTHOR: D., Luo etc. JOURNAL: Pattern Recognition Vol. 28, No. 1 1, pp. 1745-1749, 1995 PUBLISHER: Elsevier Science Ltd. KEYPHRASES: curve, 1 1.56; Houjh transfom, 7.7 1; purpose, 3.85; cornpute, 3.55; efficient method, 3.85; plane, 3.85; sharpness, 3.85; circular Hough transfom, 3.85; evolute, 1.00; intrinsic equation, 1.00;

17. Article 17 (Hough-6): TITLE: A Fast Digital Radon Transfomi -- an Efficient Means for Evaluating the Hough Transfom. AUTHOR: W.A. Gotz and H.J. Druckmuller JOURNAL: Pattern Recognition Vol. 28, No. 12, pp. 1985- 1992, 1995 PUBLISHER: Elsevier Science Ltd. KEYPHIWSES: Hough transfom, 11.56; straight, 7.71; Hough transfom algorithm, 4.00; sequential complexity, 3.85; compute, 2.00; classical Hough transfom, 1.00; memory access requirement, 1.00; global memory access, 1.00; parallel implement 1.00;

1 S. Article 18 (Hough-7): TITLE: A New Curve Detection Method: Randomized Hough Transform (RHT). AUTHOR: L. Xu, etc. JOL'RNAL: Pattern Recognition Letters 11 (1990) 33 1-338 PUBLISHER: Noth-Holland UYPHRASES: curve, 15.32; variant; 11 S6; pixel, 1 1S6; Hough transform, 7.7 1; parameter space, 7.71; hypersurface, 7.71; cuve detection, 7.71; arbitrarily high resolution, 3.00; advantage, 2.00; cornparison, 2.00; Keyphrase, 1.00;

19. Article 19 (Hough-8): TITLE: A Combinatonal Hou& Transfom. AUTHOR: D. Ben-tzvi and M.B.Sandler. JOURNAL: Pattern Recognition Letters 11 (1 990) 167- 174 PUBLISHER: Noth-Holland KEYPHRASES: calculate, 11.56; algorithm, 11.56; segment, 7.7 1; Hough transform, 7.71; cornpute, 3.85; transform space, 2.00; extract, 2.00; desired parameter, 1.00; significant ma. 1.00; extraneous data, 1.00; sparce image, 1.00;

20. Article 20 (Hough-9): TITLE: Scale- and Orientation-invariant Generalized Hough Transfonn - 4 New Approach AUTHOR: S.C. Jeng and W.H. Tsai. JOURNAL: Pattern Recognition Vo1.24, No. 11, pp.1037-105 1, 1991 PUBLISHER: Pergarnon Press plc KEYPHRASES: Hough transform, 26.98; shape, 7.71; detect, 7.71; brute force, 3.85; object shape, 3.85; incrementation, 2.00; maximum detect, 1.00; computation requirement, 1.00; required dimensionality, 1.00;

2 1. Article 3 1 (HoughJO): TITLE: Constrained Hough Transfoms for Curve Detection AUTHOR: Clark F. Olson. JOURNAL: Computer Vision and Image Understanding Vo1.73, No.3, pp.329-345, 1998 PUBLISHER: Academic Press KEYPHRASES: curve, 19.27; Hough transfomi, 15.12; detect. 15.42; technique, 15.42: parameter space, 7.71; accurate curve detect 7-71; edge pixel, 2.00; subproblem, 2.00; real image, 1.00; experiment, 1.00;

23. Article 32 (HouAl 1): TITLE: Guaranteed Convergence of the Hough Transform AUTHOR: Menashe Soffer, Nahum Kiqati JOURNAL: Computer Vision and image Understanding Vo1.69, No.2, pp. 119-134, 1998 PUBLISHER: Academic Press KEYPHRASES: Hough Trrinsfon, 30.83; global maximum, 36.95; voting kernel, 11.56; dimensional function, 3.85; detect, 3.85; straight-line Hough Transform, 3.85; image model, 3.00; parameter space quantization, 3.00; noise, 2.00; parameter space, 2.00; multiresolution Hough algonthrn, 2.00; global optimization problem, 2.00;

23. Article 23 (Java-1): TITLE: Creatig Cool Web Applets With Java AUTHOR: Paul J. Peny SOUEICE: http:l/home2.swipnet.se/-w-2012 l/javastore/applet/index.htrnl KEYPHRASES: Java, 30.84; object-oriented programming lmguage, 7.7 1; ballyhooed object-oriented propmming, 7.71; Java development tool, 7.7 1; applet, 4.00; chance, 3.85; Windows, 2.00; perfect, 7.00; live applet, 1.00; JDK, 1.00; Web create, 1.00; Java site, 1.00;

24. Article 24 (Java-2): TITLE: Creating Web AppIets With Java AUTHOR: David Gulbransen, et al SOURCE: http://home2.swipnet.se/-w-20 12 l/javastore/applet~iridex.html KEYPHRASES: applet, 42.40: Java, 38.55; propramming, 11.56; design, 7.71; power, 7.71; interactive Web, 3.85; programming language, 3.85; basic, 1.00; incorporate, 1.00; non-progammer, 1.00; application, 1.00;

25. Article 25 (Java-3): TITLE: Developing Java Entertainment Applets AUTHOR: John Withers SOURCE: http://homeL.swipnet.sd-w-2012 l/javastore/applet/index.html KEYPHRASES: lava, 26.98; garne, 23.13; Java programmer, 1 1.56; garne design, 4.00; draw, 3.85; sound effect, 3.85; animation, 3.85; graphies, 3.85; applet, 3.85; garne progamming solution, 3.00; principle, 2.00;

26. Article 26 (Java-4): TITLE: Essential Java : Developing interactive Applications for the World-Wide Web AUTHOR: Jsson J. Manger SOURCE: http://home2.swipnet.se/-w-20 12 l/javastore/applet~index.html KEYPHRASES: prograrnming, 23.13; developer, 15.42; Java, 15.42; Web developer, 7.71; prograrnrning language, 7.7 1; novice programmer, 3.85; understand, 3.85; Windows, 3.00; book. 1 .oo;

27. Article 27 (Java-5): TITLE: Hooked on Java : Creating Hot Web Sites With Java Applets AUTHOR: Arthur Van Hoff, et al SOURCE: http://home2.swipnet.se/-w-2012 l/javastore/applet/index.html KEYPHRASES: Java, 30.84; , 1 1S6; web site, 7.7 1; Java development goup, 3.85; rnember, 3.85; inteption, 2.00; book fit, 2.00; Java progamming, 1.00;

28. Article 28 (Java-6): TITLE: introduction to Programming Java Applets: Sun Java Workshop AUTHOR: MindQ Publishing SOURCE: http:l/home3.swipnet.se/-w-2012 l/javastore/applet/index.html KEYPHRASES: Java, 26.98; CD-ROM, 15.42, Standard Edition, 11.56; Integrated Development Environment, 7.7 1; Java Workshop, 7.7 1; prograrnming, 7.7 1; applet, 4.00; Java Workshop feature, 3.85; Standard edition plus, 3.85; learning curve, 1.00; HIML editor, 1.00; working applet, 1.00;

29. Article 29 (Java-7): TITLE: The Java Class Libraries : Java.Applet, Java.Awt, Java.Beans (Vol 2) Vol 2 AUTHOR: Patrick Chan, and Rosanna Lee SOURCE: http://home2.swipnet,se/-w-2013 1 /javastore/applet/index.html KEYPHRASES: Java, 15.42; member, 6.00; expenenced Java programmer, 3.85; beginner, 3.85; resource, 3.85; Java technology, 3.85; creator, 3.85; class library, 3.85; definitive reference, 3.85; context, 2.00; real-world context, 1.00; related group, 1.00; key concept, 1.00; related classes, 1.00;

30. Article 30 (Java-8): TITLE: Java Essentials for C and C-e-t Progammers AUTHOR: Bany Boone SOURCE: http://home2.swipnet.sd-w-2012 l/javastore/javac/index.html KEYPHRASES: Java, 23.13; programmer, 19.27; bleeding-edge programmer, 3.85; Java Essential, 3.85; platfom-independent development environment, 3.85; language, 3.85; question, 3.85; applet, 1.00; design, 1.00; object concept, 1.00; onented feature, 1.00; 3 1. Article 3 1 (Java-9): TITLE: Java for CIC* Programmers AUTHOR: Michael C. Daconta, and Mike Daconta SOURCE: http:l/home2.swipnet.sel-w-20 12 1/javastore/javac/uidex.html KEYPHRASES: Java, 19.77; programmer, 19.27; C/C+t, 7.71; building Web applet, 7.71; large-scale application, 3.85; object-oriented language, 3.85; robust, 3.85; progarnmer need, 1.00; complex program, 1.00; sirniliarity, 1.00; feature, 1.00;

32. Article 32 (Java-IO): TITLE: Java With Borlmd C++ AUTHOR: Chris H. Pappas, and William H. Murray SOURCE: http://?lorne2.swipnet.se/-w-20 12 l/javastore/javac/index.html KEYPHRASES: Java, 38.55; programming, 19.27; Java tool, 1 1S6; compiler, 7.7 1 ; lançuage, 7.71; Windows, 3.00; conversion, 2.00; ClCi-+, 2.00; feanircd Java application, 2.00; Java programrning fündamental, 2.00; Integrated Development Environment, 1.00; Windows code, 1.oo;

33. Article 33 (Shape-1): TITLE: Shape reco,g.ition using fractal geometry. AUTHOR: G. Neil and K.M. Curtis JOURNAL: Pattern Recognition, Vo1.30, No. 12, (1997) 1957- 1969 PUBLISHER: Pergamon KEYPHRASES: technique, 46.26; invariant, 19.27; recognition technique, 19.27; shape recognition technique, 11.56; fractal transformation, 7.71; review, 3.85; scale invariant, 3.85; high speed shape, 3.85; motivation, 3.85; paper fractal transformation, 3.85; mathematical analysis, 2.00; selection, 1.00; technique rotationally invariant, 1.00; algorithrn, 1.00;

34. Article 34 (Shape-2): TITLE: A survey of shape analysis techniques. AUTHOR: S. Loncaric JOURNAL: Pattern Recognition, Vol.3 1, No.& (1 998) 983- 100 1 PLTBLISHER: Pergamon KEYPHRASES: shape, 19.27; match, 3.85; object recognition, 3.85; analysis methods play, 3.85; review, 3.85; classification, 2.00; visual fom perception. 2.00: interior, 1.00;

35. Article 35 (Shape-3): TITLE: Near-optimal mst-based shape description using genetic algorithm. AUTHOR: S. Loncaric and A.P. Dhawan JOURNAL: Pattem Recognition, Vo1.28, No.4, (1995) 571-579 PUBLISHER: Pergamon KEYPHR4SES: shape, 23.13; match, 1 1S6; optimal süucturing element, 1 1.56; selection, 7.7 1; morphological signature transfomi, 3.85, experiment, 2.00; classi@, 1.00; unknownobject, , 1.00; object recognition application, 1.00; evolve, 1.00; proposed optimal shape, 1.00; mode1 shape, 1.00; robust shape match, 1.00; optimization cntena, 1.OO;

36. Article 36 (Shape-4): TITLE: Efficient shape matching through model-based shape recognition. AUTHOR: H. Liang-Kai and J.W. Mao-Jiun JOURNAL: Pattern Recognition, Vo1.29, No.2, (1996) 207-2 15 PUBLISHER: Pergamon KEYPHRASES: shape, 34.69; match, 15.42; polygonal approximation technique, 7.71; recognition, 7.7 1; shape matching method, 7.71; proposed matching algorithrn, 3.85; efficient shape matching, 3.85; real image; 1.00; scale change, 1.00; rotation, 1.00; translation, 1.00; invariant, 1.00; automated inspection, 1.00; application, 1.00; matching orientation information, 1.oo;

37. Article 37 (Shape-5): TITLE: Shape decomposition and representation using a recursive morphological operation. AUTHOR: D. Wang, V. Haese-Coat and J. Ronsin JOURNAL: Pattern Recognition, Vo1.28, No. 1 1 (1 995) 1783- 1792 PUB LISHER: Pergamon KEYPHR4SES: shape 19.27; structunng element, 7.7 1 ; recursive morphological operation, 3.85; object cornponent, 3.00; decomposition, 3.00; image processing, 2.00; non-overlapping object component, 2.00; information loss, 1.00; compact, 1.00; image processing facility, 1.00; compression ability, 1.00; skeleton, 1.00;

38. Article 38 (Shape-6): TITLE: Shape description and recognition using the high order morphological pattern spectrum. AUTHOR: X. Zhou and B. Yuan JOURNAL: Pattern Recognition, Vo1.28, No.9 (1995) 1333-1340 PUBLISHER: Pergamon KEYPHRASES: robustness, 7.71; spectrum, 4.00; sensitivity, 3.55; computer vision, 3.55; image processing, 3.85; recognition, 3.85; Shape analysis, 3.85; noisy, 2.00; invariance, 1.00; noisy environment, 1.00; image, 1.00; higher order component, 1.00; isotropic stnicturinç element, 1.00; rotatinç invariance, 1.00;

39. Article 39 (Shape-7): TITLE: Optimization models for shape matching of nonconvex polygons AUTHOR: C. Jen-Ming and J.A. Ventura JOURNAL: Pattern Recognition, Vo1.28, No.6 (1995) 863-877 PLBLISHER: Pergamon KEYPHRASES: shape, 30.84; match, 15.42; reference shape, 11.56; shape model, 7.71; discrete boundxy data, 7.71; shape matching problem, 7.7 1; defect-free reference shape, 3.85; input shape, 3.85; complexity analysis, 1.00; scene data, 1.00;

40. Article 40 (Shape-8): TITLE: Shape description using cubic polynomial Bezier curves AUTHOR: L. Cinque, S. Levialdi and A. Malizia JOURNAL: Pattern Recognition Letter, Vo1.19, No.9 (1998) 82 1-828 PUBLISHER: Pergamon KEYPHRASES: shape, 29.23; approximation process, 5.40; Bezier curve segment, 5.40; matching process, 2.70; image retrieval application, 2.70; variety, 2.70; resolution, 2.70; 4 1. Article 41 (Shape-9): TITLE: Shape recognition using spectral features. AUTHOR: K.B. Eom, JOURNAL: Pattern Recognition Letter, Vo1.19, No.2 (1998) 189-195 PUBLISHER: Pergarnon KEYPHRASES: shape, 45.5 1; classification, 34.13; contour, 22.75; spectral feature, 22.75; auto-regressive process, 11.38; centroid, 11.38; magnitude, 11.38; handwritten numeral, 7.93; aircraft, 7.93;

42. Article 42 (Shapt-10): TITLE: Partial shape matching using genetic algorithm. AUTHOR: E. Ozcan and C.K.Mohan JOURNAL: Pattern Recognition Letter, Vol. l 8, No. 10 ( 1997) 987-992 PUBLISHER: Pergamon KEYPHR4SES: shape, 23.13; feature, 7.71; mode1 shape, 7.71; input shape, 7.71; angle, 3.85; segment, 3.85; noisy, 3-85; shaperecognition, 3.85; partial shape-rnatching task, 1.00; attributed shape grammar, 1.00;

43. Article 43 (Agent-1): TITLE: Humanizing the Net: Social Navigation with a "Know-who" Email Agent AUTHOR: Alaina Kanfer, Ph.D., Jim Sweet, Ann Schlosser SOURCE: http://~~vw.uswest.codweb-conference/proceedingskanfer.html KEYPHRASES: agent, 23.13; information overload, 11.56; online, 7.71; internet, 7.71; email agent, 7.7 1; social network, 4.00; electronic mail communication, 3.85; natural language query, 2.00; social networks offline, 1.00; access stored information, 1.00; increased capacity, 1.00;

44. Article 44 (Agent-2): TITLE: Cooperating Mobile Agents for Mapping Networks AUTHOR: Nelson Minar, Kwindla Multman Kramer, and Pattie Maes SOURCE: http://nelson.~w.media.mit.edu~people/neIson~research/routes-coopagent slinde.u.htm1 KEYPHRASES: network, 38.55; agent, 12.00; computer network, 7.71; next-seneration network, 3.85; acknowledge, 3.85; programming tools embrace, 3.85; communications channel, 3.85; Contemporary computer network, 3.85; efficiency, 3.00; mobile agent, 3.00; collaborate, 2.00; interaction, 2.00;

45. Article 45 (Agent-3): TITLE: Attaching Interface Agent Software to Applications AUTHOR: Henry Lieberman SOURCE: http:Mieber.www.media.mit.edu~people/lieber/lieberary/Attaching~Attach inglAttaching.htrn1 KEYPHRASES: agent, 34.69; interface, 11.56; intelligent interface agent, 7.71; explicit intervention, 3.85; goal, 3.85; necessary application-agent communication, 2.00; program, 2.00; human user, 1.00; traditional application, 1.00; developer, 1.00; attach, 1.00; agent experiment, 1.00; news filtering agent, 1.00; demonstration system, 1.00;

46. Article 46 (Agent-4): TITLE: A Multi-Agent Referral System for Matchmaking AUTHOR: Leonard N. Foner SOURCE: http://foner.~v.media.mit.edu/peopldfoner/yenta-b~ef.html KEYPHRASES: agent, 23.13; network, 7.7 1; comrnunicate, 3.85; useful application. 3.85; matchmaker system. 3.00; cluster, 2.00; word-of-mouth, 1.00; decentralized fashion, 1.00; conventional internet media, 1.00;

47. Article 47 (Agent-5):

TITLE: Collaborative Interface Agents AüTHOR: Yezdi Lashkari, Max Metral and Pattie Maes SOURCE: http:i/agents.wwv.media.mit.edu~groups/agents/publications/aaai-ymplaaa i.html KEYPHRASES: agent, 23.13; leam, 11.56; working prototype, 7.7 1; daily computer-based task, 3.55; semi-intelligent system, 3.85; Interface agent, 3.85; collaboration, 3.00; electronic mail, 1.00; multi-agent collaboration, 1.00; fiamework, 1.00; competence, 1.00;

45. Article 45 (Agent-6): TITLE: Agent Augrnented Comrnunity-information - The ACORN Architecture AUTHOR: Stephen Marsh and Youssef Masrour SOURCE: http:llinvw.iit.nrc.cal-steve/pubs/ACORNCASCON97. html KEYPHRASES: agent, 11.56; query, 7.71; ACORN, 7.71; autonornous agent, 3.55; network. 3.85; multi-agent, 3.85; working irnplementation, 1.00; philosophy, 1.00; disseminate information, L -00; knowledgeable information source, 1.00;

49. Article 49 (Agent-7): TITLE: A Community of Autonomous Agents for the Search and Distribution of information in Networks AUTHOR: Stephen Marsh SOURCE: http://wv.iit.nrc.ca/-steve/pubs/ACORN/CS .html KEYPHRASES: information share, 3.85; agent, 3.85; provision systern, 3.85; multi-agent, 3.85; timely information, 1.00;