Dynamic Clustering of Partial Preference Relations by Mian Qin

Bachelor of Science, Beijing Technology and Business University, 2002

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Computer Science in the Graduate Academic Unit of Computer Science

Supervisors: Michael W. Fleming, Ph.D., Computer Science; Scott Buffett, Ph.D., Computer Science

Examining Board: Huajie Zhang, Ph.D., Computer Science, Chair; Kenneth Kent, Ph.D., Computer Science; Richard Tervo, Ph.D., Electrical and Computer Engineering

This thesis is accepted by the Dean of Graduate Studies.

THE UNIVERSITY OF NEW BRUNSWICK

September, 2007

©Mian Qin, 2007


Abstract

In electronic commerce (EC), negotiation can be performed to determine fair exchanges between trading partners. In order to negotiate autonomously on behalf of a user, an intelligent agent must obtain as much information as possible about the user's preferences over possible outcomes, but without asking the user an unreasonable number of questions. This thesis explores the idea of clustering partial preference relations as a means for predicting a user's preferences. Previously unknown preferences for a user can be predicted by observing those of similar users in the same cluster. Three techniques for clustering and predicting preferences are developed based on the Y-means clustering method, and a number of experiments are conducted. The MovieLens data set, normally used to test recommendation systems, is adapted for this domain and used to provide experiments with real subjects. Results show that one particular method, which predicts which of two outcomes is preferred by analyzing the confidence in the average estimate for users in the same cluster, is accurate 70-75% of the time when cluster data are sufficient for making a prediction (about 67% of the time).

Another method, while maintaining a slightly lower prediction rate, is shown to be accurate 72-82% of the time, depending on the number of known preferences for clustered users. Statistical tests show that these results are significant.

Acknowledgements

I would like to express my sincere gratitude to my supervisors Dr. Scott Buffett and Dr. Michael W. Fleming. Without their invaluable supervision, patience and support throughout the thesis research, I would not have overcome the difficulties that I encountered. Especially when I wrote my thesis, they gave me encouragement, lots of good ideas, and sound advice to help me achieve my goal. I also extend my gratitude to the members of my academic committee: Dr. Huajie Zhang, Dr. Kenneth Kent and Dr. Richard Tervo. Thanks also go to the Faculty of Computer Science at the University of New Brunswick and the Natural Sciences and Engineering Research Council (NSERC) for their financial support.

Last but not least, I would like to thank my parents, Jiachun Qin and Xuemei Kong, and my husband, Yonglin Ren, for their love and support.

Table of Contents

Abstract ii

Acknowledgements iii

Table of Contents iv

List of Tables vii

List of Figures ix

Chapter 1 Introduction 1

Chapter 2 Background 4

2.1 Automated Negotiation 4

2.1.1 Introduction 4

2.1.2 Mechanisms 5

2.1.3 Application of Automated Negotiation 7

2.2 Utility Theory 8

2.2.1 Preference Relations 8

2.2.2 Utility and Utility Functions 10

2.3 Preference Elicitation 13

2.4 Clustering 15

2.4.1 Introduction 15

2.4.2 Clustering Techniques 16

2.4.3 Applications of Clustering 18

2.5 Existing Methods for Inferring Preferences 19

2.5.1 Conditional Preference Networks 19

2.5.2 Conditional Outcome Preference Networks (COP-nets) 21

2.5.3 Minimax Regret 22

Chapter 3 Clustering Partial Preference Relations 23

3.1 Motivation 23

3.2 Partial Preferences 24

3.2.1 Complete Preference Relations and Partial Preference Relations 24

3.2.2 Conditional Outcome Preference Networks 26

3.3 Distance Measurement 32

3.3.1 Probabilistic Distance 32

3.3.1.1 Probabilistic Distance on Complete Preference Relations 33

3.3.1.2 Probabilistic Distance on Partial Preference Relations 33

3.3.2 Distance Computation with COP-nets 35

3.4 Y-means Clustering Method for Partial Preference Relations 39

Chapter 4 Inferring Preferences 43

4.1 Direct Inference Based on Clustering 45

4.2 Pre-processing Inference Based on Clustering 47

4.3 Post-processing Inference Based on Clustering 49

4.4 A Simple Example 50

Chapter 5 Implementation 60

5.1 Input and Output 60

5.2 Algorithm for Direct Inference Based on Clustering 62

5.3 Algorithm for Pre-processing Inference Based on Clustering 65

5.4 Algorithm for Post-processing Inference Based on Clustering 66

Chapter 6 Experimentation 68

6.1 Experimental Goals 68

6.2 Experimentation Methods 69

6.2.1 Experimentation Method One 69

6.2.1.1 Experimental Data 69

6.2.1.2 Experimental Design 71

6.2.2 Experimentation Method Two 77

6.2.2.1 Experimental Data 77

6.3 Analysis of Results 80

6.3.1 Analysis of Experiment Method One 80

6.3.2 Analysis of Experiment Method Two 84

Chapter 7 Conclusions and Future Work 89

7.1 Conclusions 89

7.2 Future Work 91

Bibliography 93

Curriculum Vitae

List of Tables

Table 3.1 Node representation for Figure 3.4 29

Table 4.1 Three methods for inferring preferences 44

Table 4.2 Users and their preferences 51

Table 4.3 A new user and his preferences 52

Table 4.4 The vectors for each user 54

Table 4.5 The vector for the new user 54

Table 4.6 Clusters formed by the Y-means clustering method 55

Table 4.7 The centers for clusters 55

Table 4.8 Clusters formed by a specified criterion 56

Table 4.9 Sub-clusters from cluster 1 56

Table 4.10 Sub-clusters from cluster 2 56

Table 4.11 Sub-clusters from cluster 3 57

Table 4.12 The centers for each sub-cluster 57

Table 4.13 Sub-clusters formed by a specified criterion 58

Table 4.14 Small clusters from Sub-cluster 1 58

Table 4.15 Small clusters from Sub-cluster 2 58

Table 4.16 Small clusters from Sub-cluster 3 59

Table 4.17 The centers for clusters 59

Table 6.1 Movies and Their Ratings 75

Table 6.2 Possible ranks of the movies 76

Table 6.3 Performance of the Direct Inference Based on Clustering method 80

Table 6.4 Performance of the Pre-processing Inference Based on Clustering method 82

Table 6.5 Performance of the Post-processing Inference Based on Clustering method 83

Table 6.6 Performance of the Direct Inference Based on Clustering method 85

Table 6.7 Performance of the Pre-processing Inference Based on Clustering method 85

Table 6.8 Performance of the Post-processing Inference Based on Clustering method 86

List of Figures

Figure 2.1 An example of a CP-network 21

Figure 3.1 An example of complete preference relations 24

Figure 3.2 An example of partial preference relations 25

Figure 3.3 An example of redundant edges 27

Figure 3.4 An example of COP-nets 30

Figure 3.5 A COP-net for computing utilities 32

Figure 3.6 Two partial preference relations 35

Figure 3.7 Flow chart for calculating distances 36

Figure 3.8 The COP-nets for P1 and P2 38

Figure 4.1 Flow chart of the Direct Inference Based on Clustering 46

Figure 4.2 Flow Chart of the Pre-processing Inference Based on Clustering method 48

Figure 4.3 Flow Chart of the Post-processing Inference Based on Clustering method 51

Figure 4.4 The COP-net for u1 53

Figure 6.1 Performance of three techniques 84

Figure 6.2 Performance of three techniques 87

Chapter 1 Introduction

With the rapid development of networks and the Internet, a new mode of business has emerged: electronic commerce (EC). Although there is no universally agreed-upon definition, EC is commonly referred to as the field encompassing real-time business transactions processed over the Internet, using any of the applications that depend on the Internet [17]. Computer technologies, network techniques and telecommunications have been essential in the development of EC. An important component of EC is the ability to determine fair exchanges between trading partners, such as those that take place when a user purchases an item from a business via a website. Negotiation can be a useful vehicle for settling these agreements.

However, there are problems associated with the idea of direct negotiation between a user and a website. For example, users would have to spend a great deal of time negotiating with different websites. Fortunately, advances in agent-oriented theory and multi-agent systems have allowed more and more tasks to be handled by intelligent software agents. Such agents can be programmed to communicate with websites on behalf of users. Since the agents are automated, the negotiation process is automated as well.

In order to perform automated negotiation on behalf of a user, an agent needs to know a sufficient number of the user's preferences. Even though preference elicitation techniques can help agents elicit some preferences, it is difficult to learn all preferences because of the typically unreasonably large number of outcomes. Agents thus must infer or predict the unknown preferences without overly bothering the users.

Buffett and Spencer proposed an approach for inferring the preferences of an opponent during a negotiation [5]. First, possible preference relations, perhaps obtained from a database of previous users, are clustered. Second, the correct class to which the agent belongs is determined. Finally, some assumptions can be made with regard to the opponent's preferences, according to other preference relations in this class. Finding an effective initial set of classes is critical in this approach, and there are a few problems with the clustering mechanism in this work. In particular, the K-means clustering method is used to partition the initial preference relations. This method is inefficient since the distance between every pair of preference relations needs to be computed. In addition, the model does not capture the case where preferences are incompletely specified. Furthermore, this work only considers the preferences of the opponent; the approach can be applied to predict the preferences of the user being represented by the agent as well.

The focus of this thesis is to predict a user's preferences based on a very small number of known preferences. Three techniques are developed and experiments are conducted on data from real users. The results of these experiments demonstrate that the idea of clustering partial preference relations and predicting new preferences works well, and the results are shown to be significant. This motivates future research in the preference elicitation area.

This thesis includes seven chapters, and is structured as follows. In Chapter 2, necessary detailed background knowledge on automated negotiation, preference elicitation and clustering is presented. Also, one of the existing methods for inferring preferences, referred to as Conditional Preference Networks, is included. Chapter 3 presents information with regard to clustering partial preference relations. It describes a graphical preference model referred to as a Conditional Outcome Preference Network (COP-net), which is used to represent partial preference relations. A method for computing the distance between COP-nets is demonstrated, and the Y-means clustering method is discussed. Chapter 4 presents three techniques used for clustering COP-nets and predicting users' preferences. Chapter 5 describes the implementation of the three techniques. In Chapter 6, the principles of the experiments are presented in detail, and experimental results are analyzed. Finally, Chapter 7 concludes the thesis, and offers some thoughts on future work.

Chapter 2 Background

Agent-oriented theory and intelligent systems are applied to many problems in computer science. Agents can serve their users well if they have sufficient information regarding the users' preferences and goals, as well as some knowledge about other agents with which they will interact. How to elicit useful information plays a critical role in agent-oriented systems. This chapter provides a description of some background information on utility theory, preference elicitation, and other research areas relevant to this thesis.

2.1 Automated Negotiation

2.1.1 Introduction

The term "agent" has no universally accepted definition, but in most conditions, it can be defined as "an encapsulated computer system that is situated in some environment and that is capable of flexible, autonomous action in that environment in order to meet its design objectives" [39]. Either to achieve their individual goals or to manage the dependencies among them, agents often need to interact with each other. In order to work together, they must come to mutually acceptable agreements by negotiating. In other words, negotiation is a useful way to resolve conflicts. The most interesting thing is that each agent is automated, and thus the negotiation process must also be automated.

Automated negotiation is used extensively in electronic commerce [27, 36]. However, there are two major challenges that are faced in automated negotiation. The first challenge is to build strategies that provide efficient searching and quick matching. The second challenge is to come up with an autonomous process that can simulate and learn humans' behaviour in order to perform commercial activities. For example, an agent can help users to trade items online on their behalf.

2.1.2 Mechanisms

Three basic components should be considered when designing an automated negotiation system [28]:

• Negotiation protocol: This is the method that is used to control the interaction process among negotiation parties. The protocol specifies the conditions under which interaction can take place, which deals can be made, and what sequence of offers can be allowed.

• Negotiation objects: These are the outcomes that should be achieved after negotiation. Outcomes may include either a single issue, such as the price, or a set of issues, such as the price and the color. The agreement structure and the negotiation protocol determine which kinds of operations can be performed on an agreement. For example, a take-it-or-leave-it offer means that parties in a negotiation can either accept the offer or simply refuse it.

• Agent's decision making models: These are the strategies that specify the actions (e.g. offers or responses) the agents plan to make during the negotiation process in order to achieve their goals. A particular protocol can be compatible with many different strategies, and can therefore generate different results. The decision making models greatly influence the reward from an agreement that is ultimately made by participants. In other words, a better decision making model can result in a better reward.

Negotiation mechanisms are composed of negotiation protocols, negotiation objects, and decision making models. There are some properties that are generally considered desirable for negotiation mechanisms [31]:

• Computational efficiency: negotiation mechanisms should be as computationally efficient as possible.

• Communication efficiency: users always prefer a mechanism that deals with communication among negotiation parties efficiently.

• Individual rationality: a mechanism should provide an agent incentive to participate in negotiation.

• Distribution of computation: a mechanism should distribute the computation over the agents as a whole.

• Pareto efficiency: Pareto efficiency is an important notion in economics. Consider a set of alternative allocations. An allocation is Pareto efficient if there is no other allocation that makes one party better off and no party worse off.

• Symmetry: in general, no parties should be able to completely control the process in automated negotiation.

The relative importance of the three components of mechanisms varies depending on the negotiation and environmental context, so there is no globally best approach or technique for automated negotiation. In the process of automated negotiation, if a user's preferences are obtained by an agent in advance, this helps to determine appropriate strategies to use on behalf of the user. At the same time, the user can also obtain better service from the agent, and the efficiency of negotiation can be improved greatly.

2.1.3 Application of Automated Negotiation

Negotiation has been applied in various domains, such as psychology, law, business administration, etc. [14]. If there is a conflict, negotiation can be employed. For example, each country has its own laws; if there is a conflict between two countries' laws, it can be resolved by negotiation. Recently, the rapid adoption of the Internet as a commercial medium has brought a significant change to traditional ways of doing business. In order to stay competitive and make shopping convenient, many companies provide on-line shopping. Price is usually the central argument in on-line shopping, but there are other product attributes, such as color, style, etc. Users can make acceptable purchases by negotiating with sales parties over the terms of these attributes. In addition to finding acceptable exchanges of retail goods, privacy and protection of users' personal information are a growing concern [30]. Websites usually ask users for some personal information, such as their name and home address, in order to process payment and delivery, or to update the website to attract new customers and serve them effectively, among other reasons. People fear that this information will be shared or used inappropriately, and thus users are reluctant to give away their information freely.

To entice users to release private information, these electronic commerce websites can offer some rewards, such as a discount, free downloads, or a larger email box. To determine fair exchanges, negotiation can be performed between users and websites.

2.2 Utility Theory

Utility theory has been used more and more extensively in automated negotiation [2]. It can be used to determine the choices that the decision maker will make, or should make, among all possible alternatives. Using preference relations and utility functions over alternatives, the utility for a course of action can be calculated. This section provides some background research on preference relations and utility functions.

2.2.1 Preference Relations

A preference relation is a ranking system that can help users to make the best decision over a set of outcomes, and is the foundation of all choice theory in economics [15]. Given a pair of outcomes, a preference relation indicates which of the two, if any, is preferred. Given a set X of alternatives, with x, y in X, a preference relation can specify:

1. a strict preference relation, denoted by x ≻ y, which means that x is preferred over y;

2. an indifference relation, denoted by x ~ y, which means that x and y are equally preferred;

3. a weak preference relation, denoted by x ≽ y, which means that x is preferred over y or they are equally preferred.

In summary, the preference relation is a binary relation on all alternatives, or outcomes. Preference relations have the following properties:

• Connected: for all x, y in X, either x ≻ y, or y ≻ x, or x ~ y;

• Transitive: for all x, y, z in X, if x ≽ y and y ≽ z, then x ≽ z; similarly, if x ≻ y and y ≻ z, then x ≻ z;

• Asymmetric for "≻": for all x, y in X with x ≠ y, if x ≻ y, then y ⊁ x; symmetric for "~": for all x, y in X, if x ~ y, then y ~ x.

There are two typical classes of outcomes: single-attribute and multi-attribute cases. In the single-attribute case, it is often easy to specify the preferences over attribute values, particularly when the attribute is price or cost. In this case, the buyer prefers less money over more, and the seller prefers the opposite. However, in the multi-attribute case, it is hard to know the preference relations among all outcomes. Assume we have a set A = {A1, A2, ..., An} of attributes, and each attribute Ai has a number of possible values. Then, the outcome space is the Cartesian product A1 × A2 × ... × An. If there are n attributes and each attribute has 2 values, then there are 2^n possible outcomes. One can see that it is difficult to make decisions in this situation, since the large number of outcomes makes it difficult to learn all preferences. In order to learn preference relations, many preference elicitation technologies are being developed [7]. This is a hot research area in automated negotiation.
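To make the size of the outcome space concrete, the following short Python sketch (illustrative only, not part of the thesis software; the attribute names and values are made up) enumerates the Cartesian product of three binary attributes:

```python
from itertools import product

# Hypothetical multi-attribute domain: each attribute has a small set of values.
attributes = {
    "brand": ["Honda", "Toyota"],
    "color": ["red", "blue"],
    "price": ["low", "high"],
}

# The outcome space is the Cartesian product A1 x A2 x ... x An.
outcomes = [dict(zip(attributes, combo)) for combo in product(*attributes.values())]

print(len(outcomes))   # 2 * 2 * 2 = 8 outcomes
for o in outcomes[:3]:
    print(o)
```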

2.2.2 Utility and Utility Functions

In economics, utility is a measurement of the happiness or satisfaction of a consumer toward a good or service. Let X be the outcome set. A utility function u: X → R is a function that assigns a real number (typically between 0 and 1) to each possible outcome, to indicate how satisfactory that outcome is. Higher utility indicates a higher degree of satisfaction. Thus a preference relation can be induced by a utility function. Given two outcomes x and y, and the user's utility function u, x is preferred over y (x ≻ y) if and only if the utility of x is greater than the utility of y (u(x) > u(y)). Thus, the utility function is crucial in rational decision making. The basic idea of utility theory presented by von Neumann and Morgenstern in 1947 [33] is that a rational decision maker always chooses the alternative for which the expected value of the utility (expected utility) is maximum. In other words, a strategy that maximizes the expected utility is the correct sequence of decisions. So, by considering the utilities of outcomes, a proper strategy can be decided.

Modern utility theory dates back to Daniel Bernoulli's essay [1]. He proposed expected utility theory and hypothesized that the utility of an outcome (e.g., a prize in a lottery) was not directly proportional to the monetary value (measured in dollars) of the prize. "Expected Utility Theory (EUT) states that the decision maker (DM) chooses between risky or uncertain prospects by comparing their expected utility values, i.e., the weighted sums obtained by adding the utility values of consequences multiplied by their respective probabilities" [19]. The expected utilities of choices represent the decision maker's preference order among these outcomes. For example, consider a choice C that will result in one of two mutually exclusive outcomes O1 and O2. If the probabilities of O1 and O2 are P1 and P2 respectively, the expected utility of this choice is calculated as:

EU(C) = P1 × u(O1) + P2 × u(O2)    (2.1)
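As a small illustration of Equation (2.1), the following sketch (not from the thesis; the outcomes and numbers are hypothetical) computes expected utility as a probability-weighted sum:

```python
def expected_utility(outcome_probs, utility):
    """Weighted sum of utilities: EU(C) = sum_i P_i * u(O_i) (Equation 2.1)."""
    return sum(p * utility[o] for o, p in outcome_probs.items())

# Hypothetical two-outcome choice: O1 with probability 0.6, O2 with 0.4.
u = {"O1": 0.9, "O2": 0.2}
print(expected_utility({"O1": 0.6, "O2": 0.4}, u))  # 0.6*0.9 + 0.4*0.2 = 0.62
```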

The concept of utility applies to both single-attribute and multi-attribute cases. A single-attribute utility is defined as a function that maps consequences which have only one attribute to real numbers. Assume a single-attribute utility function u, and that the only attribute A has n values a1, a2, ..., an. If ai ≽ aj, then u(ai) ≥ u(aj). A multi-attribute utility is defined as a function that maps consequences that have two or more attributes to real numbers. Assume a multi-attribute utility function u, that there are attributes A1, A2, ..., An, and that each attribute has one or more values (ai and am are values for attribute A, bj and bn are values for attribute B, etc., where i, j, m, n ≥ 1). If (ai, bj, ...) ≽ (am, bn, ...), then u(ai, bj, ...) ≥ u(am, bn, ...).

In some cases, these multi-attribute utility functions can be computed as a function of the individual attribute value utilities. Two well-known classes of such multi-attribute utility functions were proposed by Keeney and Raiffa [29] in 1976: additive utility functions and multi-linear utility functions. The utility function is additive when the attributes are considered to be additive utility independent, while the utility function is multi-linear when the attributes are mutually utility independent. Brief definitions of these functions are given here. Refer to Keeney and Raiffa's research for more detailed discussion.

Definition 2.1 A simple lottery is defined as the set {(x1, p1), (x2, p2), ..., (xn, pn)}, such that p1 + p2 + ... + pn = 1. In a simple lottery, the outcome xi occurs with probability pi.

Definition 2.2 Attributes Y and Z are additive independent if the paired preference comparison of any two lotteries, defined by two joint probability distributions on Y x Z, depends only on their marginal probability distributions.

Let u be a two-attribute utility function with attributes A and B (a is a value of A and b is a value of B), and assume that the one-attribute utility functions uA and uB are known. The form of an additive utility function is as follows:

u(a, b) = k1 × uA(a) + k2 × uB(b)    (2.2)

where k1 and k2 are scaling constants which sum to 1.

The fundamental concept of multi-attribute utility theory is utility independence.

Definition 2.3 Attribute Y is utility independent of attribute Z when conditional preferences for lotteries on Y given Z do not depend on the particular value of Z. If Y is utility independent of Z and Z is utility independent of Y, then Y and Z are said to be mutually utility independent.

Referring to the two-attribute utility function u above, if A and B are mutually utility independent, the form of a multi-linear utility function is as follows:

u(a, b) = k1 × uA(a) + k2 × uB(b) + k3 × uA(a) × uB(b)    (2.3)

where k1, k2, and k3 are scaling constants which also sum to 1. In most practical applications, additive utility functions are used extensively.

2.3 Preference Elicitation

In order to build effective negotiation strategies in automated negotiation or to help users make the best decision in decision making systems, it is important that the agent is able to have enough knowledge about the user's preferences. Preference elicitation has become more and more important in recent years because of the explosion of on-line information, as well as the large number of on-line consumers. In simple words, preference elicitation is used to retrieve preference information from users and to construct a good model of the user's preferences. If there is no relevant information at the beginning of interaction, the agent should elicit as much information about preferences as possible and model preferences accurately so that it can help users achieve their goals. However, preferences are difficult to determine, because 1) they are usually incomplete initially, 2) they will change in different contexts, 3) there is typically a large number of outcomes, and 4) users themselves may not be sure of all of their preferences. Thus, techniques that can efficiently and effectively extract and infer users' preferences are needed.

The most straightforward way to elicit the user's preferences is by asking questions. Consider a fictional user who would like to buy a car from an auto dealer. Obviously, there are too many choices on the Internet. If the agent knows some of this user's preferences, it can save time in finding a car that is suitable for the user. The agent asks the user a question and determines that the user most prefers the brand "Honda". Thus it can search for cars from "Honda" auto dealers, and the information about other brands is filtered out. However, in the real world, each user has his/her own preferences, so the number of questions that would be needed to obtain different users' preferences is probably unreasonably large. Designing an efficient preference elicitation technique is therefore very important. There are several preference elicitation techniques: the Certainty Equivalent Method, the Probability Equivalent Method, the Lottery Equivalent Method, the Value Function Elicitation, and the Paired Comparison Method [33]. Using these methods, an agent can elicit the user's preferences correctly, but the methods themselves are complicated and increase the complexity of the solutions. Thus, these methods are impractical when they are applied in real life. Pu et al. [21] present a practical method of eliciting user preferences that consists of four steps. First, the user describes his high-value preferences; second, according to these preferences, the computer estimates and searches for possible solutions; third, the user critiques the solution alternatives and modifies his own preference model; fourth, based on the user's new preference model, the computer again estimates and searches for possible solutions. The second, third, and fourth steps are iterated until the user finds a satisfactory preference model.

Utility elicitation is a special case of preference elicitation. These two terms are used interchangeably in many decision making systems, even though they are not the same. Theoretically, the preference elicitation problem is the same as that faced in decision and multi-attribute utility theory [29]. Unfortunately, eliciting utility information from users is difficult, because each user has a totally different utility function, and the possible outcome probabilities and utilities are extremely hard to determine. The standard gamble approach to utility elicitation and the representation of uncertainty over utilities by a probability distribution [6] can work well here. The uncertainty of the user's utility can be reduced by asking gamble questions about the user's preferences. For example, let there be three outcomes o1, o, and o2, and let the preference relation of a user be o1 ≻ o ≻ o2 (o1 is preferred over o, which is preferred over o2). Suppose that the utilities u(o1) and u(o2) are known, and that u(o) is unknown or possibly a random variable drawn from a known distribution. A question of the following pattern should be asked: "Which will she/he choose: receiving o for sure, or a gamble in which o1 will be received with probability s and o2 will be received with probability (1 − s)?" If the user chooses o, we know that u(o) > u(o1)s + u(o2)(1 − s); otherwise, u(o) ≤ u(o1)s + u(o2)(1 − s).
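One way to picture this style of questioning is the sketch below, which repeatedly asks simulated standard-gamble questions to narrow bounds on u(o). The binary-search strategy and the simulated user are assumptions introduced for illustration; this is not the elicitation procedure of the cited work.

```python
def elicit_utility(u1, u2, prefers_gamble, tolerance=0.01):
    """Bound u(o) between u2 and u1 by repeated standard-gamble questions.

    prefers_gamble(s) should return True if the user prefers the gamble
    "o1 with probability s, o2 with probability 1 - s" over receiving o for sure,
    which implies u(o) is at most s*u1 + (1-s)*u2; False implies u(o) is above it.
    """
    low, high = u2, u1
    while high - low > tolerance:
        s = 0.5 * ((low - u2) + (high - u2)) / (u1 - u2)  # midpoint expressed as a probability
        value = s * u1 + (1 - s) * u2
        if prefers_gamble(s):
            high = value
        else:
            low = value
    return (low + high) / 2

# Simulated user whose true (hidden) utility for o is 0.37, with u(o1)=1.0 and u(o2)=0.0.
estimate = elicit_utility(1.0, 0.0, prefers_gamble=lambda s: s * 1.0 + (1 - s) * 0.0 > 0.37)
print(round(estimate, 2))  # close to 0.37
```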

2.4 Clustering

2.4.1 Introduction

The process of grouping a set of physical or abstract objects whose members are similar in some way into classes of similar objects is called clustering [26]. This is an important unsupervised learning problem in machine learning. The goal of clustering is to determine intrinsic grouping in a set of unlabeled data. The idea is that each item in a cluster should be more similar to all other items in the same cluster than to all other items in different clusters, according to some measure of similarity.

2.4.2 Clustering Techniques

As a branch of statistics, clustering methods concentrate mainly on distance-based methods. There are many sophisticated clustering methods in the literature. In general, they are divided into two major groups [10]: hierarchical and partitional clustering. Within each of these types there exists a wealth of subtypes and different algorithms, so sometimes they are divided into five groups [26] in more detail: partitioning methods (e.g. the k-means method and k-medoids clustering), hierarchical methods (e.g. Balanced Iterative Reducing and Clustering Using Hierarchies), density-based methods (e.g. Density-Based Spatial Clustering of Applications with Noise), grid-based methods (e.g. the Statistical Information Grid), and model-based methods (e.g. the Statistical Approach).

Hierarchical clustering methods are usually good when they are used to merge smaller groups into bigger ones, or split bigger groups into smaller ones. However, these kinds of methods cannot be utilized to merge and split groups simultaneously. In other words, at any stage of the procedure, a hierarchical clustering technique performs either a merger of clusters or a division of a cluster from the previous stage. It will conceptually give rise to a tree-like structure of the clustering process. It is understood that the clusters of items formed at any stage are non-overlapping or mutually exclusive.

Hierarchical clustering techniques proceed by either a series of successive mergers or a series of successive divisions [26]. Hierarchical clustering is represented by a two-dimensional diagram known as a dendrogram, which shows how the clusters are related. The basic idea of hierarchical clustering is to group all single data points into a whole cluster or to partition a whole group into single data points, and a clustering of the data points into disjoint groups is obtained by cutting the dendrogram at a desired level. Among hierarchical clustering methods, distances between clusters are measured in three ways [32]: the shortest distance from any member of one cluster to any member of the other cluster (single-linkage clustering), the largest distance from any member of one cluster to any member of the other cluster (complete-linkage clustering), and the average distance from any member of one cluster to any member of the other cluster (average-linkage clustering).
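A minimal sketch of the three linkage measures (illustrative only; the distance function and data below are made up):

```python
def single_linkage(c1, c2, dist):
    """Shortest distance between any member of one cluster and any member of the other."""
    return min(dist(x, y) for x in c1 for y in c2)

def complete_linkage(c1, c2, dist):
    """Largest distance between any member of one cluster and any member of the other."""
    return max(dist(x, y) for x in c1 for y in c2)

def average_linkage(c1, c2, dist):
    """Average distance over all cross-cluster pairs of members."""
    return sum(dist(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

# Hypothetical one-dimensional example.
d = lambda x, y: abs(x - y)
a, b = [1.0, 2.0], [5.0, 9.0]
print(single_linkage(a, b, d), complete_linkage(a, b, d), average_linkage(a, b, d))
# 3.0 8.0 5.5
```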

Partitional clustering directly partitions the whole data set into small, disjoint groups. The most commonly used method in partitional clustering is the K-means algorithm. Clustering by the K-means algorithm proceeds in four steps:

1. randomly choose k data points from the training data set as centroids for k clusters;

2. calculate the distances between each data point and each centroid, and then assign the data point to its closest centroid;

3. re-calculate the centroids by taking the average of all data points in each current cluster;

4. iterate steps 2 and 3 until the centroids no longer change (a small illustrative sketch of these four steps follows).
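A minimal sketch of the four steps above, assuming points are given as numeric vectors (illustrative only, not the implementation used in this thesis):

```python
import random

def kmeans(points, k, max_iter=100):
    """Minimal K-means: random initial centroids, assign, re-average, repeat."""
    centroids = random.sample(points, k)                       # step 1
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                                       # step 2: nearest centroid
            i = min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[i].append(p)
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)                    # step 3: recompute centres
        ]
        if new_centroids == centroids:                         # step 4: stop when stable
            break
        centroids = new_centroids
    return clusters, centroids

data = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (1.0, 0.9)]
clusters, centres = kmeans(data, k=2)
print(centres)
```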

The K-means algorithm is easy to understand and implement, but it has two shortfalls: first, the number k of clusters must be specified in advance, and the clustering results depend strongly on the value of k; second, there may exist empty clusters, which are meaningless for classification. The Y-means algorithm [23] is an improvement over the K-means algorithm, and overcomes these two shortfalls. The choice of clustering algorithm depends closely on the type of data available and the particular purpose and application of clustering. In this thesis, the Y-means algorithm is employed to cluster partial preference relations, and is discussed in more detail in Section 3.4.

2.4.3 Applications of Clustering

Clustering is used extensively in many research areas [11]. For example, in business, clustering helps marketers find different groups in their user bases, and provide appropriate services to individual user groups based on their purchasing patterns. In order to infer information about a user's preferences, the preference relations obtained from previous users during prior negotiation periods can be grouped into several clusters. When a new user is encountered, his/her preference relations can then be classified into one of these clusters. According to the similar features in this cluster, some conclusions regarding the unknown preferences of the user can be drawn. So clustering is an important augmentation to preference elicitation. Clustering can also be used by search engines on the Internet. When a user uses a search engine such as Google, the expected results are found, usually together with other relevant information. For example, if a user searches for a definition of standard deviation on the Internet, he/she can obtain this definition and, in the meantime, other related results are found as well, such as the computation method for standard deviation. In addition to the above two kinds of applications, clustering is used in other domains, such as biology, insurance, etc. [39]. All in all, because clustering can improve relative efficiency, it has many practical applications.

2.5 Existing Methods for Inferring Preferences

2.5.1 Conditional Preference Networks (CP-networks)

Users generally do not have the same preferences over the same outcomes in decision systems, so no fixed approach can elicit a user's preferences other than directly interacting with the user and retrieving useful information. One solution, described above, is to elicit the user's utility for each outcome. Another useful way to help decision makers make decisions in the negotiation process is to ask the user to compare two alternatives and assess which is more preferred. Asking all possible comparisons may be infeasible in practice, since the outcome space is unreasonably large. Boutilier et al. [3] proposed a structure for modeling preferences learned by posing such comparison elicitation queries. This structure can be used to determine which of two outcomes is preferred, if sufficient information to support the conclusion exists. The structure can also handle conditional preferences. A preference is said to be conditional when the relationship between two outcomes is dependent on the presence (or absence) of a particular event. A conditional preference takes the form c: a ≻ b, which indicates that, when c is present, a is preferred over b.

Boutilier et al. [15] defined a directed graph that represents the user's preferences, referred to as a Conditional Preference Network (CP-network). A CP-network is defined over n attributes, where each node represents an attribute. For each attribute, the user identifies a set of parent attributes that can influence his preferences over the values of the current attribute. Conditional preferences can then be elicited with these dependencies in mind. According to this information, an annotated graph that represents the user's preferences is generated. Consider the following example with five attributes, A, B, C, D, and E, where each attribute has two values (a and ā are values for A, b and b̄ for B, c and c̄ for C, d and d̄ for D, and e and ē for E). A and B have no parents and they are both parents of C, which is the parent of D and E. Assume the following conditional preferences:

1. For attribute A, a ≻ ā;

2. For attribute B, b ≻ b̄;

3. For attribute C, (a ∧ b) ∨ (ā ∧ b̄): c ≻ c̄ and (a ∧ b̄) ∨ (ā ∧ b): c̄ ≻ c;

4. For attribute D, c: d ≻ d̄ and c̄: d̄ ≻ d;

5. For attribute E, c: e ≻ ē and c̄: ē ≻ e.

The corresponding CP-network is shown in Figure 2.1.

Boutilier et al. [15] also proposed a dominance checking algorithm that can analyze the CP-networks to determine whether one outcome is preferred over another.

For example, in Figure 2.1, the outcome abcde is preferred over the outcome āb̄c̄d̄ē (abcde ≻ āb̄c̄d̄ē). This can be shown by the following sequence of "flips":

ab̄c̄d̄ē ≻ āb̄c̄d̄ē (since a ≻ ā, ā is flipped to a)

abc̄d̄ē ≻ ab̄c̄d̄ē (since b ≻ b̄, b̄ is flipped to b)

abcd̄ē ≻ abc̄d̄ē (since a ∧ b, c̄ is flipped to c)

abcdē ≻ abcd̄ē (since c is present, d̄ is flipped to d)

abcde ≻ abcdē (since c is present, ē is flipped to e)

Figure 2.1 An example of a CP-network

2.5.2 Conditional Outcome Preference Networks (COP-nets)

Chen et al. [9] defined a directed graph that represents preferences over a set of outcomes, referred to as a conditional outcome preference network (COP-net). The main goals of using these networks are to judge relations among outcomes, and to predict utilities for all possible outcomes when some outcomes' utilities and an incomplete preference relation over a set of outcomes are given. Creating a complete COP-net consists of three steps: first, create an initial COP-net; then check whether there are any cycles in the initial COP-net and consult the user to correct any such inconsistencies; finally, remove redundant edges from the initial COP-net. From COP-nets, preferences over pairs of outcomes can be easily ascertained. Based on COP-nets, utilities for all outcomes can be induced by three COPN utility functions: the Bounded COPN utility function, the Random-path COPN utility function, and the Longest-path COPN utility function. Detailed information about COP-nets and these COPN utility functions will be presented in Chapter 3.

2.5.3 Minimax Regret

The minimax theorem is a method used in game theory to minimize the maximum possible loss [38]. Conversely, it can be understood as a method used to maximize the minimum gain. Minimax regret is a criterion that is used to protect the decision maker from the worst possible risk, and is widely accepted in the decision making research area. It was first introduced by Savage in 1951 and was later axiomatized as a decision principle.

A key problem in preference elicitation is that incomplete information usually exists in the decision-making process. The maximum expected utility criterion becomes less useful when probabilities over outcomes are subjective or unknown. In this case, minimax regret can produce more favorable results. In preference elicitation, the minimax regret criterion dictates that a preference relation or utility function be chosen such that the maximum possible regret of the choice is minimized.

Chapter 3 Clustering Partial Preference Relations

3.1 Motivation

Classification is a form of supervised learning in machine learning. It is the process of partitioning a group of existing objects into different classes, in order to learn useful structures, models, or information, and can be used to predict unknown structures, models, and information [11]. Different from classification, clustering is a form of unsupervised learning in machine learning, where generally the manner in which the data should be grouped together is not demonstrated by examples. In other words, clustering does not have knowledge of the classes to which objects can belong. The goal of clustering is to identify intrinsic grouping in a large amount of data. According to some criteria that are particularly chosen, data points that are similar can be grouped in a number of ways.

In order to make decisions on a user's behalf in the process of automated negotiation, an agent must obtain as much information as possible about the user's preferences over possible outcomes, but without asking the user an unreasonable number of questions. The user also can obtain better service from the agent, and the efficiency of negotiation can be improved greatly. It is possible to use clustering in the elicitation of users' preferences. Consider an agent with a database containing a number of other users' preference relations over some possible outcomes. Based on the current information in the database, existing users whose partially learned preference relations are similar are grouped into one cluster, and new preferences for a particular user can be inferred by observing those of other users in the same cluster. The following sections in this chapter provide information about partial preferences, distance computation between two partial preference relations, and the Y-means clustering algorithm.

3.2 Partial Preferences

3.2.1 Complete Preference Relations and Partial Preference Relations

In this section, the concepts of complete preference relations and partial preference relations are introduced, where complete preference relations are those where all preferences are already known, and partial preference relations are those where only some of the preferences are known.

Complete preference relations can be understood by referring to the example below: there are apples (denoted by A), bananas (denoted by B), pears (denoted by P), and grapes (denoted by G). A person has the following preference relation with respect to the above fruits:

Figure 3.1 An example of complete preference relations

The relation in Figure 3.1 represents a complete preference relation, because the relation between any pair of fruits is known. Complete relations over outcomes are helpful in making proper decisions in automated negotiation. Unfortunately, in the real world, one is more likely to only have a partial preference relation. Assume that a person has specified preferences among the above four fruits as shown in Figure 3.2.

Figure 3.2 An example of partial preference relations

It is said that this relation is a partial preference relation, because there is no relation between B and G and between G and P. In other words, there is no indication whether the person prefers B over G, or prefers G over B, or is indifferent between them.

If agents know users' complete preferences, they can filter a large amount of information and accurately pursue what each particular user wants. As a result, users save much time and obtain the best help from agents. In summary, the key goal of this thesis is to predict the unknown preference between an arbitrary pair of outcomes in automated negotiation.

3.2.2 Conditional Outcome Preference Networks

A complete preference relation over a set of outcomes can be constructed based on a complete set of preferences. When some preferences are unknown, these unknown preferences can be inferred or estimated by examining the known preferences. Conditional Outcome Preference Networks (COP-nets) [9] are used to infer new preferences when possible and to estimate a user's complete utility function, when some utility values and a partial set of preferences over a set of outcomes are available. A COP-net creates nodes for all possible outcomes, where an outcome is an assignment of values to each attribute. The steps for building a COP-net are as follows:

1. consider all known preferences;

2. list all feasible outcomes;

3. apply each known preference to each outcome to get a set of pairs of outcomes, where the preference is known for each pair;

4. build a directed graph by creating a node for every outcome and by placing a directed edge from one outcome to another whenever the first is known to be preferred over the second.

An initial COP-net is generated after the above four steps. In initial COP-nets, there can exist some cycles, which would indicate that some outcomes are preferred over themselves. This situation is not allowed, because an outcome cannot be preferred over itself in decision making. Therefore, cycles should be removed from the initial COP-net by asking the user to correct the inconsistencies. Redundant edges are also removed by performing a transitive reduction on the graph. Thus, if there are two paths that start from the same node and end at the same node, the edges on the path whose preferences are wholly implied by the other path are removed. For example, in Figure 3.3, Path 1 is the redundant edge, and should be removed.

Figure 3.3 An example of redundant edges

It is easy to judge the relations among outcomes by observing a reduced COP-net. Consider nodes ni and nj in a COP-net, with corresponding outcomes oi and oj. If ni is an ancestor of nj, then the relation between them is oi ≻ oj; if ni is a descendant of nj, then oi ≺ oj; if ni is neither an ancestor nor a descendant of nj, then the relation between oi and oj is unknown.
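This ancestor test can be pictured with a small reachability sketch (illustrative only; the outcome names and edges below are hypothetical, and an edge points from the preferred outcome to the less preferred one):

```python
from collections import defaultdict, deque

def reachable(edges, start, goal):
    """Breadth-first search: is there a directed path from start to goal?"""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def compare(edges, o1, o2):
    """Return '≻', '≺' or 'unknown' for two outcomes in a COP-net."""
    if reachable(edges, o1, o2):
        return "≻"
    if reachable(edges, o2, o1):
        return "≺"
    return "unknown"

# Hypothetical COP-net fragment: ab ≻ a ≻ ∅ and ab ≻ b ≻ ∅.
net = [("ab", "a"), ("ab", "b"), ("a", ""), ("b", "")]
print(compare(net, "ab", ""))   # ≻  (ab is an ancestor of the all-bar outcome)
print(compare(net, "a", "b"))   # unknown (neither is an ancestor of the other)
```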

With COP-nets, a user can express preferences in three ways: 1) preferences over values for the same attribute, 2) preferences over values for different attributes, and 3) conditional preferences. For example, suppose a user wants to buy a computer. He/she can specify that a laptop is preferred over a desktop (preferences within an attribute); that a good video card is preferred over a good CPU (preferences across attributes); that given an IBM laptop, a good CPU is preferred over a good video card (conditional preferences).

27 Ceteris paribus is a Latin phrase which means "all other things being equal" [13], and usually is applied to isolating descriptions of events from other potential environmental variables. The ceteris paribus assumption is used in this thesis to simplify a set of known preferences for building a COP-net. Consider Example 3.1.

Example 3.1 Suppose that there is a set of attributes {A, B, C}, where each attribute has binary values (a and ā are values for attribute A, b and b̄ for B, c and c̄ for C), with the following preferences:

a ≻ ā    (3.1)

a ≻ b    (3.2)

Consider preference (3.1). The "all else equal" principle applied here implies that abc ≻ ābc, abc̄ ≻ ābc̄, ab̄c ≻ āb̄c, and ab̄c̄ ≻ āb̄c̄. In the meantime, preferences across attributes are allowed as well. Preference (3.2) means that ab̄ ≻ āb, and implies that ab̄c ≻ ābc and ab̄c̄ ≻ ābc̄. For brevity, "bar" values are often omitted. Thus, ab̄c̄ is represented simply as a, for example.
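To illustrate how a single ceteris paribus statement expands into pairs of outcomes, the following sketch (an encoding assumed for illustration, not the thesis code: an outcome is represented by the set of attributes taking their unbarred value) reproduces the expansions above:

```python
from itertools import product

ATTRS = ["A", "B", "C"]

def expand(preferred, dispreferred):
    """Apply one ceteris paribus preference to every outcome, 'all else being equal'.

    An outcome is encoded as the set of attributes taking their unbarred value,
    e.g. {'A', 'C'} stands for a b̄ c.  'preferred'/'dispreferred' are the attribute
    sets mentioned on each side of the statement ({'A'} ≻ set() for a ≻ ā,
    {'A'} ≻ {'B'} for a ≻ b).
    """
    mentioned = preferred | dispreferred
    free = [x for x in ATTRS if x not in mentioned]
    pairs = []
    for values in product([False, True], repeat=len(free)):
        rest = {x for x, keep in zip(free, values) if keep}
        pairs.append((frozenset(preferred | rest), frozenset(dispreferred | rest)))
    return pairs

print(expand({"A"}, set()))   # a ≻ ā: four pairs, one per assignment of B and C
print(expand({"A"}, {"B"}))   # a ≻ b: two pairs, one per assignment of C
```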

The following example shows how a COP-net is constructed based on some known preferences, while the next demonstrates how to calculate utilities for each outcome.

Example 3.2 Suppose that there is a set of attributes {A, B, C}, where each attribute has binary values (a and ā are values for attribute A, b and b̄ for B, c and c̄ for C), and that there are the following non-conditional preferences:

a ≻ ā

b ≻ b̄

c ≻ c̄

a ≻ c

The COP-net for Example 3.2 is shown in Figure 3.4, where Table 3.1 indicates which outcome is represented by each node.

Node:    n0    n1    n2    n3    n4    n5    n6    n7
Outcome: āb̄c̄   a     b     ab    c     ac    bc    abc

Table 3.1 Node representation for Figure 3.4

According to the principle that ancestors are preferred over descendants, it is easy to determine a number of preferences that are not directly evident from the specified set. For example, by observing that n1 is an ancestor of n6, it can be concluded that a ≻ bc.

Figure 3.4 An example of COP-nets

If a COP-net is built, and if the user's utilities for some of the outcomes are known, the utilities of the remaining outcomes in the COP-net can be estimated by three methods: the Bounded method, the Random-Path method, and the Longest-Path method [9]. The basic idea of the Bounded method is to set lower and upper bounds for each outcome whose utility is unknown, and the average of these bounds is assigned as the utility value for this outcome. The principle of the Random-Path method is to choose a path of outcomes in the COP-net randomly where utilities are unknown, and to assign utilities to these outcomes while preserving the preference ordering. The selection criteria are as follows. Let p = (o0, o1, ..., on) be a path in a COP-net. A path p is chosen to compute unknown outcomes' utilities by the Random-Path method if (1) the utilities for o0 (denoted by u(o0)) and on (denoted by u(on)) are known or computed already, and are not known for any other outcome on p; (2) among all paths satisfying the above requirement, u(o0) is minimal; and (3) among all paths satisfying the above two requirements, u(on) is maximal. Requirements (2) and (3) ensure consistency in utility assignment. If more than one path satisfies these criteria, one is chosen randomly. Utilities for outcomes on such a path are computed by the following formula [9]:

u(oi) = u(on) + ((n − i) / n) × (u(o0) − u(on))    (3.3)

The theory of the Longest-Path method works similarly to the Random-Path method, with the difference that the Longest-Path method always selects the longest path which meets the above (1), (2), and (3) requirements at the same time (for more information, refer to Chen's thesis [8]).

Example 3.3 Consider the COP-net shown in Figure 3.5, where each node represents an outcome. Let u(o1) = 0.82 and u(o6) = 0.1. Utilities for outcomes o2 to o5 are computed by the three techniques as follows.

Bounded: the upper bound for o2 to o5 is 0.82, and the lower bound is 0.1. Therefore, u(oi) = (0.82 + 0.1) / 2 = 0.46 for all i = 2 to 5.

Random-Path: there are two possible paths from o1 to o6: p1 = (o1, o2, o3, o5, o6) and p2 = (o1, o4, o5, o6). If p1 is chosen first, then u(o2) = 0.64, u(o3) = 0.46, and u(o5) = 0.28. The path p3 = (o1, o4, o5) is chosen next, and correspondingly u(o4) = 0.55. If p2 is chosen first, then u(o4) = 0.58 and u(o5) = 0.34. The path p4 = (o1, o2, o3, o5) is selected next, and u(o2) = 0.66 and u(o3) = 0.5.

Longest-Path: p1 = (o1, o2, o3, o5, o6) is chosen first, since p1 is longer than p2. So the utilities for outcomes o2 to o5 are computed as in the first possibility shown above for the Random-Path method.

Figure 3.5 A COP-net for computing utilities
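The even spacing of utilities along a chosen path, as in Equation (3.3), can be sketched as follows (illustrative only; it reproduces the p1 numbers from Example 3.3 but is not the thesis implementation):

```python
def assign_path_utilities(path, utilities):
    """Spread utilities evenly along a path whose endpoint utilities are known (Eq. 3.3)."""
    n = len(path) - 1
    u_top, u_bottom = utilities[path[0]], utilities[path[-1]]
    for i, outcome in enumerate(path):
        if outcome not in utilities:
            utilities[outcome] = u_bottom + (n - i) / n * (u_top - u_bottom)
    return utilities

# Path p1 from Example 3.3: o1 and o6 are known, o2, o3 and o5 are interpolated.
u = assign_path_utilities(["o1", "o2", "o3", "o5", "o6"], {"o1": 0.82, "o6": 0.1})
print({k: round(v, 2) for k, v in u.items()})
# {'o1': 0.82, 'o6': 0.1, 'o2': 0.64, 'o3': 0.46, 'o5': 0.28}
```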

Based on COP-nets, utilities of all possible outcomes can be computed, and a vector is formed by these utilities. If there are n partial preference relations, n vectors are formed. The distances between two partial preference relations would then be computed as the Euclidean distance between two vectors (this method will be discussed in the following subsection in detail).

3.3 Distance Measurement

3.3.1 Probabilistic Distance

An important component of clustering algorithms is determining the distance between two data points. Here, each data point is represented by a partial preference relation. As mentioned in Chapter 2, Ha et al. [25] proposed a method to calculate the distance between two partially specified preference structures, named probabilistic distance. Probabilistic distance is defined on the broader class of weak preferences, and can be applied both to complete preference relations and to partial preference relations.

3.3.1.1 Probabilistic Distance on Complete Preference Relations

The probabilistic distance between two complete preference relations is defined as the probability that the two relations will disagree on the preference for a randomly chosen pair of alternatives. The distance is computed by:

d = (2 / (n(n − 1))) × T    (3.4)

where n is the number of alternatives, and T is the number of conflicts over all pairs of alternatives. For example, consider three kinds of fruits: apples (A), bananas (B), and pears (P). The preference relation of user u1 is r1: A ≻ B ≻ P, and the preference relation of user u2 is r2: B ≻ A ≻ P. In this example, the number of alternatives n is 3, and the number of conflicts T over all pairs of alternatives is 1. Hence, the distance between r1 and r2 is (2 / (3 × 2)) × 1 = 1/3.
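A small sketch of Equation (3.4) for complete rankings (illustrative only; rankings are given as ordered lists):

```python
from itertools import combinations

def probabilistic_distance(r1, r2):
    """Equation (3.4): fraction of alternative pairs on which two complete rankings disagree."""
    conflicts = sum(
        (r1.index(x) < r1.index(y)) != (r2.index(x) < r2.index(y))
        for x, y in combinations(r1, 2)
    )
    n = len(r1)
    return 2 / (n * (n - 1)) * conflicts

# The example from the text: r1 = A ≻ B ≻ P, r2 = B ≻ A ≻ P.
print(probabilistic_distance(["A", "B", "P"], ["B", "A", "P"]))  # 1/3 ≈ 0.333
```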

3.3.1.2 Probabilistic Distance on Partial Preference Relations

A weak order is a binary relation on a set of alternatives that is complete and transitive. A weak order extension of a partial preference relation is an ordering of its elements that keeps the existing specified order among alternatives, while alternatives which have no specified relation are randomly ordered. An intuitive way of calculating the distance between two partial preference relations is to repeatedly generate a possible weak order extension of each partial preference relation, and compute the probability of preference conflict for the two extensions over the set of pairs of alternatives. The distance is then computed as the average of these probabilities. However, this process is impractical, because the number of weak order extensions for one partial preference relation can be unreasonably large.

Therefore, a simulation technique, which includes an iterative method, is used to estimate the probabilistic distance between partial relations. It consists of the following three steps:

1. randomly generate two weak extensions that are consistent with the two partial preference structures respectively;

2. compute the probability of preference conflict in these two structures;

3. iterate steps 1 and 2, and compute the average probability of conflict, until the standard error [16] (the standard deviation of the difference between the estimated values and the true values) of the average probability falls below some threshold. This average probability is then taken as the probabilistic distance between the two partial preference structures.

The following example shows how to calculate probabilistic distance between two partial preference relations.

Example 3.4 Consider the two partial preference relations in Figure 3.6 to represent two different users' preferences:

1 Standard error is the standard deviation of the difference between the measured or estimated values and the true values.

Figure 3.6 Two partial preference relations

In the first step, two weak order extensions, adcb and abcd, are randomly generated. The probability of preference conflict can then be computed: for adcb, the 6 pairs are ad, ac, ab, dc, db, and cb; for abcd, the 6 pairs are ab, ac, ad, bc, bd, and cd; the two extensions conflict on three of these pairs, so the probability of preference conflict is 3/6 = 0.5. Two more weak order extensions are then generated, acdb and abcd, and their probability of preference conflict is computed as 2/6 = 0.33, giving an average of 0.4167. The process of calculating the average is iterated until the standard error is sufficiently low. Ultimately, the distance between the above two preference relations is 0.4257.

3.3.2 Distance Computation with COP-nets

Since clustering in this thesis is performed dynamically, a fast technique is needed for calculating the distance between two partial preference relations. We employ

a technique from research related to preference networks (COPN) that produces a vector of utility estimates over the set of outcomes (as discussed in Section 3.2.2). These vectors are used to denote the co-ordinates of the preference relations in n-dimensional space, allowing the Euclidean distance between any two of them to be computed. Assume that there are two points P = (p1, p2, ..., pn) and Q = (q1, q2, ..., qn). The distance between them is:

d_PQ = √( Σ_{i=1..n} (p_i - q_i)² )    (3.5)

where d_PQ refers to the Euclidean distance between the points P and Q, and p_i and q_i are respectively the i-th co-ordinates of points P and Q.
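As a small illustration, the Euclidean distance of Equation (3.5) between two utility vectors can be computed as in the following Java fragment (a sketch; the class and method names are not from the thesis code). Applied to the two vectors of Example 3.5 below, it returns approximately 0.37.

class EuclideanSketch {

    // Equation (3.5): the Euclidean distance between two vectors of equal length.
    static double euclideanDistance(double[] p, double[] q) {
        double sum = 0;
        for (int i = 0; i < p.length; i++) {
            double diff = p[i] - q[i];
            sum += diff * diff;          // accumulate (p_i - q_i)^2
        }
        return Math.sqrt(sum);
    }
}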

This method is much faster than the brute-force probabilistic distance technique from the literature. A flow chart for calculating distances is shown in Figure 3.7.

(Flow chart: each of the two partial preference relations is converted by a COPN into utilities for each outcome; these utilities form a vector, and the Euclidean distance between the two vectors is computed.)

Figure 3.7 Flow chart for calculating distances

Example 3.5 Suppose that there is a set of attributes {A, B, C}, that each attribute has binary values (a and ā are the values for attribute A, b and b̄ for B, c and c̄ for C), and that there are the following two partial preference relations: P1 with a set of unconditional preferences and P2 with two conditional preferences:

P1: a ≻ ā

b ≻ b̄

c̄ ≻ c

a ≻ b

P2: a ≻ ā

b ≻ b̄

c ≻ c̄

a ≻ c

a: b ≻ c

ā: c ≻ b

The first step is to construct the COP-nets for P1 and P2 respectively, as shown in Figure 3.8.

The second step is to estimate the utility for each possible outcome. There are three attributes in this example, and each attribute has two values, so the number of possible outcomes is 2³ = 8. Assume that the utility of the best outcome is u(∅) = 1, and that the utility of the worst outcome is u(abc) = 0. According to the Longest-path COPN Utility Function [9], the utilities for P1's outcomes are u1(a) = 0.75, u1(b) = 0.5, u1(ab) = 0.25, u1(c) = 0.75, u1(ac) = 0.5, and u1(bc) = 0.25. The vector of P1, denoted by V_P1, is formed by its outcomes' utilities as (1, 0.75, 0.5, 0.25, 0.75, 0.5, 0.25, 0).

Figure 3.8 The COP-nets for P1 and P2

The utilities for P2's outcomes are u2(a) = 0.83, u2(b) = 0.83, u2(ab) = 0.33, u2(c) = 0.67, u2(ac) = 0.5, and u2(bc) = 0.17. The vector of P2, denoted by V_P2, is formed by its outcomes' utilities as (1, 0.83, 0.83, 0.33, 0.67, 0.5, 0.17, 0). The third step is to compute the Euclidean distance between V_P1 and V_P2.

The distance is D_P1P2 = √((1-1)² + (0.75-0.83)² + ... + (0.25-0.17)² + (0-0)²) = 0.37. So the distance between these two partial preference relations is 0.37.

3.4 Y-means Clustering Method for Partial Preference Relations

Distances between pairs of partial preference relations are calculated by the COP-net method. Once distances between all pairs of relations are computed, Y-means [14] clustering is performed. Y-means clustering is similar to the K-means algorithm, with one major difference. The main drawback of the K-means algorithm is that the number k of clusters must be specified in advance, and the performance of K-means depends greatly on the choice of k. However, it is hard to find an optimal value for k, especially when the distribution of the data set is unknown. The Y-means clustering method overcomes this problem well: even though an initial value for the number of clusters must still be chosen, this value can be adjusted during clustering. The general Y-means method is described as follows. A data point is said to be an outlier in a cluster if it is very different from most other data points in the same cluster, where "very different" is generally measured by the Euclidean distance. Hence, the definition of an outlier used in this thesis is the farthest data point from the centroid of the cluster, provided that the distance between this data point and the centroid is greater than a threshold.

The Y-means algorithm is composed of the following four steps:

1. Given an initial value for k and an initial set of k centroids, run the K-means algorithm, and obtain k clusters;

2. Determine whether there is an outlier in each cluster. If there is, this outlier is used as a new center for a new cluster. Re-run the K-means algorithm, and obtain k+1 clusters;

3. Iterate step 2 until there are no more outliers;

4. Because of the splitting done in step 2, there may be too many small clusters.

The distances between the centers of each pair of clusters are computed, and compared with a threshold. If the distances are less than the threshold, then the two clusters are merged. The new center is the midpoint of the two old ones.

Partial preference relations are represented as vectors formed by the utilities of all possible outcomes. To perform Y-means clustering on a set of these vectors, the initial value of k is randomly chosen from 2 to n (where n denotes the number of vectors), and k vectors are randomly chosen as the centroids. Distances between every partial relation and each centroid are calculated by the COP-net method, and according to these distances, each relation is clustered with the closest centroid. The centroid of each cluster is then re-calculated as the average of all vectors in the cluster. This process is iterated until the centroids do not change. Up to this point, the Y-means method works exactly like the K-means algorithm, leaving k clusters. Often, in each cluster, there will be one or more partial relations that are far away from the centroid.

The Y-means method checks to see whether there are outliers in each cluster.

Chebyshev's theorem says that for any data distribution, at least (1 - 1/m²) of the objects in any data set will be within m standard deviations of the mean, where m is any value greater than one [22]. An empirical rule [37] indicates that for any normal distribution, about 99.99994% of the objects will lie within five standard deviations of the mean.

Therefore, the threshold r used to judge whether a partial relation is an outlier is:

r = 5 × σ    (3.6)

where σ is the standard deviation of the distances from the centroid within the cluster. If a partial relation is the farthest relation from the centroid in its cluster, and the distance between it and the centroid is greater than the threshold r, this partial relation is an outlier in the cluster. The Y-means method splits the cluster by removing the outlier from the current cluster, and the outlier is used as a new centroid for a new cluster. All the partial relations are then clustered again into k+1 clusters. The process of finding outliers and re-clustering is iterated, and stops when no outlier is left. As a result, clusters may be split too finely, so clusters whose centers are sufficiently close should be merged. By Chebyshev's theorem, at least 50% of the objects lie within m = 1.414 standard deviations of the mean. Thus, the threshold tm used for merging clusters is calculated as:

tm = m × (σi + σj)    (3.7)

where σi is the standard deviation of the i-th cluster and σj is the standard deviation of the j-th cluster. Two strategies can be used to combine clusters whose centers are closer than this threshold: merging, in which a new centroid is computed as the midpoint of the two old centers, and linking, in which the two clusters are simply joined.

With linking, the new big cluster has two centers which are the two old centers, and there is no new centroid to be assigned. In this thesis, the linking method is employed, because clusters can be in arbitrary shapes as they are in the real world. The process of merging clusters is iterated, and it stops when all clusters that are close enough are merged together. The value of the initial k is adjusted by splitting and merging clusters.
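As a rough illustration (not the thesis code), the two thresholds can be computed from per-cluster statistics as in the following Java sketch; the method names are invented, and the merge threshold follows the reconstructed form of Equation (3.7).

class ThresholdSketch {

    // Outlier threshold (Equation 3.6): five standard deviations of the
    // distances between cluster members and the centroid.
    static double outlierThreshold(double[] distancesToCentroid) {
        double mean = 0;
        for (double d : distancesToCentroid) mean += d;
        mean /= distancesToCentroid.length;
        double var = 0;
        for (double d : distancesToCentroid) var += (d - mean) * (d - mean);
        return 5.0 * Math.sqrt(var / distancesToCentroid.length);
    }

    // Merge threshold for a pair of clusters, using m = 1.414 as suggested by
    // Chebyshev's theorem; the exact form of Equation (3.7) is reconstructed.
    static double mergeThreshold(double sigmaI, double sigmaJ) {
        return 1.414 * (sigmaI + sigmaJ);
    }
}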

All in all, this chapter first introduced the concept of preference relations. It then concentrated on COP-nets, and a new method for calculating the distance between two partial preference relations, based on COP-net research, was proposed. Once distances among partial relations are obtained, the Y-means clustering method, which is an improvement of the K-means algorithm, is applied to partition the partial relations into clusters. An existing method for calculating distances between partial relations, the probabilistic distance, was also presented.

Chapter 4 Inferring Preferences

The goal of clustering in this thesis is to help in the elicitation of users' preferences. Initially, all previously encountered users' preference relations are partitioned into clusters. When a new user is encountered, he/she is classified into one of these existing clusters. By investigating the users' preferences in this particular cluster, general conclusions can be drawn. For example, consider a user for whom it is not known whether red or blue is preferred. Assume that he/she is assigned to cluster C1, and that in cluster C1, 80% of users prefer red over blue. Since the user agrees on many preferences shared by members of this group, it can be concluded that he/she has a much higher probability of preferring red over blue. This is only a simple example, but it gives a basic idea of how users' preferences can be estimated by performing clustering.

Different clustering methods can cause different results over the same data points. This thesis tests various strategies for clustering partial preference relations and for inferring new preferences based on the set of clusters obtained. As mentioned in

Chapter 3, the Y-means clustering algorithm, which is an improvement of the K-means clustering method that does not specify the number of clusters in advance, is used in this thesis. This chapter describes three different applications of the Y-means algorithm for clustering partial preference relations, and demonstrates the technique used for estimating preferences for each application. These three methods are summarized in

Table 4.1.

Method for Inferring Preferences    Methodology

Direct Inference Based on Clustering: Partial relations are partitioned by the Y-means clustering method into clusters, and a new user is classified into one of the clusters. Preferences are then inferred.

Pre-processing Inference Based on Clustering: Partial relations are partitioned by a specified criterion into clusters, and a new user is classified into one of the clusters. Partial relations in the particular cluster to which the user belongs are then further partitioned by the Y-means clustering method, and preferences are inferred. The specified criterion relates to the preferences to be inferred for the new user.

Post-processing Inference Based on Clustering: Partial relations are partitioned by the Y-means clustering method into clusters, and a new user is classified into one of the clusters. Partial relations in the cluster to which the user belongs are partitioned by a specified criterion, and the user is classified again into one of the sub-clusters. Then, preferences are inferred. The specified criterion relates to the preferences to be inferred for the new user.

Table 4.1 Three methods for inferring preferences

4.1 Direct Inference Based on Clustering

The Direct Inference Based on Clustering method is the most basic of the three presented in this thesis. It consists of only one clustering process, during which all partial preference relations are partitioned by the Y-means clustering method. Assume that the partial relations are grouped into n clusters. When a new user u is encountered, suppose the agent would like to know the preference between outcome oi and outcome oj. There are two main steps used to obtain this unknown preference. The first step is to determine to which cluster u belongs. Based on u's known partial preferences, a COP-net can be constructed. As mentioned in Chapter 3, u's utility estimates for all outcomes are calculated from the COP-net, and these utilities compose a vector. In order to classify u into one of the clusters, the Euclidean distances between u and the centers of the n clusters are computed. The user u belongs to the cluster for which the distance between u and the center is the smallest. The second step is to deduce preferences. Assume u is assigned to the m-th cluster. The average utilities of all possible outcomes, over all partial preference relations in the m-th cluster, can be calculated using the COP-net method. The confidence interval2 [20] for the average utility of outcome oi (denoted by ciu(oi)) over all preference relations in the m-th cluster is compared with the confidence interval for the average utility of outcome oj (denoted by ciu(oj)). If the average utility for oi is greater than the average utility for oj and there is no overlap between ciu(oi) and ciu(oj), then it is predicted that oi ≻ oj; conversely, if the average utility for oi is less than the average utility for oj and there is no overlap between ciu(oi) and ciu(oj), then it is predicted that oj ≻ oi; if there is an overlap between ciu(oi) and ciu(oj), then the preference between oi and oj is deemed to be unknown. Figure 4.1 is the flow chart for the Direct Inference Based on Clustering method.

2 A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

In summary, the key idea of the Direct Inference Based on Clustering method is to estimate the average utilities of the two outcomes within the cluster to which the user belongs, and to obtain confidence intervals for these averages. By comparing the two confidence intervals, the preference between the two outcomes is predicted.
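A minimal Java sketch of this comparison step is shown below (illustrative only, not the thesis implementation; the Interval type and method names are assumptions).

class PreferencePrediction {

    // A confidence interval with a lower and an upper bound.
    record Interval(double lower, double upper) {
        boolean overlaps(Interval other) {
            return lower <= other.upper && other.lower <= upper;
        }
    }

    // Predict the preference between outcomes oi and oj from the confidence
    // intervals of their average utilities within the user's cluster.
    static String predict(Interval ciOi, Interval ciOj) {
        if (ciOi.overlaps(ciOj)) return "unknown";   // overlapping intervals: no prediction
        return ciOi.lower() > ciOj.upper() ? "oi > oj" : "oj > oi";
    }
}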

(Flow chart: partial preference relations are clustered by Y-means; the new user is classified into a cluster; the confidence intervals for the average utilities of the two outcomes in that cluster are calculated and compared, yielding the result.)

Figure 4.1 Flow Chart of the Direct Inference Based on Clustering

4.2 Pre-processing Inference Based on Clustering

The Pre-processing Inference Based on Clustering method is more complicated than the Direct Inference Based on Clustering method, and is composed of two clustering phases: cursory clustering and Y-means clustering. At first, all partial preference relations are cursorily grouped into three large clusters according to the possible outcomes for the desired preference to be predicted. Consider the unknown relation between two outcomes oi and oj of the user u that is needed by an agent. The criterion here is specified as the relation between oi and oj. For each preference relation in the database, there are three possible relations between oi and oj: oi ≻ oj, oj ≻ oi, or neither oi ≻ oj nor oj ≻ oi (i.e. the relation between oi and oj is unknown).

Partial relations are partitioned by the criterion as follows: if a partial relation includes oi ≻ oj, it is clustered into the first group; if a partial relation includes oj ≻ oi, it is clustered into the second group; and if a partial relation includes neither oi ≻ oj nor oj ≻ oi, it is clustered into the third group. This process is performed over the partial relations iteratively, and as a result, three clusters are constructed. Each of these three clusters is then partitioned by the Y-means clustering method into sub-clusters. Once the sub-clusters are formed, the user u is classified into one of the sub-clusters. If he/she is classified into a sub-cluster that belongs to the first super-cluster, the relation between oi and oj is predicted to be oi ≻ oj. Similarly, if he/she is classified into a sub-cluster that belongs to the second super-cluster, the relation between oi and oj is predicted to be oj ≻ oi. However, if he/she is classified into a sub-cluster that belongs to the third super-cluster, the relation between oi and oj is still unknown. In this case, there are two possible courses of action: 1) concede that the preference cannot be determined, or 2) perform a test on this sub-cluster similar to that done by the Direct Inference method discussed in Section 4.1. Thus, the mean estimated utilities of oi and oj in this cluster are analyzed to see if there is sufficient evidence to draw a conclusion on which is preferred. Figure 4.2 depicts the flow chart for the Pre-processing Inference Based on Clustering method.

(Flow chart: partial preference relations are cursorily clustered into three clusters by the specified criterion; each cluster is partitioned by Y-means into sub-clusters; the new user is classified into one of the sub-clusters, and the result depends on whether that sub-cluster belongs to the first, second, or third cluster.)

Figure 4.2 Flow Chart of the Pre-processing Inference Based on Clustering method

4.3 Post-processing Inference Based on Clustering

The Post-processing Inference Based on Clustering method is similar to the Pre-processing Inference Based on Clustering method; however, it works in a reverse fashion. It includes three clustering phases: the Y-means clustering method, cursory clustering, and the Y-means clustering method again. In the first step, all preference relations are partitioned into n clusters by the Y-means clustering method. A new user u is encountered, and, as before, we assume that u's unknown relation between two outcomes oi and oj is needed by an agent. The user is classified into one of the clusters formed in the previous stage by comparing the distances between u and each cluster's center. Assume that the user u belongs to the m-th cluster. In the second step, partial preference relations in the m-th cluster are cursorily clustered into three sub-clusters by a specified criterion. The criterion is designated as the relation between oi and oj. As mentioned above, there are three possible relations between them: oi ≻ oj, oj ≻ oi, or neither oi ≻ oj nor oj ≻ oi. Partial relations are grouped by the criterion: if partial relations include oi ≻ oj, they are partitioned into the first sub-cluster; if partial relations include oj ≻ oi, they are partitioned into the second sub-cluster; otherwise, partial relations are partitioned into the third sub-cluster. When the three sub-clusters are built, each sub-cluster is partitioned again by the Y-means clustering method into smaller sub-clusters, and u is classified again. Depending on the results of clustering, the conclusion about the relation between oi and oj is made. If u is classified into a small cluster that belongs to the first sub-cluster, the relation between oi and oj is oi ≻ oj. If u is classified into a small cluster that belongs to the second sub-cluster, the relation between oi and oj is oj ≻ oi. Finally, if u is classified into a small cluster that belongs to the third sub-cluster, the relation between oi and oj is still unknown. As before, there are two possible courses of action: 1) concede that the preference cannot be determined, or 2) perform a test on this sub-cluster similar to that done by the Direct Inference method discussed in Section 4.1. Thus, the mean estimated utilities of oi and oj in this cluster are analyzed to see if there is sufficient evidence to draw a conclusion on which is preferred. Figure 4.3 depicts the flow chart for the Post-processing Inference Based on Clustering method.

4.4 A Simple Example

The three methods of inferring preferences based on clustering use different orders of clustering, and the resulting inference may not be the same in each case. In this section, Example 4.1 shows the results of the three methods, based on the same set of partial preference relations for one particular user.

Example 4.1 Given the 10 users shown in Table 4.3 and the new user shown in Table 4.2, each user has a partial preference relation specified over outcomes with 4 attributes, A, B, C, D, where each attribute has two values (a and ā for A, b and b̄ for B, etc.). The new user's relation over outcomes b and d is to be determined. The preferences below for attribute values use the ceteris paribus assumption.

(Flow chart: partial preference relations are clustered by Y-means; the new user is classified into a cluster; the relations in that cluster are cursorily clustered into three sub-clusters by the specified criterion, each sub-cluster is partitioned again by Y-means, and the new user is classified once more; the result depends on which sub-cluster the user falls into.)

Figure 4.3 Flow Chart of the Post-processing Inference Based on Clustering Method

New user A set of preferences

u    a ≻ c, c ≻ d

Table 4.2 A new user and his preferences

User    A set of preferences

u1    b ≻ d, b ≻ a

u2    c ≻ a, d ≻ b

u3    b ≻ d, a ≻ c

u4    a ≻ b, a: c ≻ b, ā: b ≻ c

u5    b ≻ a, a: c ≻ d, ā: d ≻ c

u6    a ≻ c, d ≻ b

u7    d ≻ a, c ≻ b, b ≻ d

u8    b ≻ a, b ≻ c, d ≻ b

u9    a ≻ b, b: c ≻ a, b̄: a ≻ c

u10   c ≻ d, c: b ≻ a, c̄: a ≻ b

Table 4.3 Users and their preferences

In the Direct Inference Based on Clustering method, the first step is to build

COP-nets for each user and for the new user. Here, the COP-net of u1 is built and illustrated in Figure 4.4. The same procedure is followed for the other users as well.

Figure 4.4 The COP-net for u1

The second step is to estimate utilities for each user's outcomes by the Longest-path method; these utilities form a vector. All users' vectors are shown in Table 4.4, and the vector of the new user is shown in Table 4.5. Each vector in the following discussion follows this format: ui => (∅, a, b, ab, c, ac, bc, abc, d, ad, bd, abd, cd, acd, bcd, abcd).

User    Vector

u1    (1, 0.9, 0.86, 0.68, 0.4, 0.24, 0.32, 0.16, 0.73, 0.45, 0.59, 0.23, 0.24, 0.08, 0.16, 0)

u2    (1, 0.9, 0.29, 0.17, 0.4, 0.56, 0.23, 0.11, 0.34, 0.23, 0.17, 0.06, 0.29, 0.17, 0.11, 0)

u3    (1, 0.9, 0.83, 0.7, 0.4, 0.3, 0.3, 0.2, 0.65, 0.5, 0.47, 0.3, 0.2, 0.1, 0.1, 0)

u4    (1, 0.9, 0.33, 0.27, 0.4, 0.2, 0.13, 0.07, 0.79, 0.58, 0.27, 0.2, 0.33, 0.13, 0.07, 0)

u5    (1, 0.9, 0.95, 0.52, 0.4, 0.13, 0.33, 0.07, 0.32, 0.2, 0.27, 0.13, 0.24, 0.07, 0.17, 0)

u6    (1, 0.9, 0.65, 0.5, 0.4, 0.3, 0.2, 0.1, 0.83, 0.7, 0.47, 0.3, 0.3, 0.2, 0.1, 0)

u7    (1, 0.9, 0.36, 0.18, 0.4, 0.22, 0.31, 0.13, 0.63, 0.13, 0.22, 0.04, 0.27, 0.09, 0.18, 0)

u8    (1, 0.9, 0.85, 0.52, 0.4, 0.2, 0.27, 0.07, 0.93, 0.7, 0.78, 0.33, 0.33, 0.13, 0.2, 0)

u9    (1, 0.9, 0.72, 0.54, 0.4, 0.63, 0.36, 0.18, 0.67, 0.27, 0.2, 0.13, 0.33, 0.17, 0.07, 0)

u10   (1, 0.9, 0.95, 0.54, 0.4, 0.34, 0.29, 0.19, 0.3, 0.17, 0.23, 0.09, 0.21, 0.11, 0.06, 0)

Table 4.4 The vectors for each user

New user Vector

u (1, 0.9, 0.8, 0.61, 0.4, 0.32, 0.32, 0.24, 0.32, 0.24, 0.24, 0.16, 0.16, 0.08, 0.08, 0)

Table 4.5 The vector for the new user

The ten users are partitioned into 2 clusters by the Y-means clustering method in the third step, as shown in Table 4.6.

Cluster 1: u1, u3, u5, u6, u8, u9, u10

Cluster 2: u2, u4, u7

Table 4.6 Clusters formed by the Y-means clustering method

The computed centroid for each cluster is shown in Table 4.7.

Cluster Center

Cluster 1    (1, 0.9, 0.83, 0.57, 0.4, 0.31, 0.3, 0.14, 0.63, 0.43, 0.43, 0.22, 0.26, 0.12, 0.12, 0)

Cluster 2    (1, 0.9, 0.33, 0.21, 0.4, 0.33, 0.22, 0.1, 0.58, 0.31, 0.22, 0.09, 0.3, 0.13, 0.12, 0)

Table 4.7 The centers for clusters

According to the Euclidean distances between u and these two centers, u belongs to the cluster with the closest center. Thus, the new user u is classified in Cluster 1.

When confidence intervals of 95% are used, the lower bound of the confidence interval for the average utility of outcome b (denoted by lower_u(b)) in Cluster 1 is 0.7535, and the upper bound of the confidence interval for the average utility of outcome b (denoted by upper_u(b)) is 0.9064. The lower bound of the confidence interval for the average utility of outcome d (denoted by lower_u(d)) is 0.5564, and the upper bound of the confidence interval for the average utility of outcome d (denoted by upper_u(d)) is 0.7093. Clearly, there is no overlap between these two confidence intervals, and the mean utility for b is greater than the mean utility for d, so the preference between b and d for the new user is predicted as b ≻ d.
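For illustration, a 95% confidence interval for such an average utility could be computed from the per-user utilities in the cluster as in the following Java sketch, assuming a normal approximation with z = 1.96; the thesis does not specify the exact interval formula used, so this is only an approximation of the calculation.

class ConfidenceIntervalSketch {

    // 95% confidence interval (normal approximation, z = 1.96) for the mean of
    // the utilities that one outcome receives across the users in a cluster.
    static double[] confidenceInterval95(double[] utilities) {
        int n = utilities.length;
        double mean = 0;
        for (double u : utilities) mean += u;
        mean /= n;
        double var = 0;
        for (double u : utilities) var += (u - mean) * (u - mean);
        double stdErr = Math.sqrt(var / (n - 1)) / Math.sqrt(n);   // s / sqrt(n)
        return new double[] { mean - 1.96 * stdErr, mean + 1.96 * stdErr };
    }
}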

In the Pre-processing Inference Based on Clustering method, the users are partitioned into 3 clusters by the relation between b and d in the first step, and these three clusters are shown in Table 4.8.

Cluster 1 (b ≻ d): u1, u3, u7

Cluster 2 (d ≻ b): u2, u6, u8

Cluster 3 (neither b ≻ d nor d ≻ b): u4, u5, u9, u10

Table 4.8 Clusters formed by a specified criterion

Cluster 1 is partitioned again by the Y-means clustering method into two sub-clusters, shown in Table 4.9.

Sub-cluster 1: u1, u3

Sub-cluster 2: u7

Table 4.9 Sub-clusters from cluster 1

Cluster 2 is partitioned again by the Y-means clustering method into two sub-clusters, shown in Table 4.10.

Sub-cluster 3: u2

Sub-cluster 4: u6, u8

Table 4.10 Sub-clusters from cluster 2

Cluster 3 is partitioned again by the Y-means clustering method into two sub-clusters, shown in Table 4.11.

Sub-cluster 5: u5, u9, u10

Sub-cluster 6: u4

Table 4.11 Sub-clusters from cluster 3

The computed centroid for each sub-cluster is shown in Table 4.12.

Cluster Center

Sub-cluster 1 (1, 0.9, 0.85, 0.69, 0.4, 0.29, 0.3, 0.19, 0.7, 0.47, 0.55, 0.25, 0.21, 0.1, 0.11, 0)

Sub-cluster 2 (1, 0.9, 0.36, 0.18, 0.4, 0.22, 0.31, 0.13, 0.63, 0.13, 0.2, 0.04, 0.3, 0.09, 0.18, 0)

Sub-cluster 3 (1, 0.9, 0.3, 0.17, 0.4, 0.56, 0.23, 0.1, 0.34, 0.23, 0.17, 0.06, 0.29, 0.17, 0.11, 0)

Sub-cluster 4 (1, 0.9, 0.75, 0.5, 0.4, 0.25, 0.24, 0.09, 0.88, 0.7, 0.63, 0.32, 0.32, 0.17, 0.15, 0)

Sub-cluster 5 (1, 0.9, 0.87, 0.53, 0.4, 0.37, 0.33, 0.15, 0.43, 0.2, 0.23, 0.12, 0.26, 0.12, 0.1, 0)

Sub-cluster 6 (1, 0.9, 0.33, 0.27, 0.4, 0.2, 0.13, 0.07, 0.79, 0.58, 0.27, 0.2, 0.33, 0.13, 0.07, 0)

Table 4.12 The centers for each sub-cluster

The user u is classified into one of the above six sub-clusters. According to the distances between the new user and the six centers, he/she is assigned to Sub-cluster 5.

Hence, the relation between b and d is deemed to be unknown.

In the Post-processing Inference Based on Clustering method, in the first phase, the users are partitioned into two clusters as shown in Table 4.6. The new user u is

classified in Cluster 1, so Cluster 1 is cursorily clustered into 3 sub-clusters by the relation between b and d, as shown in Table 4.13.

Sub-cluster 1 (b ≻ d): u1, u3

Sub-cluster 2 (d ≻ b): u6, u8

Sub-cluster 3 (neither b ≻ d nor d ≻ b): u5, u9, u10

Table 4.13 Sub-clusters formed by a specified criterion

Based on the algorithm of the Post-processing Inference Based on Clustering, each sub-cluster should be partitioned again by the Y-means clustering method. They are described in the following three tables.

Sub-cluster 1 is partitioned again by the Y-means clustering method into two clusters, shown in Table 4.14.

Cluster 1: u1

Cluster 2: u3

Table 4.14 Small clusters from Sub-cluster 1

Sub-cluster 2 is partitioned again by the Y-means clustering method into two clusters, shown in Table 4.15.

Cluster 3: u6

Cluster 4: u8

Table 4.15 Small clusters from Sub-cluster 2

Sub-cluster 3 is partitioned again by the Y-means clustering method into two clusters, shown in Table 4.16.

Cluster 5: u5, u10

Cluster 6: u9

Table 4.16 Small clusters from Sub-cluster 3

The computed centroids for these six clusters are listed in Table 4.17.

Cluster Center

Cluster 1 (1, 0.9, 0.86, 0.68, 0.4, 0.24, 0.32, 0.16, 0.73, 0.45, 0.59, 0.23, 0.24, 0.08, 0.16, 0)

Cluster 2 (1, 0.9, 0.83, 0.7, 0.4, 0.3, 0.3, 0.2, 0.65, 0.5, 0.47, 0.3, 0.2, 0.1, 0.1, 0)

Cluster 3 (1, 0.9, 0.65, 0.5, 0.4, 0.3, 0.2, 0.1, 0.83, 0.7, 0.47, 0.3, 0.3, 0.2, 0.1, 0)

Cluster 4 (1, 0.9, 0.85, 0.52, 0.4, 0.2, 0.27, 0.07, 0.93, 0.7, 0.78, 0.33, 0.33, 0.13, 0.2, 0)

Cluster 5 (1, 0.9, 0.95, 0.53, 0.4, 0.24, 0.31, 0.13, 0.31, 0.19, 0.25, 0.11, 0.22, 0.09, 0.12, 0)

Cluster 6 (1, 0.9, 0.72, 0.54, 0.4, 0.63, 0.36, 0.18, 0.67, 0.27, 0.2, 0.13, 0.33, 0.17, 0.07, 0)

Table 4.17 The centers for clusters

According to the Euclidean distance between u and these six centers, u belongs to the cluster with the closest center. Thus, the new user u is classified in Cluster 5.

Cluster 5 belongs to Sub-cluster 3, so the relation between b and d is deemed to be unknown.

Note that Example 4.1 is only an illustrative example, and some experimental results will be shown for more complicated test cases in Chapter 6.

Chapter 5 Implementation

This chapter primarily describes the details of how to implement the three clustering methods. As explained in Chapter 4, in order to infer preferences, there are three methods: Direct Inference Based on Clustering, Pre-processing Inference Based on

Clustering, and Post-processing Inference Based on Clustering. In this chapter, three

Java projects representing the above three methods will be described and discussed.

These three projects are named Direct_inference, Pre-processing_inference, and Post-processing_inference.

5.1 Input and Output

The input for these projects is a set of text files. Each text file represents one user's preferences. In other words, if there are 100 users, then there are 100 text files. In each text file, there are a number of preference rules which stand for a particular user's preferences over some possible outcomes. While, in theory, the work presented in this thesis can be applied when attributes can take any number of values, the actual implementation only handles the two-value case. Recall that, in order to express preferences simply and conveniently, all "bar" values of each attribute are typically omitted. To demonstrate how a preference file would be constructed in a real situation, consider an example scenario in on-line privacy. Here, a user has preferences for revealing his name, telephone number, email address, and credit card number. Since he has concerns that his information will be shared or used improperly, he has the following preferences: he prefers giving away nothing over giving away any item; he prefers

giving away his name over his email address; he prefers giving away his telephone number over his credit card number; and if his name is given, he prefers giving away his telephone number over his email address. Let N denote his name, T his telephone number, E his email address, and C his credit card number. N has two values: n (representing the name being given away) or n̄ (representing the name not being given away); similarly, t and t̄ for T, e and ē for E, and c and c̄ for C. In the text file representing this user, the above information is translated into preference rules as follows:

n̄ ≻ n

t̄ ≻ t

ē ≻ e

c̄ ≻ c

n ≻ e

t ≻ c

n: t ≻ e

n̄: e ≻ t

After one of the three methods predicts an unknown preference, it returns a string that contains the relation between the two outcomes for which the desired preference is predicted. For example, if the relation between outcomes oi and oj is to be inferred, the output is oi > oj, oi < oj, or oi ? oj, where oi ? oj represents that the preference between oi and oj is unknown.

5.2 Algorithm for Direct Inference Based on Clustering

The detailed description of the Direct Inference Based on Clustering method is given in the previous chapters, and it is implemented by the Direct_inference project. In this project, there are 4 main classes: UtilityCalculation, Clustering, Classification, and Confidence_interval. These 4 classes are explained in this section.

The UtilityCalculation class contains a method Calculation_utils(), and is used to compute utilities of possible outcomes. The inputs for the method Calculation_utils() are a COP-net and a set of outcomes with given utilities, and the output is an array that contains known or estimated utilities of all possible outcomes. By implementing the method Calculation_utils(), outcome utilities for each user are computed, and all users' utilities are outputted to a text file. As mentioned before, a vector is composed of each user's utilities.

The Clustering class includes the methods assignVector(), splitCluster(), and linkCluster(). The Clustering class is used to cluster the vectors that are formed by the utilities of users' outcomes. The method assignVector() is used to assign vectors to their closest centers one by one. The inputs for the method assignVector() are the directory of a text file, which stores the utilities of all users' outcomes, and the initial number of clusters, which is specified in advance. The text file under this designated directory is created in the previous step, and the initial number of clusters is randomly chosen from an interval that is greater than 1 and smaller than the total number of vectors. Based on the method assignVector(), vectors are grouped into the initial number of clusters. The method splitCluster() is used to split large clusters into several small clusters, and is an iterative process. The first step is to get the farthest vector using the method getFpt(), and compare the distance between the farthest vector and the center with a threshold. If the distance is greater than the threshold, the farthest vector (referred to as an outlier) is removed from the current cluster and added to the set of centers as a new center. Thus, the current number of centers increases by 1. The method assignVector() is employed again, and vectors are assigned based on the new set of centers. The above process is iterated until there are no more vectors whose distances from their own centers are greater than the threshold. A do-while statement is adopted here, and the condition used to terminate the loop is a Boolean variable noChange. In other words, if another outlier exists, noChange is set to false, and the do-while statement is executed again. Otherwise, noChange is set to true, and the loop is exited. The output of the method splitCluster() is a vector that contains the centers of the existing clusters.
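A simplified, self-contained Java sketch of this split loop is shown below; it is not the actual Direct_inference code, and it folds the assignment, the outlier test, and the re-assignment into one method for brevity.

import java.util.*;

class SplitSketch {

    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Any point farther than 5 sigma from its cluster's center becomes a new
    // center, and all points are re-assigned; repeat until no outlier remains.
    static List<double[]> split(List<double[]> vectors, List<double[]> centers) {
        boolean changed = true;
        while (changed) {
            changed = false;
            // assign every vector to its closest center
            Map<Integer, List<double[]>> clusters = new HashMap<>();
            for (double[] v : vectors) {
                int best = 0;
                for (int c = 1; c < centers.size(); c++)
                    if (distance(v, centers.get(c)) < distance(v, centers.get(best))) best = c;
                clusters.computeIfAbsent(best, k -> new ArrayList<>()).add(v);
            }
            // look for an outlier: the farthest point beyond 5 sigma of the distances
            for (Map.Entry<Integer, List<double[]>> e : clusters.entrySet()) {
                double[] center = centers.get(e.getKey());
                List<double[]> members = e.getValue();
                double mean = 0, var = 0, maxDist = -1;
                double[] farthest = null;
                for (double[] v : members) {
                    double d = distance(v, center);
                    mean += d;
                    if (d > maxDist) { maxDist = d; farthest = v; }
                }
                mean /= members.size();
                for (double[] v : members) {
                    double d = distance(v, center) - mean;
                    var += d * d;
                }
                double threshold = 5.0 * Math.sqrt(var / members.size());
                if (farthest != null && members.size() > 1 && maxDist > threshold) {
                    centers.add(farthest);   // the outlier seeds a new cluster
                    changed = true;
                    break;                   // re-assign with the enlarged center set
                }
            }
        }
        return centers;
    }
}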

The method linkCluster() is used to merge small clusters. After the distances between all pairs of centers are computed, each distance is compared with a threshold to determine whether the two clusters should be merged. If the distance is less than the threshold, the two centers remain in the Vector centerSet, and the vectors in these clusters are placed together in a Vector groupNoList. If the distance is greater than the threshold, the centers also remain in centerSet, and the vectors in these clusters are added to groupNoList in their separate clusters. The Vector centerSet is saved in a text file, and the output of the Clustering class is the Vector groupNoList.

The Classification class has a method classify(), and is used to classify a new user into one of the existing clusters. The method classify() has two inputs, each of which is a directory. A text file under one directory stores the vector formed by the utilities of the new user's outcomes, and a text file under the other directory saves the centers from the previous step. The distances between the vector and the centers are computed, and the new user belongs to the cluster whose center is closest to the vector. The output of this method is an integer clusterID, which represents the cluster to which the new user belongs.
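A sketch of the nearest-center classification performed by this step might look as follows (illustrative only; the actual classify() method reads its inputs from text files as described above).

class ClassifySketch {

    // Return the index of the center closest to the new user's utility vector.
    static int classify(double[] userVector, double[][] centers) {
        int clusterID = 0;
        double best = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double sum = 0;
            for (int i = 0; i < userVector.length; i++) {
                double diff = userVector[i] - centers[c][i];
                sum += diff * diff;
            }
            double dist = Math.sqrt(sum);
            if (dist < best) { best = dist; clusterID = c; }
        }
        return clusterID;
    }
}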

The Confidence_interval class has a method calculate(), and is used to calculate a confidence interval for the average utility of each outcome in a specific cluster. The inputs of the method calculate() are a directory and the clusterID. The text file under the directory stores the utilities of all users' outcomes. The output of the method calculate() consists of the confidence intervals for the average utilities of all possible outcomes.

The sequence of implementing the Direct Inference Based on Clustering method is as follows: at first, all users' utilities are calculated by the methods in class

UtilityCalculation, and these utilities are saved in a text file; the second step is to read these utilities and store them in a two-dimensional array, and to cluster them using methods in the class Clustering; the third step is to classify a new user into one of the clusters using methods in the class Classification; and the last step is to calculate confidence intervals for outcomes in the particular cluster using methods in the class Confidence_interval, where the confidence intervals for two outcomes are compared. If there is no overlap between the two confidence intervals, the preference between these two outcomes can be inferred accordingly. If there is an overlap between the two confidence intervals, the preference between these two outcomes cannot be inferred.

5.3 Algorithm for Pre-processing Inference Based on Clustering

The Pre-processing Inference Based on Clustering method is realized in the Pre-processing_inference project. The Pre-processing_inference project includes four main classes: UtilityCalculation, CursoryClustering, Clustering, and Classification. Most of these classes are described in detail in Section 5.2, except the class

CursoryClustering, which is discussed in this section.

The class CursoryClustering has a method preProcess(), which is used to group the vectors formed by utilities into three large clusters according to a specified criterion. The input of the method preProcess() is a directory, and the sole text file under this directory saves all of the vectors. The specified criterion is saved as two strings, the left part as prefer and the right part as nonPrefer, and three two-dimensional arrays, consistent, opposite, and unknown, are defined. As each vector is read, the program checks whether there is a substring that is consistent with the relationship between the two strings prefer and nonPrefer, opposite to that relationship, or whether the relationship is unknown. If consistent, the vector is stored in the consistent array; if opposite, the vector is stored in the opposite array; otherwise, the vector is stored in the unknown array. The outputs of the method preProcess() are these three two-dimensional arrays.
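A rough sketch of this partitioning is shown below (not the actual preProcess() code); it assumes that each user's known preferences are available as a set of strings such as "b>d", which is a simplification of the file-based representation described above.

import java.util.*;

class CursorySketch {

    // Split users into three groups according to whether their known preferences
    // contain prefer over nonPrefer, the opposite relation, or neither.
    static List<List<Integer>> cursoryPartition(List<Set<String>> userPrefs,
                                                String prefer, String nonPrefer) {
        List<Integer> consistent = new ArrayList<>();
        List<Integer> opposite = new ArrayList<>();
        List<Integer> unknown = new ArrayList<>();
        for (int u = 0; u < userPrefs.size(); u++) {
            Set<String> prefs = userPrefs.get(u);
            if (prefs.contains(prefer + ">" + nonPrefer)) consistent.add(u);
            else if (prefs.contains(nonPrefer + ">" + prefer)) opposite.add(u);
            else unknown.add(u);
        }
        return List.of(consistent, opposite, unknown);
    }
}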

The sequence of implementing the Pre-processing Inference Based on Clustering method is as follows: at first, all users' utilities are calculated by using methods in the class UtilityCalculation, and these utilities are saved in a text file; the second step is to read these utilities into a two dimensional array, and group them by using methods in the class CursoryClustering into three large clusters, which are saved in three two-

dimensional arrays; the third step is to re-cluster the three clusters by using methods in the class Clustering one by one; and the last step is to classify a new user into one of the sub-clusters by using methods in the class Classification. Preferences between any two outcomes can be inferred by observing preferences in this particular cluster.

5.4 Algorithm for Post-processing Inference Based on Clustering

The Post-processing Inference Based on Clustering method is realized in the

Post-processing_inference project, and it includes four main classes: UtilityCalculation,

CursoryClustering, Clustering, and Classification. Since these four classes and their methods are presented in the above sections, this section will mainly discuss the process of implementing it.

The sequence of implementing the Post-processing Inference Based on Clustering method is as follows: at first, all users' utilities are calculated by using methods in the class UtilityCalculation, and these utilities are saved in a text file; the second step is to read these utilities and store them in a two-dimensional array, and cluster these utilities by using methods in the class Clustering; the third step is to classify the new user into one of the clusters by using methods in the class Classification; the fourth step is to group all vectors in the particular cluster by using methods in the class

CursoryClustering into three large clusters which are saved in three two-dimensional arrays; the fifth step is to re-cluster the three large clusters by using methods in the class

Clustering one by one into sub-clusters; and the last step is to classify a new user into

one of the sub-clusters by using methods in the class Classification. Preferences between any pair of outcomes are inferred by directly examining the particular cluster to which the new user belongs.

Chapter 6 Experimentation

6.1 Experimental Goals

This thesis discusses research on inferring users' preferences. Three techniques are developed, and have been described in the previous chapters in detail. Briefly, according to the research on COP-nets, a user's utilities for possible outcomes can be estimated, forming a vector of these utilities for each specific user. The distance between two users' preference relations then can be measured by the Euclidean distance between these two vectors. Based on the above distance measurement method, all users can be grouped into clusters by the Y-means clustering method. When a new user is encountered, he/she is classified into one of these clusters. Some conclusions about the new user's preferences can then be drawn by examining this particular cluster.

The ultimate goal of the experiment that is described in this chapter is to determine which technique explored has the best ability to accurately predict users' unknown preferences. In order to achieve the ultimate goal, several experiments are run on different users with various numbers of known preferences. In each test run, a number of the user's preferences are revealed to the system to allow for COP-net construction and clustering. Next, two outcomes are chosen randomly under the conditions that 1) the user has a specified preference over the two outcomes, and 2) this preference was not one of those revealed in the first step. The goal is then to determine how accurately each method can predict these preferences. In order to obtain meaningful results, preference relations from actual people are needed. User ratings from a popular

collaborative filtering database are obtained for this purpose, and two methods for generating partial preference relations from ratings data are used in the experiments.

In this chapter, the principles and the design of the experiments are described, and the results of the experiments are analyzed.

6.2 Experimentation Methods

To achieve the ultimate goal, reasonable test cases are created to test the prediction of preferences, and the results of each technique are compared. This thesis designs two methods to obtain utilities of test users' outcomes. In the following sections, methods for choosing users who are used as training users and users who are used as testing users, as well as calculating the utilities of possible outcomes for both training users and testing users, are described in detail.

6.2.1 Experimentation Method One

6.2.1.1 Experimental Data

The experimental data in this thesis come from a website, named MovieLens

[34], which is used to recommend movies to users. The motivation for using MovieLens is that the data sets there are real: the website collects the data by interacting with true users. Because of this, the three techniques that are used to infer preferences can be tested on real results about users' preferences. Thus, the results are more meaningful than they would be if simulated users were generated, and are more persuasive.

The MovieLens data set that was used includes 100,000 ratings for 1,682 movies by 943 users. The data set is shown as follows:

196 : 242 : 3

186:302:3

22 : 377 : 1

12 : 303 : 3

Each line follows the format user : movie : rating. For example, the first line indicates that the user 196 gave the movie 242 a rating of 3. Here, ratings range from 1 to 5, where 5 means "must see" , 3 means "ok" and 1 means "awful".

In this thesis, the 512 most frequently rated movies are selected from the original data set, and are renumbered from 0 to 511. The convention for renumbering is that the more frequently a movie is rated, the lower the assigned number. For example, the movie 50 has been rated 583 times, making it the most frequently rated movie.

Therefore, the movie 50 after renumbering is movie 0. Similarly, the movie 959 has been rated by 64 users, making it the least frequently rated movie. Thus, the movie 959 is renumbered as movie 511. The renumbered movies are as follows:

0:50

1 :258

2: 100

3: 181

511 :959

Each line follows the format new movie number : old movie number. In the following discussion, the new movie number is used.
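The renumbering step can be sketched as follows (illustrative code, not the thesis preprocessing): count how often each movie is rated, sort the movies by decreasing frequency, and keep the top 512.

import java.util.*;

class RenumberSketch {

    // Renumber the 'keep' most frequently rated movies as 0, 1, 2, ...
    // Each entry of ratings is {userId, movieId, rating}.
    static Map<Integer, Integer> renumberByFrequency(List<int[]> ratings, int keep) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int[] r : ratings) counts.merge(r[1], 1, Integer::sum);
        List<Integer> movies = new ArrayList<>(counts.keySet());
        movies.sort((a, b) -> counts.get(b) - counts.get(a));   // most rated first
        Map<Integer, Integer> newNumber = new HashMap<>();
        for (int i = 0; i < keep && i < movies.size(); i++)
            newNumber.put(movies.get(i), i);                    // e.g. old movie 50 -> 0
        return newNumber;
    }
}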

The methodology of the experiments is to cluster users into groups, and to classify a new user into one of the existing clusters. According to the cluster to which he belongs, preferences over pairs of outcomes are predicted for this user. The accuracy of prediction is then tested. To accomplish this, the 943 users are divided into two sets: the training data and the testing data. In particular, the first 743 users are used as training users and the last 200 users are used as testing users. In other words, clustering is performed over the first 743 users, and classification is performed on the last 200 users.

6.2.1.2 Experimental Design

As mentioned before, the distance between two preference relations is calculated as the Euclidean distance between the two respective utility vectors, as computed by the

COP-net method. Using these distances, all training users are clustered. Building these

COP-nets over the training data set coming from MovieLens ratings data is a crucial first step in the experimentation.

The first step is to create preference relations from the ratings data. Since a user typically only rates a small subset of movies, collaborative filtering is used to determine a likely preference ordering over the entire set. Collaborative filtering is usually defined as "a method of making predictions about the interests of a user by collecting relative information from many other users" [12]. In other words, collaborative filtering is used to predict a new user's taste based on a database of users' preferences, and is mainly adopted in recommendation systems. The basic idea of collaborative filtering is to find

similar users who have similar interests, and to recommend something that those similar users like [4]. For example, the MovieLens website can recommend movies to a particular user based on its own database; the "rate your music" website [35] can recommend music for users, and so on. Generally, collaborative filtering is achieved in two steps:

1) find some users who have similar interests with the active user;

2) calculate predicted ratings for the active user by investigating those similar users.

A user's likely rating for an item is calculated as follows. Assume that a user i has rated a set of items I_i. The average rating r̄_i for user i is:

r̄_i = (1 / |I_i|) × Σ_{j ∈ I_i} r_{i,j}    (6.1)

where |I_i| denotes the number of items that are rated by user i, j denotes an item in I_i, and r_{i,j} denotes the rating that user i gave to item j. Assume that there is an active user a, and that a did not rate the item j. Then a's rating for j is predicted as:

r_{a,j} = r̄_a + k × Σ_{i=1..n} w(a,i) × (r_{i,j} - r̄_i)    (6.2)

where w(a,i) denotes a weight, which can reflect distance, correlation, or similarity between user a and user i, k denotes the normalization factor, and n denotes the number of users. In this thesis, w(a,i) is computed as the Pearson correlation coefficient. The correlation between users a and i is:

w(a,i) = Σ_j (r_{a,j} - r̄_a)(r_{i,j} - r̄_i) / √( Σ_j (r_{a,j} - r̄_a)² × Σ_j (r_{i,j} - r̄_i)² )    (6.3)

Using the above method, ratings for unrated movies are predicted for each user, simulating the effect of each user rating all 512 movies. Therefore, the complete preference relation over all 512 movies is built for each user. Moreover, these generated ratings come from the set of real numbers from 1 to 5, rather than the set of integers, providing much stronger preference information due to the reduction in the number of ties.
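A compact Java sketch of equations (6.1)-(6.3) is shown below (illustrative only; the normalization factor k is taken here as one over the sum of the absolute weights, which is a common choice but an assumption, and the class and method names are invented).

import java.util.*;

class CFSketch {

    static double mean(Map<Integer, Double> ratings) {             // Equation (6.1)
        double s = 0;
        for (double r : ratings.values()) s += r;
        return s / ratings.size();
    }

    // Pearson correlation over the movies both users have rated.   Equation (6.3)
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        double ma = mean(a), mb = mean(b), num = 0, da = 0, db = 0;
        for (int movie : a.keySet()) {
            if (!b.containsKey(movie)) continue;
            double xa = a.get(movie) - ma, xb = b.get(movie) - mb;
            num += xa * xb; da += xa * xa; db += xb * xb;
        }
        return (da == 0 || db == 0) ? 0 : num / Math.sqrt(da * db);
    }

    // Predicted rating of 'movie' for the active user.             Equation (6.2)
    static double predict(Map<Integer, Double> active,
                          Collection<Map<Integer, Double>> others, int movie) {
        double ma = mean(active), sum = 0, weightSum = 0;
        for (Map<Integer, Double> other : others) {
            if (!other.containsKey(movie)) continue;
            double w = pearson(active, other);
            sum += w * (other.get(movie) - mean(other));
            weightSum += Math.abs(w);               // k = 1 / sum of |w| (assumed)
        }
        return weightSum == 0 ? ma : ma + sum / weightSum;
    }
}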

Since only partial preference relations are desired, the second step is to determine the number of these preferences that will be known. This number can be arbitrarily chosen from the set of integer values.

The third step is to build a COP-net for each user by choosing the number of these preferences mentioned above to be used in the construction. Since the COP-net must have a topmost node, preferences are included that reflect that the user's favourite movie is preferred over all other movies. For example, the ranking for the user 261 is

{303, 35, 74, 503, ..., 308,450}. The first set of preferences is constructed as:

303 ≻ 35

303 ≻ 74

303 ≻ 503

...

303 ≻ 450

Since the COP-net needs a bottom node, preferences are included such that all movies are preferred over the least favourite movie. Again using the user 261 as an example, the second set of preferences is built as:

35 ≻ 450

74 ≻ 450

503 ≻ 450

...

308 ≻ 450

Finally, a number of known preferences is chosen, and these preferences are randomly generated from the total order. For example, let the number of known preferences be 5. Thus, 5 preferences are generated by randomly choosing pairs of movies from the ranking of a particular user. As for the user 261, these 5 preferences could be:

129 >- 262

56 >- 123

14 >- 428

120 >- 279

115 >- 319

Similarly, other numbers of preferences can be generated. Based on the above three sets of preferences, a COP-net is built. Assigning utility 1 to the most preferred movie and utility 0 to the least preferred movie, the utilities of the other movies are estimated.
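The construction of these three sets of preferences from a ranked list of movies can be sketched as follows (illustrative only; the helper names are not from the thesis code).

import java.util.*;

class BuildPreferencesSketch {

    // Given a ranking (best movie first), build the preferences used to construct
    // a training user's COP-net: the favourite over every other movie, every movie
    // over the least favourite, plus a number of randomly chosen pairs.
    static List<int[]> buildPreferences(List<Integer> ranking, int numRandom, Random rnd) {
        List<int[]> prefs = new ArrayList<>();
        int best = ranking.get(0), worst = ranking.get(ranking.size() - 1);
        for (int m : ranking) {
            if (m != best) prefs.add(new int[] { best, m });                  // favourite over all others
            if (m != best && m != worst) prefs.add(new int[] { m, worst });   // all over least favourite
        }
        for (int i = 0; i < numRandom; i++) {
            int a = rnd.nextInt(ranking.size()), b = rnd.nextInt(ranking.size());
            if (a == b) { i--; continue; }
            int hi = Math.min(a, b), lo = Math.max(a, b);          // earlier rank is preferred
            prefs.add(new int[] { ranking.get(hi), ranking.get(lo) });
        }
        return prefs;
    }
}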

To test the accuracy of the three techniques, test data are created based on the remaining 200 users. Tests are then run on these data to determine how accurately the techniques can predict preferences over any two movies. The generation of testing data is more difficult than the generation of training data. Like the 743 training users, the 200 testing users give sparse ratings for the 512 movies as well. However, collaborative filtering cannot be applied to generate preference relations for these users. If the rankings of users were to be generated by using collaborative filtering, preferences from

the ranking would be used to build the COP-nets. If preferences that have already been used to build COP-nets are chosen for testing as well, then the results would be unfairly biased. Another method for building COP-nets for testing users is used instead. First, the most preferred movie is chosen from one of the highest rated movies. For example, if user 860 gave 4 movies a rating of 5, say 138, 178, 174, and 67, the most preferred movie can be chosen arbitrarily from these four movies. Assume that movie 178 is chosen. It should then have the highest rank, followed by the other three movies.

Similarly, the least preferred movie is chosen from one of the lowest rated movies, and is ranked lowest. Movies rated in between are ranked accordingly. All movies that have not been rated by the user have no preferential relation to the rated movies.

Example 6.1 Assume that there are 7 movies, and that the user has not provided a rating for movie 2. Possible methods to rank these movies are shown in Table 6.2.

Movie 0 1 2 3 4 5 6

Rating 3 2 - 5 5 3 2

Table 6.1 Movies and Their Ratings

Possible methods Rank

1 {3,4,0,5,1,2,6}

2 {3,4,5,0,6,2,1}

3 {4,3,0,5,6,2,1}

4 {4,3,5,0,1,2,6}

Table 6.2 Possible ranks of the movies

Once the ranking is created, a set of preferences is created accordingly. As with the training data, all preferences reflecting that a particular user's most favourite movie is preferred over all other movies, and that all movies are preferred over the least preferred movie, are included. The number of other known preferences is determined, as is done with the training data, and this number of preferences is generated randomly from the known preferences. The generation of these preferences is done differently from the training data, since collaborative filtering is not used to determine unknown ratings. The basic idea is to choose any pair of movies such that: 1) both movies are rated; and 2) these two movies have different ratings. In Example 6.1, choosing the preference over movies 3 and 4 is not allowed, because they have the same rating (5). At any time, choosing movie 2 is not allowed, because the user did not rate this movie. If the number of preferences is assigned as 3, possible preferences could be generated as follows:

0 ≻ 1

3 ≻ 1

5 ≻ 6

Based on the above set of known preferences, a COP-net is built accordingly.

Giving utility 1 to the most preferred movie and utility 0 to the least preferred movie, utilities of other movies are estimated.

This prepares the training data set and the testing data set for experiments.

Training data are then clustered by the Direct Inference Based on Clustering, the Pre-processing Inference Based on Clustering, and the Post-processing Inference Based on

Clustering methods, respectively. A user from the testing data set is then chosen, and two movies that were rated by this user are randomly selected. Because these two

movies are actually rated by the user, the preference between them is the user's true preference. It is important that these two chosen movies were not used to build the user's

COP-net. For example, in Example 6.1, the preference 5 >- 6 is used to build the COP- net, so it cannot be selected to test. The preference between the above two selected movies is predicted by the three methods developed in this thesis. The predicted preference is compared with the true preference to see if they match. If they match, then the prediction is correct; otherwise, it is incorrect. The above method is iterated several times, and the frequency of correct predictions of preferences is recorded.

6.2.2 Experimentation Method Two

6.2.2.1 Experimental Data

The training data set and the testing data set for experimentation method two also come from the MovieLens website. The range of training users is from user 1 to user 743, and the range of testing users is from user 744 to user 943. The processing of the data sets is the same as done in experimentation method one: the 512 most frequently rated movies are selected, and are renumbered by frequency of rating.

6.2.2.2 Experimental Design

The purpose of the second experimentation method was to try an alternative way of generating partial preference relations for the test data. The method for generating partial preference relations for the training data is the same as in the first experiment, and is re-stated briefly here. Initially, ratings for movies that were not rated by a user are predicted by performing collaborative filtering, and each user's known and predicted ratings for movies are sorted. A number of known preferences is generated,

COP-nets are constructed, and utilities of movies are calculated. All training users are partitioned into clusters by measuring the Euclidean distances between any pair of them.

To determine preference relations for test users in experimentation method two, collaborative filtering is applied. However, it is applied in such a way that will not unfairly bias the results. This is done by initially choosing which preferences will be predicted in testing. Ratings relevant to these preferences are then removed before the collaborative filtering is done. Specifically, consider 5 ratings chosen to be removed from the original testing data set. For example, for user 744, the five removed ratings may be as follows:

744: 11 :5

744 : 50 : 3

744 : 83 : 3

744 : 278 : 4

744 : 107 : 5

Each line follows the format user : renumbered movie : rating. When these movies' ratings are removed, it means that they are now unknown. All unknown ratings are inferred by the collaborative filtering technique, and both known and predicted ratings of each user are ranked. Preferences are chosen when building the COP-nets as before.

When choosing two movies to test, at least one must be selected from the movies for which the ratings were removed. For example, user 744 rated movies 11, 107, 15, 102,

..., 10, 84. Assume that the movies that were removed are those shown above. Thus, one possible pair of testing movies is 50 and 107. The predicted preferences are compared

with the true preferences to see if they match. If they match, then the prediction is correct; otherwise, it is incorrect. This process is iterated several times, and how often the prediction of preferences is correct is recorded.

To summarize, the experiments are mainly composed of three phases:

1) utilities for training users' possible outcomes are calculated from the rankings of movies and a specified number of known preferences. In particular, the rankings are generated by applying the collaborative filtering technique to predict ratings for unrated movies;

2) utilities for test users' possible outcomes are likewise calculated from the rankings of movies and the designated number of known preferences. However, in experiment method one, preferences used to construct the COP-net come only from pairs of rated movies; no collaborative filtering is done to artificially create preferences over unrated movies. In experiment method two, 5 ratings for each user are randomly removed, and collaborative filtering is performed over the remaining ratings to predict the ratings for unrated movies. These 5 ratings are then used for testing.

3) each test user is classified into one of the clusters formed by training users, and the preference between two movies is predicted by the Direct Inference Based on

Clustering method, the Pre-processing Inference Based on Clustering method and the

Post-processing Inference Based on Clustering method, respectively. These preferences are compared with their true preferences, and the accuracy is recorded.

6.3 Analysis of Results

6.3.1 Analysis of Experiment Method One

In order to assess the accuracy of the three techniques, an analysis is performed to evaluate the performance of each. Each technique predicts preferences based on a small number of known preferences; in the experiments in this thesis, the number of known preferences is set to 10, 30, and 50. There are 200 test users, and each user is tested 5 times, so there are 1000 trials for each method in total. The performance of the Direct Inference Based on Clustering method is shown in Table 6.3.

Known preferences    Correct predictions    Incorrect predictions    Unknown preferences    Success rate    Prediction rate
10                   323                    106                      571                    75.29%          42.9%
30                   437                    179                      384                    70.94%          61.6%
50                   471                    200                      329                    70.19%          67.1%

Table 6.3 Performance of the Direct Inference Based on Clustering method

When the number of known preferences is 10, in 1000 predictions, the number of correct predictions is 323, and the number of incorrect predictions is 106. Thus, the success rate of prediction is 323 / (323 + 106) * 100% = 75.29%. The difference between the number of correct predictions and incorrect predictions is found to be statistically significant using the sign test (p < 1.88 × 10^-25). The same conclusion can be drawn for the other two cases: when the number of known preferences is 30 and 50, the differences between the number of correct predictions and incorrect predictions are also found to be statistically significant using the sign test (p < 4.07 × 10 when 30 preferences are known, and p < 1.99 × 10^-25 when 50 preferences are known). As the number of known preferences increases, more and more preferences are predicted (as indicated by the prediction rate in Table 6.3), and an increase in the number of correct predictions is seen. However, the number of incorrect predictions also increases, and as a result the success rate of prediction decreases. This is understandable, because clustering is performed only once, based on the whole data set. When a new user is classified into one of the clusters, some of the partial preference relations in this cluster may be very different from the new user's preference relation, which is likely to affect the prediction results negatively.
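For reference, the sketch below shows how the success rate, the prediction rate and a sign-test p-value of this kind can be computed from the counts in the first row of Table 6.3. Whether the thesis reports one-sided or two-sided p-values is not stated here, so the one-sided version is shown as an assumption.

from scipy.stats import binomtest

correct, incorrect, unknown = 323, 106, 571   # first row of Table 6.3

success_rate = correct / (correct + incorrect)                              # 0.7529
prediction_rate = (correct + incorrect) / (correct + incorrect + unknown)   # 0.429

# Sign test: under the null hypothesis, a decided prediction is equally likely
# to be correct or incorrect (p = 0.5).
p_value = binomtest(correct, n=correct + incorrect, p=0.5,
                    alternative="greater").pvalue
print(success_rate, prediction_rate, p_value)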

The performance for the Pre-processing Inference Based on Clustering method is shown in Table 6.4.

Known preferences    Correct predictions    Incorrect predictions    Unknown preferences    Success rate    Prediction rate
10                   192                    91                       717                    67.84%          28.3%
30                   310                    99                       591                    75.79%          40.9%
50                   219                    67                       714                    76.57%          28.6%

Table 6.4 Performance of the Pre-processing Inference Based on Clustering method

When the number of known preferences is 10, in 1000 predictions, the number of correct predictions is 192, and the number of incorrect predictions is 91. Thus, the success rate of prediction is 192 / (192 + 91) * 100% = 67.84%. The difference between the number of correct predictions and incorrect predictions is found to be statistically significant using the sign test (p < 2.78 × 10^-9). The same conclusion can be drawn when the number of known preferences is 30 and 50: the differences between the number of correct predictions and incorrect predictions are found to be statistically significant using the sign test (p < 3 × 10^-25 when 30 preferences are known, and p < 4.37 × 10^-19 when 50 preferences are known). As the number of known preferences increases, both the number of correct predictions and the number of incorrect predictions increase at first, and then decrease. The prediction rate increases at first, and then decreases as well. The success rate of prediction increases throughout.

The performance for the Post-processing Inference Based on Clustering method is shown in Table 6.5.

Known preferences    Correct predictions    Incorrect predictions    Unknown preferences    Success rate    Prediction rate
10                   186                    71                       743                    72.37%          25.7%
30                   303                    91                       606                    76.90%          39.4%
50                   363                    78                       559                    82.31%          44.1%

Table 6.5 Performance of the Post-processing Inference Based on Clustering method

When the number of known preferences is 10, in 1000 predictions, the number of correct predictions is 186, and the number of incorrect predictions is 71. Thus, the success rate of prediction is 186 / (186 + 71) * 100% = 72.37%. Similarly, the difference between the number of correct predictions and incorrect predictions is found to be statistically significant using the sign test (p < 1.16 × 10^-12). The sign test also shows that the differences between the number of correct predictions and incorrect predictions are statistically significant in the other cases: p < 2.21 × 10^-26 when 30 preferences are known, and p < 1.18 × 10 when 50 preferences are known. As the number of known preferences increases, the number of correct predictions increases as well, while the number of incorrect predictions increases at first and then decreases. The success rate of prediction and the prediction rate both increase throughout.

The performance of these three techniques is shown in Figure 6.1, where m1 represents the Direct Inference Based on Clustering method, m2 represents the Pre-processing Inference Based on Clustering method, and m3 represents the Post-processing Inference Based on Clustering method.

[Figure: success rate (%) plotted against the number of known preferences (10, 30, 50) for the three methods m1, m2 and m3.]

Figure 6.1 Performance of three techniques

From the above figure, when the number of known preferences is very small, the

Direct Inference Based on Clustering method is the best method among these three techniques. However, as the number of known preferences increases, the accuracy of the

Direct Inference Based on Clustering method decreases. When the number of known preferences is relatively high, the Post-processing Inference Based on Clustering method is the best method among the three techniques, with its accuracy increasing with the number of known preferences.

6.3.2 Analysis of Experiment Method Two

In experiment method two, there are 200 users to test as well, and each user is tested 5 times, giving 1000 trials in total for each method. The performance for the

Direct Inference Based on Clustering method is shown in Table 6.6.

Known preferences    Correct predictions    Incorrect predictions    Unknown preferences    Success rate    Prediction rate
10                   329                    113                      558                    74.43%          44.2%
30                   446                    175                      379                    71.81%          62.1%
50                   499                    186                      315                    72.85%          68.5%

Table 6.6 Performance of the Direct Inference Based on Clustering method

When the number of known preferences is 10, the number of correct predictions is 329, and the number of incorrect predictions is 113. Thus, the success rate is 329 / (329 + 113) * 100% = 74.43%. The difference between the number of correct predictions and incorrect predictions is statistically significant: when the number of known preferences is 10, 30, and 50, respectively, the corresponding p-values are 1.54 × 10^-24, 2.41 × 10^-27, and 9.49 × 10^-33. As the number of known preferences increases, the number of correct predictions increases as well. In the meantime, the success rate of prediction decreases at first, and then increases. The prediction rate increases throughout.

The performance for the Pre-processing Inference Based on Clustering method is shown in Table 6.7.

Known preferences    Correct predictions    Incorrect predictions    Unknown preferences    Success rate    Prediction rate
10                   134                    80                       786                    62.62%          21.4%
30                   200                    139                      661                    58.99%          33.9%
50                   221                    141                      638                    61.05%          36.2%

Table 6.7 Performance of the Pre-processing Inference Based on Clustering method

When the number of known preferences is 10, in 1000 predictions, the number of correct predictions is 134, and the number of incorrect predictions is 80. Thus, the success rate of prediction is 134 / (134 + 80) * 100% = 62.62%. Based on the statistical analysis using the sign test, the difference between the number of correct predictions and incorrect predictions is found to be significant (p < 0.000291). Similarly, when the number of known preferences is 30 and 50, respectively, the sign test shows that the differences between the number of correct predictions and incorrect predictions are statistically significant (p < 0.00112 and p < 3.3 × 10^-5). As the number of known preferences increases, the number of correct predictions increases, and the number of incorrect predictions increases as well. The success rate of prediction decreases at first, and then increases. The prediction rate increases throughout.

The performance for the Post-processing Inference Based on Clustering method is shown in Table 6.8.

Known preferences    Correct predictions    Incorrect predictions    Unknown preferences    Success rate    Prediction rate
10                   260                    142                      598                    64.68%          40.2%
30                   380                    192                      428                    66.43%          57.2%
50                   390                    226                      384                    63.31%          61.6%

Table 6.8 Performance of the Post-processing Inference Based on Clustering method

In 1000 predictions, when the number of known preferences is 10, the number of correct predictions is 260, and the number of incorrect predictions is 142. Thus, the success rate of prediction is 260 / (260 + 142) * 100% = 64.68%. The sign test shows that the difference between the number of correct predictions and incorrect predictions is statistically significant when the number of known preferences is 10 (p < 5.38 × 10^-9), when it is 30 (p < 5.38 × 10^-15), and when it is 50 (p < 3.63 × 10^-10). As the number of known preferences increases, the number of correct predictions increases, and the number of incorrect predictions increases as well. The success rate of prediction increases at first, and then decreases, since between 30 and 50 known preferences the number of incorrect predictions increases faster than the number of correct predictions. The prediction rate increases throughout.

The performance of these three techniques is shown in Figure 6.2, where m1 represents the Direct Inference Based on Clustering method, m2 represents the Pre-processing Inference Based on Clustering method, and m3 represents the Post-processing Inference Based on Clustering method.

[Figure: success rate (%) plotted against the number of known preferences (10, 30, 50) for the three methods m1, m2 and m3.]

Figure 6.2 Performance of three techniques

A conclusion can be drawn from the above figure: the Direct Inference Based on Clustering method is the best technique in experiment method two. For the same number of known preferences, it predicts more preferences than the other two techniques, and it also achieves the highest accuracy in every case.

Based on the experimental results and analysis, the performances of the three techniques are compared within the same experimentation method, as well as between experimentation methods one and two. The following conclusions can be drawn:

(1) Experimentation method two did not work as well as experimentation method one.

(2) For experimentation method one, the Post-processing Inference Based on Clustering method is the best when more preferences are specified.

(3) The Direct Inference Based on Clustering method seems to be better with fewer specified preferences.

(4) The Post-processing Inference Based on Clustering method is always better than the Pre-processing Inference Based on Clustering method.

According to the methodology, the Post-processing Inference Based on

Clustering method predicts users' preferences by directly observing the cluster to which a user belongs. If the number of known preferences is limited, there is a high probability that a user is classified into a cluster where the desired preferences are still unknown. This is why the Post-processing Inference Based on Clustering method is best when the number of known preferences is high. Also, partial preference relations are partitioned twice in the Post-processing Inference Based on Clustering method, and new users are classified twice as well. Thus, the classification is more accurate, and so is the resulting prediction. This is why the

Post-processing Inference Based on Clustering method is better than the Pre-processing

Inference Based on Clustering method.

Chapter 7 Conclusions and Future Work

7.1 Conclusions

The main objective of this thesis is to predict users' preferences over a set of outcomes based on a small number of known preferences. This is significant in many application areas where decision making is a critical component. Consider a recommendation system, for example. If such a system has a sufficient model of users' preferences, it can recommend the most appropriate products for each particular user.

Three techniques, the Direct Inference Based on Clustering method, the Pre-processing Inference Based on Clustering method and the Post-processing Inference

Based on Clustering method, were developed in order to predict users' preferences.

These techniques utilize different methods to predict unknown preferences, based on a set of known preferences and given utilities. The basic idea of the three techniques is to partition preferences of users in an existing database into clusters, and to draw some general conclusions based on the cluster in which a particular new user is classified. In order to do clustering, a distance measure between two data points is needed. In this thesis, a data point is represented by a partial preference relation. For each partial preference relation, utilities are estimated for all outcomes according to research related to preference networks and partial preferences. These outcomes' utilities are then used to form a vector, which indicates the co-ordinates of the preference relation in n-dimensional space. The distance between two preference relations is then computed as the Euclidean distance between the two corresponding vectors. The Y-means clustering

algorithm is employed in the clustering procedure, allowing for the number of clusters formed to be determined dynamically.
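A minimal sketch of this distance measure and of assigning a new user to a cluster follows. The vector layout (one estimated utility per outcome, in a fixed outcome order) is as described above; the nearest-centroid assignment is an illustrative assumption about how the classification step is carried out.

import numpy as np

def euclidean(u, v):
    """Euclidean distance between two utility vectors."""
    return float(np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)))

def classify(new_vector, centroids):
    """Index of the cluster whose centroid is nearest to the new user's vector."""
    return int(np.argmin([euclidean(new_vector, c) for c in centroids]))

# Example with three outcomes:
centroids = [[0.9, 0.4, 0.1], [0.2, 0.8, 0.5]]
print(classify([0.8, 0.5, 0.2], centroids))   # 0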

Three techniques for clustering and preference prediction which incorporate the

Y-means algorithm are developed. In the first technique, partial preference relations are grouped by the Y-means clustering algorithm, and a new user is classified. Preferences between any pair of outcomes are inferred by examining the average utilities of the outcomes for the members of the cluster. For the second technique, given a pair of outcomes (a, b) for which the preference is to be estimated, partial preference relations are partitioned into three groups: those where a ≻ b, those where b ≻ a, and those where the preference is unknown. The Y-means clustering algorithm is then performed within each of the above three clusters. Preferences are inferred by directly observing into which sub-cluster the new user is classified. With regard to the third technique, all partial preferences are directly grouped by the Y-means clustering algorithm, and a new user is classified into one of these clusters. For this particular cluster, all preferences are clustered into sub-clusters according to the three possible relations over outcomes for the desired preference as discussed above. The new user is classified again, and preferences are predicted by examining the sub-cluster to which the new user belongs.
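The inference step of the first technique can be sketched as follows: the preference between two outcomes is predicted by comparing their average estimated utilities over the members of the new user's cluster. The margin parameter is an illustrative assumption; when either outcome has no utility information in the cluster, or the averages are too close, the preference is left unknown.

import numpy as np

def direct_inference(cluster_utilities, a, b, margin=0.0):
    """cluster_utilities: list of dicts mapping outcome -> estimated utility,
    one dict per member of the cluster."""
    ua = [u[a] for u in cluster_utilities if a in u]
    ub = [u[b] for u in cluster_utilities if b in u]
    if not ua or not ub:
        return None                    # preference cannot be predicted
    diff = np.mean(ua) - np.mean(ub)
    if diff > margin:
        return (a, b)                  # a is predicted to be preferred over b
    if diff < -margin:
        return (b, a)                  # b is predicted to be preferred over a
    return None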

Two experiments were designed to test each of the above techniques, and experimental data were collected from real users. The goal of the experiments is to find the best method for predicting preferences. Based on a number of comparisons among the three techniques, it was found that the Direct Inference Based on Clustering method performs well when the number of known preferences is very small, and the Post-processing Inference Based on Clustering method performs best when the number of known preferences is relatively large. In all situations, it appears that the Post-processing

Inference Based on Clustering method always outperforms the Pre-processing Inference

Based on Clustering method.

7.2 Future Work

In real life, users' preferences are sometimes complicated, and typically only a very small number of preferences can be known in advance. This thesis shows that each of the aforementioned techniques is able to predict users' preferences under both of these difficult conditions. Based on current results from the experiments, some extensions to the research can be proposed. This section describes several prospects for future work.

The first suggestion concerns the clustering method. Investigation of the three techniques shows that the accuracy of the predicted preferences depends greatly on the clustering method: if the clustering results are more reliable, the preference prediction should be more accurate. Many sophisticated clustering methods exist, several of which are introduced in Chapter 2. As a suggestion for further research, it would be interesting to employ clustering methods other than the Y-means algorithm within the three techniques, as sketched below.
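As an illustration of this suggestion, the sketch below swaps scikit-learn's KMeans in for Y-means over the same utility vectors. The fixed number of clusters is an assumption, since Y-means determines it dynamically; the rest of the prediction pipeline would remain unchanged.

import numpy as np
from sklearn.cluster import KMeans

def cluster_with_kmeans(utility_vectors, n_clusters=10, seed=0):
    """Cluster the training users' utility vectors with k-means instead of Y-means."""
    X = np.asarray(utility_vectors, dtype=float)
    model = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    labels = model.fit_predict(X)
    return labels, model.cluster_centers_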

Another suggestion for future research deals with the experimental data. In this thesis, preferences are learned from data from real users. Hence, the experimental results reflect that the three techniques can be applied in the real world. While the data set used in this thesis provides a sufficiently large number of users and movies, the number of rating levels for each movie is not sufficient. At present, there are only 5 possible ratings for each movie. It would be interesting to see how the results would be affected if there

were more. More complicated preference relations could then be generated, allowing the users to specify a stronger set of preferences. In the future, other databases which contain data on pairwise preferences, rather than ratings, should be found and applied.

This thesis proposed three possible solutions for predicting preferences, and the corresponding experiments conducted confirm that the methods are effective, feasible and practical. These techniques thus provide a novel, rational contribution to the preference elicitation research area. Moreover, they can help agents to learn a user's preferences, while limiting the need for user interaction.


Curriculum Vitae

Candidate's Full Name:

Mian Qin

Universities Attended:

Beijing Technology and Business University, 1998 - 2002

Bachelor of Science, 2002

University of New Brunswick, 2005 - 2007