Bachelor Thesis

A Trust System for the openSUSE Build Service

Saarland University
Faculty of Natural Sciences and Technology I
Department of Computer Science

submitted by Marko Jung on 17th April 2009

Supervisor: Professor Dr.-Ing. Thorsten Herfet
Advisor: Dr. Michael Schröder

Reviewers: Professor Dr.-Ing. Thorsten Herfet, Professor Dr. Joachim Weickert

Statement under Oath & Declaration of Consent

I hereby confirm under oath that I have written this thesis on my own and that I have not used any other media or materials than the ones referred to in this thesis. I agree to make both versions of my thesis (with a passing grade) accessible to the public by having them added to the library of the Computer Science Department.

Saarbrücken, 17th April 2009.

Marko Jung

Contents

1 Introduction
  1.1 openSUSE Build Service
  1.2 Trust
    1.2.1 Definitions of Trust
    1.2.2 Properties of Trust
    1.2.3 Trust Metrics
  1.3 Aim of the Study
  1.4 Outline of the Thesis

2 Notation and Terminology
  2.1 Notation
  2.2 Terminology

3 State of the Art
  3.1 Classification of Trust Metrics
    3.1.1 Network Perspective
    3.1.2 Computation Locus
    3.1.3 Link Evaluation
    3.1.4 Selection Method
  3.2 Trust Metrics
    3.2.1 PageRank
    3.2.2 EigenTrust
    3.2.3 Advogato
    3.2.4 TidalTrust
    3.2.5 Appleseed

4 Architecture
  4.1 openSUSE Build Service Terminology
  4.2 Major Components of the openSUSE Build Service
  4.3 Trust for Software Packages
  4.4 User-specific Trust
  4.5 Design of the openSUSE Trust Server
    4.5.1 Management of Trust Relations
    4.5.2 Management and Storage of Trust Formulae
    4.5.3 Solving of Trust Formulae using Appleseed

5 Validation
  5.1 Artificial Networks
    5.1.1 Random Graphs
    5.1.2 Small-world Model
    5.1.3 Model of Barabási and Albert
  5.2 Advogato Real-world Network
    5.2.1 Data Sets
    5.2.2 Leave-one-out Cross-validation

6 Discussion
  6.1 Computation of Trust using the Appleseed Trust Metric
  6.2 Validation using artificially generated Networks
  6.3 Validation using the Advogato Data Set

7 Outlook

A Mathematical Symbols and Functions

B Appleseed Example

C Trust Metric Algorithms
  C.1 The PageRank Algorithm
  C.2 EigenTrust Algorithms
    C.2.1 Simple non-distributed EigenTrust
    C.2.2 Basic EigenTrust
    C.2.3 Distributed EigenTrust
  C.3 Advogato
  C.4 TidalTrust
  C.5 Appleseed

D Further Simulations using the Small-world Model
  D.1 Rewiring Probability p = 0.30
  D.2 Rewiring Probability p = 0.45
  D.3 Rewiring Probability p = 0.60
  D.4 Rewiring Probability p = 0.75
  D.5 Rewiring Probability p = 0.90

Acknowledgements

Bibliography

List of Figures

1.1 openSUSE Build Service web-client

3.1 Properties of Trust Metrics
3.2 Scalar vs Group Trust Metrics
3.3 PageRank: Simplified PageRank Calculation
3.4 Advogato: Graph conversion
3.5 Advogato: Calculation of the Network Flow
3.6 TidalTrust: Determination of the Trust Threshold
3.7 Appleseed: Node Chains
3.8 Appleseed: Rank Sinks
3.9 Appleseed: Normalisation Issue
3.10 Appleseed: Backward Propagation
3.11 Appleseed: Distribution of Trust and Distrust

4.1 Example for an openSUSE Build Service Project
4.2 Major Components of the openSUSE Build Service
4.3 openSUSE Trust Service Web-Interface listing Trust Relations
4.4 openSUSE Trust Service Web-Interface presenting a Trust Value

5.1 Random Graphs: Example
5.2 Random Graphs: General statistics
5.3 Random Graphs: Maximal distributed Trust
5.4 Random Graphs: Maximal distributed Trust vs discovered Nodes
5.5 Small-world Model: Example
5.6 Small-world Model: General Statistics
5.7 Small-world Model: Maximal distributed Trust
5.8 Small-world Model: Maximal distributed Trust vs discovered Nodes
5.9 Model of Barabási and Albert: Example
5.10 Model of Barabási and Albert using linear preferential Attachment: General Statistics
5.11 Model of Barabási and Albert using quadratic preferential Attachment: General Statistics
5.12 Model of Barabási and Albert using linear preferential Attachment: Maximal distributed Trust
5.13 Model of Barabási and Albert using linear preferential Attachment: Maximal distributed Trust vs discovered Nodes
5.14 Model of Barabási and Albert using quadratic preferential Attachment: Maximal distributed Trust
5.15 Model of Barabási and Albert using quadratic preferential Attachment: Maximal distributed Trust vs discovered Nodes
5.16 Advogato Data Set: Maximal distributed Trust vs discovered Nodes
5.17 Advogato Data Set: General Statistics
5.18 Advogato Data Set: Histograms for general Statistics
5.19 Advogato Data Set: ROC Plots
5.20 Advogato Data Set: Sensitivity vs Specificity Plot and Recall vs Precision Plot

B.1 Appleseed: Test Network

D.1 Small-world Model: General Statistics (p = 0.30)
D.2 Small-world Model: Maximal distributed Trust (p = 0.30)
D.3 Small-world Model: Maximal distributed Trust vs discovered Nodes (p = 0.30)
D.4 Small-world Model: General Statistics (p = 0.45)
D.5 Small-world Model: Maximal distributed Trust (p = 0.45)
D.6 Small-world Model: Maximal distributed Trust vs discovered Nodes (p = 0.45)
D.7 Small-world Model: General Statistics (p = 0.60)
D.8 Small-world Model: Maximal distributed Trust (p = 0.60)
D.9 Small-world Model: Maximal distributed Trust vs discovered Nodes (p = 0.60)
D.10 Small-world Model: General Statistics (p = 0.75)
D.11 Small-world Model: Maximal distributed Trust (p = 0.75)
D.12 Small-world Model: Maximal distributed Trust vs discovered Nodes (p = 0.75)
D.13 Small-world Model: General Statistics (p = 0.90)
D.14 Small-world Model: Maximal distributed Trust (p = 0.90)
D.15 Small-world Model: Maximal distributed Trust vs discovered Nodes (p = 0.90)

List of Tables

3.1 Trust Metric Properties

5.1 Random Graphs: Constructed models
5.2 Small-world Model: Constructed Models
5.3 Model of Barabási and Albert: Constructed Models
5.4 Model of Barabási and Albert: Run Times
5.5 Advogato Data Set: Run Times
5.6 Evaluation of Trust Metrics on the Advogato Data Set

B.1 Appleseed: Trust Distribution in the Test Network

List of Listings

4.1 XML Representation of the Trust Relations of a User
C.1 PageRank Algorithm
C.2 Simple non-distributed EigenTrust Algorithm
C.3 Basic EigenTrust Algorithm
C.4 Distributed EigenTrust Algorithm
C.5 Advogato Algorithm
C.6 TidalTrust Algorithm
C.7 Appleseed Algorithm

Acronyms

AMD64 New name for the x86-64 instruction set (see below).

API Application Programming Interface is a set of routines, data structures, object classes and protocols provided in order to support the building of applications.

ARM Advanced RISC Machine (formerly Acorn RISC Machine) is a 32-bit RISC processor architecture widely used in embedded systems.

chroot Change root prison. A chroot on Unix operating systems is an operation that changes the apparent disk root directory for the currently running process and its children.

CRUD Create, retrieve, update and destroy (CRUD) are the four basic functions of persistent storage.

DEB is the extension of the Debian package format.

dsc Debian source file describing a Debian source package.

FTP File Transfer Protocol is one of the oldest network protocols to exchange files over a network.

GNU GNU’s Not Unix is an operating system consisting only of free software. This name also sometimes refers to the GNU Project.

GPG GNU Privacy Guard is a free software alternative to the PGP cryptographic software.

HTTP Hypertext Transfer Protocol is a stateless network protocol initially intended to transfer hypermedia information such as in the World Wide Web.

i586 Synonym for the Pentium® instruction set, mostly used to avoid this trademark.

IR Information Retrieval is the science of searching for information within documents or databases.

MAE Mean absolute error is a statistical quantity used to measure how close forecasts or predictions are to the eventual outcomes.

MD5 Message-Digest algorithm 5 is a commonly used cryptographic hash function which produces a 128-bit hash value. It is standardised in RFC 1321.

osc openSUSE Client is the official command line client for the openSUSE Build Service.

obs openSUSE Build Service

P2P Peer-to-Peer is a technology for computer networks that aggregates the connectivity and bandwidth of network participants instead of relying on the conventional centralised client-server model.

PGP Pretty Good Privacy is a proprietary computer program used to encrypt and decrypt data.

POSIX Portable Operating System Interface for Unix is the name for a collection of standards to specify the application programming interface of the Unix operating system [The Open Group, 2004].

REST Representational State Transfer is a common architecture for designing web services or other hypertext applications, see Fielding [2000, chapter 5].

RISC Reduced Instruction Set Computer describes a processor design philosophy following the principle that simplified instructions can yield higher performance and save resources because they can be executed more efficiently.

RMSE Root mean squared error is a measure of the differences between values predicted by a model and the values observed from the system being modelled.

ROC Receiver Operating Characteristic is a graphical plot of the sensitivity vs. (1 - specificity).

RPM RPM Package Manager is one of the major package management systems on the Linux platform.

spec file Specification file used to instruct RPM how to create an RPM file from some sources.

TAR derived from tape archive is a file format to collect many files into one larger file. Standardised in POSIX.1-1988 and later revisions.

URI Uniform Resource Identifier is used to identify a resource on the Internet.

URL Uniform Resource Locator is a special form of a Uniform Resource Identifier that specifies where a resource can be retrieved on the Internet.

x86-64 is a super-set of the x86 instruction set architecture, able to run 16-bit, 32-bit and 64-bit x86 programs.

XML Extensible Markup Language is a general specification for the creation of markup languages.

Chapter 1

Introduction

Software and product development in the field of open source follows different rules than proprietary software development [Raymond, 2001; Stewart et al., 2005]. Due to the openness of the whole process, the ability to incorporate contributors from even the most remote corners of the world with varied knowledge, education, skill levels and cultures, as well as the philosophy behind traditional open source projects, trust has become a precious commodity among project participants. Even though trust and accountability are invisible and intangible resources, projects only seem to function well if these mechanisms for social control are in place [Scacchi, 2007].

The openSUSE project is Novell's effort to open the development of the SUSE Linux products to a broader audience of users and developers. Although, based on the definitions of open source [Open Source Initiative, 1998] and free software [Stallman and Gay, 2002], respectively, software development was already open for nearly all components of the SUSE products, the distribution development process took place in-house, with some beta testing by selected customers under non-disclosure agreements. SUSE Linux 10.0 was the first version to have a public beta test, allowing users to support Novell engineers by reporting bugs and submitting fixes for them. Starting with this release, Novell increased its efforts to open the whole product development process, either by providing interfaces to internal tools, such as openFATE (http://features.opensuse.org), which enables openSUSE contributors to suggest and vote on new features for upcoming releases, or by creating new tools to empower external developers to contribute.

1.1 openSUSE Build Service

SUSE engineers identified that an open and extensible infrastructure for creating software packages and even full distributions plays a crucial role in the formation of a vital community around their products. Hence, they decided to develop a new system, the openSUSE Build Service (obs), to build their products, with ease of use, transparency and accessibility in mind. Since its first public release under the GNU General Public License version 2 [Free Software Foundation, 1991] in February 2006¹, the openSUSE Build Service has evolved into a complete distribution development platform, which is developed by SUSE engineers in co-operation with a few external contributors. Because the build system is one of a distributor's mission-critical infrastructure components, SUSE emphasises its commitment to open source by encouraging external contributions to the build system. Although the developer community around the openSUSE Build Service is rather small, several important features have been introduced by openSUSE community members, for example support for embedded devices running on the ARM architecture, or 1-click installations.

With the public availability of openSUSE 11.1 in December 2008, SUSE released its first product fully built with the openSUSE Build Service. This retires AutoBuild, the first fully automated system for building Linux distributions used in a corporate environment, which had been in use for more than eleven years. All features required for building one's own products were included in the 1.5 release of the openSUSE Build Service, announced on 19th March 2009². It is now possible to build entire releases within a build service instance and to generate installation media such as DVD images, FTP trees or Live CDs. In addition, installable USB sticks and images suitable for several virtualisation platforms can be created with the system.

The openSUSE Build Service offers everybody the opportunity to build packages for many Linux distributions (including openSUSE, SUSE Linux Enterprise, Fedora, Red Hat, Debian and Ubuntu) with relatively little effort. Besides knowledge of software packaging, a valid email address is the only requirement for using the public openSUSE Build Service. The service can be accessed through a broad variety of clients, including an officially supported command line client called osc, a web client (see Figure 1.1) and several graphical interfaces such as a plug-in for Eclipse.

Figure 1.1: The openSUSE Build Service web-client shows the ’openSUSE.org tools’ project containing several software packages as well as their build state.

The elimination of as many barriers as possible in the contribution process, both for the creation of new packages and for the improvement of existing ones in the official development tree, openSUSE Factory, has brought the build service impressive growth. At the time of writing, 62,283 packages in 11,103 repositories maintained by 13,758 users are available in the official openSUSE Build Service [Möller et al., 2009]. Its wide recognition is not restricted to openSUSE contributors: developers from other large open source projects (such as KDE or GNOME) and even competing distributions take advantage of it. As a result, the number of available versions and variations per package in the official system is relatively high.

Unfortunately, one of the major strengths of the product, its openness, is also one of its greatest weaknesses: anyone with a little technical knowledge is able to introduce a new copy of an existing package including malicious code, a Trojan or even a root kit. Such a package would be immediately available to the public on the openSUSE software portal and the world-wide content distribution network, because the build service publishes all successfully built packages on user request. Especially users without much technological awareness might accidentally install these unwanted packages using the 1-click installation mechanism, thereby causing serious harm to their computing environment. The main problem arises from the fact that there is no social control mechanism in the packaging process: no easy way exists of reviewing who contributed to a package or where its sources came from. A first step towards solving this issue is the introduction of a trust system enabling both users and developers to evaluate software from the openSUSE Build Service.

¹ http://lists.opensuse.org/opensuse-announce/2007-01/msg00002.html
² http://lists.opensuse.org/opensuse-announce/2009-03/msg00019.html

1.2 Trust

Trust is a social phenomenon which is present in every existing society [Baier, 1986; Yamamoto, 1990]: when dealing with uncertainty, everybody makes trusting decisions, most of us every single day of our lives and many times per day [Luhmann, 1979]. In fact, societies rely heavily on trust among their members [Cook, 2001; Fukuyama, 1996; Uslaner, 2002], and it has even been argued that we as a human race would not be able to face the complexities of our environment without trusting other people to reason sensibly about the alternative possibilities of our everyday life [Luhmann, 1979]. Bok describes trust as ‘. . . a social good to be protected just as much as the air we breathe or the water we drink. When it is damaged, the community as a whole suffers; and when it is destroyed, societies falter and collapse.’

The concept of trust is a multi-disciplinary subject studied in diverse fields, amongst them evolutionary biology [Bateson, 1990], sociology [Luhmann, 1979, 1990], social psychology [Deutsch, 1962], political science, economics [Dasgupta, 1990], history [Gambetta, 1990b; Pagden, 1990], philosophy [Hertzberg, 1988; Lagenspetz, 1992; Wittgenstein, 1977] and computer science. The main impulse for work on trust has come from the areas of sociology, social psychology and philosophy [Marsh, 1994]. Due to the prominent role of social trust, it has become an important research topic in many subfields of computer science: trust is used as motivation for recommender systems and online interaction, for the content selection for Web documents [Blaze et al., 1997], as a descriptor of security and encryption, as a measure of quality in peer-to-peer systems, as a name for authentication methods or digital signatures, in medical systems [Blaze et al., 1996], telecomputing, mobile code, mobile computing [Feigenbaum and Lee, 1997; Wilhelm et al., 1998] and electronic commerce [Clark, 1999; Jøsang, 1999; Ketchpel and Garcia-Molina, 1996; Su, 1999], as a factor in game theory and as a model for agent interactions [Artz and Gil, 2007; Golbeck, 2006a].

1.2.1 Definitions of Trust

Unfortunately, as trust is a commonly used term, its definitions vary between disciplines, persons and specific contexts, with the result that trust relationships are never absolute [Deutsch, 1973; Shapiro, 1987]. The different views of trust, and the circumstance that all human beings have a different idea of what trust is [Golembiewski and McConkie, 1975], make it difficult to formalise the concept of trust. Trust, as an invisible and accepted common good that decays with misuse and grows with use [Marsh, 1994], is defined by the Oxford Reference Dictionary as ‘the firm belief in the reliability or truth or strength of an entity’. It is a complex composition of many different attributes, amongst them reliability, dependability, honesty, security, despair, confidence, hope, innocence or impulsiveness, competence, truthfulness and timeliness, which may have to be considered depending on the environment in which trust is specified [Deutsch, 1973; Golembiewski and McConkie, 1975; Grandison and Sloman, 2000; Marsh, 1994].

In general, decisions on trust depend on several different factors: the decision whether or not we deem a person to be trustworthy is influenced not only by past experience with the person or their friends but also by the reputation of the person, formed either directly through past personal experience or reported by others through recommendations or third-party verification. It is further influenced by our confidence in, or our evidence to believe in, the person's good intentions towards ourselves [Yamamoto, 1990] and by our opinion of the person's previous actions. The experience and opinion of others, rumours, our own predisposition to trust formed through previous experiences in our life, and the motives to profit by extending our trust play an important role as well [Golbeck, 2005].

Even if it is easy to recognise manifestations of trust, as we experience them every day, trust is hard to define because it comes in many different forms. As a result of the multifarious human understanding of trust, one of the major problems for the use of trust in mathematical analysis or algorithms is to find a definition of trust which captures all important social features on the one hand, whilst being simple, clear, focused and narrow enough to allow its modelling and computation in a quantifiable, generalisable way on the other. Thus, despite a common agreement on the importance of trust as well as on its nature, adequate definitions and conceptualisations are rare, vague and often not particularly useful, limiting their usage [Marsh, 1994].

In order to give a reference point, we present some general definitions derived from previous research. One of the most popular and widely accepted definitions of trust is that of Deutsch, who stated that trusting behaviour occurs when a person A needs to make a decision about an ambiguous path whose implications can be either good or bad. The impact of the bad outcome is the greater one, motivating A to make the correct choice, and the occurrence of the result is contingent on the action of another person B. If A chooses to go down the path, A has made a trusting choice, believing that B will ensure the good outcome. Deutsch's assumption that trust decisions are based on a form of cost-benefit analysis is also present in many other definitions of trust [Golembiewski and McConkie, 1975; Shapiro et al., 1992].

One form of trust is ‘reliability trust’. It is based on the concept of dependence on, and the reliability of, the trusted party. The definition by Gambetta gives a good example of it: ‘Trust is the subjective probability by which an individual, A, expects that another individual, B, performs a given action on which its welfare depends.’ Furthermore, Gambetta describes trust as a means of coping with the freedom of others: ‘Trust (or symmetrically distrust) is a particular level of the subjective probability with which an agent assesses that another agent or group of agents will perform a particular action, both before he can monitor such action (or independently of his capacity ever to be able to monitor it) and in a context in which it affects his own action.’

High trust in a person might not always be sufficient to enter a situation of dependence, especially in the case of danger, as Falcone and Castelfranchi realised: ‘For example it is possible that the value of the damage per se (in case of failure) is too high to choose a given decision branch, and this independently either from the probability of failure (even if it is very low) or from the possible payoff (even if it is very high). In other words, that danger might seem to the agent an intolerable risk.’ This statement leads us to another form of trust, ‘decision trust’, defined by McKnight and Chervany as follows: ‘Trust is the extent to which one party is willing to depend on something or somebody in a given situation with a feeling of relative security, even though negative consequences are possible.’

Using these concepts, Sztompka simplified Deutsch's general idea of trust in such a way that his definition only contains belief and commitment as key components: ‘Trust is a bet about the future contingent actions of others.’ A person A believes that the trusted person B will act in a certain way, and this belief is used as the foundation for a commitment to a particular action. In contrast to the definitions stated so far, Mui and Mohtashemi introduced the concept of personal trust: trust is ‘a subjective expectation an agent has about another's future behaviour based on the history of their encounters’. Instead of considering the actions themselves, Grandison and Sloman impose context in their definition of trust as ‘the firm belief in the competence of an entity to act dependably, securely, and reliably within a specified context’.

In our work, we will use Gambetta's definition to compute trust over social networks (graphs with people as nodes and trust relationships as edges), or across paths of trust where two people may not have direct trust information about each other and must rely on a third person.

1.2.2 Properties of Trust

For the development of algorithms to compute trust, there are three relevant functional properties: personalisation, transitivity and asymmetry [Golbeck, 2006a]. The last is important because two people who are involved in a relationship and who mutually trust each other do not necessarily express identical trust, as a result of their different psychological backgrounds and histories. As an example, consider the trust between a mother and a child: it is asymmetric as a result of the fact that the child is not capable of many tasks.

The second crucial property of trust is its personalisation: trust is inherently a subjective phenomenon, as it is the belief that the actions of the trusted person will lead to a good result. Since there is rarely an absolute truth about the trustworthiness of a single person, the values qualifying a good outcome may vary between people with different interests and priorities, and thus the trust two different people A and B express in the same person C may vary greatly.

Another relevant characteristic of trust is its transitivity. Often, trusting one piece of information or its source requires trusting another associated source. Transitivity is not only a feature of trust but also an important characteristic of social networks [Holland and Leinhardt, 1972; Rapoport, 1963]. Even if trust is not perfectly transitive in the mathematical sense (if person A trusts person B and B trusts person C, then person A trusts person C), trust can still be passed between people. For instance, it is a common procedure to ask friends for their opinion regarding a particular subject, because individuals are more willing to interact with friends of friends than with strangers [Heider, 1958]. Therefore, we can distinguish between the trust we place in a person and the trust we place in the person's recommendations. Despite this dichotomy, in most computations of trust it is preferable to let a single value represent both of these ideas, as the recommendation of the trustworthiness of an unknown person normally becomes a foundation for the belief about the actions of that person, thus leading to some amount of trust. As an example, consider the case where person A asks person B for a recommendation for a good mechanic to fix a broken car. If A believes B's recommendation of a person C, A will develop some trust in C, based on B's recommendation and the belief that C will take the necessary steps to fix the car, producing the desired outcome. The same argument can be used to construct longer chains of trust in which trust is passed along the trustworthy people. However, because trust is not perfectly transitive, we expect trust to degrade while propagating along the chain of acquaintances. We usually trust our own friends more than the friends of our friends, a phenomenon which has been widely studied in computer science [Gray et al., 2003; Guha et al., 2004; Jøsang, 1996; Jøsang and Kinateder, 2003; Richardson et al., 2003; Ziegler and Lausen, 2004].

1.2.3 Trust Metrics

As we saw in Section 1.2.2, trust transitivity is usually applied when there is no link between a pair of users. This means that no trust decision has yet been made and reputation is used as a measure of trust. The resulting web of trust allows a quantification of trust which is used to make estimations about the trust between any two entities. The quantification of this trust and the associated algorithms are called trust metrics [Artz and Gil, 2007].

The first applications of trust metrics date back to the nineties, when metrics for public key authentication [Beth et al., 1994; Levien and Aiken, 1998; Maurer, 1996; Reiter and Stubblebine, 1997a,b] were used to support the Public Key Infrastructure [Zimmermann, 1995]. Since then, new research areas such as peer-to-peer networks [Kamvar et al., 2003; Kinateder and Pearson, 2003; Kinateder and Rothermel, 2003], mobile computing [Eschenauer et al., 2002] and rating and reputation systems for online communities [Guha, 2003; Levien, 2004; Levien and Aiken, 1998] have raised the interest in trust computation. Pirzada and McDonald have presented a reputation-based system in which nodes in ad-hoc networks indirectly monitor the performance of other nodes nearby and can thus decide which nodes are trustworthy to route traffic. Similar to this idea, Dash et al. developed an application in which tasks are allocated to the agent performing best. Using statistics to determine reputation from past performances, Jøsang and Ismail presented a technique which combines reputation feedback data using a beta probability distribution [Artz and Gil, 2007].

1.3 Aim of the Study

Trust systems represent an important trend in decision support for services provided via the Internet. They enable users to rate the trustworthiness of other members of the same system. These collected trust statements can assist other participants in deciding whether or not to trust and interact with a certain party.

This thesis documents the first approach to modelling a social network between developers, packagers and users of open source software by embedding it into the openSUSE Build Service. This network consists of subjective trust ratings between these participants, forming the foundation for computing trust predictions for software packages created by some of them. These trust values should help users to find the right packages. However, the trustworthiness of remote entities is hard to assess, as computerised media remove most of the familiar styles of interaction discussed in Section 1.2.1. Some of the trust metrics proposed during the last decade are investigated to find the most suitable one for this purpose. In addition, the first definition of a contribution- and reputation-based trust rating for software packages is provided and a ready-to-use prototype is developed. An evaluation shows the performance of the proposed trust system as well as the correctness of its implementation. As another research contribution, the test scenario to evaluate and compare different trust metrics proposed by Massa and Souren is applied to the chosen algorithm [Massa and Souren, 2008]. The results of these tests can be incorporated into Trustlet.org³, an open platform for trust metrics research.

1.4 Outline of the Thesis

The remaining part of this bachelor thesis is structured as follows: Chapter 2 introduces typographical conventions and key terminology. Chapter 3 describes the state of the art in trust metrics by illustrating the five most recognised algorithms. Chapter 4 provides a brief introduction to the openSUSE Build Service, defines trust for software packages and illustrates the architecture of the trust system. Chapter 5 evaluates the described trust system using artificially generated networks as well as real-world data sets. Chapter 6 discusses the trust concept and its implementation as well as the validation results. Chapter 7 shows future directions and possible improvements.

³ http://www.trustlet.org

Chapter 2

Notation and Terminology

Notations and typographical conventions which are used in the later chapters are introduced in Section 2.1. Furthermore, this chapter contains a definition of the key terminology for describing graphs in Section 2.2.

2.1 Notation

Vectors are denoted by $\vec{v} \in \mathbb{R}^n$, where $\vec{v}$ is a vector in the vector space $\mathbb{R}^n$ with the components $\vec{v} = (v_1, v_2, \ldots, v_n)$. The length of a vector $\vec{v}$ is denoted by $\|\vec{v}\|$. $\vec{1}$ denotes a vector with every component equal to one.

Matrices are denoted by roman uppercase letters: $M \in M_{m \times n}(\mathbb{R})$ is a matrix with $m$ rows and $n$ columns, where the components are real numbers. $M_{ij}$ denotes the component in row $i$ and column $j$ of matrix $M$. The transpose of a matrix $M$ is denoted by $M^{\mathrm{T}}$.

Sets are denoted by uppercase letters: $S$ with the components $S = (s_1, s_2, \ldots, s_n)$. $S_1 \subseteq S_2$ denotes that $S_1$ is a subset of $S_2$. $S \setminus s$ denotes the set $S$ without the element $s$. $|S|$ denotes the number of elements in the set $S$.

Other variables and constants are denoted by lowercase letters: $x$. If a variable can change over time, the time is denoted by a superscript: $x^t$ is the value of $x$ at time $t$.

2.2 Terminology

A graph $G = (V, E)$ consists of a finite set of vertices (or nodes) $V$ and a set of edges $E$ connecting them. Graphs can be distinguished into directed graphs and undirected ones. A directed graph only contains directed edges, in which case an edge is a tuple $(v_x, v_y) \in V \times V$, whereas an undirected graph only contains undirected edges, in which case an edge is a set $\{v_x, v_y\} \in 2^V$. The underlying undirected graph of a directed graph is a graph $G' = (V, E')$ with

$$E' = \{\{v_x, v_y\} \mid (v_x, v_y) \in E \lor (v_y, v_x) \in E\}.$$

If there is a weight $w_e \in \mathbb{R}^+_0$ associated with every edge $e \in E$, then the graph is weighted.

For a set of nodes $V = \{v_1, v_2, \ldots, v_n\}$, the adjacency matrix $A \in M_{|V| \times |V|}(\mathbb{R})$ is defined as

$$A_{ij} = \begin{cases} 1 & \text{if } e_{ij} \in E, \\ 0 & \text{otherwise} \end{cases}$$

where $e_{ij}$ denotes the edge from node $v_i$ to node $v_j$. In the case of an undirected graph, the adjacency matrix is symmetric. For a weighted graph, the adjacency matrix is

$$A_{ij} = \begin{cases} c_{ij} & \text{if } e_{ij} \in E, \\ 0 & \text{otherwise} \end{cases}$$

where $c_{ij}$ is the weight associated with $e_{ij}$.

A path between two nodes $v_x$ and $v_y$ is defined as a sequence of nodes $v_0, v_1, \ldots, v_n$ with $v_0 = v_x$ and $v_n = v_y$ such that $(v_i, v_{i+1}) \in E$ holds for a directed graph (respectively $\{v_i, v_{i+1}\} \in E$ for an undirected graph) for $0 \le i < n$. The length of the path is $n$, and thus a trivial path from $v$ to $v$ with length 0 exists for every node in a graph. The distance of two nodes in a graph is the minimal length of a path connecting them.

The degree $\delta_v$ of a node $v$ in an undirected graph is the number of edges containing the node: $\delta_v = |\{e \in E \mid v \in e\}|$. For directed graphs, the indegree $\delta_v^-$, the number of edges terminating in node $v$, is distinguished from the outdegree $\delta_v^+$, the number of edges originating from node $v$:

$$\delta_v^- = |\{(v_x, v_y) \in E \mid v_y = v\}|, \qquad \delta_v^+ = |\{(v_x, v_y) \in E \mid v_x = v\}|.$$

The transitivity of a graph, also called the clustering coefficient $cc$, is measured as the fraction of triangles in the graph:

$$cc = \frac{3 \cdot t}{t_v} \qquad (2.1)$$

where $t$ is the number of triangles and $t_v$ is the number of connected triples of vertices in the graph. Thereby, a connected triple means a single node with edges running to an unordered pair of others.
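To make the terminology concrete, the following Python sketch computes node degrees and the clustering coefficient of Equation (2.1) directly from an adjacency matrix; the function names and the example graph are illustrative choices, not part of the thesis.

```python
# Minimal sketch: degrees and clustering coefficient (Equation 2.1)
# computed from an adjacency matrix, using only the definitions above.

def degrees(adj):
    """Indegree and outdegree of every node in a directed graph."""
    n = len(adj)
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    outdeg = [sum(adj[i][j] for j in range(n)) for i in range(n)]
    return indeg, outdeg

def clustering_coefficient(adj):
    """cc = 3 * (number of triangles) / (number of connected triples)
    for an undirected graph given as a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    triangles = 0
    triples = 0
    for v in range(n):
        neighbours = [u for u in range(n) if adj[v][u]]
        d = len(neighbours)
        triples += d * (d - 1) // 2  # unordered pairs of neighbours of v
        for i in range(d):
            for j in range(i + 1, d):
                if adj[neighbours[i]][neighbours[j]]:
                    triangles += 1
    triangles //= 3  # every triangle was counted once per corner
    return 3 * triangles / triples if triples else 0.0

# A triangle plus one pendant edge: cc = 3 * 1 / 5 = 0.6
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
print(clustering_coefficient(adj))
```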

Chapter 3

State of the Art

The abundance of information as well as global connectivity through the web have made social trust between individuals a precious good nowadays. Trust has a major impact on decisions whether to believe or disbelieve information asserted by other users, and belief should only be accorded to statements from people we deem trustworthy [Ziegler and Lausen, 2004]. In a social network, estimations of the trust a user, the source, has in another unknown user, the sink, are based on trust statements along the network paths connecting the two. If two individuals are directly connected in this network, they might have trust ratings for each other. Unfortunately, in huge networks trust judgements based only on personal experience become infeasible, and thus, if two peers are either not directly connected or do not have trust evaluations for each other, the recommendations of other users on the paths connecting them must be used to infer the trust they have in each other.

An easy and also tempting strategy would be to trust all people who are trusted by persons we trust. However, common sense tells us not to rely on this procedure, as trust would propagate through the network and be accorded whenever two individuals can reach each other via at least one trust path. More sophisticated trust metrics are needed to estimate the trust value the source should have in the sink. These metrics need to take not only social and psychological aspects into account but also computability and scalability, considering the size of most modern networks.

Unfortunately, inferred trust ratings will never be as accurate as direct ones, and thus it is important to understand how the trust values of intermediate users affect the accuracy, in order to improve inferred trust relationships within a network. In addition, it is necessary to understand how the length of the path connecting the source and the sink, determined by the number of edges the source must traverse before reaching the sink, affects the accuracy of inferring trust relationships. These problems have been approached in many different ways over the last years, and we will introduce some of the major contributions from the literature in Section 3.2. A classification of trust metrics and their properties can be found in Section 3.1.

Figure 3.1: Properties to classify trust metrics.

3.1 Classification of Trust Metrics

Trust metrics can be classified using several distinctive features (see Figure 3.1). Prior work mainly distinguished between local and global trust metrics (see Section 3.1.1) [Guha, 2003], between distributed and centralised computation (see Section 3.1.2), between scalar and group trust metrics (see Section 3.1.3) [Levien, 2004], and between trust metrics using a rank or a threshold as the criterion for selecting their results (see Section 3.1.4) [Maresch, 2005].

3.1.1 Network Perspective

In terms of network perspective, trust metrics can have a global or a local scope. The former take a global view of the world and try to make judgements based on incorporated global information such as the views of other users, previous history, and so on. They compute an overall reputation and are based on complete trust graph information, taking all peers and all trust links connecting them into account. In contrast, the latter allow for personal bias by taking the opinion of the source as an additional input parameter, and thus operate only on partial trust graph information.

In general, local trust metrics seem to exhibit a more natural point of view, as they are in better accordance with the subjective concept of trust, considering the possibility that the trust of one user in a person might be completely different from the trust of another user in exactly the same person. On the one hand, local trust metrics can be more precise through the incorporation of the personal views of the user; on the other hand, they are computationally more expensive, as trust must be calculated separately for each individual. In contrast, global metrics require just a single run for the entire community and are therefore best suited for centralised environments where a single trustworthy entity computes the trust for all instances [Massa and Avesani, 2005]. However, one of the most important differences is the fact that local trust metrics can be attack-resistant by excluding malicious users from the trust propagation, whereas malicious exploitation of links is an inherent and unavoidable problem for global metrics [Gori and Witten, 2005].

Probably the most famous example of a global trust metric is the PageRank algorithm (see Section 3.2.1) [Page et al., 1998], which is used by Google to compute web page reputation and from which many global trust metrics borrow ideas [Guha, 2003; Kamvar et al., 2003; Richardson et al., 2003]. Other well-known examples of global trust metrics are EigenTrust and Advogato, which are discussed in Sections 3.2.2 and 3.2.3, respectively. Local trust metrics comprise metrics for modelling the Public Key Infrastructure [Beth et al., 1994; Maurer, 1996; Reiter and Stubblebine, 1997b], Sun Microsystems' Poblano [Chen et al., 2001], Golbeck's metric for Semantic Web trust, TidalTrust (see Section 3.2.4) [Golbeck et al., 2003], and Appleseed (see Section 3.2.5) [Ziegler and Lausen, 2004].

Figure 3.2: Trust propagation in a scalar trust metric (left) and in a group trust metric (right).

3.1.2 Computation Locus

Another way to differentiate between trust metrics is the place where trust relationships between users are evaluated and quantified. Centralised techniques require full access to all trust information in order to perform all computations on a single machine, whereas distributed metrics spread the computational load equally over every trust node in the network. In the latter approach, upon receiving trust information from its predecessor nodes, a user merges the data with its own trust assertions and propagates the synthesised values to its successor nodes in the trust graph. This procedure is inherently global and leads to an asynchronous calculation of trust; its convergence depends on the eagerness of nodes to propagate information. Although the computational load is decreased in comparison to the centralised approach, the required space is increased, due to the fact that the nodes need to store trust information about every other node in the system.
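As a hedged illustration of the distributed approach, the sketch below shows a single merge-and-forward step as just described; the dictionary representation, the discount factor and all names are assumptions made for illustration and do not reproduce any particular metric.

```python
# Illustrative sketch (all names hypothetical): one step of distributed
# trust propagation. A node merges incoming trust values with its own
# assertions and would then forward the synthesis to its successors.

def merge_and_propagate(own_assertions, received, discount=0.5):
    """own_assertions / received: dicts mapping node -> trust value.
    Incoming values are discounted before merging."""
    merged = dict(own_assertions)
    for node, value in received.items():
        candidate = discount * value
        merged[node] = max(merged.get(node, 0.0), candidate)
    return merged  # in a real system this would be sent onwards

print(merge_and_propagate({"b": 0.9}, {"c": 0.8, "b": 0.4}))
# {'b': 0.9, 'c': 0.4}
```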

3.1.3 Link Evaluation

The third classification considers the evaluation of links in trust metrics. Scalar trust metrics analyse trust assertions independently by tracking trust paths from sources to targets. In contrast, group trust metrics evaluate groups of trust assertions in parallel, ‘in tandem’, with the result that the relationship of one entity to another depends on the relationships all other group members have to each other. As a consequence of their different functional design (see Figure 3.2), scalar metrics are inherently local, due to the fact that they compute trust between two users, whereas group trust metrics generally calculate trust ranks for sets of individuals. A good example of a group trust metric is PageRank (see Section 3.2.1) [Page et al., 1998], which computes the reputation of one page by taking the rank of all referring pages into account, thus entailing the parallel evaluation of relevant nodes due to mutual dependencies, whereas TidalTrust (see Section 3.2.4) [Golbeck et al., 2003] serves as an example of a scalar trust metric.

Criteria              Advogato        Appleseed       EigenTrust    TidalTrust      PageRank
Perspective           global          local           global        local           global
Computation Locus     centralised     centralised     distributed   centralised     centralised
Link Evaluation       group           group           group         scalar          group
Selection Method      threshold       rank            rank          threshold       rank
Graph Properties      social network  social network  —             social network  closed graph
Evaluation based on   direct ratings  direct ratings  feedback      direct ratings  transactions

Table 3.1: Overview of the distinctive features of the trust metrics introduced in Section 3.2.

3.1.4 Selection Method

To support a user in deciding who is trustworthy and who is not, a trust metric should present only the most important results, in order not to overwhelm the user with too much information, as most networks are composed of a large number of participants. In general, trust metrics can be divided into two groups: trust metrics ranking their results and those using a threshold above which results are classified as important. The former evaluate an entity relative to all its competitors. As long as there are some entities possessing the desired feature, they always offer a solution. However, due to the fact that the results are only ranked against all users and no threshold is used, they cannot guarantee the quality of the obtained results. On the other hand, trust metrics using a threshold cannot guarantee to return results, but when they do, the user can be sure that the selected entities exhibit a minimum level of trust. Rank-based metrics are also always group metrics, as a rank only has a meaning in relation to other ranks of the same kind [Maresch, 2005]. In contrast, threshold-based metrics evaluate each entity individually. Appleseed (see Section 3.2.5) [Ziegler and Lausen, 2004] is an example of a rank-based trust metric, as it evaluates a group of users in a local environment before ordering them by their trustworthiness. In contrast, TidalTrust (see Section 3.2.4) [Golbeck et al., 2003] uses ratings between the participants as an alternative to non-existing direct evaluations, and a threshold given by the trust source is taken as the criterion for trustworthiness.
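The following Python sketch contrasts the two selection methods on a set of already computed trust values; the entity names, trust values and cut-offs are hypothetical.

```python
# Illustrative sketch (not from the thesis): the two selection methods
# applied to hypothetical, already-computed trust values.

trust = {"alice": 0.92, "bob": 0.35, "carol": 0.71, "dave": 0.10}

# Rank-based selection: always returns the k best entities, but gives
# no guarantee about their absolute quality.
def select_by_rank(trust, k):
    return sorted(trust, key=trust.get, reverse=True)[:k]

# Threshold-based selection: may return nothing, but every returned
# entity is guaranteed to exhibit at least the minimum trust.
def select_by_threshold(trust, minimum):
    return [who for who, value in trust.items() if value >= minimum]

print(select_by_rank(trust, 2))          # ['alice', 'carol']
print(select_by_threshold(trust, 0.5))   # ['alice', 'carol']
print(select_by_threshold(trust, 0.95))  # [] -- no guarantee of results
```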

3.2 Trust Metrics

After introducing the characteristic properties used to classify trust metrics, this section describes some algorithms which have had a major impact on the computation of trust during the last years. These methods show different approaches to solving the problem of trust calculation, and thus they serve as good examples of techniques with local or global network perspectives, distributed or centralised computation loci, group or scalar link evaluation, and rank- or threshold-based selection methods. A comparison of the main distinctive features of the trust metrics used in the discussed algorithms can be found in Table 3.1.

Figure 3.3: To illustrate the simplified PageRank calculation, four web pages with ranks of 30, 80, 50 and 74, respectively, are depicted together with their incoming and outgoing links. Note that the forward links of each page contribute evenly to the ranks of the pages they point to.

3.2.1 PageRank

With the goal of helping search engines as well as users to quickly make sense of the vast heterogeneity of the World Wide Web, the PageRank algorithm, a method for computing a global ‘importance’ ranking of every web page based on the graph of the web, was published in 1998 by Page et al. Many previously implemented web search engines were based on the idea that highly linked pages are more ‘important’ than pages with only a few links, and thus a back link count was used as a way to identify ‘important’ or ‘high quality’ web pages. Unfortunately, simple citation counting does not always correspond to our common sense notion of importance. As an example, consider a web page with a single link from the openSUSE project home page. It is only a single link, but an important one, and thus the page should be ranked higher than web pages with many more links from obscure places. PageRank takes advantage of this observation and assigns a web page a high rank when the sum of the ranks of its back links is high. A simplified version of PageRank defines a simple ranking $r$ as

$$r(u) = c \sum_{v \in B_u} \frac{r(v)}{\delta_v^+} \qquad (3.1)$$

where $u$ is a web page, $F_u$ is the set of pages $u$ points to and $B_u$ is the set of pages that point to $u$. $\delta_u^+ = |F_u|$ is the number of links from $u$, and $c$ is a factor used for normalisation so that the total rank of all web pages is constant. Note that the rank of a page is evenly divided among its forward links to contribute to the ranks of the pages they point to, and that $c < 1$, as there are a number of pages without any forward link, whose weight is lost from the system. An example for a set of pages is given in Figure 3.3.

Unfortunately, there is a problem with this simplified version of the algorithm: consider two web pages that point only at each other but at no other page, and a third page pointing at one of them. During iteration, this loop will accumulate rank but never distribute any, forming a trap called a rank sink. To overcome this difficulty, the authors introduced a rank source $\vec{e}$, a vector over the web pages, and defined the PageRank of a set of web pages as an assignment $\vec{r}'$ to the web pages which satisfies

$$r'(u) = c \sum_{v \in B_u} \frac{r'(v)}{\delta_v^+} + c\,e(u) \qquad (3.2)$$

such that $c$ is maximised and $\|\vec{r}'\|_1 = 1$.

Another way to present the algorithm makes use of an eigenvector formulation: the adjacency matrix $A$ is defined as a square matrix with rows and columns corresponding to web pages, where $A_{i,j} = 1/\delta_i^+$, with $\delta_i^+$ the outdegree of node $i$, if there is an edge from $i$ to $j$, and $A_{i,j} = 0$ otherwise. Treating $\vec{r}'$ as a vector over web pages, it is defined by $\vec{r}' = c\,(A\vec{r}' + \vec{e})$, which can be rewritten as $\vec{r}' = c\,(A + \vec{e} \times \vec{1})\,\vec{r}'$, where $\vec{1}$ is the vector of all ones, since $\|\vec{r}'\|_1 = 1$. As $c$ is maximised, $\vec{r}'$ is an eigenvector of $A + \vec{e} \times \vec{1}$ with eigenvalue $c$. The algorithm can be found in Appendix C.1.

Intuitively, the PageRank algorithm can be thought of as a random walk on graphs modelling a ‘random surfer’ to approximate the overall importance of web pages. The surfer randomly clicks on successive links: each walk starts by choosing a web page, modelled as a node $u$, using an initial seed probability distribution $e(u)$. He then follows links randomly until he stops: at each step, the walk ends with probability $\|\vec{e}\|_1$, informally known as the ‘Manhattan distance’

$$\|\vec{e}\|_1 = \sum_u |e(u)|, \qquad (3.3)$$

or otherwise the next node is chosen uniformly from the successors of the current node. The rank assigned to each node $u$ is the probability that a random walk will end in this node, given by $r'(u)$. This iterative procedure takes $O(|E|)$ time per iteration. For most web graphs it converges in $O(\log |V|)$ iterations, and thus a useful approximation of PageRank requires $O(|E| \log |V|)$ time.

One problem of PageRank is the handling of dangling links, which point to pages without any outgoing link. They are removed from the system while PageRank is calculated, because there is a large number of them (for example pages which have not been downloaded yet, as it is hard to sample the entire web at once). Even though they do not affect the ranking of any other page directly, the normalisation of the other links on the same page as the removed link will change. However, the authors suggest that the effect of dangling links on the whole system should be small.

The PageRank algorithm is used by Google's search engine, which treats each web page as a node and each link as a certificate, to estimate the relative overall importance of a web page. Google uses the PageRank information, together with other factors such as standard IR measures, proximity and anchor text (the text of links pointing to web pages), to rank search results, attempting to present the most useful pages first [Page et al., 1998; Ruderman, 2004].
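As a minimal sketch of the iteration, the following Python code applies a common damped reformulation of Equation (3.2), with the factor c fixed rather than maximised and dangling pages simply skipped; the example graph and the value c = 0.85 are illustrative assumptions, not taken from Page et al.

```python
# Sketch of a damped PageRank iteration in the spirit of Equation (3.2).
# links[u] is the list of pages u points to. The graph and c = 0.85
# are illustrative only; this is not the published algorithm verbatim.

def pagerank(links, c=0.85, iterations=50):
    nodes = list(links)
    e = {u: (1 - c) / len(nodes) for u in nodes}   # uniform rank source
    r = {u: 1.0 / len(nodes) for u in nodes}       # initial assignment
    for _ in range(iterations):
        nxt = {u: e[u] for u in nodes}
        for u in nodes:
            if links[u]:                           # skip dangling pages
                share = c * r[u] / len(links[u])   # rank split evenly
                for v in links[u]:
                    nxt[v] += share
        r = nxt
    return r

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, rank in sorted(pagerank(links).items()):
    print(page, round(rank, 3))
```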

3.2.2 EigenTrust

EigenTrust [Kamvar et al., 2003] uses a variation of the PageRank algorithm to calculate trust values. It was designed for peer-to-peer (P2P) systems, with the objective of reducing the number of inauthentic files distributed by malicious peers, rather than for human social networks. There is a fundamental difference between trust in P2P networks, where it is based on the reliability with which a node adheres to correct parameters and is seen as a measure of performance, and trust in social networks. As a result, trust in P2P networks can be seen as an absolute truth: either a file is corrupt or it is not, so a single peer's performance does not differ much from one requesting peer to another. As a P2P network generally does not have a peer providing both corrupt and intact versions of the same element to different requesting peers, each peer can expect to have the same experience as every other one. This minimises the requirement for a trust rating personalised for each node. The information provided by one peer reflects the truth for all peers, which is beneficial for trust calculations. In a social network, on the other hand, there is no absolute truth: two individuals' opinions about the trustworthiness of the same person can differ dramatically or can depend on the topic (for example religion or politics, to name some extremes).

Even though previous work on P2P reputation systems exists [Aberer and Despotovic, 2001; Cornelli et al., 2002], most approaches suffer from the drawback that they either aggregate the ratings of only a few peers, and therefore do not obtain a complete view of the reputation of a peer, or they congest the network by asking for a local trust value of a peer at every single query. In contrast to these methods, EigenTrust aggregates the trust values of all agents in a natural manner while avoiding excessive system messages: to determine the global trust values of all peers in the network, each peer maintains a local trust rating for each peer with whom it has interacted previously (termed acquaintances). For example, each time a peer $i$ downloads a file from peer $j$, it can rate the transaction as positive, $tr_{ij} = 1$, or, in case the downloaded file is inauthentic or tampered with, or the download is interrupted, as negative, $tr_{ij} = -1$. The local trust value $s_{ij}$ is then computed as the sum of the ratings of the individual transactions that peer $i$ has downloaded from peer $j$:

$$s_{ij} = \sum tr_{ij} \qquad (3.4)$$

To avoid the problem of malicious peers subverting the system by assigning arbitrarily high local trust values to other malicious peers and arbitrarily low ones to good peers, the aggregated local trust values require normalisation to ensure that all values lie between 0 and 1. The normalised local trust value is defined as

$$c_{ij} = \frac{\max(s_{ij}, 0)}{\sum_j \max(s_{ij}, 0)}. \qquad (3.5)$$

In the next step, the peer determines the transitive trust it holds in those peers which are trusted by its acquaintances, using the weighted local trust values it has for its acquaintances together with the local trust values held by its acquaintances for the other peers:

$$t_{ik} = \sum_j c_{ij} c_{jk}. \qquad (3.6)$$

The main idea behind the notion of transitive trust is that a peer will have a high opinion of those peers who have provided authentic files, and moreover, this peer is more likely to trust the opinions of those peers, since peers who are honest about the files they provide are also likely to be honest in reporting their local trust values. The described process results in a matrix of trust values: if $C$ is defined to be the matrix $[c_{ij}]$ and $\vec{t_i}$ is defined to be a vector containing the values $t_{ik}$, then $\vec{t_i} = C^T \vec{c_i}$. In this manner, each peer has a view of the network which is wider than its own experience but still reflects the experience the peer has with its acquaintances. If this process is carried out iteratively, the matrix converges, resulting in a network-wide, global view of the trust values of the peers in the network. The global trust values correspond to the left principal eigenvector of this matrix, meaning that $\vec{t}$ represents the global trust vector and its elements, $t_j$, quantify how much trust the system as a whole places in peer $j$. The algorithm is given in Appendix C.2.1.

There are three practical issues which are not addressed in the simple algorithm so far: the a priori notion of trust, inactive peers, and malicious collectives. The first is important, as there are usually some peers in the network that are known to be trustworthy (such as the operators of the P2P network, or the early users, as they are often less likely to have intentions to destroy the network they built). For this reason, the authors of EigenTrust defined a distribution $\vec{p}$ over pre-trusted peers, defining $p_i = 1/|P|$ if $i \in P$, where $P$ is the set of peers known to be trusted, and $p_i = 0$ otherwise. This definition is also used in the case of inactive peers (for instance a peer $i$ that does not download from anybody else, or which assigns a zero score to all other peers), in which case $c_{ij}$ is redefined as:

c_{ij} = \begin{cases} \frac{\max(s_{ij}, 0)}{\sum_j \max(s_{ij}, 0)} & \text{if } \sum_j \max(s_{ij}, 0) \neq 0, \\ p_j & \text{otherwise.} \end{cases}    (3.7)

The third application of the distribution \vec{p} is in addressing the problem of the formation of malicious collectives in P2P networks. Such collectives are broken up by placing at least some trust in the peers P (which are not part of the collective) for every peer, by taking

\vec{t}^{(k+1)} = (1 − a) C^T \vec{t}^{(k)} + a \vec{p},    (3.8)

where a is some constant less than 1. The algorithm is described in Appendix C.2.2. In contrast to the above described BasicEigenTrust algorithm, in which a central server knows all c_{ij} values and performs all computations, the DistributedEigenTrust algorithm (given in Appendix C.2.3) lets all peers in the network cooperate to calculate and store the global trust vector. Thereby, each peer stores its local trust vector \vec{c_i} as well as its own global trust value t_i, which is computed as follows:

t_i^{(k+1)} = (1 − a)(c_{1i} t_1^{(k)} + \cdots + c_{ni} t_n^{(k)}) + a p_i.    (3.9)

This is the component-wise version of \vec{t}^{(k+1)} = (1 − a) C^T \vec{t}^{(k)} + a \vec{p}. Since only the pre-trusted peers need to know their p_i, the pre-trusted peers remain anonymous. Another benefit is that in most P2P networks each peer has limited interaction with other peers, meaning that the computation t_i^{(k+1)} = (1 − a)(c_{1i} t_1^{(k)} + \cdots + c_{ni} t_n^{(k)}) + a p_i is not intensive, since most c_{ij} are zero. Furthermore, because the set of peers which have downloaded files from peer i and the set of peers from which peer i has downloaded files are small, the number of messages is small as well. In the rare case of a network consisting only of heavily active peers, EigenTrust can enforce these benefits by limiting the number of local trust values that each peer can report.

However, the main disadvantage of this procedure arises from the fact that each peer computes and reports its own trust value t_i, and thus malicious peers can easily report false trust values to subvert the system. To ensure that malicious peers can neither report incorrect trust values nor manipulate their own trust values, Kamvar et al. proposed another version of the EigenTrust algorithm, called SecureEigenTrust, in which the trust value of each peer is computed and stored by several other peers in the network, so-called score managers. If a peer needs a trust value, it can query all score managers for it, and a majority vote on the trust value settles conflicts that arise when malicious peers among the score managers present faulty trust values instead of the correct one. Unfortunately, one problem still remains: due to the fact that the algorithm is based on finding the principal eigenvector, trust must first be normalised in order to be able to work with the matrix. Thus, the normalised trust value from a person who has issued many trust ratings will be lower than if only one or two people had been rated. This is in strong contrast to the social point of view of trust as an infinite resource: it is possible to have very high trust in a large number of people without this trust being any weaker than the trust held by a person who only trusts a few people.

Many extensions and applications of the original EigenTrust algorithm can be found in the literature: Zhou and Hwang applied the distributed version of EigenTrust in their implementation PowerTrust to the online auction system eBay. In eBay, sellers and buyers can rate each other after each transaction, and the overall reputation of a user is the sum of these ratings, which is stored and managed on a centralised system [Kamvar et al., 2003]. PowerTrust calculates global trust values in exactly the same manner as the DistributedEigenTrust algorithm, with the exception that nodes which enjoy very high feedback in eBay, so-called power nodes, are used as pre-trusted peers. Keeping in mind that the pre-trusted peers are essential for the EigenTrust algorithm, guaranteeing its convergence and the breaking of malicious collectives, and that their choice is thus crucial, Chirita et al. proposed an extension of the DistributedEigenTrust algorithm to compute personalised trust values. In their method, each peer is able to choose its own set of pre-trusted peers (usually the peers which the source peer trusts most) and carries out the trust calculation relative to its selected set to obtain a personalised trust assessment. Another example of an extension comes from Abrams et al., who modified EigenTrust such that peers have no incentive to propagate false recommendations about other peers in order to increase their own trust value.
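For illustration, the basic centralised iteration of Equation 3.8 can be sketched in a few lines of Python (a minimal sketch using NumPy; the matrix C, the pre-trust vector p and the constant a = 0.15 are illustrative choices, not values prescribed by the EigenTrust authors):

import numpy as np

def basic_eigentrust(C, p, a=0.15, eps=1e-6, max_iter=1000):
    # Iterate t = (1 - a) * C^T t + a * p until the change is below eps.
    # C holds the normalised local trust values c_ij (rows sum to 1),
    # p is the pre-trust distribution over peers.
    t = p.copy()                            # start from the pre-trust vector
    for _ in range(max_iter):
        t_next = (1 - a) * C.T @ t + a * p
        if np.linalg.norm(t_next - t, ord=1) < eps:
            break
        t = t_next
    return t_next

# Toy network of three peers; peer 0 is the only pre-trusted peer.
C = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
p = np.array([1.0, 0.0, 0.0])
print(basic_eigentrust(C, p))

Since C is row-stochastic, the resulting global trust vector again sums to one; in the toy network above, the pre-trusted peer 0 receives the largest share.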

3.2.3 Advogato

In 1998, Levien and Aiken created the Advogato website (http://advogato.org), which serves as a community discussion board and resource for free software developers as well as a testbed for their research on trust metrics. Levien further proposed the Advogato trust metric to explore which users in an online community are trusted by its members. The trust metric computes a global reputation for individuals in the network, just as PageRank or EigenTrust do, to evaluate a set of peer certificates. These certificates are used to control the access to post and edit website information. They are represented as a graph and used with the goal of accepting as many valid accounts as possible whilst reducing the impact of attackers.

Figure 3.4: The conversion of the graph splits the two nodes A and B, with the capacities 9 and 3, into two nodes each, A− and A+ and B− and B+, respectively, and introduces a new node, the supersink, serving as a single sink for the network flow algorithm, to which a unit capacity edge is added from both A− and B−. Two edges with the capacities 9 − 1 = 8 and 3 − 1 = 2 are added from A− to A+ and from B− to B+, respectively, as well as an edge with infinite capacity from A+ to B−. This modified graph fulfils the single source, single sink criteria of most standard network flow algorithms.

The mapping of certificates onto a graph depends on the certification level l: in the graph, each account corresponds to a node, and a directed edge exists from node s to node t when account s has certified account t at level l or higher. The algorithm performs certification to three different levels, Apprentice, Journeyer, and Master, by running the trust metric three times, using the 'level' value in the certificate as the threshold. Advogato takes a number of members to trust and a set of authoritative nodes in the network, the trust seed, with predefined edges to accounts, as input to calculate the trust metric and to determine the trust level of a person, and thus their membership within a group, as well as the members of the network trusted by those who are part of the trust seed. For this purpose, Advogato assigns a capacity, defined as a function of the shortest distance from the seed to a node, to each node in the graph, using a breadth-first search starting from the seed. Nodes closer to the root have a high capacity, which diminishes with distance. The currently used values are:

cap(0) = 800, cap(1) = 200, cap(2) = 200, cap(3) = 50, cap(4) = 12, cap(5) = 4, cap(6) = 2, cap(i) = 1 ∀ i > 6.

As described before, the core of the trust metric is a network flow operation with a single source, multiple sink problem. Due to the fact that standard network flow algorithms are

specified as a single source, single sink problem with capacity constraints on edges rather than on the nodes, Levien and Aiken modified the graph by adding a new node, called the supersink, which serves as a single sink for the network flow algorithm. For this purpose, each node x is split into two nodes, x− and x+, and a unit capacity edge from x− to the supersink is added. Furthermore, for each node x with capacity c, an edge is added from x− to x+ with capacity c − 1, and for each edge from s to t in the original graph, an infinite capacity edge from s+ to t− is added in the new graph. The pseudo code for the transformation step is given in Appendix C.3. An example of such a conversion can be found in Figure 3.4. Using this conversion, the calculation of the network flow is fairly straightforward, using a standard algorithm to compute a maximum flow from the seed to the supersink, such as the Ford-Fulkerson algorithm [Ford and Fulkerson, 1956]. Once the network flow has been calculated, the metric certifies each node x for which there is flow from x− to the sink. Since any node with flow from x− to x+ also has flow from x− to the sink, any node through which trust flows is itself certified [Ruderman, 2004]. An example is given in Figure 3.5. Due to the fact that the Ford-Fulkerson algorithm picks the shortest augmenting path from the source, any node with a flow from x− to x+ also has a flow from x− to the sink. Its complexity is O(|f*| · |E|), where f* denotes the maximum flow. In this graph, |f*| is simply the number of nodes accepted, so the algorithm takes O(|V| · |E|).

Even in a rich web of interconnections where the flow reaches almost every node, the Advogato trust metric has the favourable property of a low probability of failure in the face of a sufficiently massive attack: it only accepts a few accounts from a malicious clique of accounts as long as there are only a few certificates from the 'good' accounts to the bogus ones. Those certificates represent a bottleneck in the network flow, and by identifying individual 'bad' nodes as well as any nodes that certify them, the metric is able to cut out an unreliable portion of the network. Therefore, computations are primarily based on 'good' nodes, the number of bad nodes accepted scales only linearly, and the network as a whole remains secure [Levien and Aiken, 1998]. However, a remaining problem is that the impact of a confused node, meaning one that behaved well but has issued certifications to bad nodes, increases as it gets closer to the source node [Ruderman, 2004]. Although some members of the network enjoy supreme trust and, in general, trust is computed by a centralised community server relative to their recommendations, the metric is not only applicable to this particular scenario. In the same way, an individual user can act as a trust seed, thus converting the global Advogato trust metric to a local one and calculating his own personalised trusted peers. Nevertheless, a remaining disadvantage of Advogato is that it only supplies a binary value for trustworthiness and does not provide any support for weighted trust relationships.
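The graph transformation itself is compact enough to sketch in Python (a minimal sketch with hypothetical data structures; a standard maximum flow routine such as Ford-Fulkerson would then be run on the returned capacity map):

import math

def capacity(distance):
    # Advogato's capacity as a function of the BFS distance from the seed.
    caps = {0: 800, 1: 200, 2: 200, 3: 50, 4: 12, 5: 4, 6: 2}
    return caps.get(distance, 1)

def transform(nodes, edges, dist):
    # Split each node x into x- and x+ and add the supersink.
    # nodes: node names; edges: (s, t) certification pairs;
    # dist: mapping node -> shortest distance from the trust seed.
    cap = {}
    for x in nodes:
        c = capacity(dist[x])
        cap[(f"{x}-", "supersink")] = 1      # unit capacity edge to the single sink
        cap[(f"{x}-", f"{x}+")] = c - 1      # node capacity minus the sink unit
    for s, t in edges:
        cap[(f"{s}+", f"{t}-")] = math.inf   # certification edges are unbounded
    return cap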

3.2.4 TidalTrust

In 2005, Golbeck developed the trust metric TidalTrust for social networks, incorporating the idea that the most accurate information comes from the highest trusted neighbours on the shortest paths [Golbeck, 2005, 2006b; Ziegler and Golbeck, 2007]. As shorter paths lead to more accurate information, TidalTrust restricts the depth of the search to lower the error rate. At the same time, through the imposed limitation, fewer nodes will be accessible.

Figure 3.5: The figure shows an example of a graph after the network flow has been calculated. A capacity has been assigned to each node, with values decreasing from the seed with the highest capacity of 20 to the nodes further away from it. The maximal flow into each node x− is given as the value noted in it, and a set of servers (all nodes except the dotted one) is chosen with a flow from x− to the supersink.


Figure 3.6: The trust threshold is determined by the trust ratings given on the edges of the graph. Each node stores the maximum trust strength on the path leading to it. Due to the fact that both nodes adjacent to the sink have a maximum trust rating of 9, 9 becomes the new trust threshold. The bold edges indicate the two paths which will be used in the calculation, as all their trust ratings on the edges are at or above the maximal threshold.

To balance the two factors and to preserve the benefits of a shorter path length without limiting the number of inferences that can be made, the shortest path length required to connect the source with the sink becomes the depth, and thus the depth varies from one calculation to another. Furthermore, to account for the fact that direct neighbours are more trusted than indirect ones, TidalTrust establishes a minimum trust threshold and only considers connections in the network with trust ratings at or above this threshold. Due to the fact that the highest trust value along all possible paths is not known before the search, this value cannot be fixed; it is calculated whilst searching for paths from the source to the sink by tracking the trust values seen. It represents the largest trust value, max, that can be used as a minimum threshold such that a path can be found from the source to the sink.

Incorporating these elements, TidalTrust computes the trust t_is of a source i in a sink s as:

t_{is} = \frac{\sum_{j \in adj(i) \mid t_{ij} \geq max} t_{ij} \, t_{js}}{\sum_{j \in adj(i) \mid t_{ij} \geq max} t_{ij}}    (3.10)

The search for the sink starts with the source asking each of its neighbours, adj, to obtain a rating of the sink. In the ideal case, the polled neighbour has a direct trust

rating of the sink and returns this value. In all other cases, it recursively queries all of its neighbours to obtain a trust statement for the sink. During this procedure, each reached node keeps track of both the current depth from the source and the strength of the path to it. The strength of a path to each neighbour is defined as the minimum of the rating of the node by the source and the node's rating of its neighbour; thus, nodes adjacent to the source will record the rating assigned to them by the source. Each neighbour records the maximum strength path leading to it. Once a path is found from the source to the sink, its depth is set as the maximum allowable depth. Since the algorithm implements a breadth-first search, the first path found will be at the minimum depth. When this search is complete, the trust threshold, max, is established by taking the maximum strength of the trust paths leading to the sink, and each node can complete the calculation of a weighted average, taking into account the information from the nodes that it has rated at or above the max threshold. The algorithm outline can be found in Appendix C.4 and an example is given in Figure 3.6. TidalTrust runs in time linear, O(V + E), in the size of the adjacency list representation of the network: O(V) is needed to visit every node and at most O(E) time is required to scan the adjacent ones and to make the calculations for the path length, which run in O(1) time [Golbeck, 2005]. The personal aspect of trust incorporated in TidalTrust also implies a restriction: there is no 'correct' or 'incorrect' trust value except when considered from the perspective of a given individual, and thus the metric does not translate directly into systems with an absolute truth [Golbeck, 2006a]. The information available in a social network is not sufficient to let a user know how much to trust another one: the correctness of the data, which can be determined without anyone's opinion, is important as well. TidalTrust was implemented in a real-world project, FilmTrust1 [Golbeck, 2006b], a movie recommender system in which trust is used to filter, aggregate and sort information. Users can write reviews of movies and rate them, as well as the trustworthiness of their friends in the social network, to gain personalised views of the movie pages, displaying personalised recommended movie ratings ordered by relevance. Another application of TidalTrust is TrustMail, an email client. It uses the trust rating of each sender as a score for the message, which is then used to sort messages [Golbeck, 2005].
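A much simplified Python sketch of this procedure may clarify how the depth limit, the path strengths and the threshold max interact (illustrative only, not Golbeck's reference implementation; the pseudo code is given in Appendix C.4):

from collections import deque

def tidaltrust(graph, source, sink):
    # graph: dict node -> dict of neighbour -> trust rating.
    if sink in graph.get(source, {}):
        return graph[source][sink]
    # Breadth-first search: record each node's depth and the maximum
    # strength (minimum rating along a path) of the paths reaching it.
    depth, strength = {source: 0}, {source: float("inf")}
    max_depth = None
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if max_depth is not None and depth[node] >= max_depth:
            continue
        for nbr, rating in graph.get(node, {}).items():
            s = min(strength[node], rating)
            if nbr not in depth:
                depth[nbr], strength[nbr] = depth[node] + 1, s
                if nbr == sink:
                    max_depth = depth[nbr]       # first path fixes the depth
                else:
                    queue.append(nbr)
            else:
                strength[nbr] = max(strength[nbr], s)
    if sink not in depth:
        return None                              # sink unreachable
    threshold = strength[sink]                   # the trust threshold 'max'

    def rate(node, d):
        # Weighted average over neighbours rated at or above the threshold.
        if d >= max_depth:
            return None
        if sink in graph.get(node, {}):
            return graph[node][sink]
        num = den = 0.0
        for nbr, rating in graph.get(node, {}).items():
            if rating >= threshold:
                r = rate(nbr, d + 1)
                if r is not None:
                    num += rating * r
                    den += rating
        return num / den if den else None

    return rate(source, 0)

g = {"S": {"A": 10, "B": 8}, "A": {"D": 9}, "B": {"D": 10}}
print(tidaltrust(g, "S", "D"))   # 9.0: only the stronger path via A is used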

3.2.5 Appleseed

In 2004, Ziegler and Lausen developed Appleseed, a local group trust metric using concepts from spreading activation models in psychology to evaluate trust in the Semantic Web. The main constituents of their model for a Semantic Web trust infrastructure are an agent set V = {a_1, ..., a_n}, where every agent a ∈ V is assumed to be uniquely identifiable, and a set of partial trust functions T = {W_{a_1}, ..., W_{a_n}}, which is publicly accessible for any agent in the system. In the latter, every agent a is associated with one partial trust function W_a : V → [0,1]^⊥ corresponding to the set of trust associations that a has stated. Due to the fact that the number of individuals for whom agent a has assigned explicit trust values is much smaller than the

1 http://trust.mindswap.org/FilmTrust

Figure 3.7: Node chains. Figure 3.8: Rank sinks.

Figure 3.9: Normalisation issue. Figure 3.10: Backward propagation.

overall number of agents in the system, these functions will be sparse:

w_{a_i}(a_j) = \begin{cases} p & \text{if } trust(a_i, a_j) = p, \\ \bot & \text{if no rating for } a_j \text{ from } a_i. \end{cases}    (3.11)

The higher the value of w_{a_i}(a_j), the more trustworthy a_i deems a_j; a value of 0, on the other hand, means that a_i considers a_j not trustworthy at all. The two constituents of the trust model together define a directed trust graph, with nodes represented by agents a ∈ V and directed edges (a_i, a_j) ∈ E ⊆ V × V from nodes a_i to nodes a_j representing trust statements with weight w_{a_i}(a_j). To start a search, the source node s is activated through an injection of energy e. To propagate e to other nodes along the edges, e is completely divided among the successor nodes with respect to their normalised local edge weights. To avoid endless, marginal and negligible flow, the energy streaming into a node must exceed a threshold t in order not to run dry. Interpreting this basic intuition behind spreading activation models in terms of trust computation gives rise to some problems: all energy that has passed through a node x is accumulated, representing its rank, but at the same time all energy contributing to the rank of x is passed without loss to its successor nodes. If we applied spreading activation to the graph depicted in Figure 3.7, the computed trust ranks of b and d would be identical. However, common sense tells us that trust decays with the distance to the source of the trust, as people tend to trust individuals trusted by their own friends more than individuals trusted only by friends of friends [Guha, 2003; Jøsang and Kinateder, 2003]; thus, d should be regarded as less trustworthy than b. An even more serious problem arises if the energy (or trust, respectively) becomes trapped in a cycle (see Figure 3.8). This energy will never be accorded to any nodes outside the ones belonging to the cycle, and thus the latter nodes will eventually acquire infinite trust rank.

The developers of Appleseed handle the two problems described above by introducing a global spreading factor d, denoting the portion of energy that a node distributes among its successors whilst retaining a fraction of 1 − d for itself. d may also be seen as the ratio between the direct trust in a node and the trust in the ability of this node to recommend others as trustworthy peers. Low values of d favour trust proximity to the source of the trust injection, whereas high values allow trust to also reach nodes which are further away. A common practice in the computation of trust metrics is the application of edge weight normalisation [Guha, 2003; Kamvar et al., 2003; Page et al., 1998], so that the quantity of energy distributed from a node x to its successor y depends on the weight W(x,y) relative to the sum of the weights of all outgoing edges of x:

e_{x \to y} = d \cdot in(x) \cdot \frac{W(x,y)}{\sum_{(x,s) \in E} W(x,s)},    (3.12)

where in(x) denotes the energy influx into node x. However, while normalised trust seems reasonable for models with non-weighted edges, unwanted effects occur when the edges are weighted. As an example, the amount of energy that node a in Figure 3.9 accords to its successors b and d is identical. b has made only one trust statement, W(b,c) = 0.25, telling us that its trust in node c is rather weak, whereas d issues full trust in all its successors. By applying the edge weight normalisation described above, node c will be trusted three times as much as e, f and g, which is not reasonable. To overcome this problem, Appleseed makes use of backward propagation of trust to the source: when the metric computation takes place, additional 'virtual' edges (x, s) from every discovered node x ∈ V \ {s} to the trust source s are created and assigned full trust (existing backward links, along with their weights, are 'overwritten', as every node is supposed to blindly trust the source, see Figure 3.10). The addition of these backward links makes the trust distribution much fairer: coming back to the previously described example, c now obtains an energy of e_{a→b} · d · (0.25 / (1 + 0.25)) from the source s, while e, f and g receive e_{a→d} · d · (1/4) each, and thus their trust rank is 1.25 times the trust assigned to c. In general, backward links are more favourable for nodes close to s, as their eventual trust rank will increase, while nodes further away from s will be penalised. A further advantage of adding backward links is that dead ends (nodes with zero outdegree) are avoided, which makes the computation scheme easier, as no special cases require attention.

After making these adjustments to tailor the initial spreading activation model to local group trust metrics capturing trust and the lack of trust, Appleseed's underlying algorithm can be described as follows (see Appendix C.5 for pseudo code): the algorithm works with partial trust graph information and only accesses new nodes if they are reached by the energy flow, taking as input the trust seed s ∈ V, the trust injection e ∈ R_0^+, the spreading factor d ∈ [0,1] and the accuracy threshold t_c ∈ R^+ serving as a convergence criterion. At the beginning, all trust ranks corresponding to the energy in the nodes are initialised to 0. As soon as a node is discovered for the first time, a virtual edge for backward propagation from it to the source is added. In every iteration, for all nodes x reached by the flow, the amount of incoming trust is calculated according to the following formula:
in(x) = \sum_{(p,x) \in E} d \cdot in(p) \cdot \frac{W(p,x)}{\sum_{(p,s) \in E} W(p,s)}    (3.13)

and thus, incoming flow for x is determined by all flow that predecessors p distribute along the edges. Taking the spreading factor d into account, the trust rank of x is updated as follows:

trust(x) \leftarrow trust(x) + (1 − d) \cdot in(x).    (3.14)

Due to the fact that the computation of in(x) is recursive and several iterations are required for the calculated information to converge to the least fixpoint, a termination criterion must be satisfied: suppose that V_i ⊆ V represents the set of nodes which have been discovered up to iteration i, and trust_i(x) are the current trust ranks for all x ∈ V. Then the algorithm terminates after iteration i if no new nodes have been discovered (V_i = V_{i−1}) and if the changes of the trust ranks with respect to the prior iteration i − 1 are not greater than the accuracy threshold t_c (∀x ∈ V_i : trust_i(x) − trust_{i−1}(x) ≤ t_c).

Besides the algorithm discussed above, Ziegler and Lausen published an extended version in 2005, directly incorporating distrust into the iterative process of the Appleseed trust metric computation. They adapted the trust normalisation such that the trust quantity is computed as

e_{x \to y} = d \cdot in(x) \cdot sign(W(x,y)) \cdot w,    (3.15)

where

w = \frac{|W(x,y)|^q}{\sum_{(x,s) \in E} |W(x,s)|^q}    (3.16)

and sign() is a newly introduced function returning the sign of its argument. In contrast to the previously given definition, W : E → [−1, 1] to permit the expression of distrust. To avoid negative nominators and unduly boosted positive trust statements (for example, if the sum of positive trust ratings only slightly outweighed the sum of negative ones, causing the denominator to converge towards zero), the absolute values |W(x,y)| of all weights are considered for the relative weighting.

To give an example of the distribution of trust and distrust, suppose node a in Figure 3.11 had an energy influx of in(a) = 2, the global spreading factor is d = 0.85 and, for simplicity, backward propagation of trust to the source is neglected and the weight normalisation is linear (q = 1). In this scenario, the denominator of the normalisation equation is |0.75| + |−0.5| + |0.25| + |1| = 2.5, and thus the trust energy that a propagates to its trusted successor nodes b, d and e is e_{a→b} = 0.51, e_{a→d} = 0.17 and e_{a→e} = 0.68, respectively, whereas the energy distributed to the distrusted node c is e_{a→c} = −0.34. Note that during this distribution trust energy becomes lost (1.7 was provided, whilst the sum of the energy accorded along the outgoing edges of a amounts to 1.02), a consequence of the negative trust weight W(a,c) = −0.5.

Using the definition given above, two problems occur, which can be explained by supposing a negative influx in(x) for a node x, causing an allocation of negative trust, in(x)(1 − d) < 0, to x. First, as the energy in(x) · d that x may distribute among its successors will be negative as well, the algorithm would propagate distrust. This would mean that if a user a distrusts b, which trusts c, then a would distrust c as well, simply for being trusted by a user that a distrusts, which does not reflect real-world behaviour. Second, distrust is seen as the negation of trust, which implies that if user a distrusts user b and user b distrusts user c, then a trusts c. But common sense tells us that the enemy of our enemy is not always our

Figure 3.11: An example of the distribution of trust and distrust in Appleseed.

friend. An example of both unwanted effects can be constructed using the graph given in Figure 3.11. Assume node c has a trust energy influx of in(c) = −0.34. Noting that the trusted agent a distrusts c, which distrusts f, f would be assigned a positive trust value of d · (−0.34) · (−0.24). On the other hand, node g, which is trusted by f, would be accorded a negative trust of d · (−0.34) · (0.75).

However, the two problems can be solved by introducing a novel function, out(x), which replaces the term d · in(x) when computing the energy distributed along the edges from x to its successor nodes y, and which does not allow distrusted nodes to distribute energy:

out(x) = \begin{cases} d \cdot in(x) & \text{if } in(x) \geq 0, \\ 0 & \text{otherwise.} \end{cases}    (3.17)

This leads to the modified equation

e_{x \to y} = out(x) \cdot sign(W(x,y)) \cdot w,    (3.18)

where

w = \frac{|W(x,y)|^q}{\sum_{(x,s) \in E} |W(x,s)|^q},    (3.19)

which does not change the behaviour with respect to the Appleseed algorithm published in 2004, in which relationships of distrust were not considered. One of Appleseed's biggest advantages is that it scales to huge networks, as its performance depends on the value of the energy injected into the system and on the spreading factor rather than on the number of nodes in the network. There are also extensions available to make the computation even faster: to hinder trust energy from overly covering vast parts of the entire network, the number of nodes which will be discovered can be limited. Another possibility to gain large speed-ups is to define an upper bound for the path length (according to Milgram's 'six degrees of separation' paradigm, a maximum path length between three and six would be reasonable). Furthermore, Appleseed is highly resistant against attacks, through the introduction of the spreading factor, the normalisation of the trust statements, and the fact that it satisfies the bottleneck property: nodes cannot raise their impact by modifying the structure of the trust statements they issue, as the amount of trust accorded to an agent a depends only on its predecessors and does not increase when a adds more nodes. On the other

hand, in addition to suffering from the same problems as most trust metrics (the bootstrapping problem and, in the case of local metrics, the introduction of a new user), finding appropriate values for Appleseed's parameters is rather difficult, as spreading activation models are not commonly used in trust calculations, and thus some experience on the part of the user will be needed.
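To summarise the core iteration, the following Python sketch implements Equations 3.13 and 3.14 together with backward propagation, but without the distrust extension; the parameter defaults and the graph encoding are illustrative assumptions:

def appleseed(graph, source, energy=200.0, d=0.85, t_c=0.01):
    # Minimal Appleseed sketch (trust only, no distrust handling).
    # graph: dict node -> dict of successor -> trust weight in [0, 1].
    trust = {source: 0.0}
    incoming = {source: energy}                 # in(x) per iteration
    edges = {source: dict(graph.get(source, {}))}
    while True:
        new_in = {node: 0.0 for node in trust}
        for node, flux in incoming.items():
            succ = edges.get(node, {})
            total = sum(succ.values())
            if flux <= 0.0 or total == 0.0:
                continue
            for nbr, w in succ.items():
                if nbr not in trust:            # first discovery of nbr:
                    trust[nbr] = 0.0
                    new_in[nbr] = 0.0
                    edges[nbr] = dict(graph.get(nbr, {}))
                    edges[nbr][source] = 1.0    # virtual backward edge, full trust
                new_in[nbr] += d * flux * w / total     # Equation 3.13
        delta = 0.0
        for node, flux in new_in.items():
            gain = (1 - d) * flux                       # Equation 3.14
            trust[node] += gain
            delta = max(delta, gain)
        if delta <= t_c:                        # convergence criterion
            return trust
        incoming = new_in

g = {"s": {"b": 1.0, "d": 1.0}, "b": {"c": 0.25},
     "d": {"e": 1.0, "f": 1.0, "g": 1.0}}
print(appleseed(g, "s"))   # e, f and g end up with 1.25 times the rank of c

Run on the graph of Figures 3.9 and 3.10, the sketch reproduces the 1.25 ratio between the ranks of e, f, g and c derived above.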

Chapter 4

Architecture

This chapter begins with an introduction to the terminology of the openSUSE Build Service (see Section 4.1), followed by a brief overview of its existing, rather complex components in Section 4.2. Section 4.3 covers the definition of trust for binary software packages. The way it is computed using the Appleseed trust metric is explained in Section 4.4. Finally, Section 4.5 covers the software contribution of this thesis, the openSUSE Trust Server.

4.1 openSUSE Build Service Terminology

The main purpose of the openSUSE Build Service is packaging. Therefore, the most relevant entity is a package. It represents all data required to build an installable software package, mainly consisting of a set of source files, meta data files and control files. The source files of a software product are generally provided as a compressed tar archive containing the source code from an upstream project. In addition, several patches might exist to modify the code with the objective of fixing bugs or security issues, or of gaining better integration with other parts of a distribution. The required control information provides the specification of how to generate binary packages from these sources. It has to exist either in the form of so-called spec files, to build binary packages in the RPM Package Manager format [Foster-Johnson, 2005], or, to build a package in the Debian software package format [Dassen et al., 2008, Chapter 7.1], as a Debian source control file (dsc) in combination with further required files like the Debian rules file. The third type of files, the meta data, describes the contents of the package, provides build information and grants access to package maintenance. A full description of all meta data can be found in the unreleased openSUSE Build Service book by Gunreben et al. [2007, Chapter 2].

To organise these packages, the openSUSE Build Service introduces the concept of projects, collecting software packages developed by a team of packagers, who may also be referred to as maintainers. Thereby, the build service distinguishes between so-called home projects and general projects. While every maintainer owns a generic home project at home:MAINTAINER'S LOGIN, where he can work on software packages he is interested in, a general project is meant to be a collection of packages belonging together to form a somehow connected or


Figure 4.1: This example shows the project KDE:Stable (blue). Besides a list of maintainers, it contains sources for the KDE base system like QT, kdelibs and kdebase. For this project, the build repositories 'openSUSE Factory', 'SUSE Linux Enterprise 11 (SLE11)' and 'Red Hat Enterprise Linux 5 (RHEL5)' are activated for the architecture i586. The openSUSE Build Service tries to build all packages for all activated build repositories, resulting in three repositories 'openSUSE Factory-i586', 'SLE11-i586' and 'RHEL5-i586' (yellow). Hence, the addition of further build targets or architectures like x86_64 would result in further repositories. To reduce duplicates of both sources and binary packages, the openSUSE Build Service has the ability to link sources from other projects (source links) and to include binary packages from repositories (aggregates). The exemplary KDE project could contain a source link to the sources of the music player Amarok from another project to rebuild it with its own build configuration. There may also be an aggregate to include binary packages for KDE:Kaffeine from another repository.

thematically consistent package set. This set can range from projects containing just a single program such as 'Mozilla Firefox' or 'screen' to broader collections such as devel:languages:python, GNOME:STABLE, KDE:KDE4, or even openSUSE:Factory (the development tree of openSUSE). For each project, at least one maintainer exists who can nominate or remove other maintainers and create or delete packages and sub-projects. Although the access control mechanisms are not sophisticated yet, a system for defining different roles has been implemented to give certain levels of access rights to users who work on the same package or project. In addition to uploading all information required to build a package, a maintainer may also link the sources of a package from another project to save storage space and, more importantly, to delegate the maintenance of this piece of software to another project representing a team of packagers. By linking a package, all changes from the origin are inherited automatically, but there are ways to modify how the package is built. A different way to import packages from another project are aggregate links, which instruct the build service to copy the binary package from the original project. Hence, there is no way of modifying it, meaning the same RPM will appear in both projects. Experience has taught that aggregates are both less confusing to users and less resource consuming than source links; thus, they should be used where possible.

In addition to the sources for packages, each project must initially contain at least one build repository for which the packages are meant to be built. The build repositories are used to create the build environment. In general, these build repositories can be seen as base distributions such as openSUSE 11.1, openSUSE Factory and releases from competing distributions. More advanced build repositories can be created by combining these base sets with further packages, like non-free parts of distributions or binary packages from other build service projects. If a package depends on another one at run time (i.e. "Requires" in the RPM specification), it will often also have dependencies during build time (i.e. "BuildRequires"). However, the build service does not automatically add packages from build dependencies that cannot be found in the specified build repository. In the case that a build requirement is missing, a build failure is the most likely result.

All packages resulting from a project built with a particular build environment are collected in a repository. Each repository contains the binary packages for the project it represents, combining compiled programs, libraries, support files and documentation, the sources of these packages and debug information. A project with more than one build repository results in several repositories holding the same packages built for different distributions or products (see Figure 4.1). These repositories can be published automatically, which is why they are also known as installation sources. It is the duty of the maintainer to ensure that all dependencies are available by choosing the right sources and build repositories for building a package.

All published packages are distributed from the main download location to several mirrors world-wide using MirrorBrain [Poeml, 2008a]. This extensive software collection can be easily searched using the software search at http://software.opensuse.org/search. The results page of this website provides convenient 1-click install buttons. These enable users of SUSE based distributions to instruct their package management system to add the required repositories to its list of software sources and to install the selected package, including its dependencies, with just a few clicks.

4.2 Major Components of the openSUSE Build Service

The openSUSE Build Service can be divided into three major components: the back-end, a well protected, behind-the-firewall server farm that performs the package building process; the front-end, providing a public interface to these servers; and a variety of clients (see Figure 4.2). Today, several types of clients are available, providing convenient user interfaces. Probably most important for developers is the official command line client osc, enabling users to work with the build service similarly to widespread revision control systems like Subversion [Pilato et al., 2008]. A user can, for instance, check out the project KDE4:Stable using osc co KDE4:Stable or update his working copy with osc up. As a special feature, the command line client is able to build packages locally before uploading them to the web service, with the advantage that build results are available instantaneously. Due to the fact that developers do not have to wait for the assignment of central resources, which may take hours when the service is highly utilised, they save time and can run experiments much more efficiently. At the same time, this feature saves resources on the compute cluster of the build service.


Figure 4.2: Major components of the openSUSE Build Service. All clients use the openSUSE API (front-end) to communicate with the build service. The front-end hides the complexity of the back-end, which handles the build process, manages source files and binary packages, and interacts with the mirror system.

To locally build binary packages, the command line client retrieves the binary packages from the build system, which performs the dependency resolution. It installs them into a local chroot environment in which the package is built. By using a chroot environment, osc is even capable of building packages for distributions other than the one it is executed on. In addition to the text interface, there exists a broad variety of graphical clients, like a web-client written in Ruby on Rails [Ruby et al., 2009] (see Figure 1.1), several desktop applications and even an Eclipse plug-in. The major advantage of this broad collection of clients is that each of them places emphasis on different use cases. The openSUSE Build Service Wiki page1 contains a well maintained section presenting all available client implementations.

All clients communicate with the build service using simple HTTP operations through the front-end, also known as the openSUSE API. The API implements a well-defined interface2 to the build service following the REST scheme to retrieve, create, modify and delete resources, which are either source files, package files, or meta, control and status information in the form of XML data. In addition, functions like triggering a rebuild, tagging packages and projects, or similar operations which are not represented as data files are available. However, the main purpose of the front-end is to ensure data consistency and security for the build service by validating all uploaded data using XML schema checks and imposing restrictions on uploaded content. The advantage of using rather simple HTTP operations is that a broad variety of existing programming languages, frameworks and tool chains can be used to access the openSUSE API in an efficient way. Thereby, most operations on the API require valid user authentication. For this reason, all requests are filtered by a Novell iChain proxy3, which checks user credentials and ensures that only authenticated requests are served. The

1 http://en.opensuse.org/Build_Service
2 https://api.opensuse.org/apidocs/
3 Proprietary security and authentication solution, http://www.novell.com/products/ichain/overview.html

official openSUSE Build Service uses Novell iChain to be able to share the same account information with many other openSUSE and Novell services. To run an instance of the build service without an iChain server, the API can also maintain its own user database, mainly for authentication and permission control.

The front-end itself interacts with the back-end, the heart of the system, which handles the sources, organises and schedules the build processes, and manages repositories. The back-end is mainly written in Perl and consists of several per-purpose servers which communicate with each other using HTTP in order to build new packages. Due to the fact that not all of these components are involved in the implementation of the openSUSE Trust Server, only the relevant ones are described here. More detailed information on the back-end can be found in Schröder et al. [2006] and Schröder [2007].

Two main servers exist for data storage: the source server and the repository server. The source server stores all sources, meta data including spec/dsc files, and build configurations. The repository server analogously handles the binary packages. All source files are stored versioned; thus, a history of older versions can be retrieved easily. Both servers provide convenient methods to put and retrieve data using HTTP requests. The scheduler monitors uploaded sources and awaits build triggers, which can have several causes, like changes to binary packages used to build another package, or even user requests. In order to save compute power by avoiding unnecessary builds, the scheduler scans the dependencies of all packages to be built. As an example, consider a package that depends on others. If some of these packages need to be rebuilt themselves, the build of the original is postponed. After computing a job sequence, the scheduler writes the new jobs to a job queue, from which the job dispatcher takes them and assigns them to idle build hosts.

These build hosts create a new, clean build environment per job, which needs to be sand-boxed because a user may use the build service to build malicious code. Thereby, the build process itself is not the dangerous part, because it is executed by a non-privileged user. The installation of the binary packages the build environment consists of is much more dangerous, as this procedure has to be done as superuser. Therefore, the openSUSE Build Service uses the Xen hypervisor [Barham et al., 2003] to create a virtual machine for each build. To build a package, the build host retrieves the source code, patches and the build configuration from the source server, and all required binary packages from the repository server. After installing them into a virtual machine image, it boots the created virtual machine and executes the build process. Finally, the newly created binary packages are submitted along with a detailed log file to the repository server, which takes care that each binary package gets signed by the signing server4. As soon as the build and signing process for a whole repository is finished and publication has not been disabled, the publisher pushes the repository to the mirror interface of the build service (see Schröder et al. [2006, Section 6.5] and Poeml [2008b]). The back-end becomes more flexible and scalable by following the Unix principle of building smaller, per-purpose components. A small instance of the build service can be run on one machine by installing all back-end services and the front-end on it.
The official openSUSE Build Service provided by Novell uses several physical servers and storage subsystems for the back-end, as well as another dedicated machine for the openSUSE API. Obviously, there is a reasonably sized cluster consisting of machines with different hardware architectures

4 The signing server is not a part of the back-end; it is included in the openSUSE Build Service source repository.

(currently Intel Pentium (i586), AMD64 (x86_64) and 64-bit PowerPC) configured as build hosts. On the other hand, these complex interactions between many components result in a steep learning curve for understanding the back-end in detail. Fortunately, the openSUSE API hides this complexity so that, in addition to studying packaging rules and guidelines, users only need to learn the usage of at least one of the clients to benefit from the advanced packaging process the openSUSE Build Service offers.
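Because the interface consists of plain HTTP operations, even raw API access is simple. The following Python sketch issues one such request; the endpoint path and package name are assumptions for illustration, and the authoritative interface description resides at https://api.opensuse.org/apidocs/:

import requests

API = "https://api.opensuse.org"
AUTH = ("user", "password")    # credentials checked by the iChain proxy (illustrative)

# List the source files of a package; the /source/<project>/<package>
# path is an assumption modelled on the REST scheme described above.
response = requests.get(f"{API}/source/KDE4:Stable/kdelibs4", auth=AUTH)
print(response.text)           # an XML directory listing of the package sources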

4.3 Trust for Software Packages

Most packages in the build service are developed by multiple packagers, and thus the trust of a package should depend on the trust placed in each contributor. The trust of a package is further recursively influenced by the trust values of all packages which are used in the build environment (either a base package of the environment or one which has been added as a build dependency). In addition, each package is built using an individual build configuration, whose editors have to be considered in the trust formula. All of these different factors should be reflected by the trust value of a package. Therefore, we define the trust for a binary software package p_build as

trust(p_{build}) = \min \{ trust(m_1), \ldots, trust(m_n), trust(\text{build configuration editor}), trust(p_1), \ldots, trust(p_m) \},    (4.1)

where m_1, ..., m_n are the maintainers who contributed to the package and p_1, ..., p_m are the packages from the build environment and the build dependencies of the package to build (p_build).

Although this definition is a very pessimistic approach to defining trust for software packages, it ensures that the computed value never rises beyond the lowest trust estimation for any involved developer. On the other hand, the trust value of formula 4.1 can only decrease with time, as more and more trust values for users are added to the term. To avoid the problem that the trust in the whole system is strictly monotonically decreasing, there has to be a way to raise the trust value of a package. For this purpose, we introduce the concept of reviews of source packages into the openSUSE Build Service. Similar to the software development process, a reviewer can inspect the source code, meta data and specification. After ensuring that the examined source package seems trustworthy to him, he may label this version as reviewed. Despite the fact that a review only considers the contributions to the source package, described by the first line of the above formula, it has the effect that the trust value of this package may rise up to the trust value of the reviewer. This procedure leads to a change of the trust formula:

\max \{ trust(r_1), \min \{ trust(m_1), \ldots, trust(m_n) \} \},    (4.2)

where r_1 is a reviewer of the package who reviewed all check-ins up to his review. Using the distributivity of the minimum and maximum functions, this term (4.2) can be transformed into

\min \{ \max \{ trust(m_1), trust(r_1) \}, \ldots, \max \{ trust(m_n), trust(r_1) \} \},    (4.3)

which can be interpreted as follows: all changes of each maintainer (m_1, ..., m_n) have been reviewed individually by reviewer r_1. To generalise this such that further reviews by different reviewers can be considered, each trust(m_n) value of the trust formula (4.1) has to incorporate all reviews for the given maintainer. The new term trust'(m_n) contains not only the trust value of the maintainer m_n but also those of all reviewers r_1, ..., r_m who reviewed his contributions:

trust(m_n) \longrightarrow trust'(m_n) = \max \{ trust(m_n), trust(r_1), \ldots, trust(r_m) \}.    (4.4)

Thus, the resulting definition of trust in a binary package p_build is

trust(p_{build}) = \min \{ trust'(m_1), \ldots, trust'(m_n), trust(\text{build configuration editor}), trust(p_1), \ldots, trust(p_m) \}.    (4.5)

As a repository is a collection of software packages, its trust value is defined as the minimum trust of all included packages.
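Since the formulae only combine minima and maxima, their evaluation is straightforward. The following Python sketch of formulae 4.4 and 4.5 uses illustrative names and values, not data from the build service:

def package_trust(maintainer_trust, reviews, editor_trust, dep_trust):
    # maintainer_trust: dict maintainer -> trust value in [0, 1];
    # reviews: dict maintainer -> trust values of the reviewers who
    # reviewed that maintainer's contributions;
    # editor_trust: trust in the last build configuration editor;
    # dep_trust: trust values of build environment and dependency packages.

    # Formula 4.4: a review can raise a maintainer's effective trust.
    effective = [max([t] + reviews.get(m, []))
                 for m, t in maintainer_trust.items()]
    # Formula 4.5: the package is only as trustworthy as its weakest input.
    return min(effective + [editor_trust] + list(dep_trust))

# Example: the weakly trusted maintainer (0.2) was reviewed by a highly
# trusted reviewer (0.9), so he no longer drags the package trust down.
print(package_trust({"alice": 0.8, "bob": 0.2}, {"bob": [0.9]},
                    editor_trust=0.7, dep_trust=[0.75, 0.8]))   # -> 0.7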

4.4 User-specific Trust

On closer inspection, the previously described formal definition of trust for a software package in the openSUSE Build Service simply relies on trust between users of the system. We decided to apply this concept since most definitions of trust are linked to humanity as an independent entity (see Section 1.2.1). Thereby, the decision process of whether or not, or how much, to trust a particular person often seems to be automatic and unconscious, and mostly driven by morals, ethics and emotions [Marsh, 1994]. Due to the fact that trust can be expressed in many different ways and relies on various factors, it usually shows a range of different strengths rather than fitting into a conception of trust or non-trust [Marsh and La, 1992; Marsh, 1994]. As a result, a binary representation of trust should be avoided. Another important observation is that trust is a subjective concept. On account of this, personalised trust values should be computed, including not only the trust value but also its origin. Considering the simple example of whether or not the president of the United States is trustworthy puts the idea of a personalised model across: some people might have strong faith in him whilst others might not, and therefore an average trust rating of a controversial person would not be helpful to either group. For this reason, the provenance information of a trust affirmation should be combined with the direct trust ratings assigned by the source, to weight ratings from trusted people higher than those from untrusted ones.

As the trust metric for the openSUSE Build Service, we implemented Appleseed (described in Section 3.2.5), as it allows us to compute local trust values in a non-binary fashion. An investigation supporting the choice of a local trust metric is Stanley Milgram's 'small world theory' [Milgram, 1970], commonly referred to as 'six degrees of separation'. Milgram stated that members of any large social network are connected to each other through at most six intermediate users. Relating his concept to our work, we can conclude that the length of the average trust path connecting two users in the openSUSE Build Service is small, implying that a local trust metric can be used easily. The trust network requires only partial exploration when calculating the trusted neighbourhoods: originating from the source, just trust edges

that seem to be promising will be followed (for instance edges with high trust weights) and thus the computation should converge very fast. Furthermore, we advocate Appleseed's group trust metric, as it brings the advantage of performing parallel evaluations of groups of trust assertions, in contrast to a scalar trust metric, as well as the opportunity to extend the algorithm to incorporate the concept of distrust.

A remaining problem is the appearance of social pressure in social networks. This leads to the problem that trust ratings become less fair, as users may feel obligated to give high trust ratings to their friends, or even to change their own trust statements if they are publicly available to the rest of the community. Because social networks rely on honest information in order to work properly, privacy in networks is an important issue. Even if distributed approaches have the advantages that computed trust information for a user in the system is immediately available and that users have to disclose their trust assertions only to peers they trust [Richardson et al., 2003], we preferred to keep all trust information stored in a central place, as the openSUSE Build Service is offered as a web-based service. In this way, users only need to disclose their trust information to a central trust server, but not to other users, minimising social pressure and assuring privacy.

4.5 Design of the openSUSE Trust Server

To integrate a trust metric into the openSUSE Build Service, a software component is required to store and manage trust relations between users, calculate trust values using the Appleseed trust metric, and store and solve trust formulae. It is not trivial to add such new features to a distributed project like the openSUSE Build Service without causing too many dependencies. Thus, we chose to create a new autonomous module to introduce as few dependencies as possible: the openSUSE Trust Server. This design follows the Unix philosophy of one tool per purpose and enables a seamless integration into the existing build service ecosystem, requiring only relatively small changes to the code base, as the trust server provides a RESTful interface [Fielding, 2000, Chapter 5] for inter-service communication, like every build service component. Another advantage of a stand-alone trust server is that the migration to a trustful openSUSE Build Service can be done in distinct small steps. All existing components may be altered successively to use the functions of the trust system, taking stress away from the core developers to provide a new version at a given time.

The reference implementation of the trust server is written in Ruby on Rails [Ruby et al., 2009]. We chose this web framework because it provides easy methods to write a web service offering both an appealing web-interface and a RESTful URL scheme for exchanging information using XML and further data formats. Another decisive factor was the existence of two build service components, the web-client and the openSUSE API, written in Ruby on Rails. Furthermore, the openSUSE project uses this framework for applications like the openSUSE User Directory5, making Ruby on Rails a major building block of the openSUSE infrastructure. Thus, we hope to lower the barrier for free software developers to contribute to the openSUSE Trust Server.

5 https://users.opensuse.org

Figure 4.3: openSUSE Trust Service web-interface listing trust relations for the user ‘mjung’.

4.5.1 Management of Trust Relations

One of the main functions of the trust server is the management and storage of the trust statements between users of the build service. To gain maximal control over which trust judgements are presented to a user, we decided to store the trust information on a secured host. Both for privacy reasons and to encourage users to provide honest trust ratings, we only present a user a list of the trust statements he himself made about other packagers, as well as one containing all users that gave a rating about him (see Figure 4.3). However, to avoid social pressure, the values of the latter are not disclosed. Accordingly, the use cases for managing trust relations are their creation, modification and deletion, as well as their presentation as described above.

The essential information a trust relation comprises are a truster, the user providing a trust statement, a confidant, and the given amount of trust. The user can choose between six trust levels: full/absolute, high, medium, low, minimal, and none, which are linearly mapped to the range [1,0]. A natural understanding of these trust levels can be derived from Zimmermann's definition of the trust levels in PGP. For example, the holder of a medium trust level could be seen as a packager who understands the basic principles, rules and implications of package building and properly applies patches and bug fixes. In contrast, the owner of a full trust rating has an excellent understanding of the packaging process and represents high standards in packaging. In addition to this trust value, the trust system requires minimal knowledge about each user: login, real name and internal user id. This information is already present in a relational database maintained by the openSUSE API (front-end). To avoid duplicate storage of data with all its drawbacks, like data inconsistencies,

<relations>
  <relation>
    <id>12927</id>
    <truster-id>304</truster-id>
    <confidant-id>1742</confidant-id>
    <value>0.8</value>
    <description>Met her at the openSUSE conference. I think she is a very conscientious packager.</description>
    <created-at>2008-08-10T08:15:42Z</created-at>
    <updated-at>2008-11-20T23:47:11Z</updated-at>
  </relation>
  <relation>
    <id>24342</id>
    <truster-id>304</truster-id>
    <confidant-id>5311</confidant-id>
    <value>0.2</value>
    <description>He does not take much care of his packages and barely provides patched security or bug-fix releases.</description>
    <created-at>2009-04-01T11:19:04Z</created-at>
    <updated-at>2009-04-01T11:19:04Z</updated-at>
  </relation>
</relations>

Listing 4.1: XML representation of the trust relations of user 304. The first relation gives high trust (0.8) to user 1742, the second minimal trust (0.2) to user 5311. This information can be retrieved from the trust server by sending an HTTP-GET request to https://trustserver/relations.xml.

the trust server utilises this information. For performance reasons, we favoured a direct, read-only database connection over an HTTP based XML interface to the user information. Thus, some minor code modifications in the trust server might be required in case the schema of the table containing the user information changes in the database. However, these modifications will most likely be trivial due to the usage of the Ruby on Rails framework, or even avoidable through the introduction of a database view. Furthermore, each relation may contain a description for notes and remarks. These descriptions are kept private and are only visible to the truster. All trust relations are stored in the relational database of the trust system, including some management information like timestamps of their creation and last modification.

The trust server provides access to these relations to authorised users. Like all openSUSE services, the trust system uses the Novell iChain proxy for this task. A valid user may either access the information using the web-interface or retrieve it as XML data. Both interfaces provide a listing of all trust statements the logged-in user has made, and methods for creating new statements and updating or deleting existing ones. Listing 4.1 shows an exemplary XML representation of the trust relations of a fictitious user, which is provided on an HTTP-GET request for the URL https://trustserver/relations.xml. To give another example, a trust relation can be deleted by its owner, the truster, by sending an HTTP-DELETE request to https://trustserver/relations/RELATIONID.xml. The behaviour to

The behaviour to create, view or update records follows the same scheme using HTTP-GET and HTTP-PUT requests. This XML interface for managing trust relations has been developed with two goals in mind: first, to enable all build service clients to incorporate support for the trust system, and second, to have the opportunity to integrate all trust-related activities into the openSUSE API. Until the majority of clients support the trust system, the embedded web-interface of the trust server fills this gap; it provides a comfortable user interface for working with the trust system from the start.
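To illustrate the interface, the following Python sketch issues the requests described above using the requests library. The endpoint URLs are taken from the text; the authentication handshake with the iChain proxy is omitted, and a pre-authenticated session is assumed.

# Hedged sketch of a client for the XML interface; only GET and DELETE
# are shown, PUT for updates follows the same scheme.
import requests

BASE_URL = "https://trustserver"

def list_relations(session: requests.Session) -> str:
    # All trust statements of the logged-in user, as XML (Listing 4.1).
    response = session.get(f"{BASE_URL}/relations.xml")
    response.raise_for_status()
    return response.text

def delete_relation(session: requests.Session, relation_id: int) -> None:
    # Only the truster, i.e. the owner of the relation, may delete it.
    response = session.delete(f"{BASE_URL}/relations/{relation_id}.xml")
    response.raise_for_status()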

4.5.2 Management and Storage of Trust Formulae

As previously discussed in Section 4.3, the trust of each package depends on the trust values of all packagers and reviewers who contributed to the source package, of all packages used in the build environment, and of the last editor of the build configuration. The concrete trust formula for a package is computed at build time, as the information which packages have to be installed into the build environment is only available at this point. Trust ratings for users, on the other hand, should be calculated in real time rather than taken from the last build of the package. The main reason is that the last build might be months old and the trust in a user may have fallen dramatically due to severe incidents since then. These occurrences should be incorporated in the trust of all packages this person has contributed to, even those that were built a long time ago.

To fulfil these requirements, the build host of a package has to compile a trust formula containing variables for users and packages at build time. For this purpose, it parses the changelog of the source package to retrieve a list of all contributions and reviews. This information is required to generate the first line of the minimum term of formula 4.5. As long as the back-end does not store the last editor of the build configuration, the middle term trust(build configuration editor) is substituted by all maintainers of a project, because they have the permission to modify the build configuration. The collected list of trust information for all packages installed in the build environment determines the third row of the trust formula.

In a next step, the formula is sent to the trust server, which is responsible for its storage and evaluation. Each received formula is normalised by removing duplicate user and package identifiers and sorting the remaining variables lexicographically. The resulting term is stored using its MD5 sum as primary key. In combination with the URL of the trust server, this key forms the uniform resource identifier (URI) for the trust formula. The URI is returned as result of each successful formula storage request and has to be used to retrieve and solve a formula. For example, the URI https://trustserver/formulas/6c57405d5c513d108fcab913649595c6 identifies the formula with the MD5 sum 6c57405d5c513d108fcab913649595c6.

The unique identifier for the trust of a package needs to be associated with it in an immutable way. For this reason, the URI is directly integrated into the meta information of the package, the package header. Even though this procedure raises the complexity of the implementation, it brings several advantages: first of all, the trust information, in form of the unique trust formula identifier, is instantaneously available without the requirement of looking it up in a second location. The extraction of this information from the package header is rather simple because there exist libraries for all major programming languages to access both the RPM API6 and the Debian package management system.

For instance, if a user finds a package stored on a web-server or a local volume, he is still able to extract the trust information from the package itself. Secondly, as described in Section 4.2, each package built by the openSUSE Build Service is signed by the signing-server before it is available for download. Hence, the signature of a package can also be used to verify the integrity of the relation between each package and its corresponding trust formula identifier.

Although the compilation of the trust formula has to be done by the build host, the trust server is responsible for storing and retrieving the formulae. Therefore, it provides just two of the typical CRUD (create, retrieve, update and destroy) operations used in RESTful applications. Only the repository server of the back-end is authorised to create new formulae, which may never be changed or deleted. As users only have the permission to retrieve and solve these formulae, the methods to delete or change them are not available through any public interface.
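The normalisation and identifier scheme can be summarised in a few lines. The sketch below is a simplified Python rendering under the assumption that a formula's variables can be represented as plain strings; the actual trust server is written in Ruby on Rails, and the variable syntax shown is a hypothetical placeholder.

# Sketch: normalise a trust formula and derive its MD5-based key.
import hashlib

def formula_key(variables: list[str]) -> str:
    normalised = sorted(set(variables))   # drop duplicates, sort lexicographically
    term = " ".join(normalised)           # canonical textual form (assumed)
    return hashlib.md5(term.encode("utf-8")).hexdigest()

# The key is the last component of the formula URI, e.g.
# https://trustserver/formulas/<md5>; the variable syntax is hypothetical.
print("https://trustserver/formulas/" + formula_key(
    ["user:304", "pkg:glibc", "user:304", "pkg:vim"]))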

4.5.3 Solving of Trust Formulae using Appleseed

To obtain the trust value for a package of interest, the trust server starts to solve its trust formula by performing an Appleseed calculation (an example is given in Appendix B). The user who requested the trust value is taken as source for the energy injection e. The algorithm returns ranked results in form of tuples consisting of the users and their corresponding trust values. The obtained trust values can range from 0 to n ∈ N with n < e. As they are hardly comparable, the trust server scales them linearly into the range [0,1], neglecting the trust value of the source user in the result set. These values are then used by the trust server to solve the trust formula.

Due to the fact that usually every package depends on other ones, two rules were introduced to overcome the problems arising through recursion: first, packages from the base build system of official distributions always receive full trust and thus are neglected in all trust formulae. We justify this decision with the assumption that every user trusts the producers of his distribution (otherwise he would not use it). Secondly, in case the required trust information for some maintainers is not available because they are not covered by the individual trust network, the formula will still be solved; however, the result will be tagged as depending on partial trust information. This rule has been introduced to overcome initialisation issues in the starting phase, when the trust network most likely will not be very dense.

The reference implementation of the openSUSE Trust Server uses its database to cache the values of all computed trust formulae to accelerate calculations and to save computing power. For this purpose, a new table was introduced containing the user id, the MD5 sum of the formula, the trust value, the number of missing trust statements and a time stamp of the last computation. As a matter of course, the expiration of cache entries is configurable, but we propose a maximal value of one day.

To solve a trust formula, the trust server provides a HTTP-GET method which returns the trust value either as XML or as an appealing illustration as shown in Figure 4.4. All clients should present the resulting trust value together with its corresponding level (described in Section 4.5.1), because this representation supports the evaluation process of the user.

6 http://www.rpm.org/max-rpm/ch-rpm-rpmlib.html

Figure 4.4: openSUSE Trust Service web-interface presenting a trust value for a package.

For instance, the openSUSE Software Portal7 should present the trust values along with the search results, supporting the user's decision which version of a package to choose. Another good example is the integration of the trust values into the package managers: they could store the trust values of all used repositories when these are added as installation sources. In case the trust in one of them decreases below a certain threshold, the system alerts the user.

Further features have been added in the reference implementation of the openSUSE Trust Server: the user may retrieve his personal trust network containing all scaled values from the Appleseed computation using himself as source. He can review the trust estimations of the metric or even download a visualisation of his network as vector graphic or in Graphviz format (.dot). In addition, the personal trust network can be searched for a particular user to get a trust prediction for this person. This function is useful in several use cases in which the maintainer of a package has to interact with other individuals (for example when reviewing a source code check-in or a commit request). The trust prediction may support his decision process or be used as an indicator to investigate contributions by marginally trusted parties. To foster usage of these features and not disturb existing work flows, an XML interface has been implemented to retrieve and query the personal trust network using build service clients. For instance, the presentation of the predicted trust value for a user sending a 'commit request' in the command line client osc would require just one additional HTTP-GET request to the trust server.
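The post-processing described at the beginning of this section can be sketched as follows. This Python fragment is illustrative only: it assumes that the linear scaling divides by the largest rank outside the source (one possible linear scaling) and reduces a formula to a pure minimum term, omitting the reviewer term of formula 4.5.

# Sketch of rank scaling and formula solving, under stated assumptions.
def scale_ranks(ranks: dict, source: str) -> dict:
    # Linearly rescale Appleseed ranks into [0,1], neglecting the source.
    others = {u: r for u, r in ranks.items() if u != source}
    top = max(others.values(), default=1.0)
    return {u: r / top if top > 0 else 0.0 for u, r in others.items()}

def solve_formula(variables, trust: dict, base_packages: set):
    # Returns the package trust and a flag marking partial information.
    partial, values = False, [1.0]
    for var in variables:
        if var in base_packages:     # base build system: full trust, skipped
            continue
        if var not in trust:         # outside the individual trust network
            partial = True
            continue
        values.append(trust[var])
    return min(values), partial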

7 http://software.opensuse.org/search

Chapter 5

Validation

In order to evaluate the implemented algorithm and to compare it with the many other trust metrics that have been proposed during the last couple of years (see Chapter 3), networks are required for testing. Unfortunately, naturally occurring networks need a long time to develop and often show fixed topological features, so that testbeds and frameworks to evaluate, compare and analyse the various models under a set of representative and common conditions are frequently missing [Massa and Souren, 2008; Sabater and Sierra, 2005]. For this reason, it is necessary to automatically generate artificial networks to enable experiments on networks of different sizes and properties. The models used to create these networks are explained in Section 5.1. However, as synthesised data sets hardly represent real-world situations, we decided to additionally test the model on a real-world data set derived from the social network Advogato (see Section 5.2), as the data set for the openSUSE Build Service was not yet available.

5.1 Artificial Networks

In this section, we artificially model different aspects of actual networks. Perhaps the simplest useful models of networks are random graphs, first studied by Rapoport [Rapoport, 1957, 1968] and Erdös and Rényi [Erdös and Rényi, 1959, 1960, 1961], and the small-world model proposed by Watts and Strogatz [Watts, 1999a,b; Watts and Strogatz, 1998], described in Sections 5.1.1 and 5.1.2, respectively. In contrast to these two models, the third examined model, that of Barabási and Albert presented in Section 5.1.3, tries to understand how networks come to have certain characteristic structural features.

5.1.1 Random Graphs

Random graphs, one of the simplest models for networks, are well studied by mathematicians [Bollobás, 2001; Janson et al., 1999; Karoński, 1982]. Among the first researchers who studied them intensively were Rapoport and his colleagues [Rapoport, 1957, 1968; Solomonoff and Rapoport, 1951], who constructed the 'random net', a model for large, random networks, and Erdös and Rényi [Erdös and Rényi, 1959, 1960, 1961], who independently rediscovered the model a decade later.


Figure 5.1: Two random graphs with n = 10 nodes. Each of the \frac{1}{2}n(n-1) possible edges is independently present with a probability p = 0.15 on the left-hand side and p = 0.3 on the right-hand side, respectively.

To construct a random graph, a network is created by placing undirected edges between a fixed number of n nodes at random, whereby each of the \frac{1}{2}n(n-1) possible edges is independently present with a probability p. The number of edges connected to each node follows a binomial distribution, or a Poisson distribution in the limit of large n with the mean degree z = p(n-1) held constant [Newman, 2003], since the presence or absence of edges is independent. The probability p_k that a node has degree k is therefore

p_k = \binom{n}{k} p^k (1-p)^{n-k} \simeq \frac{z^k e^{-z}}{k!}. \qquad (5.1)

The structure of a random graph varies with p: if p is small, there are only a few edges and all components are small. If p is large, on the other hand, the majority of the nodes are joined together in a single large component, whereas the remaining nodes occupy smaller components (see Figure 5.1). The mean number of neighbours at a distance l from a node in a random graph is z^l, and therefore z^l \simeq n if the entire network is encompassed. As a result, a typical distance through the network is l = \log(n)/\log(z). This logarithmic increase in the number of degrees of separation with the size of the network is typical for the so-called small-world effect, the observation that two users can be connected by only a short chain of intermediate acquaintances.

We synthesised random graphs with 1,000, 5,000, 10,000 and 15,000 nodes n, between which directed edges were placed at random. Each edge was independently present with a probability of 3/(n-1), 5/(n-1) and 7/(n-1), respectively, and a weight in the range [0,1] was randomly assigned to each edge. To incorporate symmetric trust statements, 33% of all edges were inverted; the inverted edges were randomly chosen and obtained a randomly assigned weight. Self-edges and multiple edges between nodes were removed from the network. All constructed models exhibit low transitivity as well as a low average path length, which increased with the number of nodes and decreased with the number of edges in the network (see Table 5.1).

n        p          ∅ path length   transitivity
1,000    3/(n-1)    5.26            0.0064
1,000    5/(n-1)    3.89            0.0108
1,000    7/(n-1)    3.39            0.0145
5,000    3/(n-1)    6.38            0.0010
5,000    5/(n-1)    4.80            0.0019
5,000    7/(n-1)    4.08            0.0027
10,000   3/(n-1)    6.92            0.0006
10,000   5/(n-1)    5.13            0.0010
10,000   7/(n-1)    4.42            0.0013
15,000   3/(n-1)    7.19            0.0004
15,000   5/(n-1)    5.34            0.0007
15,000   7/(n-1)    4.61            0.0009

Table 5.1: Average path length and transitivity of the generated random graphs with n nodes. Each edge was independently present with a probability p.
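The construction of these graphs can be sketched in a few lines of Python. The sketch mirrors the procedure described above; the reading of 'inverting' an edge as adding its reverse with a fresh random weight is an assumption, and all names are illustrative.

# Sketch: directed random trust graph with edge probability c/(n-1).
import random

def random_trust_graph(n: int, c: float) -> dict:
    p = c / (n - 1)
    edges = {}
    for u in range(n):
        for v in range(n):
            if u != v and random.random() < p:      # no self-edges
                edges[(u, v)] = random.random()     # weight in [0,1]
    # Invert a random third of all edges to model symmetric trust.
    for (u, v) in random.sample(list(edges), len(edges) // 3):
        edges.setdefault((v, u), random.random())   # no multiple edges
    return edges

graph = random_trust_graph(1000, 3.0)                # p = 3/(n-1)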

Every node in an artificial graph was used as a trust seed to perform an Appleseed trust network computation using the default parametrisation proposed by Ziegler and Lausen: a trust injection e = 200, a spreading factor d = 0.85 and an accuracy threshold tc = 0.2. Although the concrete trust values for each local trust network were discarded, we collected data to measure the general performance of the algorithm on networks of a size comparable to the openSUSE Build Service.

The number of nodes discovered in each run increased with the number of edges in the network. In the two small graphs with 1,000 and 5,000 nodes, it rose from 953 over 999 to 1,000 and from 4,796 over 4,984 to 5,000, respectively. At p = 7/(n-1), the models were very dense and thus all nodes in the network were reached. Likewise, in the larger networks with 10,000 and 15,000 nodes, the algorithm missed out only three and eight nodes. Concurrently, the standard deviation increased with the size of the graph (e.g. from a mean of 149.28 to 712.54 to 1,379.56 to 2,205.06 in a network with p = 3/(n-1) and 1,000, 5,000, 10,000 and 15,000 nodes, respectively). It decreased with the number of edges as the nodes became less isolated (149.28, 31.59 and 0 on average for a model with 1,000 nodes and p = 3/(n-1), p = 5/(n-1) and p = 7/(n-1), respectively), as depicted on the left-hand side of Figure 5.2.

As the network size increased, the maximal depth between the nodes increased too (from 8.90 to 9.87 to 10.95 to 11.37 on average for a model with p = 3/(n-1) and 1,000, 5,000, 10,000 and 15,000 nodes, respectively), whilst it decreased with the number of edges making the network denser (from a mean value of 8.90 to 5.82 to 4.96 for a model with n = 1,000 and p = 3/(n-1), p = 5/(n-1) and p = 7/(n-1), respectively), as illustrated in the centre of Figure 5.2. The standard deviation exhibited exactly the same effect: for example, it decreased from an average of 1.58 to 0.46 to 0.27 for a graph with 1,000 nodes and p = 3/(n-1), p = 5/(n-1) and p = 7/(n-1) (with analogous magnitudes for n = 5,000, n = 10,000 and n = 15,000), and it increased from 1.58 to 1.60, 1.71 and 1.86 on average for p = 3/(n-1) and n = 1,000, 5,000, 10,000 and 15,000, respectively.

The average number of iterations the algorithm required until termination slightly decreased with an increasing degree of the network (from 24 iterations to 21). At the same time, the mean standard deviation decreased from 3.41 to 0.84 and 0.25 for n = 1,000 and p = 3/(n-1), p = 5/(n-1) and p = 7/(n-1), respectively (values are similar for n = 5,000, n = 10,000 and n = 15,000). On the other hand, the size of the network influenced neither the number of iterations nor their standard deviation (see right-hand side of Figure 5.2).


Figure 5.2: Appleseed performance on the generated random graphs. Left-hand side: Average number of nodes the algorithm detected in a run. Centre: Maximal depth from the source node to the most distant node in the network. Right-hand side: Average number of iterations the algorithm required until termination.

The maximal trust distributed in a model with p = 3/(n-1) dropped quickly in each iteration, from an average of 30.00 in the first to 12.45 in the second, 8.82 in the third and 5.20 in the fourth iteration. After the fifth iteration, with an average of 5.04, the maximal distributed trust only changed marginally until it fell below the threshold tc and the algorithm terminated (see Figure 5.3). The corresponding values for the first four iterations in networks with p = 5/(n-1) and p = 7/(n-1) were 30.00, 7.91, 5.79 and 3.48, and 30.00, 5.88, 4.42 and 2.86, respectively. These values show that even though the maximal distributed trust per iteration was not influenced by the size of the network, it did depend on its density, as depicted in Figure 5.3. In the same way, the total number of nodes discovered in every iteration was only affected by the network degree: for p = 3/(n-1), p = 5/(n-1) and p = 7/(n-1), the total number of nodes discovered by the breadth-first-search based algorithm rose rapidly per iteration, from an average of 5.79, 7.52 and 10.03 in the first, 19.3, 48.73 and 87.69 in the second, 72.73, 267.83 and 532.72 in the third, 235.03, 793.94 and 982.45 in the fourth, 555.04, 992.61 and 999.95 in the fifth, to 839.57, 999.97 and 1,000 in the sixth, covering nearly all nodes in the graph (see Figure 5.4).


Figure 5.3: Maximal distributed trust per iteration on the random graph data sets. The light grey line indicates that the algorithm would have needed fewer iterations if a slightly higher threshold tc = 0.5 had been used.


Figure 5.4: Maximal distributed trust (solid line) and total number of nodes (dashed line) discovered by the algorithm per iteration on the random graph models.


Figure 5.5: Left-hand side: A one-dimensional lattice with connections between all nodes (n = 10) separated by k = 3 or less spaces in the lattice. Right-hand side: A small-world model was created by rewiring a small fraction p = 0.2 of the links to new sites chosen at random.

5.1.2 Small-world Model

Similar to the random graph model described in the previous section, the small-world model, first presented by Watts and Strogatz [Watts, 1999a,b; Watts and Strogatz, 1998], is able to take a geographical component into account. Nodes in the network have positions in space, and in most networks it is a valid assumption that geographical proximity plays a role in the decision whether or not two nodes are connected with each other. To incorporate this idea, the small-world model starts by building a network on a low-dimensional regular lattice of any dimension or topology (although the best studied is the one-dimensional one, as it is easiest to observe by eye). If we take a one-dimensional lattice of L nodes with periodic boundary conditions and connect each node to its neighbours which are k or fewer lattice spaces apart, we get a network with L·k edges similar to the one depicted on the left-hand side of Figure 5.5. In a next step, the algorithm adds or removes edges to enable remote parts of the lattice to join each other by going through each edge in turn and, with probability p, moving one end of the edge to a new location chosen uniformly at random from the lattice, whilst avoiding both the formation of double edges and the creation of self-edges (an example is illustrated on the right-hand side of Figure 5.5). For p = 0, we have a regular lattice with a mean geodesic distance between nodes of L/(4k), not showing a small-world effect; for p = 1, every edge is rewired to a new random location, producing almost a random graph with typical geodesic distances in the range of \log(L)/\log(k). In between these two extremes, the model shows both high transitivity and low path lengths, as Watts and Strogatz could show by numerical simulation.

We simulated small-world models by building networks on one-dimensional lattices of 1,000, 5,000, 10,000 and 15,000 nodes. Each node was connected to its neighbours which were less than 3, 5 and 7 lattice spaces apart, and a random weight in the range [0,1] was assigned to each edge. A rewiring probability of 0.15, 0.3, 0.45, 0.6, 0.75 and 0.9 was used to enable remote parts of the lattice to join each other. In a next step, self-edges and multiple edges between the nodes were removed. Due to the fact that the generated edges in the original model were undirected but trust is expressed in a direction (from a truster to a confidant), we chose a direction for each edge at random (with a probability of 0.5 for each direction).
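A compact Python sketch of this construction, following the procedure just described, is given below (the 33% inversion step is analogous to the random graph case and omitted for brevity; all names are illustrative):

# Sketch: directed small-world trust graph (ring lattice plus rewiring).
import random

def small_world_trust_graph(n: int, k: int, p: float) -> dict:
    lattice = set()
    for u in range(n):
        for offset in range(1, k + 1):               # neighbours within k spaces
            lattice.add((u, (u + offset) % n))       # periodic boundary
    rewired = set()
    for (u, v) in lattice:
        if random.random() < p:                      # move one end at random
            v = random.randrange(n)
        if u != v:                                   # avoid self-edges
            rewired.add((u, v))                      # set() avoids double edges
    directed = {}
    for (u, v) in rewired:
        if random.random() < 0.5:                    # random edge direction
            u, v = v, u
        directed[(u, v)] = random.random()           # weight in [0,1]
    return directed

graph = small_world_trust_graph(1000, 3, 0.15)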

n        k   ∅ path length   transitivity
1,000    3   6.28            0.222
1,000    5   4.40            0.247
1,000    7   3.77            0.268
5,000    3   7.80            0.215
5,000    5   5.49            0.249
5,000    7   4.62            0.257
10,000   3   8.46            0.215
10,000   5   5.94            0.244
10,000   7   5.01            0.256
15,000   3   8.81            0.213
15,000   5   6.22            0.247
15,000   7   5.23            0.256

Table 5.2: Average path length and transitivity of the generated small-world models. Each model was created on a one-dimensional lattice containing n nodes. Each node was connected to all neighbours less than k lattice spaces away, and a rewiring probability of p = 0.15 allowed remote parts of the lattice to join each other.

Furthermore, we randomly inverted 33% of all edges to incorporate symmetric trust statements. A weight in the range [0,1] was randomly assigned to each of these edges. Our generated models were in good agreement with the observation of Watts and Strogatz that small-world models show high transitivity and low path lengths. The transitivity increased with k and decreased with n, whilst the path length exhibited the opposite effect. The range of the average path length was thereby similar to that of the random graph models (see Table 5.2).

An Appleseed trust network calculation was carried out for each single node in a small-world model using a trust injection of e = 200, a spreading factor of d = 0.85 and an accuracy threshold of tc = 0.2. Just like in the random graph test, the concrete trust values were neglected, and more general data was collected to measure the overall performance of the algorithm on different networks with properties similar to those expected for the network of the build service.

The number of discovered nodes depended on the density of the network: the more edges the network contained, the more nodes were reached by the algorithm. As a result, in a graph with degree k = 7, Appleseed was able to detect all nodes (the only exception was the model with 15,000 nodes, where the algorithm missed out a single one). However, even in the less dense networks, it detected more than 98.9% of all nodes. Similar to the number of discovered nodes, its standard deviation decreased with the degree of the network. Consider the model with n = 5,000 nodes as an example: the standard deviation fell from an average of 350.51 to 0.02 to 0 (as all nodes in the network were discovered) for k = 3, k = 5 and k = 7, respectively. At the same time, the average standard deviation rose with the size of the graph (e.g. from 70.29 to 350.51 to 681.32 to 877.15 in a network with degree k = 3 and 1,000, 5,000, 10,000 and 15,000 nodes), as illustrated on the left-hand side of Figure 5.6.

As expected, the maximal depth between the nodes increased with increasing graph size (for instance from an average of 9.71 for a model with degree k = 3 and n = 1,000, to 12.20, 12.99 and 13.94 for n = 5,000, 10,000 and 15,000 nodes, respectively). In contrast, the maximal depth as well as its standard deviation decreased with the density of the graph.


Figure 5.6: Appleseed performance on the generated small-world models. Left-hand side: Average number of nodes the algorithm detected in a run. Centre: Maximal depth from the source node to the most distant node in the network. Right-hand side: Average number of iterations the algorithm required until termination.

As an example, consider the network with 5,000 nodes: the maximal depth fell from an average of 12.99 to 8.34 to 6.99 (mean standard deviation of 1.17, 0.49 and 0.18) with an increasing degree of k = 3, k = 5 and k = 7. The two effects are depicted in the centre of Figure 5.6.

Similar to the random graph models examined in Section 5.1.1, the average number of iterations Appleseed required until termination was reduced with a higher network degree (from 24 iterations to 21). Likewise, the mean standard deviation decreased from 1.76 to 0.46 to 0.35 for n = 1,000 and k = 3, k = 5 and k = 7 (values were similar for n = 5,000, n = 10,000 and n = 15,000). In contrast, the size of the network influenced neither the number of iterations nor their standard deviation (see right-hand side of Figure 5.6).

The maximal trust distributed in each iteration was independent of the number of nodes in the model but influenced by the network degree (see Figure 5.7). As an example, consider the small-world graphs with n = 1,000 nodes: the maximal distributed trust in a network with k = 3 fell rapidly from an average of 30.00 in the first to 11.92, 9.01 and 5.03 in the fourth iteration, but even quicker from 30.00 to 7.51, 5.89 and 3.54, and from 30.00 to 5.60, 4.47 and 2.94, with an increased degree of k = 5 or k = 7. After the fifth iteration, the maximal distributed trust only changed marginally in all considered models until the algorithm stopped. Similar to the maximal distributed trust, the total number of nodes discovered in each iteration was solely influenced by the number of edges in the network: for a small-world graph with 1,000 nodes, the total number of nodes detected by Appleseed rose from a mean value of 4.86, 7.5 and 10.05 in the first to 805.89, 999.99 and 1,000 (for k = 3, k = 5 and k = 7) in the seventh iteration, covering nearly all nodes in the model (see Figure 5.8).

It should be noted that all figures, tables and values in this section only refer to the small-world models with a rewiring probability of p = 0.15. However, all other networks show similar magnitudes to the ones described above. The corresponding figures can be found in Appendix D.


Figure 5.7: Maximal distributed trust per iteration on the small-world data sets. The light grey line indicates that the algorithm would have needed fewer iterations if a slightly higher threshold tc = 0.5 had been used.


Figure 5.8: Maximal distributed trust (solid line) and total number of nodes (dashed line) discovered by the algorithm per iteration on the small-world models.

n        m   linear                          quadratic
             ∅ path length   transitivity    ∅ path length   transitivity
1,000    3   3.65            0.0145          2.00            3.00·10^-5
1,000    5   3.06            0.0147          1.99            6.02·10^-6
1,000    7   2.77            0.0195          1.99            0
5,000    3   3.98            0.0041          2.00            2.16·10^-6
5,000    5   3.05            0.0042          1.99            2.40·10^-7
5,000    7   2.79            0.0052          1.99            2.40·10^-7
10,000   3   4.06            0.0019          1.99            2.40·10^-7
10,000   5   3.11            0.0022          1.99            1.80·10^-7
10,000   7   2.84            0.0029          2.00            3.00·10^-7
15,000   3   4.34            0.0015          2.00            1.07·10^-7
15,000   5   3.20            0.0015          1.99            5.33·10^-8
15,000   7   2.90            0.0019          1.99            5.33·10^-8

Table 5.3: Average path length and transitivity of the generated Barabási and Albert models with n nodes and a degree m. The textual labels 'linear' and 'quadratic' refer to the preferential attachment method used.

5.1.3 Model of Barabási and Albert

Other than the two models discussed so far, which attempt to create networks incorporating properties of real-world networks such as the small-world effect, the model of Barabási and Albert tries to understand how networks come to have these characteristic structural features in the first place. To achieve this goal, the network is gradually evolved, reflecting the growth processes of real-world networks: new nodes are repeatedly added, each with m undirected edges, where the degree parameter m is never changed. The model makes use of the mechanism of cumulative advantage, also referred to as preferential attachment or Matthew effect, after the biblical edict 'For to every one that hath shall be given . . . ' (Matthew 25:29), by attaching one end of each edge to an existing node with a probability proportional to the degree of this node. An example of the different network structures which can be created using this method is given in Figure 5.9. As every node appears with an initial degree of m, it automatically has a non-zero probability of receiving new edges. The probability that a new edge is connected to a node of degree k is

P(k) = \frac{k\,p_k}{\sum_k k\,p_k} = \frac{k\,p_k}{2m}. \qquad (5.2)


Figure 5.9: Left-hand side: A network with n = 20 nodes and degree m = 3 was gradually grown using a linear preferential attachment method. Right-hand side: The same network as on the left-hand side, but this time grown using a quadratic preferential attachment method.

As mentioned before, there is a correlation between the age of a node and its degree, with older nodes having higher mean degrees. For the case m = 1, the probability distribution of the degree of a node with age a is

p_k(a) = \sqrt{1-\frac{a}{n}}\,\left(1-\sqrt{1-\frac{a}{n}}\,\right)^{k-1}, \qquad (5.3)

leading to an exponential distribution for a particular age a with a characteristic degree scale that diverges as (1-a/n)^{-1/2} as a \to n. As a result, earlier nodes show higher expected degrees than those added at a later stage, and the power-law degree distribution of the entire graph is primarily influenced by the first nodes.

We synthesised networks using the model of Barabási and Albert with 1,000, 5,000, 10,000 and 15,000 nodes. In each time step, 3, 5 and 7 edges were added, and a weight in the range [0,1] was randomly assigned to each edge. When adding a new edge to the network, we used both linear preferential attachment and quadratic preferential attachment, the latter to enhance the effect that older nodes show a higher mean degree and are therefore more important. However, in the original model all edges are undirected, in contrast to the trust network of the openSUSE Build Service, which is a directed graph. As the model lacks this crucial feature, it was changed in such a way that a directed graph was generated, using the in-degree of each node to compute the probability that a new edge is connected to it. In addition, we removed self-edges and multiple edges between all nodes, and we randomly inverted 33% of all edges to incorporate symmetric trust statements. A weight in the range [0,1] was randomly assigned to each of the inverted edges.
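The adapted growth process can be sketched as follows. The snippet is a simplified Python rendering; in particular, the '+ 1' offset that keeps freshly added, in-degree-zero nodes reachable is an assumption, as the thesis does not state how this case was handled.

# Sketch: directed Barabasi-Albert growth with in-degree-based
# (linear or quadratic) preferential attachment.
import random

def ba_trust_graph(n: int, m: int, quadratic: bool = False) -> dict:
    indegree = [0] * n
    edges = {}
    for new in range(m, n):                          # grow node by node
        exponent = 2 if quadratic else 1
        weights = [(indegree[v] + 1) ** exponent for v in range(new)]
        targets = random.choices(range(new), weights=weights, k=m)
        for v in set(targets):                       # drop multiple edges
            edges[(new, v)] = random.random()        # weight in [0,1]
            indegree[v] += 1
    return edges

graph = ba_trust_graph(1000, 3)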

n        median             total time       number of          time per
         discovered nodes                    AS computations    AS computation
1,000    817                23 s             1,000              23.0 ms
5,000    4,325              12 m 28 s        5,000              149.6 ms
10,000   8,715              1 h 02 m 49 s    10,000             376.9 ms
15,000   13,049             2 h 44 m 52 s    15,000             659.5 ms

Table 5.4: Run times of the Appleseed (AS) computations performed on the generated Barabási and Albert models with n nodes and a degree m = 5. The linear preferential attachment method was used.


Figure 5.10: Appleseed performance on the generated models of Barabási and Albert (created using linear preferential attachment). Left-hand side: Average number of nodes the algorithm detected in a run. Centre: Maximal depth from the source node to the most distant node in the network. Right-hand side: Average number of iterations the algorithm required until termination.

Compared to the two types of models we have previously described, the generated models of Barabási and Albert featured very low transitivity as well as very low average path lengths. The transitivity decreased with the number of nodes in the network but increased with its degree. When the linear preferential attachment technique was used, the average path length showed the opposite effect. However, if quadratic preferential attachment was used, the average path length as well as the transitivity changed only marginally with the properties of the network, with an average path length around 2.00 and a transitivity close to 0 (see Table 5.3).

As for the two other models described in Sections 5.1.1 and 5.1.2, a trust network calculation was executed for every node in each model using Appleseed with its standard parameters (trust injection e = 200, spreading factor d = 0.85 and accuracy threshold tc = 0.2). All concrete trust values were discarded; more general data was collected to measure the overall performance of the algorithm. The run times of the Appleseed computations are summarised in Table 5.4. All computations were performed by a Linux server with two Quad Core Intel® Xeon® CPUs (E5420), with each of the 8 cores running at 2.50 GHz, and a total of 8 GB system memory. Each CPU core was used to perform a single Appleseed calculation because the implementation of the algorithm is single threaded.


Figure 5.11: Appleseed performance on the generated models of Barabási and Albert (created using quadratic preferential attachment). Left-hand side: Average number of nodes the algorithm detected in a run. Centre: Maximal depth from the source node to the most distant node in the network. Right-hand side: Average number of iterations the algorithm required until termination.

Both the size of a network and its degree had an impact on the number of nodes which were detected by Appleseed in each run. For the Barabási and Albert models generated using the linear preferential attachment technique, 74-76%, 81-87% and 89-92% of all nodes in a graph with degree m = 3, m = 5 and m = 7, respectively, were discovered, depending on its size (see the left-hand side of Figure 5.10). Even if these values were lower than the ones for the random graphs (approximately 95%, 99% and 100%) and for the small-world models (on average 98%, 99% and 100%), they were in better agreement with them than the ones created using the quadratic cumulative advantage method. The latter seemed to be influenced neither by the number of nodes nor by the number of edges in the graph: the average number of nodes discovered by the algorithm lay between 28-31% of the network size with a constant mean standard deviation of 0.46, as depicted on the left-hand side of Figure 5.11.


Figure 5.12: Maximal distributed trust per iteration on the models of Barabási and Albert (created using linear preferential attachment). The light grey line indicates that the algorithm would have needed fewer iterations if a slightly higher threshold tc = 0.5 had been used.


Figure 5.13: Maximal distributed trust (solid line) and total number of nodes (dashed line) discovered by the algorithm per iteration on the models of Barabási and Albert (created using linear preferential attachment).


Figure 5.14: Maximal distributed trust per iteration on the models of Barabási and Albert (created using quadratic preferential attachment). The light grey line indicates that the algorithm would have needed fewer iterations if a slightly higher threshold tc = 0.5 had been used.


Figure 5.15: Maximal distributed trust (solid line) and total number of nodes (dashed line) discovered by the algorithm per iteration on the models of Barabási and Albert (created using quadratic preferential attachment).

The underlying reason for this observation was that most nodes were not or only weakly connected to the rest of the network if the quadratic preferential attachment method was applied. In contrast, using the linear cumulative advantage technique enabled most nodes to be a member of the largest connected component of the network. As a result, the average standard deviation was slightly higher, between 0.29 and 0.70, if the linear preferential attachment approach was used, but it was still a lot smaller compared to both random graphs and small-world models, where it easily reached values in the range of 0-2,205.06 and 0-877.15, respectively.

The fact that the models generated using the quadratic cumulative advantage method were more weakly connected became most obvious when looking at the maximal depth between the nodes: the average maximal depth in all models lay in the range of 2-4 with a mean standard deviation of 0.01-0.03 (see the centre of Figure 5.11), whereas it took values in the range of 5.94-10.36 with an average standard deviation of 0.12-0.49 in the case of linear attachment. In the latter, the maximal depth between the nodes increased with the size of the graph (e.g. for a model with m = 3: 6.87, 9.13, 10.15 and 10.36 for n = 1,000, 5,000, 10,000 and 15,000 nodes) and it decreased with the number of edges (e.g. for a network with n = 15,000 nodes: 10.36, 8.15 and 7.02 for m = 3, m = 5 and m = 7). An illustration is given in the centre of Figure 5.10.

For all models generated using the linear cumulative advantage method, the average number of iterations the algorithm required until termination lay in the same range as for the random graphs and the small-world models, and it decreased slightly with a higher degree of the network (from 23 to 20 iterations), as depicted on the right-hand side of Figure 5.10. However, for the models created using the quadratic preferential attachment technique, it varied strongly between 4 iterations for weakly connected nodes and 25 iterations for nodes contained in the largest single component (see the right-hand side of Figure 5.11).

In the linear models, the maximal trust distributed in each iteration as well as the number of total nodes discovered showed a behaviour similar to the one seen in the random graphs and the small-world models: both were affected by the density of the network, as depicted in Figures 5.12 and 5.13. In the quadratic models, on the other hand, neither the maximal trust distributed nor the total nodes discovered in each iteration step depended on the size or the degree of the network (see Figures 5.14 and 5.15).

5.2 Advogato Real-world Network

As Massa and Souren as well as Sabater and Sierra pointed out, there is a lack of testbeds and frameworks to evaluate, compare and analyse the different trust metrics under a set of representative and common conditions. For this reason, Paolo Massa and Kasper Souren founded Trustlet in 2007, a platform consisting of a wiki for open research on trust metrics, with the goal to collect and distribute trust network data sets as well as trust metric code as free software, either under the Creative Commons Attribution license [Creative Commons, 2004] or the GNU General Public License [Free Software Foundation, 1991]. Unfortunately, most of the collected real-world networks use a binary representation for the presence or absence of trust, restricting the evaluation of trust metric performance. However, one of the data sets, the Advogato social network, exhibits many interesting features: it is a real-world, directed, weighted, large social network, making it an ideal testbed.


Figure 5.16: Maximal distributed trust (solid line) and total number of nodes (dashed line) discovered by the algorithm per iteration on the Advogato data sets (left-hand side: reflexive data set, right-hand side: non-reflexive data set).

Advogato.org was developed in 1998 by Levien and Aiken to serve as a community discussion board, a resource for free software developers and Levien and Aiken's own testbed for their research on trust metrics. Trust statements in this social network are a directed, weighted set of peer certificates used to control access to posting and editing website information. Four different base levels are specified: Masters (the authors of a free software project as well as excellent programmers), Journeyers (important contributors), Apprentices (contributors still in need of acquiring skills) and Observers (default for users without trust certification).

5.2.1 Data Sets

Massa and Souren used the Advogato data set from 12 May 2008 in their study to validate the performance of different trust algorithms, amongst them PageRank (see Section 3.2.1) and both the global as well as the local version of Advogato (see Section 3.2.3) [Massa and Souren, 2008]. The data set contained 7,294 nodes (the largest fully-connected component contained 70.5% of all nodes, the second largest only 7 nodes) and 52,981 trust relations (17,489 Master, 21,977 Journeyer, 8,817 Apprentice and 4,698 Observer judgements), of which 33% were symmetric. 4,607 out of the 7,294 users only received one kind of trust rating, whereas 183 users could be found with 5 or more trust edges that all have the same value. The mean in-degree and out-degree of the nodes in the used data set was 7.26 and the mean shortest path length was 3.75. The average clustering coefficient of the graph was 0.116.

Unfortunately, when we downloaded the above mentioned data set from the Trustlet website1, we discovered that the provided data set differed from the one described in the paper. The data set used in our test scenario only contained 6,987 nodes but more trust ratings. Of the 56,921 trust statements, 17,551 certified a Master, 22,882 a Journeyer, 10,670 an Apprentice and 5,818 an Observer. Because this data set contained reflexive edges, which might cause problems during the Appleseed computation, we generated a second data set without these self-edges, resulting in 4,647 nodes and 51,751 trust relations, of which 16,795 certified a Master, 21,543 a Journeyer, 8,743 an Apprentice and 4,670 an Observer.

1 http://www.trustlet.org/datasets/svn/AdvogatoNetwork/2008-05-12/

Figure 5.17: Appleseed performance on both the reflexive and the non-reflexive data set. Left-hand side: Average number of nodes the algorithm detected in a run. Centre: Maximal depth from the source node to the most distant node in the network. Right-hand side: Average number of iterations the algorithm required until termination.


Figure 5.18: Histograms for Appleseed performance on both the reflexive and non-reflexive data sets as shown in Figure 5.17.

However, even if the data sets were not a hundred percent identical, they seemed to be similar enough to compare the results obtained with our implemented model with those obtained by the trust metrics investigated by Massa and Souren. To achieve consistency, we mapped Advogato's certifications, the textual labels Observer, Apprentice, Journeyer and Master, to the (by Massa and Souren arbitrarily chosen) values of 0.4, 0.6, 0.8 and 1.0, respectively.

The trust network was calculated for each node in the two data sets using Appleseed, as described before for the artificially generated models. In both trust networks, Appleseed discovered 3,228 nodes on average with a mean maximal depth of 4.72, and it terminated after 17 iterations on average (see Figure 5.17). The maximal distributed trust as well as the total number of nodes discovered in each iteration showed a behaviour analogous to the one in the artificially generated models: trust flowed only marginally after the fifth iteration; at the same time, the maximal number of nodes in the network was detected (see Figure 5.16). The average number of total nodes indicated by the dashed line in the figure decreased after the twentieth iteration due to the fact that most of the Appleseed runs had already terminated, except for a few with nodes which had not yet reached a stable trust distribution state. Figure 5.18 shows the frequency of the measured values in the general statistics. Neglecting isolated nodes, which caused the bars on the left border of each histogram, all Appleseed computations exhibited only a small standard deviation from the mean values.

The run times of the Appleseed computations on the two data sets are depicted in the first two lines of Table 5.5. All computations were performed by a Linux server with two Quad Core Intel® Xeon® CPUs (E5420), with each of the 8 cores running at 2.50 GHz, and a total of 8 GB system memory. Each CPU core was used to perform a single Appleseed calculation because the implementation of the algorithm is single threaded.

5.2.2 Leave-one-out Cross-validation

In a second step, we applied the leave-one-out cross-validation technique, which is often used in machine learning to validate how well the results of a statistical analysis generalise to an independent data set. As its name suggests, it uses a single observation from the original sample as validation data and the remaining observations as training data. The process is repeated such that each observation in the sample is used once as validation data. In our test case, one trust edge was taken out of the trust graph and the trust metric was used to predict the missing trust value. This procedure was repeated for all edges in the network to obtain a prediction graph, which can be compared with the original one in several ways: if the trust metric was not able to predict the value of an edge in the network, the corresponding edge in the prediction graph contains an undefined trust value; thus, the coverage can serve as a measure of the fraction of edges that were predictable, alongside the fraction of edges that were predicted correctly. A further measure is the mean absolute error (MAE):

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |p_i - t_i| \qquad (5.4)

n        median             total time       number of          time per
         discovered nodes                    AS computations    AS computation
7,342    4,488              20 m 36 s        7,342              168.3 ms
5,343    4,488              14 m 38 s        5,343              164.3 ms
7,342    4,488              5 h 06 m 41 s    56,936             323.2 ms
5,343    4,488              4 h 43 m 34 s    51,766             328.7 ms

Table 5.5: Run times of the Appleseed (AS) computations performed on the two Advogato data sets (with and without self-edges). The reflexive data set contained 7,342 nodes whereas the non-reflexive one contained only 5,343. The first two lines were obtained by running Appleseed for each node to generate the general statistics; the latter two refer to the leave-one-out cross-validations.

where n is the number of observations, p_i is the predicted value and t_i is the true value of the i-th observation, and the root mean squared error (RMSE):

\mathrm{RMSE} = \sqrt{\mathrm{Var}(\hat{\Theta} - \Theta) + \bigl(\mathrm{Bias}(\hat{\Theta}, \Theta)\bigr)^2} \qquad (5.5)

where \hat{\Theta} is the estimator of the estimated parameter \Theta. Both measures quantify accuracy by looking at the differences between the values predicted by the model and the values actually observed.

We compared the results of the previously described leave-one-out cross-validation of our implementation of Appleseed with those of the algorithms tested earlier by Massa and Souren: Random, OutA, OutB and eBay are trivial trust metrics used to estimate a baseline. The first randomly predicted a trust score in the range [0.4, 1]. The second and the third predicted the trust value a user A should place in user B simply by averaging over the trust statements outgoing from A and B, respectively, whereas eBay predicted the trust value by averaging over the trust statements incoming to user B. As most of the tested trust metrics returned values in a continuous interval, Massa and Souren applied a threshold function such that, for example, a predicted trust of 0.746 became 0.8 (Apprentice). AdvogatoGlobal refers to the original global Advogato trust metric (see Section 3.2.3) using the founders of the community, 'raph', 'federico', 'miguel' and 'alan', as members enjoying supreme trust, in contrast to AdvogatoLocal, the local version of the algorithm. The latter was only used on a subset containing 8% of all edges of the original network, as its current implementation is rather slow. PageRank's predicted values followed a power-law distribution (there were a few large scores and many tiny ones), thus Massa and Souren decided to rescale the obtained results by sorting them and mapping them linearly into the range [0.4,1]. Like PageRank, Appleseed performs a ranking, and thus we applied exactly the same procedure to achieve outcomes in the range [0.4,1]. The run times of the two leave-one-out tests are given in the last two lines of Table 5.5.

Table 5.6 summarises the results of the evaluation of the different trust metrics on the Advogato data set. The first column describes the overall coverage of the predicted graph compared to the original one. All algorithms had a high coverage of 96% or more, except for the trivial OutB metric and Appleseed run on the reflexive data set (only 92%). This observation can be explained by the high density of the Advogato network, resulting in many paths from one member to another. The second, third and fourth columns of the table compare the frequencies of wrong trust predictions, the MAE and the RMSE scores, respectively. RMSE scores are a variant of the MAE which tends to emphasise large errors, favouring trust metrics that stay in a small range of errors without many outlying predictions that might undermine the confidence of a user in the trust metric.
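For a concrete sample of predictions, the two measures (and the coverage) reduce to a few lines. In the sketch below, the RMSE of equation 5.5 is computed in its equivalent form as the root of the mean squared prediction error; None marks an edge for which no prediction could be made. All names and values are illustrative.

# Sketch: coverage, MAE and RMSE over (predicted, true) value pairs.
import math

def evaluate(pairs):
    defined = [(p, t) for (p, t) in pairs if p is not None]
    coverage = len(defined) / len(pairs)
    mae = sum(abs(p - t) for p, t in defined) / len(defined)
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in defined) / len(defined))
    return coverage, mae, rmse

# Toy example with three edges, one of them unpredictable.
print(evaluate([(0.8, 1.0), (0.6, 0.6), (None, 0.4)]))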


Figure 5.19: ROC curves of the Appleseed predictions on the non-reflexive data set, one panel per certification level (Master, Journeyer, Apprentice and Observer). The specificity corresponds to 1 minus the false positive rate. The false positive rate is estimated as the number of false positives divided by the number of negative samples; the number of negative samples is given by the sum of false positives and true negatives. The true positive rate is estimated as the number of true positives divided by the number of positive samples; the number of positive samples is equal to the sum of true positives and false negatives.


Figure 5.20: Left-hand side: Sensitivity vs specificity plot measuring the performance of Appleseed on the non-reflexive data set. The specificity is calculated as the number of true negatives divided by the number of negative samples; the number of negative samples is given by the sum of false positives and true negatives. Right-hand side: Recall vs precision plot on the same data set. The precision, also called positive predictive value, is computed as the number of true positives divided by the sum of the true positives and the false positives. An ideal plot would pass through the upper left corner, corresponding to 100% precision and 100% recall.
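The rates used in both figures follow directly from the four confusion counts, as a short sketch shows (the counts in the example are made up for illustration):

# Sketch: rates from confusion counts, as defined in the captions above.
def rates(tp: int, fp: int, tn: int, fn: int):
    tpr = tp / (tp + fn)         # true positive rate = sensitivity = recall
    fpr = fp / (fp + tn)         # false positive rate = 1 - specificity
    precision = tp / (tp + fp)   # positive predictive value
    return tpr, fpr, precision

print(rates(tp=50, fp=50, tn=50, fn=50))   # the roughly 1:1 behaviour observed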

Algorithm                   Coverage   Fraction wrong predictions   MAE     RMSE
Random                      1.00       0.737                        0.223   0.284
OutA                        0.98       0.486                        0.106   0.158
OutB                        0.92       0.543                        0.139   0.205
eBay                        0.98       0.350                        0.086   0.156
AdvogatoLocal               1.00       0.550                        0.186   0.273
AdvogatoGlobal              1.00       0.595                        0.199   0.280
Appleseed (reflexive)       0.92       0.613                        0.186   0.231
Appleseed (non-reflexive)   0.96       0.598                        0.161   0.222
PageRank                    1.00       0.501                        0.124   0.191

Table 5.6: Evaluation of trust metrics on the Advogato data set.

In all three cases, the baseline was given by the Random trust metric, which produced the highest frequency of incorrect trust values (74%) and the worst MAE and RMSE values (0.223 and 0.284, respectively). The trust metrics with the best performance were eBay (35%, 0.086 and 0.156, respectively), followed by OutA (49%, 0.106 and 0.158, respectively), PageRank (50%, 0.124 and 0.191, respectively) and OutB (54%, 0.139 and 0.205, respectively). The remaining four algorithms performed poorly, predicting more than 50% of all certificates incorrectly. In detail, AdvogatoLocal, AdvogatoGlobal, Appleseed run on the data set containing self-edges and Appleseed run on the non-reflexive data set exhibited fractions of 55%, 60%, 61% and 60% of wrong predictions, an MAE of 0.186, 0.199, 0.186 and 0.161, and an RMSE of 0.273, 0.280, 0.231 and 0.222, respectively. These values suggest that the performance of Appleseed on the reflexive and the non-reflexive Advogato data sets was fairly similar; the self-edges contained in the data set thus did not negatively influence the algorithm.

The fact that Appleseed did not perform well in the leave-one-out tests on either data set is confirmed by the Receiver Operating Characteristic (ROC) curves, where the true positive rate (also known as sensitivity or recall) is plotted against the false positive rate (1 minus the specificity). In the case of a perfect discrimination, where there is no overlap between the two distributions, the ROC plot passes through the upper left corner corresponding to 100% sensitivity and 100% specificity. In contrast, the results obtained using Appleseed showed a ratio of approximately 1:1, independently of the predicted certification level, as illustrated in Figure 5.19. This implies that Appleseed was not suitable to predict the certification level of the users in the data set, as we obtained either the true or the false value with a probability of 50%. This was confirmed by both the sensitivity-specificity graph plotted for all certification levels at once and the recall-precision plot, depicted in Figure 5.20. Chapter 6

Discussion

Most technically aware users subconsciously confuse trust in software with software quality. Some even associate the term trust with the meaning of trusted computing [Mitchell, 2005]. We frequently encountered these confusions whilst discussing this study within several communities. Considering the broad range of software available in the openSUSE Build Service, written in a multitude of programming languages, there is no reasonable way of automatically assessing its quality or reliability. Correspondingly, the decision to model a social network and to define trust for software packages based solely on the users' subjective ratings of the packagers is the only applicable one. The trust formula is easy to understand because it consists of very few elements: the minimum trust in all contributors of a package (which may be raised by some reviewers) and, as a recursive factor, all packages it depends on. The resulting concepts for trust in the openSUSE Build Service are well known to its users through existing websites and online communities (e.g. eBay and Flickr). They are intuitive and, thus, a high acceptance can be expected. However, because the whole trust system is incorporated into a central server, the users have to trust the operators of this system that the executed code is consistent with the provided source code. Nevertheless, as the whole openSUSE Build Service follows the same underlying service concept, it can be assumed that its current users have already placed their trust in it.
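To illustrate the structure of such a formula (the exact definition is given in Chapter 4), a recursive sketch is shown below; the combination via a minimum over the dependencies and the reviewer bonus are simplifying assumptions, and all names are hypothetical.

def package_trust(pkg, user_trust, reviewer_bonus=0.0):
    # Minimal sketch of the formula described above: the minimum trust in
    # all contributors (possibly raised by reviews), combined recursively
    # with the trust in all packages it depends on.
    contributor_trust = min(user_trust[c] for c in pkg["contributors"])
    contributor_trust = min(1.0, contributor_trust + reviewer_bonus)
    dependency_trust = [package_trust(d, user_trust) for d in pkg["depends"]]
    return min([contributor_trust] + dependency_trust)

# Hypothetical example: a package with two contributors and one dependency.
trust = {"alice": 0.9, "bob": 0.7, "carol": 0.8}
libfoo = {"contributors": ["carol"], "depends": []}
bar = {"contributors": ["alice", "bob"], "depends": [libfoo]}
print(package_trust(bar, trust))  # 0.7: bob is the least trusted contributor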

6.1 Computation of Trust using the Appleseed Trust Metric

The fact that controversial users exist in every society was one of the main reasons to implement Appleseed for the openSUSE Build Service, enabling the prediction of trustworthiness in a personalised way. Another incentive for incorporating a local trust metric was the fact that many rating systems based on global trust metrics suffer from the problem that the community as a whole benefits more from a trust rating than the rater, and thus many members free-ride (they let other users provide ratings rather than rate themselves). On the other hand, the choice of a local trust metric has the disadvantage that bootstrapping the trust network requires even more trust ratings than with global ones. Even after the network has reached a reasonable size and coverage, another problem occurs: the introduction of a new user entering the

community. As he is initially not connected to anybody in the network, he cannot compute local trust values for others and, thus, for any packages of interest. A solution to this problem could be the installation of a group of super users receiving high trust ratings from each newbie. This group could, for instance, include the core developers of the openSUSE distribution as well as further packagers who were either elected by the openSUSE members or appointed by the openSUSE board. Since these relations to the super users are only intended to integrate the new member into the network, he can modify or delete them at any time. Another possibility would be the usage of a global trust metric that provides trust values for packagers who are not covered by an individual's personal trust network. Not only new users would benefit from this extension: existing users could also apply these global values to fill gaps in their networks when solving trust formulae.

To avoid a positive bias in the trust judgements, caused by social pressure manifested in the exchange of courtesies, the hope of receiving a better rating in return for a positive one, or the avoidance of negative ratings for fear of provoking retaliation, all trust ratings are centrally stored on the openSUSE Trust Server. In this way, users only need to disclose their trust information to a central service but not to other users, minimising social pressure and assuring privacy. To encourage honest trust ratings, each user is only presented with a list of the trust statements he made about others, as well as a list of all members who rated him, but without their trust values.

The concrete trust formula for a package is compiled at build time, as the information which packages have to be installed into the build environment is only available at this point. However, as the trust in a packager might vary over time, the trust server recognises and reflects current trends, weighting towards current behaviour, by calculating trust ratings for users in real time rather than using the values from the last build of the package. The main reason is that the last build might be several months old and the trust in a user may have fallen dramatically due to severe incidents since then. Such occurrences should be incorporated into the trust of all packages this person has contributed to, even those that were built a long time ago. To accelerate calculations and to save computing power, the trust server uses its database to cache the values of all computed trust formulae.

A difficulty arises when more than the legitimate number of ratings is provided, referred to as ballot box stuffing. In traditional voting schemes (such as political elections) ballot stuffing usually causes too many votes in favour of a particular person, but in online applications it can also happen with negative votes, thereby discriminating against users. This issue is hard to solve in systems based on subjective ratings, and thus there will not be a single solution handling all possible scenarios. If the trust in a user or a group boosts or drops within a short period, this may also be a good indicator for the build service team to identify conspicuous situations. For this reason, the trust network has to be carefully monitored by its operators. An additional common problem of trust systems is the change of identity or pseudonym by a user who has experienced a significant loss of trust within the community.
As trust systems are built on the assumption that identities are long-lived, allowing ratings from the past to be used in the future, the change of identities needs to be prevented or at least discouraged. Despite the fact that general projects exhibit well-working social control mechanisms to decide whether or not to grant contribution rights to a new packager, the private home projects undercut this valuable system. Certainly, a user who lost trust may create a new

account using another login name and start from scratch, but due to the properties of local trust metrics, he has to earn reputation again to become well connected in the existing trust network. Hence, he could also stay with the existing account and try to convince community members to reconsider their trust statements. In case social control mechanisms fail, the scale for trust ratings could be extended from [0,1] to [0,2], whereby the upper half can only be reached by users who proved their identity to the operators of the build service. To avoid the laborious creation of a private certification authority, existing infrastructures like CACert.org¹ or an openSUSE GPG keyring could be used. Furthermore, the trust rating scale could be extended by [−1,0] to incorporate distrust as described in Section 3.2.5. Although the concept of distrust is already present in the reference implementation of the openSUSE Trust Server, it is not yet integrated into the interfaces to manage trust relations. The reason for this is that the majority of users confuse distrust with the absence of trust [Marsh, 1994]. To avoid further problems in the bootstrapping phase of the trust network, due to the fact that distrust is not propagated, we decided to enable distrust ratings only after the network has reached a mature state.

¹ http://www.cacert.org

6.2 Validation using artificially generated Networks

All synthesised models are in good agreement with one of the key features of real-world networks: most pairs of nodes seem to be connected by a short path (see Tables 5.1, 5.2 and 5.3), showing the so-called ‘small-world effect’ first described by Stanley Milgram [Milgram, 1970]. Hence, these artificially generated networks allow us to get a first basic intuition about the way networks behave [Newman, 2003].

Unfortunately, random graphs are very different from real-world networks with respect to most of the other interesting and important features, making them inadequate to describe real-world behaviour. One of the differences is the lack of a ‘community structure’ (illustrated in Figure 5.1), which is present in most social networks as people tend to divide into groups. Especially in the open source community, where packagers are mostly dedicated to one or a small number of distributions, we expect groups to play an important role in the trust network of the build service. Nodes belonging to such a group are highly connected amongst themselves, but hardly any edges link different groups with each other [Scott, 2000; Wasserman and Faust, 1994]. In a random graph, in contrast, each edge is independently present with the same probability p, leading to a network whose nodes are equally connected, unlike the members of real-world networks. In comparison, nodes in small-world models show a stronger connection to their immediate neighbours (see Figure 5.5). Even though the synthesised models of Barabási and Albert do not show a community structure either, they exhibit another interesting feature: as the network is grown gradually using a cumulative advantage strategy, older nodes in the model end up being better connected than younger ones (depicted in Figure 5.9). This property of the graph is in agreement with the observation that in most real-world networks the founders or early members are better connected than new ones. For instance, established members of open source projects are generally better integrated into their community. They enjoy a high trust and are often elected to leading roles or technical committees.


Another major difference concerns the transitivity of the network: in many real-world social networks, the clustering coefficient tends to be considerably higher than in random graphs with a similar number of nodes and edges. The reason for this observation is self-evident: in real life, a friend of our friend is likely to be our friend, too. Thus, if there is an edge connecting user A with B and B with C, it is likely that there is also a link between A and C. In random graphs, on the contrary, the transitivity is approximately equal to p, regardless of whether or not two nodes have a common neighbour (see Table 5.1). It was even shown that the probability of the formation of a triangle tends to a non-zero limit as a real-world network becomes large, so that the clustering coefficient C = O(1) as n → ∞, whereas C = O(n⁻¹) for a large number of nodes n in a random graph. This implies a difference by a factor in the order of n between real-world networks and random graphs [Watts and Strogatz, 1998]. The clustering coefficients of all generated models of Barabási and Albert are in a similar range (see Table 5.3). Small-world models, on the other hand, show a high transitivity with values similar to those reported for real-world networks: for example, the Advogato data set we examined in Section 5.2.1 exhibits a transitivity of 0.116, whereas the clustering coefficients of the artificially generated small-world models lie in the range of 0.213-0.268 (as depicted in Table 5.2).

A further deviation can be found when looking at the degree distribution: as mentioned earlier in Section 5.1.1, in a random graph each edge is present or absent with equal probability, leading to a binomial degree distribution (or a Poisson one in the limit of large graph size). Real-world networks, in contrast, are mostly found to be far from a Poisson distribution, showing a distribution with a long right tail of values far above the mean; many of them follow either power laws or exponentials in their tails [Amaral et al., 2000; Newman, 2001, 2003; Pennock et al., 2002; Sen et al., 2003]. The model of Barabási and Albert shows a power-law degree distribution (due to the preferential attachment technique used) which is seen in many social networks (power laws with exponential cutoffs have, for instance, been reported in some collaboration networks) [Newman, 2001; Price, 1976].

In summary, even if artificially generated models do not perfectly reflect the real-world behaviour of social networks, it could be shown that the calculation of the Appleseed trust metric is applicable to networks of the size of the openSUSE Build Service. In general, the run time is less than 700 ms to compute a trust network covering more than 13,000 nodes. As a result, it can be assumed that solving a trust formula incorporating the cache mechanism described in Section 4.5.3 would take at most a few seconds.
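The network statistics discussed in this section can be reproduced with the three generators available in networkx; the following sketch is illustrative and uses parameters chosen for brevity, not the exact values of Chapter 5.

import networkx as nx

n = 1000
graphs = {
    # Erdös-Rényi random graph: each edge present independently with probability p.
    "random": nx.gnp_random_graph(n, p=0.01, seed=1),
    # Watts-Strogatz small-world model: ring lattice with randomly rewired edges.
    "small-world": nx.connected_watts_strogatz_graph(n, k=10, p=0.15, seed=1),
    # Barabási-Albert model: gradual growth with preferential attachment.
    "scale-free": nx.barabasi_albert_graph(n, m=5, seed=1),
}

for name, g in graphs.items():
    cc = nx.transitivity(g)  # high for small-world models, roughly p for random graphs
    # Restrict to the largest component so the path length is well defined.
    giant = g.subgraph(max(nx.connected_components(g), key=len))
    apl = nx.average_shortest_path_length(giant)
    print(f"{name:12s}  clustering = {cc:.3f}  avg. path length = {apl:.2f}")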

6.3 Validation using the Advogato Data Set

As seen in Section 5.2.2, the coverage achieved by all algorithms in the leave-one-out cross-validation test on the Advogato data was very high (96% or more, except for the trivial algorithm OutB and Appleseed run on the reflexive data set), even though coverage is usually a weak point of local trust metrics. The cause of this observation is the high density of the Advogato network, resulting in many paths from one member to another and allowing even local trust metrics to cover and predict most of the trust statements between users. To ensure a good coverage in the trust network of the openSUSE Build Service, we should aim to encourage all

users to provide a similar number of trust statements as observed in the Advogato data set (mean indegree and outdegree of 7.26).

One of the best performances (in terms of all four measurements: coverage, fraction of wrong predictions, MAE and RMSE) was achieved by the simple OutA algorithm (see Table 5.6). The explanation for this seems to be related to the structure of the chosen social network: Observer certificates were used as the default and there were few reasons to certify other members as Observers. Therefore, this status was only used to express that a user had changed his mind about another user and wanted to downgrade a previous certification as much as possible, explaining its infrequent usage. Since the number of Observer edges was very small compared to all other certification levels (only about 9-10% of all certificates were Observer ones), trust metrics averaging over all trust edges in the network tend to predict values close to the higher trust values. In the case of the prediction of an Observer edge, this led to a large error which strongly influenced the result of the RMSE formula. In contrast, trust metrics averaging over the outgoing edges (like OutA) did not incur large errors in this particular scenario, as most agents who used Observer edges tended to use them multiple times, so that the average of their outgoing edges was most often close to 0.4.

However, it should be noted that the various trust metrics have different intentions, which explains their performance in different situations: Advogato was designed with the goal of exploring which users of an online community are trustworthy, under the assumption that only few malicious nodes exist, and not to infer the correct trust value between two users (see Section 3.2.3). PageRank (Section 3.2.1) and Appleseed (Section 3.2.5), on the other hand, were created to produce a ranking in order to give the user the possibility to discover the more interesting nodes first. One of the problems occurring in the described test scenario was that a linear mapping of such a ranking to an arbitrarily chosen range of values was applied to the results of both PageRank and Appleseed, which did not reflect the observed values. Most often, Appleseed predicted only a few high trust values and many low ones. Instead of mapping only the high trust values to Master certificates, as one would expect, the linear mapping procedure also maps much smaller values to this certificate if they are high in rank. The described issue becomes most obvious when considering an example: suppose the following trust ratings of a user A to seven other users B-H in the network were obtained:

trust(A,B) = 18.98
trust(A,C) = 11.52
trust(A,D) = 10.76
trust(A,E) = 3.42
trust(A,F) = 3.41
trust(A,G) = 3.11
trust(A,H) = 3.02

The proposed mapping technique will produce a mapping in which nodes with totally different trust judgements gain the same label: for instance, trust(A,D) and trust(A,E) will both be certified as Journeyer. On the other hand, trust ratings with only a marginal difference in value will end up with different labels: as an example, trust(A,G) will be certified as Apprentice, trust(A,H) as Observer. This observation explains the fact that the performance

of both PageRank and Appleseed was much worse than, for instance, that of eBay, and suggests that applying another mapping technique would most likely produce better results. Instead of a linear mapping, clustering could be used to partition the observed trust values into groups. For example, a k-means clustering method with k clusters, where k is equal to the number of certificates, could be applied. Another possibility would be to linearly map the values instead of the ranks to the certificate range, which is the approach we applied in the openSUSE Trust Server; a sketch contrasting the two mappings is given below.

The above-mentioned example highlights the fact that the evaluation of trust metrics is a complicated task, as the goals of different trust metrics vary and the outcome depends greatly on both the evaluation measurements and the data sets used. Unfortunately, the data set of the openSUSE Build Service was not available at validation time and thus the tests should be repeated as soon as it becomes available.
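The two mappings can be contrasted on the example ratings above; the rounding and the helper names are illustrative only.

ratings = {"B": 18.98, "C": 11.52, "D": 10.76, "E": 3.42,
           "F": 3.41, "G": 3.11, "H": 3.02}

def map_by_rank(scores):
    # Rank-based linear mapping, as used in the leave-one-out test:
    # positions in the sorted order are spread evenly over [0.4, 1].
    order = sorted(scores, key=scores.get)
    n = len(order)
    return {u: round(0.4 + 0.6 * i / (n - 1), 2) for i, u in enumerate(order)}

def map_by_value(scores):
    # Value-based linear mapping, as applied in the openSUSE Trust Server:
    # the raw values themselves are scaled into [0.4, 1].
    lo, hi = min(scores.values()), max(scores.values())
    return {u: round(0.4 + 0.6 * (v - lo) / (hi - lo), 2) for u, v in scores.items()}

print(map_by_rank(ratings))   # spreads H..B evenly over 0.4, 0.5, ..., 1.0
print(map_by_value(ratings))  # the clustered scores E, F, G, H all stay near 0.4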

Chapter 7

Outlook

One of the reasons Appleseed was chosen as the trust metric for the openSUSE Build Service is its ability to incorporate the concept of distrust as described in Section 3.2.5. This extension is already implemented in the library used to compute trust networks. After the trust network has reached a reasonable size, this feature should be made available to the users by adding distrust to the trust relation management features of the trust server.

Another useful extension would be the addition of a global trust metric. The trust values it provides could be used to fill gaps in the personal network of each user, which may occur due to the usage of a local metric. Thereby, the completeness of the trust predictions used to solve trust formulae could be raised to approximately one hundred percent. Because this procedure would slightly diminish the highly personalised nature of the proposed trust concept, each user should be able to opt out of this feature. In addition, a global trust metric could also be used to present trust values for packages to anonymous users, by solving formulae exclusively with the results of this metric.

A further possibility would be the usage of the trust values obtained by both the local and a global trust metric at the same time. The user could determine the weight of the influence of each metric himself, regulating the importance of his personal trust ratings in comparison to the exploration of the greater global network.
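Such a user-weighted combination is straightforward; the following sketch uses a hypothetical weight parameter w ∈ [0,1].

def combined_trust(local, global_, w=0.5):
    # Blend a local (personalised) and a global trust value: w = 1 uses
    # only the personal trust network, w = 0 only the global metric, and
    # intermediate values interpolate linearly. If no local value exists
    # (a gap in the personal network), fall back to the global one.
    if local is None:
        return global_
    return w * local + (1 - w) * global_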

To improve individuality in the trust value computation without the requirement to implement a second trust metric, options to modify Appleseed's standard parametrisation could be introduced. The most obvious parameter the user could change is the spreading factor d. It can be seen as the ratio between direct trust in a user, also referred to as direct trust edges, and trust in his ability to recommend others, so-called recommendation edges [Beth et al., 1994; Maurer, 1996]. The default value of this parameter is 0.85, although other values might be reasonable as well: as an example, consider the case in which a user wants to emphasise the proximity of nodes in the trust network in trust predictions. To achieve this goal, he should decrease the spreading factor to allow only a small portion of trust to be passed on, as high values enable trust to reach nodes further away.

If the openSUSE User Directory is enhanced by a group functionality similar to groups in Launchpad¹ or the Fedora Account System², the trust server should enable users to rate their trust in these groups by providing a single rating for a set of users. Such groups could also be used as initial trust relations to integrate new users of the build service into the trust network.

As a powerful tool for the operators of the openSUSE Build Service, the work of Dellarocas and Whitby et al. could be integrated to identify unfair ratings, referred to as ballot box stuffing (for a discussion of the problem see Section 6.1). The authors proposed two different methods to detect (and exclude) ratings that are likely to be unfair when judged by statistical analysis.

Because SUSE Linux uses the RPM package format, the integration of the trust formula into the Debian package format needs further investigation, as the reference implementation of this thesis does not integrate Debian packages into the trust system. Nevertheless, the computation of the formula stays exactly the same. As all concepts and principles of the described trust system apply to both RPM and Debian packages, the whole trust system is designed in a manner that enables a later incorporation of Debian-based distributions with little effort.

The developments resulting from this thesis enhance the openSUSE Build Service with a solid foundation for incorporating trust ratings on packagers (and their products) provided by its community members. The ready-to-use reference implementation offers an attractive web frontend to interact with this system. Nevertheless, the acceptance and, thus, the importance of this contribution highly depends on the integration of these features into the existing infrastructure and tools of the openSUSE project as well as, since it is available under a free software licence, those of competitors.

¹ http://www.launchpad.net
² https://fedorahosted.org/fas/

Appendix A

Mathematical Symbols and Functions

A             adjacency matrix, defined as a square matrix with rows and columns corresponding to web pages (Section: 3.2.1)

a             constant used to place some trust in the pre-trusted peers: a ∈ ℝ₀⁺ (Section: 3.2.2)

adj           adjacent node: adj ∈ V (Section: 3.2.4)

B_u           set of pages that point to the web page u (Section: 3.2.1)

C             matrix of the normalised local trust values c_ij (Section: 3.2.2)

c             factor used for the rank normalisation: c < 1 (Section: 3.2.1)

cc            clustering coefficient of a graph

c_ij          normalised local trust value peer i stores about peer j: c_ij ∈ ℝ (Section: 3.2.2)

c⃗_i           vector containing the local trust values c_ij of peer i (Section: 3.2.2)

cap(i)        capacity at distance i from the source node: cap(i) ∈ ℕ (Section: 3.2.3)

d             global spreading factor denoting the portion of energy that a node distributes amongst its successors: d ∈ [0,1] (Section: 3.2.5)

δ_i+          outdegree of node i: δ_i+ ∈ ℕ (Section: 3.2.1)

e⃗(u)          rank source defined as a vector over web pages (Section: 3.2.1)

e_{x→y}       energy in the spreading activation model that flows from node x to node y: e_{x→y} ∈ ℝ₀⁺ (Section: 3.2.5)

F_u           set of pages a web page u points to (Section: 3.2.1)

in(x)         energy influx into node x: in(x) ∈ ℝ₀⁺ (Section: 3.2.5)

l             certification level: Apprentice, Journeyer or Master (Section: 3.2.3)

max           largest trust value that can be used as threshold such that a path can be found from the source to the sink: max ∈ ℕ (Section: 3.2.4)

P             set of pre-trusted peers (Section: 3.2.2)

p⃗             some distribution over pre-trusted peers (Section: 3.2.2)

r(u)          page rank of web page u: r(u) ∈ ℕ (Section: 3.2.1)

r⃗₀(u)         page rank of a set of web pages, defined as a vector over web pages (Section: 3.2.1)

s_ij          local trust value peer i stores about peer j: s_ij ∈ ℤ (Section: 3.2.2)

sign(x)       sign value of node x (Section: 3.2.5)

T             partial trust function set, publicly accessible for every agent a ∈ V in the system: T = {W_{a_1}, ..., W_{a_n}} (Section: 3.2.5)

t             threshold above which energy flows: t ∈ ℝ⁺ (Section: 3.2.5)

t_c           accuracy threshold serving as a convergence criterion: t_c ∈ ℝ⁺ (Section: 3.2.5)

t⃗             vector containing the global trust values t_i (Section: 3.2.2)

t_i           global trust value the system as a whole places in peer i: t_i ∈ ℝ (Section: 3.2.2)

t⃗_i           vector containing the transitive trust values t_ij of peer i (Section: 3.2.2)

t_ij          trust a node i puts into a sink j: t_ij ∈ ℝ (Section: 3.2.4)

tr_ij         (trust) rating of a transaction: +1 and −1 for a positive and negative rating, respectively (Section: 3.2.2)

trust_i(x)    trust ranks for all nodes x ∈ V at iteration i (Section: 3.2.5)

tt_ij         transitive trust value peer i calculates about peer j: tt_ij ∈ ℝ (Section: 3.2.2)

W_a           partial trust function agent a ∈ V is associated with, corresponding to the set of trust statements that a has made: W_a: V → [0,1]⊥ (Section: 3.2.5)

w_{a_i}(a_j)  weight of the trust statement agent a_i has about agent a_j: w_{a_i}(a_j) ∈ ℝ₀⁺ (Section: 3.2.5)

Appendix B

Appleseed Example

The Appleseed implementation used was verified with several small test networks. One of them is depicted on the left-hand side of Figure B.1. The right-hand side of this figure shows the network after all backward edges were introduced and the edge normalisation has been applied. Using this processed network, a full manual calculation of the Appleseed algorithm was performed, and the resulting trust values of each node per iteration were compared to the ones obtained from the debug output of the implemented algorithm. These values are given in Table B.1.

[Figure B.1: two drawings of the test network with nodes A-F and weighted edges; the original network on the left, the processed network on the right.]

Figure B.1: Exemplary test network. Right-hand side: the graph traversed by Appleseed after the introduction of all backward edges (by changing edge weights or adding missing edges) and the application of edge normalisation.

Iteration      A      B      C      D      E      F
    1       0.00      -  30.00      -      -   0.00
    2      14.57   0.00  30.00   0.00   0.00  10.93
    3      14.57   1.86  40.62   8.32   0.88  10.93
    4      19.73   1.86  50.02   8.32   0.88  14.80
    5      24.29   2.52  53.77  11.26   1.20  18.22
    6      26.12   3.10  60.43  13.86   1.47  19.59
    7      29.35   3.33  64.70  14.91   1.59  22.01
    8      31.43   3.74  68.23  16.75   1.78  23.57
    9      33.14   4.01  71.83  17.94   1.91  24.86
   10      34.89   4.23  74.42  18.91   2.01  26.17
   11      36.15   4.45  76.80  19.91   2.12  27.11
   12      37.30   4.61  78.84  20.63   2.19  27.98
   13      38.29   4.76  80.49  21.29   2.26  28.72
   14      39.10   4.88  81.96  21.86   2.33  29.32
   15      39.81   4.98  83.19  22.31   2.37  29.86
   16      40.41   5.08  84.23  22.72   2.42  30.30
   17      40.91   5.15  85.12  23.06   2.45  30.68
   18      41.34   5.22  85.87  23.35   2.48  31.01
   19      41.71   5.27  86.51  23.60   2.51  31.28
   20      42.02   5.32  87.06  23.80   2.53  31.52
   21      42.29   5.36  87.52  23.98   2.55  31.71
   22      42.51   5.39  87.91  24.13   2.57  31.88
   23      42.70   5.42  88.25  24.26   2.58  32.03
   24      42.86   5.44  88.53  24.37   2.59  32.15
   25      43.00   5.47  88.78  24.46   2.60  32.25
   26      43.12   5.48  88.98  24.54   2.61  32.34
   27      43.22   5.50  89.16  24.61   2.62  32.41
   28      43.30   5.51  89.30  24.67   2.62  32.48
   29      43.38   5.52  89.43  24.71   2.63  32.53
   30      43.44   5.53  89.54  24.76   2.63  32.58
   31      43.49   5.54  89.63  24.79   2.64  32.62
   32      43.53   5.54  89.71  24.82   2.64  32.65
   33      43.57   5.55  89.77  24.85   2.64  32.68
   34      43.60   5.56  89.83  24.87   2.65  32.70
   35      43.63   5.56  89.88  24.89   2.65  32.72
   36      43.65   5.56  89.92  24.90   2.65  32.74
   37      43.67   5.57  89.95  24.91   2.65  32.76
   38      43.69   5.57  89.98  24.93   2.65  32.77
   39      43.70   5.57  90.01  24.93   2.65  32.78
   40      43.72   5.57  90.03  24.94   2.65  32.79
   41      43.73   5.57  90.04  24.95   2.65  32.80
   42      43.74   5.58  90.06  24.96   2.65  32.80
   43      43.74   5.58  90.07  24.96   2.66  32.81
   44      43.75   5.58  90.08  24.96   2.66  32.81
   45      43.75   5.58  90.09  24.97   2.66  32.82

Table B.1: The trust of each node per Appleseed iteration of the test network shown in Figure B.1. ‘-’ denotes that the node has not been discovered yet.

Appendix C

Trust Metric Algorithms

C.1 The PageRank Algorithm

The PageRank algorithm as introduced by Page et al. and discussed in Section 3.2.1 is given by:

begin function pageRank(s⃗):
    // initialise
    r⃗_0 = s⃗;

    // iterate
    repeat:
        r⃗_{i+1} = A r⃗_i;
        d = ||r⃗_i||_1 − ||r⃗_{i+1}||_1;
        r⃗_{i+1} = r⃗_{i+1} + d E;
        δ = ||r⃗_{i+1} − r⃗_i||_1;
    until (δ ≤ ε);
    return r⃗_{i+1};
end;

Listing C.1: PageRank Algorithm

Here s⃗ is an arbitrary vector over web pages and A is the adjacency matrix whose rows and columns correspond to web pages: A_{i,j} = 1/δ_i+ if there is an edge from i to j and A_{i,j} = 0 otherwise, where δ_i+ is the outdegree of node i. The factor E is a user-defined random parameter that models the behaviour of a ‘random surfer’ who gets bored and chooses the next web page at random. ε is the chosen convergence threshold.
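As a cross-check, Listing C.1 can be transcribed almost line by line into numpy. The following is an illustrative sketch, not the implementation used in this thesis; note that with A as defined above, rank propagates along the edges via the transpose.

import numpy as np

def page_rank(A, s, E, eps=1e-8):
    # A: matrix with A[i, j] = 1/outdeg(i) for an edge i -> j, 0 otherwise;
    # s: arbitrary start vector; E: rank source ("random surfer") vector.
    r = s.copy()
    while True:
        r_next = A.T @ r                  # propagate rank along the edges
        d = np.abs(r).sum() - np.abs(r_next).sum()
        r_next = r_next + d * E           # redistribute the lost rank
        delta = np.abs(r_next - r).sum()  # L1 distance between iterates
        if delta <= eps:
            return r_next
        r = r_next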

C.2 EigenTrust Algorithms

C.2.1 Simple non-distributed EigenTrust

The family of EigenTrust algorithms was designed by Kamvar et al. and discussed in Section 3.2.2. The simple non-distributed EigenTrust algorithm is defined as:

begin function simpleEigenTrust(e⃗):
    // initialise
    t⃗^(0) = e⃗;

    // iterate
    repeat:
        t⃗^(k+1) = C^T t⃗^(k);
        δ = ||t⃗^(k+1) − t⃗^(k)||;
    until (δ < ε);
    return t⃗^(k+1);
end;

Listing C.2: Simple non-distributed EigenTrust Algorithm

e⃗ is the vector of length m representing a uniform probability distribution over all m peers, C is the matrix of the normalised local trust values, and ε is the chosen convergence threshold.
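An illustrative numpy transcription of Listing C.2 (not the implementation used in this thesis) might read:

import numpy as np

def simple_eigen_trust(C, eps=1e-10):
    # C: matrix of the normalised local trust values c_ij.
    m = C.shape[0]
    t = np.full(m, 1.0 / m)       # uniform start distribution e
    while True:
        t_next = C.T @ t          # t^(k+1) = C^T t^(k)
        delta = np.linalg.norm(t_next - t)
        if delta < eps:
            return t_next
        t = t_next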

C.2.2 Basic EigenTrust

The basic EigenTrust algorithm is given by:

begin function basicEigenTrust(p⃗):
    // initialise
    t⃗^(0) = p⃗;

    // iterate
    repeat:
        t⃗^(k+1) = C^T t⃗^(k);
        t⃗^(k+1) = (1 − a) t⃗^(k+1) + a p⃗;
        δ = ||t⃗^(k+1) − t⃗^(k)||;
    until (δ < ε);
    return t⃗^(k+1);
end;

Listing C.3: Basic EigenTrust Algorithm

Here p⃗ is the distribution over pre-trusted peers, with p_i = 1/|P| if i ∈ P, where P is the set of pre-trusted peers, and p_i = 0 otherwise. a is a chosen constant less than 1.

C.2.3 Distributed EigenTrust

Let A_i be the set of peers which have downloaded files from peer i, and let B_i be the set of peers from which peer i has downloaded files. Then the distributed EigenTrust algorithm can be written as:

begin function distributedEigenTrust(A, B, p⃗):
    for each peer i do:
        query all peers j ∈ A_i for t_j^(0) = p_j;

        repeat:
            t_i^(k+1) = (1 − a) Σ_{j=1..n} c_ji t_j^(k) + a p_i;
            send c_ij t_i^(k+1) to all peers j ∈ B_i;
            δ = ||t⃗^(k+1) − t⃗^(k)||;
            wait for all peers j ∈ A_i to return c_ji t_j^(k+1);
        until (δ < ε);
    end do;
    return t⃗^(k+1);
end;

Listing C.4: Distributed EigenTrust Algorithm

C.3 Advogato

The Advogato algorithm was described by Levien and Aiken in 1998. Its mathematical structure is discussed in Section 3.2.3 of this thesis. The Advogato algorithm applies the Ford-Fulkerson maximum integer network flow. In order to apply the Ford-Fulkerson algorithm, the graph G needs to be transformed into a single-source, single-sink graph G'. Since the Ford-Fulkerson algorithm is well known and can be found in [Ford and Fulkerson, 1956], only the transformation step is given here. The input of the transformation algorithm is the old directed graph G = (V, E, C_V), the output is the new directed graph G' = (V', E', C_E'), where V and V' are the old and new nodes, respectively, and E and E' the old and new edges of the graphs. supersink is the single sink of the transformed graph G', and the capacities C_V and C_E' constrain the nodes and edges, respectively. The pseudo code of Advogato's transformation step is given by:

begin function transform(G = (V, E, C_V)):
    // initialise
    E' = ∅;
    V' = ∅;

    // iterate
    for all x ∈ V do:
        add node x+ to V';
        add node x− to V';
        if C_V(x) ≥ 1 then:
            add edge (x−, x+) to E';
            set C_E'(x−, x+) = C_V(x) − 1;
            for all (x, y) ∈ E do:
                add edge (x+, y−) to E';
                set C_E'(x+, y−) = ∞;
            end do;
            add edge (x−, supersink) to E';
            set C_E'(x−, supersink) = 1;
        end if;
    end do;
    return G' = (V', E', C_E');
end;

Listing C.5: Advogato Algorithm
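Assuming node capacities are given as a mapping, the transformation of Listing C.5 and the subsequent flow computation can be sketched with networkx as follows; float('inf') stands in for the infinite edge capacities, and the choice of the seed's minus node as flow source is an assumption for illustration.

import networkx as nx

def transform(g, node_capacity):
    # Split every node x of the trust graph into x- and x+ as in
    # Listing C.5, turning the node capacities into edge capacities.
    gt = nx.DiGraph()
    for x in g.nodes():
        if node_capacity[x] >= 1:
            gt.add_edge((x, '-'), (x, '+'), capacity=node_capacity[x] - 1)
            for y in g.successors(x):
                gt.add_edge((x, '+'), (y, '-'), capacity=float('inf'))
            gt.add_edge((x, '-'), 'supersink', capacity=1)
    return gt

# Hypothetical usage: maximum integer flow from the seed's minus node to
# the single supersink of the transformed graph.
# gt = transform(trust_graph, capacities)
# flow_value, flow_dict = nx.maximum_flow(gt, (seed, '-'), 'supersink')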

C.4 TidalTrust

The TidalTrust algorithm has been developed by Golbeck and is discussed in Section 3.2.4. The pseudo code of the TidalTrust algorithm, taking a source node source and a sink node sink as parameters, is given below (taken from Lesani and Montazeri):

begin function tidalTrust(source, sink):
    if (source = sink) then:
        return MAX_TRUST_VALUE;
    end if;

    queue.queue(source);
    nextLevelQueue = new Queue();
    maxDepth = DUMMY_MAX_VALUE;
    depth = 1;
    strengthFromSource[source] = MAX_TRUST_VALUE;

    while ((not queue.isEmpty()) and (depth ≤ maxDepth)) do:
        node = queue.dequeue();
        tideStack.push(node);
        if (not trustGraph.isAdjacent(node, sink)) then:
            for all adjacentNode in adjacents of node do:
                if (not visiteds[adjacentNode]):
                    visiteds[adjacentNode] = true;
                    nextLevelQueue.queue(adjacentNode);
                end if;

                if (nextLevelQueue.contains(adjacentNode)) do:
                    pathStrength = min(
                        strengthFromSource[node],
                        trustGraph.getTrust(node, adjacentNode));
                    strengthFromSource[adjacentNode] = max(
                        strengthFromSource[adjacentNode],
                        pathStrength);
                    // In case one of the parameters has a dummy
                    // value, the min and max functions return the
                    // other parameter.
                end if;
            end for;
        else: // if (trustGraph.isAdjacent(node, sink))
            maxDepth = depth;
            pathStrength = strengthFromSource[node];
            strengthFromSource[sink] = max(
                strengthFromSource[sink],
                pathStrength);
        end if;

        if (queue.isEmpty()) do:
            queue = nextLevelQueue;
            depth = depth + 1;
            nextLevelQueue = new Queue();
        end if;
    end while;

    if (maxDepth == DUMMY_MAX_VALUE) do:
        return DUMMY;
    end if;
    // If the maxDepth value has not changed since the
    // beginning, no path is found to the sink.

    requiredStrength = strengthFromSource[sink];
    // The required strength is computed in the forward wave.

    // The backward wave:
    while (not tideStack.isEmpty()) do:
        node = tideStack.pop();
        computeTrustToSinkFromParticipatingSetOfNeighbors(node);
    end while;

    return;
end;

Listing C.6: TidalTrust Algorithm

C.5 Appleseed

In 2004, Appleseed, discussed in more detail in Section 3.2.5, was developed by Ziegler and Lausen. Let G = (V, E, W) be the trust graph, where V is the set of agents (nodes), represented by the URIs of their machine-readable personal homepages, and V_i ⊆ V the set of nodes discovered until step i. Let E be the set of all edges and let W(x,y) be the weight of the edge connecting node x and node y. in(x) is the amount of incoming trust for a node x, and trust_i(x) is the trust rank for node x in the i-th iteration for a given source node s. t_c denotes the chosen accuracy threshold, and d ∈ [0,1] a global spreading factor. The algorithm is initialised by the source node s ∈ V and an incoming trust injection e. The initial trust rank trust_0 is set to 0. The pseudo code of the Appleseed trust metric itself is given by:

begin function Trust_A(s ∈ V, e ∈ ℝ₀⁺, d ∈ [0,1], t_c ∈ ℝ⁺):
    // initialise
    in_0(s) = e;
    trust_0(s) = 0;
    i = 0;
    V_0 = {s};

    // iterate
    repeat:
        i = i + 1;
        V_i = V_{i−1};
        for all x ∈ V_{i−1} do:
            trust_i(x) = trust_{i−1}(x) + (1 − d) · in_{i−1}(x);
            in_i(x) = 0;
            for all (x, u) ∈ E do:
                if u ∉ V_i then:
                    V_i = V_i ∪ {u};
                    trust_i(u) = 0;
                    in_i(u) = 0;
                    add edge (u, s) with W(u, s) = 1;
                end if;
                w = W(x, u) / Σ_{(x,u') ∈ E} W(x, u');
                in_i(u) = in_i(u) + d · in_{i−1}(x) · w;
            end do;
        end do;
        m = max_{y ∈ V_i} { trust_i(y) − trust_{i−1}(y) };
    until (V_i \ V_{i−1} = ∅ and m ≤ t_c);
    return trust: {(x, trust_i(x)) | x ∈ V_i};
end;

Listing C.7: Appleseed Algorithm

Appendix D

Further Simulations using the Small-world Model

One of the three models used for synthesising artificial networks was the small-world model, first presented by Watts and Strogatz [Watts, 1999a,b; Watts and Strogatz, 1998]. Details on the model (including an example) can be found in Section 5.1.2. The special characteristic of the small-world model is its ability to take a geographical component into account, as the nodes in the network have positions in space. We think it is a valid assumption that geographical proximity plays a role in the decision whether or not two nodes are connected with each other.

We generated small-world models by building networks on one-dimensional lattices of 1,000, 5,000, 10,000 and 15,000 nodes. Each node was connected to its neighbours which were less than 3, 5 and 7 lattice spaces away, and a random weight in the range [0,1] was assigned to each edge. Rewiring probabilities of 0.15, 0.3, 0.45, 0.6, 0.75 and 0.9 were used to connect remote parts of the lattice with each other. In the next step, self-edges and multiple edges between the nodes were removed. Because the generated edges in the original model are undirected but trust is expressed in a direction (from a trustee to a confidant), we chose a direction for each edge at random (with a probability of 0.5 for each direction). Furthermore, we randomly inverted 33% of all edges to incorporate symmetric trust statements; an illustrative conversion is sketched below. A weight in the range [0,1] was randomly assigned to each of these edges.

Because the results of our runs with the various rewiring probabilities differed only slightly, we decided to present only one of the cases (p = 0.15) in Chapter 5. For the sake of completeness, we present the results of the Appleseed computations using rewiring probabilities of 0.3, 0.45, 0.6, 0.75 and 0.9 on the following pages.
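To make the conversion concrete, a sketch built on networkx is given below. The parameters are illustrative, and the treatment of the inverted edges (adding the reverse direction, so that both directions exist) is one reading of the procedure described above.

import random
import networkx as nx

def to_trust_graph(undirected, sym_fraction=0.33, seed=0):
    # Convert an undirected small-world graph into a directed trust graph:
    # orient each edge at random and draw a random weight in [0, 1]; for
    # roughly a third of the edges, also add the reverse direction to
    # model symmetric trust statements.
    rng = random.Random(seed)
    g = nx.DiGraph()
    g.add_nodes_from(undirected.nodes())
    for u, v in undirected.edges():
        a, b = (u, v) if rng.random() < 0.5 else (v, u)
        g.add_edge(a, b, weight=rng.random())
        if rng.random() < sym_fraction:
            g.add_edge(b, a, weight=rng.random())
    return g

# Illustrative parameters, not the exact ones used for the simulations.
lattice = nx.connected_watts_strogatz_graph(1000, k=6, p=0.30, seed=0)
trust_graph = to_trust_graph(lattice)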

D.1 Rewiring Probability p=0.30

Figure D.1: Appleseed performance on the generated small-world models using a rewiring probability p = 0.30. Left-hand side: Average number of nodes the algorithm detected in a run. Centre: Maximal depth from the source node to the most distant node in the network. Right-hand side: Average number of iterations the algorithm required until termination.

Figure D.2: Maximal distributed trust per iteration on the small-world data sets using a rewiring probability p = 0.30. The light grey line indicates that the algorithm would have needed fewer iterations if a slightly higher threshold t_c = 0.5 had been used.

Figure D.3: Maximal distributed trust (solid line) and total number of nodes (dashed line) discovered by the algorithm per iteration on the small-world models using a rewiring probability p = 0.30.

D.2 Rewiring Probability p=0.45

Figure D.4: Appleseed performance on the generated small-world models using a rewiring probability p = 0.45. Left-hand side: Average number of nodes the algorithm detected in a run. Centre: Maximal depth from the source node to the most distant node in the network. Right-hand side: Average number of iterations the algorithm required until termination.

Figure D.5: Maximal distributed trust per iteration on the small-world data sets using a rewiring probability p = 0.45. The light grey line indicates that the algorithm would have needed fewer iterations if a slightly higher threshold t_c = 0.5 had been used.

Figure D.6: Maximal distributed trust (solid line) and total number of nodes (dashed line) discovered by the algorithm per iteration on the small-world models using a rewiring probability p = 0.45.

D.3 Rewiring Probability p=0.60

Figure D.7: Appleseed performance on the generated small-world models using a rewiring probability p = 0.60. Left-hand side: Average number of nodes the algorithm detected in a run. Centre: Maximal depth from the source node to the most distant node in the network. Right-hand side: Average number of iterations the algorithm required until termination.

Figure D.8: Maximal distributed trust per iteration on the small-world data sets using a rewiring probability p = 0.60. The light grey line indicates that the algorithm would have needed fewer iterations if a slightly higher threshold t_c = 0.5 had been used.

Figure D.9: Maximal distributed trust (solid line) and total number of nodes (dashed line) discovered by the algorithm per iteration on the small-world models using a rewiring probability p = 0.60.

D.4 Rewiring Probability p=0.75

Figure D.10: Appleseed performance on the generated small-world models using a rewiring probability p = 0.75. Left-hand side: Average number of nodes the algorithm detected in a run. Centre: Maximal depth from the source node to the most distant node in the network. Right-hand side: Average number of iterations the algorithm required until termination.

Figure D.11: Maximal distributed trust per iteration on the small-world data sets using a rewiring probability p = 0.75. The light grey line indicates that the algorithm would have needed fewer iterations if a slightly higher threshold t_c = 0.5 had been used.

Figure D.12: Maximal distributed trust (solid line) and total number of nodes (dashed line) discovered by the algorithm per iteration on the small-world models using a rewiring probability p = 0.75.

D.5 Rewiring Probability p=0.90

Figure D.13: Appleseed performance on the generated small-world models using a rewiring probability p = 0.90. Left-hand side: Average number of nodes the algorithm detected in a run. Centre: Maximal depth from the source node to the most distant node in the network. Right-hand side: Average number of iterations the algorithm required until termination.

Figure D.14: Maximal distributed trust per iteration on the small-world data sets using a rewiring probability p = 0.90. The light grey line indicates that the algorithm would have needed fewer iterations if a slightly higher threshold t_c = 0.5 had been used.

Figure D.15: Maximal distributed trust (solid line) and total number of nodes (dashed line) discovered by the algorithm per iteration on the small-world models using a rewiring probability p = 0.90.

Acknowledgements

I would like to express my gratitude to my supervisor, Professor Dr. Thorsten Herfet, for giving me the opportunity to work on this interesting subject and thereby contribute to the free software community. I highly appreciate his expertise, understanding and patience. I would like to thank my advisor at SUSE, Dr. Michael Schröder, for his motivation and encouragement as well as for sharing his vast knowledge and experience. Thanks to Adrian Schröder for sharing his vision of trust for the openSUSE Build Service and for being one of the biggest critics of my work. I am thankful to Roland Haidl for believing in my ideas and supporting them throughout the whole organisation.

Many thanks go to friends and colleagues at SUSE in Nuremberg: Thomas Schmidt, André Duffeck, Marcus Rückert, Dr. Peter Poeml, Klaas Freitag, Andreas Bauer, Susanne Oberhauser, Robert Lihm, Cornelius Schumacher, Daniel Bornkessel, Christopher Hofmann, Jan Blunck, Dirk Müller, Stephan Binner, Anja Stock, Marcus Meißner, Rüdiger Oertel, Michael Löffler, Andreas Jäger, as well as Martin Lasarsch, Michael Matz and Richard Günther. Besides valuable discussions, they confirmed more than once that there is a life beyond work. We had a lot of fun!

I would like to thank the open source community, especially the developers of the Ruby on Rails framework and the members of the openSUSE project, for easing and supporting the implementation of a prototype system. I would also like to thank my friends for their support and for putting up with me during the last year: Phred, Sebastian, Emme, Kristina and Kai, Wolfgang, Reinhard, Guido, and most of all Patrick Drechsler.

I am thankful to my parents as well as Monika for their support and encouragement over all the years. Most importantly, I am deeply indebted to Susanne Pfeifer for the support she provided me over the past years. Susanne, I deeply thank you for always believing in me and sharing your smile and love with me every day.

Bibliography

Aberer, K. and Despotovic, Z. (2001). Managing trust in a peer-2-peer information system. In Proceedings of the 10th International Conference of Information and Knowledge Management (ACM CIKM), New York, USA. Available from: http://portal.acm.org/citation.cfm?doid=502585.502638. (Cited on page 17.)

Abrams, Z., McGrew, R., and Plotkin, S. (2004). Keeping peers honest in eigentrust. In Proceedings of the 2nd Workshop on the Economics of Peer-to-Peer Systems (P2PEcon 2004), Cambridge, MA, USA. Available from: http://theory.stanford.edu/~za/HonestEigenTrust/HonestEigenTrust.pdf. (Cited on page 19.)

Amaral, L. A., Scala, A., Barthelemy, M., and Stanley, H. E. (2000). Classes of small-world networks. Proc Natl Acad Sci U S A, 97(21):11149–11152. Available from: http://dx.doi.org/10.1073/pnas.200327197. (Cited on page 68.)

Artz, D. and Gil, Y. (2007). A survey of trust in computer science and the semantic web. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 5:58–71. Available from: http://www.isi.edu/~gil/papers/jws-trust-07.pdf. (Cited on pages 4, 6, and 7.)

Baier, A. (1986). Trust and antitrust. Ethics, 96(2):231–260. Available from: http://www.jstor.org/stable/2381376. (Cited on page 3.)

Barabási, A. L. and Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439):509–512. Available from: http://nd.edu/~networks/Publication%20Categories/03%20Journal%20Articles/Physics/EmergenceRandom_Science%20286,%20509-512%20(1999).pdf. (Cited on pages 43, 52, 53, and 68.)

Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., Neugebauer, R., Pratt, I., and Warfield, A. (2003). Xen and the art of virtualization. In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 164–177, New York, NY, USA. ACM Press. Available from: http://www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf. (Cited on page 33.)

Bateson, P. (1990). The biological evolution of cooperation and trust. In Gambetta [1990c], chapter 2, pages 14–30. Available from: http://www.nuffield.ox.ac.uk/users/gambetta/Trust_making%20and%20breaking%20cooperative%20relations.pdf. (Cited on page 3.)

Beth, T., Borcherding, M., and Klein, B. (1994). Valuation of trust in open networks. In ESORICS ’94: Proceedings of the Third European Symposium on Research in Computer Security, pages 3–18, London, UK. Springer-Verlag. Available from: http://www.springerlink. com/content/p8hm118g42741370/. (Cited on pages6, 12, and 71.)

Blaze, M., Feigenbaum, J., and Lacy, J. (1996). Managing trust in medical information systems. Technical report, AT&T Labs. Available from: https://eprints.kfupm.edu.sa/49793/1/ 49793.pdf. (Cited on page4.)

Blaze, M., Feigenbaum, J., Resnick, P., and Strauss, M. (1997). Managing trust in an information-labeling system. In Special issue of selected papers from the 1996 Amalfi Conference on Secure Communication in Networks, pages 491–501. Available from: http://www.si.umich.edu/~presnick/papers/bfrs/Paper.ps. (Cited on page4.)

Bok, S. (1978). Lying: Moral Choice in Public and Private Life. New York: Pantheon Books. (Cited on page3.)

Bollobás, B. (2001). Random Graphs. Academic Press, New York, 2nd edition. (Cited on page 43.)

Chen, R., Yeager, W., and Microsystems, S. (2001). Poblano: A distributed trust model for peer-to-peer networks. JXTA Security Project White Paper. Available from: http://gnunet.org/papers/jxtatrust.pdf. (Cited on page 13.)

Chirita, P. A., Nejdl, W., Schlosser, M., and Scurtu, O. (2004). Personalized reputation management in p2p networks. In Proceedings of the ISWC 2004 Workshop on Trust, Security, and Reputation on the Semantic Web, Hiroshima, Japan. Available from: http://www.kbs.uni-hannover.de/Arbeiten/Publikationen/2004/chirita04personalized.pdf. (Cited on page 19.)

Clark, T. H. (1999). Electronic intermediaries: Trust building and market differentiation. In 32nd Annual Hawaii International Conference on Systems Sciences. Society Press. Available from: http://doi.ieeecomputersociety.org/10.1109/HICSS.1999.772939. (Cited on page 4.)

Cook, K., editor (2001). Trust in Society. New York: Russell Sage Foundation. (Cited on page 3.)

Cornelli, F., Damiani, E., De Capitani di Vimercati, S., Paraboschi, S., and Samarati, P. (2002). Choosing reputable servents in a p2p network. In Proceedings of the 11th World Wide Web Conference, pages 376–386. Available from: http://portal.acm.org/citation.cfm?doid=511446.511496. (Cited on page 17.)

Creative Commons (2004). Creative Commons Attribution License. Available from: http://creativecommons.org/licenses/by/2.0/. (Cited on page 58.)

Dasgupta, P. (1990). Trust as a commodity. In Gambetta [1990c], chapter 4, pages 49–72. Available from: http://www.nuffield.ox.ac.uk/users/gambetta/Trust_making%20and%20breaking%20cooperative%20relations.pdf. (Cited on page 3.)

Dash, R. K., Jennings, N. R., and Parkes, D. C. (2004). Trust-based mechanism design. In Proc. 3rd Int. Conf. on Autonomous Agents and Multi-Agent Systems, pages 748–755. ACM Press. Available from: http://eprints.ecs.soton.ac.uk/9352/. (Cited on page 7.)

Dassen, J. H. M., Stickelman, C., Kleinmann, S. G., Rudolph, S., Vila, S., Rodin, J., and Fernandez-Sanguino, J. (2008). The Debian GNU/Linux FAQ. Debian Project. Version 4.0.3. Available from: http://www.debian.org/doc/FAQ/. (Cited on page 29.)

Dellarocas, C. (2000). Mechanisms for coping with unfair ratings and discriminatory behavior in online reputation reporting systems. In ICIS, pages 520–525. Available from: http://portal.acm.org/citation.cfm?id=359640.359802. (Cited on page 72.)

Deutsch, M. (1962). Cooperation and trust: Some theoretical notes. Nebraska University Press. Nebraska Symposium on Motivation. (Cited on pages 3, 4, and 5.)

Deutsch, M. (1973). The Resolution of Conflict. New Haven and London: Yale University Press. (Cited on page 4.)

Erdös, P. and Rényi, A. (1959). On random graphs. Publicationes Mathematica, 6:290–297. Available from: http://www.renyi.hu/~p_erdos/1959-11.pdf. (Cited on pages 43 and 44.)

Erdös, P. and Rényi, A. (1960). On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5:17–61. Available from: http://www.math-inst.hu/~p_erdos/1961-15.pdf. (Cited on pages 43 and 44.)

Erdös, P. and Rényi, A. (1961). On the strength of connectedness of a random graph. Acta Mathematica Scientia Hungary, 12:261–267. Available from: http://www.math-inst.hu/~p_erdos/1961-19.pdf. (Cited on pages 43 and 44.)

Eschenauer, L., Gligor, V. D., and Baras, J. (2002). On trust establishment in mobile ad-hoc networks. In Security Protocols, volume 2845/2003 of Lecture Notes in Computer Science, pages 47–66. Springer-Verlag. Available from: http://www.springerlink.com/content/97ge3hn0k6crcdht/. (Cited on page 6.)

Falcone, R. and Castelfranchi, C. (2001). Social trust: A cognitive approach. In Castelfranchi, C. and Tan, Y.-H., editors, Trust and Deception in Virtual Societies, pages 55–90. Kluwer Academic Publishers. Available from: http://www.istc.cnr.it/doc/61a_360p_Trust-libroKluwer.pdf. (Cited on page 5.)

Feigenbaum, J. and Lee, P. (1997). Trust management and proof-carrying code in secure mobile-code applications. In the DARPA Workshop on Foundations for Secure Mobile Code. Available from: http://www.cs.nps.navy.mil/research/languages/statements/leefei.ps. (Cited on page 4.)

Fielding, R. T. (2000). Architectural Styles and the Design of Network-based Software Architectures. PhD thesis, University of California, Irvine. Available from: http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm. (Cited on pages x and 36.)

Ford, L. R. and Fulkerson, D. R. (1956). Maximal flow through a network. Canadian Journal of Mathematics, 8:399–404. Available from: http://www.rand.org/pubs/papers/P605/. (Cited on pages 21 and 80.)

Foster-Johnson, E. (2005). RPM Guide. Fedora Project. Available from: http://docs.fedoraproject.org/drafts/rpm-guide-en/. (Cited on page 29.)

Free Software Foundation (1991). GNU General Public License v2.0. Available from: http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt. (Cited on pages 2 and 58.)

Fukuyama, F. (1996). Trust: The Social Virtues and the Creation of Prosperity. New York: Free Press. (Cited on page 3.)

Gambetta, D. (1990a). Can we trust trust? In Gambetta [1990c], chapter 13, pages 213–238. Available from: http://www.nuffield.ox.ac.uk/users/gambetta/Trust_making%20and%20breaking%20cooperative%20relations.pdf. (Cited on page 5.)

Gambetta, D. (1990b). Mafia: The price of distrust. In Gambetta [1990c], chapter 10, pages 158–176. Available from: http://www.nuffield.ox.ac.uk/users/gambetta/Trust_making%20and%20breaking%20cooperative%20relations.pdf. (Cited on page 3.)

Gambetta, D., editor (1990c). Trust: Making and Breaking Cooperative Relations. Basil Blackwell, Oxford. Available from: http://www.nuffield.ox.ac.uk/users/gambetta/Trust_making%20and%20breaking%20cooperative%20relations.pdf. (Cited on pages 99, 100, 102, 104, and 105.)

Golbeck, J. (2005). Computing and Applying Trust in Web-based Social Networks. PhD thesis, University of Maryland, College Park. (Cited on pages 4, 21, 23, and 81.)

Golbeck, J. (2006a). Computing with trust: Definition, properties, and algorithms. In Securecomm and Workshops. Available from: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4198839. (Cited on pages 4, 5, and 23.)

Golbeck, J. (2006b). Generating predictive movie recommendations from trust in social networks. In Proceedings of The Fourth International Conference on Trust Management. (Cited on pages 21 and 23.)

Golbeck, J., Parsia, B., and Hendler, J. (2003). Trust networks on the semantic web. In Proceedings of Cooperative Intelligent Agents. (Cited on pages 13 and 14.)

Golembiewski, R. T. and McConkie, M. (1975). The centrality of interpersonal trust in group processes. In Cooper, C. L., editor, Theories of Group Processes, pages 131–185. Wiley. (Cited on pages 4 and 5.)

Gori, M. and Witten, I. (2005). The bubble of web visibility. Commun. ACM, 48(3):115–117. (Cited on page 12.)

Grandison, T. and Sloman, M. (2000). A survey of trust in internet applications. IEEE Communication Surveys and Tutorials. (Cited on pages 4 and 5.)

Gray, E., Seigneur, J.-M., Chen, Y., and Jensen, C. (2003). Trust propagation in small worlds. In Proc. of 1st Int. Conf. on Trust Management (iTrust’03), pages 239–254. (Cited on page 6.)

Guha, R. (2003). Open rating systems. Technical report, Stanford Knowledge Systems Laboratory, Stanford, CA, USA. (Cited on pages 6, 12, 24, and 25.)

Guha, R., Raghavan, P., and Tomkins, A. (2004). Propagation of trust and distrust. In Proceedings of the 13th international conference on World Wide Web, pages 403–412, New York, NY, USA. ACM Press. Available from: http://portal.acm.org/citation.cfm?doid=988672.988727. (Cited on page 6.)

Gunreben, B., Eichwalder, K., Schraitle, T., Schäfer, M., and Sundermeyer, F. (2007). The openSUSE Build Service. Novell, online prerelease edition. Available from: https://build.opensuse.org/documentation/obs/. (Cited on page 29.)

Heider, F. (1958). The Psychology of Interpersonal Relations. New York, NY, USA: Wiley. (Cited on page 6.)

Hertzberg, L. (1988). On the attitude of trust. Inquiry, 31(3):307–322. (Cited on page 3.)

Holland, P. W. and Leinhardt, S. (1972). Some evidence on the transitivity of positive interpersonal sentiment. American Journal of Sociology, 77:1205–1209. (Cited on page 6.)

Janson, S., Luczak, T., and Rucinski, A. (1999). Random Graphs. John Wiley, New York. (Cited on page 43.)

Jøsang, A. (1996). The right type of trust for distributed systems. In Proceedings of the 1996 New Security Paradigms Workshop. (Cited on page 6.)

Jøsang, A. (1999). Trust-based decision making for electronic transactions. In Proceedings of the Fourth Nordic Workshop on Secure Computer Systems (NORDSEC’99). (Cited on page 4.)

Jøsang, A. and Ismail, R. (2002). The beta reputation system. In Proceedings of the 15th Bled Electronic Commerce Conference. (Cited on page 7.)

Jøsang, A. and Kinateder, M. (2003). Analysing topologies of transitive trust. In Proceedings of the Workshop of Formal Aspects of Security and Trust (FAST), pages 9–22. (Cited on pages 6 and 24.)

Kamvar, S. D., Schlosser, M. T., and Garcia-Molina, H. (2003). The eigentrust algorithm for reputation management in p2p networks. In WWW ’03: Proceedings of the 12th international conference on World Wide Web, pages 640–651, New York, NY, USA. ACM. (Cited on pages 6, 12, 16, 19, 25, and 78.)

Karoński, M. (1982). A review of random graphs. Journal of Graph Theory, 6:349–389. (Cited on page 43.)

Ketchpel, S. P. and Garcia-Molina, H. (1996). Making trust explicit in distributed commerce transactions. In Proceedings of the International Conference on Distributed Computing Systems, pages 270–281. (Cited on page 4.)

Kinateder, M. and Pearson, S. (2003). A privacy-enhanced peer-to-peer reputation system. In Proceedings of the 4th International Conference on Electronic Commerce and Web Technologies, volume 2378 of LNCS, pages 206–215. Springer-Verlag. (Cited on page 6.)

Kinateder, M. and Rothermel, K. (2003). Architecture and algorithms for a distributed reputation system. In Proceedings of the First International Conference on Trust Management, pages 1–16. Springer-Verlag. (Cited on page 6.)

Lagerspetz, O. (1992). Legitimacy and trust. Philosophical Investigations, 15(1):1–21. (Cited on page 3.)

Lesani, M. and Montazeri, N. (2009). Fuzzy trust aggregation and personalized trust inference in virtual social networks. Journal of Computational Intelligence, 25. Accepted. (Cited on page 81.)

Levien, R. (2004). Attack Resistant Trust Metrics. PhD thesis, University of California, Berkeley. Available from: http://www.levien.com/thesis/thesis.pdf. (Cited on pages 6, 12, and 19.)

Levien, R. and Aiken, A. (1998). Attack-resistant trust metrics for public key certification. In 7th USENIX Security Symposium, pages 229–242. Available from: http://www.usenix.org/publications/library/proceedings/sec98/full_papers/levien/levien_html/levien.html. (Cited on pages 6, 19, 21, 59, and 80.)

Luhmann, N. (1979). Trust and Power. Chichester: Wiley. (Cited on page 3.)

Luhmann, N. (1990). Familiarity, confidence, trust: Problems and alternatives. In Gambetta [1990c], chapter 6, pages 94–107. Available from: http://www.nuffield.ox.ac.uk/users/gambetta/Trust_making%20and%20breaking%20cooperative%20relations.pdf. (Cited on page 3.)

Maresch, O. M. (2005). Reputationsbasierte Trust-Metriken im Kontext des Semantic Web. Master’s thesis, Technische Universität Berlin. (Cited on pages 12 and 14.)

Marsh, S. and La, F. (1992). Trust and reliance in multi-agent systems: a preliminary report. In Proceedings of the 4th European Workshop on Modeling Autonomous Agents in a Multi-Agent World. (Cited on page 35.)

Marsh, S. P. (1994). Formalising trust as a computational concept. PhD thesis, University of Stirling. Available from: http://www.cs.stir.ac.uk/research/publications/techreps/pdf/TR133.pdf. (Cited on pages 3, 4, 35, and 67.)

Massa, P. and Avesani, P. (2005). Controversial users demand local trust metrics: an experimental study on epinions.com community. American Association for Artificial Intelligence. Available from: http://sra.itc.it/people/massa/publications/aaai_2005_controversial_users_demand_local_trust_metrics_an_experimental_study_on_epinions_com_community.pdf. (Cited on page 12.)

Massa, P. and Souren, K. (2008). Trustlet, open research on trust metrics. In Scalable Computing: Practice and Experience, Scientific International Journal for Parallel and Distributed Computing, volume 9, pages 31–43. Available from: http://www.scpe.org/vols/vol09/no4/SCPE_9_4_10.pdf. (Cited on pages 7, 43, 58, 59, 61, and 62.)

Maurer, U. (1996). Modelling a public key infrastructure. In Bertino, E., editor, Proceedings of the 1996 European Symposium on Research in Computer Security, volume 1146, pages 325–350. Springer-Verlag. (Cited on pages 6, 12, and 71.)

McKnight, D. and Chervany, N. (1996). The meanings of trust. MISRC Working Paper Series 96-04, University of Minnesota, Management Information Systems Research Center. (Cited on page 5.)

Milgram, S. (1970). The small-world problem. In Sabini, J. and Silver, M., editors, The Individual in a Social World - Essays and Experiments, pages 28–35. McGraw Hill, New York, NY, USA, 2nd edition. (Cited on pages 27, 35, and 67.)

Mitchell, C., editor (2005). Trusted computing. IET. (Cited on page 65.)

Mui, L. and Mohtashemi, M. (2002). A computational model of trust and reputation. In 35th Hawaii International Conference on System Science (HICSS). (Cited on page 5.)

Möller, J.-S., Manns, S., Binner, S., Horstkötter, R., Fletcher, C., and Zaq, E. (2009). openSUSE weekly news, issue 66. http://en.opensuse.org/OpenSUSE_Weekly_News/66. (Cited on page 3.)

Newman, M. E. (2001). The structure of scientific collaboration networks. Proc Natl Acad Sci U S A, 98(2):404–409. Available from: http://dx.doi.org/10.1073/pnas.021544898. (Cited on page 68.)

Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45:167–256. (Cited on pages 44, 67, and 68.)

Open Source Initiative (1998). Open source definition. Available from: http://opensource.org/docs/osd. (Cited on page 1.)

Pagden, A. (1990). The destruction of trust and its consequences in the case of eighteenth century Naples. In Gambetta [1990c], chapter 8, pages 127–142. Available from: http://www.nuffield.ox.ac.uk/users/gambetta/Trust_making%20and%20breaking%20cooperative%20relations.pdf. (Cited on page 3.)

Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project. (Cited on pages 12, 13, 15, 16, 25, and 77.)

Pennock, D. M., Flake, G. W., Lawrence, S., Glover, E. J., and Giles, C. L. (2002). Winners don’t take all: Characterizing the competition for links on the web. In Proceedings of the National Academy of Sciences, pages 5207–5211. (Cited on page 68.)

Pilato, C., Collins-Sussman, B., and Fitzpatrick, B. (2008). Version Control with Subversion. O’Reilly Media, Inc. Available from: http://svnbook.red-bean.com/nightly/en/index.html. (Cited on page 31.)

Pirzada, A. A. and McDonald, C. (2004). Establishing trust in pure ad-hoc networks. In Estivill-Castro, V., editor, Twenty-Seventh Australasian Computer Science Conference (ACSC2004), volume 26 of CRPIT, pages 47–54, Dunedin, New Zealand. ACS. (Cited on page 6.)

Poeml, P. (2008a). The MirrorBrain. http://mirrorbrain.org/. (Cited on page 31.)

Poeml, P. (2008b). Scaling your download infrastructure with your success. Presentation at ApacheCon Europe 2008. Available from: http://mirrorbrain.org/files/talks/apachecon08-mirrors.pdf. (Cited on page 33.)

Price, D. J. d. S. (1976). A general theory of bibliometric and other cumulative advantage processes. J. Amer. Soc. Inform. Sci., 27:292–306. (Cited on page 68.)

Rapoport, A. (1957). Contribution to the theory of random and biased nets. Bulletin of Mathematical Biophysics, 19:257–277. (Cited on page 43.)

Rapoport, A. (1963). Mathematical models of social interaction. In Luce, D., Bush, R., and Galanter, E., editors, Handbook of Mathematical Psychology, volume 2. New York, NY, USA: Wiley. (Cited on page 6.)

Rapoport, A. (1968). Cycle distribution in random nets. Bulletin of Mathematical Biophysics, 10:145–157. (Cited on page 43.)

Raymond, E. S. (2001). The cathedral and the bazaar: musings on Linux and open source by an accidental revolutionary. O’Reilly & Associates, Inc., Sebastopol, CA, USA. Available from: http://www.catb.org/~esr/writings/cathedral-bazaar/. (Cited on page 1.)

Reiter, M. K. and Stubblebine, S. G. (1997a). Path independence for authentication in large-scale systems. In ACM Conference on Computer and Communications Security, pages 57–66. ACM Press. (Cited on page 6.)

Reiter, M. K. and Stubblebine, S. G. (1997b). Toward acceptable metrics of authentication. In Proceedings of the 1997 IEEE Symposium on Research in Security and Privacy, pages 10–20. (Cited on pages 6 and 13.)

Richardson, M., Agrawal, R., and Domingos, P. (2003). Trust management for the semantic web. In Proceedings of the Second International Semantic Web Conference, pages 351–368. (Cited on pages 6, 12, and 36.)

Ruby, S., Thomas, D., Hansson, D. H., Breedt, L., Clark, M., Gehtland, J., Davidson, J. D., and Schwarz, A. (2009). Agile Web Development with Rails. The Facets of Ruby Series. Pragmatic Bookshelf, 3rd edition. Available from: http://www.pragprog.com/titles/rails3/agile-web-development-with-rails-third-edition. (Cited on pages 32 and 36.)

Ruderman, J. (2004). A comparison of two trust metrics. (Cited on pages 16 and 21.)

Sabater, J. and Sierra, C. (2005). Review on computational trust and reputation models. Artif. Intell. Rev., 24(1):33–60. (Cited on pages 43 and 58.)

Scacchi, W. (2007). Free/open source software development: recent research results and emerging opportunities. In ESEC-FSE companion ’07: The 6th Joint Meeting on European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, pages 459–468, New York, NY, USA. ACM. Available from: http://doi.acm.org/10.1145/1295014.1295019. (Cited on page 1.)

Schröder, M. (2007). The gory details of the build service backend. Presentation at 7th Free and Open source Software Developers’ European Meeting (FOSDEM). Available from: http://files.opensuse.org/opensuse/en/f/fa/BuildServiceBackend-FOSDEM2007.pdf. (Cited on page 33.)

Schröder, M., Schröter, A., and Schumacher, C. (2006). The opensuse build service: building software for your linux system. In Spenneberg, R., editor, Proceedings of the 13th International Linux System Technology Conference, page 179. GUUG e.V., Lehmanns. Available from: http://www.lob.de/isbn/9783865411709. (Cited on page 33.)

Scott, J. (2000). Social Network Analysis: A Handbook. Sage Publications, London, 2nd edition. (Cited on page 67.)

Sen, P., Dasgupta, S., Chatterjee, A., Sreeram, P. A., Mukherjee, G., and Manna, S. S. (2003). Small-world properties of the Indian railway network. Phys. Rev. E, 67(3):036106. (Cited on page 68.)

Shapiro, D. L., Sheppard, B. H., and Cheraskin, L. (1992). Business on a handshake. Negotiation journal, 8(4):365–377. (Cited on page 5.)

Shapiro, S. P. (1987). The social control of impersonal trust. The American Journal of Sociology, 93(3):623–658. Available from: http://www.jstor.org/stable/2780293. (Cited on page 4.)

Solomonoff, R. and Rapoport, A. (1951). Connectivity of random nets. Bulletin of Mathematical Biophysics, 13:107–117. (Cited on page 43.)

Stallman, R. M. and Gay, J. (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman. Free Software Foundation. Available from: http://www.gnu.org/doc/TOC-FSFS.html. (Cited on page 1.)

Stewart, K. J., Darcy, D. P., and Daniel, S. L. (2005). Observations on patterns of development in open source software projects. SIGSOFT Softw. Eng. Notes, 30(4):1–5. (Cited on page 1.)

Su, J. (1999). Trust vs. threats: recovery and survival in electronic commerce. In 19th IEEE International Conference on Distributed Computing Systems, pages 5–6307. (Cited on page 4.)

Sztompka, P. (1999). Trust: A Sociological Theory. Cambridge, UK: Cambridge University Press. (Cited on page 5.)

The Open Group (2004). The Single UNIX Specification, Version 3 (incorporating IEEE Std 1003.1 and ISO/IEC 9945). Available from: http://www.unix.org/version3/. (Cited on page x.)

Uslaner, E. M. (2002). The Moral Foundations of Trust. Cambridge, UK: Cambridge University Press. (Cited on page 3.)

Wasserman, S. and Faust, K. (1994). Social Network Analysis. Cambridge University Press, Cambridge. (Cited on page 67.)

Watts, D. (1999a). Small worlds. Princeton University Press, Princeton. (Cited on pages 43, 48, and 85.)

Watts, D. J. (1999b). Networks, dynamics, and the small-world phenomenon. The American Journal of Sociology, 105(2):493–527. Available from: http://dx.doi.org/10.2307/2991086. (Cited on pages 43, 48, and 85.)

Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ’small-world’ networks. Nature, 393(6684):440–442. Available from: http://dx.doi.org/10.1038/30918. (Cited on pages 43, 48, 68, and 85.)

Whitby, A., Jøsang, A., and Indulska, J. (2004). Filtering out unfair ratings in bayesian reputation systems. In Proceedings of the Workshop on Trust in Agent Societies, at the Autonomous Agents and Multi Agent Systems Conference (AAMAS2004), New York, NY, USA. Available from: http://persons.unik.no/josang/papers/WJI2004-AAMAS.pdf. (Cited on page 72.)

Wilhelm, U. G., Staamann, S., and Buttyán, L. (1998). On the problem of trust in mobile agent systems. In Symposium on Network and Distributed System Security, pages 114–124. Internet Society. (Cited on page 4.)

Wittgenstein, L. (1977). On Certainty - Über Gewissheit. Basil Blackwell, Oxford. (Cited on page 3.)

Yamamoto, Y. (1990). A morality based on trust: Some reflections on Japanese morality. Philosophy East and West, XL(4):451–469. (Cited on pages 3 and 4.)

Zhou, R. and Hwang, K. (2007). Powertrust: a robust and scalable reputation system for trusted peer-to-peer computing. IEEE Transactions on Parallel and Distributed Systems, 18:460–473. Available from: http://dx.doi.org/10.1109/TPDS.2007.1021. (Cited on page 19.)

Ziegler, C.-N. and Golbeck, J. (2007). Investigating interactions of trust and interest similarity. Available from: http://www.informatik.uni-freiburg.de/~cziegler/papers/DSS-07-CR.pdf. (Cited on page 21.)

Ziegler, C.-N. and Lausen, G. (2004). Spreading activation models for trust propagation. In EEE ’04: Proceedings of the 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE’04), pages 83–97, Washington, DC, USA. IEEE Computer Society. Available from: http://www.informatik.uni-freiburg.de/~dbis/Publications/04/EEE04.html. (Cited on pages 6, 11, 13, 14, 23, and 83.)

Ziegler, C.-N. and Lausen, G. (2005). Propagation models for trust and distrust in social networks. Information Systems Frontiers, 7(4-5):337–358. Available from: http://www.informatik.uni-freiburg.de/~cziegler/papers/ISF-05-CR.pdf. (Cited on pages 26 and 45.)

Zimmermann, P. R. (1995). The Official PGP User’s Guide. MIT Press, Boston, MA, USA. (Cited on pages 6 and 37.)