Efficient Epistemic Updates in Rank-Based Belief Networks

Dissertation submitted in fulfillment of the requirements for the academic degree of Doctor of Philosophy (Dr. phil.)

at the

Faculty of Humanities (Geisteswissenschaftliche Sektion), Department of Philosophy

submitted by Stefan Alexander Hohenadel

Date of the oral examination: 4 September 2012
First referee: Prof. Dr. Wolfgang Spohn
Second referee: PD Dr. Sven Kosub

Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-250406

Efficient Epistemic Updates in Rank-Based Belief Networks

Stefan Alexander Hohenadel

November 9, 2013

A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy (Dr. phil.)

Department of Philosophy, Faculty of Humanities

This document was compiled on November 9, 2013 from revision 341 without changes.

Abstract – The thesis provides an efficient update algorithm for rank-based belief networks. The update is performed on two inputs: the current doxastic state, represented by the network, and a piece of doxastic evidence, represented as a change on a subset of the variables in the network. From these inputs, a Lauritzen-Spiegelhalter-style update strategy can compute the updated posterior doxastic state of the network. The posterior state reflects the combination of the evidence and the prior state. This strategy is well known for Bayesian networks. The thesis transfers the strategy to those networks whose semantics is specified by epistemic ranking functions instead of probability measures. As a foundation, the construction of rank-based belief networks is discussed, which are graphical models for ranking functions. It is shown that global, local and pairwise Markov properties are equivalent in rank-based belief networks and, furthermore, that the Hammersley-Clifford theorem holds for such ranking networks. This means that from the equivalence of the Markov properties it follows that a potential representation of the actual ranking function can be derived from the network structure. It is shown how, by this property, the update strategy of the Lauritzen-Spiegelhalter algorithm can be transferred to ranking networks. For this purpose, the solution of the two main problems is demonstrated: first, the triangulation of the moralized input network and the decomposition of this triangulation into a clique tree; second, message passing on this clique tree to incorporate the evidence. The entire approach is in fact a technical description of belief revision.


Contents

Preface ...... 9

I Belief, Belief States, and Belief Change ...... 15

1.1 Introduction · 15 – 1.1.1 Remark on Sources and Citation · 16 – 1.2 A Normative Perspective on Epistemology · 16 – 1.2.1 Descriptive and Normative Perspective · 16 – 1.2.2 The Philosopher and the Engineer · 17 – 1.2.3 Belief Revision and the Problem of Induction · 23 – 1.3 Elements of a Belief Theory · 26 – 1.4 Propositions as Epistemic Units · 28 – 1.4.1 The Concept of Proposition · 28 – 1.4.2 Propositions Form Algebras · 29 – 1.4.3 Atoms and Atomic Algebras · 32 – 1.4.4 Beliefs, Contents, and Concepts · 33 – 1.5 Epistemic States and Rationality · 37 – 1.5.1 Rationality Postulates · 37 – 1.5.2 Rational Belief Sets · 40 – 1.5.3 Belief Cores · 41 – 1.6 Transitions Between Epistemic States · 42 – 1.6.1 Description of Epistemic Updates · 42 – 1.6.2 Transition by Consistent Evidence · 43 – 1.6.3 The Inconsistent Case · 44 – 1.6.4 The Transition Function · 45

II Ranking Functions and Rank-based Conditional Independence ...... 47

2.1 Introduction · 47 – 2.2 Ranking Functions · 48 – 2.2.1 Ranking Functions on Possibilities · 48 – 2.2.2 Negative Ranking Functions · 49 – 2.2.3 Minimitivity, Completeness, and Naturalness · 52 – 2.2.4 Two-sided Ranking Functions · 56 – 2.2.5 Conditional Negative Ranks · 57 – 2.2.6 Conditional Two-sided Ranks · 62 – 2.2.7 A Digression on Positive Ranking Functions · 65 – 2.3 Conditionalization and Revision of Ranking Functions · 68 – 2.3.1 Plain Conditionalization · 68 – 2.3.2 Spohn-conditionalization · 70 – 2.3.3 Shenoy-conditionalization · 73 – 2.4 Rank-based Conditional Independence · 74

III Graphical Models for Ranking Functions ...... 81

3.1 Introduction · 81 – 3.1.1 Content of this Chapter · 81 – 3.1.2 Historical Remarks · 82 – 3.1.3 Measurability and Variables · 87 – 3.1.4 Algebras Over Variable Sets · 90 – 3.1.5 Graph-theoretic Preliminaries · 91 – 3.2 Graphoids and Conditional Independence Among Variables · 100 – 3.2.1 Conditional Independence Among Variables · 100 – 3.2.2 RCI is a Graphoid · 101 – 3.2.3 Agenda · 103 – 3.3 Ranking Functions and Their Markov Graphs · 104 – 3.3.1 D-Maps, I-Maps, and Markov Properties · 104 – 3.3.2 Markov Blankets and Markov Boundaries · 106 – 3.4 Ranking Functions and Given Markov Graphs · 109 – 3.4.1 Potential Representation of Negative Ranking Functions · 109 – 3.4.2 Representations of Negative Ranking Functions by Markov Graphs · 111 – 3.5 RCI and Undirected Graphs · 118 – 3.6 Ranking Networks · 119 – 3.6.1 DAGs as Graphical Models · 119 – 3.6.2 Strict Linear Orderings on Variables · 121 – 3.6.3 Separation in Directed Graphs · 122 – 3.6.4 Directed Markov Properties · 123 – 3.6.5 Factorization in Directed Graphs · 128 – 3.6.6 Potential Representation of Ranking Networks · 130 – 3.7 Perfect Maps of Ranking Functions · 131 – 3.7.1 Ranking Functions and DAGs · 131 – 3.7.2 Characterization of CLs that have Perfect Maps · 133 – 3.7.3 Outlook: Can CLs Be Made DAG-isomorphic? · 136

IV Belief Propagation in Ranking Networks ...... 137

4.1 Introduction · 137 – 4.2 Hunter’s Algorithm for Polytrees · 140 – 4.3 The LS-Strategy on Ranking Networks: An Outline · 142 – 4.4 Phase 1 – Triangulation and Decomposition of the Network · 146 – 4.4.1 Methods for Obtaining the Clique Tree from the Initial Network · 146 – 4.4.2 Triangulating Graphs: A Brief Survey · 149 – 4.4.3 Desired Criteria for Triangulations of Graphs · 153 – 4.4.4 Generating the Elimination Ordering · 155 – 4.4.5 The MCS-M Algorithm · 158 – 4.4.6 Determining the Set of Cliques of the Fill-In-Graph · 159 – 4.4.7 Inline Recognition of Cliques · 161 – 4.4.8 Inline Tree Construction · 166 – 4.4.9 An Algorithm for Decomposing a Moralized Ranking Network · 171 – 4.4.10 A Digression on Triangulatedness and the Epistemic Domain · 174 – 4.5 Phase 2 – Message Passing on the Clique Tree · 176 – 4.5.1 Local Computation on Cliques · 176 – 4.5.2 Locally Available Information · 182 – 4.5.3 Pre-Initializing the Permanent Belief Base · 183 – 4.5.4 Bottom-Up Propagation: Conditional Ranks of the Cliques · 184 – 4.5.5 Top-Down Propagation: Joint Ranks of the Separators · 186 – 4.5.6 Processing Update Information · 188 – 4.5.7 Queries on the Clique Tree · 188 – 4.6 Conclusion · 189 – 4.6.1 Achievements · 189 – 4.6.2 Remarks on Aspects Not Discussed · 191 – 4.7 Outlook: Learning Ranking Networks and Induction to the Unknown · 191

A A Computed Example for Decomposition ...... 193

B A Computed Example for Updating ...... 199

2.1 The Ranking Network · 199 – 2.2 Initialization Phase · 200 – 2.2.1 Going Bottom-Up: Computing the Conditional Ranks · 203 – 2.2.2 Going Top-Down: Joint Ranks of the Cliques · 205 – 2.3 Update By New Evidence · 206

Acknowledgements ...... 209

Index of Definitions ...... 211

Index of Symbols ...... 215

Index of Algorithms ...... 219

Definitions and Theorems from “The Laws of Belief” ...... 221

Literature ...... 225

Index ...... 251

Preface

This thesis introduces an efficient algorithm for iterated belief change. It is based on the concept of belief modelled by the devices of ranking theory, as developed by Wolfgang Spohn since 1983 in a series of papers and recently presented comprehensively in his (2012). The algorithm provides a concrete description of how a prior epistemic state is changed by available evidence into a posterior epistemic state, where epistemic states are implemented by Spohnian ranking functions over a set of variables. Since ranking functions are the foundation of the formal modeling of epistemic states, we will speak of "rank-based" belief states and of an algorithm for "epistemic updates".

The research for this thesis was inspired by the philosophical discussion of belief revision. However, the specific argumentation of this thesis is characterized by a high level of technical concreteness and uses ranking theory not as a philosophical position but as a mathematical framework for constructing data structures. The thesis is interdisciplinary in the sense that it enriches the work on epistemological problems with devices common in computer science. Since the author does not wish to assign his inquiry specifically to either philosophy or computer science, nor recognizes any requirement to do so, he will just present his arguments without much theorizing on meta-levels.

The main goal of this thesis is to develop a comprehensive algorithmic treatment of rank-based epistemic updates, illustrated with a concrete proposal. The first step will be to show how belief states and evidence are represented by ranking functions. The second step will be to construct a graph-based data structure that represents the belief state adequately as a rank-based belief network, and in a third step the algorithm for updating the belief network is introduced. The algorithm takes a prior belief state and new evidence as input and generates a posterior belief state reflecting the evidence as output. While the first step is mostly reproductive in that it presents much argumentative material brought in previously by other authors, the second and third steps predominantly introduce arguments developed originally by the author.

An important inspiration for the argumentation of this thesis is the well-known concept of a Bayesian network. Bayesian networks are considered a tool for representing belief states and, furthermore, a foundation for the mechanism of transition from a prior to a posterior belief state. This transition is considered to be triggered by the availability of new evidence. Ongoing research over the last 25 years has led to a differentiated understanding of the strengths and weaknesses of Bayesian networks concerning the representation of belief states and the capabilities of epistemic updates in different fields of application.
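Readers coming from computer science may find it helpful to see at once what "belief states implemented by ranking functions" amounts to in the simplest finite case. The following sketch is purely illustrative; the representation of worlds as value tuples and all names in it are assumptions made for this example, not the notation of the thesis:

```python
INF = float("inf")  # rank "infinity": a doxastically impossible world

# A toy doxastic state over two variables (Rain, Lawn); rank 0 marks the
# most plausible worlds, higher ranks mark increasing degrees of disbelief.
kappa = {
    ("rain", "wet"): 0,
    ("rain", "dry"): 2,
    ("no_rain", "wet"): 1,
    ("no_rain", "dry"): 0,
}

def rank(kappa, proposition):
    """Negative rank of a proposition (a set of worlds): the minimum of
    the ranks of the worlds the proposition admits."""
    return min((kappa[w] for w in proposition), default=INF)

def believes(kappa, proposition):
    """A proposition is believed iff its complement is disbelieved,
    i.e. carries a positive rank."""
    complement = set(kappa) - set(proposition)
    return rank(kappa, complement) > 0

wet = {w for w in kappa if w[1] == "wet"}
print(believes(kappa, wet))  # False: the agent is undecided about wetness
```

Chapter II develops this picture properly; the update algorithm of chapter IV then manipulates exactly such rank assignments, only stored in a factorized, network-shaped form.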


We will represent epistemic states by ranking functions instead of probability measures, but nevertheless we will utilize many known facts about Bayesian networks, which serve as an important blueprint for modeling the update mechanism.

During the work on this thesis, the author was frequently confronted with the requirement of re-introducing rank-based versions of concepts already common in probability theory. This requirement always carries the danger of drifting into "re-inventing the wheel", yet it was very insightful to start formal theorizing about ranks from the very basic foundations. The author decided to provide not only his arguments but also a general introduction to the topic. Therefore, we will explicitly discuss the minimal set of indispensable algebra basics and, later on, also address some specific graph-theoretic problems, such as finding a perfect vertex ordering or the complete recognition of cliques in a triangulated graph (a small illustrative sketch of the former is given below). Those topics belong neither to ranking theory in particular nor to epistemology in general, but they play an important role in the argumentation, which means that they had to be presented in an adequate manner. It is therefore characteristic of this thesis that it utilizes formal tools of measure theory, graph theory and algorithmics for a contribution to formal epistemology. Instead of just compiling relevant research results about Bayesian networks for use on an epistemological topic, the author transfers already available knowledge to the case of ranking theory, with the aim of making progress in the research topic of ranking theory towards a general mechanism of efficient updating.

What this thesis does not do is introduce a philosophical position of the author. Instead, the author concentrates on showing that ranking theory, taken as a philosophical position in its own right, is well suited to provide a foundation for concrete applications in computer science. This of course has implications for epistemology, since it shows on a very concrete formal level how iterated belief change works. The contribution therefore is the transfer of research results from probability theory to the context of ranking theory, as well as the extension of ranking theory with new concepts and arguments. To the best knowledge of the author, no other publication has described an update algorithm for multiply connected rank-based networks so far. Nonetheless, as we will see later, highly relevant considerations of updates for singly connected networks have already been discussed insightfully in the works of Daniel Hunter.

Chapter I of this thesis presents a sketch of the philosophical field of questions to which this thesis intends to contribute, and introduces the relevant concepts formally, but only on a very high level of abstraction. Additionally, chapter I sketches rather briefly the basic problems of belief revision, with only a few pointers to the connected philosophical discussions1. It also introduces the algebraic foundation of belief theory in the form of propositional algebras. It is argued why propositions are taken to be the objects of belief throughout this inquiry and how propositions are formally defined. The chapter further defines epistemic states and epistemic updates at the most coarse-grained level of abstraction.

1Those discussions will be mostly known to readers with a more philosophical background. Readers more familiar with the parts of the work related to computer science may feel that the historical and theoretical background of the philosophical problems is not highly relevant for the concrete arguments.
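As a first taste of the graph-theoretic problems just mentioned, the following sketch shows plain Maximum Cardinality Search, the textbook ancestor of the MCS-M algorithm adapted in chapter IV. It is a simplified stand-in, not the procedure developed in this thesis: on a graph that is already triangulated, the reverse of the visiting order is a perfect elimination ordering, whereas MCS-M additionally computes fill edges for graphs that are not yet triangulated.

```python
def mcs_ordering(adj):
    """Maximum Cardinality Search: repeatedly visit an unvisited vertex
    with the largest number of already visited neighbours. On a chordal
    (triangulated) graph, the reverse of the visiting order is a perfect
    elimination ordering."""
    weight = {v: 0 for v in adj}
    unvisited = set(adj)
    visit_order = []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])  # ties broken arbitrarily
        visit_order.append(v)
        unvisited.remove(v)
        for u in adj[v]:
            if u in unvisited:
                weight[u] += 1
    return list(reversed(visit_order))

# A small chordal graph: two triangles glued along the edge {b, c}.
graph = {"a": {"b", "c"}, "b": {"a", "c", "d"},
         "c": {"a", "b", "d"}, "d": {"b", "c"}}
print(mcs_ordering(graph))  # e.g. ['d', 'c', 'b', 'a'] (order may vary)
```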


The concepts and many of the arguments in chapter I are not originally invented by the author. They are mostly reproduced from chapters 1, 2 and 4 of (Spohn, 2012), but with a strong focus on the formal and technical aspects, leaving out most of the references to the philosophical discussion. The author added a brief reflection on the use of "engineering-like" formal tools in the philosophical domain (section 1.2.2) and a short digression about belief content and the related discussion originating from the arguments of Kripke and Putnam (section 1.4.4). The entire remaining argumentative substance of the chapter is owed to Wolfgang Spohn.

Chapter II presents the most important aspects of the current research status in ranking theory. It formally introduces ranking functions on propositions and discusses the most basic formal extensions and variations that have been introduced into the discussion so far. Namely, there are the two-sided ranking functions proposed in (Spohn, 2012, p. 76)2 as well as the varying concepts of conditional ranks. Additionally, two different methods of conditionalization (Spohn-conditionalization and Shenoy-conditionalization) are discussed, which directly correspond to Jeffrey- and Field-conditionalization in probability theory. Other important contents described in this chapter are, for instance, a rank-based version of Bayes' theorem and a "chain rule" for ranks. For the formal concepts that are introduced, some notable properties are shown as well.

The chapter compiles the relevant material that was widely published before, also discussing distinct variations in terminology as well as properties of the formal concepts. Taken on its own, chapter II should be adequate as a general introduction to the mathematical foundation of ranking theory. Like chapter I, it is mostly reproductive and contributes a compiled introduction of formal material. The arguments and concepts presented have already been described by other authors, mainly Wolfgang Spohn. It seemed very reasonable to the author to present a compiled version of this material, for several reasons (a small illustrative sketch of the central rank operations follows the list below):

1. A clean and detailed introduction of all formal concepts, with terminology and formal language adjusted to a uniform manner, is very important for a concise presentation of the arguments of this thesis. Since the formal concepts stem from different discussions and multiple disciplines, the author considered it necessary to introduce all relevant concepts himself to ensure uniformity. For this reason, chapter II introduces ranking theory, while chapter III introduces a considerable set of concepts from graph theory.

2. Before the advent of (Spohn, 2012), the foundation of ranking theory was scattered across a variety of texts, some of them separated from each other by many years, during which the concepts underwent some variation concerning the conventions of naming, notation, and also mathematical properties. (For instance, Wolfgang Spohn at one time decided to use natural numbers instead of ordinal numbers as the codomain of ranking functions. Additionally, a variety of concepts representing conditional ranks were proposed, mostly with different properties.) Presenting our own introductions of the formal concepts ensures maximal clarity about which concepts are to be discussed. Of course, this is cum grano salis, since the reproduction obviously suffers from a lack of

2Cf. definition 5.12.


philosophical depth, which is a consequence of considering ranking functions as mere material for algorithmic work. Whenever Wolfgang Spohn added insightful comments on why he introduces things in one particular manner and not in the style of some alternative argument, the author of this thesis mostly adopted his view, as long as it did not entail difficulties in the algorithmic part. One could also state that the thesis is written from an engineering point of view – a fact that is philosophically reflected at the beginning of chapter I.

3. The reader not familiar with ranking theory may use chapter II as both a general introduction to the topic and a reference. Therefore, chapter II and partly chapter III are written in the style of a textbook and should be easy to read for readers used to formal concepts. In general, the proof level of chapter II is extraordinarily low. Some of the proofs are surely almost trivial for readers with more than the most basic mathematical knowledge, and indeed, along some passages of this chapter, the argumentation progresses quite slowly. Although this may demand some patience from the reader with the appropriate background, it is perfectly congruent with the intention of the author, since he wants to show from which basic mathematical structures the rank-based concepts originate, without leaving behind the reader less experienced in formal topics. Since all proofs are marked consistently, it is easy to skip a particular proof if the reader is not interested in it. In chapters III and IV, the proof level returns to a more conventional level.

4. Only a complete introduction of all concepts gives a "vertical" insight into the matter introduced. To clarify this: the most coarse-grained vertical perspective states that the author considers propositions as an algebraic structure on which ranking functions can be defined; ranking functions can form graphical data structures that can be shown to satisfy the Markov properties and can hence be updated by a Lauritzen-Spiegelhalter-style update algorithm. It was the aim of the author to make this vertical view as complete as possible.
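To make the lowest layer of this vertical view concrete, the announced sketch renders three central notions of chapter II – the negative rank of a proposition, the induced two-sided rank, and Spohn-conditionalization – in their simplest finite form. The dictionary representation and all names are assumptions of this illustration, not the formal apparatus of the thesis.

```python
INF = float("inf")

def rank(kappa, A):
    """kappa(A): minimum world rank within proposition A (a set of worlds)."""
    return min((kappa[w] for w in A), default=INF)

def two_sided(kappa, A):
    """tau(A) = kappa(complement of A) - kappa(A): positive iff A is
    believed, negative iff A is disbelieved, zero iff the agent is undecided."""
    not_A = set(kappa) - set(A)
    return rank(kappa, not_A) - rank(kappa, A)

def spohn_conditionalize(kappa, A, n):
    """A -> n conditionalization: shift A to rank 0 and its complement to
    rank n, preserving the rank differences inside each of the two cells."""
    not_A = set(kappa) - set(A)
    k_A, k_not_A = rank(kappa, A), rank(kappa, not_A)
    return {w: (r - k_A if w in A else r - k_not_A + n)
            for w, r in kappa.items()}

# Evidence "w1 or w2", accepted with posterior firmness 2:
kappa = {"w1": 1, "w2": 3, "w3": 0, "w4": 2}
posterior = spohn_conditionalize(kappa, {"w1", "w2"}, 2)
print(posterior)                           # {'w1': 0, 'w2': 2, 'w3': 2, 'w4': 4}
print(two_sided(posterior, {"w1", "w2"}))  # 2: the evidence is now believed
```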

The main contribution of chapter II is therefore to present the current research status in ranking theory as well as to introduce the topic to readers not already familiar with it.

Chapter III uses the formal material introduced in the previous chapters to develop ranking networks as graphical models of ranking functions. A ranking network is a data structure that represents the ranking function over a set of variables and is therefore the implementation of an epistemic state at a certain discrete time. In a formal respect, ranking networks are directed acyclic graphs whose vertices are conditional matrices of variables and whose edges are subjectively valid relationships of causal influence between the variables. The network has a negative ranking function assigned to it that defines the conditional information for each vertex. The concept of a measurable variable is introduced along with many basic concepts from graph theory (which may not be familiar to all readers). Since it has already been shown that rank-based conditional independence is a graphoid3, the focus lies on what is consequently

3Cf. (Hunter, 1991a) and, recently, (Spohn, 2012, p. 132), theorem 7.10.

the subsequent step: showing that the Markov properties hold for ranking networks. This is discussed separately for undirected as well as for directed graphs, and for three different Markov properties. Those Markov properties are the formal precondition for making the update algorithm work, since they enable the represented ranking function to be expressed by a potential representation.

Some of the relevant proofs are either very similar to their counterparts in probability theory, or they are solvable on the mere graph-theoretic level, not requiring any arguments from ranking theory at all. Others are more sophisticated and quite non-trivial. For the same reasons as pointed out for chapter II, this material is introduced in uniform terminology. The presumably most important contribution of chapter III is a rank-based version of the Hammersley-Clifford theorem, as introduced by theorem 3.67 on page 114. To the best knowledge of the author, the validity of this theorem in ranking theory has never been shown before.

Chapter IV shows how a transition from a prior to a posterior belief state can be formally represented and efficiently executed. The prior as well as the posterior belief state are thereby represented by ranking networks as defined in chapter III. The update process is separated into two phases.

In the first phase, a "compilation" of the ranking network is required, in which the permanent belief base is created (or in fact re-created) from the prior ranking network. The permanent belief base is a data structure that represents the prior belief state in a technically feasible manner. It is guaranteed to be a tree (in the graph-theoretic sense of the word), which ensures that well-known techniques for tree updates can be applied to incorporate the evidence. The compilation phase is only required if the evidence has modified the structure of the network by the insertion or deletion of edges or vertices. If the evidential input does not change the network structure, recompiling can be omitted, although a compilation definitely has to be performed at least once when the iterated update process starts. Technically, the compilation phase decomposes the ranking network into a clique tree and computes a potential representation of the ranking function that characterizes the prior belief state. The chapter presents adaptations of graph algorithms like clique recognition, and the establishment of a perfect elimination ordering to derive a triangulation of the input network.

The second phase is the "update phase": it incorporates the evidential input into the permanent belief base. After the update phase is completed, the belief base reflects the evidential input. In technical terms, this can be performed by Pearl's message passing algorithm, but the underlying arithmetic has to be replaced by operations adequate for the ranking semantics of the vertices (a small sketch of this replacement closes this preface). The update algorithm works on ranking networks in general, that is, on every kind of directed acyclic graph without loops and multi-edges. In particular, the network is not required to be singly connected.

As the reader may already have noticed in the previous paragraphs, the algorithm is heavily inspired by the technique of Lauritzen and Spiegelhalter for updating multiply connected Bayesian networks, and the author drew many benefits from the earlier works of Judea Pearl and Richard Neapolitan while developing it. Nonetheless, a number of more recent research results are also considered in this thesis.
Improving Neapolitan's approach, the author shows that the compilation phase can be completed by the run of one single procedure.

Chapter IV ends with an outlook on further tasks that could be reformulated in specifically rank-based terms, such as structure learning of ranking networks and the implementation of further types of inference.

The argumentation of the thesis leads from basic considerations of propositional belief formation towards a directly implementable system of permanent epistemic activity that keeps a belief base up to date with the available evidence by passively incorporating new evidence.
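To close this preface, the announced sketch of the replaced arithmetic: where message passing on a Bayesian clique tree multiplies potentials and sums variables out, its ranking counterpart adds rank potentials and minimizes variables out. The following lines show only this elementary step on a single separator; the representation and all names are assumptions of this illustration, and the full two-phase algorithm of chapter IV (including the subtraction of the previous separator potential) is deliberately omitted.

```python
INF = float("inf")

def project(assignment, variables):
    """Restrict an assignment (a tuple of (variable, value) pairs) to `variables`."""
    return tuple((v, x) for v, x in assignment if v in variables)

def marginalize(potential, variables):
    """Rank analogue of summing a variable out: keep `variables`,
    minimize over everything else."""
    out = {}
    for assignment, r in potential.items():
        key = project(assignment, variables)
        out[key] = min(out.get(key, INF), r)
    return out

def absorb(potential, separator_vars, message):
    """Rank analogue of multiplying a message in: add the separator
    message to every matching row of the receiving clique's potential."""
    return {a: r + message[project(a, separator_vars)]
            for a, r in potential.items()}

# Two cliques {A,B} and {B,C} joined by the separator {B}.
phi_ab = {(("A", 0), ("B", 0)): 0, (("A", 0), ("B", 1)): 2,
          (("A", 1), ("B", 0)): 1, (("A", 1), ("B", 1)): 3}
phi_bc = {(("B", 0), ("C", 0)): 0, (("B", 0), ("C", 1)): 1,
          (("B", 1), ("C", 0)): 2, (("B", 1), ("C", 1)): 0}
message = marginalize(phi_ab, {"B"})  # rank the sender assigns to each B-value
phi_bc = absorb(phi_bc, {"B"}, message)
print(phi_bc)  # rows with B=1 now carry the sender's extra rank of 2
```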

I

Belief, Belief States, and Belief Change

1.1 Introduction

This chapter elaborates on the relationship between normative epistemology and engineering. It further describes the contribution this thesis provides to this field of research and additionally introduces the basic notions of belief theory as they will be understood throughout the text. As already stated on page 11, the argumentative material in this chapter is not originally invented by the author; it merely tries to "distill" the most relevant parts of formal belief representation into a brief but sufficient introduction. As a blueprint, the author used mainly chapters 1, 2 and 4 of (Spohn, 2012).

Section 1.2 contains the discussion of the perspective of normative epistemology from which the thesis is written. Some of the commonalities with general engineering are emphasized. The section is not intended to comprehensively discuss the conditions of normative epistemology; rather, it seeks to underline why normative epistemology is attractive for working on philosophical topics with the devices provided by mathematics and computer science. Since this thought perhaps seems unintuitive at first, an introductory remark about this aspect may be reasonable.

Section 1.3 describes, from an intuitive view, the belief revision system the thesis will develop, emphasizing the mechanistic perspective that will successively become more dominant in the later chapters and eventually supersede the philosophical considerations.

Section 1.4, as already stated, introduces a framework for representing propositional belief that is suitable for technical implementation. It reproduces main ideas introduced by Wolfgang Spohn in chapters 2 and 4 of his (2012).

Sections 1.5 and 1.6 reproduce a subset of the argumentative material Spohn gives about the rationality of belief sets and the transition between rational epistemic states. We will concentrate on the more technical parts and leave out most of the references to the history of analytical philosophy. These sections may be seen as a "foreshadowing" of the implementation described in chapters III and IV.


1.1.1 Remark on Sources and Citation

In this chapter, we will introduce an algebraic framework to represent belief states and belief change. This framework was developed by Wolfgang Spohn in chapter 2 of his (2012). Although Spohn also makes use of some ideas that are quite common in the particular discussion about belief revision and the adequate representation of belief, the author of this thesis used Spohn's work as the main orientation for his own writing in this chapter.

It is a foremost goal of this thesis to keep compatibility with Spohn's concepts. In the remainder of this chapter, we will therefore mainly introduce those of Spohn's formal concepts that are important for our aims, but without reproducing each of his arguments concerning the epistemological aspects. Instead, the author discusses the concepts in his own words.

Section 1.2 presents a genuine argumentation of the author of this thesis, with the exception of subsection 1.2.3, whose arguments are mainly owed to section 1.1 of (Spohn, 2012). Especially the essential "two questions" on page 25 were heavily inspired by equivalent questions found in (Spohn, 2012, p. 6). Section 1.3 introduces nearly the same base elements as Spohn does and should therefore be read as just stating the starting point from already well-known concepts. The idea to use Hintikka's (minimal) requirements of rationality was taken from (Spohn, 2012, p. 48). Subsection 1.4.4 presents a genuine argumentation of the author of this thesis. The reader should explicitly acknowledge that the turns made in section 1.4 are completely based on the concepts introduced in (Spohn, 2012, chapter 2), and that sections 1.5 and 1.6 entirely consist of material compiled from sections 4.1 and 4.2 in (Spohn, 2012, chapter 4), but presented completely in the author's own words.

As already stated above, the author of this thesis tried to distill the mere formal parts of Spohn's arguments whenever possible while commenting on them in his own words. Therefore, explicit citations do not occur frequently throughout sections 1.4 – 1.6. To ensure lucidity about the sources, all theorems and definitions stemming from (Spohn, 2012) are presented in an index starting on page 221, where the source of each item used is made explicit. The quite generous reproduction of Spohn's formal concepts seems sensible to the author because, in the later chapters, he will connect them to concepts from different research fields, and he wishes to keep maximal formal consistency throughout the entire thesis.

1.2 A Normative Perspective on Epistemology

1.2.1 Descriptive and Normative Perspective

Before we take a concrete, detailed, and well-structured look at the questions that form the agenda of this thesis, it has to be pointed out what implications our perspective on the topic will have.

A normative perspective focusses on what rational beings should believe, given certain circumstances, and what they should not. This is different from the perspectives of most special sciences like sociology and, mostly, psychology, which just describe the factual belief states of a subject. We will therefore call the perspective of these disciplines the descriptive perspective.


While a descriptive perspective is interested in understanding and explaining the way beliefs actually are formed and modified in terms of a particular special science, the normative perspective tries to point out how these processes should function in order to be acceptable as rational. A descriptive perspective considers the scientific facts, and its interest concentrates on the empirical conditions related to beliefs and their dynamics with respect to the concepts of some special science. The many fascinating questions concerning the empirical aspects of beliefs are scattered widely across the subjects of special sciences, such as the neurosciences, cognitive psychology, linguistics and sociology, to state only the most obvious ones. The perspective that focusses on the concrete empirical questions is by convention not a genuinely philosophical perspective, although its observations exert influence on philosophical questions.

The normative perspective tries to find out what beliefs we should acquire under certain circumstances and consequently presupposes the existence of rules that enable us to sketch idealizations about the dynamics of beliefs. The prescriptive aspect in the word "should" implies the presence of such an idealization, because otherwise no prescription would be justified. The normative perspective is not interested in practical modifications of cognitive mechanisms but just in gaining a lucid conception of rational belief change. It is a notable fact that engineering disciplines like computer science, with its topics of machine learning and data mining, to state only two, have a strong family resemblance with normative epistemology concerning the normative perspective.

Idealization about the dynamics of belief enables us to draw a distinction between belief states that are sufficiently justified, correct, coherent, consistent, and thus reasonable or at least rational, and other belief states. All these adjectives are loaded with non-neutral theory content; in other words, a normative perspective on belief theory claims to be able to make a structured distinction between belief revision processes that would need correction to some extent to fit ideal conditions and belief revision processes that would not need any such correction at all.

This sounds rather vague, because nothing has been said so far about the criteria for this distinction. The terms "correction" and "ideal conditions" appear to be mere placeholders. Furthermore, it has been argued very lucidly by different authors, for instance Jaegwon Kim and Wilfrid Sellars, what normativity could mean and that the term "normative" is not atomic but describes many different ways in which a statement can be normative. We will not engage in these discussions at this point.

Regardless of any reasonable differentiation one could establish, it is sufficient for understanding the arguments presented in this chapter that the characteristic aspect of the normative perspective sums up to the question of what we should believe under which circumstances. The normative inquiry of belief tries to sketch a lucid picture of the rules or laws that hold for the unbiased acquisition of beliefs, and in the course of the unfolding of this investigation, the vague character of the above-stated sentence will disappear.

1.2.2 The Philosopher and the Engineer

At first glance, it may seem that figuring out the details of the normative perspective falls naturally and primarily within the responsibility of the philosopher. But this conjecture has previously been (and currently remains) the subject of dispute.

The arguments most interesting for our particular topic were raised in the context of a discussion about a naturalized version of normative epistemology. Naturalists usually argue that epistemology is a technical discipline. A common objection against this conjecture is that epistemology has to contain normative parts and that, as a technical discipline, it would not be capable of being normative in the sense it needs to be. One of the responses of the naturalists was that epistemology could be normative in the sense in which engineering disciplines are normative. The locus classicus is Quine's short (1986), where he argues that "normative epistemology" is "a branch of engineering", a mere "technology of truth-seeking":

“Naturalization of epistemology does not jettison the normative and settle for the indiscriminate description of ongoing procedures. For me normative epistemology is a branch of engineering. It is the technology of truth-seeking, or, in a more cautiously epistemological term, prediction. Like any technology, it makes free use of whatever scientific findings may suit its purpose. It draws upon mathematics in computing standard deviation and probable error and in scouting the gambler’s fallacy. It draws upon experimental psychology in exposing perceptual illusions, and upon cognitive psychology in scouting wishful thinking. It draws upon neurology and physics, in a general way, in discounting testimony from occult or parapsychological sources. There is no question here of ultimate value, as in morals; it is a matter of efficacy for an ulterior end, truth or prediction. The normative here, as elsewhere in engineering, becomes descriptive when the terminal parameter is expressed. We could say the same of morality if we could view it as aimed at reward in heaven.” (Quine, 1986, p. 664f)

Instead of reading Quine as if he agreed to factually exclude normative epistemology from the core interests of philosophy, the discussion took up the proposal that normative epistemology could be "naturalized" by emulating engineering disciplines. For example, an engineer who builds a bridge knows which properties characterize a "good" bridge and what he has to do to build a "good" bridge. In the same way, an epistemologist knows what a "good" revision is and how a "good" revision can be made even "better". Quine's "engineering reply" to the objection against naturalized epistemology was itself a subject of discussion; see for instance (Wrenn, 2006), where a lucid catalogue of the different types of normativity in question is also provided. We will not join this discussion, since it is not directly relevant for what we are about to do.

On the other hand, this is in brief exactly what we will do in the remainder of this inquiry: doing epistemology on an engineering level, using the formal devices provided by engineering-like disciplines, namely computer science and mathematics. It is therefore nonetheless reasonable to comment on the conditions of such an approach.

Quine is undoubtedly right in stating that many "branches of engineering", especially but not only in computer science, focus on what could justifiably be called "truth-seeking" on a level that could be called "technological". Examples of such applications are neither seldom nor special. Think of everyday requirements like a database system that has to preserve data consistency during update procedures, the Bayesian filters in most email clients that can learn to make good decisions about which of the incoming messages are undesired, or the fuzzy logic in a digital camera that tries to generate exceptionally good-looking pictures of what seems to be reality. The internal processing logic of the camera implements a calculus to decide which visual effects are undesired and which are acceptable. Any knowledge representation system, regardless of its purpose, has to implement basic requirements of rationality, at least consistency.

It is not very surprising that most sciences rely on their own techniques of truth-seeking if we remember that the occidental conception of science is directly and inseparably connected to truth. While the normative part of "truth-oriented" philosophy analyzes conceptions of truth, most sciences develop truth conceptions implicitly. They are to be implemented in algorithms, heuristics and strategies within the conceptual scope of the particular discipline or subdiscipline. One may see this as a "technological" view on truth. But the analysis of the truth conception of a specific discipline is usually not part of the discipline itself, with philosophy being the remarkable exception. We will return to this thought a little later.

The differences in perspective and aim of analysis between philosophy and engineering seem to induce a quite lucid division of the spheres of competence: the philosopher's task seems to be analyzing what truth "is", and the technologist's task seems to be developing techniques to "generate" true assertions from knowledge. This seems clear and fair at first sight. But one should not conclude from this observation that the philosopher is supposed to cede a part of the authority concerning the investigation of truth to the protagonists of other sciences. Unless one rejects the idea that normative reasoning about truth is a philosophical task, the philosopher is also addressed when it comes to the question of which beliefs can – read: should – be legitimately generated from beliefs that are already accepted as part of a subjective knowledge base. Our contemporary experience in these fields shows that mathematics, statistics, and especially computer science provide very useful devices for working out the details of this question.

When philosophy develops a formal conception of truth, it is a legitimate question to the philosopher whether this conception is applicable in a practical sense. This, in particular, includes the question whether it can be implemented technologically. One may conclude that the implementation is not a specifically philosophical task, but once the legitimacy of the question of implementability is accepted, it becomes impossible to draw a sharp demarcation line between the philosopher's part and the engineer's part in the subject of a normative theory of reasoning. This, of course, sounds like a feuilleton argument, but it can be stated more precisely by considering the relationship to rationality that is maintained in philosophy on the one hand and in engineering on the other.

The best example of engineering in this context is computer science because, among other topics, computer science tries to implement conceptions of intelligence, to understand the formal processes of decision making and inductive inference, and to provide systems with the ability to apply these techniques to unknown situations.
This seems to directly imply an understanding of at least some currently unexplained capabilities of the human mind. It may seem that either this claim, formerly the legal territory of philosophy, was usurped by another discipline or – in an even more pessimistic reading – that this shows that, at last, philosophy turned out to be of no more importance for finding out how thinking works.

Computer science is as closely connected to logic and linguistics as philosophy is. One may refuse to consider computer science a mere engineering discipline, thinking of subjects like complexity theory and formal languages, which seem to reside in the competence sphere of computer science without being very "engineering-like". However, it is undebatable that computer science is strongly influenced by engineering paradigms. But in contrast to philosophy, computer science does not model rationality from a reflexive point of view; it just uses, as every engineering discipline does, an implicit normative perspective, which is motivated by the search for techniques to gain precise and good results within the scope of its domain. What "good" results are is in most cases beyond discussion: it is simply provided as a definition.

For the philosopher, a "good" solution to an issue may be one that does not contain inconsistencies, fits acceptably well with already existing attractive explanations of related issues, does not raise heavy contradictions with existing attractive models of explanation, and is, moreover, acceptably simple. One may find other criteria and may also argue that they are not uniformly accepted in the entire community, but at least there is some minimal consensus about the properties listed above. This is also true for computer science, but here the rules are much stricter: whoever enters the community claiming to have a "good" solution to a problem has to face three types of questions.

1. Is the proposed solution less resource-consuming than any existing concurrent solution? (That is: is it asymptotically faster, or at least empirically faster for typical situations? Does it need asymptotically less memory or bandwidth?)

2. Does the proposed solution improve the quality of results over any existing concurrent solution? (This is of special importance for solution types based on heuristics, or for domains with different conventions of modeling.)

3. Is the proposed solution simpler than any existing concurrent solution? (That is: is it easier to understand or to implement?)

Each of these questions targets an aspect of what computer science considers relevant for scientific progress within its scope. If not at least one of these questions can be answered positively, ideally supported by appropriate theoretical proofs or strong arguments and empirical tests, the proposal will typically not be accepted as a valid improvement. The reason for this assessment obviously is that in the case of three times "no", the proposal typically promises a contribution neither to scientific progress nor to practical use.

Hence it seems that computer science does not try to reflect on rationality but merely tries to teach machines to make rational decisions, supposing that rationality is already defined. But this is far too brief: in fact, there are more sciences of "rationality" than only philosophy. No engineer can implement conceptions, neither in software nor in hardware, that are not fully understood and formally clear. This does not require an engineer to understand some mysteries of mind, but it does require her to understand the particular conception of rationality she uses to solve her concrete problem. For her purposes, it must be completely defined "what it is like to be in a rational state".

This requirement is in a way weaker as well as stronger than the requirement on the philosopher. The philosopher is by tradition more interested in the questions than in exact answers. Therefore, the requirement on the engineer is stronger, because her conception of rationality criteria must be sufficiently concrete to implement. The engineer has to understand the inherent structure of the capability she intends to implement, regardless of whether she uses fuzzy logic, probability theory, neural networks, machine learning algorithms, or other tools.

The philosopher is interested in understanding what rationality, well, "means". This addresses any rationality concept. In this regard, the claim of the philosopher is stronger, since her discipline always contains the critical reflection of the notions it uses. But it is obvious that the engineer's claim comes down to exactly the same question as the philosopher's: what is the inherent structure of the concept of rationality under consideration?

This equivalence may consequently be interpreted as a kind of concurrence, and it may therefore seem that computer science and the cognitive sciences contribute much more substance than contemporary philosophy does. It appears to be a side effect of scientific progress that increasingly many of the questions formerly seeming to be inherently philosophical turned out to be answerable using the conceptions of special sciences. Of course, this is a statement from the point of view the special sciences keep. Philosophy could answer that most of the questions currently discussed in the neurosciences were first developed in philosophy. Nonetheless, the rise of the neurosciences and of computer science does show that not all philosophical questions are eternal. Some aspects indeed turned out to be answerable by concrete approaches of special sciences that became feasible through sufficient scientific progress. And currently it is an ongoing task for philosophy to incorporate the important scientific knowledge brought in by the special sciences. Acknowledgement, analysis, and incorporation of the progress made by the special sciences is, and indeed must be, a permanent stimulus for contemporary philosophy.

So far, it is clear that philosophy and engineering share some questions, and insofar it is correct to state that "truth-seeking" has many technological aspects. But how about the converse question: is any approach to rationality that uses formal devices an "engineering-like" approach? Does epistemology turn into an engineering discipline by using "engineering-like" devices? Of course, one may be seduced to join the naturalist view, accept normative epistemology as a mere technical discipline, and feel committed to the view that reflecting on rationality and motivating epistemology is no longer part of epistemology itself. Nonetheless, there are not only commonalities but also extremely important differences between epistemology and computer science.

The mere fact that there exist engineering tasks that yield concrete technical implementations of truth-seeking strategies does not show that the theoretical conception, the rules of acquiring true beliefs, is as a whole a part of engineering. The philosopher's attitude becomes suspicious if thoughts can be made concrete to a level such that an engineer can implement them.
At first glance, this might seem to be an argument that they lack the abstraction level philosophy prefers and requires to be able to reflect on its questions. But in fact each science has its own devices at hand to make progress in knowledge from its special perspective. The rules sufficient in engineering are in many respects not sufficient from a philosophical perspective. The learning algorithms of the computer scientist and her solution-seeking techniques show a property that disqualifies them as sufficient answers to the questions of the philosopher: they always serve a certain purpose and are only applicable under specialized, strict, and in some respects uncommon conditions. The criteria of truth in engineering are criteria of optimality for certain purposes. They are specialized and context-dependent. They – and this is the strongest difference to philosophy, as already stated above – contain neither their own reflection nor their own motivation, where the latter is purely instrumental throughout the entire discipline. Building bridges is part of engineering, but analyzing the reasons why bridges should be built is not, nor why bridge building itself is "good". This is completely different for epistemology, or philosophy in general.

Again, this does not show that truth-seeking belongs more to engineering than to philosophy, but it shows that the questions philosophy tries to answer are highly relevant to other subjects, that the devices philosophy uses are related to the devices of other sciences, and that their devices can be inspiring for philosophical methods and vice versa.

The argument that computer science uses conceptions of truth that are in some way specialized is nonetheless debatable, because it depends on the abstraction level at which the conception is analyzed. The different conceptions of course have the common minimal core of consistency and deductive closure, regardless of whether a relational database, a Bayesian filter or an engine for automatic planning is considered. It is therefore not a strong argument that each task adds its own extensions to the conceptions.

It is not the task of the philosopher to develop solutions to the aim of optimality. It is her task to sketch new sensible pictures of old problems and to discuss new thoughts about known subjects, bringing aspects to light that were hidden before; but this is a purpose in its own right, and it does not take place to serve some special purpose or to meet special requirements. Therefore, the philosopher enjoys more liberty than the engineer (who has to answer the three questions). It seems just impossible to define criteria of optimality for truth-seeking on the level at which a philosopher analyzes truth-seeking. This surely seems to be a strong indication for the engineering sciences that their perspective on truth-seeking cannot catch up with that of the philosopher, and that reasoning about truth-seeking algorithms may not be a philosophical task. But on the other hand, from the mere fact that the philosopher is not the only researcher interested in a certain perspective, it does not follow that she is not entitled to share this perspective or has to give up her task.
Another valid argument against the conjecture that epistemology is "pure" engineering is directly connected to this aspect: the task of the engineer is always the solution of a distinct and special task, and the concepts of truth and consistency are always instruments for her, not the object of her scientific reflection on what she does. Reflection on what she does is, as said above, not part of her domain. This is surely different in the case of the philosopher.

An implementation of some heuristic learning algorithm may try to use techniques of making good decisions that are similar to those used by human beings, but the decision made by the algorithm will always be oriented toward some special and narrow principles. It is not made for reflection but for bare use. Use, however, is a fundamentally different destination than reflection. This is the most important distinction between the normative perspective of the philosopher and the normative perspective of the engineer. The normative perspective of the philosopher is free to produce categorical "goodness". This is not possible for the engineer, whose categories of goodness are never independent from the underlying unreflected purpose, which is not itself part of engineering.

A third aspect, in short, is that normative epistemology is the contemporary approach to the problem of induction. This problem definitely lies within the responsibility of the philosopher. We will see in the next section that the two central questions of this thesis sum up to a formal approach to the problem of induction. This problem cannot be analyzed without introducing idealization into epistemology, and therefore philosophy has to consider this topic.

However, the fact that different disciplines are interested in the same conception for different goals does not in any way show that inquiries into the conception are the definitive task of just one of them. Where some authors may see a strong demarcation line, there is just an interdisciplinary discussion about the shared conceptions of truth and rationality. Philosophy in its analysis may utilize formal devices from mathematics, statistics, and even computer science without becoming engineering. Philosophy is not only allowed to do so; there is, furthermore, a strong indication to use whatever powerful devices are available. On the other hand, philosophy, without being able to access results from the special sciences, would be cut off from scientific progress. These two aspects form the structure of the interaction between philosophy and engineering disciplines. Considering these facts, the question whether epistemology should be engineering or not seems ill-stated and of no relevance for practical scientific work.

1.2.3 Belief Revision and the Problem of Induction

The goal of any theory of belief is to describe how a set of beliefs is built up and maintained. The first is described by belief formation, the second by belief revision. In the course of the inquiry, the author will sometimes use the shortened term "belief theory", which includes belief formation as well as belief revision. Both aspects pose a question at the beginning of the inquiry. To make clear what the starting point of our reflection is, consider the following analogy. We want to imagine the computational aspect of belief theory as a kind of update algorithm on a given set of beliefs. Progressing from this thought, the following picture emerges.

There is a cognitive system – perhaps the human mind, but it could also be another, lower-level information processing system – that consists of two functional components. First, there is a set of beliefs that are "held" in the system. This set represents the doxastic state of the system or, one could say, its "knowledge". The second component is a kind of update "kinetics", a mechanism that operates on this set of beliefs. This component represents the concrete strategy for integrating new information into the knowledge base of the system. This integration process implies the transition from its prior doxastic state to a new state.
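The two components just described can be fixed in a schematic sketch before the text unfolds them further; the literal representation of beliefs, the conflict test and the always-accept policy below are deliberate oversimplifications assumed only for illustration, not the mechanism developed later in this thesis.

```python
def conflicts(belief, evidence):
    """Toy conflict test: two literals clash iff they assign different
    values to the same variable."""
    return belief[0] == evidence[0] and belief[1] != evidence[1]

def integrate(beliefs, evidence):
    """One step of the update mechanism: consistent evidence is simply
    added; conflicting evidence is accepted here by retracting the
    clashing beliefs (a real agent would weigh the conflict instead)."""
    clashing = {b for b in beliefs if conflicts(b, evidence)}
    return (beliefs - clashing) | {evidence}

state = {("sky", "clear"), ("lawn", "dry")}   # prior doxastic state
state = integrate(state, ("wind", "calm"))    # consistent: just added
state = integrate(state, ("lawn", "wet"))     # conflict: retract, then add
print(sorted(state))  # [('lawn', 'wet'), ('sky', 'clear'), ('wind', 'calm')]
```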


With these two components, the system is capable of performing a kind of administration job on its belief base: whenever it encounters a new piece of information, it has to check whether it can accept the information. If this is the case, it adds the information to its belief base such that the system has at each discrete time an updated belief base. On this basis, we can call it a knowledge management system. This sounds nearly trivial, but it clearly is not. In the beginning, the system is in some initial state, characterized by a set of initial or basic beliefs, from which the algorithm starts into a continuous loop of distinct, consecutive modification operations that are triggered by new information becoming available to the system. Whenever new information is made present to the system, the algorithm processes the information by adjusting the already present set of beliefs such that the posterior state of the system reflects the new evidence. Immediately we can identify two different cases: either the new information raises a conflict with the prior state or it does not. In the second case, where the presence of the new belief does not affect the other beliefs, it can directly be added to the belief base of the system. Because the system is clever, it also tries to obtain knowledge by drawing inferences from this new evidence. Hence, the integration of the new belief results in a kind of rule-based “interaction” with the belief base. It may also be the case that the new information represents very strong evidence that causes the system to erase one or more beliefs that are in conflict with the new information. The system then has to decide whether the new evidence should be rejected because the sum of conflict-free beliefs it already keeps has more weight than the single new evidence. If it decides to add the new belief to the belief base, it has to find a way to resolve the conflicts the new evidence introduces. In the course of this resolving process, some manipulation may be necessary to complete the transition to a conflict-free posterior doxastic state. In short, the algorithm performs an integration of the new information into the set of beliefs that is held in the system in some way that will be the subject of further investigation. The result of such an update operation will be a posterior doxastic state, reflecting the new evidence on the basis of the prior doxastic state. (We also consider the case where the new evidence is rejected as a case of integration.) The crucial point is: knowing this algorithm would provide a solution for the problem of induction since it would provide an objective formal technique to generate new beliefs from current beliefs. This process always involves inductive reasoning. This picture is very coarse-grained. But it performs two adequate functions in a certain respect: first, it shows us the elements of the theory that will be part of further investigation. The second and more beneficial effect is that it directs our attention to the positions that are explanatorily empty. Obviously there are unexplained aspects at two crucial points in the picture.⁴ Although the process of continuous epistemic update seems intuitively very comprehensible, we lack any intuition about the state in which the system starts its update activity. How is the initial state of the belief base characterized before considering the first empirical evidence? Which beliefs do we have initially when our mind “starts” to acquire beliefs?
In other words:

4 The author follows quite directly the argumentation in section 1.1 of (Spohn, 2012). Spohn also narrows down his exposition of the topic to the two questions introduced in this section (cf. (Spohn, 2012, p. 6)), but he chooses another route.


How is the initial doxastic state to be characterized?

This question involved continental rationalism and Anglo-Saxon empiricism in a discussion about whether the mind is a tabula rasa at the beginning of life or whether a human has basic ideas in her mind when she enters the world. The question about the initial state implies the question about a priori beliefs, which immediately involves complex considerations of apriority. We cannot engage in further investigation of this point here; however, the precise nature of the initial belief base will be the subject of further inquiries. We have identified one crucial question, but there is a second important point. Considering the initial belief state as given, the question arises in which way new evidence affects the modification of the belief base. When the presence of new evidence triggers an epistemic update, how is the transition of a prior doxastic state into a posterior state structured? Which beliefs can we derive from already acquired beliefs in the light of new evidence? Which beliefs can we legitimately derive from beliefs we already have? Stated more briefly: what should we inferentially believe? Hence, put in a more formal way, the second question is:

Which rules hold for the transition of one doxastic state to another?

It can easily be seen that these two questions form a condensed version of the problem of induction. The first question is about the belief base and the second about inferential beliefs; thus, if we find an approach that answers these questions by giving reasonable theories, a complete inductive account is implemented. The concepts of derivation and inference used in this chapter and also in the following chapters should not be understood as purely deductive, because that would be a reduced conception not adequate to the problem. One reason for this is that deduction does not have the capability to “lead us from perceptions to beliefs about laws, the future or the unobserved”, as (Spohn, 2012, p. 3) says. But we know that we have the ability to draw such inferences. Hence, deduction is not the only device of inference that we practically use. The strength of deductive logic is possibly not sufficient to understand the inference processes involved in the transition from one doxastic state to another. Another demonstration of this comes from the fact that we draw concrete consequences from attitudes that are vague or uncertain. For example, we are acquainted with some vague attitudes, drawn from strong intuitions⁵ without having evidence and without fulfilling the rules of deduction in our inference process. However, we would not say that one does not act rationally by following her intuitions to some extent. Those intuitions influence our beliefs as well as our actions. As a result, we have beliefs that are uncertain. To be precise, there are strong indications that none of our beliefs is of absolute certainty. When we speak of inference or derivation, these notions have to be wide

5 The notion “intuition” always means intuitions in the colloquial sense, not Kantian intuitions.

enough to cover uncertainty. Thus, what at first sight seems easy and mechanistic enough to be implemented immediately in some algorithm turns out to be rather complex and intricate. (The engineer may respond that mere complexity in the details of computation does not introduce a difference in principle, and the philosopher will answer that she is not interested in coping with the complexity of implementation but with understanding how the transition works in principle.) The normative perspective of belief revision does of course not describe the psychological aspects of beliefs, but tries to find out how a perfectly rational mind would acquire beliefs on the basis of uncertain information. That means it formulates the rules that a transition from one doxastic state to another must fulfill to be a valid epistemic update given the new evidence. The idealization lies in the assumption that, given new evidence, some updates on the belief base lead to a more “preferable” doxastic state than others. What it precisely means to be more “preferable” will be investigated in section 1.5.1. We clearly see that a formal normative theory of doxastic states and their rules of change will yield an approach to the problem of induction because it would provide a method to infer new beliefs that are not contained in what is already represented in our beliefs.

1.3 Elements of a Belief Theory

When we introduced the picture of the mechanistic knowledge management system above, we remarked that one of its benefits would be to clarify the elements that are important in theorizing about belief theory. To conceptually introduce the elements of the theory, we replace the purely mechanistic concept of the knowledge management system with the concept of a subject. This does not make the former example invalid, because the knowledge management system can also be a subject and has to act like a subject. Following this thought, the picture contains the following elements. First, there are the objects of belief, which take the role of epistemic units. The epistemic units are those objects to which a subject is related by the belief relation. Usually, they are called “propositions”, but there is great variety in the philosophical discussion about how propositions should be defined. Traditional AGM belief revision theory identifies propositions with sentences, and that is precisely why AGM theory is considered primarily a logical theory and not an epistemic theory. As a consequence, it is not open to different kinds of interpretations or applications. Another widely investigated definition takes a proposition to describe a set of possible worlds. The thesis will not follow these restrictions but will rather follow Spohn’s definition of propositions, which he gives in (Spohn, 2012, p. 17). This definition is more open to interpretation. We will return to propositions in detail in section 1.4. To epistemic units, a subject can have different epistemic attitudes. Gärdenfors speaks of three epistemic attitudes in (Gärdenfors, 1988): a subject can accept a proposition, reject it, or be indeterminate about it. There is a separate discussion on what it means to accept a belief, or take it to be true. The questions related to this discussion are not in the focus of this thesis. The epistemic attitude of acceptance of a certain belief can be imagined as keeping this belief

as a part of the current epistemic state the subject is in. Rejecting a belief can be defined as accepting its negation. Being indeterminate about a belief means that neither the belief nor its negation is accepted in the current epistemic state. To make this clearer, epistemic states have to be explained. Epistemic states or, as they will also be called, doxastic states are the central concept of belief theory. A doxastic state is imagined as the set of all beliefs the subject accepts. Formally, an epistemic state is represented by an aggregation of epistemic units. Consequently, an epistemic state is a set of propositions. But this is already a theoretic predetermination, because epistemic states are of different structure in different theories. They can also be expressed as a probability measure in Bayesian models or as a set of possible worlds, to mention only two of the most prominent conceptions. So far, these are the static aspects of belief. The dynamics of belief come into account by the following argument. An epistemic state can be altered by the confrontation with new evidence or, as Gärdenfors calls it, an “epistemic input”. This name suggests that there is some new entity introduced, but this is in fact not the case. A minimal example of an epistemic input can for instance be a proposition becoming in some way “present” to the subject, combined with a particular posterior certainty degree.⁶ If the subject accepts the new epistemic input, she alters her epistemic state by integrating the new proposition into it. This means that she performs a transition from a prior epistemic state to a posterior epistemic state. This kind of altering is called an epistemic update. It means that the subject changes her prior epistemic state by integrating the new evidence into it. This requires changes of epistemic attitudes toward some – possibly many – epistemic units. The result of performing the epistemic update is a posterior epistemic state. Theorizing about this transition from a prior to a posterior state implies a concept of the requirements of rationality, because otherwise no assertion can be made about the requirements necessary to ensure that the transition mechanism leads to a rational state, regardless of which kind of evidence it processes. The mechanism has to reflect – or, to be more provocative: define – rationality on epistemic states. Thus, on the meta-level, a theory about belief revision makes assumptions on rationality. The most prominent rationality postulates were brought into discussion in (Hintikka, 1962): consistency and deductive closure. The axioms for modeling the update function have to ensure that the function meets these postulates. A later section will explain them in more detail. Having completed these considerations, the basic picture is sketched. The remainder of this chapter will propose formal definitions for all the elements described above: we will formally introduce propositions as epistemic units and show that they can be interpreted to form algebras. We will also analyze which rules should hold for the transition from a prior to a posterior state. The other epistemic entities – attitudes, states, inputs, and updates – will be defined in terms of ranking theory in chapter II.

6 Note that even in this most simplified presentation, the evidence is not just the proposition but the proposition together with its posterior certainty degree. Leaving out the certainty degree in fact means to assign some default value. This may be interpreted as evidence that comes with maximal certainty. We will discuss the details later.


Chapters III and IV will then show how updates can concretely be implemented for those formal concepts.

1.4 Propositions as Epistemic Units

1.4.1 The Concept of Proposition

Talking about beliefs presupposes a conception of epistemic units that are the objects to which our epistemic attitudes relate us. Developing such a conception is a philosophically very complex and intricate piece of work, and many aspects of the conception that we will introduce and use here are not beyond doubt or discussion. The strategy of this thesis is not to discuss or evaluate the strengths and weaknesses of Spohn’s ranking theory but simply to use it to construct a belief revision algorithm. This, basically, saves us from the requirement of defending and discussing the conceptions, but we nonetheless add some remarks that help to understand why we do what we are about to do. It is Spohn’s declared strategy in (Spohn, 2012) to use conceptions that have found a sufficient amount of acceptance within the philosophical discourse, in order to lend persuasiveness to any new theory built on them. Therefore, we stay with Spohn’s strategy: whenever possible, we will try to use formally precise conceptions that are on the one hand as neutral to interpretation as possible and on the other hand as close to the epistemological “mainstream” as possible. We will start with the assumption that all objects of belief have the same nature or are of the same type. This assumption is not accepted by all authors. Furthermore, the precise description of the type of these objects is the center of a lively discussion in which we will nonetheless not engage here. This assumption is also the first concession we have to make, because presupposing a uniform conception of all objects of belief seems to hurt the requirement of total neutrality. Many positions in the discussion are built on arguments trying to show that there are different kinds of objects of belief. We will not engage in the discussion of those arguments now, but it should be remarked that this approach can also be extended to different kinds of objects of belief, although no explicit attempt is made to do this in this thesis. Since Frege’s Der Gedanke, the mainstream of the discussion has agreed to consider propositions as the objects of belief and therefore to consider beliefs as propositional attitudes. Of course, there are rejections. Quine is a prominent protagonist who rejects the propositional nature of beliefs. He argues that they are sentential rather than propositional. However, the author will use well-known and decently approved concepts as the basis for his approach, and therefore he follows the mainstream and accepts the propositional nature of beliefs. The typical feature of propositions is their capability to have a truth value assigned. Carnap describes them as also having an “intension” and an “extension”, but this will not matter for the moment because it leads to numerous separate discussions. Computationally, propositions are boolean variables. But this mere “digital” interpretation does obviously not fit the requirements of our discussion, because we are not interested in

interpreting our relations to beliefs as relations to truth values. Nonetheless, truth values play an important role for belief theory because truth values are assigned to propositions in consideration of truth conditions. One could say truth conditions are the pure and uninterpreted content of a belief while the objects of beliefs are propositions. This seems rather artificial, and the mainstream position is to simply identify propositions with truth conditions. We will stay with the latter. The probability theorist knows an entity equivalent to propositions, which she calls an “event”. This word is strongly related to the perspective of a special science and is philosophically not adequate; therefore, we will use the notion “proposition” instead. Many formal concepts of probability theory are very useful for belief theory, and therefore we will continue to add short comments on the parallel concepts in probability theory and mathematics while introducing the formal concepts. The formal material presented in the remainder of this chapter is, as already pointed out, mostly compiled from chapters 2 and 4 of (Spohn, 2012). This is a reasonable strategy since, as will become clear later on, this conception directly fits the requirements for graphical models and the update algorithm. Since Spohn also makes use of some concepts that are quite common in the discussion, the author does not present this approach as specifically Spohnian, but in fact the formal treatment will be very close to Spohn’s position.

1.4.2 Propositions Form Algebras

Propositions are represented as sets of possibilities that are subsets of a given underlying set of possibilities W, meaning precisely the subset of all possibilities in which the proposition is true. In short, this will be formally fixed by definition 1.3. We will not give an explanation of the entity that is denoted by “possibility” in precise terms. The reason is that there is de facto no way to describe it without hurting neutrality. The reader may see possibilities as a placeholder that can be interpreted within an application of the theory that is to be unfolded here. For instance, possibilities may denote possible worlds, if this is convenient in the context of the particular application. Possibilities are denoted with italic small letters from the end of the alphabet like u, v, w. Since W represents the set of all accessible possibilities, it will be called “space of possibilities” to stress that there are no possibilities considered that are not contained in W. We denote W optionally with an index for differentiation. Note that the space of possibilities W directly corresponds to what a probability theorist calls “sample space” or “event space”. (It differs from what is called a “probability space”.) Propositions are denoted by slanted capital letters from the beginning of the alphabet like

A, B, C, . . ., optionally written with an index like Aᵢ. The empty proposition is denoted by ∅. Note that W is itself a proposition. It may not be immediately obvious why propositions are represented by sets of possibilities.

Example 1.1 Consider a lottery as an example. An arbitrary sequence of 6 numbers is selected out of a finite set of natural numbers greater than 0 and less than 50 (without putting a number back that already has been selected). In this example, W consists of all possible sequences of 6 numbers, containing only numbers bigger than 0 and smaller than 50.


Let A be the proposition denoted by the sentence “50% of the lottery numbers of this week are even numbers”. This sentence describes a situation that is true in a large number of cases, precisely in all those cases where 3 of the 6 numbers in the selected sequence are even. Let proposition A denote the set of all those cases. Let now the actual sequence of 6 numbers for this week be, for example, (2, 32, 48, 21, 17, 41). This concrete vector represents a possibility that is clearly an element of A, and A is therefore true in this case. Of course, the occurrence of a single concrete fact can make other propositions true or false, although they are not connected to A. Consider proposition B and let it be represented by the sentence “50% of the lottery numbers of this week are prime numbers”. Given the above numbers are selected, B is also true.
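Since propositions are modeled literally as sets of possibilities, the example can be checked mechanically. The following Python sketch uses illustrative names only; enumerating the whole space W (roughly 10 billion sequences) is infeasible, so the propositions A and B are represented here by their membership tests rather than as explicit sets.

    def is_prime(n):
        # Simple primality test, sufficient for numbers below 50.
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    def in_A(seq):
        # A: "50% of the lottery numbers of this week are even numbers"
        return sum(1 for n in seq if n % 2 == 0) == len(seq) // 2

    def in_B(seq):
        # B: "50% of the lottery numbers of this week are prime numbers"
        return sum(1 for n in seq if is_prime(n)) == len(seq) // 2

    w = (2, 32, 48, 21, 17, 41)   # the actual possibility of the example
    print(in_A(w), in_B(w))       # True True: w is an element of A and of B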

For a set of sets S the symbol ⋃S denotes the union of all elements of S. The union may be infinite as well as uncountable if nothing additional is said and it is not made explicit by the context. An infinite union of explicitly countable sets is expressed by ⋃_{i=1}^{∞} Sᵢ. An explicitly finite union is expressed by ⋃_{i=1}^{n} Sᵢ for some n ∈ ℕ \ {0}. If nothing additional is said about the index set I, it is assumed that I = ℕ and n ∈ ℕ, where the symbol “ℕ” is used⁷ to denote the set {0, 1, 2, . . .} not including ∞. The same convention is always used for the intersection ⋂S. The equivalence of sets (and a fortiori of propositions) is expressed by the operator “=”.

The cardinality of a set S is denoted by |S|. To ensure that only propositions are the objects of consideration, we define the algebra of propositions A over W and consider the elements of A. The statement that A is a proposition is hence equivalent to the statement A ∈ A.

Definition 1.2 (Propositional Algebra) Let 2^W be the power set of a space of possibilities W. A set A ⊆ 2^W such that⁸:

(1) W ∈ A,

(2) if A ∈ A, then A̅ ∈ A,

(3) if {A, B} ⊆ A, then A ∪ B ∈ A
is then called a propositional algebra over W.

Instead of speaking of a “propositional algebra over W” we will mostly use the short term “algebra” if the context is implicitly understandable.
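For finite spaces of possibilities, the three conditions of definition 1.2 can be checked mechanically. The following Python sketch is only an illustration under this finiteness assumption; the function name and the representation of propositions by frozensets are conventions of the sketch, not of the theory.

    from itertools import combinations

    def is_propositional_algebra(family, W):
        # Check conditions (1)-(3) of definition 1.2 for a finite family of sets.
        family = {frozenset(A) for A in family}
        W = frozenset(W)
        if W not in family:                             # (1) W is in the algebra
            return False
        if any(W - A not in family for A in family):    # (2) closure under complement
            return False
        return all(A | B in family                      # (3) closure under union
                   for A, B in combinations(family, 2))

    W = {1, 2, 3, 4}
    A = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
    print(is_propositional_algebra(A, W))                        # True
    print(is_propositional_algebra([{1, 2}, {1, 2, 3, 4}], W))   # False: complement of {1, 2} missing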

Definition 1.3 (Proposition) For a given propositional algebra A over a set of possibilities W, a set A ∈ A is called a proposition.

Propositions are closed under the logical operations of negation (“¬”), conjunction (“∧”) and disjunction (“∨”). Since propositions are represented as sets, the logical operations are implemented by the corresponding set operations of complement (denoted by an overline, as in A̅), intersection (“∩”) and

7 Symbol “ℕ” denotes the set as defined above in accordance with DIN 5473. This is nice to know, but we just define it this way since it seems practical.
8 The symbol “⊆” denotes “subset of”, “⊂” means “proper subset of”.

union (“∪”). Furthermore, we use set difference (“\”) for ease of notation. Note that definition 1.2 preserves closure under all set operations.⁹ Propositional algebras are closed under the operations of complement and union, meaning that any application of complement or union operations on propositions will always yield propositions as result and never any entity that is not a proposition. The empty proposition ∅ represents the contradiction ⊥. Since ∅ contains no actual possibilities, it represents a proposition that is never true: no change in the state of the world could ever make ∅ come true. This can easily be recognized since the logical contradiction has the form A ∧ ¬A = ⊥ and A ∩ A̅ = ∅. The complement of the empty proposition over W is the proposition W, which accordingly represents the tautology ⊤. Consider A ∨ ¬A = ⊤ and A ∪ A̅ = W. Note that it always holds that ∅ ∈ A because definition 1.2 requires W ∈ A. Since W̅ = ∅ and A is closed under complement by definition, W and ∅ are elements of any propositional algebra. Note that W may be infinite or uncountable. If W is infinite, 2^W is uncountable.¹⁰ Since A is defined as a subset of 2^W, also A may be uncountable and, hence, uncountability may play a role in any application of our theory. Therefore, the formal foundation cannot exclude uncountability, neither on the pure formal level nor on the semantic level, because this would hurt neutrality. It follows that propositions are also not ensured to be finite or countable. Consider for example the belief that the weather forecast will express the possibility that it rains tomorrow as a real value between 0 and 1. If any value in the interval [0 ; 1] is possible and we take W as containing each real number in the interval as a concrete possibility, W and also A over W are infinite and uncountable. Consequently, the proposition that the value will be in the interval [0 ; 1] is an infinite uncountable subset of W, because there are uncountably many possibilities w that make the proposition true when they become actual. It is therefore required to consider cases where A is infinite or uncountably infinite. We have to distinguish different kinds of algebras and thus have to distinguish between closure under finite, infinite and uncountable operations.

Definition 1.4 (Sigma-Algebra) A propositional algebra A over W such that for each countable subset S ⊆ A it holds that ⋃S ∈ A is called a σ-algebra.

Definition 1.5 (Complete Algebra) A propositional algebra A over W such that for each subset S ⊆ A it holds that ⋃S ∈ A is called a complete algebra.

Corollary 1.6 (Completeness of Finite Propositional Algebras) Each finite algebra is complete.

Proof: It is to show that ⋃S ∈ A for each subset S ⊆ A. From definition 1.2 it follows that for each pair of propositions {A₁, A₂} ⊆ A it holds that A₁ ∪ A₂ = ⋃_{i=1}^{2} Aᵢ ∈ A. Obviously, if for some

9 Note that although definition 1.2 demands closure only under disjunction and negation, it can easily be seen that this entails closure for conjunction and also for difference. By application of de Morgan’s law, each conjunction of propositions can be expressed by the complement of the union of their complements. The difference A \ B = A ∩ B̅ can be expressed using complement and intersection.
10 If a set S is infinite, the power set 2^S is uncountable. This can be shown with a generalization of Cantor’s second diagonal argument in (Cantor, 1890/91) that he used to prove the uncountability of real numbers.


n ∈ ℕ and Aᵢ ∈ A with 1 ≤ i ≤ n it holds that ⋃_{i=1}^{n−1} Aᵢ ∈ A, then it also holds by definition 1.2 that (⋃_{i=1}^{n−1} Aᵢ) ∪ Aₙ = ⋃_{i=1}^{n} Aᵢ ∈ A. Hence for every finite subset S := {A₁, A₂, . . . , Aₘ} ⊆ A and m ∈ ℕ, it holds that ⋃_{i=1}^{m} Aᵢ = ⋃S ∈ A. Since A is finite, each subset S ⊆ A is finite. Hence, ⋃S ∈ A for every S ⊆ A. □

This implies that the properties of being complete and being a σ-algebra are equivalent and in fact indistinguishable for finite algebras. Each finite algebra is hence a σ-algebra. An infinite algebra can be a σ-algebra without being complete. We will study an example of this case on page 54. The definition of a σ-algebra extends the definition of an algebra by ensuring closure under all countable operations, also if they are infinite.¹¹ In order to also cover uncountable operations, we had to introduce the concept of a complete algebra.

1.4.3 Atoms and Atomic Algebras

Note that it is not ensured that {w} ∈ A for each w ∈ W. This means it is not ensured that any single possibility w forms a corresponding proposition {w}.

Definition 1.7 (Atom) Let A be a propositional algebra over W. Let A ∈ A be a proposition such that A ≠ ∅. If there is no proposition B ∈ A such that ∅ ⊂ B ⊂ A, then A is called an atom of A.

Note that atoms in our sketch correspond to what the probability theorist calls “elementary events”. Remember the example of the lottery in the preceding section. An example of an atom is the proposition: “This week, the lottery numbers will be (2, 32, 48, 21, 17, 41).” An algebra that only contains propositions that can be expressed as unions of atoms is called “atomic”.

Definition 1.8 (Atomic Algebra) Let A be a propositional algebra. If for each proposition A ∈ A it holds that A is either an atom of A or A is equivalent to a union of some atoms of A, then A is atomic.

It is understood that singletons of possibilities are atoms if they are members of an algebra A. However, note that the converse need not hold: not all atoms in A need to be singletons. It is possible that an algebra is entirely atomless. As an example, consider the set A of all finite unions of half-open intervals over ℝ of the form [a, b) = {x : a ≤ x < b} with {a, b, x} ⊆ ℝ. Note that A is an algebra, but it is atomless, since for each non-empty element E₀ ∈ A there is some non-empty element E₁ ∈ A with ∅ ⊂ E₁ ⊂ E₀.
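For finite algebras, definition 1.7 translates directly into a search for non-empty elements that have no non-empty proper subset within the algebra. The following Python sketch (illustrative names only, with frozensets standing in for propositions) also exhibits the remark above that atoms need not be singletons.

    def atoms(family):
        # Atoms in the sense of definition 1.7: non-empty elements of the
        # algebra that have no non-empty proper subset within the algebra.
        family = {frozenset(A) for A in family}
        return {A for A in family
                if A and not any(B and B < A for B in family)}

    W = frozenset({1, 2, 3, 4})
    A = [frozenset(), frozenset({1, 2}), frozenset({3, 4}), W]
    print(atoms(A))  # {frozenset({1, 2}), frozenset({3, 4})}: atoms need not be singletons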

Corollary 1.9 (Complete Algebras are Atomic) Each complete propositional algebra is atomic.

Proof: Let A be a complete propositional algebra over a space of possibilities W. For each w ∈ W there obviously exists a corresponding proposition I_w := ⋂{A ∈ A : w ∈ A} ∈ A. Lucidly, I_w is an atom. Hence, the set atoms(A) := {I_w ≠ ∅ : w ∈ W} is a set of atoms of A. Each finite or infinite combination of propositions {I_w, I_v} ⊆ atoms(A) by union or

11 A note on the background: the concept of a σ-algebra is of importance in measure theory, especially in probability theory. The problem addressed by the requirement of closure under all countable operations is that this requirement allows the investigation of infinite countable sets.

complement will lucidly be a member of A. This is obvious from the definition of an algebra and the set operations. Additionally, each member of A can be represented as a combination of some elements of atoms(A). If there were any proposition B ∈ A that could not be represented as a combination of members of atoms(A), it would imply that B contains some possibility x for which I_x is empty. This is a contradiction. □

Note that corollary 1.9 implies that each finite algebra is atomic.

1.4.4 Beliefs, Contents, and Concepts

Let us make a little digression and briefly discuss the concessions we made with the decision to consider propositions as the objects of belief. This decision is not free of problems, but the seemingly obvious alternatives are much more problematic. Considering beliefs as propositions and therefore as sets of possibilities has the following advantage: for each application, W has to be interpreted to bestow semantics on the application. By our conception of propositions, this necessary interpretation step is not restricted in any respect. Furthermore, the interpretation itself can contain a restriction of W if this is reasonable for a given application. The concrete nature of possibilities is also not determined. One may say that this is rather a disadvantage than a feature, because it introduces indeterminism and vagueness into the model. But this is only a misunderstanding of the abstraction level of this consideration. The entities denoted by “possibility” are intentionally open to interpretation. Possibilities can be philosophically interpreted as any kind of possible worlds or as interpretations of a formal language, to name only two examples. It is not easy to find a notion that preserves this kind of neutrality. Spohn is more detailed in this respect, introducing a “doxastic possibility” as a formal concept in (Spohn, 2012, p. 29). This has obvious advantages for belief theory, but it is not necessary for our claim here, so we will not raise any discussion here. Spohn argues in (Spohn, 2012, p. 23f) that the perspective constructed so far establishes at least two strong regulations that violate neutrality. The first is, as already stated in section 1.4, that we presuppose a uniform nature of all objects of belief, as they are all sets of possibilities. If someone does not agree to the idea of a uniform nature of the objects of belief, she may nonetheless assent to the argument that there may be different types (or perhaps “kinds”) of possibilities. A possible consequence would be that the approach given here is restricted to describe only certain types of beliefs generated by a certain interpretation of what a possibility is. The interaction of these different types of beliefs would then be, as Spohn points out, beyond the scope of the approach. The second irrevocable predefinition Spohn stresses is that beliefs are always propositional. He sketches three types of counterarguments. The first one may arise if propositions are considered as sets of possible worlds. This perspective identifies possibilities with possible worlds. In short, there is no argument why the framework should not allow for this. It means just renouncing some of the neutrality of the notion of possibility. This should be perfectly

acceptable when identifying possible worlds as an implementation of possibilities. Thus, there is no point in this argument, according to Spohn. The second type of argument against the propositional nature of belief is that propositions are of conceptual nature and belief may be non-conceptual. Perceptual belief seems to be an obvious counterexample, because perceptual belief seems not conceptually structured. This argument has to be met by showing how perceptual beliefs can be understood in terms of sets of possibilities. Spohn answers that since possibilities are introduced as unstructured, it is not clear why they should not be appropriate to model perceptual belief in particular or non-conceptual belief in general. Spohn’s answer reverses the burden of proof. The third type of argument just reverses the second: it insists that conceptual belief seems to be at least a kind of standard or normal case of belief. Furthermore, one may argue that the “conceptual nature” of belief is not reflected in considering beliefs as sets of possibilities. Spohn explains the argument as follows:

“If we conceive of objects of belief as sets of possibilities, then we really conceive of them as pure contents. A pure content is nothing but a truth condition: a set of possibilities is true if and only if the one and only actual possibility is a member of it. The problem now is that such contents are not given directly to us. Having a belief is somehow having a mental representation in the belief mode (. . . ), which will usually be a conceptual representation or, if that is too unclear, a linguistic representation; this is, finally, something quite indeterminate. The belief is then bestowed with content only because this representation is somehow related to the content or because the sentence representing the belief has a truth condition. But then it seems that it is rather the representation that should be the object of belief and not the content; different representations are different beliefs, even if they have the same content. As Quine has (. . . ) insisted (. . . ), belief is, (. . . ) not a propositional, but rather a sentential attitude.” (Spohn, 2012, p. 25)

“(. . . ) The problem is that contents seem accessible only through representations, that in principle infinitely many representations represent the same content, and that the issue of whether two representations are equivalent (i.e., have the same content) can be computationally arbitrarily complex and is indeed undecidable above a certain level of complexity. Nobody knows or can be expected to know about all these equivalences. Restricting the discussion to logical equivalences, this is called the problem of logical omniscience.” (Spohn, 2012, p. 25)

While we consider propositions equivalent to contents or truth conditions, the third argument seems to require us to take sentences as the objects of belief – but sentences are concrete representations of propositions. Spohn provides only the short explanation cited above. Although we will not engage more deeply in the discussion about the philosophy of semantics, it may be reasonable to complement his considerations by elaborating briefly on this problem. To understand how the “conceptual nature” of beliefs should be reflected, one should develop an understanding of what concepts are.


The arguments of (Kripke, 1972) and (Putnam, 1975) showed that concepts are not to be identified with the meanings of predicates. Putnam demonstrated that the “mental states” of knowing the meaning of an expression in the heads of different speakers do not reflect unknown differences in the meaning of these expressions. This argument directly addresses the core of meaning internalism. The counterexample is the well-known twin-earth example, in which some individuals are in the same mental state about the meaning of the word “water” although the extension of this word differs in their idiolects. The defender of internalism seemed not to be able to argue which difference on the mental level of the two individuals reflects the extensional difference of their concepts of “water”. One prominent line of defense argues that it is not the mental states of the two individuals that have to reflect those differences. Instead, the content of the two individuals’ beliefs must differ. A well-known source for this argument is (Fodor, 1987). In section 1.4.1 we already stated that the content of a belief is a truth condition. Truth conditions are the logical figuration of entities that are called “concepts”, introduced by Frege, who called them “Begriffe”, and it is often argued that the contents of propositions are of conceptual nature and somehow composed from concepts. But we also know that contents are not given directly to us. They are mediated to us in a way that is not really clear to us, and it seems that having a belief means having established a mental representation of a content that is kept in the “belief mode”, as cited from Spohn above (and not, for example, in the wish mode). The way the representation is related to the content is interpreted in different ways by different philosophical positions; however, it seems to follow immediately that if this sketch is accepted, the representation should be considered as the object of the belief, not the content itself. The belief only has content because the representation is somehow related to the content, and therefore contents are only accessible through their representations. But it also follows that the same content is present through different representations in the subjects, and it is absolutely convincing to proceed from the assumption that each subject instantiates her own representation of a given content. Therefore, there are as many representations as there are subjects, and the number of representations of a given content can in principle be infinite. What does it therefore mean that two subjects have the same belief? One has to find out if there are representations of the same content present in both. How should this be done? It can be, as Spohn says, “arbitrarily complex” to verify if two representations are representations of the same content. Above a certain level of complexity it is practically beyond any possibility of solution, and it follows that it may be computationally impossible to check whether two subjects have equivalent beliefs. This reveals the silent assumption we made so far: we assume that different representations of the same content, meaning of the same truth condition, are in fact recognized, or at least can in principle be recognized, by the subject to be equivalent. Otherwise, it would be possible for a subject to believe a certain truth condition if the truth condition is given by one representation while she rejects the same truth condition when it is given by another representation.
This assumption is crucial for justifying propositions as objects of belief. But it is not beyond dispute: it is easy to imagine situations where subjects are confronted with different representations of the same truth condition and do not recognize them as equivalent. One

35 IBelief,Belief States, and Belief Change may think of two sentences of equivalent meaning in different languages. Examples describ- ing such scenarios can be found in (Loar, 1988), where it is argued that the content of a belief is not fixed by propositions at all. Loar argues that a person, Pierre, a native speaker of French, may have the belief “Londres est jolie” without having been in the city he knows as “Londres” in his life so far. It happens that he visits London, but without recognizing that this is the particular city he thinks of as “Londres”. He may find that London is loud, very crowded und full of traffic and may think “London isn’t beautiful”. We can then argue that the truth condition, that London is beautiful, is given to this subject in two different representations, and that the subject accepts it as true in one representation and rejects it in another. An even more intuitive case can be described considering the subject who thinks that the capital of Germany is a beautiful city, but finds that Berlin is a really ugly place, dirty, awfully loud and overcrowded with weird people. This is a subjectively consistent scenario assuming that this subject does not know that Berlin is the capital of Germany. This is an instance of a class of well-known examples discussed in the early analytical . These cases were first discussed by Frege when he analyzed the conditions of informative identity statements12. Nevertheless, there is one problem in this argument: we can argue that no rational subject would intentionally belief a truth condition in one representation and reject it in another. Once Pierre knows that “Londres” and “London” refer to the same city, he will adjust his beliefs such that they are consistent. Pierre intentionally used the names “Londres” and “London” to refer to different cities, but this was only possible because of a specific deviation of his partic- ular idolect that misses the consensus in his linguistic community. When he once recognized that he used the words in a different way than the linguistic community he lives in, he will adjust his use of language. Furthermore, Pierre is a rationally thinking individual and once he recognizes that his state of belief is logically inconsistent, he will immediately correct it. Examples like the ones of Frege and Loar can be constructed systematically by ignoring intentionality and adding some misunderstanding or ignorance about the meaning of some language expression by the subject.13 The example illustrates that it is completely reasonable to presuppose that the acceptance of some belief does not depend on its representation under conditions where encyclopedias and thesauri are available to the subject in principle. Of course, reduced knowledge or reduced competence in the idiolect of the given linguistic community may lead to inconsistencies but this does neither mean that the subjects are not rational nor that beliefs are better considered to be mental representations than truth conditions. However, we do not intend to engage into a deeper discussion about contents, concepts and meaning. The short digression of this section should only show that we are well advised to concentrate on propositions as objects of belief, and that this decision was not made arbitrarily

12 Remember the prominent Hesperus/Phosphorus example: not knowing that Hesperus is Phosphorus, I can aggregate totally different beliefs about each of them. Frege took this as a hint that a proper name has a semantic dimension beyond its mere object of reference, and he called this its “Sinn”.
13 Similar well-known examples were constructed by Russell as well as by Tyler Burge and Stephen Stich, to name only a few. These examples founded broad discussion threads, but not all of them had real argumentative impact.

and has important and vital consequences for our theorizing. Spohn avoids the discussion referenced above and answers the third counterargument with three quite pragmatic arguments for considering the truth conditions that are modeled by sets of possibilities as the objects of belief:

“One reason is that we are in good company; in that respect, I think, the majority of epistemologists are still with us. Of course, the argument from authority is dubious, but it is reassuring not to feel like an outsider. My main reason is that our stance is by far the more fruitful one. (. . . ) In fact, most theories apparently operating on a sentential level assume the substitutability of logical equivalences and differ only superficially from our procedure. Without this substitutability we are left with hardly any theory at all. (. . . ) In my view this [epistemological theorizing, remark by author] entails that the objects of belief should be conceived as contents or truth conditions or sets of possibilities and not as sentences or representations of contents.” (Spohn, 2012, p. 26)

Spohn’s second argument simply says: if we want to theorize, meaning to provide a structural analysis of the topic, it is not reasonable to consider the topic as being of a nature that makes structural analysis nearly impossible. It is therefore rational to choose the perspective that suits the needs of the claim. (Any engineer would commit to this point of view perfectly and unhesitatingly.) His third argument in fact states that normative epistemology is not possible on a radically subjective level. This points us to a separate discussion we will not dive into; however, it should be clear that this argument is firmly grounded on a genuine philosophical perspective. We will close the discussion about the objects of belief and state that, besides all discussions, it is at least from a pragmatic point of view a helpful move to consider propositions in their form as sets of possibilities as the objects of belief.

1.5 Epistemic States and Rationality

1.5.1 Rationality Postulates

We proceed with some arguments that are strongly oriented to Spohn’s argumentation in (Spohn, 2012, chapter 4, section 4.1). Our aim is to introduce the same rationality postulates he uses. In former sections we used the notion of an epistemic or doxastic state (both notions are used as synonyms throughout this thesis). Having developed an account of the objects of belief, it is straightforward to define epistemic states as characterized by the entirety of what the subject believes. We therefore consider a set of beliefs B ⊆ A and demand that if the subject accepts all and exactly those beliefs in B, then B characterizes the epistemic state the subject is in (at a particular time t). This definition does not explain any aspects of the belief relation itself, which is treated simply as a black box. But it permits us to speak about doxastic states in terms of belief sets,

which will be sufficient for the moment. We will later see that this definition is only sufficient for a static description of belief states, and we will also develop a more precise notion of a belief set. At this point, the normative aspect comes into consideration: normative theorizing about belief is only possible when some criteria are defined that make it possible to verify which belief is “acceptable” and which is not. This criterion, precisely, is rationality. Is every belief set B equally rational, or can additional criteria for rational belief sets be stated? In this inquiry, we will use the two rules¹⁴ for rational belief Hintikka brought up in his (1962).

Postulate 1.10 (Consistency) Belief sets characterizing rational doxastic states are consistent.

Informally this means a rational subject is required to know that an actual possibility contained in a proposition cannot be contained in its complement. Stated another way, the subject is required to consistently apply the belief that sentences of the form A ∧ ¬A are always false. This especially includes that the empty proposition denotes contradiction, because it cannot contain any actual possibility. The reason for requiring consistency is easy to understand: ex contradictione sequitur quodlibet. Arbitrary inferences may be drawn from a logically (not only contingently) false proposition as well as from two contradictory propositions. Giving up consistency would be equivalent to giving up inference, and this is maximally unattractive for theorizing.

Postulate 1.11 (Deductive Closure) Belief sets characterizing rational doxastic states are deductively closed.

According to postulate 1.11, a rational subject has to know two fundamental aspects. First, if the actual possibility is in each of two propositions, it is also contained in their conjunction. This is obviously a direct consequence of the intersection operation, which implements conjunction for sets. The second aspect is that if the actual possibility is contained in some proposition, it will also be contained in any superset of this proposition. Consider the throw of a die. I believe that the next throw in a sequence of throws will surely result in “3”. Let proposition A be the set consisting of only the actual possibility that “3” is the result. Let proposition B be the set consisting of all possibilities in which an odd number is the result. Clearly, it holds that A ⊆ B. If I believe A, there is no way to reject B at the same time. Put in a more general way: while accepting a highly distinguished belief, the subject cannot reject a more general version of this belief while maintaining rationality at the same time. Note that the converse does not hold: I can rationally expect that the next throw yields an odd number without expecting that it will be 3. Therefore, I may accept a general belief without actively believing some particular case included in it. Together, both aspects of deductive closure result in the requirement that the subject must know that A₁, . . . , Aₙ are all true if and only if ⋂_{i=1}^{n} Aᵢ is true. This is formally explained in section 1.5.2 and will also be proven there.

14 This idea stems from Spohn, cf. (Spohn, 2012, p. 48).


This leads to the intuitive understanding of deductive closure: all that can be deduced from a set of beliefs is already logically contained in it. Or, to express it the other way around, from a rational belief set nothing can be deduced that has to be rejected as false while keeping the original belief set. Obviously this seems to require the subject to keep an infinity of beliefs. This does not lead to substantial problems because it is a question of definition. A subject can have an infinity of dispositional beliefs, for instance, and not all beliefs must be represented in the mind of the subject. Postulate 1.11 seems to require the subject to perform unending inference processes in some cases – precisely when the underlying algebra A is infinite. But this is a spurious argument, because it is not required for rationality that a subject really produces each inferred belief by an explicit act of inference and then accepts this belief in an intentional and conscious act. Deductive closure only requires that the belief base must have a structure such that inference produces informative results that are not arbitrary. Both assumptions introduce a more serious problem: the problem of undecidability. Logical consistency as well as logical consequence is computationally undecidable (unless the underlying logic is restricted to a decidable system, say, monadic predicate logic, but this would be a rather artificial constraint). Belief relates a subject to a proposition and not to a sentence. As Spohn emphasizes in his (Spohn, 2012, p. 48f), taking undecidability as an argument against the above rationality assumptions applies computational aspects of sentences to propositions, presupposing that those rules will also hold for propositions. But this is in no way granted, because we took only deduction for granted when we decided to consider propositions as objects of belief. This is a strong consequence, because the nature of propositions completely depends on the nature of possibilities, and we do not want to restrict their interpretation to only those notions that avoid undecidability. This would make the entire theory quite trivial. One may answer that these assumptions clearly express what we called an idealization in the beginning of this chapter – and that the occurrence of such deviations from intuition is one of the consequences of normative theory construction. But there is no point in inquiring into idealized settings when one cannot exactly argue where the idealization lies. Nevertheless, besides the problem of undecidability, it is obvious that no existing human mind fits these requirements. Our mind does not function like a database system to whose engine all stored information is equally “present”, so that it can recognize contradictions or avoid constraint violations easily and immediately. Our mind does also not work like a learning algorithm processing all information in the same way, with the same correctness, and the same attention. Our beliefs are always biased by different states of attention, by having different beliefs “present” from which to draw inferences, and by emotions like wishes and fears that form attitudes towards beliefs, and those attitudes usually influence the inference process. The point of this is that the idealization does not come into play only when undecidability is introduced, but also without it. It is consensual in the discussion that consistency and deductive closure at most cover a minimal basis for rationality.
They have provoked relevant counterarguments; on the other hand, they are the only basis from which theorizing can really be developed. Many

philosophers felt that some strengthening is necessary to progress from a minimal definition of rationality to a more complete basis for further theorizing. But this strengthening is not clear and is not our subject here. However, sufficient criteria have not yet been established in the discussion; thus we will consider these two well-defined requirements as necessary conditions for rationality in the remainder of this inquiry.

1.5.2 Rational Belief Sets

Both rationality postulates can be developed into a formal definition of rational belief sets or, in short, belief sets. A belief set is then a set of propositions B ⊆ A that meets the requirements of rationality. Consistency means that a rational system must not believe a contradictory proposition. This implies disbelief in the empty set to at least some degree. To preserve consistency, a rational belief set must never contain the empty set. Deductive closure of beliefs means that each proposition inferred from positively believed propositions has to be believed itself. The reverse is also part of the requirement: if some belief can be inferred from a belief set, all its premises have to be believed as well.

Definition 1.12 (Belief Set) For any propositional algebra A, a set B ⊆ A such that for all propositions {A, B} ⊆ A:

1) ∅ ∉ B

2) if {A, B} ⊆ B, then A ∩ B ∈ B

3) if A ∈ B and A ⊂ B, then B ∈ B
is called a rational belief set (in A) or belief set for short.

Requirement 1 implements consistency. The subject believes all propositions in B; thus, she obviously also believes the conjunction of all propositions. The complement of the conjunction over B is the empty set, hence it cannot be contained in B. Requirements 2 and 3 implement deductive closure as explained informally above. Requirement 2 entails that if a proposition A is in B and a proposition B is in B, the subject has to know that A ∩ B is also in B. Requirement 3 entails that if a proposition A is in B, any superset of A is also contained in B. In sum, 2 and 3 result in:

Corollary 1.13 (Deductive Closure of Belief Sets) For any algebra of propositions A and any rational belief set B ⊆ A and propositions {A, B} ⊆ B it holds that

A ∩ B ∈ B ⇐⇒ A ∈ B ∧ B ∈ B.(1.1)

Proof: The first direction, A ∈ B ∧ B ∈ B =⇒ A ∩ B ∈ B, is already expressed by requirement 2 of definition 1.12.


It remains to prove the other direction: A ∩ B ∈ B =⇒ A ∈ B ∧ B ∈ B. Let therefore A ∩ B ∈ B. It is obvious that A ∩ B ⊆ A; hence, by requirement 3, it holds that A ∈ B. Analogously, with A ∩ B ⊆ B it holds that B ∈ B. Together this yields A ∩ B ∈ B =⇒ A ∈ B ∧ B ∈ B. □
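As a computational aside (our own illustration, not part of the formal development), the three requirements of definition 1.12 can be checked mechanically on the finite powerset algebra over a small possibility space; all Python names below are ours:

from itertools import combinations

W = frozenset({1, 2, 3})
# The complete algebra over W: all subsets, as frozensets.
A = [frozenset(c) for r in range(len(W) + 1) for c in combinations(W, r)]

def is_belief_set(B):
    """Check requirements 1-3 of definition 1.12 over the algebra A."""
    if frozenset() in B:                                   # requirement 1
        return False
    if any(X & Y not in B for X in B for Y in B):          # requirement 2
        return False
    if any(X < Y and Y not in B for X in B for Y in A):    # requirement 3
        return False
    return True

# The set of all supersets of {1} is a belief set; adding a proposition
# disjoint from {1} violates requirement 2 (the conjunction is empty).
B = {X for X in A if frozenset({1}) <= X}
assert is_belief_set(B)
assert not is_belief_set(B | {frozenset({2})})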

Note that the definition of belief sets corresponds to the mathematical concept of a “filter”. The equivalence stated in corollary 1.13 also explains the informal remarks on deductive closure in section 1.5.1 on page 38. In section 1.4.2 it was argued that the case of infinite algebras must always be considered as well. Therefore, we extend the implementation of deductive closure to the infinite case. This is performed by adding the definition of a complete belief set corresponding to complete algebras.

Definition 1.14 (Complete Belief Set) If A is a complete algebra, then a rational belief set B ⊆ A such that for any B′ ⊆ B it holds that ⋂B′ ∈ B is called a complete rational belief set or complete belief set for short.

Note that if A is finite, any belief set in A is complete. This seems to be a new case of infinity, since we postulate deductive closure of belief sets under infinitely many consequences. But remember the argument that the rejection of infinity mainly addresses sentences and need not be applicable to propositions.

1.5.3 Belief Cores

Note that ⋂B as used in definition 1.14 is a single proposition. It represents the intersection of all propositions believed in the doxastic state that is denoted by B. It is therefore called the “core” of B and is denoted by core(B).

Definition 1.15 (Core of a Belief Set) Let B be a complete belief set. Then the proposition ⋂B, denoted by core(B), is called the core of B.

Note that the core of a rational belief set is always non-empty and thus non-contradictory.

Corollary 1.16 (Consistency of Core) For any complete belief set B it holds that

core(B) ≠ ∅.   (1.2)

Proof: Definition 1.12 directly implies that for any propositions A, B ∈ B it holds that A ∩ B ∈ B (by requirement 2) and that ∅ ∉ B (by requirement 1). Hence, for any propositions A, B ∈ B it holds that A ∩ B ≠ ∅. It follows by definition 1.15 that ⋂B = core(B) ≠ ∅. □

Note also that a doxastic state represented by a complete belief set may equally be represented by its core. This is a crucial property of belief sets. A direct technical consequence of the filter properties is that for any complete algebra A, a given belief set B ⊆ A is the subset of A that contains all and only those propositions that contain core(B). From an intuitive technical view, a belief set B “filters” from A those propositions that contain core(B).


Corollary 1.17 (Significance of core(B) for B) Let A be a complete propositional algebra and B ⊆ A a complete belief set. Then it holds that B = {A ∈ A : core(B) ⊆ A}.

Proof: First, we show that A ∈ B =⇒ core(B) ⊆ A. Since core(B) is, by definition 1.15, the intersection of all propositions in B, it trivially holds for any A ∈ B that core(B) ⊆ A. It remains to be shown that the second direction holds: A ∈ A ∧ core(B) ⊆ A =⇒ A ∈ B. It trivially holds that core(B) ∈ B. By requirement 3 of definition 1.12, it follows for any A ∈ A with core(B) ⊆ A that A ∈ B. □

This is the reason why a complete belief set can also be represented by its core: for a given belief set, all propositions containing its core are elements of this belief set and – conversely – the belief set does not contain any proposition that does not contain the core. Hence, the core identifies the belief set uniquely. Note that complete belief sets can be interpreted as one part of a formal model of an epistemic state. The model is not yet complete, since belief sets do not assign firmness degrees to beliefs; this topic is postponed to chapter II.
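This correspondence can again be observed computationally (our own sketch, with illustrative names):

from functools import reduce
from itertools import combinations

W = frozenset({1, 2, 3})
A = [frozenset(c) for r in range(len(W) + 1) for c in combinations(W, r)]

def core(B):
    """The core of B: the intersection of all believed propositions."""
    return reduce(lambda X, Y: X & Y, B)

B = {X for X in A if frozenset({1, 2}) <= X}   # the filter of {1, 2}
assert core(B) == frozenset({1, 2})
# Corollary 1.17: B contains all and only the propositions containing core(B),
# so the core identifies the belief set uniquely.
assert B == {X for X in A if core(B) <= X}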

1.6 Transitions Between Epistemic States

1.6.1 Description of Epistemic Updates

Up to this point, we have only considered static aspects of belief. Although we have not yet modeled any firmness degrees for belief, we have introduced propositions as units of belief, and we have furthermore introduced complete belief sets as the basic elements for modeling epistemic states. We are now prepared to progress to the question about the dynamics of belief already previewed in the second question in section 1.2.3. What can be said about the dynamics of epistemic states represented as complete belief sets or their cores?

The dynamics of belief covers the question of how the transition from a prior rational belief state to a posterior rational belief state can be described. Which requirements are necessary for such a transition mechanism to ensure that the posterior belief state is always a rational state? This is the second question from section 1.2.3.

We consider a subject s and a time t. Let C be the core of the belief set of s at t. The evidence s receives during the time span between t and a later time t′ is denoted by the symbol E; for the moment, we will tolerate that E remains uninterpreted and its structure unexplained. Between t and t′, s will perform an epistemic update due to being confronted with E. This update will somehow modify C such that the resulting new core C′ reflects E. At time t′, the epistemic state of s will be represented by C′. The focus of the investigation lies in the nature of the epistemic update that transforms the prior core C by E into the posterior core C′.

We consider the update capability of the belief management system as the implementation of a function that accepts two arguments, a prior epistemic state and a new evidence. For any ordered pair of a prior epistemic state and a new evidence, this function returns a posterior epistemic state that is the result of the modification of the prior state in the light of the new evidence. The function can be informally understood as a strategy for changing belief states. The problem of induction can thus be reformulated as the question of an appropriate scheme for reasonable belief revision.

1.6.2 Transition by Consistent Evidence

To complete the theoretical preliminary considerations on epistemic updates, we will reproduce the important points from the argumentation of (Spohn, 2012, chapter 4, section 4.2). We can distinguish two possible cases concerning the logical relationship between the core of the prior doxastic state C and the new evidence E. For the sake of simplicity, we pretend to be only interested in the set of propositions in E and represent them as a single proposition¹⁵ E. Either E is consistent with C or it is not.

The first case is formally described as C ∩ E ≠ ∅, which means that E is consistent with C. Informally it can be understood as the case that E captures some evidence not forcing the subject to change beliefs she already had at t. Evidence E does not raise a contradiction to C. We may consider this case as a situation where the subject faces some evidence that does not surprise or puzzle her, or that she can easily accept without having to change her view of the world. For this case, we can postulate two requirements. First, it is plausible that all beliefs present in the prior core and also in E are obviously part of the posterior core. Informally this means that the posterior doxastic state preserves all beliefs from the prior state, accepts the new evidence, and draws all new inferences that are made possible by this combination.

Postulate 1.18 C ∩ E ≠ ∅ =⇒ C ∩ E ⊆ C′.

This postulate says that the posterior core will always contain the combination of the prior core with the evidence and all consequences that follow from this combination. The postulate furthermore represents the epistemological equivalent to the principle of inertia. Note that prior beliefs are kept unless the acceptance of E forces the subject to reject them. This means the rational subject will never change her beliefs without being triggered by a new evidence: each change in the belief set must be motivated. This motivation is the acceptance of E, so it is also implied that the evidence has to be accepted if it is consistent with the prior state. Therefore, the postulate sums up to the rule that all that is accepted in the prior state and does not raise a contradiction with the new evidence will also be accepted in the posterior state.

Example 1.19 Consider the old example with the throw of a die. You made a bet on the result being a “3”, which means that before the result is known, you expect (which means you believe) that the result will be a “3”. This belief induces some other beliefs, e.g. the belief that an odd number will be the result and the belief that a prime will be the result. Let these three beliefs be the core of your prior epistemic state, while the evidence E contains the actual result of the throw. Let the result of the throw be “5”. Of course, this is not what you expected, but E is consistent with C because E ∩ C ≠ ∅: the set E ∩ C contains the beliefs that the result will be an odd number and a prime, which is perfectly consistent with the information that the result is a “5”. □

¹⁵ As was already pointed out, it is insufficient for theorizing to model an evidence as a proposition, since this would mean to assign maximal possible belief to the evidence. Since a set of beliefs is nonetheless always part of an evidence, this simplified picture will be sufficient for the coarse-grained argumentation in this section.
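The example can be replayed in the same computational style (our own sketch; following the example, C and E are read as belief sets rather than as single propositions, so that C ∩ E collects the beliefs common to both states):

from itertools import combinations

die = frozenset({1, 2, 3, 4, 5, 6})
A = [frozenset(c) for r in range(len(die) + 1) for c in combinations(die, r)]

def filter_of(c):
    """The belief set generated by a core proposition c."""
    return {X for X in A if c <= X}

C = filter_of(frozenset({3}))   # betting on "3" induces "odd", "prime", ...
E = filter_of(frozenset({5}))   # the evidence: the result is a "5"

shared = C & E
assert frozenset({1, 3, 5}) in shared   # the belief in an odd result survives
assert frozenset({2, 3, 5}) in shared   # the belief in a prime result survives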

Postulate 1.20 C ∩ E ≠ ∅ =⇒ C′ ⊆ C ∩ E.

Informally this means that nothing is accepted in the posterior state that is not either accepted in the prior state, or contained in the evidence, or drawn as a consequence of their combination. Without further formal explication, it is evident that both postulates can be combined into:

Postulate 1.21 C ∩ E ≠ ∅ =⇒ C′ = C ∩ E.

It turns out that in the case of consistency of the evidence with the prior belief, the posterior belief is completely determined by the prior belief and the evidence. This case completely captures the addition of a new belief to the prior beliefs in the case of consistency. It is normally called “expansion”. The operation that is usually called “revision” is an expansion operation that is only performed if a prior consistency check succeeds. Because consistency is in our approach already part of belief sets, the propositional view cannot and need not draw a sharp distinction between expansion and revision.

1.6.3 The Inconsistent Case

The second case is the complement of the first: evidence E is inconsistent with C, which means C ∩ E = ∅. In this case, the subject is confronted with an evidence that raises a contradiction with her previous world view. Almost no solid postulates can be stated for this case. The only firm and obviously acceptable statement is the postulate that even if the evidence is inconsistent with the prior belief, the posterior belief nevertheless has to be consistent. Formally this means that the posterior core must not be contradictory.

Postulate 1.22 C ∩ E = ∅ =⇒ ∅ ≠ C′ ⊆ E.

This postulate follows directly from the consistency postulate (requirement 1 of definition 1.12, confer page 40) that is contained in the definition of belief sets. Many questions from many disciplines address this case in particular. It would be interesting to know in which cases it is rational to reject the evidence, and in which cases the subject should rather revise the prior core. And of course, in the cases where the evidence is accepted, which revision strategies are necessary to combine the evidence with the prior belief so as to obtain a consistent posterior core? Which beliefs should be given up first, and which should one try to keep? A vast discussion on this topic exists; however, there is no consensus regarding these questions, and at the time of writing there is still no clear picture pertaining to this issue.
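Schematically, the transition discussed so far can be summarized as follows (our own sketch; the function name and the exception are illustrative only):

def update_core(C: frozenset, E: frozenset) -> frozenset:
    """Transition on cores: expansion where the evidence is consistent."""
    if C & E:                    # consistent case: postulate 1.21
        return C & E
    # Inconsistent case: postulate 1.22 only demands a non-empty posterior
    # core contained in E; no particular revision strategy is fixed.
    raise NotImplementedError("revision strategy underdetermined")

assert update_core(frozenset({1, 3, 5}), frozenset({3, 4})) == frozenset({3})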


1.6.4 The Transition Function

Considering both cases for revision described above, we recognize that in each case the posterior core will be completely determined by the prior core and the evidence, as was also stated above. The essential aspect is that whatever the internal structure of the revision operation may be, its result is determined by its arguments, the prior belief and the new evidence. Hence, the disposition for the transition from the prior to the posterior state is part of the prior state. This allows us to consider the revision operation as a function with two arguments, a prior core and a new evidence. The result of this function is the posterior core.

When analyzing the transition function, it becomes immediately clear that some additional information about the propositions has to be kept within an epistemic state. This is simply because a proposition does not carry information about its certainty or reliability. But the consequence of processing new evidence will be exactly a revision of the firmness with which propositions in the current epistemic state are believed. New evidence does not only introduce new information to the belief base, it can also cause the subjective reliability of already present propositions to change. To perform an epistemic update on the belief base, it is necessary to assign a value to each proposition that measures the degree of subjective certainty or reliability of the belief in this particular proposition. A function assigning such values to each proposition in a given belief set will be a fully functional model of an epistemic state.

Having defined such a certainty measure is the prerequisite for implementing change rules for the certainty values. The transition from the prior to the posterior epistemic state is basically a change in those degrees of certainty. By providing the existing framework with a semantics that can represent degrees of certainty, we will obtain a powerful utility that can be used to model the transition function. This semantics may be provided by probabilities, to state only the most well-known example. But although probabilities are sufficiently powerful for this task, they are difficult to use in philosophical discussion, since they have no clear connection to beliefs. Ranking theory has proven to maintain all relevant computational strengths of probability theory while providing a simple and suitable notion of belief at the same time.

At this point, the consideration of the purely propositional level of the framework is finished. The next step is to show how a transition function can be founded on ranking theory. This will be the topic of the following chapter.

II

Ranking Functions and Rank-based Conditional Independence

2.1 Introduction

This chapter introduces the foundation of ranking theory as a tool for modeling the statics and dynamics of epistemic states. The material presented here forms a survey of a major part of the current state of research concerning ranking functions and stems mostly from the works of Wolfgang Spohn. The aim is to introduce a fine-grained formal model of belief revision. There are three main components the model has to describe:

1) It must contain a sufficient concept of the epistemic state. An epistemic state is also called a “belief state” and is seen as the entirety of the propositions the subject believes or disbelieves together with their corresponding belief degrees, i.e. the firmness with which they are believed or disbelieved. This is the static aspect of belief theory, which characterizes the belief state of a subject at a given time.

2) It must contain a sufficient concept of evidence, which means that the model must formally specify what is to be considered “new information” that becomes available to the subject. Evidence is the entirety of new information the subject becomes aware of during a certain time interval between a time t and a time t′.

3) It must represent a sufficient modeling of the transition of a prior epistemic state to some posterior epistemic state, triggered by the availability of some new evidence. In this context, belief revision is seen as an epistemic update operation that is performed on the prior epistemic state. Its aim is to “compute” a posterior epistemic state from the input of the prior epistemic state and the new evidence. Since the epistemic career of a subject or epistemic agent is a sequence of epistemic states and revision takes place permanently, the revision operation has to be modeled in a way that makes iterated belief change feasible.


Aside from the formal part, some brief discussion is included to keep the relation between the mathematical model and the epistemic domain lucid. The main focus will nonetheless be on the formal and technical parts, because they are used in the next chapter to construct a data structure feasible for efficient updating. Section 2.2 describes the statics of epistemic theory by the rank-based modeling of epistemic states. Ranking theory has proven to be characterized by a strong family resemblance to probability theory; to stress this similarity, most rank-theoretic concepts are named similarly to their “siblings” from probability theory. Section 2.3 sketches the basics of the dynamics of belief. The transition from a prior to a posterior state is thereby implemented by some type of conditionalization operation. Conditionalization describes how the ranking function characterizing a prior epistemic state can be transformed by an evidence parameter to obtain the ranking function characterizing the posterior epistemic state. We will discuss three types of conditionalization: Plain Conditionalization, Spohn-conditionalization, and Shenoy-conditionalization. The two main points concerning conditionalization are:

1. Regardless of which conditionalization method is actually applied, it is always the case that the posterior belief state is completely determined by the prior belief state and the actual evidence.

2. The posterior belief state is represented by the same formal concepts as the prior state, which entails that conditionalization can be applied iteratively to model iterated belief change.

Section 2.4 introduces a rank-based concept of conditional independence among propositions and forms the foundation for graphical modeling. The precise source of all theorems and definitions stemming from (Spohn, 2012) can be retraced using the list of references starting on page 221. All references to other sources are completely contained in the text.

2.2 Ranking Functions

2.2.1 Ranking Functions on Possibilities

A ranking function is intended as a representation of an epistemic state a rational subject keeps at a time t. We know from chapter I that epistemic states are characterized by belief sets. This, of course, remains true, but belief sets are in at least two respects insufficient as a complete representation of a belief state: they neither carry information about the certainty of the beliefs they contain, nor do they express a disposition for how to change the epistemic state when new evidence becomes available. These additional aspects of epistemic states can both be expressed by ranking functions.

Definition 2.1 (Ranking Function on Possibilities) Let N∞ := N ∪ {∞} be the extended set of naturals and W a set of possibilities. A function $ : W → N∞ from W into the extended set of naturals such that $⁻¹(0) ≠ ∅ is called a ranking function on possibilities for W.


The ranking function $ assigns a non-negative integer value or ∞ to each element of W where 0 is assigned to at least one w ∈ W.

Definition 2.2 (Rank of w) For any possibility w ∈ W and a ranking function $ for W, the value $(w) is called the rank of w.

The rank $(w) of w is, according to the definition of $, some non-negative integer or ∞.

2.2.2 Negative Ranking Functions

Usually we will consider epistemic issues on the abstraction level represented by propositions – not by possibilities. The reader may therefore think of ranking functions on possibilities as a merely theoretical conception that is not illustrative for belief states. The concept of ranking functions thus has to be extended to apply to propositions, which means that $ : W → N∞ has to be extended to some function κ : A → N∞ that assigns values to all propositions in a propositional algebra A.

Definition 2.3 (Complete Negative Ranking Function) Let A be a complete algebra of propositions over a set of possibilities W and let $ be a ranking function (on possibilities) for W. A function κ : A → N∞ such that for each A ∈ A it holds that

κ(A) := min{$(w) : w ∈ A} if A ≠ ∅,
κ(A) := ∞ if A = ∅

is called a complete negative ranking function on propositions for A.

Definition 2.4 (Negative Rank of A) For any proposition A ∈ A the value κ(A) is called the negative rank of A.

Note that definition 2.3 refers explicitly to complete algebras. Some of the properties introduced in the following hold exclusively for complete negative ranking functions; the details will be pointed out where not obvious. The domain of the function κ consists of all the propositions in A. Therefore κ is a set function, in contrast to the function $. Ranking functions on possibilities will therefore also be called pointwise to distinguish them explicitly from ranking functions on propositions. From now on, we will use the short term “ranking function” for ranking functions on propositions only. We will abbreviate the term “negative ranking function on propositions” by “NRF”. In the following we will express the relationship between $ and κ by saying that κ is induced by $.
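As a computational illustration (our own sketch over a three-element possibility space, with float('inf') standing in for the rank ∞; all names are ours), definition 2.3 and the derived notions of definitions 2.6 and 2.7 below can be rendered as follows:

from itertools import combinations

INF = float("inf")                       # stands in for the rank infinity
W = frozenset({"w1", "w2", "w3"})
rho = {"w1": 0, "w2": 1, "w3": 2}        # a pointwise ranking function

def kappa(A):
    """The complete NRF induced by rho (definition 2.3)."""
    return min((rho[w] for w in A), default=INF)

A_all = [frozenset(c) for r in range(len(W) + 1) for c in combinations(W, r)]

# Belief in A amounts to disbelief in its complement (definition 2.6);
# the core collects the propositions of rank 0 (definition 2.7).
Bel = {A for A in A_all if kappa(W - A) > 0}
core = {A for A in A_all if kappa(A) == 0}

assert kappa(W) == 0 and kappa(frozenset()) == INF
assert frozenset({"w1"}) in Bel          # its complement has rank 1 > 0
assert Bel <= core                       # Bel(kappa) is contained in core(kappa)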

Lemma 2.5 (Inclusion of $ in κ) Let κ be an NRF on propositions for an algebra of propositions A over a set of possibilities W. Let κ be induced by $. Then for all w ∈ W it holds that {w} ∈ A =⇒ $(w) = κ({w}).


This fact follows trivially from definition 2.3, thus the proof can be omitted. Note that definition 2.3 also holds for propositions that contain infinitely or even uncountably many elements: the minimum is always taken from a subset of the well-ordered codomain N∞, in which every subset is guaranteed to have a minimum.

An NRF represents a grading of disbelief, and negative ranks are grades of disbelief. This is the reason why Spohn calls them “negative” (cf. (Spohn, 2012, p. 70)) in his comment on the definition. The higher the negative rank κ(A) of a proposition A, the more disbelieved is A. If κ(A) = 1, proposition A is disbelieved, but to the least degree; κ(A) = 2 denotes disbelief to the second least degree, and so on. A is believed if and only if κ(Ā) > 0, which means that the belief in A is constituted by the disbelief in its complement: if Ā is disbelieved to some degree, this implies a belief in A. For expressing a belief in A, it is nonetheless not sufficient to just ensure that κ(A) = 0; this only means that A is not disbelieved. Because one could think that consistency requires a rational system to believe either A or Ā, it could seem that κ(A) = 0 =⇒ κ(Ā) > 0, but this is definitely a misunderstanding. Definition 2.3 entails κ(A) > 0 =⇒ κ(Ā) = 0 as well as κ(Ā) > 0 =⇒ κ(A) = 0, but in both cases the reverse does not hold. Informally, the complement of a proposition must be actively disbelieved to express belief in the proposition. Consistency requires that not both A and Ā are believed at the same time, which only implies that not (κ(A) > 0 ∧ κ(Ā) > 0). It is nevertheless possible that κ(A) = κ(Ā) = 0. This is no violation of consistency, since it only means that κ expresses epistemic neutrality with respect to A: A is neither believed nor disbelieved.

Consistency is always preserved by requiring that κ⁻¹(0) ≠ ∅, as is implicitly included in definition 2.3. Informally, this means it is not considered rational not to disbelieve the empty proposition; therefore the negative rank of the empty proposition expresses maximal disbelief, which means κ(∅) = ∞.

Note that it also holds that κ(W) = 0. This follows immediately from definition 2.3: since the preimage of 0 under $ is non-empty, at least one possibility w has rank $(w) = 0, and since W contains every possibility, the minimum over W is 0; this directly entails κ(W) = 0. Note further that if A ⊆ B, then κ(B) ≤ κ(A). This is obvious: since B contains at least the elements of A, it may additionally contain some element w such that $(w) < min{$(v) : v ∈ A}.

Definition 2.6 (Belief Set of κ) For an NRF κ for an algebra of propositions A the set

Bel(κ) := {A ∈ A : κ(Ā) > 0}

is called the belief set of κ.

Definition 2.7 (Core of κ) Let κ be an NRF for an algebra of propositions A over a space of possibilities W. Then core(κ) := κ⁻¹(0) is called the core of κ.


The core of an epistemic state is the set of propositions that determines everything that is believed in that state. This does not mean that the core contains only actively believed propositions. (Note that in fact Bel(κ) ⊆ core(κ).) If κ defines an epistemic state, the preimage of 0 (which is the core) contains all and only those propositions that are not disbelieved at all; those are all propositions A ∈ A with κ(A) = 0. If κ⁻¹(0) contains A but not Ā, then A is believed. Therefore the core of κ determines what is believed.

Definition 2.8 (Regularity of κ) An NRF κ for a propositional algebra A such that κ(A) < κ(∅) for every non-empty A ∈ A is called regular.

Regularity represents the epistemological assumption that the contradiction is the only actually unbelievable proposition: all non-contradictory propositions are believable to some degree. Huber introduces in (Huber, 2006) the notion of naturalness, which is similar to regularity.

Definition 2.9 (Naturalness of κ) An NRF κ for a propositional algebra A that is induced by some pointwise ranking function $ : W → N∞ is called natural.

For ranking functions on complete algebras, the notions of regularity and naturalness are equivalent. It can easily be seen that all complete NRFs as introduced by definition 2.3 are natural. For ranking functions on non-complete algebras, regularity and naturalness may not be equivalent, as Huber points out in (Huber, 2006). The reason is that ranking functions can be defined in a more general way than by definition 2.3, giving up the requirement of reducibility to some pointwise ranking function $. We will discuss this aspect later in this chapter.

Theorem 2.10 (Minimitivity of κ) Let κ be an NRF on propositions for an algebra of propositions A over a space of possibilities W. Then for all propositions A, B ∈ A it holds that

κ(A ∪ B) = min{κ(A), κ(B)}.   (2.1)

If κ is complete, it holds additionally for all sets S ⊆ A that

κ(⋃S) = min{κ(A) : A ∈ S}.   (2.2)

Proof: 1) Proof of (2.1): By definition 2.3 and by the definition of a minimum it obviously holds that

κ(A ∪ B) = min{$(w) : w ∈ A ∪ B}
         = min{ min{$(u) : u ∈ A}, min{$(v) : v ∈ B} }
         = min{κ(A), κ(B)}.

2) Proof of (2.2): Either S is countable or uncountable. If S is countable, let {A1, A2, . . . , An} ⊆ A with n ∈ N∞ be, w.l.o.g., some partition of S.


 n  o For the case n = 2 it holds by (2.1) that κ A1 ∪ A2 = min κ A1 , κ A2 . The induction step is to show that if

 n  o κ A1 ∪ A2 ∪ ... ∪ An−1 = min κ Ai : Ai ∈ A1 ∪ A2 ∪ ... ∪ An−1

holds then it also holds that

 n  o κ A1 ∪ A2 ∪ ... ∪ An = min κ Ai : Ai ∈ A1 ∪ A2 ∪ ... ∪ An .

The equivalence is straightforward.

     κ A1 ∪ A2 ∪ ... ∪ An = κ A1 ∪ A2 ∪ ... ∪ An−1 ∪ An      = min κ A1 ∪ A2 ∪ ... ∪ An−1 , κ An   n  o  = min min κ Ai : Ai ∈ A1 ∪ A2 ∪ ... ∪ An−1 , κ An n  o = min κ Ai : Ai ∈ A1 ∪ A2 ∪ ... ∪ An .

If S is uncountable the equivalence relation A ∼ B :⇔ κA = κB induces a partition ΠS of equivalence classes hii on S. The partition ΠS is always a well-ordered set since each subset hii = A ∈ S : κA = i of it is uniquely identified with some i ∈ N∞. S Each equivalence class hii can be considered as a proposition Ai = hii. It then clearly    holds that i = κ Ai = min $ w : w ∈ Ai . (This clearly also holds for h∞i.) S The set S can be expressed as the countable union A1 ∪ A2 ∪ ... ∪ An of all Ai = hii of ∞  all hii ∈ ΠS for i = 1, 2, . . . , n with n = ΠS ∈ N and κ Ai = i for all Ai. Therefore the case of an uncountable S is thus reduced to the case of a countable S. 

The notion of “minimitivity” was proposed in (Huber, 2006). Spohn calls the theorem the “law of disjunction”¹⁶ for NRFs in (Spohn, 2012, p. 72, Theorem 5.8b). The term “minimitivity” stresses the analogy to the corresponding notion of “additivity” in probability theory. A distinction has to be made between finite minimitivity, σ-minimitivity, and complete minimitivity. Which level of minimitivity the function κ fulfills depends on the structure of the underlying algebra A. NRFs for complete algebras are of course completely minimitive, and NRFs for σ-algebras are σ-minimitive. This implies that all NRFs for countable algebras are completely minimitive. The symbol “⋃S” in theorem 2.10 can therefore denote countably infinite as well as uncountable unions.

2.2.3 Minimitivity, Completeness, and Naturalness

Completeness and minimitivity are closely interrelated, which is important to note.

¹⁶ The proof of theorem 2.10 is originally by Wolfgang Spohn and stems from an early version of the manuscript of “The Laws of Belief”.


For NRFs it follows directly from definition 2.3 that each value κ(A) can be represented by the minimal value $(w) from the set {$(w) : w ∈ A}. Hence the set function κ is reducible to the pointwise function $. This fact is obvious from the inclusion of $ in κ (lemma 2.5 on page 49). For epistemological analysis, ranking functions on propositions are the natural choice, because propositions are the objects of belief. Pointwise ranking functions have no obvious corresponding aspect in epistemological analysis. Beliefs can also be established without a clear and complete understanding of all underlying possibilities. This is the case in many situations where some expert knowledge is involved in beliefs.

For example, I have the belief that nobody in the history of music has ever written a sonata that consists of more than seven movements. Since my expert knowledge about the development of the sonata as an artistic form in the history of music is quite limited, I have no clear, distinct, and complete awareness of all possibilities that are represented by my belief, and therefore I am not completely sure about what kind of possible world could be considered a counterexample. Nevertheless, I take the belief to be true. The example illustrates that we should be able to use NRFs for the modeling of epistemic states also in cases where no underlying pointwise ranking function can be established. This requires us first to modify the definition of κ such that it no longer contains any reference to $.

Definition 2.11 (Negative Ranking Function II) Let A be an algebra of propositions over a possibility set W. A function κ : A → N∞ such that

1) κ(∅) = ∞,

2) κ(W) = 0, and

3) for each pair of propositions A, B ∈ A it holds that κ(A ∪ B) = min{κ(A), κ(B)}

is called a negative ranking function on propositions for A. If A is a σ-algebra/complete algebra and it holds additionally for every countable set/every set of propositions S ⊆ A that

4) κ(⋃S) = min{κ(A) : A ∈ S},

the function κ is called a σ-/complete negative ranking function on propositions for A.

As usual, κ(∅) = ∞ is required. As a further requirement, κ(W) = 0 becomes part of the definition, where it was implicitly entailed by definition 2.3 on page 49; this implies the property κ⁻¹(0) ≠ ∅, which had to be made an explicit part of definition 2.3. Finite as well as complete minimitivity is now also part of the definition. This definition stipulates the exact same properties for NRFs as definition 2.3 does, with the only exception that the new definition also allows NRFs over A that are not induced by some pointwise ranking function. Those NRFs are hence not reducible to some pointwise function on possibilities, which is equivalent to stating that they are not natural in the sense of definition 2.9. Hence, definition 2.3 is a definition for natural NRFs only – while definition 2.11 defines NRFs in general. It is important to note that the property of reducibility/naturalness is guaranteed only for complete ranking functions. This is not trivial to see, but there are indeed counterexamples.

Example 2.12 (Non-natural ranking function) Consider the set of real numbers R as the space W. Let A be an algebra over W such that

A := {A ⊆ W : A is countable ∨ Ā is countable}.

Note that A is a σ-algebra, since it contains each countable subset of W, but A is clearly not complete. As a counterexample consider the interval [0 ; 1] ⊂ R, which is an uncountable set with an uncountable complement in R. For each number w ∈ [0 ; 1], the set {w} forms a countable non-empty subset of R, and therefore {w} ∈ A. But the union of those {w} (which is just the set [0 ; 1]) is obviously not in A, since it is neither countable nor has a countable complement. Hence A is not complete. We now define a ranking function κ on A such that:

κ(A) := ∞ if A = ∅,
κ(A) := 1 if A ≠ ∅ and A is countable,
κ(A) := 0 if A ≠ ∅ and A is uncountable.

Note that κ is a σ-minimitive ranking function for A according to definition 2.11, since κ(∅) = ∞, κ(R) = 0, and for every countable subset S ⊆ A it holds that κ(⋃S) = min{κ(A) : A ∈ S}. This is seen as follows:

1) If ⋃S is empty, then so is every A ∈ S; therefore κ(⋃S) = min{κ(A) : A ∈ S} = ∞.

2) If ⋃S is non-empty and countable, then every A ∈ S is countable and at least one A ∈ S is non-empty; therefore κ(⋃S) = 1 = min{κ(A) : A ∈ S}.

3) If ⋃S is non-empty and uncountable, then at least one A ∈ S must be uncountable, too. Hence it holds by the definition of A that Ā is countable, and it directly follows that κ(⋃S) = 0 = min{κ(A) : A ∈ S}.

Now note that κ is not reducible to some pointwise ranking function $.

To reduce κ to any pointwise ranking function $, it is required for each non-empty A that κ(A) = min{$(w) : w ∈ A}. Consider the case A := R. For every w ∈ R it holds that κ({w}) = 1 and hence $(w) = 1. But it then clearly also holds that κ(R) = 0, and hence κ(R) < min{$(w) : w ∈ R}, which obviously breaks complete minimitivity. □

Whenever the infinite or uncountable unions of members of A are themselves guaranteed to be members of A, minimitivity must be required for them, because otherwise it would be allowed to accept the conjunction of some propositions while rejecting each of them, which results in an epistemic paradox¹⁷. This is formally expressed in κ(W) < min{κ({w}) : w ∈ W}. This is the punchline of the illustrative example above; among some other examples on reducibility, it can be found in (Huber, 2006).

Consider the case where S := {W} = {R}. The set S is finite and countable, since it contains only one element. Hence it is required to fulfill the minimitivity requirement:

κ(⋃S) = min{κ(A) : A ∈ S} = κ(R) = 0.

Nonetheless it is the case that

κ(R) = 0 ≠ min{κ({w}) : w ∈ R} = 1,

and complete minimitivity is violated. As an alternative situation, consider the case where A := 2^R. In this case A is complete. Since κ violates the property of complete minimitivity, it is simply not a ranking function for this A. A complete discussion of reducibility/naturalness can be found in (Huber, 2006), where it is examined under which conditions a ranking function on propositions on some propositional algebra A is reducible to some pointwise ranking function on the underlying set of possibilities W.

Theorem 2.13 (Acceptance in κ) Let κ be an NRF for an algebra of propositions A over a space of possibilities W. Then for all propositions A ∈ A it holds that

min{κ(A), κ(Ā)} = 0.   (2.3)

Proof: Equation (2.3) follows from both definitions of κ, 2.11 and 2.3. It follows from definition 2.3, since at least one w ∈ W must have rank 0, and since for each A ∈ A the pair {A, Ā} forms a partition of W, either w ∈ A or w ∈ Ā; in either case, w induces rank 0 for one of the propositions A and Ā. Equation (2.3) also follows directly from definition 2.11: since κ(W) = 0 and W = A ∪ Ā for each A ∈ A, it follows from minimitivity that at least one of A and Ā has rank 0. □

Theorem 2.13¹⁸ makes an important statement about consistency concerning NRFs: a rational doxastic state is required not to attribute disbelief to both A and Ā. Disbelieving both propositions cannot be considered rational; it is therefore required to assign disbelief to at most one of the propositions A and Ā. Note that this can also be expressed as κ(A) = 0 ∨ κ(Ā) = 0 (which obviously includes κ(A) = κ(Ā) = 0). On the other hand, it is compatible with rationality to be in a neutral state with respect to A and Ā: it is not required to actively believe (or disbelieve) one of the two propositions.

¹⁷ Precisely, an instance of the lottery paradox.
¹⁸ Spohn calls it the “law of negation” for negative ranks.


2.2.4 Two-sided Ranking Functions

Let Z∗ := Z ∪ {∞, −∞} be the extended set of integers. A two-sided ranking function τ : A → Z∗ is a function that represents a grading of belief as well as of disbelief.

Definition 2.14 (Two-sided Ranking Function) Let A be an algebra of propositions over a possibility set W and let κ be an NRF for A. Then the function τ : A → Z∗ such that for each A ∈ A it holds that

τ(A) := ∞ if A = W,
τ(A) := −∞ if A = ∅,
τ(A) := κ(Ā) − κ(A) otherwise

is called a two-sided ranking function for A.

Spohn discusses two-sided ranking functions in (Spohn, 2012, p. 76f). He stresses the fact that they are also ranking functions. We will call two-sided ranking functions “TRFs” for short. The function τ assigns positive integers to propositions that are believed and negative integers to propositions that are disbelieved. Neutrality with respect to a proposition A is expressed by τ(A) = 0. The value τ(A) is called the two-sided rank of A. If a TRF τ and an NRF κ are interrelated as in definition 2.14, we will say that κ induces τ. Note that while a TRF can be defined without being reducible to some pointwise function, it cannot be axiomatized in a simple way without recurrence to an NRF. In other words, a TRF is always induced by some NRF; this shows that TRFs are a derived notion, at least in some respect.

Two-sided ranking functions differ from other ranking functions in some aspects. For example, other than NRFs, TRFs do not represent a well-ordering of beliefs. Remember the case for NRFs: the equivalence relation A ∼ B :⇔ κ(A) = κ(B) induces a partition Π(A) of equivalence classes ⟨i⟩ on A. Each equivalence class can be represented by a number i ∈ N∞ such that κ(A) = i for all propositions A ∈ ⟨i⟩. Since N∞ is a well-ordered set, there will always be some equivalence class represented by a minimal i. Thus, NRFs represent a well-ordering of beliefs. The case for TRFs is different, since Z∗ is not a well-ordered set. Therefore, TRFs do not induce a well-ordering of degrees of belief. As a consequence, TRFs are not minimitive in a non-trivial way.
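Definition 2.14 can be spelled out in the same computational style (our own sketch, in the toy setting used above; the symmetry proved as corollary 2.16 below is visible directly):

INF = float("inf")
W = frozenset({"w1", "w2", "w3"})
rho = {"w1": 0, "w2": 1, "w3": 2}

def kappa(A):
    return min((rho[w] for w in A), default=INF)

def tau(A):
    """Two-sided rank (definition 2.14)."""
    if A == W:
        return INF
    if not A:
        return -INF
    return kappa(W - A) - kappa(A)

assert tau(frozenset({"w1"})) == 1           # believed to degree 1
assert tau(frozenset({"w2", "w3"})) == -1    # disbelieved to degree 1
assert tau(frozenset({"w1"})) == -tau(frozenset({"w2", "w3"}))  # corollary 2.16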

Definition 2.15 (Core of τ) Let τ be a TRF for an algebra of propositions A over a space of possibilities W. Then the set core(τ) := ⋃{τ⁻¹(i) : i > 0 ∧ i ∈ Z∗} is called the core of τ.

Corollary 2.16 (Symmetry of τ) Let A be a propositional algebra and τ a TRF for A. Then it holds for all propositions A ∈ A that

τ(Ā) = −τ(A).   (2.4)


Proof: For each proposition A ∈ A it either holds that A = ∅ or A = W or ∅ ≠ A ≠ W. If A = ∅, it holds by definition 2.14 that τ(A) = −∞ = −τ(W) = −τ(Ā). If A = W, it holds by definition 2.14 that τ(A) = ∞ = −τ(∅) = −τ(Ā). If ∅ ≠ A ≠ W, it holds that

τ(A) = κ(Ā) − κ(A).

Due to the acceptance theorem it is either κ(A) = 0 or κ(Ā) = 0. If κ(A) = 0, then τ(A) = κ(Ā) and, as a consequence, τ(Ā) = κ(A) − κ(Ā) = −κ(Ā) = −τ(A), and (2.4) is shown. If κ(Ā) = 0, then τ(A) = −κ(A) and thus τ(Ā) = κ(A) = −τ(A), which completes the proof. □

It is obvious that if κ(A) = κ(B) and κ(Ā) = κ(B̄), it also holds that τ(A) = τ(B).

Corollary 2.17 (Asymmetry of κ and τ) Let τ be a TRF induced by an NRF κ. Then it always holds for all propositions A, B ∈ A that

κ(A) < κ(B) =⇒ τ(A) > τ(B).   (2.5)

Proof: Since the premise of (2.5) implies κ(B) ≠ 0, it follows by the acceptance theorem that κ(B̄) = 0 and therefore τ(B) = −κ(B). It either holds that κ(A) = 0 or κ(A) > 0. If κ(A) = 0, it holds that κ(Ā) ≥ 0 and hence τ(A) = κ(Ā) − κ(A) ≥ 0; therefore τ(A) > τ(B). If κ(A) > 0, it holds that κ(Ā) = 0 and τ(A) = −κ(A), and thus τ(A) > τ(B). □

It is not necessary to introduce TRFs in a fully detailed manner. In some cases they are practical for representing relationships between terms in a more compact and intuitive way than would be possible with NRFs; an example will be the concept of conditional independence among propositions introduced in section 2.4 on page 75. Since we will not utilize TRFs for any other purpose, the material about them introduced so far will be sufficient.

2.2.5 Conditional Negative Ranks

In analogy to probability-based belief representation, we wish to express what it means to believe some proposition A conditional on some proposition B. A conception of conditional belief is the basis for modeling the transition from one epistemic state to another, since it is the only indication of an actual posterior rank value.

Definition 2.18 (Conditional Negative Rank of w) Let A be a complete algebra of propositions over a possibility set W. Let $ be a ranking function on possibilities for W. Let A ∈ A be a proposition with min{$(v) : v ∈ A} < ∞. Then, for each w ∈ W, the value

$(w | A) := $(w) − min{$(v) : v ∈ A} if w ∈ A,
$(w | A) := ∞ otherwise

is called the conditional negative rank of w given A.

In the case w ∈ A, the value min{$(v) : v ∈ A} is subtracted for normalization: it has to be ensured that the minimal value of $(· | A) is always 0.


Definition 2.19 (Conditional Negative Rank of B) For any propositional algebra A, an NRF κ for A, and propositions A, B ∈ A, the value

κ(B | A) := ∞ if B = ∅,
κ(B | A) := 0 if B ≠ ∅ and κ(A) = ∞,
κ(B | A) := κ(A ∩ B) − κ(A) else

is called the conditional negative rank of B given A.

Note that we could also define κ(B | A) := min{$(w | A) : w ∈ B}, which is equivalent to the term κ(A ∩ B) − κ(A) in case A is a complete algebra. However, this approach may leave the term κ(B | A) undefined for some propositions in case κ is non-natural and therefore not induced by some pointwise ranking function $; it is therefore not adequate for the case of A not being complete. There is some variation to be found in the literature concerning the definition of conditional ranks. Table 2.1 presents an overview of the differences between some of these definitions.

Term        Goldszmidt & Pearl   (Huber, 2006)    (Huber, 2009)      (Spohn, 2012)
κ(∅ | ∅)    ∞                    ∞                ∞                  undef.
κ(A | ∅)    ∞                    0                ∞ − ∞ = undef.     undef.
κ(B | ∅)    ∞                    0                ∞ − ∞ = undef.     undef.
κ(∅ | A)    ∞                    ∞                ∞                  undef.
κ(A | A)    ∞                    0                ∞ − ∞ = undef.     undef.
κ(B | A)    ∞                    0                ∞ − ∞ = undef.     undef.
κ(∅ | B)    ∞ − κ(B) = ∞         ∞                ∞                  ∞ − κ(B) = ∞
κ(A | B)    ∞ − κ(B) = ∞         ∞ − κ(B) = ∞     ∞ − κ(B) = ∞       ∞ − κ(B) = ∞
κ(B | B)    0                    0                0                  0

Table 2.1: Different definitions of conditional ranks. Let A ≠ B, A ≠ ∅ ≠ B, κ(A) = ∞, κ(B) < ∞.

Note that (Goldszmidt & Pearl, 1996, p. 63) defines κ(B | A) := ∞ for all A with κ(A) = ∞. This means that conditional on any impossible condition, every proposition is disbelieved maximally firmly in κ. The converse case, where some maximally unbelievable proposition is conditioned on some possible condition, is left undefined. Definition 2.19 says in contrast that we cannot say anything about belief on conditions that are impossible. In (Huber, 2006) a different definition is introduced that assigns κ(B | A) := 0 for all A, B with B ≠ ∅ and κ(A) = ∞. Furthermore, it is stipulated that κ(∅ | A) := ∞ for all A ∈ A, which is needed to ensure that κ(· | ·) is a ranking function in turn. This introduces a distinction within the set of impossible propositions: there is the contradictory proposition ∅ and – in case κ is not regular – “virtually impossible” propositions that are non-empty but have a rank value of ∞. While the contradiction is always maximally unbelievable regardless of its condition, the virtually impossible propositions are maximally believed under impossible conditions, following ex falso quodlibet. The definition in (Spohn, 2012), by contrast, does not draw this distinction and leaves both cases undefined.


Huber additionally introduces a slightly different definition in (Huber, 2009), which preserves this distinction but leaves belief on impossible conditions undefined instead of maximal. In (Spohn, 2012, p. 78, Definition 5.15) conditional ranks are defined by the term κ(A ∩ B) − κ(A) without any distinction of cases. A consequence is that κ(B | A) is only defined for A with κ(A) < ∞; the term is undefined on unbelievable conditions. This establishes a coincidence with probability theory, where conditioning on impossible conditions is likewise undefined. All definitions apply the equation κ(B | A) = κ(A ∩ B) − κ(A) to every A, B ∈ A with A ≠ ∅ ≠ B and κ(A), κ(B) < ∞. Definition 2.19 is equivalent to the definition Huber uses in (Huber, 2006). The reason for using this definition is that it is the only version that leads to a sufficiently general version of the chain rule for ranks (cf. corollary 2.21) while preserving regularity of κ in κ(· | ·).

It is of special importance to note that if κ is a ranking function, then κ(· | ·) is a ranking function in turn. (This statement will be proven as corollary 2.39 in section 2.3, cf. page 68.) Comparing definitions 2.19 and 2.3 shows that κ(∅ | ·) and κ(W | ·) satisfy the requirements of definition 2.3. It is furthermore obvious that κ(· | ·) satisfies minimitivity. Also a version of theorem 2.13 applying to κ(· | ·) can easily be proven. The fact that κ(· | ·) is an NRF if κ is an NRF applies to all the definitions of κ(· | ·) discussed here. This fact is highly relevant for updating, since it is important to represent the posterior belief state by the same formal devices as the prior belief state; otherwise, iterated updating would be difficult. We will discuss this aspect in more detail in section 2.3.

Note especially that for the two definitions given by Huber and for definition 2.19 (whose main case stems from (Spohn, 2012, p. 78)), it holds that if κ is regular, then κ(· | ·) is regular as well; this can be seen without proof. The definition of (Goldszmidt & Pearl, 1996) is insofar deviant as κ(· | ·) is never regular, regardless of whether κ is regular. This property may turn out to be quite impractical for several applications. Having defined conditional ranks, some computation rules can be deduced that directly correspond to the laws of probability theory.
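Before turning to these rules, definition 2.19 itself can be illustrated computationally (our own sketch; in the finite, regular toy setting below, the contested infinite cases of table 2.1 do not arise for non-empty conditions):

INF = float("inf")
W = frozenset({"w1", "w2", "w3"})
rho = {"w1": 0, "w2": 1, "w3": 2}

def kappa(A):
    return min((rho[w] for w in A), default=INF)

def kappa_given(B, A):
    """Conditional negative rank of B given A (definition 2.19)."""
    if not B:
        return INF
    if kappa(A) == INF:
        return 0
    return kappa(A & B) - kappa(A)

odd = frozenset({"w1", "w3"})
# Conditioning normalizes: the best world within the condition gets rank 0.
assert kappa_given(odd, odd) == 0
assert kappa_given(frozenset({"w3"}), odd) == 2   # kappa({w3}) - kappa(odd)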

Corollary 2.20 (Law of Conjunction for Negative Ranks) Let A, B ∈ A and κ(A) < ∞. It then holds that: κ(A ∩ B) = κ(B | A) + κ(A).

Proof: Rewriting definition 2.19 yields:

κ(B | A) = κ(A ∩ B) − κ(A) ⇐⇒ κ(A ∩ B) = κ(B | A) + κ(A). □

The law of conjunction can be generalized to the rank-based version of the chain rule. Note that the applicability of the chain rule depends strongly on the underlying definition of conditional ranks. In chapters III and IV we will see that the chain rule is of crucial importance for ensuring relevant properties of the rank-based data structures on which we will compute epistemic updates. It is of vital importance that the chain rule is reliable; it is a notable fact that the capabilities of reasoning mechanisms are heavily influenced by the conception of conditional ranks that is actually used.

Corollary 2.21 (Chain Rule for Negative Ranks) Let {A1, A2, . . . , An} ⊆ A be a set of propositions such that A1 ∩ A2 ∩ . . . ∩ An ≠ ∅ and κ(A1 ∩ A2 ∩ . . . ∩ An) < ∞. Let κ be an NRF for A. It then holds that:

κ(A1 ∩ A2 ∩ . . . ∩ An) = κ(A1) + ∑_{i=2}^{n} κ(Ai | A1 ∩ A2 ∩ . . . ∩ Ai−1).   (2.6)

Proof: The proof of corollary 2.20 shows that (2.6) is true for the case n = 2. It remains to be shown for n ∈ N with n ≥ 2 that if

κ(A1 ∩ A2 ∩ . . . ∩ An) = κ(A1) + ∑_{i=2}^{n} κ(Ai | A1 ∩ A2 ∩ . . . ∩ Ai−1)

holds, then it also holds that

κ(A1 ∩ A2 ∩ . . . ∩ An+1) = κ(A1) + ∑_{i=2}^{n+1} κ(Ai | A1 ∩ A2 ∩ . . . ∩ Ai−1).

This is quite straightforward, since

κ(A1 ∩ A2 ∩ . . . ∩ An+1) = κ((A1 ∩ A2 ∩ . . . ∩ An) ∩ An+1)
  = κ(An+1 | A1 ∩ A2 ∩ . . . ∩ An) + κ(A1 ∩ A2 ∩ . . . ∩ An)
  = κ(An+1 | A1 ∩ A2 ∩ . . . ∩ An) + κ(A1) + ∑_{i=2}^{n} κ(Ai | A1 ∩ A2 ∩ . . . ∩ Ai−1)
  = κ(A1) + ∑_{i=2}^{n+1} κ(Ai | A1 ∩ A2 ∩ . . . ∩ Ai−1). □

Note that instead of stipulating a finite rank for the intersection, κ(A1 ∩ A2 ∩ . . . ∩ An) < ∞, regularity of κ could alternatively be stipulated in the definition.
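A small numerical check of the chain rule in the same toy style (our own sketch; the propositions A1, A2, A3 are arbitrary illustrative choices):

INF = float("inf")
W = frozenset(range(1, 7))
rho = {1: 0, 2: 1, 3: 1, 4: 2, 5: 3, 6: 3}

def kappa(A):
    return min((rho[w] for w in A), default=INF)

def kappa_given(B, A):
    return INF if not B else (0 if kappa(A) == INF else kappa(A & B) - kappa(A))

A1, A2, A3 = frozenset({2, 3, 4, 5}), frozenset({3, 4, 5, 6}), frozenset({4, 5})

# Corollary 2.21 for n = 3: the rank of the intersection decomposes into
# the rank of A1 plus a sum of conditional ranks.
assert kappa(A1 & A2 & A3) == (kappa(A1)
                               + kappa_given(A2, A1)
                               + kappa_given(A3, A1 & A2)) == 2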

Theorem 2.22 (Formula of the Total Negative Rank) Let {A1, A2, . . . , An} ⊆ A be a partition of W. It then holds for any B ∈ A that:

κ(B) = min{κ(B | Ai) + κ(Ai) : 1 ≤ i ≤ n}.

Proof: For any B ∈ A it clearly holds that W ∩ B = B. Furthermore,

W = ⋃{Ai : 1 ≤ i ≤ n}

and therefore

B = ⋃{Ai : 1 ≤ i ≤ n} ∩ B = ⋃{Ai ∩ B : 1 ≤ i ≤ n}.

With minimitivity it follows that

κ(B) = κ(⋃{Ai ∩ B : 1 ≤ i ≤ n}) = min{κ(Ai ∩ B) : 1 ≤ i ≤ n}.

With corollary 2.20 it holds that κ(Ai ∩ B) = κ(B | Ai) + κ(Ai), and therefore

κ(B) = min{κ(B | Ai) + κ(Ai) : 1 ≤ i ≤ n}. □

Theorem 2.23 (Bayes’ Theorem for Negative Ranks) Let A be an algebra of propositions over a possibility space W, let κ be an NRF for A, and let {A1, A2, . . . , An} ⊆ A be a partition of W. It then holds for each i with 1 ≤ i ≤ n that:

κ(Ai | B) = κ(B | Ai) + κ(Ai) − κ(B).

Proof: Applying corollary 2.20 to definition 2.19 yields for each i:

κ(Ai | B) = κ(Ai ∩ B) − κ(B) = κ(B | Ai) + κ(Ai) − κ(B). □

Theorems 2.22 and 2.23 underline the analogy between ranking theory and probability theory. The analogy leads to a rule of thumb by which the ranking theorems can be generated from the corresponding probability theorems on the syntactical level:

Probability theory      Ranking theory
Addition (+)            Minimum (min)
Multiplication (∗)      Addition (+)
Division (÷)            Subtraction (−)

The details of the analogy between ranking theory and probability theory are intensively discussed in (Spohn, 2012). Spohn also argues in chapter 5 of (Spohn, 2012, p. 79, Definition 5.19) that the notion of a conditional rank can be extended to a definition of conditional NRFs, transferring the idea of Popper measures from probability theory to ranking theory.


Definition 2.24 (Conditional Negative Ranking Function) Let A be an algebra over a space of possibilities W and S a non-empty subset of A. Then a function κ• : A × S → N∞ such that for all A, B, C ∈ A with B, A ∩ B ∈ S it holds that

1) κ•(Ā | A) = ∞

2) if κ•(A | B) < ∞, then A ∈ S

3) κ•(A | B) = 0 or κ•(Ā | B) = 0

4) κ•(A ∩ B | C) = κ•(A | C) + κ•(B | C ∩ A)

is called a conditional negative ranking function.

Note that we may then define κ•(A) := κ•(A | W) as the unconditional rank of A. Like κ, the function κ• will then be an NRF.

Definition 2.25 (Complete Conditional Negative Ranking Function) Let κ• be a conditional negative ranking function for some complete propositional algebra A and let {A1, A2, . . . , An} ⊆ A with n ∈ N∞ be a partition of W. If for each proposition B ∈ A it holds that

⋁_{i=1}^{n} ( κ•(B | Ai) = 0 ),

i.e. κ•(B | Ai) = 0 for at least one cell Ai of the partition, then κ• is called a complete conditional negative ranking function.

This is just a strengthening of requirement 3 in definition 2.24. Conditional NRFs can be seen as a generalization of NRFs; the relationship is similar to the way Popper measures generalize probability measures. Spohn points out in his comment on definition 5.19 in (Spohn, 2012, p. 79) that this generalization “vanishes in the conditional counterpart of ordinal ranking functions”. Since we will stick with N∞ as the codomain for ranking functions, this restriction is not crucial in the context of this inquiry. It is nevertheless an open question whether this generalization is of great use.

2.2.6 Conditional Two-sided Ranks

As was done for NRFs, conditional ranks can also be defined for TRFs.

Definition 2.26 (Conditional Two-Sided Rank) Let κ be an NRF for A and τ the TRF induced by κ. Then for any pair of propositions A, B ∈ A with τ(A) > −∞,

τ(B | A) := κ(B̄ | A) − κ(B | A)

is the conditional two-sided rank of B given A.

Note that the asymmetry κ(A | C) < κ(B | C) =⇒ τ(A | C) > τ(B | C) always holds, since both κ(· | ·) and τ(· | ·) are ranking functions and are related by the same asymmetry as κ and τ.


Theorem 2.27 (Law of Disjunctive Conditions) Let κ be an NRF for A and τ the TRF induced by κ. Then for every triple of propositions A, B, C ∈ A it holds that:

τ(C | A) ≤ τ(C | B) =⇒ τ(C | A) ≤ τ(C | A ∪ B) ≤ τ(C | B).   (2.7)

Proof: The term τ(C | A) can be rewritten as follows:

τ(C | A) = κ(C̄ | A) − κ(C | A)
         = (κ(A ∩ C̄) − κ(A)) − (κ(A ∩ C) − κ(A))
         = κ(A ∩ C̄) − κ(A ∩ C).

It follows that in (2.7) the premise

τ(C | A) ≤ τ(C | B)   (2.8)

can be rewritten as

κ(A ∩ C̄) − κ(A ∩ C) ≤ κ(B ∩ C̄) − κ(B ∩ C).   (2.9)

Furthermore, on the right side of (2.7) the term

τ(C | A ∪ B)   (2.10)

can be rewritten as follows:

τ(C | A ∪ B) = κ(C̄ | A ∪ B) − κ(C | A ∪ B)
             = κ((A ∪ B) ∩ C̄) − κ((A ∪ B) ∩ C)
             = min{κ(A ∩ C̄), κ(B ∩ C̄)} − min{κ(A ∩ C), κ(B ∩ C)}   (2.11)

Now suppose that the premise (2.8) holds. Term (2.11) generates four possible cases of how the right side of inequality (2.7) can actually resolve. The proof is completed by showing that (2.7) holds in each of these cases.

1) min{κ(A ∩ C̄), κ(B ∩ C̄)} = κ(A ∩ C̄) and min{κ(A ∩ C), κ(B ∩ C)} = κ(A ∩ C):

Term (2.10) resolves to κ(A ∩ C̄) − κ(A ∩ C), which is just τ(C | A). It follows trivially that (2.7) holds.

2) min{κ(A ∩ C̄), κ(B ∩ C̄)} = κ(B ∩ C̄) and min{κ(A ∩ C), κ(B ∩ C)} = κ(B ∩ C):

Term (2.10) resolves to κ(B ∩ C̄) − κ(B ∩ C), which is just τ(C | B). It follows trivially that (2.7) holds.

3) min{κ(A ∩ C̄), κ(B ∩ C̄)} = κ(A ∩ C̄) and min{κ(A ∩ C), κ(B ∩ C)} = κ(B ∩ C) and¹⁹ κ(B ∩ C) ≠ κ(A ∩ C):

Term (2.10) resolves to κ(A ∩ C̄) − κ(B ∩ C). The assumptions in this case directly imply that κ(B ∩ C) < κ(A ∩ C), and it therefore holds that

κ(A ∩ C̄) − κ(A ∩ C) ≤ κ(A ∩ C̄) − κ(B ∩ C).   (2.12)

It also holds that κ(A ∩ C̄) ≤ κ(B ∩ C̄) and therefore

κ(A ∩ C̄) − κ(B ∩ C) ≤ κ(B ∩ C̄) − κ(B ∩ C).   (2.13)

From the combination of (2.12) and (2.13) it follows that

κ(A ∩ C̄) − κ(A ∩ C) ≤ κ(A ∩ C̄) − κ(B ∩ C) ≤ κ(B ∩ C̄) − κ(B ∩ C),

which is just a rewriting of the right side of (2.7).

4) min{κ(A ∩ C̄), κ(B ∩ C̄)} = κ(B ∩ C̄) and min{κ(A ∩ C), κ(B ∩ C)} = κ(A ∩ C) and κ(B ∩ C̄) ≠ κ(A ∩ C̄):

The assumptions imply that κ(A ∩ C̄) > κ(B ∩ C̄) and κ(A ∩ C) ≤ κ(B ∩ C), which is a contradiction to (2.9). Since the premise does not hold, the case is impossible. □

¹⁹ Case 3 with κ(B ∩ C) = κ(A ∩ C) is already covered by case 1.

Spohn remarks on the law of disjunctive conditions

“(. . . ) that the rank of some proposition conditional on some disjunction is always between the ranks conditional on the disjuncts. (. . . ) Note, however, that (. . . ) it is not required that the disjuncts are disjoint.” (Spohn, 2012, p. 81)

Theorem 2.28 Let κ be an NRF for A and let τ be the TRF induced by κ. Let A, B, C ∈ A. Then it holds that:

If τ(C | A) < τ(C | B), then

τ(C | A ∪ B) = τ(C | A) ⇐⇒ κ(A ∩ C) ≤ κ(B ∩ C)   (2.14)

and

τ(C | A ∪ B) = τ(C | B) ⇐⇒ κ(A ∩ C̄) ≥ κ(B ∩ C̄).   (2.15)

Proof: Since the premise τ(C | A) < τ(C | B) is stronger than (2.8), the former is included in the latter. Thus, for the proof we have to consider the same four cases as in the proof of theorem 2.27. In case 4 the premise does not hold, by the same argumentation as in the proof of theorem 2.27; this renders the hypothesis true in this case. It remains to consider cases 1, 2, and 3.

First, we show that the premise implies (2.14). For case 1 it is true that τ(C | A ∪ B) = τ(C | A) ∧ κ(A ∩ C) ≤ κ(B ∩ C), while in each of the cases 2 and 3 it holds that τ(C | A ∪ B) ≠ τ(C | A) ∧ κ(A ∩ C) > κ(B ∩ C). Equivalence (2.14) is therefore true in all cases.

Second, we show that the premise implies (2.15). Case 2 is the only case where it holds that τ(C | A ∪ B) = τ(C | B) ∧ κ(A ∩ C̄) ≥ κ(B ∩ C̄). For cases 1 and 3 it holds that τ(C | A ∪ B) ≠ τ(C | B) ∧ κ(A ∩ C̄) < κ(B ∩ C̄). Equivalence (2.15) is therefore true in all cases. □

The proofs of theorems 2.27 and 2.28 use Spohn’s proof sketches for his theorems 5.23a and 5.23b in (Spohn, 2012, p. 81) as a blueprint, but unfold the argumentation in more explicit detail. This will be enough inquiry into conditional TRFs: since they do not play a central role for the version of the update algorithm presented here, we will leave the analysis at that.

2.2.7 A Digression on Positive Ranking Functions

Since NRFs represent gradings of disbelief, they only indirectly carry information about the firmness of the beliefs that are actually held in the current epistemic state: the grade of disbelief κ(Ā) of the complement of A can be interpreted as a grade of belief in A. This seems somewhat unintuitive and may not be convenient in all situations. It seems promising to achieve a more “intuitive” modeling of doxastic states by expressing the firmness of beliefs through the presence of belief rather than through the absence of disbelief. We will try to obtain such a modeling by so-called “positive ranking functions” and see what there is to gain.

Definition 2.29 (Positive Ranking Function) Let A be an algebra of propositions over a possibility space W. A function β : A → N∞ such that

1) β(W) = ∞,
2) β(∅) = 0, and
3) for each A, B ∈ A it holds that β(A ∩ B) = min{β(A), β(B)}

is called a positive ranking function on propositions for A. If A is a σ-algebra/complete algebra and it holds additionally for every countable/every set S ⊆ A that

4) β(⋂S) = min{β(A) : A ∈ S},

the function β is called a σ-/complete positive ranking function on propositions for A.

Definition 2.30 (Positive Rank of A) For any proposition A ∈ A and a positive ranking function β for A, the value β(A) is called the positive rank of A.

While NRFs express gradings of disbelief, positive ranking functions express gradings of belief. The higher the value β(A) for a given proposition A ∈ A, the firmer is the belief in A. If β(A) = 0, there is no belief in A at all. Note that κ(∅) = ∞ while β(∅) = 0. That means, informally, that κ expresses maximal rejection of the contradictory proposition ∅ while β assigns no acceptance to it. Furthermore, while κ(W) = 0, it is β(W) = ∞, which means that κ ensures that the tautological proposition W is disbelieved to no degree, while β ensures that W is believed to the maximal degree. Note also that if the requirement β(W) = ∞ were removed from the definition of positive ranking functions, it would nevertheless remain derivable that β(W) > 0. But this would not respect the fact that the epistemic subject knows that the tautological proposition is always true. It is therefore not sufficient to stipulate only the highest certainty degree that is actually assigned to some w also for the tautological proposition. In applications where this requirement is not needed, a weaker definition can be established.

Definition 2.31 (Complementarity of β and κ) Let κ be an NRF for a propositional algebra A and let β be a positive ranking function for A, defined by β(A) := κ(Ā) for each A ∈ A. In this case, the functions κ and β are called complementary.

Moreover, each NRF κ induces a positive ranking function that is complementary to κ: if it holds for each A ∈ A that β(A) = κ(Ā), without further assertions on β, then β is a positive ranking function20.
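A minimal sketch of this complementarity (a hypothetical Python toy model, not from the original text; all identifiers and rank values are invented):

INF = float("inf")
W = {1, 2, 3, 4}
pw = {1: 0, 2: 1, 3: 2, 4: 3}            # hypothetical pointwise ranks inducing kappa

def kappa(A):
    return min((pw[w] for w in A), default=INF)

def beta(A):                              # complementary positive rank: beta(A) = kappa(A-bar)
    return kappa(W - A)

assert beta(W) == INF and beta(set()) == 0
assert all(min(beta(A), beta(W - A)) == 0 for A in ({1}, {1, 2}, {2, 3}))   # cf. theorem 2.33 below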

Definition 2.32 (Core of β) Let β be a positive ranking function on propositions for an algebra of propositions A over a space of possibilities W. Then core(β) := β⁻¹(0) is called the core of β.

This is intuitively clear. Since all propositions A with β(A) > 0 are believed, everything is believed that is not part of the preimage of 0 under β.

Theorem 2.33 (Acceptance in β) Let β be a positive ranking function on propositions for an algebra of propositions A over a space of possibilities W. Then for all propositions A ∈ A it holds that

min{β(A), β(Ā)} = 0.

Proof: Assume that β(A) > 0 ∧ β(Ā) > 0. Obviously, A ∩ Ā = ∅. Since minimitivity requires that β(A ∩ B) = min{β(A), β(B)}, it follows that β(∅) = min{β(A), β(Ā)}. Since we know that β(∅) = 0, it must hold that either β(A) = 0 or β(Ā) = 0. □

So far, we have seen that NRFs and positive ranking functions have many corresponding properties. Note that nonetheless the differences are weighty, as will be pointed out in the following. Considering the concept of naturalness in respect of positive ranking functions, we note an important difference to negative ranking functions. For some non-empty proposition A and an NRF κ induced by some ϖ on W, there will always exist some w ∈ A such that ϖ(w) = κ(A). As Spohn points out in a remark directly following sentence 5.11 in (Spohn, 2012, p. 75), this is clearly not the case for positive ranking functions: for some value β(A), reducibility to a pointwise function is only guaranteed insofar as there is always a w such that β(W \ {w}) = ϖ(w). Since W \ {w} is a minimally non-tautological proposition over W, the practical use of this reduction is limited. In most non-exceptional applications, we can expect to have β({w}) = 0 for most propositions {w} ∈ A. This limits the applicability of positive ranking functions strongly. So far we showed that most of the basic concepts introduced with respect to NRFs have a corresponding form for positive ranking functions. This is also the case for the concepts concerning conditionality.

20See also sentence 5.11, p. 75 in (Spohn, 2012).


Definition 2.34 (Conditional Positive Rank of B) For any A, B ∈ A with β(Ā) < ∞, the value β(B | A) := β(Ā ∪ B) − β(Ā) is called the conditional positive rank of B given A.

Corollary 2.35 (Law of Material Implication for Positive Ranks) For any A, B ∈ A it holds that: β(Ā ∪ B) = β(B | A) + β(Ā)

Proof: Rewriting definition 2.34 yields:

β(B | A) = β(Ā ∪ B) − β(Ā) ⟺ β(Ā ∪ B) = β(B | A) + β(Ā) □

Corollary 2.35 corresponds to corollary 2.20, where the latter states the same type of property for negative ranks. The reason for the different naming is that in corollary 2.35 the term on the left side is equivalent to a material implication21. The two statements describe properties that are not of equal practical importance, neither for computing nor for epistemological analysis. Spohn emphasizes in his comment on definition 5.20 in (Spohn, 2012, p. 80) that conditional positive ranks are considerably less practical than conditional negative ranks. He stresses that the conditional rank of B given A results from just taking the rank of the material implication and then

“(. . . ) subtracting from it the positive rank of the falsity of its antecedent, its trivial and uninteresting truth condition. In particular, if A is not believed, then the conditional rank and the rank of the material implication coincide.” (Spohn, 2012, p. 80)

Theorem 2.36 (Formula of the Total Positive Rank) Let A1, A2, . . . , An ∈ A be a partition of W. It then holds for any B ∈ A that:

β(B) = min{β(B | Ai) + β(Āi) : 1 ≤ i ≤ n}

Proof: For any B ∈ A it clearly holds that W ∩ B = B. Clearly, W = ⋃{Ai : 1 ≤ i ≤ n} and hence ⋂{Āi : 1 ≤ i ≤ n} = ∅, so that ⋂{Āi ∪ B : 1 ≤ i ≤ n} = B. With corollary 2.35 it holds that β(Āi ∪ B) = β(B | Ai) + β(Āi) and therefore, by minimitivity, β(B) = β(⋂{Āi ∪ B : 1 ≤ i ≤ n}) = min{β(Āi ∪ B) : 1 ≤ i ≤ n} = min{β(B | Ai) + β(Āi) : 1 ≤ i ≤ n}. □

Theorem 2.37 (Bayes’ Theorem for Positive Ranks) Let A be an algebra of propositions over a possibility space W. Let β be a positive ranking function on propositions for A and let the set {A1, A2, . . . , An} ⊆ A be a partition of W. It then holds for any B ∈ A and each i such that 1 ≤ i ≤ n that:

β(Ai | B) = β(B̄ | Āi) + β(Ai) − β(B̄).

21 Consider A ⟹ B, which just means ¬A ∨ B. The latter logical term is implemented for propositions by Ā ∪ B.


Proof: Applying corollary 2.35 to definition 2.34 yields:

β(Ai | B) = β(Ai ∪ B̄) − β(B̄)
         = β(Ai ∪ B̄) − β(Ai) + β(Ai) − β(B̄)
         = β(B̄ | Āi) + β(Ai) − β(B̄). □
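As a numerical plausibility check of theorems 2.36 and 2.37, consider the following Python sketch (again a hypothetical toy model, not from the original text; the possibility space, ranks, and all names are invented):

INF = float("inf")
W = {1, 2, 3, 4}
pw = {1: 0, 2: 1, 3: 2, 4: 3}                  # hypothetical pointwise ranks

def kappa(A): return min((pw[w] for w in A), default=INF)
def beta(A): return kappa(W - A)               # complementary positive ranks
def beta_given(B, A):                          # beta(B | A) = beta(A-bar ∪ B) − beta(A-bar)
    return beta((W - A) | B) - beta(W - A)

A1, A2, B = {1, 2}, {3, 4}, {1, 3}             # A1, A2 partition W
total = min(beta_given(B, Ai) + beta(W - Ai) for Ai in (A1, A2))
assert total == beta(B)                        # formula of the total positive rank
for Ai in (A1, A2):                            # Bayes' theorem for positive ranks
    assert beta_given(Ai, B) == beta_given(W - B, W - Ai) + beta(Ai) - beta(W - B)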

Positive ranking functions are not as practical as negative ranking functions, for mainly two reasons: the reducibility of β(A) to some ϖ(w) with w ∈ W \ A is of no practical help, and, furthermore, the informational value of conditional positive ranks is limited. In general, concepts based on positive ranking functions do not correspond to their counterparts in probability theory as directly and intuitively as in the case of NRFs, as has become clear in this section. For most applications, additional translation steps are required. Having described positive ranking functions, we have developed sufficient awareness of this alternative. We will stick with negative ranking functions and two-sided ranking functions.

2.3 Conditionalization and Revision of Ranking Functions

2.3.1 Plain Conditionalization

The conditionalization of a ranking function can be imagined as a function that recomputes the image of the ranking function depending on the corresponding conditional ranks. The motivation for introducing conditionalization is to represent the posterior epistemic state of a subject by the conditionalization of the ranking function that represents its prior epistemic state. The factor by which the conditionalization is performed corresponds to the evidence that the subject receives in the prior state. Processing the evidence results in the posterior state, and therefore the posterior state is seen as a change of the prior state conditional on what became evident. Note that the term “evidence” in the above paragraphs is not congruent with just “proposition”. This will become clear soon. We will start this section with very simple concepts of evidence and conditionalization and then develop them to an adequate level. The simplest example of a conditionalization function is the so-called “plain conditionalization” that assigns to each B ∈ A the value κ(B | A) for some fixed condition A ∈ A.

Definition 2.38 (Plain Conditionalization of κ by A) Let κ be an NRF for an algebra of propositions A. Let A ∈ A w.l.o.g. be a proposition with κ(A) < ∞. A function κA : A → N∞ such that for any B ∈ A:

κA(B) := κ(B | A)

is called the plain conditionalization of κ by A.

This is the propositional version of plain conditionalization. It is trivial to see that it corresponds to the form ϖA(w) = ϖ(w | A) on the level of possibilities. It is very important to note that the plain conditionalization of a ranking function is a ranking function in turn. The fact that κ(· | ·) is an NRF if κ is an NRF was already stated on page 59. As was already pointed out there, this fact is an important requirement of iterated belief change because the representation of a posterior epistemic state has to be suitable as an input of a subsequent revision operation. Therefore, the representations of prior and posterior state obviously have to be syntactically equivalent, because otherwise the output of a revision could not be the input of the subsequent revision. This quite obvious requirement is sometimes called the “principle of categorical matching”, following (Gärdenfors & Rott, 1995). At least from an engineer’s viewpoint, it is indeed completely trivial.
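Plain conditionalization can be sketched in a few lines of Python (a hypothetical toy model; the ranks and the names kappa and plain_cond are invented for illustration):

INF = float("inf")
W = {1, 2, 3, 4}
pw = {1: 0, 2: 1, 3: 2, 4: 3}            # hypothetical pointwise ranks

def kappa(A): return min((pw[w] for w in A), default=INF)

def plain_cond(A):                        # returns kappa_A with kappa_A(B) = kappa(A ∩ B) - kappa(A)
    def kappa_A(B):
        return kappa(A & B) - kappa(A)
    return kappa_A

kappa_A = plain_cond({2, 3})
assert kappa_A(W) == 0 and kappa_A(set()) == INF                # cf. corollary 2.39 below
assert kappa_A({2} | {3}) == min(kappa_A({2}), kappa_A({3}))    # finite minimitivity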

Corollary 2.39 (Plain Conditionalization is a Ranking Function) For a given NRF κ for a propositional algebra A and some proposition A ∈ A with κ(A) < ∞, the plain conditionalization κA of κ by A is an NRF on propositions for A in turn.

Proof: It is to be shown that if κ is an NRF, it holds that

1) κA(∅) = ∞,
2) κA(W) = 0, and
3) κA(B ∪ C) = min{κA(B), κA(C)} for all B, C ∈ A.

The proof is straightforward:

1) Requirement 1 is part of definition 2.19 and consequently contained in definition 2.38.

2) It holds that κA(W) = κ(W | A) = κ(A ∩ W) − κ(A) = κ(A) − κ(A) = 0.

3) If κ(B ∪ C) = min{κ(B), κ(C)} holds, it also holds that

κ((B ∪ C) ∩ A) = min{κ(B ∩ A), κ(C ∩ A)}
κ((B ∪ C) ∩ A) − κ(A) = min{κ(B ∩ A) − κ(A), κ(C ∩ A) − κ(A)}
κ(B ∪ C | A) = min{κ(B | A), κ(C | A)}
κA(B ∪ C) = min{κA(B), κA(C)}

and hence finite minimitivity for κA is shown. (The proof for complete minimitivity is left out here. It is completely straightforward and simple.)

The function κA is thus an NRF on propositions for A if κ is an NRF on propositions for A. □

Having shown that the plain conditionalization κA is a ranking function if κ is a ranking function, it is implicitly shown that κ(· | ·) is a ranking function in the same case. Plain conditionalization is not sufficient to model a transition between epistemic states. The reason is that plain conditionalization relies on a notion of evidence that is too narrow: the first limitation is that evidence is simply the single proposition A, and the second limitation comes from the fact that plain conditionalization takes A as maximally certain. The presupposition of maximal certainty is seen as an unacceptable strengthening of evidence in most contexts of epistemology. Therefore, plain conditionalization is too simple to be of practical use, but the concept of conditionalization can be nicely shown by seeing plain conditionalization as a first approach. This gives a good motivation for discussing extended concepts of conditionalization.

2.3.2 Spohn-conditionalization

While rejecting maximal certainty of the evidence, a second aspect is also important. An adequate modelling of the evidence is required to consist not only of a proposition but of at least a proposition and its posterior rank. Thus, if there is some new information that leads the subject to believe A to the degree n, then this posterior rank n has to appear as an additional parameter in the conditionalization. A notion of conditionalization that meets both requirements was introduced in (Spohn, 1988b), labeled as “A, α-conditionalization”. A definition of it is also discussed in (Spohn, 2012, p. 83, definition 5.24).

Definition 2.40 (Simple Indirect Spohn-Conditionalization of ϖ) Let ϖ be a pointwise ranking function for a set of possibilities W, let κ be an NRF for an algebra A over W, and let κ be induced by ϖ. Let A ∈ A be a proposition such that κ(A) < ∞ and κ(Ā) < ∞, and let n ∈ N∞. Then a function ϖA→n : W → N∞ such that

ϖA→n(w) := ϖ(w | A) if w ∈ A,
ϖA→n(w) := ϖ(w | Ā) + n if w ∈ Ā

is called the indirect A→n-conditionalization or Spohn-conditionalization of ϖ.

The effect of ϖA→n is to decrease the ranks of the possibilities in A – which in fact decreases the disbelief in them and makes them more plausible. The disbelief in the possibilities in Ā is increased at the same time by n. Note that the special case ϖA→∞ is identical to pointwise plain conditionalization ϖ(w | A).

The pointwise Spohn-conditionalization ϖA→n is obviously a pointwise ranking function if ϖ is one.

The next step is to consider the NRF κA→n that is induced by ϖA→n:

κA→n(B) := min{ϖA→n(w) : w ∈ B} if B ≠ ∅,
κA→n(B) := ∞ if B = ∅.

The conditionalization κA→n(B) of B represents the posterior rank value of the proposition B, given that the belief in A has changed to a posterior degree of n. This is easy to see, since Spohn-conditionalization obviously leads for A and Ā to κA→n(A) = 0 and κA→n(Ā) = n. (Note that “A → n” is not to be understood as “proposition A gets posterior rank n” – since n is the posterior rank of Ā – but as “proposition A is believed to the posterior degree n”. Therefore this conditionalization is labeled “indirect”.) Spohn-conditionalization extends plain conditionalization insofar as it refines the formal notion of “evidence”. While plain conditionalization takes just a single proposition A as the new evidence, Spohn-conditionalization considers the evidence as an ordered pair ⟨A, n⟩ that represents an evidential proposition A that has changed its rank to a posterior value n. The ranks of all possibilities have to be updated subsequently to respect the posterior rank of the evidential proposition A. The posterior rank of each proposition B is now determined by κA→n(B).

It is almost obvious that κA→n obeys the requirement of categorical matching, which means that if κ is an NRF, so is κA→n. It is obvious that κA→n(∅) = ∞. It also holds that κA→n(W) = 0, since there is some w ∈ A such that ϖ(w) = κ(A) and therefore ϖ(w | A) = ϖ(w) − κ(A) = 0. This w is also in W, and thus the value ϖA→n(w) = 0 is minimal in {ϖA→n(w) : w ∈ W}. The proof of minimitivity works exactly as explained in the proof of part 2.1 of the minimitivity theorem on page 51.

With the well-known argument of (Jeffrey, 1965/83, chapter 11) it is widely accepted not to identify the new evidence with just a single evidential proposition and not to take the evidence as maximally certain. The definition of κA→n already respects the latter argument by respecting the posterior rank, but ignores the former by relying on a single evidential proposition. Jeffrey’s original model of conditionalization was based on probabilities. Spohn transferred this idea to ranking theory in (Spohn, 1988a).

A change in the firmness of the belief in A usually implies a change in the firmness of the belief in Ā (except in case κ(Ā) becomes 0 while κ(A) is already 0). Therefore, it seems reasonable to consider the set {⟨A, n1⟩, ⟨Ā, n2⟩} as the new evidence. This is sound with definition 2.40, since Spohn-conditionalization reassigns new values to both A and Ā, but the parameter n2 is always 0 in fulfillment of the acceptance theorem and is therefore not mentioned explicitly. Hence, the next step is an obvious generalization: since the evidential propositions A and Ā induce a partition of W, it seems natural to consider an arbitrary partition EW of W as an evidence, instead of the quite reduced set {A, Ā}.

Let EW := {Ei ∈ A : i ∈ I} be a partition of W with some index set I such that every proposition Ei ∈ EW changes its rank from a prior to a posterior value ni ∈ N∞.

For formal correctness it is required that EW is the most fine-grained partition that incorporates the change. This means that there is no partition E′W such that E′W ⊆ A and E′W ≤ EW (which means that E′W is finer than EW) and propositions E′i ∈ E′W also change their ranks. We call EW an evidential partition of W.
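A hedged sketch of the simple A → n case (Python toy model with invented ranks and names; definition 2.40 applied to a four-element possibility space):

INF = float("inf")
W = {1, 2, 3, 4}
pw = {1: 0, 2: 1, 3: 2, 4: 3}            # hypothetical pointwise ranks

def kappa(A): return min((pw[w] for w in A), default=INF)

def spohn_pointwise(A, n):                # A -> n conditionalization of the pointwise function
    return {w: pw[w] - kappa(A) if w in A else pw[w] - kappa(W - A) + n for w in W}

post = spohn_pointwise({2, 4}, 3)
k_post = lambda B: min((post[w] for w in B), default=INF)
assert k_post({2, 4}) == 0 and k_post(W - {2, 4}) == 3   # A is believed to the posterior degree 3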

Definition 2.41 (Generalized Indirect Spohn-Conditionalization of ϖ) Let ϖ be a pointwise ranking function for a set of possibilities W, let κ be an NRF for an algebra A over W, and let κ be induced by ϖ. Let EW := {Ei ∈ A : i ∈ I} be a partition of W. A change in the rank of each Ei ∈ EW to some posterior value ni ∈ N∞ such that min{ni : Ei ∈ EW} = 0, ni = 0 if Ei = W, and ni = ∞ if Ei = ∅, excluding any change of the posterior ranks in some finer partition E′W ≤ EW with E′W ⊆ A, is expressed by a function ϖEi→ni : W → N∞ such that for every possibility w ∈ W:

ϖEi→ni(w) := ϖ(w | Ei) + ni, where Ei is the unique element of EW with w ∈ Ei,

and ϖEi→ni is called the Ei→ni-conditionalization or Spohn-conditionalization of ϖ.


Definition 2.42 (Spohn-Conditionalization of κ) Let κ be an NRF for a propositional algebra A over a set of possibilities W. Let EW := {Ei ∈ A : i ∈ I} be a partition of W. A change in the rank of each Ei ∈ EW to some posterior value ni ∈ N∞ such that min{ni : Ei ∈ EW} = 0, ni = 0 if Ei = W, and ni = ∞ if Ei = ∅, excluding any change of the posterior ranks in some finer partition E′W ≤ EW with E′W ⊆ A, is expressed by a function κEi→ni : A → N∞ such that for every proposition A ∈ A:

κEi→ni(A) := min{κ(A | Ei) + ni : Ei ∈ EW}.

Then, κEi→ni is called the Ei→ni-conditionalization or Spohn-conditionalization of κ.

Note that κEi→ni is defined without any reference to ϖ, which means that non-natural NRFs are also covered by definition 2.42, whereas the generalized conditionalizations of ϖ will by definition only cover natural NRFs.

Further note that according to definition 2.42, “Ei → ni” is to be understood as “proposition Ei gets posterior rank ni”, since κEi→ni(Ei) = ni. The function κEi→ni is a ranking function if κ is a ranking function. If κ is regular, so is κEi→ni. The evidential partition EW is the set of all propositions whose ranks are initially changed according to the new information. This update on EW usually implies updates on other propositions that are not contained in EW. For example, some change on Ei may imply a change on Ei ∪ A while Ei ∪ A may not be an element of EW. The entirety of the new evidence is expressed by the set E := {⟨Ei, ni⟩ : Ei ∈ EW} that contains all propositions that are initially changed together with their posterior ranks ni. There is now a first sketch for modeling the transition from one epistemic state to another: if the epistemic state of a subject at a time t is determined by κ and between t and t′ the subject gets acquainted with the evidence E and no further evidence, then the epistemic state of the subject at time t′ is determined by κEi→ni. This is a first fully developed account of belief change. But this is not the final result. A further generalization is obvious at this point. Instead of considering only those propositions as evidence that are initially changed, we can consider the evidence as the set that contains precisely all propositions whose rank is changed by the new information, including those which are changed “non-initially”, fulfilling the requirement of closure. This means to identify the evidence with the set of all propositions whose ranks actually require or may require an update of their rank value. This is precisely the powerset E := 2^EW ⊆ A of EW as the smallest set that contains every Ei in EW and is closed under complement and union. The partition EW is then the set of atoms of E. Since E is a subalgebra of A, we call E an evidential algebra.

The rank value for each Ei ∈ E is either given as an evidential posterior rank ni or derived by conditionalization from the prior rank of Ei. Then the function λ that assigns to each proposition in E its posterior rank value is obviously a ranking function for E. The generalized evidence E := ⟨E, λ⟩ is hence represented as an ordered pair of the evidential subalgebra E and an NRF λ for E.


Definition 2.43 (Spohn-Revision of ϖ) Let E ⊆ A be a non-empty evidential subalgebra of an algebra of propositions A over a set of possibilities W, with atoms {Ei ∈ A : i ∈ I}. Let λ be an NRF for E and ϖ a pointwise ranking function for W. Then a function ϖE→λ : W → N∞ such that

ϖE→λ(w) := ϖ(w | Ei) + λ(Ei), where Ei is the unique atom such that w ∈ Ei,

is called the E→λ-revision or Spohn-revision of ϖ.

Definition 2.44 (Spohn-Revision of κ) Let E ⊆ A be a non-empty evidential subalgebra of an algebra of propositions A with atoms {Ei ∈ A : i ∈ I}. Let λ be an NRF for E and κ an NRF for A. Then a function κE→λ : A → N∞ such that

κE→λ(A) := min{κ(A | Ei) + λ(Ei) : Ei ∈ EW}

is called the E→λ-revision or Spohn-revision of κ.

If κ is a ranking function, κE→λ is a ranking function. If κ and λ are regular, κE→λ is regular. From this basis, we can state a fully developed dynamics of epistemic states, modelled by ranking functions.

Definition 2.45 (Transition Function for Epistemic States) If the prior epistemic state of the subject s at time t is characterized by the ranking function κ, and if s receives the evidence E := ⟨E, λ⟩ between t and t′ and there is no other evidence available to s, then the posterior state of s at t′ is represented by κE→λ.
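Since Spohn-revision is the operation the update algorithm ultimately has to compute, a minimal sketch may be helpful (a Python toy model, not from the original text; all identifiers are invented, and the code simply implements the minimum from definition 2.44 under the assumption of a finite possibility space):

INF = float("inf")
W = {1, 2, 3, 4}
pw = {1: 0, 2: 1, 3: 2, 4: 3}                        # hypothetical pointwise ranks

def kappa(A): return min((pw[w] for w in A), default=INF)

def spohn_revision(atoms, lam):                      # kappa_{E->lambda}, definition 2.44
    def revised(A):
        return min(kappa(A & E) - kappa(E) + lam[i] for i, E in enumerate(atoms))
    return revised

atoms = [{1, 2}, {3, 4}]                             # atoms of the evidential subalgebra E
lam = [2, 0]                                         # posterior ranks assigned by lambda
post = spohn_revision(atoms, lam)
assert post(atoms[0]) == 2 and post(atoms[1]) == 0   # the evidence is incorporated
assert post(W) == 0 and post(set()) == INF           # the revision is again an NRF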

2.3.3 Shenoy-conditionalization

An observation about κE→λ, stated for instance in (Shenoy, 1991a), is that the parameters λ(Ei) are not an inherent part of the evidential input but just the epistemic result of acquiring the new information. In an ordered pair ⟨Ei, ni⟩ no information about the evidence itself is included, but only about the effect that acquiring the evidence will have. In this respect, Spohn-revision seems to be modelled artificially, because its representation of evidence is, in a strict sense, not a model of what becomes available with the evidence. It focusses on the result of the update, and not on the actual evidence that triggers the update. In applications for which a more “natural” modeling of the evidence parameter is important, an alternative can be interesting: a strategy to modify κE→λ can be to replace λ by a parameter that exclusively characterizes the evidence. Since such a parameter also implies access to the posterior ranks, they need not be part of the evidential input.

Definition 2.46 (Generalized Indirect Shenoy-Conditionalization of ϖ) Let ϖ be a pointwise ranking function for a set of possibilities W, let κ be an NRF for an algebra A over W, and let κ be induced by ϖ. Let EW := {Ei ∈ A : i ∈ I} be a partition of W. A change in the rank of each Ei ∈ EW by some value zi ∈ N∞ such that min{zi : Ei ∈ EW} = 0, excluding any change of the posterior ranks in some finer partition E′W ≤ EW with E′W ⊆ A, is expressed by a function ϖEi↑zi : W → N∞ such that for every possibility w ∈ W

ϖEi↑zi(w) := ϖ(w) + zi − m with m := min{zi + κ(Ei) : Ei ∈ EW}, where Ei is the unique element of EW with w ∈ Ei,


and ϖEi↑zi is called the Ei↑zi-conditionalization or Shenoy-conditionalization of ϖ.

Definition 2.47 (Shenoy-Conditionalization of κ) Let κ be an NRF for a propositional algebra A over a set of possibilities W. Let EW := {Ei ∈ A : i ∈ I} be a partition of W. A change in the rank of each Ei ∈ EW by some value zi ∈ N∞ such that min{zi : Ei ∈ EW} = 0, excluding any change of the posterior values in some finer partition E′W ≤ EW with E′W ⊆ A, is expressed by a function κEi↑zi : A → N∞ such that for every proposition A ∈ A

κEi↑zi(A) := min{κ(A ∩ Ei) + zi − m : Ei ∈ EW} with m := min{zi + κ(Ei) : Ei ∈ EW}

and κEi↑zi is called the Ei↑zi-conditionalization or Shenoy-conditionalization of κ.

Note that the effect of Shenoy-conditionalization is that, whatever the prior ranks of A and Ā have been, the improvement in the firmness of the belief in A as opposed to the firmness of the belief in Ā is exactly zi. Since it holds that κEi↑zi(Ei) − κ(Ei) = zi − m, the value zi represents – up to the normalizing constant m – the difference between the prior rank and the posterior rank of Ei.

While in Ei → ni-conditionalization the factor ni represents the posterior rank, in Ei ↑ zi-conditionalization the factor zi exclusively characterizes the evidence. The result is normalized by the factor m to ensure that the minimum is 0. In the literature (cf. (Huber, 2007) and (Spohn, 2012, p. 83-86)), κEi→ni is therefore characterized by the term “result-oriented” while κEi↑zi is called “evidence-oriented”. Note that both κEi→ni and κEi↑zi represent the update information necessary to compute the posterior ranks. Both conditionalization operations are therefore interdefinable. The decision to prefer one of them over the other for a particular application does not make a fundamental difference. It only emphasizes whether the result or the evidence is more interesting to be modeled in the concrete case. More information about the two types of conditionalization, as well as about Spohn-revision and Shenoy-revision, can be found in (Huber, 2007) and in (Spohn, 2012, chapter 5, section 5.4).
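The interdefinability can be made concrete in the same toy setting (a Python sketch with invented names and values, implementing definition 2.47 directly): applying the evidence-oriented shift z and then reading off the resulting posterior ranks yields exactly the parameters a result-oriented Spohn-conditionalization would have used.

INF = float("inf")
W = {1, 2, 3, 4}
pw = {1: 0, 2: 1, 3: 2, 4: 3}                      # hypothetical pointwise ranks

def kappa(A): return min((pw[w] for w in A), default=INF)

def shenoy(atoms, z):                              # kappa_{E_i ↑ z_i}, definition 2.47
    m = min(z[i] + kappa(E) for i, E in enumerate(atoms))
    def shifted(A):
        return min(kappa(A & E) + z[i] - m for i, E in enumerate(atoms))
    return shifted

atoms = [{1, 2}, {3, 4}]
post = shenoy(atoms, [4, 0])                       # worsen E_1 by 4 relative to E_2
assert [post(E) for E in atoms] == [2, 0]          # the induced posterior ranks n_i
assert post(W) == 0                                # normalization by m keeps the minimum at 0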

2.4 Rank-based Conditional Independence

One of the most crucial notions in the further inquiry is the concept of conditional independence. Informally, this property ensures that two beliefs do not causally influence each other with respect to their subjective certainty degrees while some additional knowledge is present. Consequently, one can say that those entities are independent conditional on the additional knowledge. Without this knowledge they would have to be considered as interrelated concerning their firmness degrees, and a change in the degree of the one may lead to a change in the degree of the other. Conditional independence is the central element for constructing reasoning networks because it functions as a device for reducing computational complexity. Knowledge of independence relationships between beliefs allows us to reduce the set of terms to be computed when performing epistemic updates. It will soon be shown how exactly this advantage can be achieved.


First, we will define conditional independence among propositions, and then extend the definitions to variables in chapter III.

Definition 2.48 (Conditional Independence Among Propositions in κ) Let κ be an NRF for A. Let A, B, C ∈ A and κ(C) < ∞. Then A is called independent of B given C w.r.t. κ if and only if

κ(A ∩ B | C) + κ(Ā ∩ B̄ | C) = κ(A ∩ B̄ | C) + κ(Ā ∩ B | C). (2.16)

This relationship is denoted by A ⊥κ B | C.

Definition 2.48 seems to differ obviously from the notion of conditional independence in probability theory. Recall that conditional independence in the probabilistic sense is given as follows:

P(A ∩ B | C) = P(A | C) · P(B | C). (2.17)

Having (2.17) in mind, (2.16) seems to be a quite unintuitive notion. Remember that having an actual value P(A) implies knowledge of the actual value of P(Ā) = 1 − P(A). Therefore, (2.17) is not required to state any assertions about P(Ā). This is different in the case of ranks, since knowing the actual value of κ(A) does not necessarily provide any information about the actual value of κ(Ā): the case κ(A) > 0 allows us to conclude that κ(Ā) = 0, but κ(A) = 0 does not contain any information about κ(Ā). This fact requires us to always treat the case of negation in the definition of concepts based on ranks, and (2.16) respects this fact. A slightly different notion of conditional independence for ranks was suggested in (Studený, 1995, p. 46), where a “weak” and a “strong” variant are declared. The defining term in both cases is the one derived in the following lemma.

Lemma 2.49 Let κ be an NRF for A. Let A, B, C ∈ A and κ(C) < ∞. Then, if A ⊥κ B | C, it also holds that

κ(A ∩ B ∩ C) + κ(C) = κ(A ∩ C) + κ(B ∩ C). (2.18)

Proof: Let A ⊥κ B | C be true. By resolving the four conditional ranks in (2.16) by their definitions, we derive:

κ(A ∩ B ∩ C) + κ(Ā ∩ B̄ ∩ C) = κ(A ∩ B̄ ∩ C) + κ(Ā ∩ B ∩ C). (2.19)

We define22:

Mg := min{κ(A ∩ B ∩ C), κ(A ∩ B̄ ∩ C), κ(Ā ∩ B ∩ C), κ(Ā ∩ B̄ ∩ C)},
M1 := min{κ(A ∩ B ∩ C), κ(A ∩ B̄ ∩ C)}, and
M2 := min{κ(A ∩ B ∩ C), κ(Ā ∩ B ∩ C)}.

Note that Mg = κ(C), M1 = κ(A ∩ C) and M2 = κ(B ∩ C).

22 I am thankful to Wolfgang Spohn for suggesting this proof strategy to me in conversation.


We can hence rewrite equation (2.18) in the following form:

κ(A ∩ B ∩ C) + Mg = M1 + M2. (2.20)

We can now distinguish four cases.

Case 1: Mg = κ(A ∩ B ∩ C). It follows immediately that M1 = κ(A ∩ B ∩ C) and M2 = κ(A ∩ B ∩ C). Substituting Mg, M1 and M2 in equation (2.20) by these actual terms leads immediately to a trivial truth.

Case 2: Mg = κ(Ā ∩ B̄ ∩ C). It follows from (2.19) and the actual minimum Mg that the term κ(A ∩ B ∩ C) is maximal among the four terms in the set {κ(A ∩ B ∩ C), κ(A ∩ B̄ ∩ C), κ(Ā ∩ B ∩ C), κ(Ā ∩ B̄ ∩ C)}. (Knowing the actual minimum, the inequality κ(A ∩ B ∩ C) < κ(A ∩ B̄ ∩ C) as well as κ(A ∩ B ∩ C) < κ(Ā ∩ B ∩ C) is contradictory with (2.19).) This insight determines M1 = κ(A ∩ B̄ ∩ C) and M2 = κ(Ā ∩ B ∩ C). Equation (2.20) therefore evaluates directly to (2.19).

Case 3: Mg = κ(Ā ∩ B ∩ C). It follows that M2 = κ(Ā ∩ B ∩ C). From (2.19) and the knowledge of the actual minimum, it follows by the same argumentation as in case 2 that κ(A ∩ B̄ ∩ C) is maximal among the four terms, and thus M1 = κ(A ∩ B ∩ C). Substituting Mg, M1 and M2 in equation (2.20) by these actual terms leads immediately to a trivial truth.

Case 4: Mg = κ(A ∩ B̄ ∩ C). It follows that M1 = κ(A ∩ B̄ ∩ C). From (2.19) and the knowledge of the actual minimum, it follows by the same argumentation as in case 2 that κ(Ā ∩ B ∩ C) is maximal among the four terms, and thus M2 = κ(A ∩ B ∩ C). Substituting Mg, M1 and M2 in equation (2.20) by these actual terms leads immediately to a trivial truth. □

We will now derive two implications of conditional independence that are more intuitive since they resemble the corresponding probabilistic terms.

Lemma 2.50

κ(A ∩ B ∩ C) + κ(C) = κ(A ∩ C) + κ(B ∩ C) ⟺ κ(A | B ∩ C) = κ(A | C)

Proof: Completely straightforward (for both directions):

κ(A ∩ B ∩ C) + κ(C) = κ(A ∩ C) + κ(B ∩ C)
κ(A | B ∩ C) + κ(B ∩ C) + κ(C) = κ(A ∩ C) + κ(B ∩ C)
κ(A | B ∩ C) = κ(A ∩ C) − κ(C)
κ(A | B ∩ C) = κ(A | C) □


Lemma 2.51

κ(A ∩ B ∩ C) + κ(C) = κ(A ∩ C) + κ(B ∩ C) ⟺ κ(A ∩ B | C) = κ(A | C) + κ(B | C)

Proof: Also completely straightforward for both directions:

κ(A ∩ B ∩ C) + κ(C) = κ(A ∩ C) + κ(B ∩ C)
κ(A ∩ B ∩ C) = κ(A ∩ C) + κ(B ∩ C) − κ(C)
κ(A ∩ B ∩ C) = κ(A ∩ C) + κ(B | C)
κ(A ∩ B ∩ C) = κ(A | C) + κ(C) + κ(B | C)
κ(A ∩ B ∩ C) − κ(C) = κ(A | C) + κ(B | C)
κ(A ∩ B | C) = κ(A | C) + κ(B | C) □

Note that lemma 2.51 introduces the rank-based equivalent to the well-known probabilistic definition of conditional independence as it was stated in (2.17). The relationship of conditional independence among propositions can also be stated in terms of TRFs.
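These equivalences can be checked mechanically. In the following Python sketch (a hypothetical model, not from the original text; worlds, ranks, and all names are invented), worlds are triples (a, b, c) and the ranks are chosen additively in a and b for fixed c, so that A ⊥κ B | C holds by construction:

INF = float("inf")
worlds = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
pw = {w: (w[0] + 2 * w[1] if w[2] == 1 else 2 * w[0] + w[1] + 1) for w in worlds}
W = set(worlds)

def kappa(S): return min((pw[w] for w in S), default=INF)
def k_given(S, T): return kappa(S & T) - kappa(T)          # kappa(S | T)

A = {w for w in W if w[0] == 1}
B = {w for w in W if w[1] == 1}
C = {w for w in W if w[2] == 1}
cA, cB = W - A, W - B

# definition 2.48, equation (2.16):
assert (k_given(A & B, C) + k_given(cA & cB, C)
        == k_given(A & cB, C) + k_given(cA & B, C))
# lemma 2.51: kappa(A ∩ B | C) = kappa(A | C) + kappa(B | C)
assert k_given(A & B, C) == k_given(A, C) + k_given(B, C)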

Theorem 2.52 (Conditional Independence Among Propositions in τ) Let κ be an NRF for A and τ the TRF induced by κ. Let A, B, C ∈ A and κ(C) < ∞. Then A ⊥κ B | C holds iff

τ(B | A ∩ C) = τ(B | Ā ∩ C) (2.21)

or τ(A | C) = ±∞. (2.22)

Proof of (2.21):

κ(A ∩ B | C) + κ(Ā ∩ B̄ | C) = κ(A ∩ B̄ | C) + κ(Ā ∩ B | C)
κ(A ∩ B ∩ C) + κ(Ā ∩ B̄ ∩ C) = κ(A ∩ B̄ ∩ C) + κ(Ā ∩ B ∩ C)
κ(B̄ | A ∩ C) − κ(B | A ∩ C) = κ(B̄ | Ā ∩ C) − κ(B | Ā ∩ C)
τ(B | A ∩ C) = τ(B | Ā ∩ C)

Note that this proof works in both directions. □

If the independence between two propositions is based on a trivially true condition, we speak of unconditional independence.

Definition 2.53 (Unconditional Independence Among Propositions) If it holds that A ⊥κ B | W, A is called (unconditionally) independent of B.

Spohn analyzes some properties of conditional independence among propositions that are worth repeating here (cf. (Spohn, 2012, p. 127, Theorem 7.2)).

Theorem 2.54 (Properties of Conditional Independence Among Propositions) Let κ be a negative ranking function for A. Let A, B, C, D ∈ A.


1. If κ(C) < ∞, then A ⊥κ B | C for all B ∈ A iff κ(A | C) = ∞ or κ(Ā | C) = ∞. A fortiori, W ⊥κ B | C and ∅ ⊥κ B | C for all B ∈ A.

2. If κ(C) < ∞ and A ∩ C ⊆ B ∩ C, then A ⊥κ B | C iff κ(A | C) = ∞ or κ(B̄ | C) = ∞. A fortiori, if κ(A | C), κ(Ā | C) < ∞, then not A ⊥κ A | C.

3. If A ⊥κ B | C, then A ⊥κ B̄ | C, Ā ⊥κ B | C, and Ā ⊥κ B̄ | C.

4. Let ΠW be a partition of W. If A ⊥κ B | C for all B ∈ ΠW, then A ⊥κ ⋃Π′W | C for all Π′W ⊆ ΠW.

5. If A ⊥κ B | D and A ⊥κ C | B ∩ D, then A ⊥κ B ∩ C | D.

6. If A ⊥κ B | C, then B ⊥κ A | C.

Proof: Theorems 2.54.1-3 and 6 follow directly from definition 2.48.

Proof for 2.54.4) Let ΠW := {B1, . . . , Bn} be the given partition of W. Let κ(A ∩ Bi | C) =: ai and κ(Ā ∩ Bi | C) =: bi. The premise then says that for all i = 1, . . . , n it holds that ai + min{bj : j ≠ i} = bi + min{aj : j ≠ i}. If ai = ∞ for all i, then κ(A | C) = ∞ and the statement follows trivially from 2.54.1. The same holds for the case that all bi = ∞. Therefore, we may w.l.o.g. assume that a1 = min{ai : i = 1, . . . , n} < ∞. Thus a1 − min{aj : j ≠ 1} = b1 − min{bj : j ≠ 1} ≤ 0 and so b1 = min{bi : i = 1, . . . , n}. Our premise thus says that for all i ≠ 1 it holds that ai − bi = min{aj : j ≠ i} − min{bj : j ≠ i} = a1 − b1 or ai = bi = ∞. This entails for any R ⊆ {1, . . . , n} that min{ai : i ∈ R} − min{bi : i ∈ R} = min{ai : i ∉ R} − min{bi : i ∉ R} or that both terms on the left or on the right side are ∞. Thus, the latter says A ⊥κ B | C for B = ⋃{Bi : i ∈ R}, either by definition or by 2.54.1, as was to be shown. □

Proof for 2.54.5) Note first that by theorem 2.52 the first premise A ⊥κ B | D resolves to the equality statement τ(B | A ∩ D) = τ(B | Ā ∩ D), the second premise A ⊥κ C | B ∩ D resolves to the term τ(C | A ∩ B ∩ D) = τ(C | Ā ∩ B ∩ D), and the intended conclusion A ⊥κ B ∩ C | D is equivalent to the term τ(B ∩ C | A ∩ D) = τ(B ∩ C | Ā ∩ D). Let for convenience a := κ(A ∩ B ∩ C ∩ D), b := κ(A ∩ B ∩ C̄ ∩ D), c := κ(Ā ∩ B ∩ C ∩ D), d := κ(Ā ∩ B ∩ C̄ ∩ D), e := κ(A ∩ B̄ ∩ D) and f := κ(Ā ∩ B̄ ∩ D). Then the first premise A ⊥κ B | D can be expressed as

e − min{a, b} = f − min{c, d}, (2.23)

the second premise A ⊥κ C | B ∩ D as

b − a = d − c (2.24)

and the intended conclusion A ⊥κ B ∩ C | D as

min{b, e} − a = min{d, f} − c. (2.25)


It therefore remains to show that

if (2.23) and (2.24), then (2.25),

and three cases can be distinguished:

1. b − a = d − c ≥ 0: In this case, (2.23) reduces to e − a = f − c, and then the implication obviously holds.

2. b − a = d − c = 0: In this case, (2.23) reduces to e − a = f − d. Since 0 = d − c, this entails e − a = f − c and thus the case reduces to the first case.

3. 0 ≥ b − a = d − c: In this case, (2.23) reduces to e − b = f − d. Adding this to (2.24) yields e − a = f − c and therefore the case reduces to the first case.

Having shown this, the proof is completed. □

In (Spohn, 2012, p. 128f), when discussing his theorem 7.4, Spohn compares the properties of rank-based conditional independence on propositions to the properties of probability-based conditional independence. The rank-based theorem turns out to be weaker than its probability-based counterpart. (As an example, Spohn points this out for requirement 4, which is weaker than its probabilistic counterpart.) Spohn states that the reason for this weakness is mainly theorem 2.27 in conjunction with the rank-based possibility that τ(B | A) = τ(B) ≠ τ(B | Ā), which is not a probabilistic possibility. More details are discussed in (Spohn, 2012, section 7.1), from which the following two points to note are reproduced23:

1. If A is independent of B and of C, it may or may not be independent of B ∩ C and of B ∪ C.

2. If A and B are independent given C and given D, they may or may not be independent given C ∩ D and given C ∪ D.

This insight will be sufficient for the purposes of this text. We will therefore not engage more deeply in the analysis of the commonalities and differences of probabilistic and rank-based conditional independence. It was the intention of the author to introduce the technical concepts for modeling epistemic states. This was done by presenting a survey of the foundations of ranking theory, with many references to the works of other authors for more detailed discussion of specific aspects. With ranking functions and the concept of rank-based conditional independence among propositions, we have developed an account of modeling epistemic states that is computationally sufficient to implement epistemic updating. Probability theory implements updating by respecting conditional independence in order to reduce the number of terms to be computed. This basic principle leads to the graphical model of probability functions that is known as the Bayesian network. The following chapter uses the same strategy to develop graphical models of ranking functions in terms of undirected graphs as well as directed acyclic graphs. Those graphical models will form the data structure on which the update algorithm is eventually performed.

23See (Spohn, 2012, p. 128).

III

Graphical Models for Ranking Functions

3.1 Introduction

3.1.1 Content of this Chapter

From a normative point of view, the main task in the epistemological topic of belief revision is to identify rules for revising existing beliefs. Some of the conditions for those transition rules were discussed in chapter I. Remember for instance that the rules, if applied to a particular prior belief state and a particular evidence, always induce the same posterior belief state. Chapter II introduced ranking theory as a fully developed framework for belief revision that satisfies those rules by conditionalization. The next step will be to develop a data structure for representing belief states in a way that facilitates their efficient revision. This is the main topic of this chapter. Updates on ranking functions can obviously not be computed by simply calculating the actual values of all conditional ranks induced by the entire set of observed entities. This is practically impossible in real-world applications since the number of terms grows too quickly with the number of observed entities and their possible states. Indeed, already for a small set of possible observations, the complete set of possible compound realizations is intractable. Hence, the precondition for an algorithmic solution of the update is to significantly reduce the number of terms that actually have to be computed. As already known from Bayesian networks, this reduction is achieved by respecting relationships of conditional independence between the observed entities. This strategy allows us to compute only those terms that are known to have in fact an influence on the result, while all other terms are ignored. In short, this is the approach of Bayesian networks and also the reason for their relative success in multiple disciplines24.

24 In fact it would be interesting to discuss those types of problems in which Bayesian networks provide the best solution and compare them to the types for which the solutions of Bayesian networks are overruled by the solutions of other algorithmic approaches. Perhaps this would make other approaches provided by the field of machine learning more interesting for epistemology. To state just a simple example: Bayesian networks can in fact be used for solving classification problems, but support vector machines are considered by far superior in this field of application. What can we learn from this for epistemology? Are there epistemological problems that could consistently be described as classification problems? Which conceptual results would originate from trying to describe a framework that makes support vector machines applicable for belief revision? Is there anything to learn for epistemology from the success of support vector machines in data mining?


In this chapter, we will analyze the foundations of a graphical model for NRFs. The concrete aim is to introduce a representation of NRFs as graphical data structures called ranking networks. Ranking networks are the rank-based concept analogous to Bayesian networks in probability theory, and their virtues will support the algorithmic task of updating belief states. Ranking networks enable us to reduce the number of conditional ranks to only the relevant terms, so that in fact all terms that do not effectively contribute to the result can be ignored for computation. While introducing ranking networks, we will also adequately explicate the context of the general graph-theoretic considerations of expressing independence by way of graphical models. In this field we can draw on similar work already produced for the same purpose in probability theory. We begin our inquiry about ranking networks with some historical remarks about conditional independence and graphoids in section 3.1.2. We formally introduce the notion of a variable in section 3.1.3, which will be the basic concept on which to build a graphical model for NRFs. Section 3.1.4 introduces algebras over sets of variables. Section 3.1.5 presents all notions of graph theory that are relevant for the argumentation. Section 3.2 shows how conditional independence can be generalized to apply to variables instead of propositions. This section also introduces the relevant notions of the theory of graphoids. Since conditional independence on variables in ranking theory is known to be a graphoid, it inherits the practically relevant properties of this structure. In sections 3.3 and 3.4 we analyze the relationship between NRFs and their Markov graphs. Section 3.5 prepares the transition to DAGs by analyzing the conditions under which NRFs can be represented by perfect maps. Section 3.6 makes the progression to ranking networks. Section 3.7 makes a digression to discuss possible characterizations of precisely those NRFs that can be modelled perfectly by a graphical model.

3.1.2 Historical Remarks

Before descending into details, the author will develop some historical context to stress that the work introduced in this chapter is strongly connected to the research concerning graphical models in general and conceptions of conditional independence in particular. Independently of each other, Dawid and Spohn analyzed the abstract properties of conditional independence in general, Dawid in his (1979) and Spohn in (1980)25. Both texts focus on probability theory as the concrete example for conditional independence. While Dawid’s considerations originated from a point of view based on statistics, Spohn’s analysis was interested in the epistemological and causal implications. Those properties are the basis for the definition of graphoids introduced by (Pearl & Paz, 1985). At this stage of the discussion, the probabilistic approach was the only concrete example of a well-analyzed model of conditional independence that was considered.

25 Also partly earlier in (Spohn, 1978).

The theory of graphoids was also the basis for considering the relationship between probability distributions and their graphical representation. A graphoid is an abstract model of relevance that can be interpreted as a minimal consensus about commonalities of most concepts of relevance developed in artificial intelligence, statistics, and also epistemology. Informally, the insight of the minimal consensus about relevance reveals a quite simple property: when we learn something irrelevant, this learning does not affect the relevance relationships between the beliefs we have already acquired – what is already relevant will remain relevant and what is irrelevant will remain irrelevant. Thus, relevance is defined indirectly by considering irrelevance. Furthermore, graphoids are the key to efficient representation and manipulation of relevance relationships between epistemic units. As Pearl proves in detail,

“the theory of graphoids shows that a belief network can constitute a sound and complete inference mechanism relative to probabilistic dependencies, i.e., it iden- tifies, in polynomial time, every conditional independence relationship that log- ically follows from those used in the construction of the network. (. . . ) The essential requirement for soundness and completeness is that the network be con- structed causally, i.e. that we identify the most relevant predecessors of each vari- able recursively, in some total order, say temporal. (. . . ) It is this soundness and completeness that gives such a central role in this book, and perhaps in knowledge organization in general.” (Pearl, 1988b, p. 14)

Along the line of discussion about graphoids, the close relationship of conditional independence and concepts of graphical separation came into focus. The locus classicus concerning graphical models of probability distributions is surely (Pearl, 1988b), where Pearl discusses general observations on how graph structures are related to an abstract model of independence. This work presented a vast reflection on graphical models for probability semantics, introduced the terminology used in the subsequent discussion, and first presented the now well-known concept of “Bayesian networks”. Pearl also describes the historical development of the research about graphical models and graphoids in particular in (Pearl, 1988b, p. 131ff). A general but detailed discussion of formal graphical models, which can be utilized for different epistemic requirements, can for instance be found in (Lauritzen, 1996), among many others.

Although, as Pearl says in (1988b, p. 14), “the precise relationships between causality as a representation of irrelevancies and causality as a commitment to a particular inference strategy” have been the focus of a vast philosophical discussion for a long time, it is well accepted that computational models of relevance are a reasonably good formal implementation of our intuition of relevance in general. Thus, the theory of graphoids is a topological point on the scientific map where at least four different scientific perspectives on conditional independence touch each other: formal epistemology, statistics, computer science with its fuzzily bounded subdisciplines of AI and machine learning, and, fourth, utility theory, especially preference elicitation, which represents a perspective from economics on the subject. This touching is a historical as well as a thematic phenomenon. For this touching, two aspects have been of special importance, since they make graphoids relevant for all four disciplines:

1. Graphoids represent a common abstract view on conditional independence. This abstraction was not only fruitful for probabilistic independence but also for other models of conditional independence, since what was already known for graphoids could be shown for each notion of conditional independence that satisfies the graphoid properties.

2. It was shown how notions of separation – mainly d-separation – in graphs implement conditional independence. Although separation and conditional independence in general do not share the same properties, the former can be used to represent or implement the latter in a way that is sufficient for most requirements of practical reasoning.

Considering the first aspect, all four disciplines profited from an abstract view on conditional independence. In statistics it was fruitful for probabilistic methods, while in computer science it helped to develop the Bayesian approach of reasoning (which is of course inherently interdisciplinary). In general, at this point the theory of conditional independence made a step towards emancipation from probability theory, since graphoids provided a blueprint for all disciplines working with their own notions of conditional independence.

There is, first, the philosophical interest of formal epistemology to find a sound and complete formal theory of belief revision, which is on the one hand near to intuition and on the other hand based on a clear and distinct concept of belief. While probability calculus is of great strength in meeting the former requirement, it fails in the latter. But despite this fact, probability theory developed a multitude of tools for practical reasoning under uncertainty. Among the most prominent are surely Bayesian networks. Thus, while the mechanisms of reasoning are quite well modeled by the probabilistic approach, it almost completely fails in defining belief, which, from the epistemologist’s point of view, renders the probabilistic tools unconnected to the crucial notion of interest. The amount of philosophical publications on probability and causality is overwhelming, and it would be a topic for a thesis of its own to recount the history of the relationship between these two notions or the research status concerning their connections.

Secondly, there is the stochastic and statistical view on quantitative reasoning, as it was represented for instance by the works of Lauritzen and Wermuth. The statistical perspective was of course interested in deriving new knowledge from knowledge already present in the form of concretely instantiated variables. This perspective is represented for example by (Dawid, 1979), (Wermuth & Lauritzen, 1983) and (Walley, 1991). An important analysis of the properties of graphical independence models is given by (Lauritzen et al., 1990); a comprehensive view on the topic is provided by (Lauritzen, 1996).

The third perspective is defined roughly by computer science, and within it by artificial intelligence and machine learning in particular, whose protagonists were interested in formal models of quantifiable rules of reasoning that are implementable in the available computer hardware. This perspective is most prominently represented by the works of Pearl cited above, and furthermore in Pearl’s joint works with Azaria Paz, Daniel Geiger and Thomas Verma as cited in the bibliography. Another generally important source for this inquiry is (Neapolitan, 1990). After graphoids instantiated an “abstract” theory of conditional independence, research conducted in other subjects was also oriented on graphoids.

A fourth perspective came from decision theory, especially from the context of utility theory and preference elicitation in economics, where methods of optimal formal decision-making on the basis of quantifiable utility measures were developed. On the arithmetic level, additive utility theory has a strong “family resemblance” with ranking theory: both disciplines model conditional independence on an additive computational model. Most interesting is that already (Fishburn, 1967) described a model of conditional independence based on additive arithmetics. This model is known as “general additive independence” – or GAI for short – from which an independence model for conditional additive independence (“CAI”) can be derived. The analysis of conditional independence described in this chapter for ranking theory shares many thoughts with CAI. After some decades of lower activity in this field, the attention of many researchers turned to the analysis of GAI-networks. Important for utility theory is for instance (Bacchus & Grove, 1995), proving that CAI is a graph-isomorph (and thus a graphoid)26. The more recent (Brafman & Engel, 2009) and (Brafman & Engel, 2010) used these insights to develop an additive utility theory which has many formal similarities with ranking theory. Especially the latter publication develops many concepts very similar to ranking theory, for example the additive version of Bayes’ theorem as it is presented by theorem 2.23 and the additive chain rule (cf. corollary 2.21).

While ranking theory and parts of utility theory share their relationship to the theory of graphoids, there are also differences. In contrast to ranking theory, utility theory is not interested in minimitivity, to state only a rather obvious difference. Since utility theory measures preference instead of disbelief, the conditions for maximization are naturally more interesting for utility theory. Another important difference is that the specific instance of CAI used in utility theory is a graph-isomorph; this is, however, not true for rank-based conditional independence. The reason is that CAI is originally only defined for triplets of variables that form a partition of the entire underlying set of observed variables. The definition of CAI can be trivially extended to the other cases by the constraint that each triplet not forming a partition must have a valid extension to a partition while preserving its original statement. Relying on this constraint, (Bacchus & Grove, 1995) could show that CAI fulfills the properties of strong union and strong transitivity, that is, two axioms of graph-isomorphism that are not satisfied by rank-based conditional independence27. Beyond all those and other differences, utility theory (as well as the three other disciplines) is interested in normative rules for correctly and efficiently revising existing belief. And all the disciplines mentioned above recognized graphoids as an important clarification for graphical modeling of independence.

Not only are computational models for reasoning recognized as an interdisciplinary subject, but different scientific perspectives are also committed to some basic concepts developed for this task.

26Since CAI is a graph-isomorph, any utility function has a CAI-based network as a perfect map. 27For a discussion of this aspect, see page 118.

However, also the second of the two aspects of the touching of disciplines mentioned on page 84 was of crucial importance: the graphical implementation of conditional independence as separation.

The close relationship between conditional independence and separation revealed the facility of graphical models to implement the concept of independence of the underlying reasoning semantics (in case that semantics is a graphoid). In other words, the graphical implementation of independence is separation. Separation can be expressed in directed as well as in undirected graphs.

Especially for directed graphs, the concept of d-separation came into focus. Pearl introduced d-separation in (Pearl, 1986a) and (Pearl, 1985) without proof. The general validity of d-separation was first shown by Verma and (among other loci) published in (Verma & Pearl, 1988), which together with (Verma & Pearl, 1990) constitutes a full formal treatment of d-separation. A carefully motivated and illustrated discussion can also be found in (Jensen & Nielsen, 2001/2007). The insight that d-separation can implement independence opened the possibility to use graphical models in general for epistemic updating, since they provide us with the option to leave relationships between epistemic units that are judged irrelevant out of computation. The benefit is a reduction by magnitudes of the number of terms that have to be computed to complete the update. This was a general motivation for Bayesian networks.

Thus, whether a reasoning calculus can be implemented in a graphical model – as, for example, probability calculus can be implemented by Bayesian networks – depends on whether the calculus has an adequate notion of conditional independence. Pearl used probability calculus as the semantics for his considerations on graphical models of relevance, but the theory of graphoids is completely separate from and also independent of probability theory. While the theory of graphoids is an abstract theory about formal requirements of relevance, probability calculus can be seen as one particular and concrete implementation of a graphoid; in fact it can be judged as the best analyzed implementation so far. The details of building a graphical model for probability calculus are very insightfully discussed in (Neapolitan, 1990).

It is therefore interesting to see how the benefits of graphoid theory can be applied to ranking theory, since the fruitful work on the algorithmic parts of graphoid theory then also becomes feasible for ranking networks. Ranking theory defines appropriate notions of conditionality – for example by definitions 2.19, 2.26 and 2.34 – and also of conditional independence (by definition 2.48 and theorem 2.52). Spohn has already shown in (Spohn, 1988b) that the concept of conditional independence in ranking theory as defined in chapter II has the properties of a graphoid. Another proof was introduced by (Hunter, 1991a), and Spohn discusses the graphoid properties of rank-based conditional independence in his recent (Spohn, 2012, p. 127-133). Important properties without particular relation to graphical models of NRFs are explored in (Goldszmidt & Pearl, 1992a), in (Studený, 1995), and in the works of Wolfgang Spohn cited in the bibliography. Analyzing the facilities of graphical models for NRFs will be our main task in this chapter.


3.1.3 Measurability and Variables

To combine NRFs with data structures allowing us to model epistemic reasoning, we need the notion of a variable. This notion directly corresponds to random variables in probability theory; we do not refer to variables in the sense in which the notion is used in logic.

Definition 3.1 (Measurability) Let W be a space of possibilities and let a set of possibilities W′ be some value space for W. Let A be the propositional algebra over W and A′ ≠ A the algebra over W′. Let w be some element of W and let f : W → W′ be a function. Now, let f[A] be a symbol for the set {f(w) : w ∈ A} with A ∈ A, and let correspondingly f⁻¹[A′] be a symbol for the set {w : f(w) ∈ A′} with A′ ∈ A′. Then, function f is called A-A′-measurable iff for each A′ ∈ A′ it holds that f⁻¹[A′] ∈ A.

Informally, measurability means that for every B ∈ A′ and, hence, for every possible value a measurable function X can take, there exists a non-empty preimage A ∈ A. The consequence of measurability is that an expression like “X takes a value in B” is always a proposition in A if B ∈ A′. The same holds a fortiori for expressions like “Y takes value y”. Measurability is a crucial requirement since we want to talk about situations as sets of variables taking certain values while ensuring that we are in fact using propositions.

Definition 3.2 (A-measurability of κ) Let κ be an NRF on propositions for an algebra of propositions A over a set of possibilities W. If it holds for each n ∈ N∞ that κ⁻¹(n) ∈ A, function κ is said to be measurable in A.

For the formal correctness of all that follows, it is always silently presupposed that we consider measurable NRFs exclusively.

Definition 3.3 (Variable) Let W, V be sets of possibilities. Then, a measurable function X: W → V is called a variable.

Variables are denoted by roman upper case letters from the end of the alphabet, occasionally indexed with subscripts. We use the symbol cd(X) to denote the codomain of variable X. Hence it holds that X(w) ∈ cd(X) for all w in the domain of variable X. We say that a variable takes an actual value, and the actual values of variables are obviously propositions. As an example consider the proposition X⁻¹[B] = {w : X(w) ∈ B}, which expresses that the variable X takes a value in B. This means that the actual possibility that is assigned as a value to X makes B true. From this point of view, a proposition is also a kind of variable. Probabilists call this an indicator variable. We will often face the requirement to represent the proposition that a variable takes a particular actual value, e.g. X = x, which means that variable X takes the actual value x ∈ W′. The proposition that X = x is represented by the preimage of the variable X:

X⁻¹(x) = {w : X(w) = x}  (3.1)


In analogy to probability theory, a proposition of the form Xi = xi is also called a hypothesis. A variable is said to be instantiated if its actual value is known.

Example 3.4 Remember the example of the lottery (example 1.1 from page 29). An example of a variable is a function X that represents the value of the third number in the selected sequence of 6 numbers. If we consider 2, 32, 48, 21, 17, 41 as the current result of the selection, we find that the third number is 48; hence, X takes value 48. The proposition corresponding to this situation is {w : X(w) = 48}, which we abbreviate by X⁻¹(48) or, alternatively, by X = 48. Now X is instantiated and the hypothesis that X = 48 turns out to be true. □

Note that in this example, W and W′ are identical and, hence, A and A′ are identical. The reason for this is that the function X denotes a bare selection function on the given space W, where W consists of the natural numbers bigger than 0 and smaller than 50. Although the example is quite illustrative, it is in fact not legal since definition 3.1 excludes that A = A′. We will therefore use a better example.

Example 3.5 An example of a variable where W and W′ differ is the arithmetic mean value of all selected numbers aᵢ. Let

X := (1/n) · ∑ᵢ₌₁ⁿ aᵢ.

For the case of the sequence ⟨2, 32, 48, 21, 17, 41⟩ with n = 6, X will take 26.83… as actual value. Now W is the finite set of sequences with 6 elements of numbers bigger than 0 and smaller than 50. The codomain WX of X is a finite subset of ℝ⁺ with the smallest element 3.5 and the biggest element 46.5. The function X points from W to WX. We can further state that since X(⟨2, 32, 48, 21, 17, 41⟩) = 26.83… we also have ⟨2, 32, 48, 21, 17, 41⟩ ∈ X⁻¹(26.83…) and therefore the proposition X⁻¹(26.83…) is true. □

A set of variables X := {X₁, X₂, …, Xₙ} can be considered as a single variable that takes compound values. We will use the term “compound” or “compound variable” synonymously with “set of variables”. Compounds will be denoted by the same letters as non-compound variables but set in boldface. Non-compound variables will also be called “singletons” or “singleton variables”.

If for a singleton variable X a possibility x is an actual value, then for a compound variable X := {X₁, X₂, …, Xₙ} a concrete value x is a vector ⟨x₁, x₂, …, xₙ⟩. The symbol x therefore is short for the proposition {w : X₁(w) = x₁ ∧ X₂(w) = x₂ ∧ … ∧ Xₙ(w) = xₙ}. To denote actual values of compound variables, we use lower case letters just as we do for singleton variables, so the symbols xᵢ, xⱼ, …, xₙ denote the propositions X = xᵢ, X = xⱼ, and so forth.

Note that measurability ensures that for a variable X and a non-empty proposition A′ ∈ A′ there is always a non-empty preimage X⁻¹[A′] ∈ A, which means the set {w : X(w) = x} is non-empty. If it were allowed to be empty, this would entail that there are some w′ ∈ W′ that have no preimage in W, which is excluded by definition 3.1. Hence, a proposition like X = x

or X = x will never be contradictory and therefore will never be assigned the rank ∞. This is a crucial precondition for computation. Note that a term like “κ(X)” or “κ(V)” has no meaning since NRFs are defined on propositions, not on variables. The correct way to denote the intention behind the above is to use the symbols κ(X⁻¹(x)) and κ(V⁻¹(v)) or, in abbreviated form, κ(x) and κ(v). The latter is allowed since we can interpret the actual values x and v, without loss of generality, as any actual value in the codomain of the variables in question.

Example 3.6 Staying with the lottery example, consider the sequence of the 6 selected numbers as a compound variable X := {X₁, X₂, X₃, X₄, X₅, X₆} containing all variables that denote the result of the selection of a single number. Variable X₁ denotes the result of the first number selected, variable X₂ the result of the second selection, and so on. Let then the tuple x := ⟨2, 32, 48, 21, 17, 41⟩ be the actual value of the compound variable X. This denotes the hypothesis X₁ = 2 ∩ X₂ = 32 ∩ X₃ = 48 ∩ X₄ = 21 ∩ X₅ = 17 ∩ X₆ = 41. The tuples representing possible selections are the atoms of the underlying propositional algebra A over W. Again, the proposition X⁻¹(x) is true in accordance with the definition of X and the actual value x. □

Note that since compounds can be understood as sets of variables, it may be convenient to consider subsets of variable sets. We will therefore sometimes introduce a set of variables, say Y, as a subset of another compound: Y ⊂ X. Note that the relationship between the variables entails a relationship between their possible actual values: if Y ⊂ X then x ⊂ y, as can be understood from the following example.

Example 3.7 Although in the lottery a sequence of 6 numbers is selected as an actual value for the compound X, it may be convenient in some contexts to consider only the first three numbers that were actually selected. So we have X := {X₁, X₂, X₃, X₄, X₅, X₆} and Y := {X₁, X₂, X₃}. Now, considering the sequence x := ⟨2, 32, 48, 21, 17, 41⟩ as an actual value for X, this also specifies the actual value y := ⟨2, 32, 48⟩ for Y. Now the set y contains all vectors ⟨2, 32, 48, ·, ·, ·⟩ having their first three index positions instantiated with the values 2, 32 and 48, while the set x contains only the one vector with all six positions instantiated. It is therefore clear that x ⊂ y. □

We characterize the relationship between x and y in example 3.7 by saying that x induces y, since x also assigns actual values to all positions in the vector y. It is vital to the understanding of the remainder of the thesis how actual values for variables are used: the usual case will be that we consider all actual values occurring in formulas as arbitrary but fixed and, furthermore, as induced by the same compound v that assigns actual values to all singleton variables in the underlying compound V when considering propositions in AV. This leads to the convenience that we can always argue that, supposing V to be instantiated, all subsets of V are automatically instantiated coherently. This is a highly non-trivial presupposition and we will continue to state it explicitly in all theorems, lemmas and proofs. Where the statement is omitted in the argumentation, the actual values may nonetheless be considered as induced by a common compound.


3.1.4 Algebras Over Variable Sets

In example 3.5 it is obvious that the elements of W and WX form algebras, but there are also cases where this is not obvious. Starting with a given set of propositions and then confirming whether a given function is a variable can therefore be expensive. In many situations the function of interest is already known de facto, and hence it seems smarter to simply take those functions and turn them into variables by defining the codomain accordingly.

Definition 3.8 (Minimal Algebra AV over a Set of Variables V) Let V be a set of functions on a space of possibilities W. For each function X ∈ V let WX be the set of possible values that X can take and AX the algebra over WX. Let X⁻¹[B] := {w : X(w) ∈ B} for any X ∈ V and any B ∈ AX be the proposition that X takes some value in B. Then we define AV to be the smallest algebra that contains all X⁻¹[B] for each X ∈ V and each B ∈ AX.

This means that AV is the smallest algebra relative to which all functions in V are variables. The atoms of AV are the propositions that assign an actual value to each variable in V. Let for example V := {X, Y}; then the atoms of AV are X = x₁ ∩ Y = y₁, X = x₁ ∩ Y = y₂, X = x₂ ∩ Y = y₁, X = x₂ ∩ Y = y₂, and so on for each xᵢ in WX and each yⱼ in WY. So the codomain of V is the Cartesian product cd(X) × cd(Y) of the codomains of the variables in V. We therefore recognize that, without loss of generality, it holds that V⁻¹(v) = X⁻¹(x) ∩ Y⁻¹(y). For convenience, we may abbreviate the latter by v = x ∩ y. We will further usually abbreviate a Cartesian product of the codomains of variables simply by denoting the Cartesian product of the variable names, i.e. X × Y is shorthand for cd(X) × cd(Y). Hence, although compounds and singleton variables are formally different entities, they can for convenience be treated as if they were equivalent.

When we talk about variables it is convenient to start with the set of variables V in focus, which has the space W as domain and the algebra AV as the algebra of propositions over V. Hence, we can consider AV and it is ensured that all X ∈ V are variables. The alternative would be to start the consideration with W and then assume the relevant algebra A over W. The first method is more convenient because the focus of our interest is normally expressed by variables, not by possibility spaces.

Definition 3.9 (Marginal Negative Ranking Function) Let κ be an NRF for a propositional algebra AV over a set of variables V. Let V′ ⊂ V and let κ′ be an NRF for the algebra AV′. Let X ∈ V′ be a singleton variable and x an actual value for X, without loss of generality. If κ′(x) = κ(x) for each X ∈ V′, then κ′ is called the marginal NRF for AV′ relative to the (joint) NRF κ for AV.

Function κ′ is called “marginal” relative to κ since the variables in V \ V′ are “marginal” for the consideration of functions for AV′. Marginal NRFs can easily be computed from the underlying joint NRF as follows. Let X := {X₁, X₂, …, Xₙ} be a set of n non-compound variables on the same domain space of possibilities. Then the codomain of X consists of the vectors ⟨x₁, x₂, …, xₙ⟩.

Let Xᵢ be the variable whose rank is to be computed. Then κ(xᵢ) is the minimum rank over all possible actual vectors ⟨x₁, …, xᵢ₋₁, xᵢ₊₁, …, xₙ⟩ ∈ cd(X₁) × … × cd(Xᵢ₋₁) × cd(Xᵢ₊₁) × … × cd(Xₙ) for the compound {X₁, X₂, …, Xₙ} \ {Xᵢ}.

To ease notation, we will denote the vector ⟨x₁, …, xᵢ₋₁, xᵢ₊₁, …, xₙ⟩ by ⟨xᵢ, x₋ᵢ⟩. Stated informally, this means precisely the particular vector, but with position i having its value fixed. We then have²⁸:

κ(xᵢ) = min_{⟨xᵢ, x₋ᵢ⟩} κ(x₁ ∩ x₂ ∩ … ∩ xₙ).  (3.2)

Assembling the minimum min_{⟨xᵢ, x₋ᵢ⟩} over {X₁, X₂, …, Xₙ} \ {Xᵢ} means that all variables that are not Xᵢ run through all their possible values while Xᵢ is held constant.

Example 3.10 Let X := {X₁, X₂, X₃} be a set of variables with the possible values x, x̄ for X₁, y, ẏ for X₂ and z, ẑ for X₃. Then the rank κ(x), according to (3.2), resolves to

κ(X₁⁻¹(x)) = min{κ(x ∩ y ∩ z), κ(x ∩ ẏ ∩ z), κ(x ∩ ẏ ∩ ẑ), κ(x ∩ y ∩ ẑ)},

while the rank κ(x̄) resolves to

κ(X₁⁻¹(x̄)) = min{κ(x̄ ∩ y ∩ z), κ(x̄ ∩ ẏ ∩ z), κ(x̄ ∩ ẏ ∩ ẑ), κ(x̄ ∩ y ∩ ẑ)}. □

Example 3.10 illustrates that computation rule (3.2) also holds for the computation of the joint ranking function of a subset of the original set of variables. In particular, (3.2) will be used in chapter IV, especially for the proofs in section 4.5.1.
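To make computation rule (3.2) concrete, the following is a minimal sketch in Python; the dictionary representation of the joint NRF and the function name marginal_rank are assumptions made for illustration, not material from the thesis.

```python
# A toy joint NRF over three binary variables X1, X2, X3, given as a
# dictionary from atoms (value tuples) to ranks. The numbers are chosen
# arbitrarily, but so that the minimum over all atoms is 0.
joint = {
    ('x', 'y', 'z'): 0, ('x', 'y', 'zh'): 2,
    ('x', 'yd', 'z'): 1, ('x', 'yd', 'zh'): 3,
    ('xb', 'y', 'z'): 1, ('xb', 'y', 'zh'): 2,
    ('xb', 'yd', 'z'): 4, ('xb', 'yd', 'zh'): 0,
}

def marginal_rank(joint, positions, values):
    """Rank of the proposition that the variables at the given index
    positions take the given values: the minimum joint rank over all
    atoms agreeing with them, exactly as in computation rule (3.2)."""
    return min(rank for atom, rank in joint.items()
               if all(atom[i] == v for i, v in zip(positions, values)))

print(marginal_rank(joint, [0], ['x']))            # kappa(X1 = x) -> 0
print(marginal_rank(joint, [0, 1], ['xb', 'yd']))  # compound marginal -> 0
```

As the second call shows, the same minimization also yields joint marginals for subsets of the variables, mirroring the observation made after example 3.10.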

3.1.5 Graph-theoretic Preliminaries

Nearly all concepts in this short paragraph are standard material. They are introduced only to provide a lucid transition to the graph-theoretic part of our subject. For a comprehensive introduction to graph theory, consult (Diestel, 2010)²⁹.

Definition 3.11 (Graph) An ordered pair G := ⟨V, E⟩ of a non-empty finite set V and a set E ⊆ (V × V) \ {⟨v, v⟩ : v ∈ V} is called a graph.

Note that definition 3.11 ensures u ≠ v for any pair {u, v} ∈ E. This definition is equivalent to the class of simple graphs. Since we do not consider other graph classes, we use the notion of a graph interchangeably with “simple graph”. We denote a graph as shown above by ⟨V, E⟩ or by a single calligraphic upper case letter like G, with or without superscript or subscript. The elements of V are called vertices and the elements of E are called edges. Single vertices are denoted by lower case letters from the end of the alphabet (u, v, w, x, y, z). Sets of vertices are denoted by upper case letters in boldface like V. Edges are denoted with curly braces as in {u, v}.

²⁸ Thanks to Sven Kosub for pointing me to this notational convention.
²⁹ Definitions 3.11, 3.12 and 3.19 are taken quite directly from (Diestel, 2010). While writing section 3.1.5, (Diestel, 2010) was the most important source of inspiration in general.


Note that the symbol V denoting the set of vertices in the graph is deliberately chosen to be the same as the symbol V denoting the set of variables generating the algebra AV. In the later sections, we will see that in the remainder of this thesis each vertex in a graph represents a variable; in fact, the variables just are the vertices of a graph. Thus, there is no reasonable distinction between variables and vertices. Nonetheless, it will sometimes be sensible to restrict statements to just the vertices as graphical entities. Whether the elements of V are addressed as vertices or as variables is therefore implicit from the context. In the following, we will always silently suppose that V ∩ E = ∅ and that all elements of V as well as of E are pairwise distinct.

Definition 3.12 (Directed Graph) Let G := ⟨V, E⟩ be a graph. Let the two maps init : E → V and term : E → V be such that for each edge e := {u, v} ∈ E it holds that either init(e) = u and term(e) = v, or otherwise init(e) = v and term(e) = u; then G is said to be a directed graph.

The edges of a directed graph are also called directed edges. For graphs in general, we denote the fact that an edge is present between vertices v and u, i.e. {v, u} ∈ E, by v ∼ u. This edge may be directed or undirected. The complementary case {v, u} ∉ E, which means that no edge exists between the two vertices v and u, is denoted by v ≁ u. If it is especially required to respect the direction of the edges, we will denote the situation that e := {u, v} ∈ E with init(e) = u and term(e) = v by saying that e is a directed edge from u to v, denoted by ⟨u, v⟩. The statement ⟨u, v⟩ ∈ E is denoted by u → v in short. The situation ⟨u, v⟩ ∉ E is correspondingly denoted by u ↛ v. In the following, we will silently suppose that any directed graph in consideration contains at most one directed edge from any vertex u to a vertex v ≠ u. Note that this does not exclude that a different edge from v to u may nonetheless be present.

A graph for which no maps init and term are defined is called undirected³⁰. If we consider E as a binary relation between vertices, then, intuitively, an undirected graph is a graph for which the relation E is symmetric. It is obvious that an undirected graph can be generated by considering a directed graph and just dropping init and term.

Definition 3.13 (Undirected Graph of a Directed Graph) Let D := ⟨V, E⟩ be a directed graph. Then the undirected graph G′(D) := ⟨V, E′⟩ with

E′ := {{u, v} : ⟨u, v⟩ ∈ E}

is called the undirected graph of D.

Definition 3.14 (Complete and Empty Graphs)

1. An undirected graph ⟨V, E⟩ such that E := (V × V) \ {⟨v, v⟩ : v ∈ V} is called complete.

2. A directed graph ⟨V, E⟩ such that for each pair of distinct vertices {u, v} ⊆ V it either holds that u → v or that v → u is called complete.

³⁰ Note that in graph theory, the concept of a mixed graph exists. A mixed graph contains directed as well as undirected edges. We do not need this concept, so we will stick with directed and undirected graphs, supposing that any graph considered contains either only directed or only undirected edges.


3. A graph ⟨V, ∅⟩ is called empty.

Definition 3.15 (Subgraph and Induced Subgraph) A graph G′ := ⟨V′, E′⟩ is a subgraph of the graph G := ⟨V, E⟩ iff V′ ⊆ V and E′ ⊆ E ∩ (V′ × V′). We denote this by G′ ⊆ G, or by G′ ⊂ G in case V′ ≠ V. If it holds furthermore that E′ = E ∩ (V′ × V′), we call G′ an induced subgraph of G, and in this case we say that V′ induces the subgraph G′ of G. The induced subgraph G′ is then denoted by G(V′).

Informally, the main difference between a subgraph and an induced subgraph is that an induced subgraph ⟨V′, E′⟩ will contain each edge in E that connects any vertices in V′. A subgraph of G that is not an induced subgraph might miss some, or in fact all, of those edges.

Definition 3.16 (Complete Subset) Let G := ⟨V, E⟩ be a graph, S ⊆ V a subset of vertices and G(S) the subgraph of G induced by S. If G(S) is complete, then S is called a complete subset within G.

If the requirement of maximality is added to the definition of a complete subset, we obtain the notion of a clique³¹.

Definition 3.17 (Clique) Let G := ⟨V, E⟩ be a graph, C ⊆ V a subset of vertices and G(C) the subgraph of G induced by C. If G(C) is complete and there is no vertex v ∈ V such that C ∪ {v} also induces a complete subgraph of G, then G(C) is called a clique in G.

Intuitively, a clique is a complete subgraph of a graph that is not contained in any other complete subgraph of the same graph. In the case of directed graphs no assertion is made about the directions of the edges. Since cliques are complete subgraphs, it is convenient to just denote them as a set of vertices, e.g. C, with or without subscript.

Definition 3.18 (Walk) Let G := ⟨V, E⟩ be a graph and let Q := ⟨v₀, e₁, v₁, e₂, v₂, e₃, …, vₙ⟩ be a sequence of vertices and edges such that for each i with 1 ≤ i ≤ n it holds that eᵢ := {vᵢ₋₁, vᵢ} with eᵢ ∈ E, vᵢ ∈ V and v₀ ∈ V. We then call Q a walk in G.

Definition 3.19 (Path) A walk P := ⟨v₀, e₁, v₁, e₂, v₂, e₃, …, vₙ⟩ in a graph G such that vᵢ ≠ vⱼ for all i ≠ j is called a path in G. The number n is called the length of the path P.

We will also say that v₀ and vₙ are connected by P, and P is called a connection between v₀ and vₙ. The statement that P is a path of length k is occasionally abbreviated by calling P a k-path. Note that if P is a path in G, this trivially entails that P ⊆ G, and P can be interpreted as a subgraph of G.

Definition 3.20 (Directed and Undirected Paths) A path P containing the vertices v₁, v₂, …, vₙ such that for all i with 1 ≤ i < n it either holds that vᵢ → vᵢ₊₁, or it holds for all those i that vᵢ₊₁ → vᵢ, is called a directed path. A path that is not directed is called undirected.

³¹ Note that sometimes in the literature, the notion “clique” just denotes what we have called a “complete subset”, and what we call a clique is then called a “maximal clique”.


Intuitively, a directed path is a path that can be walked completely from v₁ to vₙ while passing every edge in the same direction. Note that a sequence of directed edges may form an undirected path; directed graphs can hence contain undirected paths.

If P is a directed path connecting two vertices v₁ and vₙ, we will therefore also say that P leads from v₁ to vₙ and denote this by v₁ ↦ vₙ. Being connected by a path (including paths of length 0) is obviously an equivalence relation, which means it is reflexive, symmetric, and transitive.

Definition 3.21 (Closed Path, Cycle, Polygon) Let C be a path from v₁ to vₙ with 2 ≤ n ≤ |V|. If and only if there exists an edge {v₁, vₙ} ∈ E, C is called closed. If and only if C is either

1. such that for all i with 1 ≤ i ≤ n − 1 it holds that vᵢ → vᵢ₊₁ and additionally vₙ → v₁, or, alternatively

2. such that for all those i it holds that vᵢ₊₁ → vᵢ and additionally v₁ → vₙ, or

3. an undirected graph and closed,

then C is called a cycle. A closed path that is not a cycle is called a polygon.

Definition 3.22 (Directed Acyclic Graph) A directed graph ⟨V, E⟩ that does not contain any cycles is called acyclic.

The notion “directed acyclic graph” is usually abbreviated as “DAG”. Note that definitions 3.21 and 3.22 allow directed acyclic graphs to contain polygons. As an example consider the DAG in figure 3.3 on page 97, which contains no cycles but 2 polygons, while the DAGs in figures 3.1 and 3.2 contain neither cycles nor polygons. There is some variation in the literature as to what is called a cycle. Sometimes, what we call a cycle is called a “directed cycle” and what we call a polygon is called an “undirected cycle”. The latter leads to a rather unintuitive terminology, since DAGs would then be allowed to contain undirected cycles although they are called “acyclic”. Diestel calls what we call a “closed path” a cycle (cf. (Diestel, 2010, section 1.3)).

Definition 3.23 (Incidence) A vertex v is called incident to each edge {u, v} ∈ E for any vertex u ∈ V.

A vertex incident to exactly one edge is called a terminal vertex.

Definition 3.24 (Adjacency, Neighborhood of a Vertex) Two vertices v and u such that {u, v} ∈ E are called adjacent (to each other). The set adj(v) := {w : {v, w} ∈ E} of vertices that are adjacent to v is also called the neighborhood of v and is denoted by adj(v).

Equivalently, two vertices u and v are called adjacent if and only if they are incident to the same edge, meaning they are directly connected to each other.

Definition 3.25 (Neighborhood of a Vertex Set) For a set of vertices S ⊂ V we call each vertex in the set adj(S) := (⋃{adj(w) : w ∈ S}) \ S adjacent to S. The set adj(S) is called the neighborhood of the vertex set S.


Definition 3.26 (Closure of a Vertex or a Vertex Set) Let v ∈ V be a vertex. We then call the set clsr(v) := adj(v) ∪ {v} the closure of v. Correspondingly, for a vertex set S the set adj(S) ∪ S is the closure of the vertex set S.

Definition 3.27 (Relationships between Vertices) Let ⟨V, E⟩ be a directed graph and let the vertices {u, v} ⊆ V with u ≠ v.

1. A vertex u such that u → v is called a parent of v. The set of parents {u ∈ V : u → v} of v is denoted by pa(v).

2. A vertex u such that v is a parent of u is called a child of v. The set of children {u ∈ V : v → u} of v is denoted by ch(v).

3. A vertex u such that u ↦ v is called an ancestor of v. The set of ancestors of v is denoted by an(v).

4. A vertex u such that v is an ancestor of u is called a descendant of v. The set of descendants of v is denoted by dn(v).

5. A vertex u that is not a descendant of v (and not equal to v) is called a non-descendant of v. The set of non-descendants V \ (dn(v) ∪ {v}) of v is denoted by nd(v).

Note that a vertex is not an ancestor, parent, child, descendant or non-descendant of itself. A vertex v such that ch(v) = ∅ is called a leaf vertex and, correspondingly, a vertex v such that pa(v) = ∅ is called a root vertex.

Definition 3.28 (Ancestral Set) For an undirected graph ⟨V, E⟩, a vertex set S ⊆ V such that for each v ∈ S it holds that adj(v) ⊆ S is called an ancestral set. For a directed graph, a set S is ancestral if and only if for each vertex v ∈ S it holds that an(v) ⊆ S.

The author borrowed the notion of an ancestral set from (Lauritzen, 1996, p. 6), where Lauritzen also states that the intersection of a collection of ancestral sets is itself an ancestral set, and thus for any subset W ⊆ V there is a smallest ancestral set S that contains W. We denote this smallest ancestral set by An(W).

Definition 3.29 (Relationships between Vertex Sets) Let ⟨V, E⟩ be a directed graph and let S ⊂ V be a set of vertices.

1. For a set S of vertices we call the set pa(S) := (⋃{pa(w) : w ∈ S}) \ S the parents of S.

2. For a set S of vertices we call the set ch(S) := (⋃{ch(w) : w ∈ S}) \ S the children of S.

Note that for the graphs we consider, this entails adj(v) = pa(v) ∪ ch(v).

Definition 3.30 (Component) Let G := ⟨V, E⟩ be a graph. Each equivalence class of V generated by the relation of connectedness is called a component of G.

Definition 3.31 (Connected Graph) A graph G that contains only one component is called connected.


Note that in a connected graph, there is always at least one path between any two vertices.

Definition 3.32 (Singly Connected Graph, Polytree) Let G := ⟨V, E⟩ be a graph. If there exists at most one path between any two distinct vertices {v, u} ⊆ V, then G is called singly connected. A graph that is connected and singly connected is called a polytree.

Note that a singly connected graph need not be connected (which seems quite counterintuitive). Equivalently to definition 3.32, a polytree is a connected DAG containing no polygons. The notion of a polytree was introduced by (Rebane & Pearl, 1989). Also note that in a polytree, for any two distinct vertices {v, u} ⊆ V there is exactly one path connecting u and v. For polytrees it is therefore unambiguous to say that vertices u and v have a distance of k if the path connecting u and v is of length k. The distinction between DAGs that are polytrees and those that are not is motivated by the algorithmic perspective on informational updates: updates on polytrees are algorithmically less complex than updates on DAGs that are not polytrees. A particular subtype of singly connected graphs is of special interest: polytrees also exist in the variant of trees.

Definition 3.33 (Tree) A directed polytree ⟨V, E⟩ in which each vertex has at most one parent vertex is called a tree.

Equivalently, polytrees are trees in which the direction of the edges does not matter. Note that for undirected graphs there is no sensible distinction between polytrees and an equivalent definition of trees. The removal of an arbitrary single edge from a singly connected graph increases the number of components of the graph. This is just another perspective on the property that each vertex is connected with each other vertex by at most one path. Graphs that contain edges that can be removed without increasing the number of components are correspondingly said to be “multiply connected”.

Definition 3.34 (Multiply Connected Graphs) Let G := ⟨V, E⟩ be a graph. If there exist at least two different paths between at least two vertices {u, v} ⊆ V, then G is called multiply connected.

We will distinguish these three types of DAGs: trees, polytrees, and multiply connected graphs. The DAG in figure 3.1 on page 97 is a tree (and a fortiori a polytree); the one in figure 3.2 is a polytree. Note that the graph in figure 3.2 does not contain any cycles or polygons. It is not a tree either. The DAG in figure 3.3 is multiply connected and contains 2 polygons.

Definition 3.35 (Moral Graph of a DAG) Let D := ⟨V, E⟩ be a DAG. For any distinct vertices u and w that are parents of v and not adjacent to each other, create an (undirected) edge {u, w}. Do so for every vertex v ∈ V and every pair of distinct parents of v. Let Em be the set of all edges created by this procedure. Create a set of (undirected) edges E′ := Em ∪ {{u, v} : ⟨u, v⟩ ∈ E}. Then the graph Gm(D) := ⟨V, E′⟩ is called the moral graph of D.


Figure 3.1: A tree. Figure 3.2: A polytree. Figure 3.3: A DAG.

A moral graph Gm(D) is an undirected graph of the directed input graph D containing additional edges. The modification operation described in definition 3.35 is called moralization³² of a graph. It ensures that for each vertex v there exists a clique in the moral graph that contains v as well as pa(v). This assertion can be interpreted semantically, as we will see later when passing on to definition 3.57 on page 106. Figure 3.4 shows the moral graph of the graph in figure 3.3.
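As an illustration of definition 3.35, here is a minimal Python sketch; the function name moral_graph and the edge representation are assumptions made for illustration only, not material from the thesis.

```python
from itertools import combinations

def moral_graph(vertices, directed_edges):
    """Moralize a DAG: connect every pair of parents of each vertex,
    then replace the directed edges by undirected ones (definition 3.35).
    Undirected edges are represented as frozensets {u, v}."""
    parents = {v: set() for v in vertices}
    for u, v in directed_edges:                 # edge u -> v
        parents[v].add(u)
    undirected = {frozenset(e) for e in directed_edges}
    for v in vertices:
        for p, q in combinations(sorted(parents[v]), 2):
            undirected.add(frozenset((p, q)))   # "marry" the parents
    return undirected

# A small DAG: a -> c and b -> c; a and b are non-adjacent parents of c.
edges = moral_graph({'a', 'b', 'c'}, [('a', 'c'), ('b', 'c')])
print(sorted(tuple(sorted(e)) for e in edges))
# -> [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Adding the marriage edge for every pair of parents, rather than only for non-adjacent ones, is harmless here: if two parents are already adjacent, the set representation simply absorbs the duplicate edge.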

Figure 3.4: Moral graph of graph 3.3. Figure 3.5: Missing chords.

Definition 3.36 (Chord) Let G := ⟨V, E⟩ be a graph and let P := ⟨v₀, e₁, v₁, e₂, v₂, …, vₙ⟩ be a path in G. If for some i, j with 0 ≤ i < j ≤ n there exists an edge {vᵢ, vⱼ} ∈ E such that {vᵢ, vⱼ} ∉ P, then this edge is called a chord for P and we say that P possesses a chord.

Definition 3.37 (Chordal Graph) A graph G such that every cycle of length strictly greater than k in G possesses a chord is called k-chordal.

The dashed lines in figure 3.5 represent the chords whose insertion would make the DAG a 3-chordal graph. Note that there may exist different sets of missing chords whose insertion yields different 3-chordal graphs. Since 3-polygons in graphs are also called triangles, and all polygons in 3-chordal graphs consist of, or just are, triangles, 3-chordal graphs are also called triangulated graphs, as for instance in (Neapolitan, 1990), (Rose et al., 1976) and (Tarjan & Yannakakis, 1984). A triangulated graph can obviously be obtained by inserting edges into a non-triangulated graph.

Definition 3.38 (Triangulated Graph, Triangulation of a Graph, Triangulation) A graph that is 3-chordal is called triangulated. A triangulated graph Gc := ⟨V, E ∪ F⟩ such that for some set E the graph G := ⟨V, E⟩ is not triangulated is called a triangulation of G. The operation of inserting F into G is called triangulation, and we equivalently say that we obtain Gc by triangulating G.

³² The funny name refers to the imaginative association that the parents of the vertex are to be formally connected by a “marriage”.


It should be mentioned that triangulated graphs are not the same as triangle graphs, which are usually called maximally planar graphs. A maximally planar graph, informally, is characterized by the property that it is impossible to add any edge to it without destroying its planarity. It can easily be shown that triangulated graphs and maximally planar graphs are different types of graphs: each complete graph containing more than 4 vertices instantiates a counterexample, since it is triangulated without being a triangle graph. An example is shown in figure 3.6. The other type of counterexample, a triangle graph that is not triangulated, is shown in figure 3.7.

Figure 3.6: K5, a triangulated graph that is not planar. Figure 3.7: A non-triangulated, maximally planar graph.

Triangulated graphs are of special importance for us because of the properties of their cliques. A triangulated graph with n vertices is known to have at most n cliques, as (Fulkerson & Gross, 1965) pointed out, whereas arbitrary graphs with n vertices may have up to 3^(n/3) cliques, as can be understood from (Moon & Moser, 1965). Since triangulated graphs are restricted to a linear number of cliques, their cliques are computationally tractable, as we will see in chapter IV. The other important property is that the cliques of a triangulated graph can be coherently ordered such that they form a tree.
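Triangulatedness itself can be recognized efficiently with the maximum cardinality search of (Tarjan & Yannakakis, 1984). The following Python sketch is one possible rendering of that test (the function name and the adjacency-dictionary representation are assumptions for illustration): it computes an elimination ordering by maximum cardinality search and then verifies that it is a perfect elimination ordering, which holds iff the graph is triangulated.

```python
def is_triangulated(adj):
    """adj: dict mapping each vertex to the set of its neighbors.
    Returns True iff the graph is 3-chordal (triangulated)."""
    # Maximum cardinality search: repeatedly number the vertex with the
    # most already-numbered neighbors.
    order, weight, unnumbered = [], {v: 0 for v in adj}, set(adj)
    while unnumbered:
        v = max(unnumbered, key=lambda u: weight[u])
        order.append(v)
        unnumbered.remove(v)
        for u in adj[v] & unnumbered:
            weight[u] += 1
    position = {v: i for i, v in enumerate(order)}
    # Verify a perfect elimination ordering: for each vertex, its
    # earlier-numbered neighbors minus the latest of them must all be
    # adjacent to that latest neighbor.
    for v in order:
        earlier = {u for u in adj[v] if position[u] < position[v]}
        if earlier:
            latest = max(earlier, key=lambda u: position[u])
            if not (earlier - {latest}) <= adj[latest]:
                return False
    return True

square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}  # 4-cycle, no chord
print(is_triangulated(square))                          # -> False
square[1].add(3); square[3].add(1)                      # insert a chord
print(is_triangulated(square))                          # -> True
```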

Definition 3.39 (Running Intersection Property) Let α be a total ordering of the elements of a set S := {E₁, …, Eₘ} of sets with cardinality m for which it holds that for each i with 2 ≤ i ≤ m there is a j < i such that

Eᵢ ∩ (E₁ ∪ … ∪ Eᵢ₋₁) ⊆ Eⱼ

holds. Then α is said to have the running intersection property, or RIP for short.

Element Eᵢ as in definition 3.39 is called a twig for the subset {E₁, …, Eᵢ}, and Eⱼ is called a branch for the twig Eᵢ. Note that in any ordering that has the RIP, for each i with 2 ≤ i ≤ m an integer j(i) with 1 ≤ j(i) ≤ i − 1 can be selected such that Ej(i) is a branch for the twig Eᵢ in {E₁, …, Eᵢ}. Applied to the set of cliques of a given graph, an ordering possessing the RIP induces a tree graph of the cliques. Each vertex in this graph represents a clique of the original graph. Such a tree is called a clique tree. A clique tree is a special case of a junction tree of the original graph.
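The RIP can also be checked directly. The following minimal Python sketch (the function name has_rip is a hypothetical choice for illustration) tests whether a given ordering of cliques has the running intersection property of definition 3.39.

```python
def has_rip(cliques):
    """cliques: a list of sets, read as an ordering E1, ..., Em.
    Returns True iff the ordering has the running intersection property."""
    for i in range(1, len(cliques)):
        seen = set().union(*cliques[:i])       # E1 ∪ ... ∪ E(i-1)
        overlap = cliques[i] & seen
        # Some single earlier clique (the branch) must contain the
        # whole overlap of the twig with its predecessors.
        if not any(overlap <= cliques[j] for j in range(i)):
            return False
    return True

# The two cliques of the chorded square from the previous sketch.
print(has_rip([{1, 2, 3}, {1, 3, 4}]))    # -> True
print(has_rip([{1, 2}, {3, 4}, {2, 3}]))  # -> False
```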


Definition 3.40 (Clique Tree) Let G := ⟨V, E⟩ be an undirected graph with the complete set of all cliques C(G) := {C₁, …, Cₚ} for some integer p > 0. (This implies for each 1 ≤ i ≤ p that Cᵢ ⊆ V and V = ⋃{Cᵢ : 1 ≤ i ≤ p}.) Let T(G) := ⟨C(G), E′⟩ be a tree such that for each vertex Cⱼ on the (unique) path that connects two distinct vertices Cᵢ and Cₖ in T(G) it holds that Cᵢ ∩ Cₖ ⊆ Cⱼ. Then T(G) is called a clique tree of G.

Note that the clique tree of a graph is not necessarily unique, which can be seen very easily. The defining property of a clique tree is that the ordering 1 … p has the RIP. This is also called coherence and implies that if two distinct cliques Cᵢ and Cₖ both contain a particular vertex v ∈ V, then all cliques on the unique path that connects Cᵢ and Cₖ in T(G) contain v as well.

Definition 3.41 (Decomposability, Decomposition, Decomposable Graph) The process of constructing a clique tree T(G) from a graph G is called decomposition of G, and we say that G is decomposed into the clique tree T(G). Graphs that can be decomposed into a clique tree are called decomposable.

Note that an ordering of the cliques of a graph that has the RIP can be established if and only if the graph is decomposable. It was first shown in (Beeri et al., 1983) that decomposability and triangulatedness are in fact equivalent: the decomposable graphs are exactly the triangulated graphs.

Theorem 3.42 (Beeri, Fagin, Maier and Yannakakis) A graph is decomposable if and only if it is triangulated.

We will therefore use the notions triangulated graph and decomposable graph fully interchangeably. This entails that an ordering of the cliques that has the RIP is possible if and only if the underlying graph is triangulated. Graphs can be generalized to hypergraphs.

Definition 3.43 (Hypergraph, Hyperedge) An ordered pair H := ⟨V, S⟩ such that V is a set of vertices and S ⊆ 2^V is an arbitrary subset of the power set of V is called a hypergraph. An element of S is called a hyperedge.

The set of vertices V of a hypergraph H is alternatively denoted by V(H) and called the context of the hypergraph (while its set of hyperedges S is also called the figure of the hypergraph). Intuitively, a hypergraph is a generalization of a graph such that an edge can connect arbitrarily many vertices. A graph as defined previously is then a hypergraph in which each hyperedge has cardinality 2.

Definition 3.44 (Hypertree) A hypergraph N := ⟨V, S⟩ such that an ordering of the elements of S can be established that has the running intersection property is called a hypertree.


3.2 Graphoids and Conditional Independence Among Variables

3.2.1 Conditional Independence Among Variables

Constructing a graphical model for epistemic updates requires an adequate notion of conditional independence in the paradigm that is to be modeled. In the previous chapter, we have already developed a concept of rank-based conditional independence for propositions. For further progress, this concept has to be extended to variables. As already suggested in the last section of the previous chapter, it is crucial to have a concept of conditional independence because it provides the basis for deciding whether the value of a particular variable has to be considered while updating a particular other variable. The problem of updating is made computationally feasible by ignoring everything that is not defined to be respected. The following notion of conditional independence for NRFs is defined in accordance with definition 7.5 in (Spohn, 2012, p. 130) and applies to variables.

Definition 3.45 (Conditional Independence Among Variables in κ) Let A be a propositional algebra over W and let κ be an NRF for A. Let X, Y, Z be disjoint variables over W. Then, if and only if for all propositions A ∈ AX and B ∈ AY and all atoms C of AZ it holds that A ⊥κ B | C, variable X is called independent of Y given, or conditional on, Z w.r.t. κ, denoted by X ⊥κ Y | Z.

This means precisely that X and Y are independent given Z if and only if in each case propositions about X and propositions about Y are independent from each other given any complete realization of the variables in Z. Concerning definition 3.45, Spohn remarks:

“(. . . ) Note that the reference to the atoms of AZ is essential here. It is quite open whether or not the independence of X and Y still holds if given less infor- mation about Z, i.e. disjunctions of atoms of AZ. (. . . ) Because of this reference, the generalization to the infinite case and non-atomic algebras AZ is not trivial. It is straightforward only for complete ranking functions.” (Spohn, 2012, p. 130)

Let V be a finite set of either propositions or variables and let X, Y and Z be three arbitrary disjoint subsets of V. An independency statement is an assertion of the form X ⊥ Y | Z. A function that assigns truth values to independency statements over a particular base set V we will call a dependency model. This notion is borrowed from (Pearl, 1988b, p. 91). Definition 3.45 makes any NRF a dependency model, since definition 2.48 provides us with a rule for testing the truth of any independency statement. Another quite obvious example of dependency models are probability distributions: combined with the probabilistic definition of conditional independence, each probability distribution is a dependency model. We will shorten the general notion “rank-based conditional independence” to RCI and use the symbol ⊥κ for the particular dependency model induced by a given NRF. The most important task in the remaining part of this chapter will be to analyze the relationship between the dependency models induced by RCI and certain kinds of graphical dependency models, to prove that RCI can be sufficiently implemented by graphical devices.


3.2.2 RCI is a Graphoid

Pearl and Paz analyzed the necessary and sufficient properties conditional independence models have to satisfy to be adequately representable in graphs. Their framework for the graphical representation of conditional independence models was introduced in (Pearl & Paz, 1985) and later comprehensively discussed and developed in (Pearl, 1988b). The relevant properties were first described independently by (Dawid, 1979) and (Spohn, 1980). Both publications present axioms that turned out to be equivalent to what Pearl and Paz proved to be a common basis for modeling epistemic irrelevance. Following Pearl and Paz, we introduce the notions of a semi-graphoid and a graphoid as abstract models of conditional independence.

Definition 3.46 (Semi-Graphoid) For a set of variables V a ternary relation ⊥ such that for all X, Y, Z, W ⊆ V it holds that:

∅ ⊥ ∅ | Z, and if X ⊥ X | Z, then X ⊥ Y | Z (trivial independence)
if X ⊥ Y | Z, then Y ⊥ X | Z (symmetry)
if X ⊥ Y ∪ W | Z, then X ⊥ Y | Z and X ⊥ W | Z (decomposition)
if X ⊥ Y ∪ W | Z, then X ⊥ Y | Z ∪ W (weak union)
if X ⊥ Y | Z and X ⊥ W | Z ∪ Y, then X ⊥ Y ∪ W | Z (contraction)

is called a semi-graphoid.

Concerning the intuitive interpretation of these axioms, one may refer directly to Pearl:

“The symmetry axiom states that in any state of knowledge Z, if Y tells us nothing new about X, then X tells us nothing new about Y. The decomposition axiom asserts that if two combined items of information are judged irrelevant to X, then each separate item is irrelevant as well. The weak union axiom states that learning irrelevant information W cannot help the irrelevant information Y to become relevant to X. The contraction axiom states that if we judge W irrelevant to X after learning some irrelevant information Y, then W must have been irrelevant before we learned Y. Together, weak union and contraction properties mean that irrelevant information should not alter the relevance of other propositions in the system; what was rele- vant remains relevant, and what was irrelevant remains irrelevant.” (Pearl, 1988b, p. 85)

Note especially that the properties (b) and (d) given in (Spohn, 1980, p. 77) are identical to symmetry and contraction. Property (c) is a generalized version of weak union and property (a) is a generalized trivial independence. The entire set is equivalent to the semi-graphoid properties. An additional property is of special interest: if a semi-graphoid satisfies it, it is called a graphoid.


Definition 3.47 (Graphoid) For a set of variables V a semi-graphoid ⊥ over V such that for all X, Y, Z, W ⊆ V it holds that:

if X ⊥ Y | Z ∪ W and X ⊥ W | Z ∪ Y, then X ⊥ Y ∪ W | Z (intersection)

is called a graphoid.

Informally, the intersection property states that if X can be made independent of the other variables by instantiating either of two different subsets, Z ∪ Y or Z ∪ W, then the intersection of these two subsets is also sufficient to make X independent of the other variables. Property (e) of (Spohn, 1980, p. 77) presents the intersection property in just a different formal guise. It is possible to connect ranking theory directly with this context, since RCI is a graphoid.

Theorem 3.48 (RCI is a graphoid) For each NRF κ for a propositional algebra AV over a set of variables V the relation ⊥κ is a graphoid.

An early proof was presented in (Hunter, 1991a, p. 496). A proof following a different strategy can be found in theorem 7.10 of (Spohn, 2012, p. 132). As Spohn points out in the comment on this theorem, there exists an interesting difference between NRFs and probability functions with respect to their graphoid nature. For a probability measure P to be a graphoid, it has to be strictly positive, which means P(A) > 0 for all A ∈ A with A ≠ ∅. Spohn underlines that an NRF κ, on the other hand, is not required to be regular for κ to be a graphoid. As a special case of conditional independence, we introduce the case X ⊥κ Y | ∅. In this case we are not required to know anything about X and Y to recognize them as being connected by an independency relation.

Definition 3.49 (Unconditional Independence Among Variables in κ) Let κ be an NRF for an algebra AV and let the variables X, Y ⊆ V. If it holds that X ⊥κ Y | ∅, the variable X is called unconditionally independent of Y w.r.t. κ, denoted by X ⊥κ Y.

Theorem 3.50 (Properties of Unconditional Independence among Variables) Let V be a set of functions over a possibility space W and κ an NRF for AV. Let X, Y, Z ⊆ V be variables and x any concrete value that X can take. Then the following holds:

∅ ⊥κ ∅, and if X ⊥κ X, then X ⊥κ V, (trivial independence)

if X ⊥κ Y, then Y ⊥κ X, (symmetry)

if X ⊥κ Y ∪ Z, then X ⊥κ Y, (decomposition)

if X ⊥κ Y and X ∪ Y ⊥κ Z, then X ⊥κ Y ∪ Z. (mixing)

Proof: The proofs of trivial independence, symmetry, and decomposition are analogous to the proof of theorem 3.48. The “mixing” property follows immediately from the semi-graphoid properties “weak union” and “contraction”. □


Theorem 3.50 is the rank-based part of theorem 7.6 from (Spohn, 2012, p. 130). This is a relevant addendum, since Pearl shows that the three properties of symmetry, decomposition and mixing constitute a complete axiomatization of unconditional independence in general. Unconditional independence is also called “marginal independence”, referring to the fact that in the term X ⊥ Y | Z the condition Z is kept fixed and therefore does not contribute to the independence relationship between the variables under consideration; the variable Z is therefore considered marginal. Note that the complete axiomatization of this special case marks a difference from conditional independence in general, since the graphoid properties are not a complete axiomatization of conditional independence and, a fortiori, not of RCI. Pearl and Paz conjectured completeness of the graphoid properties in (Pearl & Paz, 1985). A further analysis of the completeness problem was later issued in (Geiger & Pearl, 1989). Pearl repeated the conjecture in (Pearl, 1988b, p. 88, 131) but also argued that results from (Geiger, 1988) suggest it may turn out to be wrong. Indeed, it was later shown by (Studený, 1992) and (Studený, 2005) that conditional independence cannot have a finite and complete characterization of the form

X₁ ⊥ Y₁ | Z₁ ∧ … ∧ Xᵢ ⊥ Yᵢ | Zᵢ =⇒ Xᵢ₊₁ ⊥ Yᵢ₊₁ | Zᵢ₊₁ ∨ … ∨ Xₙ ⊥ Yₙ | Zₙ.

We therefore take notice of the fact that the graphoid properties are a valuable tool that has proven useful for practical purposes, but that they do not represent a complete axiomatization of RCI or of conditional independence in general.

3.2.3 Agenda

Having shown that ⊥κ is a graphoid, two questions arise:

1. Given an NRF κ, how do we obtain a sound graphical representation of it?

2. Given an actual graph G, can an NRF κ be deduced from it such that G is an adequate representation of κ?

The first question is obviously interesting because its answer directly affects the possibility of using graphical models for NRFs. The second question may seem artificial at first sight but is inherently relevant, since we need a method to compute ranks by operating only on the graph. If it is possible to derive the joint NRF κ(v) from the graph structure, we will always have access to the NRF the graph represents. The answers to both questions are definitely “yes”, as the remaining part of this chapter will show. For convenience, we will proceed without any formal distinction between variables and vertices but use the two notions interchangeably. We will occasionally also use the term “singleton variables” in the graphical context to refer to graph vertices.


3.3 Ranking Functions and Their Markov Graphs

3.3.1 D-Maps, I-Maps, and Markov Properties

We now turn to the first question and analyze how NRFs can be turned into adequate graphical models. As already stated informally, conditional independence can be transferred to the graphical level. In the case of undirected graphs, conditional independence is implemented by the conception of separation.

Definition 3.51 (Undirected Graphical Separation) Let X, Y, Z ⊆ V be disjoint non-empty subsets of vertices of an undirected graph G := ⟨V, E⟩. The vertex set Z is said to separate X and Y in G, denoted by X ⊥u Y | Z, iff each path between any vertex u ∈ X and any vertex v ∈ Y contains at least one vertex w ∈ Z.

Undirected graphical separation is obviously a graphoid. Instead of a proof, one can also consider the enlightening graphical illustration of the four non-obvious graphoid properties (decomposition, weak union, contraction, and intersection) that can be found in (Pearl, 1988b, p. 86). Undirected graphs can therefore be utilized as graphical models of a set of variables V and an NRF κ for the algebra AV generated by V. Each variable X ∈ V is represented by a vertex in the graph. The dependency relations between the variables are expressed by edges connecting the vertices of the graph; a directed edge from a vertex v to a vertex w is interpreted as a causal relationship insofar as w takes its actual value in some causal dependence on the value v takes.
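For illustration, the separation test of definition 3.51 can be rendered as a breadth-first search that is forbidden to enter Z: Z separates X and Y iff no Z-avoiding path reaches Y. A minimal Python sketch under these assumptions (the function name separates is hypothetical):

```python
from collections import deque

def separates(adj, X, Y, Z):
    """True iff Z separates X from Y in the undirected graph adj
    (a dict: vertex -> set of neighbors), i.e. iff every path from
    X to Y passes through Z (definition 3.51)."""
    frontier = deque(v for v in X if v not in Z)
    reached = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in Y:
            return False              # found an X-Y path avoiding Z
        for u in adj[v] - Z - reached:
            reached.add(u)
            frontier.append(u)
    return True

chain = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}}  # the path a - b - c
print(separates(chain, {'a'}, {'c'}, {'b'}))       # -> True
print(separates(chain, {'a'}, {'c'}, set()))       # -> False
```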

Definition 3.52 (D-map) Let V be a set of variables, κ an NRF for AV and G := ⟨V, E⟩ an undirected graph. G is called a dependency map, or D-map of κ for short, if it holds for any variables X, Y, Z ⊆ V that

X ⊥κ Y | Z =⇒ X ⊥u Y | Z.

If a graph G is a D-map of an NRF κ, then G is guaranteed to represent all independence relationships contained in κ. Despite its D-mapness with respect to κ, the graph G is allowed to contain additional independence relationships. Therefore, the deletion of edges in a D-map never affects its D-mapness.

Definition 3.53 (I-map, Markov field, Global Markov property) Let V be a set of variables, κ an NRF for AV and G := ⟨V, E⟩ an undirected graph. G is called an independence map, or I-map of κ for short, if it holds for any variables X, Y, Z ⊆ V that

X ⊥u Y | Z =⇒ X ⊥κ Y | Z.

Correspondingly, κ is called a Markov field of G iff G is an I-map of κ, and κ is said to satisfy the global Markov property w.r.t. G iff G is an I-map of κ.

The I-mapness of a graph G with respect to an NRF κ clearly does not ensure that all relations of independence induced by κ are also present in G. The I-mapness of G only ensures

that G does not contain independencies that are not already given in κ. Thus adding edges to an I-map G of κ does not destroy the I-mapness of G. Note that complete graphs are trivial I-maps and empty graphs are trivial D-maps. It has become clear that the following expressions are equivalent:

1. G is an I-map of κ.

2. κ is a Markov field of G.

3. κ satisfies the global Markov property w.r.t. G.

Let us remember why conditional independencies were introduced into the discussion: respecting relationships of conditional independence between variables decreases the amount of calculation necessary to perform epistemic updates. The more independencies are known, the more effectively the update can be computed. Thus, we are particularly interested in those I-maps that do not contain any edges that are not required to represent the independence relationships induced by κ.

Definition 3.54 (Minimal I-map, Markov graph) An undirected graph G := ⟨V, E⟩ is called a minimal I-map of an NRF κ if G is an I-map of κ and there is no edge ⟨X, Y⟩ ∈ E with X, Y ⊆ V such that ⟨V, E \ {⟨X, Y⟩}⟩ is also an I-map of κ. Equivalently, G is called a Markov graph of κ.

Definition 3.55 (Perfect Map) An undirected graph G is called a perfect map of an NRF κ iff G is an I-map of κ and G is also a D-map of κ.

Note that for a perfect map G of κ it holds that ⊥u = ⊥κ. Some authors also call perfect maps “perfect graphs”. Further note that a perfect map is a minimal I-map that is also a D-map.

Theorem 3.56 (Pairwise Markov property of graphs and NRFs) For each NRF κ for some algebra AV there exists a Markov graph G := ⟨V, E⟩ such that for all (singleton) variables X, Y ⊆ V it holds that

X ≁ Y ⇐⇒ X ⊥κ Y | V \ {X, Y}.

If this relationship holds for G and κ, then κ is said to satisfy the pairwise Markov property w.r.t. G.

Proof: A general proof of this theorem for dependency models satisfying symmetry, decomposition, and the graphoid property of intersection is given by (Pearl, 1988b, p. 139). Since it was already shown that the dependency model induced by κ is a graphoid, the postulated relationship holds a fortiori. □

Theorem 3.56 resembles theorem 3 of (Pearl & Paz, 1986, p. 361), which is also discussed in (Pearl, 1988b, p. 97). Pearl and Paz discuss the pairwise Markov property for graphoids in general; thus its proof need not be reproduced here in detail for the special case of RCI. (Pearl stresses that the weak union property is not necessary for the proof.) This theorem is an important foundation for all further work, since it enables us to rely on the fact that each NRF has a corresponding Markov graph. Thus, there exists a graphical model

for each NRF κ that is at least as strict in representing conditional independence relations as κ is. From theorem 3.56 a simple algorithm can be deduced that constructs a Markov graph from a list of independency statements of a given NRF. Let V be some set of variables and κ an NRF for AV.

1. Start with the complete graph G := ⟨V, E⟩, i.e. E := (V × V) \ {⟨X, X⟩ : X ∈ V}.

2. For each pair of variables {X, Y} ⊆ V for which it holds that X ⊥κ Y | V \ {X, Y}, delete the corresponding edge ⟨X, Y⟩ from the graph G.

3. Return the resulting graph, which is a minimal I-map of κ.

This naive algorithm is obviously of linear time complexity if the set of conditional independence relations used in step 2 is already known: let q be the number of independence relations among V and c the cost of deleting an edge from G; then step 2 produces a total cost of qc, while the costs of the other steps are constant and hence negligible. In this scenario, we naturally assume that the relevant list of conditional independence relationships induced by κ is already known. Thus, at this point, it can already be stated that a given NRF always induces a graphical representation in the form of a Markov graph. This means the answer to the first question raised on page 103 at the end of section 3.2 is “yes”; a sketch of the construction is given below. Before we proceed to the analysis of the second question, whether or not NRFs can be deduced from given graphs, it is interesting to show which properties I-maps of NRFs have and how those properties are related to each other.
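The following Python sketch renders this naive construction; the oracle interface and all names are assumptions made for illustration, with the rank-based independence test of definition 3.45 supplied by the caller.

```python
from itertools import combinations

def markov_graph(variables, independent):
    """Construct a minimal I-map (Markov graph) from an independence
    oracle: independent(X, Y, Z) must return True iff X is independent
    of Y given the set Z under the NRF in question."""
    # Step 1: the complete graph on the variables.
    edges = {frozenset(p) for p in combinations(variables, 2)}
    # Step 2: delete every edge whose endpoints are independent given
    # all remaining variables.
    for X, Y in combinations(variables, 2):
        rest = set(variables) - {X, Y}
        if independent(X, Y, rest):
            edges.discard(frozenset((X, Y)))
    return edges  # Step 3: a minimal I-map of the NRF.

# Toy oracle: X1 and X2 are independent given X3; all other pairs depend.
oracle = lambda X, Y, Z: {X, Y} == {'X1', 'X2'}
print(sorted(tuple(sorted(e))
             for e in markov_graph(['X1', 'X2', 'X3'], oracle)))
# -> [('X1', 'X3'), ('X2', 'X3')]
```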

3.3.2 Markov Blankets and Markov Boundaries

For each variable X we can identify the set of variables that have a direct influence on the actual value of X and are thus considered relevant for X. We will now introduce an operational definition of relevance and then transfer it to the graphical model.

Definition 3.57 (Markov blanket, Markov boundary) Let V be a set of variables, X ⊆ V a (compound or singleton) variable and W ⊆ V a set of variables with X ∩ W = ∅. The set W is called a Markov blanket of X iff it holds that:

X ⊥κ V \ (W ∪ X) | W.

The set W is called the Markov boundary of X iff there is no subset W′ ⊂ W which is also a Markov blanket of X. The Markov boundary of X in κ is denoted by bdryκ(X). Note that it trivially holds that X ⊥κ ∅ | V \ X. Therefore, each variable has a Markov blanket and thus a Markov boundary. Furthermore, each variable indeed has a unique Markov boundary. Remember the moralization of a graph as defined on page 96: moralization is intuitively the operation that connects each vertex with its Markov blanket.


Theorem 3.58 (Variables have unique Markov boundaries) For a set of variables V and an NRF κ for AV, each singleton variable X ∈ V has a unique Markov boundary. For a minimal I-map G of κ, the set bdryκ(X) coincides with the set adj(X), so bdryκ(X) = bdryG(X) = adj(X).

For the general proof for dependency models with the appropriate properties, consult theorem 4 in (Pearl, 1988b, p. 97, 141). Theorem 3.58 holds since RCI was already shown to be a graphoid. Pearl stresses that the contraction property is not required for the proof; consequently, not only graphoids fulfill theorem 3.58 but also so-called pseudo-graphoids, which are dependency models that satisfy the symmetry, decomposition, weak union, and intersection properties without being required to satisfy the contraction property. Since RCI was already shown to be a graphoid, it is a fortiori a pseudo-graphoid. We have defined the closure of a vertex in a graph as clsr(X) := X ∪ adj(X). Equivalently, we define the closure of a variable X w.r.t. κ by the set clsr(X) := X ∪ bdryκ(X).

Theorem 3.59 (Local Markov property of NRFs) Let V be a set of variables and κ an NRF over the algebra AV. Let G := ⟨V, E⟩ be an undirected graph. Then it holds for each variable X ⊆ V that

X ⊥κ V \ clsr(X) | adj(X)

iff G is an I-map of κ. If this relationship holds for G and κ, then κ is said to satisfy the local Markov property w.r.t. G.

Proof: The “only if”-part follows directly, assuming the I-mapness of G w.r.t. κ. Remember that for G, I-mapness w.r.t. κ is equivalent to κ satisfying the global Markov property w.r.t. G. It is thereby shown that the global Markov property implies the local Markov property. The proof of the remaining direction is postponed: the proofs of lemmata 3.60 and 3.61 together form an indirect proof of it.

This is the rank-based version of theorem 4 of (Pearl & Paz, 1986, p. 361). A more detailed discussion is contained in (Pearl, 1988b, p. 98, theorem 5.iii). In the remainder of the analysis, the following notation will be used: if an NRF κ satisfies the global Markov property w.r.t. a graph G, we will denote this simply by writing (G). Equivalently, we will use (L) to denote the presence of the local Markov property and (P) for the pairwise Markov property. It is an important insight that the three Markov properties are equivalent for NRFs, as will be shown immediately, following the corresponding proofs for probability measures in (Lauritzen, 2002) and (Lauritzen, 1996, p. 33f).

Lemma 3.60 For any NRF κ for A(V) and any undirected graph G := ⟨V, E⟩ the following implication holds: (G) =⇒ (L) =⇒ (P).

Proof: Since the proof of theorem 3.59 already contains (G) =⇒ (L) it remains to be shown that (L) =⇒ (P).


Assume that (L) holds. Consider any vertex Y ∈ V \ clsr(X), which is thus non-adjacent to X. It obviously holds for Y that:

adj(X) ∪ ((V \ clsr(X)) \ {Y}) = V \ {X, Y}.

Then, as a direct consequence of (L), it follows:

X ⊥κ V \ clsr(X) | V \ {X, Y}.

Further, since Y ∈ V \ clsr(X), it holds a fortiori that

X ⊥κ Y | V \ {X, Y}, which is (P).

Lemma 3.61 For any NRF κ for A(V) and any undirected graph G := ⟨V, E⟩ the following implication holds: (P) =⇒ (G).

Proof: Let X ≠ ∅ ≠ Y without loss of generality. Assume further that a set of variables S separates X from Y in G and that (P) holds. The proof is a backward induction on the number of vertices z := |S| in S. Induction starts with the case of z = |V| − 2. In this case both X and Y obviously contain exactly one vertex. It then follows from (P) that X ⊥κ Y | V \ {X, Y} and thus (G) is shown. We therefore assume z < |V| − 2 and furthermore that separation implies conditional independence for all sets Si which separate X and Y and contain more than z elements. We now distinguish two cases:

1. X ∪ Y ∪ S = V

2. X ∪ Y ∪ S ⊂ V

Case 1: The equivalence V = X ∪ Y ∪ S obviously implies that at least one of the sets X and Y has more than one element. Let this be X, without loss of generality. Now, for any singleton variable Z ∈ X, the set S ∪ {Z} separates X \ {Z} from Y, and also S ∪ (X \ {Z}) separates {Z} from Y. Hence, by the induction hypothesis

X \ {Z} ⊥κ Y | S ∪ {Z}  and  Z ⊥κ Y | S ∪ (X \ {Z}).

Applying the intersection property (Definition 3.47, page 102) yields X ⊥κ Y | S and thus (G) is shown.

Case 2: If X ∪ Y ∪ S ⊂ V, we choose without loss of generality a singleton variable Z ∈ V \ (X ∪ Y ∪ S).


Then it follows that S ∪ {Z} separates X and Y, implying

X ⊥κ Y | S ∪ {Z}. (3.3)

Again, we can distinguish two cases:

1. The set X ∪ S separates Y from Z. In this case we obtain

Z ⊥κ Y | X ∪ S. (3.4)

Applying the intersection property to (3.3) and (3.4), we derive that X ⊥κ Y | S, and thus (G) is shown.

2. The set Y ∪ S separates X from Z. This case resolves analogously by application of the intersection property. 

We have shown that the following implication holds:

(G) =⇒ (L) =⇒ (P) =⇒ (G).

Note that this also implies the equivalence of the three properties.

Theorem 3.62 (Pearl & Paz) For a given NRF κ and an undirected graph G, the following statements are equivalent:

1. κ satisfies the global Markov property w.r.t. G.

2. κ satisfies the local Markov property w.r.t. G.

3. κ satisfies the pairwise Markov property w.r.t. G.

It is obvious that theorem 3.62 is a direct consequence of lemmata 3.60 and 3.61; therefore no separate proof is necessary. On page 103 at the end of section 3.2, two questions were raised. The first question, whether NRFs have adequate graphical models, was already answered positively: each NRF has a Markov graph. The other question was whether given graphs can be assigned an NRF such that the graph is a representation of the NRF. We will consider this question now.

3.4 Ranking Functions and Given Markov Graphs

3.4.1 Potential Representation of Negative Ranking Functions

As was already pointed out, it would be very convenient to find a formal representation of an NRF that can be derived directly from the graph structure, because this would enable us to compute the joint rank value κ(v) directly from the graph. For this, we will consider V as the union of a set of subsets and concentrate on those subsets.


Definition 3.63 (Potential Representation, Potential Function) Let V be a finite set of variables and let κ be an NRF for the algebra A(V). Let {Wi : 1 ≤ i ≤ p} be a set of subsets of V such that V = ⋃{Wi : 1 ≤ i ≤ p}, and let ψi be a function from the set of possible values of Wi into N∞. Let further ψ := {ψi : 1 ≤ i ≤ p}. A possible compound value for a set Wi is w.l.o.g. denoted by wi. The representation ⟨{Wi : 1 ≤ i ≤ p}; ψ⟩ of κ is called a potential representation of κ iff for each compound value v := w1 ∩ w2 ∩ ... ∩ wp that V can take it holds that

κ(v) = K + ∑_{i=1}^p ψi(wi) (3.5)

where

K := − min_{1≤i≤p} ψi(wi) (3.6)

is a normalization constant. In this case, the functions ψi are called potential functions. The normalization constant K simply ensures that κ, if defined by a potential representation, conforms to the definition of an NRF, i.e. that it may take 0 as an actual value. The constant K is added to adjust the range of possible rank values such that its actual minimum is ensured to be 0.

Note that the symbol ψ is interpreted as a set containing for each i exactly one ψi corresponding to Wi ⊆ V. Note also that it is not required by definition 3.63 that the subsets Wi of V are pairwise disjoint. Thus, any singleton variable X contained in at least one element of {Wi : 1 ≤ i ≤ p} may also be contained in any other of the variable sets in {Wi : 1 ≤ i ≤ p}. It is therefore a requirement on the definition of ψi to ensure that variables being elements of different sets Wi, Wj, ... are respected only once while computing the sum ∑_{i=1}^p ψi(wi). Thus, whether a valid potential representation of κ can be constructed depends entirely on the definition of the functions ψi. A naïve definition of ψi as, for instance, ψi(wi) := κ(wi) will obviously not be sufficient.

Definition 3.64 (Residua and Separators) Let V be a finite set of variables and let κ be an NRF for the algebra A(V). Let {Wi : 1 ≤ i ≤ p} be a set of subsets of V. For 1 ≤ i ≤ p let the sets Ri and Si be defined as follows:

Si := Wi ∩ (W1 ∪ ... ∪ Wi−1)

Ri := Wi \ Si.

Then, the set Ri is called residuum of the set Wi and the set Si is called separator of the set Wi.

Note that Si is the set of variables that, for each 1 < i ≤ p, separates the residuum Ri of the subset Wi from the residua of all other subsets Wj with 1 ≤ j < i. Some authors prefer the terms “residual set” and “separator set” instead. Definition 3.63 can provide us with a blueprint for deriving an NRF from a given graph. But this requires developing two further conceptions: first, we need a method to deduce the set of subsets of V from the graph structure; second, we need an appropriate way to define the functions ψi such that the resulting function is an NRF.
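As an illustration of definition 3.64, the following Python sketch computes the separators and residua of an ordered list of subsets; the encoding as Python sets is, of course, only a hypothetical convenience.

```python
def separators_and_residua(subsets):
    """Compute the separator S_i and residuum R_i for each subset
    W_i in the given order (definition 3.64). `subsets` is a list
    of sets of variable names."""
    result = []
    seen = set()  # the union W_1 ∪ ... ∪ W_{i-1}
    for w in subsets:
        s = w & seen   # separator S_i
        r = w - s      # residuum R_i
        result.append((s, r))
        seen |= w
    return result

# Example with W_1 = {A, B}, W_2 = {B, C}, W_3 = {C, D}:
# separators ∅, {B}, {C}; residua {A, B}, {C}, {D}.
print(separators_and_residua([{"A", "B"}, {"B", "C"}, {"C", "D"}]))
```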


The first requirement will be addressed in the subsequent section. To address the second requirement, some intermediate work is to be done. We will therefore postpone the analysis of ψ to section 3.6.6.

3.4.2 Representations of Negative Ranking Functions by Markov Graphs

Considering an undirected graph G := ⟨V, E⟩, we note that the cliques of G are a complete set of subsets of V as it is used by definition 3.63. Thus, we can generate a complete set of subsets of V by enumerating the cliques of the graph. Pearl made the observation that a probability distribution factorizes over the cliques of a graph G in a special way if and only if G is triangulated. Pearl describes this in theorem 8 in (Pearl, 1988b, p. 115). He calls the resulting representation “product representation”. A rank-based version of his observation is as follows.

Theorem 3.65 (Additive Clique Representation of κ) Let G := ⟨V, E⟩ be an I-map of κ and let G be triangulated. Let C1, C2, ..., Cp be the cliques and S1, S2, ..., Sq the corresponding separators of the cliques in G. Let v w.l.o.g. be an actual value for V, which induces the actual values ci for all Ci and a fortiori the actual values sj for all separators Sj. Then it holds that:

κ(v) = ∑_{i=1}^p κ(ci) − ∑_{j=1}^q κ(sj),

which is called an additive clique representation of κ.

Proof: Let T be the clique tree of G. Since G is an I-map of κ, T is also an I-map of κ. Let C1, C2, ..., Cp be an ordering of the cliques of G such that for every i with 1 < i ≤ p we have a unique integer j(i) < i such that Cj(i) is the parent of Ci in T.³³ Then clearly, Cj(i) separates Ci from C1, C2, ..., Ci−1 in any such ordering. Applying the chain rule formula to the cliques, we obtain

κ(c1 ∩ c2 ∩ ... ∩ cp) = ∑_{i=1}^p κ(ci | c1 ∩ c2 ∩ ... ∩ c_{i−1})

and because of the I-mapness of T w.r.t. κ this resolves to:

= ∑_{i=1}^p κ(ci | cj(i)).

Because of the I-mapness of G it furthermore holds that all variables in Ci are separated from all variables Xi ∉ Cj(i) ∩ Ci by the set Cj(i) ∩ Ci. Or, for short, Cj(i) ∩ Ci separates Ci and V \ (Cj(i) ∩ Ci). It therefore holds that

∑_{i=1}^p κ(ci | cj(i)) = ∑_{i=1}^p κ(ci | ci ∩ cj(i))

33 Definition 3.70 on page 121 will introduce this property as consonance.

which resolves to

∑_{i=1}^p κ(ci | ci ∩ cj(i)) = ∑_{i=1}^p κ(ci | sj(i))
 = ∑_{i=1}^p (κ(ci ∩ sj(i)) − κ(sj(i)))
 = ∑_{i=1}^p κ(ci ∩ sj(i)) − ∑_{j=1}^q κ(sj)
 = ∑_{i=1}^p κ(ci) − ∑_{j=1}^q κ(sj).

The additive clique representation shows a way to compute the joint negative rank for hypotheses over V for the class of triangulated Markov graphs. Note that this property is useful in the same contexts in which the chain rule is useful since it makes it possible to sum up the results of local computations to the joint rank value. This idea can be generalized to non-triangulated Markov graphs by defining how a potential representation of the NRF in question can be obtained by local computations on cliques.
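A small Python sketch may illustrate the additive clique representation, under the assumption that marginal rank tables for the cliques and separators are already available; the table layout is hypothetical.

```python
def additive_clique_rank(clique_ranks, separator_ranks, v):
    """Evaluate kappa(v) via theorem 3.65: the sum of the clique
    marginal ranks minus the sum of the separator marginal ranks.
    Each entry of `clique_ranks`/`separator_ranks` is assumed to be
    a pair (scope, table), where scope is a tuple of variable names
    and table maps a tuple of their values to a rank; `v` maps each
    variable name to its actual value."""
    def lookup(scope, table):
        return table[tuple(v[x] for x in scope)]
    return (sum(lookup(s, t) for s, t in clique_ranks)
            - sum(lookup(s, t) for s, t in separator_ranks))
```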

Theorem 3.66 (Given Graphs have Markov fields) Let G := ⟨V, E⟩ be an undirected graph and let {Ci : 1 ≤ i ≤ p} be the complete set of cliques of G. Let all the corresponding actual values ci be induced by the joint actual value v for V. Then define an NRF κ by a potential representation as follows:

κ(v) := K + ∑_{i=1}^p ψi(ci)

with the normalization constant K defined as by (3.6) and such that

min_{v ∈ cd(V)} ∑_{i=1}^p ψi(ci) ≠ ∞.

Then, G is an I-map of κ.

Proof: Theorem 3.62 says that the three Markov properties are equivalent for NRFs. Thus, to show the I-mapness of G it is sufficient to prove that κ satisfies one Markov property w.r.t. G. We prove that κ satisfies the local Markov property w.r.t. G. Therefore, it must be shown for each X ⊆ V that

X ⊥κ V \ clsr(X) | bdry_κ(X).

Let now Y := V \ clsr(X) and Z := bdry_κ(X) = adj(X). We will presuppose that v induces each actual value w of any W ⊂ V. Then, due to definition 2.48, w.l.o.g. for any realizations x, y, z the corresponding variables can take, the local Markov property is equivalent to

κ(x ∩ y | z) + κ(x̄ ∩ ȳ | z) = κ(x ∩ ȳ | z) + κ(x̄ ∩ y | z)

which, by applying definition 2.19, resolves to

κ(x ∩ y ∩ z) + κ(x̄ ∩ ȳ ∩ z) = κ(x ∩ ȳ ∩ z) + κ(x̄ ∩ y ∩ z). (3.7)

The proof is completed when it is shown that (3.7) holds. Following the definition in theorem 3.66, we assume that the factorization property holds:

κ(x ∩ y ∩ z) = κ(v)
 = K + ∑_{i=1}^p ψi(ci)
 = K + ∑_{i: X∈Ci} ψi(ci) + ∑_{i: X∉Ci} ψi(ci).

From now on, we leave out the normalization constant K for simplicity. We now define the set T := ⋃_{i: X∉Ci} Ci. Note that Y ⊆ T. Note that each vertex adjacent to X can be contained or not be contained in T, which means T may or may not contain Z. Furthermore, in general it holds that Y ⊆ T ⊆ V \ X = Y ∪ Z. Let now S := T \ Y and further, for any value x the variable X can take and realizations z and s:

∑_{i: X∈Ci} ψi(ci) =: φ1(x, z)

∑_{i: X∉Ci} ψi(ci) =: φ2(y, s)

which leads to

κ(x ∩ y ∩ z) = φ1(x, z) + φ2(y, s).

Each of the four terms κ(x ∩ y ∩ z), κ(x̄ ∩ ȳ ∩ z), κ(x ∩ ȳ ∩ z) and κ(x̄ ∩ y ∩ z) occurring in

(3.7) can be expressed using φ1 and φ2:

κ(x ∩ y ∩ z) = φ1(x, z) + φ2(y, s) (3.8)
κ(x̄ ∩ ȳ ∩ z) = φ1(x̄, z) + φ2(ȳ, s) (3.9)
κ(x ∩ ȳ ∩ z) = φ1(x, z) + φ2(ȳ, s) (3.10)
κ(x̄ ∩ y ∩ z) = φ1(x̄, z) + φ2(y, s) (3.11)

Since z and s remain constant, the sum of (3.8) and (3.9) obviously equals the sum of (3.10) and (3.11), thereby implying (3.7). Thus, the local Markov property is shown.³⁴

Since the potential representation of an NRF can be defined on arbitrary graphs, each graph obviously has a Markov field. This means that for each graph G, an NRF κ can be defined such that G is an I-map of κ and, equivalently, κ is a Markov field of G.

34 Thanks to Wolfgang Spohn for pointing me to the strategy of using Y, Z and S for defining φ1 and φ2.


Furthermore, Markov fields obviously always have a potential representation, since such a representation can be derived from the cliques of G. This insight can equivalently be expressed in the following way: if an NRF κ has a potential representation based on the cliques of a graph G, then G is an I-map of κ. If an NRF κ has a representation as given in theorem 3.66, the relationship between κ and G can also be expressed by saying that κ factorizes according to G, as (Lauritzen, 1996) does (cf. ibid. p. 34). The author would prefer to say instead that κ factorizes according to the cliques of G, but we will mostly continue with the already established short form of factorization property. In analogy to symbolizing the Markov properties by the upper-case letters (G), (L) and (P) in parentheses, we will denote the factorization property by the symbol (F). We have already proven that (G), (L) and (P) are equivalent. The proof of theorem 3.66 – while in fact showing that (F) =⇒ (L) holds – also implies that (F) =⇒ (G) holds. The description of the relationship between (F) and the three Markov properties can be completed by additionally proving that at least one of the Markov properties implies (F), such that the equivalence of (F) with the Markov properties is proven. There exists a corresponding definition of potential representations in probability theory. To be precise, a probability function with positive and continuous density with respect to some product measure can be represented by factorizing over the cliques C1, C2, ..., Cp of a graph G: a probability distribution

P(x) = K · ∏_{i=1}^p ψi(ci)

with the normalization factor K := (∑_x ∏_{i=1}^p ψi(ci))^{−1} is called a Gibbs random field or Gibbs distribution. For probability theory, the theorem of Hammersley and Clifford states that a probability distribution P is a Gibbs distribution with respect to a graph G if and only if P satisfies the three Markov properties with respect to G – based on the probabilistic notion of conditional independence, of course. Thus the connection between the potential representation of κ and its fulfillment of the Markov properties w.r.t. G can be made even tighter by the following theorem.

Theorem 3.67 (Hammersley-Clifford for Ranks) An NRF κ satisfies the Markov properties w.r.t. an undirected graph G iff it factorizes according to the cliques of G.

Proof: Since we have already shown that (G) =⇒ (L) =⇒ (P) in the proof of lemma 3.60 and (F) =⇒ (G) in the proof of theorem 3.66, it remains to be shown that (P) =⇒ (F) to prove the theorem. Hence, it is to be shown that for any NRF κ satisfying (P) there exists a graph G such that κ has a representation

κ(v) := K + ∑_{i=1}^p ψi(ci)

with respect to the actual values ci of all the cliques C1, C2, ..., Cp of G and the actual value v of V (which induces all the ci).


We assume that κ satisfies the pairwise Markov property. Now let

T := V \ W

for each subset W ⊆ V, without loss of generality. We choose an arbitrary but fixed actual compound value t* for T. Let further w be an actual value of W such that w ∩ t* = v and w ∩ t* in fact represents an actual instantiation of V. We now define the following function H for each W ⊆ V:

∀W ⊆ V : H_W(v) := κ(w ∩ t*). (3.12)

Note that the actual values of all variables in T are fixed by the vector t* while the variables in W are assigned the actual value w. Since t* is fixed, the value of κ(w ∩ t*) depends on V only via W. Note that (3.12) implies that

H_V(v) = κ(v) (3.13)

since W := V entails T = ∅. We further define for each W ⊆ V the following function, whereby s, the actual value of the corresponding compound S, is induced by the corresponding value w:

∀W ⊆ V : φ_W(v) := ∑_{S: S⊆W} (−1)^{|W\S|} H_S(v) (3.14)

To obtain a definition of H_V that is only based on φ, we apply the Möbius inversion lemma to the definition of φ_W. The Möbius inversion lemma allows us to use the following equivalence:

∀W ⊆ V : φ_W(v) = ∑_{S: S⊆W} (−1)^{|W\S|} H_S(v) ⇐⇒ H_V(v) = ∑_{W: W⊆V} φ_W(v). (3.15)
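Before continuing the derivation, a quick numeric check of the Möbius inversion equivalence (3.15) on a toy set function may be helpful; everything in this snippet is hypothetical test data.

```python
from itertools import combinations

def subsets(w):
    """All subsets of the set w, as frozensets."""
    return [frozenset(c) for r in range(len(w) + 1)
            for c in combinations(w, r)]

V = frozenset("AB")
H = {s: len(s) ** 2 for s in subsets(V)}  # arbitrary test values

# phi_W := sum over S ⊆ W of (-1)^{|W \ S|} * H_S, as in (3.14).
phi = {w: sum((-1) ** len(w - s) * H[s] for s in subsets(w))
       for w in subsets(V)}

# Moebius inversion (3.15): H_V equals the sum of phi_W over W ⊆ V.
assert H[V] == sum(phi[w] for w in subsets(V))
```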

By applying (3.15) to (3.14) we derive

H_V(v) = ∑_{W: W⊆V} φ_W(v) (3.16)

From (3.13) and (3.16) it follows that:

κ(v) = ∑_{W: W⊆V} φ_W(v) (3.17)

We recognize that (3.17) formally equals the definition of a potential representation of κ with φ as the set of potential functions. However, this is in fact not yet true: (3.17) refers to functions φ that are defined over all subsets W of V, while a potential function is defined for all and only the complete subsets of V. Therefore, to recognize φ_W as a potential function, it remains to be shown that φ_W(v) = 0 for all subsets W ⊆ V that are not complete. The proof is completed by showing that φ_W is

indeed a potential function. Assume without loss of generality that X and Y are non-adjacent for some pair of vertices {X, Y} ⊆ W and define the set U := W \ {X, Y}. (Note that since X and Y are not adjacent, W is not complete.)

φ_W(v) = ∑_{S: S⊆W} (−1)^{|W\S|} H_S(v)

= ∑_{S: S ⊆ U∪{X,Y}} (−1)^{|W\S|} H_S(v), from which we obtain

= ∑_{S: S⊆U} (−1)^{|U\S|} (H_S(v) − H_{S∪{X}}(v) − H_{S∪{Y}}(v) + H_{S∪{X,Y}}(v)).

The proof is completed by showing that the term

H_S(v) − H_{S∪{X}}(v) − H_{S∪{Y}}(v) + H_{S∪{X,Y}}(v) (3.18)

is always zero.

To ease notation we define R := V \ (S ∪ {X, Y}) and consider x, y, r and s as actual values for the corresponding variables, whereby all actual values occurring in the computation are induced by v, the actual compound value for V. Actual values that we consider as constants are marked with a star *.

We obtain

H_{S∪{X,Y}}(v) − H_{S∪{X}}(v) = κ(s ∩ x ∩ y ∩ r*) − κ(s ∩ x ∩ y* ∩ r*)
 = (κ(x | s ∩ y ∩ r*) + κ(s ∩ y ∩ r*)) − (κ(x | s ∩ y* ∩ r*) + κ(s ∩ y* ∩ r*)).

Now apply (P), which states the independence of X from Y and obtain

= (κ(x | s ∩ r*) + κ(s ∩ y ∩ r*)) − (κ(x | s ∩ r*) + κ(s ∩ y* ∩ r*)).

Consider the actual value x of X as a constant

= (κ(x* | s ∩ r*) + κ(s ∩ y ∩ r*)) − (κ(x* | s ∩ r*) + κ(s ∩ y* ∩ r*)).


Apply (P) to reinsert y, then apply the chain rule to eliminate the sums

= κ(s ∩ x* ∩ y ∩ r*) − κ(s ∩ x* ∩ y* ∩ r*)
 = H_{S∪{Y}}(v) − H_S(v).

A briefer version of the proof for the probabilistic version of the Hammersley-Clifford theorem can for instance be found in (Lauritzen, 1996, p. 35f)³⁵. The strategy of using the inclusion/exclusion principle, or especially the Möbius inversion lemma, for the proof was independently applied by (Sherman, 1973), (Grimmett, 1973) and (Preston, 1973). Summing up the known implication relationships between the factorization property and the Markov properties, the following picture arises (with the number of the establishing theorem or lemma printed beside each implication arrow):

(G) =⇒ (L)  (theorem 3.59)
(L) =⇒ (P)  (lemma 3.60)
(P) =⇒ (G)  (lemma 3.61)
(F) =⇒ (L)  (theorem 3.66)
(P) =⇒ (F)  (theorem 3.67)

It is therefore clear now that (F) and the three Markov properties are in fact equivalent. We have now developed a well-defined formal concept of how to derive a set of subsets of V from the graph: the set of subsets in question can simply be the set of cliques of the graph. We need to apply the potential functions to this set of subsets to obtain a representation of the NRF. We have therefore successfully answered the first of the requirements declared at the end of paragraph 3.4.1 on page 110.

The second requirement remains to be answered: how are the potential functions ψi to be defined for obtaining a potential representation of κ from the given set of cliques of G? So far, nothing has been said about their concrete definition. Of course, during the update their actual value has to be computed from the data that is accessible from the context. However, before we can give the actual answer, we have to do some further work. Since the answer is postponed, here is an outline of what is to follow until ψi can be defined. Remember the chain rule for negative ranks from corollary 2.21 on page 60. Using this rule we can derive the joint rank value of a given set of propositions by summing up the conditional ranks on subsets of the original set of propositions. The precondition is, of course, that the propositions are ordered by some index. Certainly, this rule is the blueprint for the definition of the potential functions. But to make this idea functional, it has to be connected with the notion of the Markov boundary. To be precise: while summing up we only have to take into account the members of the relevance boundary of the given variable, since all actual values of variables beyond this border do not influence the updated value. The Markov boundary can – as we will point out in the further

35 The author oriented the design of his proof on Lauritzen's presentation of the probabilistic version of the proof. Lauritzen uses a two-place function to express fixed parts in the actual value. This can be achieved more elegantly using our notation for actual values. The proof was also supplemented with the concrete arithmetic solution step to show that (3.18) always resolves to zero.

analysis – be expressed more effectively when respecting directions of influential relationships between variables. Consequently, we have to redo for DAGs what we have already done so far for undirected graphs.

After having completed these steps, the question for the definition of ψi will receive its answer in section 3.6.6 by theorem 3.84.

3.5 RCI and Undirected Graphs

We should especially stress that RCI being a graphoid does in particular not entail that each NRF has a perfect map. At this point, this would be exactly the most helpful equivalence, but it cannot be derived from the graphoid properties and is indeed not true. Nonetheless, there is a list of necessary and sufficient properties a dependency model must satisfy such that the existence of a perfect map can be derived axiomatically. It was introduced in (Pearl & Paz, 1985) and is also discussed and proven in (Pearl, 1988b, p. 93–95).

Theorem 3.68 (Graph-Isomorphic Dependency Model) A necessary and sufficient condition for a dependency model M to be a graph-isomorph is that X ⊥ Y | Z in M satisfies the following five independent axioms:

if X ⊥ Y | Z, then Y ⊥ X | Z  (symmetry)
if X ⊥ Y ∪ W | Z, then X ⊥ Y | Z and X ⊥ W | Z  (decomposition)
if X ⊥ Y | Z ∪ W and X ⊥ W | Z ∪ Y, then X ⊥ Y ∪ W | Z  (intersection)
if X ⊥ Y | Z, then X ⊥ Y | Z ∪ W  (strong union)
if X ⊥ Y | Z, then X ⊥ γ | Z or γ ⊥ Y | Z for every singleton variable γ  (strong transitivity)

Note that the properties of symmetry, decomposition, and intersection are also properties of graphoids, and RCI satisfies them, as was shown in the proof of theorem 3.48. In fact, RCI is not a graph-isomorph since it violates both strong union and strong transitivity. Consider the following example, borrowed from (Pearl, 1988b, p. 93):

Example 3.69 (Coin and Bell) Let X and Y be two coins that are thrown once, independently of each other. If and only if both coins show the same side, a bell will ring. Consequently, we consider three variables: X and Y, each representing the outcome of one of the two coins, and a third, Z, which represents whether the bell rings or not. It is very clear that, if Z is not instantiated, X and Y are unconditionally independent, i.e. X ⊥κ Y | ∅, since the outcome of a throw of one coin does not influence the outcome of the throw of the other coin. This view changes when we can access knowledge about the actual value of Z: knowing whether the bell rang or not and knowing that one of the coins, say X, showed a particular side makes it possible to conclude the actual value of Y. That means, if Z is given, X and Y are not independent conditional on Z. How can this relationship be modelled adequately by an undirected graph?


It is clear that for a correct modeling, Z has to be connected with X and also with Y. Thus, there are in fact two options: either connect X and Y or not. We start with the latter option, leading to the graph in figure 3.8 (Trial 1).

This graph is not an I-map: since Z separates X and Y, it states that X ⊥κ Y | Z, which we already know is wrong. The only remaining option is to add an edge connecting X and Y. The result is the graph in figure 3.8 (Trial 2).

[Figure 3.8: Trial 1 (left): Z adjacent to X and Y, with X and Y not adjacent. Trial 2 (right): the complete graph on X, Y and Z.]

Trial 2 yields a graph which is trivially an I-map since it is complete. However, this modeling no longer reflects the fact that X and Y are genuinely independent, since Z does not in fact have influence on their instantiation. Obviously, it is not possible to form an adequate graphical model for the example situation using only undirected graphs.

We will return to this example at a later point in the analysis. For the moment it is sufficient to note that the actual RCI-model ⊥κ in the example violates strong union as well as transitivity. Strong union is obviously violated since it requires X ⊥κ Y | ∅ to entail X ⊥κ Y | ∅ ∪ {Z}, for example, which is not true. A violation of transitivity can be shown in a version of the example where one of the coins is not fair. In this case, the two possibilities – the bell rings or does not ring – depend separately on the outcome of each coin. On the other hand, the outcome of each coin remains independent of the outcome of the other coin. We can therefore see that RCI is not a graph-isomorph, implying that actual RCI-models may exist that cannot be modeled adequately by undirected graphs.

3.6 Ranking Networks

3.6.1 DAGs as Graphical Models

After having seen that undirected graphs cannot model all cases of actual RCI-models, we are motivated to search for another type of graphical model. Since we are also interested in expressing influence relationships that are typically asymmetric, it is obvious that undirected graphs are in any case not always the best device for developing graphical models of NRFs. We lack a device for expressing the directedness of the influence a variable has on another. While an undirected edge between two vertices can only state that there is in fact some influential relationship between the vertices, a directed edge also denotes the source and the target of the influential effect.


A directed edge between two vertices u and v can for instance be interpreted as a causal or temporal relationship between u and v. In the causal reading, this means that the edge represents the knowledge that the realization of u causes v to be instantiated with a particular value. (It is very sloppy – and indeed without meaning – to say ‘u causes v’, although it sometimes happens to be uttered.) Consider the following example: variable X represents the pH-value of some liquid. Variable Y represents the color of a given piece of litmus paper after it had physical contact with the liquid. Let now Y be instantiated with the value “red”. A directed edge from X to Y interpreted causally claims that the litmus paper turned red because the pH-value of the liquid took a value lower than 7. Hence, variable X taking its particular value determined Y taking its actual value. In a temporal reading, the directed edge represents the knowledge that u is always instantiated before v is instantiated. This does not imply a causal relationship. Let X be the birthday of a person and Y the day the person dies. Then the directed edge from X to Y represents the fact that a person is born before she dies. This in no way claims that a person dies because she was born. On the other hand, a directed causal influence from X to Y can only be stated if X is, in a temporal reading, instantiated before Y is instantiated. Hence, a particular edge from X to Y denotes in fact two particular pieces of information about the relationship between Y and X:

1. X is instantiated before Y is instantiated.

2. X taking its actual value has a causal influence on which actual value Y takes.

The convention in probability theory has led to directed acyclic graphs – or “DAGs” for short – as defined in section 3.1.5 being the standard graphical representation for relevance models. How can a given NRF be represented by a DAG? This can indeed be deduced very intuitively: take the set of variables V, then connect each variable X that is known to have a direct influence on another variable Y with Y by a directed edge ⟨X, Y⟩. The resulting DAG is also called an “influence diagram”. Influence diagrams correspond to NRFs by the underlying set of variables and the independence relationships among them. Although this seems nearly trivial at first sight, the precise relationship between NRFs and their influence diagrams has to be analyzed to obtain more clarity. Most of these thoughts are widely analyzed for probability models; however, except for (Spohn, 2012, chapter 7), (Hunter, 1991a), and (Hunter, 1988), almost no material is found about specifically graphical models for NRFs. In the following sections we will show the strong relationship between DAGs and NRFs, which will lead to the insight that for each DAG there is an NRF such that both are instantiations of the same set of independence relationships among the underlying set of variables. Furthermore, we will analyze the Markov properties and the factorization property on directed graphs and find that they are in the same equivalence relationship as in the general case for undirected graphs.


3.6.2 Strict Linear Orderings on Variables

We begin with a set of variables V and an NRF κ over A(V). Let θ be a strict linear ordering on the variables in V, meaning that θ is an asymmetric (and hence irreflexive), transitive and trichotomous³⁶ total ordering of the variables in V. The vertices of a network can be ordered in a way that respects the connections of the network.

Definition 3.70 (Consonance of G with θ) Let G := ⟨V, E⟩ be a DAG and let θ be a strict linear ordering on V. If E ⊆ θ, i.e. if for each pair of distinct variables X, Y ⊆ V the fact X ↦ Y entails X <θ Y, then G is said to be consonant with θ.

We further define:

Definition 3.71 (Predecessors of Xj in θ) For a set of variables V := {X1, X2, ..., Xn} and a strict linear ordering θ on V, the set {Xi ∈ V : Xi <θ Xj} is called the set of predecessors of Xj w.r.t. θ, denoted by pred_θ(Xj), and each element of pred_θ(Xj) is called a predecessor of Xj in θ.

We now generate a list of all known independence relationships in κ on the basis of θ.

Definition 3.72 (Causal List) Let V be a set of variables over the algebra A(V) and θ a strict linear ordering on V. Let Lθ be a set of statements of the form X ⊥κ pred_θ(X) \ Y | Y with X ∈ V and Y ⊆ pred_θ(X). Then, if Lθ only contains independencies that are also present in κ, Lθ is called a causal list of κ with respect to θ.

Intuitively, a causal list is a group of independence statements that characterizes a set of NRFs over AV.

Note that a causal list Lθ entails conditional independence relationships that are not explicitly contained in Lθ. An intuitive example is that a variable X is independent of all of its non-descendants, given that the parents of X are already instantiated with concrete values.

We say that a causal list Lθ entails an independence statement I if and only if every NRF that satisfies every statement in Lθ also satisfies I. This is denoted by Lθ ⊨ I. If an additional independence assertion I can be logically derived from Lθ when a set of additional assertions S is available, we denote this by Lθ ∪ S ⊢ I. We call a causal list consistent if it neither contains nor entails any set of independency statements that cannot all be true of the same NRF. An independency statement X ⊥κ Y | Z is exhaustive over a set of variables V if X ∪ Y ∪ Z = V. In the remainder of this chapter we will shorten the term “causal list” to CL wherever the context is unambiguous. To produce an influence diagram from a given CL, start with a non-empty set V of vertices and an empty set of edges E. Then, for each statement X ⊥κ pred_θ(X) \ Y | Y in Lθ, add a directed edge ⟨Yi, X⟩ to E for each Yi ∈ Y to make each Yi a parent of X. The resulting DAG ⟨V, E⟩ is an I-map of κ.
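The following Python sketch illustrates this construction. The encoding of a causal list as pairs (X, Y), with Y the conditioning set from the statement, is a hypothetical convenience, not notation from the thesis.

```python
def influence_diagram(causal_list):
    """Build the edge set of an influence diagram from a causal
    list (a sketch). `causal_list` is an iterable of pairs
    (x, parents) encoding the statement that x is independent of
    pred(x) \\ parents given `parents`."""
    edges = set()
    for x, parents in causal_list:
        # Each Y_i in the conditioning set becomes a parent of X.
        for y in parents:
            edges.add((y, x))
    return edges

# Example with the hypothetical ordering A < B < C, where C is
# independent of A given B (output order may vary):
print(influence_diagram([("B", {"A"}), ("C", {"B"})]))
# -> {('A', 'B'), ('B', 'C')}
```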

36 An order < is trichotomous iff for each pair of elements X, Y of its field it holds that exactly one of the statements X < Y, Y < X and X = Y is true.


We therefore note that given the set V, the NRF κ, the ordering θ and either a CL or an influence diagram, we can construct the missing CL from the given influence diagram and vice versa, which is quite trivial. The interesting part begins where it is required to construct an influence diagram from only a set of variables V and data about the actual rank values, a quite common and highly non-trivial case. Techniques for this task are found in the field of machine learning, especially Bayesian structure learning. But this is not the subject of our current investigation.

3.6.3 Separation in Directed Graphs

Next, we define what graphical relationships between vertex sets are usually to be considered as a standard implementation of conditional independence in directed acyclic graphs. A path P connecting vertices X and Y in a DAG is said to be blocked by the vertex set S if P contains a vertex Z ∈ P such that either:

1. Z ∈ S and the directed edges of P do not meet head-to-head at Z, or

2. Z ∉ S, S furthermore contains no descendants of Z, and the directed edges of P do meet head-to-head at Z.

A path that is not blocked is said to be active. If and only if all paths that connect X and Y are blocked by S, X and Y are said to be d-separated by S, denoted by X ⊥d Y | S. D-Separation can be expressed in an alternative way, because the following holds:

Lemma 3.73 Let D := ⟨V, E⟩ be a DAG and let the compounds X, Y and S be disjoint subsets of V. Let U := An(X ∪ Y ∪ S) be the minimal ancestral set over the union of those compounds. Let D[U] be the subgraph of D induced by U and let (D[U])m be the moral graph of D[U]. It then holds that S d-separates X from Y in D if and only if S separates X from Y in (D[U])m.

A proof for this lemma can for instance be found in (Lauritzen, 1996, p. 48f). D-separation is a graphoid. This fact can be understood by consulting (Pearl, 1988b) and especially (Verma & Pearl, 1988). D-separation and RCI do not have the same properties. In fact, d-separation is stricter than RCI. An example for this is the property of weak transitivity, which d-separation satisfies while RCI-models may violate it:

(X ⊥d Y | S) ∧ (X ⊥d Y | S ∪ {Z}) =⇒ (X ⊥d Z | S) ∨ (Z ⊥d Y | S).

Nonetheless, d-separation is a sufficient implementation of conditional independence in probabilistic and also ranking-theoretic context.

As we have seen by lemma 3.73, ⊥d can be defined using ⊥u. Thus, instead of giving an independence statement in terms of ⊥d, we always have the option to give an equivalent statement in terms of ⊥u. Therefore, we will mostly just say that S separates X from Y if it is clear from the context whether separation in the undirected or d-separation in the directed case applies.
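As an illustration of lemma 3.73, the following Python sketch decides d-separation by reduction to undirected separation in the moralized ancestral subgraph. The `parents` encoding and all names are hypothetical; this is a sketch of the criterion, not an implementation from the thesis.

```python
from collections import deque

def d_separated(parents, x, y, s):
    """Test whether s d-separates x from y in a DAG (lemma 3.73).
    `parents` maps every vertex to the set of its parents;
    x, y and s are pairwise disjoint sets of vertices."""
    # 1. Restrict to the minimal ancestral set An(x ∪ y ∪ s).
    anc, stack = set(), list(x | y | s)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])
    # 2. Moralize the induced subgraph: undirected parent-child
    #    edges plus "marriages" between parents of a common child.
    nbrs = {v: set() for v in anc}
    for v in anc:
        ps = [p for p in parents[v] if p in anc]
        for p in ps:
            nbrs[v].add(p); nbrs[p].add(v)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q); nbrs[q].add(p)
    # 3. Undirected separation: search for a path from x to y
    #    that avoids the separating set s.
    seen, queue = set(x), deque(x)
    while queue:
        v = queue.popleft()
        for w in nbrs[v]:
            if w in y:
                return False  # an active connection exists
            if w not in seen and w not in s:
                seen.add(w); queue.append(w)
    return True
```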


Pearl used the concept of d-separation to develop the directed version of the global Markov property and so do we now.

3.6.4 Directed Markov Properties

The next step will be to transfer the general Markov properties to appropriate corresponding statements about directed acyclic graphs.

Definition 3.74 (Ranking Network, Local Directed Markov Property) Let V be a set of variables, κ an NRF for the algebra A(V) and D := ⟨V, E⟩ a DAG. An ordered 3-tuple ⟨V, E, κ⟩ such that it holds for each variable X ∈ V that

X ⊥κ nd(X) | pa(X)

is called a ranking network. Equivalently, we say that κ satisfies the local Markov property w.r.t. D.

The defining property is also called the local directed Markov property, which entails that ranking networks fulfill the directed variant of the local Markov property trivially. Note that we use the notions “DAG” and “network” in a way that technically a network is simply a DAG. The notion of a network reflects the fact that we assign semantics to this particular kind of DAG that expresses the relationships between the vertices of the DAG. This is the reason why the author prefers the notion “ranking network” over “ranking graph” or “ranking DAG”. But a DAG does not have any inherent properties that make it a network; it is simply considered in a way that suggests calling it a network. From definition 3.74, the chain rule for ranking networks can be derived straightforwardly.

Theorem 3.75 (Chain Rule For Ranking Networks) Let ⟨V, E, κ⟩ be a ranking network. Let v be an actual value for V, inducing w.l.o.g. the actual value x for each variable X ∈ V and pa_X, the actual value for the corresponding compound pa(X). Then it holds that

κ(v) = ∑_{X∈V} κ(x | pa_X). (3.19)

Proof: Since ⟨V, E⟩ is a DAG, it is possible to define an ordering γ of the vertices of V such that for every Xi ∈ V it holds that the ancestors of Xi are ordered before Xi in γ:

max{j : Xj ∈ an(Xi)} < i and hence Xj <γ Xi for each Xj ∈ an(Xi).

An ordering with this property is called an ancestral ordering of V. Let γ := ⟨X1, X2, ..., Xn⟩ be an ancestral ordering of V. Then, for any given sequence x1, x2, ..., xn of actual values for all Xi ∈ V, one of two cases holds.

Case 1: κ(x1 ∩ x2 ∩ ... ∩ xn) < ∞.

It then holds for 1 ≤ i ≤ n that

κ(x1 ∩ x2 ∩ ... ∩ xi) = min_{⟨x1,...,xn⟩ ∈ {x1}×...×{xi}×X_{i+1}×...×X_n} κ(x1 ∩ x2 ∩ ... ∩ xn).


The repeated application of the rule of conjunction (corollary 2.20) leads to the chain rule (corollary 2.21):

κ(x1 ∩ x2 ∩ ... ∩ xn) = κ(xn | x1 ∩ x2 ∩ ... ∩ x_{n−1}) + κ(x_{n−1} | x1 ∩ x2 ∩ ... ∩ x_{n−2}) + ... + κ(x2 | x1) + κ(x1)

For all 1 ≤ i ≤ n all parents of Xi are ordered before Xi,

pa(Xi) ⊆ {X1, X2, ..., X_{i−1}}, hence, it holds that

{X1, X2, ..., X_{i−1}} = {X1, X2, ..., X_{i−1}} ∪ pa(Xi).

Note that also all descendants of Xi are ordered after Xi:

{X1, X2, ..., X_{i−1}} ⊆ nd(Xi).

By the definition of a ranking network (definition 3.74) it follows:

κ(xi | x1 ∩ x2 ∩ ... ∩ x_{i−1}) = κ(xi | pa_{Xi})

and thus

κ(x1 ∩ x2 ∩ ... ∩ xn) = κ(xn | pa_{Xn}) + κ(x_{n−1} | pa_{X_{n−1}}) + ... + κ(x2 | pa_{X2}) + κ(x1). (3.20)

Since X1 is a root vertex, pa(X1) = ∅, which completes the proof for case 1.

Case 2: κ(x1 ∩ x2 ∩ ... ∩ xn) = ∞.

Since the value ∞ cannot be generated by summing up finite values, there has to be a particular xi at which the rank value becomes infinite. We can distinguish two cases:

Case 2.1: κ(x1) = ∞

Obviously, (3.20) holds, which immediately completes the proof for this case.

Case 2.2: There exists some vertex Xi ∈ V with 2 ≤ i ≤ n such that κ(x1 ∩ x2 ∩ ... ∩ xi) = ∞ and κ(x1 ∩ x2 ∩ ... ∩ x_{i−1}) < ∞.

For this i it obviously holds that

κ(xi | x1 ∩ x2 ∩ ... ∩ x_{i−1}) = κ(x1 ∩ x2 ∩ ... ∩ xi) − κ(x1 ∩ x2 ∩ ... ∩ x_{i−1}) = ∞

and the same conclusion as in case 2.1 becomes applicable: a simple application of definition


2.19 leads to

κ(xi | pa_{Xi}) = κ(xi | x1 ∩ x2 ∩ ... ∩ x_{i−1}) = ∞,

which completes the proof for case 2.2.

We will call the sum term on the right side of (3.19) the additive decomposition term of the rank value κ(v). It compares directly to the factorization of a Bayesian network. Since the chain rule enables us to concretely compute the joint rank value κ(v), theorem 3.75 is the most important theorem for computing updates. It is the rank-based version of Pearl's corollary 3 in (Pearl, 1988b, p. 119). A detailed proof for the probabilistic version of the theorem can be found in (Neapolitan, 1990, p. 162–163). A global directed Markov property is defined corresponding to the local directed Markov property.
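A small Python sketch may illustrate the chain rule (3.19); the encoding of conditional rank tables is hypothetical.

```python
def joint_rank(network, v):
    """Compute kappa(v) by the chain rule for ranking networks
    (theorem 3.75). `network` maps each variable X to a pair
    (parents, table), where `table` maps (value of X, tuple of
    parent values) to a conditional rank (a natural number or
    float('inf')); `v` maps each variable to its actual value."""
    total = 0
    for x, (parents, table) in network.items():
        total += table[(v[x], tuple(v[p] for p in parents))]
    return total

# A toy network rain -> wet with hypothetical rank tables:
net = {
    "rain": ((), {("yes", ()): 2, ("no", ()): 0}),
    "wet":  (("rain",), {("yes", ("yes",)): 0, ("no", ("yes",)): 3,
                         ("yes", ("no",)): 4, ("no", ("no",)): 0}),
}
print(joint_rank(net, {"rain": "yes", "wet": "yes"}))  # -> 2
```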

Definition 3.76 (Global Directed Markov Property) Let V be a set of variables and κ an NRF for the algebra A(V). The function κ is said to satisfy the global directed Markov property w.r.t. a DAG D := ⟨V, E⟩ iff for each singleton variable X ∈ V and variables Y, Z ⊆ V it holds that

X ⊥d Y | Z =⇒ X ⊥κ Y | Z.

Note that lemma 3.73 would allow us to define the directed global Markov property just in terms of undirected separation. We use the notion of d-separation just for clarity and simplicity. While speaking about DAGs we will continue to call a DAG D an I-map of an NRF κ if and only if κ satisfies the directed global Markov property w.r.t. D. The pairwise directed Markov property on directed graphs is defined as:

Definition 3.77 (Pairwise Directed Markov Property) Let V be a set of variables and κ an NRF for the algebra A(V). The function κ is said to satisfy the pairwise directed Markov property w.r.t. a DAG D := ⟨V, E⟩ iff for each pair of non-adjacent vertices X, Y ⊆ V with Y ∈ nd(X) it holds that

X ⊥κ Y | nd(X) \ {Y}.

We denote the presence of the directed local Markov property by (DL), while the directed global Markov property is denoted by (DG), and for the directed pairwise Markov property we use the symbol (DP). The directed Markov properties are stronger versions of the general Markov properties. Nonetheless, the proofs of their equivalence are mostly analogous to the proofs of the equivalence of the general Markov properties. This can easily be seen: since in a DAG we distinguish between children and parents of a vertex X as well as between its descendants and its non-descendants, the sets V \ clsr(X) and adj(X) can be represented by terms involving those subsets. Note especially that for directed graphs it holds that adj(X) = ch(X) ∪ pa(X). As a result, the directed Markov properties make assertions about subsets of the sets the general Markov properties speak about.


Lemma 3.78 For any directed graph D and any NRF κ, κ satisfies the directed pairwise Markov property w.r.t. D if and only if it satisfies the directed local Markov property w.r.t. D. The directed local Markov property and the directed pairwise Markov property are therefore equivalent for any NRF κ.

Proof: It is to be shown that (DL) ⇐⇒ (DP) for any κ.

(DL) =⇒ (DP):

Assume that (DL) holds. Consider any vertex Y ∈ nd (X) \ pa (X), which is thus not adjacent to X. It obviously holds for X and Y that:

pa(X) ∪ ((nd(X) \ pa(X)) \ {Y}) = nd(X) \ {Y}.

Then, as a direct consequence of (DL), it follows:

X ⊥κ nd(X) | nd(X) \ {Y}.

Further, since Y ∈ nd(X), it holds a fortiori that

X ⊥κ Y | nd(X) \ {Y}, which is (DP).

(DP) =⇒ (DL):

Assume that (DP) holds. Consider any vertex Y ∈ nd(X) \ pa(X), which is thus not adjacent to X, and let S := (nd(X) \ pa(X)) \ {Y}. It obviously holds for Y that:

pa(X) ∪ S = nd(X) \ {Y}.

Then, as a direct consequence of (DP), it follows:

X ⊥κ Y | pa(X) ∪ S (3.21)

and, since S ⊆ nd(X), (DP) also implies

X ⊥κ S | pa(X) ∪ {Y}. (3.22)

We apply the intersection property to terms (3.21) and (3.22) to derive

X ⊥κ S ∪ {Y} | pa(X).

Since S ∪ {Y} = nd(X) \ pa(X), this is exactly (DL).

The equivalence of (DP) and (DL) for ranking networks is a difference to probability theory. In probabilistic networks, (DL) also implies (DP), but the converse does not generally hold, as (Lauritzen, 1996, p. 50–51) points out, stating a counterexample. The reason can easily

be seen: we needed the intersection property for the proof that (DP) =⇒ (DL). Only strictly positive probability functions satisfy the intersection property. This restriction does not apply in the case of NRFs, which are always graphoids, regardless of whether they are regular or not.

Theorem 3.79 (Equivalence of the Directed Markov Properties) For a given NRF κ and a DAG D := ⟨V, E⟩, the following statements are equivalent:

1. κ satisfies the global directed Markov property w.r.t. D

2. κ satisfies the local directed Markov property w.r.t. D

3. κ satisfies the pairwise directed Markov property w.r.t. D.

Proof: Since lemma 3.78 has already stated that (DL) ⇐⇒ (DP) it remains to be shown that (DG) ⇐⇒ (DL).

(DG) =⇒ (DL): Let X ∈ V and T := {X} ∪ nd(X). Let D[T] be the subgraph of D induced by T and let (D[T])m be the moral graph of D[T]. Assume that (DG) holds. Obviously, T is an ancestral set and pa(X) separates X from the set nd(X) \ pa(X) in (D[T])m, which, by lemma 3.73, implies

X ⊥d nd(X) | pa(X) in D, and because of (DG) also X ⊥κ nd(X) | pa(X) holds. Thus, (DL) is shown.

(DL) =⇒ (DG): Let X, Y, S ⊆ V be three variables. The proof is constructed by induction on the number |V| of vertices of D. For |V| ≤ 2 there is nothing to show. Assume that (DL) =⇒ (DG) is true for graphs with n vertices and assume that |V| = n + 1. To show that κ satisfies (DG), we only need to consider the case where the minimal ancestral set generated by X ∪ Y ∪ S equals V. The reason is that cases with smaller ancestral sets follow from the inductive assumption combined with the observation that (DL) is inherited by the marginal NRFs on ancestral sets. We can thus always extend X and Y if necessary such that X ∪ Y ∪ S = V. Hence, these cases need no separate proof. Thus, assume that X ∪ Y ∪ S = V and that X is separated from Y by S in the moral graph

Dm of D. Let T be a leaf vertex in D and note that the separation stipulated implies that either pa (T) ⊆ X ∪ S or pa (T) ⊆ Y ∪ S since parents of X and Y would be connected in Dm.

Case T ∈ X: Separation implies that pa(T) ⊆ (X \ {T}) ∪ S. By (DL), we derive that T ⊥κ Y | (X \ {T}) ∪ S. Let Dm[V \ {T}] be the subgraph induced by the vertex set V \ {T} of the moral graph Dm of D. Then, S separates X \ {T} from Y in Dm[V \ {T}] and, hence, also in (D[V \ {T}])m, the moral graph of the subgraph D[V \ {T}] induced by the vertex set V \ {T} of D, since (D[V \ {T}])m contains no additional edges. By the inductive hypothesis it holds that X \ {T} ⊥κ Y | S. From these two conditional independencies, we deduce that X ⊥κ Y | S.

Case T ∈ Y is analogous.


Case T ∈ S: We observe that if S separates X from Y in Dm, then S \ {T} separates X from Y in the subgraph Dm[V \ {T}] of Dm induced by V \ {T} and, hence, also in (D[V \ {T}])m, the moral graph of D[V \ {T}]. By the inductive hypothesis, we deduce X ⊥κ Y | S \ {T}. Now, if pa(T) ⊆ X ∪ S, property (DL) implies that T ⊥κ Y | X ∪ (S \ {T}). From this it follows that Y ⊥κ X ∪ {T} | S \ {T} and thus X ⊥κ Y | S. Similarly in the case where pa(T) ⊆ Y ∪ S.

The proof for (DL) =⇒ (DG) in a probabilistic context can be found in (Lauritzen et al., 1990, p. 502); the proof for (DG) =⇒ (DL) is also stated there, on page 497. The author took both proofs from Lauritzen. Although the properties of NRFs are not explicitly required to show the relationship, the proofs are reproduced for consistency of presentation, since they are related to RCI instead of probabilistic conditional independence. In the case of directed graphs, the global and local Markov properties of NRFs as defined above are equivalent. For the case of probability functions, this equivalence does not hold in general, as was already pointed out. For a profound analysis, the reader may consult (Lauritzen et al., 1990, p. 502). We are now aware of the following relationships between the directed Markov properties:

(DG) ⇐⇒ (DL) ⇐⇒ (DP).

After having proven the equivalence of the three general Markov properties we continued with the proof that an NRF that satisfies the general Markov properties also satisfies the factorization property and vice versa. The following section proves the same for the case of directed acyclic graphs, where we also introduce the basic computation rules for ranking networks.

3.6.5 Factorization in Directed Graphs

A directed version of the factorization property exists that is formally equal to the general factorization property already defined for undirected graphs in theorem 3.66. The shorthand symbol for the directed factorization property is (DF). We already know that (DG), (DL) and (DP) are equivalent. Before we can proceed to the technical details of epistemic updates, we have to complete the equivalence proof also for the directed case and show that (DF) is equivalent to the directed Markov properties.

Lemma 3.80 If κ factorizes according to the cliques of a DAG D := ⟨V, E⟩, then κ also factorizes according to the cliques of the moral graph Dm of D. Hence, Dm is an I-map of κ.

Proof: By definition of a moral graph, for any vertex X ∈ V it holds that {X} ∪ pa(X) forms a clique in Dm, and we can therefore define an appropriate potential function ψ_{X∪pa(X)}. The proof is completed by noticing that in theorem 3.66 it is already proven that the (undirected) factorization property implies the general local and thus the general global Markov property.

It is furthermore obvious that the following holds:


Lemma 3.81 Let κ be an NRF that factorizes according to the cliques of a DAG D := ⟨V, E⟩. Let S ⊆ V be an ancestral set and graph D[S] the subgraph of D induced by S. The marginal NRF corresponding to κ defined for A(S) then factorizes according to the cliques of D[S].

Lemmata 3.80 and 3.81 stem from their probabilistic counterparts in (Lauritzen, 1996, p. 47) and are reproduced for the ranking-theoretic context here.

Theorem 3.82 Let κ be an NRF that factorizes according to the cliques of a DAG D := ⟨V, E⟩. Let vertices X, Y, S ⊂ V be such that S separates X from Y in D. Let T := An(X ∪ Y ∪ S) be the minimal ancestral set generated by the union of X, Y and S. Let (D[T])m be the moral graph of the subgraph D[T] of D induced by T. Then, if X ⊥u Y | S in (D[T])m, then X ⊥κ Y | S.

Theorem 3.82 is a direct and obvious consequence of lemmata 3.80 and 3.81 and therefore needs no separate proof. Note that theorem 3.82 states that (DF) implies (DG), which means that factorization of κ over the cliques of D also implies the I-mapness of D w.r.t. κ.

Theorem 3.83 (Directed Local Markov Implies Directed Factorization) If an NRF κ satisfies the directed local Markov property w.r.t. a DAG D := ⟨V, E⟩, it also factorizes according to the cliques of D.

Proof: The proof is done by induction over the number of vertices |V| in D. The induction step is as follows: Let the singleton variable Z ∈ V be a leaf vertex in D and let U := V \ {Z}. Let z be an actual value for Z, u an actual value for U, and v an actual value for V such that z ∩ u = v. We then define the function ψ_Z := κ(z | u).

The directed local Markov property allows us to express ψ_Z in dependence of just pa(Z) and to choose an arbitrary but fixed value for the actual value of the compound T := V \ (pa(Z) ∪ {Z}). Let t* be a fixed actual value of T and let pa_Z be the actual value of pa(Z) such that u = pa_Z ∩ t*. This yields:

ψ_Z = κ(z | pa_Z ∩ t*).

Furthermore, the marginal NRF κ* for A(U) satisfies the local Markov property trivially w.r.t. the corresponding induced subgraph D[U] of D. This is also stated by the inductive hypothesis. Combining the factorization property of κ* with ψ_Z yields the fact that κ factorizes according to the cliques of D.

The proof of theorem 3.83 was modeled on the proof from (Lauritzen, 1996, p. 51). Since theorem 3.83 states that (DL) =⇒ (DF), we have now derived the following equivalence relations and implications between the directed Markov properties and the directed factorization property (with the number of the establishing theorem printed beside each arrow):


(DG) ⇐⇒ (DL)  (theorem 3.79)
(DL) ⇐⇒ (DP)  (lemma 3.78)
(DF) =⇒ (DG)  (theorem 3.82)
(DL) =⇒ (DF)  (theorem 3.83)

We now know that ranking networks always satisfy the factorization property and that satisfying the factorization property makes a DAG a ranking network.

3.6.6 Potential Representation of Ranking Networks

We already know potential representations of NRFs as introduced by definition 3.63. Note especially that the definition of potential representations relies on a number of p subsets {Wi : 1 ≤ i ≤ p} of V such that the union of those subsets equals V. As was already pointed out in section 3.4, especially in the context of theorem 3.66, it is convenient to just consider the cliques of the network as those subsets and to define a potential representation of κ that is directly derived from the graph structure.

Theorem 3.84 (Potential Representation of a Ranking Network) Let R := ⟨V, E, κ⟩ be a ranking network. Let Gc be a triangulation of the moral graph Gm of the DAG ⟨V, E⟩. Let {Ci : 1 ≤ i ≤ p} be the complete set of cliques of Gc. Let clq : V → {i : 1 ≤ i ≤ p} be a function that for any variable X ∈ V returns an integer r ∈ {i : {X} ∪ pa(X) ⊆ Ci}. Let x, pa_X, and ci be the actual values for X, pa(X), and Ci and let all those values be completely induced by v, the actual value for V. Define for 1 ≤ i ≤ p

ψi(ci) := ∑_{X: clq(X)=i} κ(x | pa_X).

Then ⟨C1, ..., Cp; ψ⟩ is a potential representation of κ.

Proof: By theorem 3.75 the function κ has the representation

κ(v) = ∑_{X∈V} κ(x | pa_X).

Each variable X ∈ V is assigned to one unique clique C_clq(X) and pa(X) ⊆ C_clq(X). Therefore it holds that

κ(v) = ∑_{X∈V} κ(x | pa_X)
 = ∑_{i=1}^p ∑_{X: clq(X)=i} κ(x | pa_X)
 = ∑_{i=1}^p ψi(ci).

The result is obviously a potential representation of κ. 


Note that for any variable X it is always possible to choose a clique Cr such that {X} ∪ pa(X) ⊆ Cr, since Gc was moralized before the triangulation and therefore all parents of a child vertex are connected with each other. It is clear that being able to define the function clq is accordingly the motivation for moralizing: function clq is needed to define the potential functions. Moralization of a graph generally results in each vertex being connected to its Markov blanket. Since moralizing only adds edges, it does not affect the I-mapness of the graph. Basically, a clique can be treated as a local ranking network. The joint rank value of this local network can easily be computed by applying the chain rule. It is always ensured that each vertex is in the same clique as its parents. Combined with the fact that the parents separate the vertex from its other non-descendants, it follows that the local computation of marginal NRFs for cliques is always sufficient to respect all information relevant for the conditional rank of the given vertex.

This thought also leads to a good understanding of how the potential functions ψi are to be defined: function ψi is always the joint rank value marginal to the given clique. The defining term is obtained by applying the chain rule. We have thus found a method to represent the joint rank value κ(v) by the sum of the marginal NRFs of the cliques, as already used in the proof of theorem 3.84. Note that this is also the answer to the second requirement stated in paragraph 3.4.1 on page 110.
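The following Python sketch illustrates the construction of theorem 3.84; it presupposes that the cliques of the triangulated moral graph are already available, reuses the hypothetical network encoding of the chain-rule sketch above, and is again only a sketch.

```python
def clique_potentials(network, cliques):
    """Assign each variable to one clique containing its family and
    define the potentials of theorem 3.84. `network` maps each
    variable X to (parents, table) as in the chain-rule sketch;
    `cliques` lists the cliques of the triangulated moral graph.
    Returns the assignment clq and a function psi(i, v)."""
    clq = {}
    for x, (parents, _) in network.items():
        # Moralization guarantees a clique containing {X} ∪ pa(X).
        clq[x] = next(i for i, c in enumerate(cliques)
                      if {x, *parents} <= set(c))

    def psi(i, v):
        # psi_i(c_i) = sum of kappa(x | pa_X) over X with clq(X) = i.
        return sum(table[(v[x], tuple(v[p] for p in parents))]
                   for x, (parents, table) in network.items()
                   if clq[x] == i)

    return clq, psi
```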

3.7 Perfect Maps of Ranking Functions

3.7.1 Ranking Functions and DAGs

So far we operate with four notions of conditional independence:

1. the axiomatic approach of a graphoid given in definitions 3.46 and 3.47 (cf. page 101),

2. RCI, instances denoted by ⊥κ, the conditional independence relationships constituted by any particular NRF, which were captured in a causal list,

3. undirected graphical separation, instances denoted by ⊥u, and

4. d-separation, instances denoted by ⊥d, which represents conditional independence on directed graphs.

We already know that each assertion of two vertex sets being d-separated by a third vertex set can be reformulated as an equivalent assertion about a relationship of undirected separation on a particular undirected graph. In (Hunter, 1991a) we find the proof that d-separation is in fact also transferable to RCI.

Theorem 3.85 (Hunter) Let κ be an NRF for an algebra A(V). Let Lθ be a causal list of κ and D an influence diagram of κ. Let further SG be the set of the semi-graphoid axioms as stated in definition 3.46 on page 101. Then the following statements are equivalent:

1. Lθ ⊨ X ⊥κ Y | pred_θ(X)


2. L_θ ∪ SG ⊢ X ⊥κ Y | pred_θ(X),

3. X ⊥d Y | pred_θ(X).

The detailed proof is given in (Hunter, 1991a, p. 498–501). For convenience, we can put theorem 3.85 in a slightly different form:

Theorem 3.86 (Each DAG is a Perfect Map of some NRF) For each DAG D there exists an NRF κ such that D is a perfect map of κ.

Proof: Trivial: since we know from theorem 3.85 that for any two d-separated vertices in D there exists an equivalent independence statement in the causal list, we know that ⊥d ⟹ ⊥κ. Since all statements contained in, or entailed by, the causal list are also equivalent to some d-separation relationship in D, it is equally clear that ⊥κ ⟹ ⊥d. ∎

Pearl states the corresponding relationship for DAGs and probability distributions in (Pearl, 1988b, p. 122, theorem 10). Note that the converse of theorem 3.86 does not hold. NRFs without a perfect map are possible, since theorem 3.85 does not imply that a consistent causal list represents a set of NRFs that have perfect maps. It is possible that a causal list entails dependency structures that cannot be represented adequately by a DAG. Example 3.87 on page 133 discusses such a case. Furthermore, an adequate general description of the relationship between DAGs and NRFs is intricate. The reason is that there is no finite and complete list of necessary and sufficient properties that makes a ternary relation ⊥ a DAG-isomorph, as (Geiger, 1987) has already pointed out. Pearl states a list of necessary properties in his theorem 11 in (Pearl, 1988b, p. 128). The necessary properties contain the 5 graphoid properties and additionally the following three:

if X ⊥ Y | Z and X ⊥ W | Z, then X ⊥ Y ∪ W | Z   (composition)
if X ⊥ Y | Z and X ⊥ Y | Z ∪ U, then X ⊥ U | Z or U ⊥ Y | Z   (weak transitivity)
if X ⊥ Y | Z ∪ W and Z ⊥ W | X ∪ Y, then X ⊥ Y | Z or X ⊥ Y | W   (chordality)

Note that RCI and doxastic independence in general do not satisfy composition, the converse of decomposition. Also weak transitivity may be violated by some NRFs, while d-separation in general satisfies it, as proven by (Pearl, 1988b, p. 129) in his theorem 12. A modification of the concept of d-separation to strong d-separation proposed by (de Waal & van der Gaag, 2005b) introduces a further necessary condition. Knowing that a complete axiomatization of DAG-isomorphism is impossible, two questions remain nonetheless:

1. Can there be a list of necessary and sufficient properties that characterize the set of all and only those NRFs that are DAG-isomorphic?

2. Is there a way to modify non-DAG-isomorphic NRFs such that the modified NRF is DAG-isomorphic while being equivalent to the original NRF?


3.7.2 Characterization of CLs that have Perfect Maps

Considering the first question, there is a proposal in (Wong et al., 2002) for a structural characterization of those causal lists (of probabilistic conditional independence statements) which have a probabilistic perfect map. We will transfer this argumentation to the case of NRFs.

We start with a simple CL. Let L_θ be a consistent causal list of independency statements, each of them exhaustive over a set of variables V (as described on page 121). Now consider a single entry of the CL. It states a conditional independency between three distinct sets of variables X, Y, Z ⊆ V such that Y ∩ Z = ∅, Y ∩ X = ∅, and Z = pa(X). It has the form X ⊥κ Y | Z.

We can interpret such an entry in the CL as a definition of the marginal joint NRF among those variable sets:

    κ(x ∩ y ∩ z) + κ(z) = κ(x ∩ z) + κ(y ∩ z).   (3.23)

Note that from (3.23) we can derive an additive decomposition term (as in theorem 3.75 on page 123) of the marginal joint distribution:

    κ(x ∩ y ∩ z) = κ(x ∩ z) + κ(y ∩ z) − κ(z).   (3.24)

We will transfer the observation of (Wong et al., 2002) from the case of probability functions and Bayesian networks to the case of NRFs and ranking networks, presenting the arguments directly in terms of NRFs. A necessary condition that a term of the structure of (3.24) has to satisfy in order to be a candidate for an additive decomposition term of a ranking network can be stated immediately: all subtrahends occurring in the decomposition term must be “absorbed” by some summands to form conditional ranks. This is just the simple observation that one could derive

    κ(x ∩ y ∩ z) = κ(x ∩ z) + κ(y | z)   (3.25)

from (3.24), noticing that (3.25) no longer contains any subtrahends. In the remainder, we will say that a partial term like κ(y ∩ z) absorbs a subtrahend like κ(z) in case both can be replaced by the equivalent conditional rank κ(y | z). One could be inclined to think that a decomposition term describes a ranking network if and only if it does not contain any subtrahends, but this is not true.
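The absorption step is mechanical enough to be sketched in code. In the following toy sketch (an illustration under assumed data structures, not machinery from the thesis), terms κ(S) are represented as frozensets of variable names; each subtrahend κ(Z) must find a summand κ(S) with Z ⊂ S, and the pair is replaced by the conditional κ(S \ Z | Z). The matching here is greedy; a full decision procedure would have to consider alternative matchings.

    def absorb(summands, subtrahends):
        conditionals = []
        for Z in subtrahends:
            match = next((S for S in summands if Z < S), None)
            if match is None:
                return None                      # some subtrahend cannot be absorbed
            summands.remove(match)
            conditionals.append((match - Z, Z))  # stands for κ(match \ Z | Z)
        # remaining summands are unconditional ranks κ(S) = κ(S | ∅)
        conditionals += [(S, frozenset()) for S in summands]
        return conditionals

Applied to the decomposition term (3.28) of example 3.87 below, with summands {x, y}, {x, z, w}, {z, w, y} and subtrahends {x}, {y}, {z, w}, the sketch returns exactly the conditionals of (3.29).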

Example 3.87 Consider the CL L := {X ⊥κ Y | ∅; X ⊥κ Y | Z ∪ W}. The first statement implies

    κ(x ∩ y) = κ(x) + κ(y)   (3.26)

while the second entails

    κ(w ∩ x ∩ y ∩ z) = κ(x ∩ z ∩ w) + κ(z ∩ w ∩ y) − κ(z ∩ w).   (3.27)


The entire CL therefore implies the following decomposition term

    κ(w ∩ x ∩ y ∩ z) = κ(x ∩ y) − κ(x) − κ(y) + κ(x ∩ z ∩ w) + κ(z ∩ w ∩ y) − κ(z ∩ w).   (3.28)

After absorbing all subtrahends we legally derive:

    κ(w ∩ x ∩ y ∩ z) = κ(y | x) + κ(x | z ∩ w) + κ(z ∩ w | y).   (3.29)

This is obviously not an additive decomposition term of a ranking network, since a complete graphical representation of this CL would not be a DAG.³⁷ To understand this, one has to look at the induced relationships of parenthood between X, Y and Z to notice that the graph contains cycles. ∎

We conclude that complete absorption of the subtrahends is not sufficient for constituting a factorization of a ranking network. Wong, Wu and Lin argue that another structural property has to be satisfied.

Definition 3.88 (Hypertree Construction Ordering) Let H := ⟨V, S⟩ be a hypergraph. An ordering α of the elements of S such that α has the RIP is called a hypertree construction ordering, or HCO for short, of H.

We will denote by V(H) the set of vertices of a given hypergraph H. In a set of hypertrees {H_0, H_1, ..., H_n} we call a hypertree H_j a descendant of a hypertree H_i if V(H_j) ⊆ V(H_i). Equivalently, H_i is an ancestor of H_j if and only if H_j is a descendant of H_i. If H_j is a descendant of H_i and there is no k such that H_j ⊆ H_k ⊆ H_i, then H_j is called a child of H_i. If H_j is a child of H_i, then, equivalently, H_i is called a parent of H_j. It is to be noted that a hypertree ⟨V, S⟩ denotes a Markov graph if S is interpreted as the set of cliques of a network containing the vertices in V. Thus, each hypertree H_i uniquely denotes a Markov graph. Obviously, this interpretation can be extended to conditional independency statements: as we already saw in this paragraph, each statement X ⊥κ Y | Z can also be interpreted as a joint NRF and therefore as an additive decomposition term of a Markov graph. This was already illustrated by (3.24). Hence, a set of independency statements can also be expressed by a hypertree. For the remainder of this section, we will therefore no longer distinguish between a consistent set of independency statements and the hypertree H_i that represents this set.
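The RIP required of an HCO in definition 3.88 can be tested mechanically. A minimal sketch, assuming the hyperedges are given as Python sets in the candidate order:

    def has_rip(ordering):
        # running intersection property: for every set C_i (i > 1), the
        # intersection of C_i with all earlier sets must lie in one earlier set
        seen = set()
        for i, C in enumerate(ordering):
            if i > 0:
                residual = C & seen
                if residual and not any(residual <= P for P in ordering[:i]):
                    return False
            seen |= C
        return True

For instance, the ordering ⟨{a,b}, {b,c}, {c,d}⟩ has the RIP, while ⟨{a,b}, {b,c}, {a,c,d}⟩ does not, because the residual {a,c} is contained in no single earlier set.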

Definition 3.89 (Hierarchical Causal List) A set L of conditional independency statements, each of which has the form X ⊥κ Y | Z, is called a hierarchical causal list, or H-CL for short, if and only if for some m > 0 a partition H_L := {H_0, H_1, ..., H_m} of L can be defined such that each set of statements H_i ∈ H_L satisfies each of the following properties:

1. H_i is a consistent set of conditional independency statements,

³⁷The equivalent probabilistic version of this example is taken from example 3 in (Wong et al., 2002, p. 200f) and was originally provided by Milan Studený.


2. V(H_i) ⊆ V(H_0),

3. if H_j is a child of H_i := ⟨V_i, S_i⟩, then a hyperedge h_i ∈ S_i exists such that V(H_j) ⊆ h_i,³⁸ and

4. if H_j and H_k are two distinct children of H_i := ⟨V_i, S_i⟩ with i ≠ j ≠ k, then two distinct hyperedges h^j, h^k ∈ S_i exist in H_i such that V(H_j) ⊆ h^j and V(H_k) ⊆ h^k.

Note that condition 2 makes H_L in fact a tree hierarchy of hypertrees, with H_0 being the root vertex of the tree. We saw in example 3.87 that the complete absorption of subtrahends is not a sufficient condition for a set of CLs to constitute a ranking network decomposition term. In fact, (Wong et al., 2002) argues for probability functions that complete absorption (of denominators in that case) together with the property of being an H-CL completely characterizes DAG-isomorphic probability functions. The argumentation relies entirely on the chain rule for probability functions and on the factorization of Bayesian networks and probabilistic Markov networks. We already showed how the beginning of this argumentation is transferred to the ranking case. The transfer of the remaining part of Wong's argumentation is completely analogous. Transferring further parts of it would not add valuable information but degrade to mere mechanical reproduction, which we will avoid. Since the intermediate steps for the case of NRFs can be quite directly understood from the original paper, we will leave them out and proceed to the important result:

Theorem 3.90 (DAG-isomorphic Causal Lists) Let H_L := {H_0, H_1, ..., H_m} be an H-CL over the set of variables V := V(H_0) with the root vertex H_0. We denote an actual value for a compound V(H_i) in a simplified form by v_i. Let the decomposition term φ_i characterizing κ_i(v_i), the marginal joint NRF for A(V(H_i)) according to H_i, be schematically expressed by

    φ_i := κ_i(v_i) = κ_i(·) + κ_i(·) + ... + κ_i(·) − κ_i(·) − κ_i(·) − ... − κ_i(·).   (3.30)

Note that then, the joint NRF κ is defined by:

    κ(v) = φ_0 + φ_1 + ... + φ_m.   (3.31)

Then, the underlying DAG D := ⟨V, E⟩ is a perfect map of κ, the NRF induced by H_L, if and only if all subtrahends (in every φ_i) on the right side of (3.31) can be absorbed by legal term rewriting.

Note that theorem 3.90 introduces a DAG-isomorphic model of conditional independence for NRFs, namely those H-CLs that induce factorizations in which all subtrahends can be absorbed. It therefore gives a complete characterization of those NRFs that can be faithfully represented by a DAG. An algorithm for testing whether a given CL is DAG-isomorphic has to test whether the CL is an H-CL and whether its subtrahends can be absorbed.

³⁸In (Wong et al., 2002, p. 202), condition 3 is printed in an erroneous form, mixing up i and j at one site and therefore requiring in fact that V(H_i) ⊆ h_i if H_j is a child of H_i.


Hence, it is possible to decide whether a given RCI-model has a perfect map or not; consequently, the answer to the first of the two questions at the end of section 3.7 is “yes”. Although there is no sound and complete set of axioms characterizing DAG-isomorphic NRFs, there is nonetheless a structural characterization of DAG-isomorphic dependency models, as was pointed out in this section.

3.7.3 Outlook: Can CLs Be Made DAG-isomorphic?

Of special interest for answering the second question is the chordality property as introduced on page 132, because it excludes dependency models that are isomorphic to non-triangulated graphs, as (Pearl, 1988b, p. 130f) points out. But although DAGs cannot represent dependencies that are non-chordal, the introduction of auxiliary variables can extend a non-chordal dependency model to a chordal one. Pearl demonstrates this by introducing an auxiliary variable and then treating this variable as a condition. An alternative strategy was recently proposed by (de Waal, 2009), where it is suggested to fill in the set of variables V under consideration with auxiliary variables until the corresponding dependency model satisfies the chordality axiom, and then to treat the original function as a marginal function of its chordal extension. Thus, given a particular RCI-model that does not induce an additive decomposition term for an actual network, it can be extended to satisfy the necessary DAG-isomorphism properties. On the other hand, to the best knowledge of the author, it has not yet been proven that NRFs that satisfy all necessary constraints for DAG-isomorphism without being a DAG-isomorph cannot exist. So the relationship between NRFs and their perfect maps is not completely analyzed at the moment. With this insight the graphical modeling of ranking networks is completed. In the next chapter we will introduce the update algorithm.

IV

Belief Propagation in Ranking Networks

4.1 Introduction

Chapter III has introduced graphical models for NRFs. We have seen that the concrete graphical model for an NRF is a DAG in which the vertices are variables and the edges represent relations of causal influence between those variables. The DAG that represents the NRF is also called the “influence diagram” of the NRF, and its specific structure depends on the set of variables and the relationships between them. The set of variables represents the measurable aspects observed by the epistemic system. In conjunction with an actual NRF, the influence diagram forms a ranking network. A particular state of the ranking network is characterized by the actual rank values of the vertices and provides a formal model of a belief state. The ranking network represents the entirety of the subject's beliefs, and the current state of the network models the subject's current belief state. It is thereby emphasized that we have not deviated from the paradigm of interpreting epistemic states subjectively. To complete the description of the update mechanism for rank-based belief networks, it remains to be shown how the transition from a prior to a posterior state is technically modeled. This being the topic of the current chapter, we will introduce a formal model of this transition by developing an algorithm for updating ranking networks efficiently. Which requirements have to be met for an adequate modeling of the transition function and its implementation as a concrete algorithm? Obviously, the transition is triggered by new evidence. Evidence can be represented quite intuitively by a change of the actual numeric values in the conditional ranks of a particular subset of variables. Additionally, it is also possible that completely new variables are integrated into the network. Those will induce new causal influence relationships to other variables. To gain an idea of the update process, we will use a prominent update mechanism as a blueprint, which was developed for probability networks. Pearl (1988b) has described a technique known as “message passing” by which singly connected Bayesian networks can be updated with new evidence. Message passing works by

completing the update successively by computing updates of each vertex locally. Local computation relies on making all terms needed for updating a particular vertex locally available at that vertex. In the course of the update process, all necessary computations are performed locally on the particular vertices. This is possible only if the values needed by particular vertices are passed to them as “messages”: a single vertex receives a message, uses it to update itself, and thereafter passes its own message to other receivers. Without further modifications, this technique is only suitable for singly connected networks, for a simple reason: it has to be ensured that an update on a particular vertex is performed only once. This can easily be ensured in the case of the network being singly connected, since then each vertex is reachable from each other vertex by at most one path. In a multiply connected network, we face the problem that in the course of the message passing process, when touching a particular vertex, it will never be known whether it has already been updated or not. A very prominent approach for updating multiply connected Bayesian networks was developed by (Jensen et al., 1990), preceded by vital arguments of (Lauritzen & Spiegelhalter, 1988) which led to the insight that multiply connected networks can be updated by applying message passing on a clique tree of the moralized and triangulated input network. The basic idea is to transform the network into an epistemically equivalent singly connected structure and use message passing on this structure. The algorithm that performs updates on the clique tree is usually named after its authors the “Lauritzen-Spiegelhalter-Algorithm”. Its most prominent implementation is probably the HUGIN-network. The version of the algorithm implemented in HUGIN is sometimes also called the “HUGIN-algorithm” or “JLO-algorithm”, after the first letters of the last names of the three authors of (Jensen et al., 1990). The JLO-algorithm was introduced in a specifically object-oriented design, which seems natural from the historical context. It was not uncommon for many software developers in the late Eighties and early Nineties to expect the object-oriented paradigm to supersede procedural or modular paradigms of programming completely in all relevant practical contexts. The vagueness of this expectation did not prevent it from being proven false, insofar as object-orientation, though very dominant, still coexists with paradigms that were formed earlier³⁹. We will not elaborate on the possible or actual reasons for this historical development and content ourselves with not making a strong commitment to object-orientation when designing the algorithm. Performing updates on the clique tree of a triangulation of the moralized influence diagram instead of the original network is a constituting aspect for a family of update algorithms we will subsume under the label “Lauritzen-Spiegelhalter-strategy”. In the following sections we will call this the “LS-strategy” or “LS”. LS is not tied to any specific semantics of the vertices. Since this strategy is well known to be suitable for updating probabilistic networks, the conjecture that it can also be implemented for updating ranking networks seems obvious. A glance at the literature on algorithmic analysis of ranking theory shows that, to the best knowledge of the author, no specific general update

³⁹Not to speak of the fact that the mainstream style of object-oriented programming techniques underwent a significant change during the first decade of the current century, when the use of dependency-injection techniques came into vogue. But we will not cover this topic here.

algorithm has been described so far in this field. Nonetheless, (Kudo et al., 1999) demonstrates iterated belief updating on the mere arithmetic level for negative ranking functions with the ordinal numbers as codomain. The approach uses results from (Williams, 1994). Nothing is said there on how to implement this approach in an algorithm. To the best knowledge of the author, the only algorithmic approach specifically on updating NRF-based belief networks was introduced by Hunter, who has developed an update algorithm for a particular type of singly connected rank-based belief networks in (Hunter, 1988). However, there is as yet no efficient update algorithm for multiply connected ranking networks. We will discuss Hunter's algorithm as it represents the current state of the knowledge about rank-based networks. In principle, the algorithm introduced in this chapter will be an implementation of the LS-strategy, but with a strong orientation toward Pearl's message passing. The algorithm takes an arbitrary ranking network as an input. In particular, it is not guaranteed that the ranking network is a polytree. Instead, it is allowed to be multiply connected. The update algorithm consists of two phases. First, there is an initialization process performed once in the beginning, and re-performed in case the influence diagram is modified. This is precisely the case whenever either V or E are modified by insertions or deletions. Such a modification means on the epistemic level that either

1. aspects of observation are added

2. or given up

3. or the knowledge about the conditional independencies changes.

In this phase, the input network is decomposed to a clique tree that forms the representation of the permanent belief base. Constructing the clique tree is a formally required task and completely independent from our argumentation on ranking theory. Thus, it will not be interesting from a philosophical point of view. On the other hand, clique tree generation is an elaborate technical task and, in fact, the only part in this chapter that is interesting from an algorithmic point of view, since there is no single, general solution that is optimal for decomposing an arbitrary DAG to a clique tree. We will therefore introduce a detailed proposal for how a decomposition of a ranking network to a clique tree can be performed. It is roughly oriented on the proposals of (Neapolitan, 1990) but integrates relevant recent arguments to reach a significant improvement. The second phase of the algorithm is a sequence of update propagations on the clique tree triggered by new evidence. To model the update mechanism we will first show that the main theorems used for developing the LS-strategy for probability networks also apply to ranking networks. In a second step we will then use these insights to develop a rank-based version of the LS-strategy. Section 4.2 contains a short explication of Hunter's algorithm for updating rank-based networks that will also explain its strengths and limitations.


Section 4.3 gives an informal overview of the LS-strategy as well as message passing. Section 4.4 explains the first update phase. It introduces a technique of decomposing an arbitrary input ranking network to a clique tree and a corresponding potential representation of the NRF associated with the network. Section 4.5 explains the second update phase, which is a propagation mechanism on the clique tree. Especially subsection 4.5.1 is of importance, since it introduces the relevant technical theorems required for developing the actual update algorithm.

4.2 Hunter’s Algorithm for Polytrees

In (Hunter, 1988) a solution is described for updating a special type of rank-based network. Hunter calls this data structure “Spohnian Network”. The Spohnian Network differs from a ranking network in that

1. it is required to be singly connected and

2. a single vertex contains unconditional information about its parents to enable local computation.

In a Spohnian network, a single vertex X is associated with two informational structures: first the conditional matrix κ(x | pa_X) and additionally the joint marginal NRF κ_X(x) for A_X, which can be described by the term

    κ_X(x) := min_{pa_X ∈ cd(pa(X))} ( κ(x | pa_X) + κ(pa_X) ).   (4.1)

Computing (4.1) implies access to all unconditional values κ(y_i) for each Y_i ∈ pa(X). By definition of a Spohnian network, each child vertex contains this information about its parent vertices; hence (4.1) can be computed locally on vertex X. This design makes message passing superfluous for Spohnian networks. Spohnian networks can be updated easily and elegantly. Since the Spohn-conditionalization of a given NRF κ is also an NRF, the Spohn-conditionalization of a Spohnian network keeps the Markov properties. Hunter proves this explicitly for the local Markov property in his theorem 4 in (Hunter, 1988, p. 248). Hunter further shows that the following holds (cf. (Hunter, 1988, p. 247)):

Theorem 4.1 (Hunter) Let X, Y, Z ∈ V. Let κ′ be the Spohn-conditionalization of an NRF κ that results from an actual update on variable X. Let the singly connected DAG D := ⟨V, E⟩ be an I-map of κ′. Let Y ≠ X be an arbitrary variable and Z ∈ adj(Y) a vertex on the path that connects X and Y. It then holds that

    κ′(y) = min_{z ∈ rg(Z)} ( κ(y | z) + κ′(z) ).

Variable Z may be either a parent or a child of Y. In either case the term in theorem 4.1 can be computed from the local data available at vertex Z and Y. This shows that a propagation of an update triggered by the availability of new evidence on a single variable can be performed in a Spohnian network in parallel. An update message

that reaches a particular vertex can be propagated simultaneously to all adjacent vertices that have not been updated up to this point. Let R := ⟨V, E, κ⟩ be a Spohnian network, variable X ∈ V the variable receiving new evidence, κ* the (marginal) NRF representing the new evidence, and κ′ the posterior joint NRF representing the updated belief state. Hunter's algorithm can then be formalized as algorithm 1:

input : Spohnian network R := ⟨V, E, κ⟩, variable X ∈ V, update function κ*
output: updated network R′ := ⟨V, E, κ′⟩

1   Set I ← {X};
2   Set J ← V \ {X};
3   Vertex Y ← nil;
4   Vertex Z ← nil;
5   κ′(x) ← κ*(x);
6   while I ≠ ∅ do
7       Y ← select a vertex from I;
8       foreach Z ∈ adj(Y) ∩ J do
9           if Z ∈ pa(Y) then
10              κ′(z) ← min_{y ∈ rg(Y)} ( κ(y | z) + κ(z) − κ(y) + κ′(y) );
11          else
12              κ′(z) ← min_{y ∈ rg(Y)} ( κ(z | y) + κ′(y) );
13          J ← J \ {Z};
14          I ← I ∪ {Z};
15      I ← I \ {Y};

Algorithm 1: Algorithm for Hunter’s parallel single update on Spohnian networks
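For concreteness, the following compact Python sketch renders algorithm 1. The data layout is an assumption made for illustration: rg maps variables to their value ranges, adj maps each vertex to its neighbour set, and cond_rank(Z, z, Y, y) is a hypothetical helper returning the prior conditional rank κ(Z = z | Y = y), computed from the data stored locally at Y and Z as in lines 9–12 of algorithm 1.

    def hunter_update(variables, adj, rg, cond_rank, evidence, X):
        post = {X: dict(evidence)}                 # line 5: κ'(x) ← κ*(x)
        frontier = {X}                             # the set I
        pending = set(variables) - {X}             # the set J
        while frontier:                            # line 6
            Y = frontier.pop()                     # lines 7 and 15
            for Z in adj[Y] & pending:             # line 8
                post[Z] = {z: min(cond_rank(Z, z, Y, y) + post[Y][y]
                                  for y in rg[Y])  # theorem 4.1
                           for z in rg[Z]}
                pending.discard(Z)                 # line 13
                frontier.add(Z)                    # line 14
        return post

The wave-like traversal described below is implicit in the frontier set: every vertex enters it exactly once, immediately after receiving its posterior ranks.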

Vertex X receiving the evidence is the origin of a “wave” that travels from X towards the terminal vertices of the network, successively crossing all vertices in between and touching all vertices with the same distance to X in parallel. The set I contains the most recently updated vertices, which may still have non-updated adjacent vertices. When entering the while-loop in line 6 for the first time, I contains only X, which is already updated. Whenever the while-loop is re-entered thereafter, I contains all and only those variables that are of distance i to X. In this state, J always contains only vertices which have at least distance i + 1 to X and have not yet been updated. The while-loop starting in line 6 thus iterates successively over all integers i occurring as a distance between X and any other vertex in V. Note that if vertex Y has distance i to X, then Z has distance i + 1 to X, since Y and Z are adjacent. Checking in line 8 whether Z is also in J avoids multiple updates on the same vertex. If the condition in line 9 is not true, the only other possible case is Z ∈ ch(Y), and therefore we do not need to check this case explicitly. Since theorem 4.1 requires access to both vertices, Y and Z, to compute the update, the vertices in I need to be kept while updating the vertices in J. Hunter's algorithm provides an efficient and impressively elegant solution for updating

certain rank-based networks under two restrictions:

1. The network is a polytree.

2. The evidence comes as a change on a single vertex.

Hunter elaborates on a workaround for the second limitation, which works well for the case when the new evidence is learned with certainty (cf. (Hunter, 1988, p. 175f)). He transfers this proposal to the case of evidence on multiple variables {V_1, ..., V_n} ⊂ V without certainty and recognizes that

“this method [the workaround, remark by author] will not in general produce a joint (. . . ) [NRF, remark by author] in which the variables V_1, ..., V_n have their original updates. This method [the workaround, remark by author] seems to model a situation in which independent sources give updates for the V_i, one source for each variable, and the task is to combine the information from these sources. A different situation is one in which a single source stipulates the the [sic!] V_i are to have certain new marginal (. . . ) [NRFs, remark by author] and the task is to find some revised joint (. . . ) [NRF, remark by author] that satisfies these marginals. The difference between these two tasks must be kept in mind in deciding whether or not the updating method described in this section [the workaround, remark by author] is appropriate.” (Hunter, 1988, p. 177)

Hence, the result of Hunter's proposal is an elegant and efficient update method for singly connected networks. In a strong sense it covers only polytrees, but since any component of an unconnected singly connected network is a polytree, this general case is not a problem for the algorithm. Nonetheless, it is subject to the limitation of not being applicable to multiply connected networks, and it does not fit the scenario of updates on multiple vertices in general, although certain special cases can in fact be covered adequately. To the best knowledge of the author, this is the current state of research concerning updates on rank-based networks. The remainder of this chapter will describe an update algorithm for rank-based networks that does not require any of these conditions.

4.3 The LS-Strategy on Ranking Networks: An Outline

The ranking network is the basis for the representation of the belief base, and up to this point of the inquiry its modeling follows quite directly the modeling of the corresponding probabilistic concept of a Bayesian network. An extensive amount of research has been done on those structures since they were introduced, and we will not engage in the task of reproducing all interesting results here. Nonetheless, we will keep pointing out some of the strong and important connections between Bayesian networks and rank-based networks while proceeding. How are epistemic updates on Bayesian networks performed? When Pearl introduced the concept of a Bayesian network for the first time, he described an update algorithm that performs a distributed computation on a given prior belief state to obtain the posterior belief state. He titled this propagation strategy “message passing”. It

differs in relevant aspects from the update mechanism we will introduce, but it provides a good motivation, so we will describe the idea briefly. Message passing works on polytrees only; it is not feasible for multiply connected networks. Of special interest for our case is message passing within trees, which means each vertex in the network can have multiple children and at most one parent. New evidence received assigns updated values to an arbitrary non-empty set of variables. How is a particular variable updated by message passing? We will describe that closely following the argumentation of (Pearl, 1988b, p. 163), but with the original probability formulas transformed into ranking formulas. The effect will be that we describe message passing for tree networks. We will show how a single vertex X is to be updated after the variables which received evidence have their posterior values. Let E_X^− be the part of the evidence in dn(X), which is just the subtree rooted at vertex X. Let E_X^+ be the part of the evidence in nd(X), which is just the rest of the network. Let U be the parent of X. Let the actual values of the vertices X and U be x and u, and let the actual values of E_X^− and E_X^+ be e_X^− and e_X^+. (As always, we presuppose that all actual values are induced by the same joint actual value v.) To compute its actual posterior value κ(x), vertex X must have access to e_X^− as well as to e_X^+. By definition, X only has access to the matrix κ(x | pa_X), and is thereby unable to compute its local update autarkically. Message passing is a strategy to provide X with the information e_X^− and e_X^+ without having to store those values permanently at vertex X. Note that this strategy is exactly the opposite of the one Hunter uses for Spohnian networks, where all relevant information is stored in the vertices. The case for e_X^+ is quite simple. Since X d-separates its children from its non-descendants in the tree, it holds:

    κ(x | u ∩ e_X^+) = κ(x | u).

This insight combined with Bayes’ rule for ranks yields:

    κ(x) = κ(x | e_X^+ ∩ e_X^−)
         = α + κ(e_X^− | e_X^+ ∩ x) + κ(x | e_X^+)
         = α + κ(e_X^− | x) + κ(x | e_X^+).

The normalization constant α is defined appropriately. Having reached this result, we recognize that the vectors

    λ(x) := κ(e_X^− | x)
    π(x) := κ(x | e_X^+)

must be locally accessible at X to complete the update. Finding a method to provide this information locally at each vertex is therefore a method for performing an epistemic update in the tree. The solution Pearl has offered is to propagate the λ-vectors for each vertex bottom-up,

starting at the leaves of the tree and then, after finishing this step, propagating the π-vectors top-down from the root to the leaves. Each vertex that is activated while propagating the λ-vectors can use them to compute its local values and then pass its own λ-vector to its parent. After the root vertex has received λ-messages from all of its children, it computes its π-vector, sends it back to its children, and the propagation process starts again, now performed top-down with the π-messages. After each child of any vertex has received its π-message, the update is completed. As was shown by Pearl, message passing works quite straightforwardly not only for trees but also for polytrees. The reason is that since there is only one path by which a vertex can be visited from a particular other vertex, it is always known for the vertex in consideration whether it has already been updated or not. Hence the correctness of the update computation can always be ensured by choosing a certain walk through the network. Now, in a multiply connected network, when reaching a vertex by walking a particular path, it is impossible to ascertain whether it was visited over another path before and whether it has already been updated. This makes it clear that in multiply connected networks, “simple” message passing as described before is neither guaranteed to terminate nor to yield correct updates, since a given vertex could be visited multiple times. It can easily be seen that this fundamental problem cannot be bypassed by parallelization or distributed computation. A generalization of the message-passing strategy from singly connected to multiply connected networks could only be successful if a possibility were found to transform the multiply connected network into an equivalent singly connected one. A significant improvement to this situation was achieved in (Lauritzen & Spiegelhalter, 1988). Lauritzen and Spiegelhalter were successful in showing that updating can be done on the clique tree of the moralized and triangulated network without losing I-mapness. As already pointed out, the advantage of this perspective is to keep the belief base singly connected. On the clique tree, updating is a relatively simple operation, and computationally inexpensive in a large set of practically relevant cases. For networks with large treewidth, the computation is nonetheless expensive. Since the cliques are complete subgraphs of the ranking network, the computation of their local joint rank values always has the form given in theorem 3.84. It is therefore formally possible to consider the cliques as if they were single vertices. This thought leads to the insight that updating can be done on a variant of the clique tree of the original network. There are of course preconditions for the applicability of this method:

1. It must be ensured that for each vertex X there exists a clique that contains X as well as its parents.

2. It must be possible to arrange the cliques as a tree while keeping the dependency relationships between them.

The first precondition is necessary because the parents of X separate X from all of its non-descendants and are therefore a formal implementation of the epistemic element that Pearl and Paz call a “relevance boundary” on directed graphs. The relevance boundary, on the variable level, can intuitively be understood as the minimal knowledge on which the subjective


firmness of a single belief X = x depends. This relevance boundary is expressed in graphs by the Markov boundary. If we allowed a situation such that there is no clique containing X as well as bdry(X), then the relevance relationships of X in the original network would be distorted when transferred to the clique tree. Ensuring that the relevance boundary of X is in the same clique as X means on the graphical level that the vertex X must be contained in the same complete subgraph that also contains its parents. While X is trivially connected with all of its parents, it may nevertheless be possible that its parents are not connected with each other. Hence, connecting them to each other will ensure this precondition by establishing a common clique. This precondition is satisfied in the moral graph of the initial network and is indeed the motivation for moralizing the graph. The second precondition is required because the clique tree has to keep the dependency relationships of the original graph. We must therefore define a method that derives a legal tree structure from the set of cliques. It is a fact that the cliques of a graph can be arranged as a tree if and only if the graph is triangulated. Triangulating the moralized graph therefore ensures that an NRF κ can be described by a potential representation such that local computation on the cliques is possible. This was already stated in theorem 3.84 on page 130. In fact, the combination of both preconditions motivates the two modifications performed on the network: moralization and triangulation. If it is possible to compute the marginal NRF κ(c_i) for each clique C_i, then the rank value for a concrete hypothesis X = x with X ∈ C_i can be computed from the marginal rank values of the cliques containing X. This is carried out in the following way:

    κ(x) = min_{t ∈ rg(C \ {X})} κ(t ∩ x),  for any C ∈ {C_j : X ∈ C_j}.   (4.2)
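Equation (4.2) is simple enough to spell out in code. The following sketch works under an assumed data layout: rg maps variables to their value ranges, and clique_marginal returns the marginal rank κ(c) of a full clique assignment given as a dict. It recovers κ(x) from any one clique containing X.

    from itertools import product

    def rank_of(clique, rg, clique_marginal, X, x):
        # minimize the clique marginal over all values of the other members
        others = [v for v in clique if v != X]
        return min(clique_marginal({X: x, **dict(zip(others, t))})
                   for t in product(*(rg[v] for v in others)))

If the clique contains only X itself, product() yields a single empty tuple, so the expression correctly degenerates to clique_marginal({X: x}).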

The strategy of Lauritzen and Spiegelhalter as explained in (Lauritzen & Spiegelhalter, 1988) is, as outlined above, to construct a moralized and triangulated graph from the network, and then to compute from this graph the clique tree on which message passing for trees can be performed. When further evidence is to be processed, message passing on this clique tree will remain sufficient for incorporating the evidential information as long as the evidence does not include a modification of the initial network. This means that as long as the set V of variables and the set E of edges are not modified by the evidence, it is not required to construct a new clique tree. Whenever either V or E is changed by the evidence, a new clique tree has to be compiled from the modified network. This means an actual LS-based update process may include two phases:

1. Compilation of the clique tree from the input network. The clique tree reflects the epistemic evidence. (This phase is only required if the evidence has actually changed the network.)

2. Message passing on the clique tree. (This phase is mandatory in each case.)


The LS-strategy was developed with probability semantics in mind, and the computation relies on some special properties of probability distributions. Since LS is known to be efficient and elegant, it is an interesting goal to implement it for NRFs. During the remainder of this chapter, the main task will be to prove that the LS-strategy will work for ranking networks. The first necessary precondition for transferring the LS-strategy to NRFs has already been met: we have shown in the previous chapter how NRFs can be graphically modelled by DAGs and how d-separation implements RCI. We further have to present a technique for obtaining the clique tree from the influence diagram. This technique will describe a complete implementation of the first update phase and is completely covered in section 4.4. Furthermore, to prove that LS-updating also works on ranking networks, we have to model both LS update phases algorithmically on ranking semantics. This means we have to show how message passing is performed on the clique tree. The message passing process forms the second update phase. The algorithm for message passing is described in section 4.5. We will have to prove in particular that the required local computations on the cliques will be possible. Once we have completed this proof, it will also have been shown thereby that

1. a clique tree of the input network is an equivalent representation of the permanent belief base and, further,

2. that all updates can be performed on the clique tree instead of the original network.

This task will be solved in section 4.5.1. The remainder of section 4.5 is dedicated to the description of the actual message passing phase. Sections 4.6 and 4.7 offer a final conclusion of the achievements, comment on aspects that were not discussed in full detail, and finalize the thesis with an outlook on further research questions.

4.4 Phase 1 – Triangulation and Decomposition of the Network

4.4.1 Methods for Obtaining the Clique Tree from the Initial Network

Consider the situation when both the set of variables and the dependency model induced by the NRF are known. The influence diagram can easily be constructed from this information, and so we obtain the initial network. To prepare the permanent update mechanism, it is first required to decompose the network to a clique tree. The clique tree will be the representation of the permanent belief base. The clique tree has to be rebuilt whenever either the set of variables or the dependency model changes. The question to be answered in this section is: how can we moralize and triangulate the initial ranking network and then obtain a clique tree from the moralized and triangulated network?


Moralization is not a separate task since it can be performed while constructing the influence diagram. This is more reasonable than first constructing the network from the causal list and then moralizing it in a separate step. Constructing a moralized network just requires that whenever an edge is added that makes a vertex v the parent of a vertex u, it has to be checked whether u already has other parents and, if so, an edge {v, w} has to be added for each parent node w ≠ v of u. The undirected graph of the result graph as defined in definition 3.13 is the moralized graph of the input network. The two remaining tasks, triangulation and decomposition, are sophisticated problems from an algorithmic point of view. We will elaborate on them in this section and propose an algorithm that performs both minimal triangulation and decomposition to a clique tree in a single run. Additionally, it computes the separators. A major source of ideas for this proposal was surely (Neapolitan, 1990), where a quite elegant triangulation method for probabilistic networks is used. At the time of writing this thesis, Neapolitan's arguments are over 20 years old, and a considerable amount of research has been done on graph triangulation as well as on decomposition of triangulated graphs. This thesis does not claim to contribute substantially to this general topic; rather, it tries to utilize some recent results for improving the proposal made by Neapolitan on the task of triangulation and decomposition.
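The on-the-fly moralization just described fits in a few lines. A minimal sketch, assuming the graph is kept as a dict of parent sets plus a set of undirected moral edges (both names are illustrative):

    def add_parent(parents, moral_edges, v, u):
        # v becomes a parent of u: marry v to every existing parent of u
        for w in parents[u]:
            if w != v:
                moral_edges.add(frozenset((v, w)))
        parents[u].add(v)
        # the arc itself, read as an undirected edge of the moral graph
        moral_edges.add(frozenset((v, u)))

Running this for every arc of the influence diagram yields exactly the edge set of the moral graph, with no separate moralization pass.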

1. Which criteria should be defined to estimate the quality of a given decomposition?

2. Which efficient method of decomposition maximizes the quality of the computed de- composition?

The first question asks for the desirable properties the decomposition of a given network should have. The second question demands a method for efficiently obtaining a decomposi- tion of maximal quality. Thus, briefly stated, what constitutes an optimal decomposition of a belief network and how can it be obtained? The first discussion about this topic can be found in (Wen, 1990). Wen’s arguments pro- vide us with a sufficient approach of the available answers to the first question. The second question is part of ongoing research since then. Wen defines an optimality criterion for belief network decompositions that he calls “MTNS”, the Minimum Total Number of States. The MTNS is desired to be minimal. A decomposition with a lower MTNS is to be preferred over an equivalent decomposition of the same graph having a greater MTNS. The total number of states TNSG of a network G := hV,Ei is defined as

p n  i TNS G := ∑ ∏ ηij. i=1 j=1


where p is the number of cliques in the decomposition, n_i the number of variables in the i-th clique, and η_ij the number of values the j-th variable in the i-th clique can take. In case all variables are Boolean, it is obvious that the TNS equals

    ∑_{i=1}^{p} 2^{n_i}.
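Computing the TNS of a candidate decomposition is a one-liner. The following sketch assumes the cliques are given as collections of variable names and card is a dict of variable cardinalities (both are illustrative names):

    from math import prod

    def tns(cliques, card):
        return sum(prod(card[v] for v in C) for C in cliques)

    # e.g. tns([{"A", "B"}, {"B", "C", "D"}], {"A": 2, "B": 2, "C": 2, "D": 2})
    # yields 4 + 8 = 12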

The TNS is a measure of the computational effort of inference on the decomposed network. This is the main argument for the MTNS being the relevant optimality criterion. Hence, the best choice among all q pairwise distinct possible decompositions of a given network G would be a decomposition T_G^k of G such that

    TNS(T_G^k) = min { TNS(T_G^i) : 1 ≤ i ≤ q }.

For this k we say that T_G^k is TNS-minimal or, equivalently, that T_G^k satisfies the MTNS-criterion. Wen gives an example of the results of two different decomposition methods, which illustrates that neither of them is necessarily superior. He then proves that the problem of checking for a given network G whether a decomposition T_G^k of G exists such that TNS(T_G^k) ≤ µ for a given non-negative integer µ is NP-hard. This means that effectively finding k is NP-hard for arbitrary input networks. We may therefore not expect to find a method that both computes a decomposition efficiently and guarantees optimality of the result. As we understood from paragraph 3.1.5, obtaining a clique tree requires a triangulated input graph. It will thus be part of the update algorithm to first test whether the initial ranking network is triangulated. If it is not, it has to be triangulated first. In the next step, the obtained triangulation is decomposed to a clique tree on which the actual update is performed. Since the triangulation method has crucial influence on the number of cliques in the resulting triangulated graph, the quality of the decomposition obtained from the input network seems to depend heavily on the set of edges which is inserted into the initial network to obtain a triangulation. It is, though, not very clear how this dependence is to be described. Wen discusses two strategies of triangulation: the first method tries to minimize the number of edges inserted, while the second tries to minimize the cardinality of the largest clique in the triangulated result graph. He then presents an example of two triangulations of which the first satisfies the criterion of neither of the two methods while the second satisfies both criteria. He can show that both the “optimized” and the “arbitrary” triangulation have the same TNS, while the non-optimized triangulation contains twice as many cliques as the optimized one. This may constitute a significant difference in performance while updating, but whether this is the case depends on the particular update method. Thus it is neither a priori fixed which triangulation strategy is the one of choice to obtain a close-to-optimal decomposition, nor whether optimization strategies for triangulation really have a direct and strong effect on the quality of the subsequent decomposition. Since optimal decomposition techniques are not a core problem of this thesis, we will not

perform a deep analysis of this topic. Nonetheless, we will propose a concrete decomposition technique, since decomposition is just the technical part of obtaining the permanent belief base. We will present an approach that does minimal manipulation of the input network and therefore use the strategy of minimizing the number of edges inserted for triangulation. The triangulation step in particular is a highly non-trivial part of the update algorithm if certain optimality considerations for the resulting triangulation play a role. To be precise, it is, from an algorithmic point of view, the only computationally complex and challenging problem to be solved in the course of the entire update algorithm: we will later see that the other steps are relatively straightforward. Thus we will try to keep a balance: on the one hand we will reasonably elaborate on the methods we actually propose for triangulation and decomposition, but instead of reproducing or explaining each interesting aspect of them, we will mostly just mention the insights important for lucid proceeding and reference the relevant literature. We will nevertheless briefly discuss some aspects of “optimal” triangulation, the historical origin of this problem.

4.4.2 Triangulating Graphs: A Brief Survey

Note that triangulated graphs and triangulation methods have been a subject of active research for the last 50 years, and the amount of evidence on this topic is quite detailed, though there are some substantial open questions. To state just an obvious example: to the best knowledge of the author, at the time of writing, there is no definitive value proven to be a lower time bound for triangulating an arbitrary input graph. Since the triangulation operation is a preparation for the decomposition step, we will propose a triangulation method that makes it easy to subsequently obtain the clique tree. We will introduce the basic technique for triangulation in this section and will then successively unfold the details to gain an efficient algorithm for triangulating and decomposing an arbitrary input network. We start in this section by introducing triangulation by vertex elimination following an elimination order and then show that this method always transforms the input graph into a triangulated graph. For convenience, we will always suppose that the input network is connected. This is not a significant constraint, because the techniques explained in the following can easily be performed on each component of a non-connected graph. In the following, we will speak of an ordering α of the vertices of a graph G := ⟨V, E⟩, which means a bijection α : {1, ..., n} ↔ V where n = |V|, the number of vertices in G. This entails that α(1) denotes a vertex and α⁻¹(v) denotes a non-negative integer i ≤ n. For the sake of simplicity, we will often speak of orderings on G instead of orderings on the vertices of G. The following definitions 4.2 – 4.5 and the argumentation for the transition between them are borrowed from (Rose et al., 1976, p. 266f).

Definition 4.2 (Monotonely Adjacent Vertices of a Vertex) For an undirected graph ⟨V, E⟩, a vertex v ∈ V and a total ordering α on the vertices in V, we call a vertex u ∈ V such that u ∈ adj(v)

and α⁻¹(u) > α⁻¹(v) monotonely adjacent to v. The set of monotonely adjacent vertices of a vertex v is denoted by madj(v).

Intuitively, the monotonely adjacent vertices of v are those vertices which are adjacent to v and have a greater number in the ordering α than v.

Definition 4.3 (Deficiency of a Vertex) Let G := ⟨V, E⟩ be an undirected graph. For any vertex v ∈ V we call the set

    D(v) := { ⟨u, w⟩ : v ∼ u ∧ v ∼ w ∧ u ≁ w ∧ u ≠ w }

the deficiency of v.

Intuitively, the deficiency of v is the set of edges missing in G that, if inserted, would make clsr(v) a complete subgraph in G. The graph obtained by inserting D(v) into the set of edges while eliminating v from the set of vertices is called the v-elimination-graph G_v:

    G_v := ⟨ V \ {v}, E ∪ D(v) ⟩.

Then, for a graph G with n vertices and a total ordering α := ⟨v_1, v_2, ..., v_n⟩ among them, an elimination process

    E(G, α) := ⟨G_0, G_1, ..., G_{n−1}⟩

can be defined recursively such that G_0 := G and G_i := (G_{i−1})_{v_i} for i : 1 ≤ i ≤ n − 1. Thereby, G_i is derived from G_{i−1} by eliminating v_i and adding the deficiency of v_i in G_{i−1}. Note that the order of vertex elimination is just the ordering α (in ascending order).

Definition 4.4 (Fill-in) Let δ_i be the deficiency D(v_i) of vertex v_i in G_{i−1} for i : 1 ≤ i ≤ n − 1. Then, the set

    F(α) := ⋃_{i=1}^{n−1} δ_i

is called the fill-in (of G) w.r.t. α.

The basic vertex elimination algorithm was introduced by (Parter, 1961). It is usually called “EliminationGame” and is the simplest representation of the elimination process defined above; it just extends the process with operations to store the fill-in. The representation of EliminationGame on page 151 is closely oriented to the reproduction in (Berry et al., 2008), but makes the rules for generating G_i from G_{i−1} maximally explicit. EliminationGame and its derivatives represent one of the two prominent approaches for computing triangulations. (The other prominent family of approaches does not rely on vertex elimination but on so-called minimal separators. We will nonetheless keep the focus on vertex elimination techniques.) The insertion of the fill-in of G w.r.t. α into G generates a graph G_α := ⟨V, E ∪ F(α)⟩ that contains G as a subgraph.


input : A graph G := ⟨V, E⟩, a total ordering α on V
output: A triangulation of G

1   F ← ∅;
2   V_0 ← V;
3   E_0 ← E;
4   G_0 ← ⟨V_0, E_0⟩;
5   for i = 1 to n − 1 do
6       F ← F ∪ δ_i;
7       V_i ← V_{i−1} \ {v_i};
8       E_i ← (E_{i−1} ∩ (V_i × V_i)) ∪ δ_i;
9       G_i ← ⟨V_i, E_i⟩;
10  return ⟨V, E ∪ F⟩;

Algorithm 2: EliminationGame, the basic vertex elimination algorithm
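A direct Python transcription of algorithm 2 may be helpful; the adjacency-set representation is an assumption made for illustration. The fill edges are accumulated as the deficiencies δ_i of the successively eliminated vertices.

    def elimination_game(adj, alpha):
        work = {v: set(ns) for v, ns in adj.items()}   # working copy of G
        fill = set()
        for v in alpha[:-1]:
            nbrs = work[v]
            # deficiency of v: edges missing among its current neighbours
            for u in nbrs:
                for w in nbrs:
                    if u != w and w not in work[u]:
                        fill.add(frozenset((u, w)))
                        work[u].add(w)
                        work[w].add(u)
            # eliminate v from the working graph
            for u in nbrs:
                work[u].discard(v)
            del work[v]
        return fill          # ⟨V, E ∪ fill⟩ is the triangulation G_α

If alpha happens to be a perfect elimination ordering of the input graph in the sense defined below, the returned fill-in is empty.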

Definition 4.5 (Fill-in-Graph) Let G := ⟨V, E⟩ be an undirected graph and α a total ordering on G. Let F(α) be the fill-in of G w.r.t. α. The graph G_α := ⟨V, E ∪ F(α)⟩ is called the fill-in-graph of G w.r.t. α.

An empty fill-in is called a zero fill-in. We will say that α generates a zero fill-in on G if F(α) = ∅ for G or, for short, that G_α admits a zero fill-in.

Lemma 4.6 (Rose, Tarjan and Lueker) Let G := ⟨V, E⟩ be an undirected graph and α a total ordering on V. Let F(α) be the fill-in of G w.r.t. α. Then it holds for any two distinct vertices u, v ∈ V that {u, v} ∈ F(α) if and only if

1. {u, v} ∉ E and

2. u and v are connected in G by a path P such that, except for u and v, path P only contains vertices w ∉ {u, v} such that α⁻¹(w) < min{α⁻¹(v), α⁻¹(u)}.

Lemma 4.6 is a reproduction of lemma 4 in (Rose et al., 1976, p. 270f), where the proof can also be found. A path that connects two vertices u and v and satisfies condition 2 of lemma 4.6 is also called a lower weight path from u to v.
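Lemma 4.6 yields a simple membership test for fill-in edges that avoids running the elimination itself. A sketch under assumed names (adjacency sets adj and an index dict idx standing for α⁻¹):

    def is_fill_edge(adj, idx, u, v):
        # {u, v} ∈ F(α) iff u ≁ v and a path joins them whose interior
        # vertices all have a smaller α-number than both u and v (lemma 4.6)
        if u == v or v in adj[u]:
            return False
        bound = min(idx[u], idx[v])
        stack = [w for w in adj[u] if idx[w] < bound]
        seen = set(stack)
        while stack:
            w = stack.pop()
            if v in adj[w]:
                return True
            for x in adj[w]:
                if idx[x] < bound and x not in seen:
                    seen.add(x)
                    stack.append(x)
        return False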

Theorem 4.7 (Rose, Tarjan and Lueker) Let G := ⟨V, E⟩ be an undirected graph with a total ordering α on V and let G_α be the fill-in graph of G w.r.t. α. Then α generates a zero fill-in on G_α.

Theorem 4.8 (Rose) An undirected graph G := ⟨V, E⟩ is triangulated if and only if there exists an ordering α on V such that α generates a zero fill-in on G.

Theorem 4.7 was implicitly present in (Rose, 1970). It was explicitly shown later in (Rose et al., 1976, p. 266f) by the argumentation that was already cited above when introducing definitions 4.2 – 4.5. Theorem 4.8 was introduced in (Rose, 1970, p. 602f), using concepts equivalent to that of a fill-in (cf. ibid., corollary 2 and lemma 7). The theorem is already implicit in (Dirac, 1961)

but Dirac did not use the notion of a fill-in. The same assertion also occurs explicitly in (Tarjan & Yannakakis, 1984, p. 568) as theorem 1. Note that the consequence of theorems 4.7 and 4.8 is that the fill-in graph G_α generated by an ordering α on a graph G is always triangulated. This obviously implies that the fill-in is always a set of edges that has to be inserted into the initial graph G to obtain a triangulated graph G_α containing G as a subgraph. This is also trivially true in case of a zero fill-in. Both theorems therefore provide a blueprint for a triangulation method.

Definition 4.9 (Perfect Elimination Ordering) Let α be an ordering which generates a zero fill-in on an undirected graph G. Then α is a perfect elimination ordering on G or PEO for short.

With this notion, theorem 4.8 can be rewritten as follows:

Theorem 4.10 (Fulkerson & Gross) An undirected graph G := ⟨V, E⟩ is triangulated if and only if a PEO can be established on V.

The first implicit occurrence of theorem 4.10 can be found in (Fulkerson & Gross, 1965, p. 851), where Fulkerson and Gross draw conclusions equivalent to theorem 4.10 but without using the notion of a PEO. Note that theorem 4.10 entails that assertions about triangulations of a given graph G correspond to assertions about vertex orderings on G. What we have noticed so far about triangulated graphs can be stated as follows:

Theorem 4.11 (Characterizations of Triangulated Graphs) Let G be an undirected graph. Then, the following statements are equivalent:

1. G is triangulated.

2. G is 3-chordal.

3. G is decomposable (to a clique tree).

4. G admits a zero fill-in.

5. A PEO exists on the vertex set of G.

A crucial property of PEOs of the input graph G is that they induce an ordering with interesting properties on the fill-in graph Gα.

Definition 4.12 (Perfect Ordering) Let G := ⟨V, E⟩ be an undirected graph. A total ordering α := (v₁, v₂, ..., vₙ) of all vertices vᵢ ∈ V is called perfect if for every i : 1 ≤ i ≤ n the set {vᵢ} ∪ madj(vᵢ) is a complete subset in G.

Lemma 4.13 (PEOs are Perfect Orderings on Triangulated Graphs) For any undirected graph G and a PEO α on the vertices of Gα, α is a perfect ordering on the vertices of Gα.

Proof: From definition 4.9 we know that for a PEO α on the graph Gα, it holds that Fα = ∅ on Gα. By definition 4.4 it must hence hold that δᵢ = ∅ for each i such that 1 ≤ i ≤ n − 1. Thus, if {u, w} ⊆ madj(vᵢ) then either u = w or u ∼ w. It follows that {vᵢ} ∪ madj(vᵢ) is always a complete subset in Gα. □


In a graph whose vertices are ordered by a perfect ordering, each vertex vᵢ forms a complete subset with precisely those of its successors in the ordering that are adjacent to it. We know from theorem 3.1 of (Neapolitan, 1990, p. 103f) that the following holds:

Theorem 4.14 (Perfect Orderings Generate Clique Orderings Having the RIP) Let α be a perfect ordering of the vertices of an undirected graph G. Let further β := (C₁, C₂, ..., Cₚ) be an ordering of the cliques in G according to their lowest α-numbered vertex, i.e. β⁻¹(Cᵢ) = min{α⁻¹(v) : v ∈ Cᵢ}. Then, β has the running intersection property.

We recognize that computing a triangulation of a given undirected graph using vertex elimination consists logically of two steps, sketched in code below:

1. Compute an ordering α on the set of vertices of the graph.

2. Use α to compute the fill-in via an instance of EliminationGame.
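Using the sketch given after algorithm 2, the two steps might be exercised as follows (a placeholder ordering stands in for step 1; any total ordering works, though the quality of the resulting triangulation depends on it):

# Hypothetical usage of the elimination_game sketch from above.
vertices = ["a", "b", "c", "d"]
edges = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]}
alpha = ["a", "b", "c", "d"]                     # step 1: a total ordering on V
tri = elimination_game(vertices, edges, alpha)   # step 2: compute E ∪ F
# For this alpha, eliminating the 4-cycle yields one fill edge, {b, d}.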

What has been described so far in this section shows how the triangulation technique works in principle; considerations about time complexity and about asserted properties of the resulting triangulation have not yet been addressed. We will briefly elaborate on these aspects before proceeding.

4.4.3 Desired Criteria for Triangulations of Graphs

The basic instance of EliminationGame, shown above as algorithm 2 on page 151, does not assert any optimality properties of the resulting triangulation of the input graph. It only guarantees that Gα is triangulated. Obviously, an input graph can have multiple distinct triangulations with quite different properties. A crucial aspect for understanding the vertex elimination technique is that the properties of the resulting triangulation depend entirely on the properties of α. From any ordering α satisfying certain criteria, EliminationGame will derive a triangulation with the corresponding properties. When adjusting the triangulation algorithm, it is therefore crucial to concentrate on how to produce an appropriate ordering α on the input graph.

But which properties should α reasonably have? What can be said about the desired properties the fill-in Fα or the resulting triangulation Gα should satisfy? The perhaps most intuitive optimality criterion for a triangulation technique is to compute a fill-in with minimum cardinality over all possible fill-ins, called a minimum fill-in for short. Intuitively, a minimum fill-in is the smallest possible set of edges whose insertion produces a triangulation of the input graph; no smaller edge set can produce a triangulation. Minimizing the cardinality of the fill-in was the first of the two triangulation strategies Wen considered in his (1990). We say that a minimum fill-in constitutes a minimum triangulation.

It was shown in (Yannakakis, 1981) that the computation of a minimum fill-in for arbitrary input graphs is NP-complete. Yannakakis proved this by reducing the optimal linear arrangement problem (the "OLAP", confer for instance (Garey & Johnson, 1979, p. 200), problem GT42) to the problem of minimum triangulation. With the knowledge currently available, we will therefore not be able to describe an algorithm that is guaranteed to find a minimum fill-in in polynomial time on arbitrary input graphs.

Though, it is a known fact that EliminationGame could indeed generate a fill-in of minimum cardinality for arbitrary input graphs; whether this is feasible depends solely on the properties of α. A minimum elimination ordering is defined to be an ordering on the vertices of the input graph G that yields a minimum triangulation of G. If we had a reliable method of producing a minimum elimination ordering, we could use EliminationGame for producing minimum triangulations. The obvious impediment is that such a method is equivalent to computing a minimum fill-in, so it is also NP-complete. Currently, to the best knowledge of the author, it is only known that a minimum fill-in can be computed in polynomial time for graphs with a polynomial number of minimal separators. This is a conclusion of (Bouchitté & Todinca, 2002a) and (Bouchitté & Todinca, 2002b).

Minimizing the number of edges in the fill-in seems the most obvious optimality criterion when computing a fill-in. Another prominent intuitive optimality criterion for a triangulation is to minimize the number of vertices in the largest clique. This strategy is known as the "treewidth problem". (Many graph-related problems which are NP-hard in the general case are solvable in polynomial time for graphs of bounded treewidth, which makes this problem interesting.) This was another criterion for optimal triangulation considered by Wen. Unfortunately, the treewidth problem is also known to be NP-hard, as shown in (Arnborg et al., 1987). Since the minimum triangulation problem as well as the treewidth problem are NP-hard, much attention has focused on what seems to be the best alternative.

A triangulation Gc of a graph G is called a minimal triangulation if no proper subgraph of Gc is a triangulation of G. Minimal triangulations are the obvious computationally feasible alternative to both the minimum triangulation problem and the treewidth problem, as (Heggernes, 2006), among others, points out. Note that a minimal triangulation differs from a minimum triangulation in that a minimal triangulation is not guaranteed to have the minimum number of fill-in edges that could produce a triangulation of the input graph. In fact, there may be a fill-in of lower cardinality that also produces a triangulation. But a minimal triangulation guarantees that the actual solution contains no superfluous edges, since no subgraph of the triangulation is itself a triangulation of the input graph.

The problem of computing a minimal triangulation of an arbitrary input graph is the subject of very active research. Possibly the best contemporary introduction to the topic is (Heggernes, 2006), providing a brief but complete overview of the current discussions and results concerning the minimal triangulation problem, the existing algorithms and strategies, their asymptotic runtime complexities, and open questions. An aspect that makes minimal triangulation a fascinating problem for graph theorists and algorithmics experts is that it is currently not known, to the best knowledge of the author, which asymptotic time complexity defines a lower bound for the computation time of minimal triangulation. In other words: contemporary challenges exist both in finding a theoretical lower bound for the asymptotic time complexity of a solution and in finding an algorithm that is faster than the fastest algorithm currently known.

The vertex elimination technique can produce minimal triangulations if α is chosen appropriately. An ordering α that generates a minimal triangulation of the input graph is called a minimal elimination ordering, or MEO for short. If the established ordering α on G can be guaranteed to be a MEO, a clever implementation of EliminationGame can be used to compute the actual fill-in.

4.4.4 Generating the Elimination Ordering

In (Rose et al., 1976), a technique of lexicographic breadth first search ("Lex-BFS") on the graph is proposed to obtain a PEO. In the same paper, a variant of Lex-BFS called "LEX M" was introduced, which computes a MEO for the input graph G := ⟨V, E⟩ in an asymptotic runtime of O(nm), where n := |V| is the number of vertices and m := |E| the number of edges of the input graph. A second algorithm was proposed that is a sophisticated implementation of EliminationGame for computing the actual fill-in. This algorithm runs in O(n + m′), where m′ := |E′| is the number of edges of the triangulated result graph.

Tarjan and Yannakakis gave the fill-in computation algorithm its final form in (Tarjan & Yannakakis, 1984, p. 570). We will call it the TY-algorithm. It produces a minimal triangulation if α is chosen to be a minimal elimination ordering. TY is a very famous algorithm. For about 20 years, from 1984 until 2004, it was the asymptotically fastest known sequential fill-in computation algorithm, and O(nm) seemed to be the lower time-bound for computing a minimal triangulation for an even longer time – from 1976 until 2004. The fastest sequential fill-in computation algorithm currently (at the time of writing) known is the algorithm introduced by (Heggernes et al., 2005), which runs in O(n^2.376). It should be mentioned that the likewise sequential algorithm of Kratsch and Spinrad, as introduced in (Kratsch & Spinrad, 2006b), runs in O(n^2.69). The only parallel triangulation algorithm, to the best knowledge of the author, was introduced by (Dahlhaus & Karpinski, 1994). It runs in O(log³ n) parallel time on O(nm) processors. Both algorithms compute a minimal triangulation of the input graph by computing a MEO. TY generates a minimal triangulation if the input ordering α is a MEO. Since the creation of α is not part of TY, other algorithms can be used for computing α.

Lex-BFS as well as LEX M assign labels to unprocessed vertices and use those labels to decide which vertex is numbered next. In the course of the processing of the vertex set, the labels "grow" in length, since each numbering of a vertex may be preceded by adding a new lexicographic symbol to existing labels. While Lex-BFS and LEX M both add to the labels of vertices adjacent to the currently processed vertex, LEX M additionally adds to the labels of all vertices that are reachable over lower weight paths⁴⁰ starting at the currently processed vertex. While Lex-BFS generates just a PEO, LEX M generates a MEO.

One of the main arguments in (Tarjan & Yannakakis, 1984) is that for recognizing triangulatedness it is not necessary to know specifically which adjacent vertices have been processed so

40 Intuitively, a lower weight path contains only vertices whose labels are younger than the labels of both the start vertex and the vertex on the other end of the path. Remember the remark following lemma 4.6 on page 151. Those paths are known to generate fill-in edges, as made explicit in lemma 4.6.

far. It is only required to know the absolute number of processed vertices. By this argument, it was possible to simplify Lex-BFS substantially to another algorithm called "MCS", Maximum Cardinality Search, which also generates a PEO on the input graph (cf. ibid.).

Definition 4.15 (Maximum Cardinality Search) Let G := ⟨V, E⟩ be an undirected graph and let n := |V| be the number of vertices in G. Define an ordering α on the vertices in V by the following method:

1. Assign the number n to an arbitrary vertex v ∈ V.

2. Subsequently, for each i from n − 1 down to 1, select a vertex u ∈ V that has not yet been assigned a number and whose set of numbered adjacent vertices is of maximal cardinality. Assign i to u.

Then α is called an ordering that fulfills the maximum cardinality search criterion. For short, α is called an MCS-ordering.
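Since the selection rule only counts already numbered neighbours, definition 4.15 is straightforward to realize. A minimal Python sketch (hypothetical names, not from the thesis):

def mcs_ordering(vertices, adj):
    """Sketch of Maximum Cardinality Search. `adj` maps each vertex to the
    set of its neighbours. Returns alpha as a dict vertex -> number in
    {1, ..., n}; numbers are assigned in descending order."""
    n = len(vertices)
    weight = {v: 0 for v in vertices}   # count of already numbered neighbours
    alpha = {}
    for i in range(n, 0, -1):
        # Pick an unnumbered vertex with the most numbered neighbours.
        v = max((u for u in vertices if u not in alpha),
                key=lambda u: weight[u])
        alpha[v] = i
        for u in adj[v]:
            if u not in alpha:
                weight[u] += 1
    return alpha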

MCS generates a PEO on the vertices of the triangulated input graph and runs in O(n + m). The ordering is not guaranteed to be a MEO. If the ordering α on the vertices of G is obtained by MCS and G is triangulated, then Gα = G and Fα = ∅; confer for instance (Neapolitan, 1990, theorem 3.4 and pp. 103-109).

While processing, MCS maintains weights for each vertex, where the weight function is defined by the cardinality of the set of already processed vertices adjacent to the currently considered vertex. The difference to Lex-BFS is that Lex-BFS maintains lists of already processed vertices instead of weights. The advantage of MCS is its simplicity.

When Lauritzen and Spiegelhalter published their (1988), it was not in the focus of their attention how the input graph could or should be triangulated; they just showed that the technique they proposed worked on triangulations of the input graph. The later (Jensen et al., 1990) does not comment on this question either. The perhaps most comprehensive explication of the LS-strategy was published by Neapolitan. He proposes in (Neapolitan, 1990, p. 117) to triangulate the graph using a combination of MCS and TY: first, a PEO α on the vertices of the input graph is established using MCS; secondly, the fill-in is computed by using the input graph and α as input for his implementation of TY.

This method has an important advantage, since it nearly perfectly prepares the decomposition of the triangulated graph to a clique tree: once the triangulation is obtained, the ordering on its vertices can be "reused" to obtain an ordering of the cliques having the RIP, which immediately generates a clique tree. This works due to theorem 4.14. Although Neapolitan's proposal combines triangulation and decomposition in an elegant way, its main problem is that it lacks an idea of how to integrate the recognition of the cliques into the triangulation process. He proposes to perform the clique recognition in a separate step⁴¹, using an algorithm for clique recognition on triangulated graphs described by (Golumbic, 1980).

41 Confer problem 3 in (Neapolitan, 1990, p. 120).


We know due to a result of (Moon & Moser, 1965) that an arbitrary graph with n vertices may contain up to 3^(n/3) cliques; hence the number of cliques is exponential in the number of vertices. This means that the recognition of cliques on arbitrary graphs can consume exponential time. However, there are many graph classes with a restricted maximal number of cliques. Triangulated graphs are in fact restricted to a linear number of cliques, as is known from (Fulkerson & Gross, 1965, p. 852), implying that a triangulated graph with n vertices will contain at most n cliques. This has the comfortable technical consequence that clique recognition on a triangulated graph is possible in linear time.

As theorem 3.63 on page 110 already suggests, one could guess that a number of subsets of V which is linear in n and completely covers V is sufficient to establish a potential representation of an NRF on V. Since the number of cliques in a triangulated graph is in O(n), this intuition is perfectly met.

In general, we will stay with the basic strategy proposed in (Neapolitan, 1990). Substantial progress has taken place in the field of graph triangulation since then, from which we will draw benefit to improve Neapolitan's approach. Precisely, we will modify Neapolitan's proposal in four aspects:

1. We will compute a minimal triangulation. Instead of using MCS to compute α, we will compute a MEO by using an improved derivative of MCS, called "MCS-M", which was introduced in (Berry et al., 2004). The subsequent tree decomposition will therefore follow the first scenario described by Wen.

2. We will obtain α and Fα for the input network in only one step, by the run of a single algorithm. Instead of using subsequent applications of first MCS (to compute α) and then TY (to compute Fα on G), as Neapolitan does, we will use an implementation of TY that does not need to be passed α as an input parameter. To be precise, α will be computed successively while computing Fα.

3. We will construct the clique tree along with the triangulation. Although Neapolitan shows how to find representative vertices for all cliques of the network, he does not propose a technique for recognizing the actual cliques and, hence, cannot perform in-place decomposition while triangulating. We know from (Blair & Peyton, 1993) that MCS can be extended to also efficiently recognize the cliques of the fill-in-graph while computing it. We know further that this step will not change the asymptotic runtime complexity, since a triangulated graph with n vertices contains only O(n) cliques. Hence, we will effectively do all three steps – computing α as well as Fα and the clique tree – in the run of just one algorithm.

4. We will compute the separators and residua of the cliques while constructing the clique tree. This is a valuable side effect of computing the cliques and allows us to obtain a readily configured clique tree that does not need any further intervention before message passing can be done.

The main task of the remainder of this section is to combine MCS-M with the arguments of (Blair & Peyton, 1993), which means to show that the argumentation applied to MCS in (Blair & Peyton, 1993) also applies to MCS-M as introduced in (Berry et al., 2004).


The author of this thesis does not claim to introduce a completely new decomposition algorithm; he merely argues that some already known facts can be combined to obtain a quite elegant decomposition algorithm that serves the purpose of supporting updates on a rank-based belief network. The currently best known algorithms were already addressed. We prefer minimal triangulation over an algorithm with optimal asymptotic time complexity just for the purpose of demonstrating that all the values related to the epistemic domain can be computed during decomposition. This is not a proposal to generally prefer minimal triangulation over optimal time behavior. The following section introduces the algorithm MCS-M, which we will use to compute α.

4.4.5 The MCS-M Algorithm

The MCS-M algorithm was introduced in (Berry et al., 2004); insightful comments on MCS-M and its "cousin", the LEX M algorithm, can be found in (Villanger, 2006). Like LEX M, MCS-M has an asymptotic runtime complexity of O(nm). The formal representation as presented in algorithm 3 is taken directly from (Berry et al., 2004).

input : A graph G := ⟨V, E⟩
output: A triangulated graph ⟨V, E ∪ Fα, α⟩, ordered with a MEO α on V

1  Set F ← ∅;
2  foreach v ∈ V do
3      w(v) ← 0;
4  for i = n downto 1 do
5      Vertex v ← arbitrary unnumbered vertex of maximum weight w;
6      Set S ← ∅;
7      foreach unnumbered u ∈ V \ {v} do
8          if ⟨u, v⟩ ∈ E or a path (u, x₁, ..., xₖ, v) exists such that
9          for each j : 1 ≤ j ≤ k vertex xⱼ is unnumbered and w(xⱼ) < w(v) then
10             S ← S ∪ {u};
11     foreach u ∈ S do
12         w(u) ← w(u) + 1;
13         if ⟨u, v⟩ ∉ E then
14             F ← F ∪ {⟨u, v⟩};
15     α(v) ← i; α⁻¹(i) ← v;
16 return ⟨V, E ∪ F, α⟩;

Algorithm 3: MCS-M, a method to compute a minimal fill-in
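For illustration, a directly executable counterpart of algorithm 3 may be sketched as follows (hypothetical names, not part of the thesis; the path condition of lines 8–9 is realized by a search through unnumbered vertices of weight strictly below w(v)):

from collections import deque

def mcs_m(vertices, edges):
    """Sketch of MCS-M after algorithm 3. `edges` is a set of frozensets
    {u, v}. Returns (fill, alpha), where alpha maps each vertex to its
    number in {1, ..., n}."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    weight = {v: 0 for v in vertices}
    alpha, fill = {}, set()
    for i in range(len(vertices), 0, -1):
        # Line 5: arbitrary unnumbered vertex of maximum weight.
        v = max((u for u in vertices if u not in alpha),
                key=lambda u: weight[u])
        # Lines 7-10: collect S by searching from v through unnumbered
        # vertices whose weight is strictly smaller than w(v).
        S, seen, queue = set(), {v}, deque([v])
        while queue:
            x = queue.popleft()
            for u in adj[x]:
                if u in alpha or u in seen:
                    continue
                seen.add(u)
                S.add(u)
                if weight[u] < weight[v]:
                    queue.append(u)   # u may serve as an interior path vertex
        # Lines 11-14: raise weights and record new fill edges.
        for u in S:
            weight[u] += 1
            if v not in adj[u]:
                fill.add(frozenset((u, v)))
        alpha[v] = i                  # line 15
    return fill, alpha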

All relevant proofs concerning the runtime of MCS-M can be found in (Berry et al., 2004); the proof of the correctness of MCS-M is also contained there. That MCS, the ancestor of MCS-M, yields an ordering that has the RIP if and only if the underlying graph is triangulated was first shown by (Tarjan & Yannakakis, 1984), where MCS was also first described. In (Neapolitan, 1990, p. 108f, theorem 3.2), a proof can be found that applies to MCS instead of MCS-M.


We will now show that MCS-M can furthermore be used to identify the cliques, their residua, and the relevant separators of the triangulated output graph.

4.4.6 Determining the Set of Cliques of the Fill-In-Graph

As we will see in this section, the ordering computed on the input graph by MCS-M provides strong implications about how to detect the cliques. We know from lemma 4.13 on page 152 and the fact that MCS-M generates a MEO α on Gα that α is a fortiori a PEO for Gα and that α is therefore perfect on Gα. This entails that in Gα, the triangulated output graph, each element of the set

{ {vᵢ} ∪ madj(vᵢ) : vᵢ ∈ V }    (4.3)

is a complete subset. It is a most helpful result of (Fulkerson & Gross, 1965) that the set (4.3) always contains the complete set of cliques as a subset.

Lemma 4.16 (Fulkerson & Gross) Let α := (v₁, v₂, ..., vₙ) be a PEO on the vertices V of an undirected graph G. Then the complete set C(G) of all cliques of G is equivalent to the set

{ {vᵢ} ∪ madj(vᵢ) : vᵢ ∈ V ∧ ∄ j < i such that {vᵢ} ∪ madj(vᵢ) ⊂ {vⱼ} ∪ madj(vⱼ) }.

Intuitively, this means that a set {vᵢ} ∪ madj(vᵢ) is a clique if and only if it is not contained in any superset {vⱼ} ∪ madj(vⱼ). We will call the elements of (4.3) clique candidate sets, which means, in accordance with our intuition, that the set of clique candidate sets (4.3) is a superset of C(G), the set of cliques of Gα. Lemma 4.16 ensures that this superset is already enumerated by α. Hence, we need not search Gα for potentially "hidden" cliques that might not be contained in (4.3). It is only required to detect which clique candidate sets in (4.3) are in fact cliques while sorting out the other candidates. We will now describe a method to perform this check efficiently.

Lemma 4.17 Let α := (v₁, v₂, ..., vₙ) be a PEO on the vertices V of an undirected connected triangulated graph G obtained by MCS-M. Then for each vertex label i : n − 1 ≥ i ≥ 1 the following inequality holds when vertex vᵢ₊₁ is selected to be labeled:

|adj(vᵢ) ∩ {vⱼ : j ≥ i + 2}| ≤ |adj(vᵢ₊₁) ∩ {vⱼ : j ≥ i + 2}|    (4.4)

Proof: We carry out the proof by deriving a contradiction from the negated hypothesis. We use the symbol vᵢ₊₁⁻ to denote the time when vertex vᵢ₊₁ is selected to receive an α-number (line 5 in MCS-M), and w_{vᵢ₊₁⁻}(v) is the weight of vertex v at this time. Now, in contradiction to the hypothesis, let

|adj(vᵢ) ∩ {vⱼ : j ≥ i + 2}| > |adj(vᵢ₊₁) ∩ {vⱼ : j ≥ i + 2}|.    (4.5)

Let further

∆ᵢⁱ⁺¹adj := |adj(vᵢ) ∩ {vⱼ : j ≥ i + 2}| − |adj(vᵢ₊₁) ∩ {vⱼ : j ≥ i + 2}|

be the number of vertices in {vⱼ : j ≥ i + 2} that are adjacent to vᵢ but not to vᵢ₊₁. Note that ∆ᵢⁱ⁺¹adj > 0 in accordance with (4.5). Note further that w_{vᵢ₊₁⁻}(vᵢ₊₁) ≥ w_{vᵢ₊₁⁻}(vᵢ), because otherwise vᵢ₊₁ would not have been selected in step i + 1, since it would not have been of maximal weight. Let

∆w_{vᵢ₊₁⁻}(vᵢ₊₁) := w_{vᵢ₊₁⁻}(vᵢ₊₁) − |adj(vᵢ₊₁) ∩ {vⱼ : j ≥ i + 2}|

∆w_{vᵢ₊₁⁻}(vᵢ) := w_{vᵢ₊₁⁻}(vᵢ) − |adj(vᵢ) ∩ {vⱼ : j ≥ i + 2}|

be the particular parts of the weights gained by having been increased over some lower weight paths when the vertices in {vⱼ : j ≥ i + 2} received their α-numbers. Inequality (4.5) is true if and only if

∆w_{vᵢ₊₁⁻}(vᵢ₊₁) > ∆w_{vᵢ₊₁⁻}(vᵢ) + ∆ᵢⁱ⁺¹adj + h    (4.6)

for some h > 0. Equation (4.6) implies that α generates a non-zero fill-in on G. This induces a contradiction to (4.5): since G is triangulated, Fα will always be ∅.

A more sophisticated way to accept lemma 4.17 is the following: informally, (4.6) means that w(vᵢ₊₁) must have been increased by lower weight paths more often than w(vᵢ), by precisely an amount of weight that is greater than the sum of both the number of relevant vertices adjacent to vᵢ but not to vᵢ₊₁ and the number of paths that also increased w(vᵢ). Each lower weight path from any vertex in {vⱼ : j ≥ i + 2} to a vertex vₖ with k ≤ i + 1 which increases the weight of vₖ also adds an edge in Gα and therefore contributes to the set adj(vₖ) ∩ {vⱼ : j ≥ i + 2}. Consequently,

∆w_{vₖ⁻}(vₖ) = 0,

which, together with the requirement ∆ᵢⁱ⁺¹adj > 0, is a contradiction to (4.6). □

(Blair & Peyton, 1993, p. 19f) already shows that lemma 4.17 holds for MCS. This is somewhat easier than the proof for MCS-M, since in MCS the definition of the weight function w is equivalent to

w(vᵢ) = |adj(vᵢ) ∩ {vⱼ : j ≥ i + 1}|,

from which lemma 4.17 follows immediately. The following two statements, lemma 4.18 and theorem 4.19, are discussed and proven for MCS in (Blair & Peyton, 1993). We will only mention those parts of the proofs that are relevant to the fact that the lemmata hold for MCS-M as well.

Lemma 4.18 Let α := (v₁, v₂, ..., vₙ) be a PEO on the vertices V of an undirected connected triangulated graph G obtained by MCS-M and let C(G) be the complete set of all cliques of G. Then for each vertex label i : n − 1 ≥ i ≥ 1 the following assertions are equivalent:

{vᵢ₊₁} ∪ madj(vᵢ₊₁) ∉ C(G)    (4.7)

|adj(vᵢ) ∩ {vⱼ : j ≥ i + 1}| = |adj(vᵢ₊₁) ∩ {vⱼ : j ≥ i + 2}| + 1    (4.8)

{vᵢ} ∪ madj(vᵢ) = {vᵢ} ∪ {vᵢ₊₁} ∪ madj(vᵢ₊₁)    (4.9)

Proof: We begin by showing that (4.7) implies (4.8). Note that

{vⱼ : j ≥ i + 1} = {vⱼ : j ≥ i + 2} ∪ {vᵢ₊₁}.    (4.10)

The combination of (4.4) and (4.10) leads to:

|adj(vᵢ) ∩ {vⱼ : j ≥ i + 1}| ≤ |adj(vᵢ₊₁) ∩ {vⱼ : j ≥ i + 2}| + 1.    (4.11)

Assume now that (4.7) holds for i + 1, which means that the clique candidate set {vᵢ₊₁} ∪ madj(vᵢ₊₁) is not a clique of G. In this case, lemma 4.16 implies that there exists some vertex u ∈ V \ {vⱼ : j ≥ i + 1} that is adjacent to every vertex in {vᵢ₊₁} ∪ madj(vᵢ₊₁). Together with (4.11), the existence of vertex u entails that (4.8) holds for i and i + 1. The proof in (Blair & Peyton, 1993, lemma 4.6, p. 19f) proceeds by showing that (4.8) directly implies (4.9), which in turn implies (4.7). □

Theorem 4.19 Let α := (v₁, v₂, ..., vₙ) be a PEO on the vertices V of an undirected connected triangulated graph G obtained by MCS-M. Then the complete set C(G) of all cliques in G is precisely the set that contains

1. {v₁} ∪ madj(v₁) and

2. each set {vᵢ₊₁} ∪ madj(vᵢ₊₁) with n − 1 ≥ i ≥ 1 such that

|adj(vᵢ) ∩ {vⱼ : j ≥ i + 1}| ≤ |adj(vᵢ₊₁) ∩ {vⱼ : j ≥ i + 2}|    (4.12)

and no further sets not characterized by those two conditions.

Proof: Completely analogous to (Blair & Peyton, 1993, lemma 4.7, p. 20): it follows from lemma 4.16 that {v₁} ∪ madj(v₁) ∈ C(G). Now, for each of the sets {vᵢ₊₁} ∪ madj(vᵢ₊₁) for each i : n − 1 ≥ i ≥ 1, it follows from (4.10) and the equivalence of (4.7) and (4.8) in lemma 4.18 that C(G) contains {vᵢ₊₁} ∪ madj(vᵢ₊₁) if and only if conditions 1 and 2 hold. □

Theorem 4.19 provides us with a very simple method to test whether a clique candidate set is a clique or not. In fact, it needs just a single comparison of the cardinalities of two sets, and we are done. This is sufficient to identify the cliques of the triangulated output graph.
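In code, the test of theorem 4.19 amounts to one cardinality comparison per elimination step. A sketch (hypothetical names; it assumes the adjacency of the triangulated graph Gα and a completed PEO):

def clique_representatives(alpha_inv, adj):
    """Sketch: identify which clique candidate sets {v_i} ∪ madj(v_i) are
    cliques, per the cardinality test of theorem 4.19. `alpha_inv[i]` is the
    vertex numbered i (1-based list, index 0 unused); `adj` maps vertices to
    their neighbour sets in the triangulated graph."""
    n = len(alpha_inv) - 1
    number = {alpha_inv[i]: i for i in range(1, n + 1)}

    def card(i, lo):
        # |adj(v_i) ∩ {v_j : j >= lo}|
        return sum(1 for u in adj[alpha_inv[i]] if number[u] >= lo)

    reps = []
    for i in range(1, n):
        # v_{i+1} represents a clique iff card(i, i+1) <= card(i+1, i+2).
        if card(i, i + 1) <= card(i + 1, i + 2):
            reps.append(alpha_inv[i + 1])
    reps.append(alpha_inv[1])            # {v_1} ∪ madj(v_1) is always a clique
    reps.sort(key=lambda v: -number[v])  # order as i_1 > i_2 > ... > i_p
    return reps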

4.4.7 Inline Recognition of Cliques

Note that due to theorem 4.19, for identifying a clique we do not need access to the entire fill-in graph. Instead, it is required to access only the sets referenced in condition 2 to decide whether the set {vᵢ₊₁} ∪ madj(vᵢ₊₁) is a clique or not.


It is obvious that at the time when MCS-M assigns label i + 1 to vertex vᵢ₊₁, the set madj(vᵢ₊₁) is finally determined, since madj(vᵢ₊₁) only contains vertices labeled before vᵢ₊₁. Lucidly, the set {vᵢ₊₁} ∪ madj(vᵢ₊₁) will not undergo any changes after vᵢ₊₁ has received its label.

But note especially that the decision whether {vᵢ₊₁} ∪ madj(vᵢ₊₁) is a clique or not cannot be completed at the very time when vᵢ₊₁ receives its α-label. Instead, theorem 4.19 requires the comparison with the subsequent clique candidate set {vᵢ} ∪ madj(vᵢ). Thus, we will not know whether {vᵢ₊₁} ∪ madj(vᵢ₊₁) is a clique or not before vᵢ has received its label. In other words, for deciding whether {vᵢ₊₁} ∪ madj(vᵢ₊₁) is a clique in Gα, it is sufficient to have access to the part of Gα that is known at the time when vᵢ receives its label.

It is clear that we can enumerate the clique candidate sets {vᵢ} ∪ madj(vᵢ) completely while constructing the PEO. While enumerating the candidates, we can use inequality (4.12) to identify the cliques among them. To prove that we can in fact identify the cliques inline in the course of MCS-M while constructing α, we will show by arguments of Blair and Peyton that MCS-M computes the cliques of Gα in blocks.

We choose the vertex v_{iᵣ} as representative for the clique Cᵣ := {v_{iᵣ}} ∪ madj(v_{iᵣ}) in Gα for any r : 1 ≤ r ≤ p. Note that each clique is therefore represented by the vertex with the clique-minimal α-number. It hence holds that i₁ > i₂ > ... > iₚ. The following lemmata 4.20 and 4.22 were originally stated for MCS in (Blair & Peyton, 1993, p. 20ff). Their proofs apply regardless of whether α is obtained by MCS or by MCS-M, hence it is not required to reproduce the proofs here.

For convenience, let i₀ := n + 1.

Lemma 4.20 Let α := (v₁, v₂, ..., vₙ) be a PEO on the vertices V of an undirected connected triangulated graph G obtained by MCS-M. Let C(G) := {C₁, C₂, ..., Cₚ} w.l.o.g. be the complete set of all cliques of G, and let v_{i₁}, v_{i₂}, ..., v_{iₚ} be the representatives of the cliques such that i₁ > i₂ > ... > iₚ. It then holds for each r : 1 ≤ r ≤ p that

{vⱼ : j ≥ iᵣ} = C₁ ∪ C₂ ∪ ... ∪ Cᵣ.

For the proof, consult (Blair & Peyton, 1993, lemma 4.8, p. 20f). As Blair and Peyton point out for MCS, lemma 4.20 expresses that α enumerates the cliques by their representatives in contiguous blocks. The blocks are characterized as follows:

(v_{i₁}, v_{i₁+1}, ..., v_{i₀−1} = vₙ) = C₁
(v_{i₂}, v_{i₂+1}, ..., v_{i₁−1}) = C₂ \ C₁
(v_{i₃}, v_{i₃+1}, ..., v_{i₂−1}) = C₃ \ (C₂ ∪ C₁)
⋮                                                  (4.13)
(v₁ = v_{iₚ}, v_{iₚ+1}, ..., v_{i_{p−1}−1}) = Cₚ \ (C₁ ∪ ... ∪ Cₚ₋₁)
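Once the representatives are known, the block structure (4.13) can be read off mechanically. A small sketch (hypothetical data layout, not from the thesis):

def clique_blocks(alpha_inv, reps):
    """Sketch of (4.13): given alpha_inv[i] = v_i (1-based list, index 0
    unused) and the representatives' numbers reps = [i_1, ..., i_p] with
    i_1 > i_2 > ... > i_p, return per clique C_r the block of vertices that
    are new to it."""
    n = len(alpha_inv) - 1
    bounds = [n + 1] + reps          # [i_0, i_1, ..., i_p] with i_0 := n + 1
    return [[alpha_inv[k] for k in range(bounds[r + 1], bounds[r])]
            for r in range(len(reps))]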

The main result of section 4.4.6 is that we are able to safely identify a clique by performing an inexpensive comparison between cardinalities of sets. The main result of this section is that the cliques are enumerated blockwise by MCS-M. The combination of both results shows that it is possible, and indeed quite easy, to extend MCS-M with an inline recognition mechanism for the cliques of its output graph. We will illustrate this mechanism with an example:

Example 4.21 We will now run MCS-M on an input graph, illustrating how it triangulates the graph. Additionally, we will use inequality (4.12) from theorem 4.19 to decide whether the clique candidate set considered in the previous loop run was a clique. Consider the following graph:

Figure 4.1: A possible input graph.

Let this graph be the input for a run of MCS-M. Since n = 7, we start the main loop of MCS-M in line 4 with i = 7. In line 5, MCS-M selects a vertex to receive its label. Then the weights of the vertices adjacent to the selected vertex are increased. The algorithm also increases the weights of all vertices that are reachable via lower weight paths from the currently selected vertex. Thereafter, the next vertex is selected for labeling, respecting the weights.

Figure 4.2 illustrates the situation after line 15 of MCS-M has been executed, immediately before the main loop is restarted: vertex v₇ has just received its label. Since MCS-M chooses an arbitrary vertex of maximum weight, and for the loop run i = 7 each vertex has weight 0, each vertex could have been selected as v₇. The small numbers in parentheses beside unnumbered vertices represent the weights the vertices have at this time. Vertex v₇, marked with doubled lines, is the vertex with the smallest α-label in the current clique candidate set, while the current clique candidate set {v₇} ∪ madj(v₇) is encircled by the dotted line. (Note that in each set {vᵢ} ∪ madj(vᵢ) vertex vᵢ is marked by doubled lines. In case the candidate is a clique, this vertex will be the representative of this clique.)

Figure 4.2: Showing {v₇} ∪ madj(v₇), just after i = 7.


Figure 4.2 illustrates that {v₇} ∪ madj(v₇) is just {v₇}, since madj(v₇) is clearly empty. Whether the clique candidate set {v₇} ∪ madj(v₇) is in fact a clique or not cannot be decided from the information currently available, so we proceed to a possible result of the next loop run i = 6, illustrated by figure 4.3.

Figure 4.3: Candidate {v₆} ∪ madj(v₆), just after i = 6.

Comparing the situations shown in figure 4.3 and figure 4.2, we can draw the informative conclusion that {v₇} ∪ madj(v₇) is indeed not a clique, since inequality (4.12) is false⁴². Resolving the inequality with the actual parameter i = 6 shows:

|adj(v₆) ∩ {vⱼ : j ≥ 7}| ≤? |adj(v₇) ∩ {vⱼ : j ≥ 8}|
|adj(v₆) ∩ {v₇}| ≤? |adj(v₇) ∩ ∅|
1 ≰ 0

Whether {v₆} ∪ madj(v₆) is a clique or not cannot be decided at the moment, so we proceed to the end of loop run i = 5, as shown in figure 4.4. MCS-M has recognized a lower weight path from v₅ to another vertex with non-zero weight and has therefore added an edge to Fα. The new edge is drawn dashed in figure 4.4.

Figure 4.4: After i = 5.
Figure 4.5: After i = 4.

In this situation, we cannot decide whether {v₅} ∪ madj(v₅) is a clique, but when comparing the situation in figure 4.4 to the situation in figure 4.3 we recognize by (4.12) that the candidate set {v₆} ∪ madj(v₆), encircled by a dotted line in figure 4.3, is not a clique.

42 We further note that lemma 4.16 implies the already stated fact that {v₇} ∪ madj(v₇) is not a clique, since it is contained in {v₆} ∪ madj(v₆). We also note that equations (4.8) and (4.9) from lemma 4.18 hold true. This is consistent with (4.7). In fact, theorem 4.19 provides the only relevant test for deciding whether the candidate set is a clique or not, so we will not always demonstrate that the other supportive lemmata are true or false in a particular situation.


The next loop run to consider is i = 4. Comparing figures 4.5 and 4.4, we recognize by checking whether (4.12) holds between i + 1 = 5 and i = 4 that the candidate set {v₅} ∪ madj(v₅) is a clique. Thus, the first clique C₁ is correctly identified. Note in particular that the new edge {v₄, v₅} added in loop run i = 5 is respected when we consider the set madj(v₄) in loop run i = 4. (Remember that v₅ was not adjacent to v₄ when loop run i = 5 started.) The clique index r starts at 1 and is increased whenever a clique is completed. The clique marked in figure 4.4 is called C₁. Note that v₅ is the vertex with the smallest α-label in clique C₁ = {v₅} ∪ madj(v₅) and is therefore the representative for C₁, so i₁ = 5. We increase r by 1, starting to identify clique C₂.

Figure 4.6: After i = 3.
Figure 4.7: After i = 2.

The next loop run, with i = 3 and shown in figure 4.6, adds a new edge by a lower weight path and reveals that {v₄} ∪ madj(v₄), marked in figure 4.5, is also a clique. This result completes the identification of clique C₂ = {v₄} ∪ madj(v₄), with v₄ being the vertex with the smallest α-label in clique C₂, so i₂ = 4.

Figure 4.8: After i = 1.

We proceed by labeling v₂ and v₁ as well and identify cliques C₃ and C₄ in the same way. This can quite easily be read off from figures 4.6, 4.7, and 4.8. Note that theorem 4.19 ensures {v₁} ∪ madj(v₁) = C₅, so for the set {v₁} ∪ madj(v₁) no application of (4.12) is required. Considering the representatives of the identified cliques, we note that

i₁ = 5 > i₂ = 4 > i₃ = 3 > i₄ = 2 > i₅ = 1.

The example also illustrates (4.13), which can easily be noticed when comparing the figures. □


The inline recognition of the cliques is one of the specific improvements of our approach over Neapolitan's proposal. To complete the task of decomposition, it is also required to show that from the succession of cliques MCS-M generates, a valid clique tree can be constructed. The next section will show that this is possible and can indeed be performed inline, too.

4.4.8 Inline Tree Construction

Remember that since α is perfect on Gα, the clique ordering induced by α has the RIP (theorem 4.14). This entails that α can be utilized for constructing the clique tree.

Let the ordering of all cliques (C₁, C₂, ..., Cₚ) be generated by MCS-M as described by (4.13). We define the function⁴³ clqidx : V → {1, ..., p} yielding for each vertex v ∈ V the lowest index r of any clique in this ordering that contains v:

clqidx(v) := min{r : v ∈ Cᵣ}.

Lemma 4.22 Let α := (v₁, v₂, ..., vₙ) be a PEO on the vertices V of an undirected connected triangulated graph G obtained by MCS-M. Let C(G) := {C₁, C₂, ..., Cₚ} be the complete set of all cliques of G and let v_{i₁}, v_{i₂}, ..., v_{iₚ} be the representatives of the cliques such that i₁ > i₂ > ... > iₚ. Then, for each r : 1 ≤ r ≤ p − 1 we choose

j := min{α⁻¹(u) : u ∈ Cᵣ₊₁ ∩ {vₖ : iᵣ ≤ k ≤ n}},

the smallest α-label in the intersection set, and

s := clqidx(vⱼ).

It then holds that

Cᵣ₊₁ ∩ {vₖ : iᵣ ≤ k ≤ n} = Cᵣ₊₁ ∩ Cₛ.    (4.14)

The proof for this lemma can be understood by consulting the proof of (Blair & Peyton, 1993, lemma 4.9, p. 21f). Note that this lemma also proves that the clique ordering (C₁, C₂, ..., Cₚ) has the RIP, since it implies that there always exists an s such that (4.14) holds; this is just another representation of the RIP. This leads to a rule for determining the parent of a clique in the clique tree: the parent of clique Cᵣ₊₁ is the clique Cₛ for which s := clqidx(vⱼ) with j := min{α⁻¹(u) : u ∈ Cᵣ₊₁ ∩ {vₖ : iᵣ ≤ k ≤ n}}. It was first shown by (Tarjan & Yannakakis, 1984) (cf. for example ibid. p. 573ff) that this technique can be used to define an appropriate parent function.
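The parent rule is mechanical enough to be stated as code. A minimal sketch (hypothetical data layout; it assumes the cliques are already enumerated in MCS-M order together with their representatives' α-numbers):

def clique_parents(cliques, rep_num, number, clqidx):
    """Sketch of the parent rule from lemma 4.22, all indices 1-based.
    cliques[r]: vertex set of clique C_r in MCS-M enumeration order;
    rep_num[r]: α-number i_r of C_r's representative;
    number[v]: α-number of vertex v;
    clqidx[v]: lowest index of a clique containing v."""
    p = len(cliques)
    parents = {}
    for r in range(1, p):
        # C_{r+1} ∩ {v_k : i_r ≤ k ≤ n}: the vertices of C_{r+1} that
        # already occur in the blocks of C_1, ..., C_r.
        shared = [v for v in cliques[r + 1] if number[v] >= rep_num[r]]
        v_j = min(shared, key=lambda v: number[v])  # smallest α-label j
        parents[r + 1] = clqidx[v_j]                # parent C_s, s = clqidx(v_j)
    return parents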

Example 4.23 In example 4.21, starting on page 163, consider the step i = 4, when we are just about to complete clique C₁ and start identifying vertices in clique C₂, which means r is currently 1 but it is clear that it will be increased. It is now possible to determine the parent

43 This function is not to be mixed up with the function clq as defined on page 130 in theorem 3.84. Function clq(v) just returns the index of a clique that contains v as well as pa(v).


Cₛ for clique C₂ by first computing

j = min{α⁻¹(u) : u ∈ C₂ ∩ {vₖ : 5 ≤ k ≤ 7}}
  = min{α⁻¹(u) : u ∈ C₂ ∩ {v₅, v₆, v₇}}
  = min{α⁻¹(u) : u ∈ {v₅, v₇}}
  = 5

and then

s = clqidx(v₅) = 1.

By this argumentation, clique C₁ will be the parent of clique C₂. □

There is a subtlety to be respected when performing the parent construction inline: note that at the time when j has to be computed, not every element of clique C₂ is identified. Clearly, loop run i = 4 will be finished without C₂ being completed: this run completes C₁, while clique C₂ may be completed at the earliest in loop run i = 3, precisely if the candidate set {v₄} ∪ madj(v₄) really is a clique.

Nonetheless, the information we can access during loop run i = 4 is sufficient to compute j. The intersection of C₂ and {vₖ : 5 ≤ k ≤ 7} will not contain any vertex with a label lower than 5. Each vertex in C₂ yet unnumbered is therefore excluded from the intersection. The only vertices already numbered that will be contained in clique C₂ are the vertices in the candidate set {v₄} ∪ madj(v₄). Since v₄ is excluded from the intersection, it is in fact sufficient to quantify over madj(v₄) ∩ {vₖ : 5 ≤ k ≤ 7} when choosing vertex u. Therefore, when computing the parent clique inline, we may compute j as

j = min{α⁻¹(u) : u ∈ madj(v_{i_{r+1}}) ∩ {vₖ : iᵣ ≤ k ≤ n}}.

It is lucid that the computation of s is ensured to succeed: since we enumerate the cliques in ascending order, the computation of s will yield a correct result for each input vertex that has already occurred in a clique. Vertex vⱼ will always have been assigned to a clique, due to the definition of j.

We have shown that MCS-M has in principle the facility to detect the cliques of Gα and provides an implicit clique ordering sufficient for establishing a clique tree. To complete the task of decomposition, it remains to be shown how precisely the clique tree can be constructed. Mimicking the strategy of (Blair & Peyton, 1993), we will introduce the weighted clique intersection graph of a triangulated graph and then show that deriving a clique tree from a triangulated graph is equivalent to obtaining a maximum-weight spanning tree on the corresponding weighted clique intersection graph. These notions are defined immediately below.

Definition 4.24 (Weighted Clique Intersection Graph) Let G be a triangulated graph with the complete set of all cliques C(G) := {C₁, C₂, ..., Cₚ}. Let E* ⊆ C(G) × C(G) be a set of edges defined as follows: E* := {⟨Cᵢ, Cⱼ⟩ : Cᵢ ∩ Cⱼ ≠ ∅} for each i, j : 1 ≤ i < j ≤ p. Let w : E* → ℕ \ {0} be a function defined as w(⟨Cᵢ, Cⱼ⟩) := |Cᵢ ∩ Cⱼ|. We then call the ordered triple ⟨C(G), E*, w⟩ the weighted clique intersection graph of G, denoted by K(G, w).

Definition 4.24 is a more strongly formalized variant of the definition found in (Blair & Peyton, 1993, p. 13). A weighted clique intersection graph is very close to our intuition: its vertices are the cliques of the original graph, and each pair of cliques with a non-empty intersection is connected by an edge. Each edge is labeled with the cardinality of the intersection of the two cliques. Figure 4.9 shows the graph of example 4.21 with all its cliques. Figure 4.10 shows its weighted clique intersection graph.
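Constructing K(G, w) from a clique list is a direct transcription of definition 4.24. A minimal sketch (hypothetical names, not from the thesis):

from itertools import combinations

def clique_intersection_graph(cliques):
    """Sketch of definition 4.24: build K(G, w) from the cliques of a
    triangulated graph. `cliques` maps clique indices to vertex sets.
    Returns the weighted edge set {(i, j): |C_i ∩ C_j|} for i < j."""
    weighted_edges = {}
    for i, j in combinations(sorted(cliques), 2):
        shared = cliques[i] & cliques[j]
        if shared:                        # keep only non-empty intersections
            weighted_edges[(i, j)] = len(shared)
    return weighted_edges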

Figure 4.9: A DAG G with 5 cliques and a PEO α.
Figure 4.10: The weighted clique intersection graph K(G, w).

Note that any triangulated graph has a unique weighted clique intersection graph associated with it. It is quite obvious that the weighted clique intersection graph contains the possible clique trees of the original graph as subgraphs. For an illustration, confer figure 4.11.

Figure 4.11: A hint to Max-WSTs in K(G, w) according to figure 4.10. Every dashed edge is part of each Max-WST. The maximum weight of 8 of any spanning tree can easily be read off.


Definition 4.25 (Maximum Weight Spanning Tree) For a graph G := ⟨V, E⟩ and a function w : E → ℕ, a tree T := ⟨V, E′⟩ with E′ ⊆ E such that there is no tree T* := ⟨V, E″⟩ with E″ ≠ E′ and ∑_{e∈E″} w(e) > ∑_{e∈E′} w(e) is called a maximum-weight spanning tree of G, or Max-WST for short⁴⁴.

Function w assigns weights to the edges of the graph G. The maximum-weight spanning tree is, informally, a tree that connects all vertices of G using a subset of the edges of G such that the total weight of the tree's edges is maximized. A maximum-weight spanning tree of a graph need not be unique, as exemplified in figure 4.12.

Figure 4.12: Two different Max-WSTs on K(G, w). The set of doubled edges in figure 4.11 can generate each of these trees by choosing the root vertex appropriately. The parent finding strategy derived from lemma 4.22 generates the left tree.

Now note the following result of (Bernstein & Goodman, 1981):

Theorem 4.26 (Bernstein & Goodman) Let G be a connected triangulated graph and K(G, w) its associated weighted clique intersection graph. Then, the set of all maximum-weight spanning trees on K(G, w) is equivalent to the set of all clique trees of G.

In a lucid form, this is also proven in (Gavril, 1987, p. 596, theorem 1). We therefore note that it is possible to construct a clique tree of the triangulated network by just computing a Max-WST on its weighted clique intersection graph. For this task, we use the well-known algorithm of Prim, as introduced in (Prim, 1957). (Blair & Peyton, 1993, p. 18) presents a modification of Prim's algorithm specifically for constructing the clique tree. We utilize this result and present it as algorithm 4 on page 170. As can be read off from the pseudo-code, the algorithm starts with an empty graph and then constructs the Max-WST successively by adding edges and vertices to it.

44 The abbreviations "MST" and "MWST", which may seem obvious to use, can be misleading, since they usually stand for the term "minimal (weight) spanning tree" – which is exactly the opposite of a Max-WST.


input : A weighted clique intersection graph K(G, w)
output: A Max-WST on K(G, w)

1  Set E_t ← ∅;
2  Choose a clique C from the complete set of p cliques C(G);
3  Set K_t ← {C};
4  for r = 2 to p do
5      Choose a pair of cliques C ∈ K_t and C′ ∈ C(G) \ K_t such that |C ∩ C′| is maximum;
6      E_t ← E_t ∪ {⟨C, C′⟩};
7      K_t ← K_t ∪ {C′};
8  return ⟨K_t, E_t⟩;

Algorithm 4: Prim's (modified) algorithm, a method to compute a Max-WST
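A runnable counterpart of algorithm 4 might look as follows (a sketch under the assumption that K(G, w) is connected; it consumes the weighted edge dictionary produced by the sketch after definition 4.24):

def prim_max_wst(cliques, weighted_edges):
    """Sketch of algorithm 4: grow a Max-WST on K(G, w). `cliques` is a set
    of clique indices; `weighted_edges` maps pairs (i, j) with i < j to the
    intersection cardinality |C_i ∩ C_j|."""
    def weight(a, b):
        return weighted_edges.get((min(a, b), max(a, b)), 0)

    k_t = {next(iter(cliques))}          # lines 2-3: start with one clique
    e_t = []
    for _ in range(len(cliques) - 1):    # line 4: add the remaining p - 1 cliques
        # Line 5: pick C in the tree and C' outside with maximal |C ∩ C'|.
        c, c_new = max(((a, b) for a in k_t for b in cliques - k_t),
                       key=lambda pair: weight(*pair))
        e_t.append((c, c_new))           # line 6
        k_t.add(c_new)                   # line 7
    return k_t, e_t                      # line 8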

Theorem 4.27 (Each MCS-M-ordering on G is equivalent to a Max-WST on K(G, w)) Let α be an ordering on the vertices of an undirected graph G obtained by MCS-M. Then the representatives of all cliques in G occur in α in the identical order in which the cliques in G are searched by an application of algorithm 4 on K(G, w).

Theorem 4.27 equals theorem 4.10 of (Blair & Peyton, 1993, p. 22f), where the proof can be found; it applies to MCS-M as well. The same relationship as in theorem 4.27 was originally discovered to hold between Prim's algorithm and MCS. To the best knowledge of the author, this was first shown in (Blair et al., 1988)⁴⁵. A very convenient explanation of this is also given in (Blair & Peyton, 1993, p. 18f).

The results of (Jensen & Jensen, 1994) show that the inline construction of the clique tree can indeed be extended to constructing a clique tree that is optimal with respect to having minimal cost in a given cost measure. It is presupposed that the entire cost measure is strictly increasing in the local measures. The situation is then that, besides the weight w, each edge of K(G, w) is assigned an additional cost c, and the construction algorithm has to respect this cost as a second priority. The authors argue that a version of the algorithm of Kruskal can be used instead of algorithm 4. When stage i of Kruskal's algorithm has been completed, all edges of costs c₁, c₂, ..., cᵢ have been chosen. This set generates a forest T₁ⁱ, T₂ⁱ, ..., Tₖⁱ of partial Max-WSTs on K(G, w). Stage i + 1 then adds edges of cost c_{i+1} to the forest and eliminates the cycles by removing edges of cost c_{i+1}. The authors call this step "thinning". They argue that the thinning can be performed by Kruskal's algorithm with the cost measure as selection parameter, so as to construct a cost-minimal clique tree. For the details, consult (Jensen & Jensen, 1994, p. 362f). This method parametrizes the clique tree construction with an additional optimality criterion implemented by a cost measure while completely preserving inline computation. For applications that define optimality criteria for the permanent belief base, this may be a valuable extension.

Having completed this section, we now know in principle

1. how to identify the cliques inline and

45 Unfortunately, this technical report is no longer available from the Department of Electrical Engineering and Computer Science of the University of Tennessee, and the author did not gain access to this text.


2. how to establish parental relationships between them to construct a valid clique tree inline.

Along with the knowledge about potential representations of NRFs developed in chapter III, these results are sufficient to characterize the decomposition algorithm.

4.4.9 An Algorithm for Decomposing a Moralized Ranking Network

In this section, we have so far described the main aspects of a method for obtaining a triangulation of the input graph in O(nm), which allows us to also obtain a clique tree from the triangulated graph. This method is formally described as algorithm Decompose on page 172.

Note that the main for-loop starting in line 6 performs the same vertex traversal as in MCS-M. The body of the loop contains the MCS-M algorithm in lines 7 to 17. This part computes the MEO on the vertices of G and therefore induces a minimal triangulation. The block from line 18 until the end of the for-loop implements a specialized form of Prim's algorithm and computes the cliques as well as the edges of the clique tree. Line 19 uses theorem 4.19 to decide whether to start a new clique. The block starting at line 25 computes the edge to the parent clique. Note that j is the "oldest" α-index in the current clique. Lemma 4.20 ensures that it is part of the separator and hence of the parent clique. Of all cliques containing vⱼ, we choose the one with the lowest-labeled representative and make it the parent. (Since the current vertex will have a lower α-index than vⱼ, it is not yet added to the current clique while the parent is chosen; this is done only later, in line 32.)

We compute the cliques of Gα; those of G are irrelevant. Hence, each edge in F must be respected while computing the cliques. Therefore, when considering madj(v) in the course of each loop run, we in fact respect only the vertices that already have an α-index and the edges E ∪ E_m ∪ F. Note that this is exactly the part of Gα we already know during the current loop run. This strategy is completely legal, since madj(v) will not be changed by vertices labeled later than v, because those will have labels smaller than α(v). Note that this modification does not affect that lemma 4.17 holds for this implementation of MCS-M.

Along the way, the foreach-loop starting in line 20 computes the function clq for each vertex u in the current clique Cᵢ for which clq(u) = i. Line 22 utilizes clq to compute ψᵢ. Since the treatment of clique Cₚ is not captured by these lines, we have to compute the potential function of Cₚ explicitly in lines 38-40, after having exited the main for-loop that iterates over i.

The algorithm enumerates the cliques from 1 to p while the edges of the clique tree are collected in E_t. Thus, the ordered pair ⟨(C₁, C₂, ..., Cₚ), E_t⟩ represents the cliques and the set of edges connecting the cliques to a clique tree. Clique C₁ is the root vertex of the clique tree. Since each triangulated graph with n vertices contains a number of cliques that is in O(n), this suggests that the cliques can be enumerated in linear time using the PEO.

Line 31 computes the function clqidx successively for each vertex. Depending on the implementation, this makes it possible to perform the concrete computation of clqidx only once for each vertex v and to access the stored value when clqidx(v) is called.


input : A ranking network R := ⟨V, E, κ⟩ and a set of moralizing edges E_m
output: A clique tree T_G based on a minimal triangulation of ⟨V, E⟩

1  F, E_t ← ∅;
2  CRD_prev, CRD_curr, r, s ← 0;
3  foreach v ∈ V do
4      w(v) ← 0;
5      clq(v) ← 0;
6  for i = n downto 1 do
7      vertex v ← arbitrary unnumbered vertex of maximum weight w;
8      S ← ∅;
9      foreach unnumbered u ∈ V \ {v} do
10         if ⟨u, v⟩ ∈ E ∪ E_m or a path (u, x₁, ..., x_q, v) exists in ⟨V, E ∪ E_m⟩ such that
11         for each k : 1 ≤ k ≤ q vertex xₖ is unnumbered and w(xₖ) < w(v) then
12             S ← S ∪ {u};
13     foreach u ∈ S do
14         w(u) ← w(u) + 1;
15         if ⟨u, v⟩ ∉ E ∪ E_m then
16             F ← F ∪ {⟨u, v⟩};
17     α(v) ← i; α⁻¹(i) ← v;
18     CRD_curr ← |madj(v)| in ⟨V, E ∪ E_m ∪ F⟩;
19     if CRD_curr ≤ CRD_prev then
20         foreach u ∈ Cᵣ such that clq(u) = 0 ∧ pa(u) in ⟨V, E ∪ E_m⟩ ∈ Cᵣ do
21             clq(u) ← r; clq⁻¹(r) ← u;
22         ψᵣ(cᵣ) ← ∑_{X : clq(X) = r} κ(x | pa(X) in ⟨V, E⟩);
23         r ← r + 1;
24         Cᵣ ← madj(v) in ⟨V, E ∪ E_m ∪ F⟩;
25         if CRD_curr > 0 then
26             j ← min{α(u) : u ∈ Cᵣ};
27             s ← clqidx(α⁻¹(j));
28             E_t ← E_t ∪ {⟨Cᵣ, Cₛ⟩};
29             Sᵣ ← Cᵣ ∩ Cₛ;
30             Rᵣ ← Cᵣ \ Sᵣ;
31     clqidx(v) ← r; clqidx⁻¹(r) ← v;
32     Cᵣ ← Cᵣ ∪ {v};
33     if s > 0 ∧ v ∈ Cₛ then
34         Sᵣ ← Sᵣ ∪ {v};
35     else
36         Rᵣ ← Rᵣ ∪ {v};
37     CRD_prev ← CRD_curr;
38 foreach v ∈ Cᵣ such that clq(v) = 0 ∧ pa(v) ∈ Cᵣ do
39     clq(v) ← r; clq⁻¹(r) ← v;
40     ψᵣ(cᵣ) ← ∑_{X : clq(X) = r} κ(x | pa(X) in ⟨V, E⟩);
41 return ⟨(C₁, C₂, ..., Cᵣ), E_t⟩;

Algorithm 5: Decompose, a method to decompose a ranking network to a clique tree


The if-else-block starting in line 33 computes the separators and residua for each clique. These have to be known for the subsequent update algorithm; it is therefore an important side effect of Decompose to compute them. Also note that after Decompose has finished, we know the potential functions ψᵢ for each clique and therefore have access to a potential representation of κ.
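As a small illustration of this side effect, separators and residua can also be read off a finished clique tree directly. A sketch (hypothetical data layout, not the thesis' in-place computation):

def separators_and_residua(cliques, parents):
    """Sketch: compute the separator S_r = C_r ∩ C_parent(r) and the residuum
    R_r = C_r \ S_r for each clique, given the clique tree structure.
    `cliques` maps indices to vertex sets; `parents` maps each non-root
    clique index to the index of its parent."""
    seps, resids = {}, {}
    for r, c_r in cliques.items():
        if r in parents:                  # non-root clique
            seps[r] = c_r & cliques[parents[r]]
        else:                             # root: empty separator
            seps[r] = set()
        resids[r] = c_r - seps[r]
    return seps, resids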

For a more intuitive understanding of the decomposition process, consult appendix A, which presents an example of a complete run of Decompose on an actual multiply connected input graph.

In the way it is presented here, Decompose presupposes an already moralized network as input. At the same time, it presupposes access to the original ranking network, since it needs access to the set of parents of each vertex. Thus, we either pass the original ranking network as input or, otherwise, the undirected and already moralized graph, together with κ and, for each vertex, the set of parents in the original network. Choosing the first case, we also have to pass a set of edges that completes this network to a moral graph. In the second case, it is a technical requirement to keep the knowledge about the parents of each vertex. Which of these alternatives is actually chosen is of no importance here and may be decided from the application context. The given representation of Decompose implements the first alternative.

The Decompose algorithm is therefore an effective device for initializing and re-initializing the permanent belief base.

The permanent belief base changes whenever epistemic factors change, i.e. when either the set of observed variables is modified by removing or adding variables, or when the knowledge about causal influence – and hence the dependency model – changes. Both changes correspond to formal modifications of the characteristics of G, which means that at least one of V and E changes.

In those cases, Decompose has to be re-run on the modified network to update the structure of the permanent belief base. The Decompose algorithm therefore implements the first of the two phases of the update algorithm, as discussed in the introduction of this chapter. It initializes the permanent belief base, which is then capable of receiving and incorporating a sequence of updates. This is the only epistemically relevant aspect of Decompose, and it is of purely formal nature: the inherent structure of the belief base is of no philosophical importance. But it is of course important to prove that there indeed exists a computationally feasible method to construct it and keep it current. It was already shown that this proof was provided for the probabilistic view by the works of Pearl, Neapolitan, Lauritzen and Spiegelhalter, and many others. For the ranking-theoretic view, the works of Spohn and Hunter have shown that ranking theory is a basis for feasible updating techniques. But those techniques have not been made concrete for multiply connected networks. The specific contribution of this thesis is to solve this task.

Before we move on to section 4.5, where the message passing mechanism that forms the second phase of the algorithm is developed, we should briefly comment on why, precisely, the epistemic relevance of those formal methods is quite weak.


4.4.10 A Digression on Triangulatedness and the Epistemic Domain

In the context of this thesis, triangulation is motivated technically, since it is a necessary step for performing decomposition – which is required for implementing the update strategy. As we know from the discussion of definition 3.53 on page 104, adding edges to a graph does not affect its I-mapness relative to an NRF κ. In fact, the triangulated network may no longer contain every independence relationship that was present in the original network or in the causal list. But in each case, the triangulated network will be an I-map of κ if the original network was an I-map of κ. Two questions arise:

1. Does triangulatedness of the network correspond to a common structural property of belief bases (as, for instance, I-mapness does)?

2. Does the particular triangulation strategy have any epistemically relevant impact on the results of the update process?

Question 1 seems natural, since a close correspondence of formal properties and epistemic concerns structured the entire analysis in chapter III, and one would expect this pattern to be maintained in the argumentation. In fact, the answer is "no": we treat triangulatedness as a technical property of graphs. It has no corresponding origin in the epistemic domain. Remember that a directed edge from a vertex v to a vertex u (which makes v the parent of u) denotes causal influence v has on the instantiation of u. The I-map of κ thus represents the subjective knowledge about causal influence relationships in the observed domain, whose factors are represented as variables. Triangulation in fact restructures the supposed influence relationships of the entire network in the form of triangles: for each vertex v there are two other vertices u, w such that v is a parent of both u and w and, additionally, w.l.o.g., w is a parent of u.

Figure 4.13: A (directed) triangle, the basic structure of a triangulated DAG: v is parent of both w and u, and w is parent of u.

This structure intuitively states the following relationship: v and w both have a direct causal influence on u and, additionally, v has a direct influence on w. This means v influences u in two ways: directly via its parenthood of u, and indirectly via its influence on w. With respect to diagnostic inference, the instantiation of u indicates evidence about w directly and, additionally, indirectly via v. Although it is in no way excluded that knowledge of a particular domain may reveal a structure that is “naturally” triangulated, it is also in no way guaranteed, and it does not correspond to experience with the epistemic domain.


One may try to see triangulatedness as a kind of epistemic rule of thumb: each effect has direct and indirect causes. But this is not exactly what is expressed in the triangle structure. The triangle additionally expresses that there is always a particular factor that is both a direct and an indirect cause of an observed effect. This is clearly not a common property of belief bases or knowledge structures. In fact, this is all there is to state concerning the concrete question of a correspondence between triangulatedness and the epistemic domain.

However, there is a more general point: why should triangulatedness correspond to any common epistemic property? Would that increase the quality of the graphical model? Would it even reveal new philosophical insights to the community? Well, maybe – if that correspondence were in fact existent. The fact that it is not does not reveal anything helpful, and it especially does not qualify or disqualify the update strategy or the graphical model of ranking networks in any way.

Instead, it reveals an erroneous intellectual habit the arguer of question 1 may have concerning the interpretation of formal models. To demand such a close correspondence between formal epistemic models and the epistemic domain itself is a misunderstanding about formal models. The formal model of belief offered by ranking theory does not claim to be a complete and lossless formalization of all aspects of belief that have been or may yet be uncovered by the neurosciences, computer science, or philosophy. A formal model concentrates on certain aspects of the domain it models; in fact, exactly this makes it a model of the domain. Finding formal counterparts of informal aspects of the domain can show that a formally precise and therefore exact model of the aspect in question is indeed possible. Usually, success in the effort of formalization is interpreted as strong evidence that the particular aspect of the domain captured by the model is consistent and lucidly structured. But remember that having an intuitively consistent and lucidly structured conception of a domain does not mean that this conception corresponds to observable states of the world. In particular, this does not mean that each formal property of the model must correspond to a property in the domain. It only implies that the model must not have properties that contradict those properties of the domain that are intentionally part of the model. It is therefore neither forbidden nor meaningless to apply formal operations to the belief base that do not correspond to aspects of the epistemic domain. We are allowed to do so, and it cannot surprise us that the formal treatment of our model requires us to apply formal operations for which we have no blueprint in the domain.

Question 2 is relevant since we found our analysis on the normative paradigm and are therefore interested in optimal results. From a merely epistemological point of view it may not be highly relevant whether the epistemic update process is optimal in the use of time, space and bandwidth, but it is definitely relevant whether the update process leads to exact and correct inference.
This is a point where the attention of the philosopher diverges from that of the engineer, who additionally intends to develop resource-minimal solutions. This divergence is also lucidly illustrated by the fact that, of the entire material presented in this chapter, the decomposition method is – from the engineer’s point of view – the only part that is challenging. Due to the complete independence of this aspect from the epistemic domain, from the philosopher’s point of view decomposition is at the same time the least interesting of all the aspects.

In general, the answer to question 2 is “no”. As an example, consider an NRF κ and an I-map D of κ. Now let D_c^1 and D_c^2 be distinct triangulations of D. We decompose each of them to corresponding clique trees, which we call T_1 and T_2. Now, an evidence incorporated in both T_1 and T_2 and the subsequent update propagation will yield the identical posterior NRF κ' on both trees. Insofar the details concerning the triangulation strategy are not of the same relevance for the epistemic domain as the properties that are closely connected with the domain of ranking functions.

4.5 Phase 2 – Message Passing on the Clique Tree

4.5.1 Local Computation on Cliques

After having shown how the clique tree can be obtained, it remains to be shown how the update computation on the tree is to be performed. As discussed in section 4.3, we have to provide each vertex with the information that is required to perform its specific update locally. The theorems discussed in this section correspond to properties of probability measures and have a strong similarity to theorems 7.1–7.5 stated by (Neapolitan, 1990, p. 253–257) for probability distributions. The fact that those assertions can be transferred to ranking theory is what makes the LS-strategy feasible for ranking networks, since these technical properties form the computational basis for applying it. Taking the formal deduction of the algorithm in probability theory as a blueprint, we will be able to prove the applicability of the LS-strategy for NRFs. Hence the development will be mostly parallel to probability theory, underlining the strong family resemblance of ranking theory and probability theory.

Theorem 4.28  Let V be a finite set of variables and let κ be an NRF for the algebra A_V. Let {W_i : 1 ≤ i ≤ p} be a set of subsets of V. Then it holds for 1 ≤ i ≤ p and any actual value w_i ∈ cd(W_i) with the correspondingly induced actual values r_i, s_i with w_i = r_i ∩ s_i that
\[
\kappa(w_i \mid s_i) = \kappa(r_i \mid s_i).
\]

Proof: Starting with the definition of conditional negative ranks, we obtain:
\[
\kappa(r_i \mid s_i) = \kappa(r_i \cap s_i) - \kappa(s_i).
\]
By definition 3.64 on page 110 it is R_i ∪ S_i = W_i and R_i ∩ S_i = ∅. It therefore holds that r_i ∩ s_i = w_i = w_i ∩ s_i, and thus
\[
= \kappa(w_i \cap s_i) - \kappa(s_i) = \kappa(w_i \mid s_i). \qquad \Box
\]
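To illustrate the identity of theorem 4.28 concretely, the following minimal Python sketch (all names and rank values are hypothetical, chosen only for illustration) represents an NRF over three boolean variables as a table of world ranks and checks κ(w | s) = κ(r | s) for one choice of separator and residuum:

from itertools import product

VARS = ("A", "B", "C")
# A toy NRF: a map from worlds (tuples of booleans) to natural ranks.
kappa = {w: r for w, r in zip(product([True, False], repeat=3),
                              [0, 2, 1, 3, 2, 4, 3, 5])}

def rank(event):
    # Rank of a proposition: the minimum rank of the worlds realizing it.
    worlds = [w for w in kappa
              if all(w[VARS.index(v)] == val for v, val in event.items())]
    return min(kappa[w] for w in worlds)

def cond_rank(event, given):
    # kappa(event | given) = kappa(event ∩ given) - kappa(given).
    return rank({**event, **given}) - rank(given)

w = {"A": True, "B": False}   # w = r ∩ s
s = {"B": False}              # separator value
r = {"A": True}               # residuum value
assert cond_rank(w, s) == cond_rank(r, s)   # theorem 4.28 in miniature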


Theorem 4.29  Let V be a finite set of variables and let κ be an NRF for the algebra A_V. Let ⟨{W_i : 1 ≤ i ≤ p}; {ψ_i : 1 ≤ i ≤ p}⟩ be a potential representation of κ and suppose the ordering ⟨W_1, W_2, …, W_p⟩ has the RIP. Let all actual values of any variables be induced by the same arbitrary but fixed actual value v that instantiates V. This entails for any actual value w_i ∈ cd(W_i) and the correspondingly induced actual values r_i for the residuum and s_i for the separator that w_i = r_i ∩ s_i.

It then holds that:
\[
\kappa(r_p \mid s_p) = \psi_p(r_p \cap s_p) - \min_{r \in rg(R_p)} \psi_p(r \cap s_p).
\]

Proof: Definition 2.19 on page 58 entails:
\[
\kappa(r_p \mid s_p) = \kappa(r_p \cap s_p) - \kappa(s_p). \tag{4.15}
\]
Now, let U_p := V \ (R_p ∪ S_p), which accordingly implies S_p = V \ (U_p ∪ R_p). Remember (3.2) on page 91, which allows us to represent κ(r_p ∩ s_p) by minimizing over the actual values of U_p and, correspondingly, κ(s_p) by minimizing over the actual values of U_p ∪ R_p. Applying (3.2) and substituting both terms on the right side of (4.15) by their potential representations yields:
\[
= \min_{u \in rg(U_p)} \Bigl(K + \sum_{i=1}^{p} \psi_i(r_i \cap s_i \cap u)\Bigr)
- \min_{\substack{r' \in rg(R_p),\ u' \in rg(U_p) \\ r' \cap u' \in rg(U_p \cup R_p)}} \Bigl(K + \sum_{i=1}^{p} \psi_i(r' \cap s_i \cap u')\Bigr).
\]
Due to the RIP it holds that U_p ∩ R_p = ∅, so the constrained minimization in the second term splits into two independent minimizations:
\[
= \min_{u \in rg(U_p)} \Bigl(K + \sum_{i=1}^{p} \psi_i(r_i \cap s_i \cap u)\Bigr)
- \min_{r' \in rg(R_p)} \min_{u' \in rg(U_p)} \Bigl(K + \sum_{i=1}^{p} \psi_i(r' \cap s_i \cap u')\Bigr).
\]
Since also U_p ∩ W_p = ∅, the summand for i = p does not depend on u, and we can transform this to the following:
\[
= \psi_p(r_p \cap s_p) + \min_{u \in rg(U_p)} \Bigl(K + \sum_{i=1}^{p-1} \psi_i(r_i \cap s_i \cap u)\Bigr)
- \min_{r' \in rg(R_p)} \Bigl(\psi_p(r' \cap s_p) + \min_{u' \in rg(U_p)} \bigl(K + \sum_{i=1}^{p-1} \psi_i(r' \cap s_i \cap u')\bigr)\Bigr).
\]
Due to the RIP it is W_i ∩ R_p = ∅ for 1 ≤ i ≤ p − 1, so the inner minimization does not depend on r' and cancels against the corresponding term of the first minimization (including the constant K), and it follows:
\[
= \psi_p(r_p \cap s_p) - \min_{r \in rg(R_p)} \psi_p(r \cap s_p),
\]
which completes the proof. □

Theorem 4.29 shows a connection between the conditional rank of the “last” subset W_p in the ordering and the potential representation of the joint NRF; it will enable us to compute the former when having access to the latter. The contribution of the two theorems so far is the proof that the conditional rank value of the “last” clique in the clique ordering can be computed from the potential representation of the NRF. This also means that whenever we have access to a potential representation of an NRF, we can compute the rank of the highest labeled clique in the corresponding clique tree.
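As a small illustration of how theorem 4.29 is used computationally, the following Python sketch (clique, separator and potential values are all hypothetical) computes the conditional rank of a “last” clique purely from its own potential table:

from itertools import product

# Hypothetical last clique W_p = {B, D} with separator S_p = {B} and
# residuum R_p = {D}; psi_p maps clique configurations to potentials.
psi_p = {bd: v for bd, v in zip(product([True, False], repeat=2),
                                [0, 3, 2, 1])}

def cond_rank_last(s_val, r_val):
    # kappa(r_p | s_p) = psi_p(r_p ∩ s_p) - min over the residuum (thm 4.29).
    normaliser = min(psi_p[(s_val, d)] for d in (True, False))
    return psi_p[(s_val, r_val)] - normaliser

print(cond_rank_last(True, False))   # 3 - 0 = 3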

Theorem 4.30  Let V be a finite set of variables and let κ be an NRF for the algebra A_V. Let ⟨{W_i : 1 ≤ i ≤ p}; {ψ_i : 1 ≤ i ≤ p}⟩ be a potential representation of κ and suppose the ordering ⟨W_1, W_2, …, W_p⟩ has the RIP. Let all actual values of any variables be induced by the same arbitrary but fixed actual value v that instantiates V. This entails for any actual value w_i ∈ cd(W_i) and the correspondingly induced actual values r_i for the residuum and s_i for the separator that w_i = r_i ∩ s_i. Let j be a non-negative value with j < p such that:
\[
W_j \supseteq S_p = W_p \cap (W_1 \cup W_2 \cup \dots \cup W_{p-1}). \tag{4.16}
\]
Let the function ψ'_i be defined for each i : 1 ≤ i ≤ p − 1 as follows:
\[
\psi'_i(w_i) :=
\begin{cases}
\psi_i(w_i) & \text{if } 1 \le i \le p-1 \text{ and } i \ne j, \\[2pt]
\psi_j(w_j) + \min\limits_{r \in rg(R_p)} \psi_p(r \cap s_p) & \text{if } i = j.
\end{cases}
\]
Then ⟨{W_1, …, W_{p−1}}; {ψ'_i : 1 ≤ i ≤ p − 1}⟩ is a potential representation of the marginal NRF κ' for the algebra A(W_1 ∪ W_2 ∪ … ∪ W_{p−1}).

Proof: We know that the ordering ⟨W_1, W_2, …, W_p⟩ has the RIP. Hence at least one j < p exists that fulfills (4.16). We begin by using (3.2) to derive a potential representation of the term on the left side:
\[
\kappa(w_1 \cap w_2 \cap \dots \cap w_{p-1}) = \min_{r \in rg(R_p)} \Bigl(K + \sum_{i=1}^{p} \psi_i(r \cap s_i)\Bigr).
\]
Due to the RIP it holds that R_p ∩ (W_1 ∪ W_2 ∪ … ∪ W_{p−1}) = ∅. Hence we are allowed to state the following:
\[
= K + \min_{r \in rg(R_p)} \psi_p(r \cap s_p) + \sum_{i=1}^{p-1} \psi_i(r_i \cap s_i).
\]
Since ψ_i(w_i) = ψ'_i(w_i) for each i : 1 ≤ i ≤ p − 1 with i ≠ j we derive:
\[
= K + \sum_{i=1}^{p-1} \psi'_i(r_i \cap s_i) = K + \sum_{i=1}^{p-1} \psi'_i(w_i).
\]
To complete the proof, it remains to be shown that
\[
\psi'_j(w_j) = \psi_j(w_j) + \min_{r \in rg(R_p)} \psi_p(r \cap s_p)
\]
is a function of only W_j. Note that min_{r∈rg(R_p)} ψ_p(r ∩ s_p) is a function only of the elements in S_p. Remember that S_p ⊆ W_j due to (4.16). Since w_j and s_p are induced by the same actual value for V, it holds that w_j ⊆ s_p and therefore s_p ∩ w_j = w_j. We define T_j := W_j \ S_p and are allowed to perform the following transformation:
\[
\psi_j(w_j) + \min_{r \in rg(R_p)} \psi_p(r \cap s_p) = \psi_j(t_j \cap s_p) + \min_{r \in rg(R_p)} \psi_p(r \cap s_p). \tag{4.17}
\]
We can see that in (4.17), since w_j ⊆ s_p as well as w_j ⊆ t_j, and furthermore since R_p ∩ (W_1 ∪ W_2 ∪ … ∪ W_{p−1}) = ∅, both sides represent a function only of the elements in W_j. With this the proof is completed. □

As theorem 4.30 states, if the underlying ordering of cliques has the running intersection property, we obtain a potential representation of the marginal NRF κ' over the subset {C_1, C_2, …, C_{p−1}} of the cliques.
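The marginalization step of theorem 4.30 can be sketched in a few lines of Python; the cliques and potential values below are hypothetical, with S_p = {B} contained in W_j = {B, C}:

from itertools import product

psi_j = {bc: v for bc, v in zip(product([True, False], repeat=2), [0, 1, 2, 0])}
psi_p = {bd: v for bd, v in zip(product([True, False], repeat=2), [0, 3, 2, 1])}

# Theorem 4.30: absorb the minimum over the residuum R_p = {D} of psi_p
# into psi_j; the remaining potentials then represent the marginal NRF.
psi_j_new = {(b, c): psi_j[(b, c)] + min(psi_p[(b, d)] for d in (True, False))
             for (b, c) in psi_j}
print(psi_j_new)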

Lemma 4.31  Let V be a finite set of variables and let κ be an NRF for the algebra A_V. Let ⟨{W_i : 1 ≤ i ≤ p}; {ψ_i : 1 ≤ i ≤ p}⟩ be a potential representation of κ and suppose the ordering ⟨W_1, W_2, …, W_p⟩ has the RIP. Let all actual values of any variables be induced by the same arbitrary but fixed actual value v that instantiates V. This entails for any actual value w_i ∈ cd(W_i) and the correspondingly induced actual values r_i for the residuum and s_i for the separator that w_i = r_i ∩ s_i. It then holds that:
\[
\kappa(r_p \mid s_p) = \kappa(r_p \mid w_1 \cap \dots \cap w_{p-1}).
\]

Proof: We begin with the obvious
\[
\kappa(r_p \mid w_1 \cap \dots \cap w_{p-1}) = \kappa(r_p \cap w_1 \cap \dots \cap w_{p-1}) - \kappa(w_1 \cap \dots \cap w_{p-1}).
\]
Note now that V \ R_p = W_1 ∪ … ∪ W_{p−1}. Hence we can substitute by v:
\[
= \kappa(v) - \kappa(w_1 \cap \dots \cap w_{p-1}). \tag{4.18}
\]
This can obviously be expressed by potential representations, whereby we express the actual values w_i by the equivalent intersections of the actual values of their particular residua and separators:
\[
= \Bigl(K + \sum_{i=1}^{p} \psi_i(w_i)\Bigr) - \Bigl(K + \min_{r \in rg(R_p)} \sum_{i=1}^{p} \psi_i(r_i \cap s_i)\Bigr).
\]
Now, since for 1 ≤ i ≤ p − 1 it holds due to the RIP that W_i ∩ R_p = ∅, we obtain:
\[
= \psi_p(w_p) + \sum_{i=1}^{p-1} \psi_i(w_i) - \min_{r \in rg(R_p)} \Bigl(\psi_p(r \cap s_p) + \sum_{i=1}^{p-1} \psi_i(w_i)\Bigr)
= \psi_p(w_p) - \min_{r \in rg(R_p)} \psi_p(r \cap s_p).
\]
Applying theorem 4.29 eventually leads to:
\[
= \kappa(r_p \mid s_p). \qquad \Box
\]

Theorem 4.32  Let V be a finite set of variables and let κ be an NRF for the algebra A_V. Let ⟨{W_i : 1 ≤ i ≤ p}; {ψ_i : 1 ≤ i ≤ p}⟩ be a potential representation of κ and suppose the ordering ⟨W_1, W_2, …, W_p⟩ has the RIP. Let all actual values of any variables be induced by the same arbitrary but fixed actual value v that instantiates V. This entails for any actual value w_i ∈ cd(W_i) and the correspondingly induced actual values r_i for the residuum and s_i for the separator that w_i = r_i ∩ s_i. It then holds that:
\[
\kappa(v) = \kappa(w_1) + \sum_{i=2}^{p} \kappa(r_i \mid s_i). \tag{4.19}
\]

Proof: Considering (4.18) we can start with
\[
\kappa(v) = \kappa(r_p \mid w_1 \cap \dots \cap w_{p-1}) + \kappa(w_1 \cap \dots \cap w_{p-1}). \tag{4.20}
\]
Applying lemma 4.31 leads to:
\[
= \kappa(r_p \mid s_p) + \kappa(w_1 \cap \dots \cap w_{p-1}).
\]
Theorem 4.30 ensures that there exists a potential representation
\[
\kappa'(w_1 \cap \dots \cap w_{p-1}) = K' + \sum_{i=1}^{p-1} \psi'_i(w_i) \tag{4.21}
\]
of the marginal NRF κ' for A(W_1 ∪ W_2 ∪ … ∪ W_{p−1}) relative to κ. We can hence apply lemma 4.31 to (4.21) and obtain
\[
\kappa'(w_1 \cap \dots \cap w_{p-1}) = \kappa'(r_{p-1} \mid s_{p-1}) + \kappa'(w_1 \cap \dots \cap w_{p-2}). \tag{4.22}
\]
Now, since κ' is a marginal NRF to the original joint NRF κ, both are equal on the actual values of the variables in the set W_1 ∪ W_2 ∪ … ∪ W_{p−1}. We can hence transfer (4.22) to κ:
\[
\kappa(w_1 \cap \dots \cap w_{p-1}) = \kappa(r_{p-1} \mid s_{p-1}) + \kappa(w_1 \cap \dots \cap w_{p-2}). \tag{4.23}
\]
These steps have to be repeated: we apply theorem 4.30 to (4.23) and, subsequently, lemma 4.31 to the result of that operation, transferring the outcome back to the corresponding values of κ. The proof can thereafter easily be completed by applying theorem 4.30 and lemma 4.31 repeatedly in the same way as above until we eventually obtain
\[
\kappa(w_1 \cap w_2) = \kappa(r_2 \mid s_2) + \kappa(w_1).
\]
Combining the results of all these steps leads to:
\[
\kappa(v) = \kappa(r_p \mid s_p) + \kappa(r_{p-1} \mid s_{p-1}) + \dots + \kappa(r_2 \mid s_2) + \kappa(w_1). \tag{4.24}
\]
We have therefore shown that we can legally generate (4.24) from (4.20) by substituting the term κ(w_1 ∩ … ∩ w_i) by the term κ(r_i | s_i) + κ(w_1 ∩ … ∩ w_{i−1}) in (4.20) p − 2 times, starting with i := p − 1 and finishing with i = 2. Term (4.24) is identical with (4.19), completing the proof. □

Theorem 4.32 allows a successive representation of the original NRF κ by conditional ranks on the cliques of the clique tree of the moralized and triangulated ranking network. It remains to be shown how a potential representation of a partially instantiated network can be obtained; this is exactly what is required for updating a network with a given evidence.
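For instance, for p = 3 the repeated substitution in the proof of theorem 4.32 unfolds (4.19) into the following chain:
\[
\kappa(v) = \kappa(r_3 \mid s_3) + \kappa(w_1 \cap w_2)
          = \kappa(r_3 \mid s_3) + \kappa(r_2 \mid s_2) + \kappa(w_1).
\]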

Theorem 4.33  Let V be a finite set of variables and let κ be an NRF for the algebra A_V. Let ⟨{W_1, W_2, …, W_p}; ψ⟩ be a potential representation of κ. Let further X ⊆ V be a non-empty subset of V and let, accordingly, U := V \ X. Let x* be an arbitrary but fixed actual value for the compound X. Let all actual values of subsets of U be induced by the same actual value u for the underlying compound U. Define now
\[
\hat W_i := W_i \setminus X, \qquad
\hat\psi_i(\hat w_i) := \psi_i(w_i \cap x^*), \qquad
\hat\kappa(u) := \kappa(u \mid x^*).
\]
Consequently, ⟨{Ŵ_1, Ŵ_2, …, Ŵ_p}; {ψ̂_i : 1 ≤ i ≤ p}⟩ is a potential representation of the NRF κ̂.

Proof: Let K̂ := K − κ(x*). It then holds
\[
\kappa(u \mid x^*) = \kappa(u \cap x^*) - \kappa(x^*).
\]
Since u ∩ x* = v ∩ x* we obtain:
\[
= \kappa(v \cap x^*) - \kappa(x^*).
\]
Substituting by the potential representation yields:
\[
= \Bigl(K + \sum_{i=1}^{p} \psi_i(w_i \cap x^*)\Bigr) - \kappa(x^*)
= \hat K + \sum_{i=1}^{p} \hat\psi_i(\hat w_i). \qquad \Box
\]

To understand this theorem intuitively, consider κ as the prior belief state and x* as the evidence. Then the combination of the NRFs κ(x*) and κ̂(u) characterizes the posterior belief state. This completes the formal basis for the LS-strategy and we can now develop the second update phase. After having pointed out how local computations in the clique tree are to be performed, we can now pass on to showing how the clique tree is initialized for iterated updates and how message passing for the actual updates can be performed.
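The instantiation step of theorem 4.33 is computationally trivial: each clique’s potential table is simply restricted to the configurations compatible with the evidence. A minimal Python sketch (clique, evidence variable and values are hypothetical):

from itertools import product

# Hypothetical clique W_i = {B, D}; the evidence fixes X = {D} to d* = True.
psi_i = {bd: v for bd, v in zip(product([True, False], repeat=2),
                                [0, 3, 2, 1])}
d_star = True

# hat-psi_i over hat-W_i = W_i \ X keeps only evidence-compatible entries.
psi_i_hat = {(b,): psi_i[(b, d_star)] for b in (True, False)}
print(psi_i_hat)   # {(True,): 0, (False,): 2}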

4.5.2 Locally Available Information

Again, we return to the start situation, where a given causal list has already been transferred adequately to an influence diagram. For the sake of simplicity we assume that this influence diagram has already been moralized as supposed in the beginning of section 4.4. We therefore have a ranking network R := ⟨V, E, κ⟩ as a starting point, without any assertions about its structural properties, except that it is acyclic. In particular, it may or may not be multiply connected.


First, we derive a clique tree from this network as it was already described:

\[
T_R \leftarrow \text{Decompose}(\langle V, E \rangle, E_m)
\]

We are allowed to suppose that the residua Ri and the separators Si for all the cliques Ci with i : 1 ≤ i ≤ p as well as the potential functions ψi are known after Decompose has finished. This means, along with the clique tree, we obtain a potential representation of κ. The vertices of the clique tree are exactly the cliques of the moralized and triangulated graph of the original ranking network.

In the clique tree an edge exists between two vertices Cj and Ci if and only if Cj is a parent clique of Ci, which implies that Si ⊆ Cj. This is ensured by the parent definition function applied when Decompose is executed.

Additionally, each edge of the clique tree can be labeled with the separator Si.

At any single vertex Ci of the clique tree, all information is stored to enable updates as well as queries. Concretely:

1. Ci, the clique that is represented by the vertex

2. Ri, the residuum of Ci

3. Si, the separator of Ci

4. ψi, the potential function of Ci

The clique tree is the permanent belief base. All updates are performed on the clique tree instead of the original network. It is required to reconstruct the clique tree only in those cases where the network is modified. This means, variables are added or removed from the network or their dependency relationships change significantly.
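A minimal sketch of such a vertex as a data structure might look as follows in Python (the field names are hypothetical; the thesis does not prescribe a concrete layout):

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CliqueVertex:
    clique: frozenset                         # C_i, the variables of the clique
    residuum: frozenset                       # R_i = C_i \ S_i
    separator: frozenset                      # S_i, shared with the parent clique
    psi: dict = field(default_factory=dict)   # potential: configuration -> rank
    parent: Optional["CliqueVertex"] = None
    children: list = field(default_factory=list)

# Example: a clique {B, C, D} with separator {B, C} and residuum {D}.
c2 = CliqueVertex(clique=frozenset("BCD"), residuum=frozenset("D"),
                  separator=frozenset("BC"))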

4.5.3 Pre-Initializing the Permanent Belief Base

Having constructed the clique tree, we successively compute a representation of the joint NRF κ(v) from the conditional ranks. The recommendation of (Neapolitan, 1990, p. 264) is to make a copy of the original clique tree, which remains the permanent belief base, and to perform the update on the copy. Once the copy is completed, the values C_i, R_i, S_i and ψ_i in the copy are identical to the original values, but the actual values for the marginal joint NRFs κ(c_i) are not yet determined. First, we assign each vertex a start value by procedure init as introduced on page 184. The start values are the prior ranks. This means the set of instantiated variables X that represents the new evidence is empty.

We start with the clique tree and the potential representation ⟨C_1, …, C_p; ψ⟩ of the joint NRF κ(v) for A_V. From theorem 4.28 on page 176 it follows that
\[
\kappa(c_i \mid s_i) = \kappa(r_i \mid s_i)
\]
and since S_i ⊆ C_i we obtain immediately
\[
\kappa(c_i) = \kappa(r_i \mid s_i) + \kappa(s_i).
\]

For computing the marginal NRFs κ(c_i) it is therefore sufficient to compute both the conditional rank κ(r_i | s_i) and the joint rank κ(s_i) for each i ∈ {1, …, p}. This is an effective reduction of the computational requirements. The conditional ranks and the joint ranks of the separators will be computed in two separate steps: we first compute the conditional ranks of the cliques, passing the clique tree bottom-up. Then, in a second step, the joint ranks of the separators are computed while passing the clique tree a second time, top-down. After the top-down propagation step has completed, the clique tree is updated.

input: clique tree T of a ranking network ⟨V, E, κ⟩, potential representation ⟨C_1, …, C_p; ψ⟩ of κ, evidential set X ⊂ V

1  foreach C_i ∈ V do
2    Ĉ_i ← C_i \ X;
3    ψ̂_i(ĉ_i) ← ψ_i(c_i ∩ x);
4    C_i ← Ĉ_i;
5    ψ_i(c_i) ← ψ̂_i(ĉ_i);
6    S_i ← S_i \ X;
7    R_i ← R_i \ X;

Procedure init

4.5.4 Bottom-Up Propagation: Conditional Ranks of the Cliques

In a first step, we compute the conditional ranks κ(r_i | s_i).

Since ⟨C_1, …, C_p; ψ⟩ is a potential representation of κ, theorem 4.29 on page 177 allows us to set
\[
\kappa(r_p \mid s_p) := \psi_p(c_p) - \min_{r \in rg(R_p)} \psi_p(r \cap s_p). \tag{4.25}
\]

We hence start the update on the “last” clique C_p since all information needed to compute the initial value of the conditional rank of C_p is already locally available at C_p. Therefore, the conditional rank of this clique can be computed easily from the potential function of C_p. Now, theorem 4.30 on page 178 allows us to set

\[
\psi'_{p-1}(c_{p-1}) := \psi_{p-1}(c_{p-1}) + \min_{r \in rg(R_p)} \psi_p(r \cap s_p) \tag{4.26}
\]
to obtain the potential representation ⟨C_1, …, C_{p−1}; ψ'⟩. Note that by theorem 4.30 it is ψ'_i = ψ_i for each i : 1 ≤ i < p − 1; thus computing ψ'_{p−1}(c_{p−1}) suffices to obtain a potential representation of the marginal NRF κ' for A(C_1 ∪ … ∪ C_{p−1}).

The computation of (4.25) can obviously be performed locally to vertex Cp. But to compute


the value for (4.26) locally on vertex C_{p−1}, it is required to access the value
\[
\min_{r \in rg(R_p)} \psi_p(r \cap s_p). \tag{4.27}
\]
Note that (4.27) is not known locally on C_{p−1}. Therefore, to enable local computation of (4.26) on C_{p−1}, value (4.27) has to be “sent” from C_p to C_{p−1}.

The third and final local update step on vertex C_p is to set
\[
\psi_p(c_p) := \kappa(r_p \mid s_p). \tag{4.28}
\]
Note that theorem 3.84 on page 130 enables us to do so. The motivation for doing so will be discussed in the next section, where it is easier to illustrate.

After this third step, we can apply theorem 4.29 to the remaining set of cliques C_1, …, C_{p−1} and obtain the conditional rank κ(r_{p−1} | s_{p−1}). It can easily be seen that we can compute all required conditional ranks κ(r_i | s_i) by simply applying theorems 4.30 and 4.29 repeatedly while traversing the cliques in reverse order, starting on C_p and ending on C_1. The traversal can be imagined as a passing of messages from the child vertices of the clique tree to their parents. The process starts at the leaf vertices, which send information to their parent vertices. After a parent vertex has received all the messages from its children, it adjusts its marginal rank value and its particular potential function ψ_i in accordance with the messages it has received from its children. Thereafter, it sends its own updated information to its parent.

This process ends at the root vertex C_1. Since C_1 is the “first” vertex in the clique tree ordering, C_1 has no predecessor to whom a message could be sent. For the same reason the separator set S_1 is always equal to ∅. Therefore, the conditional rank on the left side of (4.25) is just κ(c_1), meaning that all information needed locally on C_1 is accessible after all messages from the children have been received. After the bottom-up process is completed, it therefore holds:
\[
\psi_1(c_1) = \kappa(r_1 \mid s_1) = \kappa(r_1 \mid \emptyset) = \kappa(c_1).
\]

Following the convention of (Pearl, 1988b), we use the symbol λ_X for the update information a vertex X sends to its parent. Thus we define
\[
\lambda(c_i) := \min_{r \in rg(R_i)} \psi_i(r \cap s_i)
\]
and can hence rewrite (4.26) as:
\[
\psi'_{p-1}(c_{p-1}) := \psi_{p-1}(c_{p-1}) + \lambda(c_p). \tag{4.29}
\]

Procedure send-lambda-msg on page 186 describes the generation and sending of a λ-message in pseudo-code.

input: integer i

1  λ(c_i) ← min_{r∈rg(R_i)} ψ_i(r ∩ s_i);
2  ψ_i(c_i) ← ψ_i(c_i) − λ(c_i);
3  Send λ(c_i) to pa(C_i);

Procedure send-lambda-msg

In case R_i = ∅, the λ-message is just ψ_i(c_i). (As a consequence of the RIP, the case S_i = ∅ will occur always and only on the root vertex, which does not compose a λ-message.) Note that line 2 of send-lambda-msg is simply (4.28) with the right side substituted by (4.25). Since the underlying graph is a tree, pa(C_i) always contains exactly one vertex.

When a vertex Cj receives a message from its child Ci, it calls process-lambda-msg to incorporate the new information locally.

input: λ(c_i), integer i

1  ψ_j(c_j) ← ψ_j(c_j) + λ(c_i);
2  if C_j has received all λ-messages then
3    if pa(C_j) = ∅ then
4      κ(c_j) ← ψ_j(c_j);
5      send-pi-msg(j);
6    else
7      send-lambda-msg(j);

Procedure process-lambda-msg

Note that line 1 of process-lambda-msg is just the implementation of theorem 4.30. After having respected the λ-values of each of its children, the parent vertex sends a λ-message of its own to its parent. This part of the propagation is finished after the root vertex has received the λ-messages from its children. As the if-block in lines 4 and 5 shows, the processing of λ-messages is different on the root vertex than on the other vertices. In line 5, the top-down part of the message passing is started, which will be discussed in the following section.

4.5.5 Top-Down Propagation: Joint Ranks of the Separators

We remember from section 4.5.4 that we also need the joint ranks of the separators S_i to compute the joint ranks of the cliques. The joint ranks of the separators can be computed top-down, starting at the root vertex.

Theorem 4.28 allows us to derive for C_i:
\[
\kappa(c_i) = \kappa(r_i \mid s_i) + \kappa(s_i),
\]
which we can simply rewrite to
\[
\kappa(c_i) = \psi_i(c_i) + \kappa(s_i) \tag{4.30}
\]
since we previously set ψ_i(c_i) := κ(r_i | s_i) in the course of the bottom-up process.


The motivation for redefining ψ_p as in (4.28) is therefore clear: we want access to a potential representation of the intermediate NRF for C_1, …, C_p after the bottom-up propagation is completed. We need this potential representation to complete the second part of the propagation, which is performed top-down.

Let now C_i be a child of C_j. Then, since S_i ⊆ C_j, the term κ(s_i) can be computed from the already available κ(c_j). (Note that C_j can be represented as the union of S_i and the remaining set T_j := C_j \ S_i, and hence c_j = t_j ∩ s_i.)
\[
\kappa(s_i) = \min_{t \in rg(C_j \setminus S_i)} \kappa(t \cap s_i). \tag{4.31}
\]

Again, here we have the situation that the value required to complete the update is locally available at C_j but has to be evaluated at C_i. It therefore has to be sent from C_j to C_i. This step can be imagined as each vertex sending messages down to its children, starting at the root vertex. We follow Pearl again and call those top-down messages π-messages.

Starting at the root vertex, a π-message for each child C_i ∈ ch(C_1) is generated and sent. The message for a particular child C_i of C_j is, as already pointed out:
\[
\pi_i(c_j) := \min_{t \in rg(C_j \setminus S_i)} \kappa(t \cap s_i).
\]

The sending of a π-message in pseudo-code is as follows:

input: integer j

1  foreach C_i ∈ ch(C_j) do
2    π_i(c_j) ← min_{t∈rg(C_j\S_i)} κ(t ∩ s_i);
3    Send π_i(c_j) to C_i;

Procedure send-pi-msg

Note that if C_j \ S_i = ∅, the π-message is just κ(c_j).

When a vertex C_i receives a π-message from its parent C_j, it has direct access to all information required to incorporate the update. After it has received the π-message, procedure process-pi-msg is called.

input: π_i(c_j), integer i

1  κ(c_i) ← ψ_i(c_i) + π_i(c_j);
2  if ch(C_i) ≠ ∅ then
3    send-pi-msg(i);

Procedure process-pi-msg

Note that the assignment in line 1 of process-pi-msg is just (4.30) with κ(s_i) substituted by (4.31). After this second propagation the data structure has successfully incorporated the available evidence. It can be used to obtain concrete rank values.
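To summarize both propagation phases, here is a compact, self-contained Python rendering of the two passes on a toy chain of two cliques (all structures and rank values are hypothetical; it mirrors the λ- and π-steps above, not the thesis’ procedures verbatim):

from itertools import product

# C1 = {A, B} is the root (S1 = ∅); C2 = {B, D} with S2 = {B}, R2 = {D}.
psi = {
    1: {ab: v for ab, v in zip(product([True, False], repeat=2), [0, 2, 1, 0])},
    2: {bd: v for bd, v in zip(product([True, False], repeat=2), [0, 3, 2, 1])},
}
kappa = {}

# Bottom-up: C2 sends lambda(b) = min_d psi2(b, d) to C1 and renormalises
# its own table to the conditional rank kappa(r2 | s2).
lam = {b: min(psi[2][(b, d)] for d in (True, False)) for b in (True, False)}
psi[2] = {bd: psi[2][bd] - lam[bd[0]] for bd in psi[2]}
psi[1] = {ab: psi[1][ab] + lam[ab[1]] for ab in psi[1]}

# At the root S1 = ∅, so psi1 now equals the marginal rank kappa(c1).
kappa[1] = dict(psi[1])

# Top-down: C1 sends pi(b) = min_a kappa1(a, b) = kappa(s2) to C2, which
# completes kappa(c2) = kappa(r2 | s2) + kappa(s2) as in (4.30).
pi = {b: min(kappa[1][(a, b)] for a in (True, False)) for b in (True, False)}
kappa[2] = {bd: psi[2][bd] + pi[bd[0]] for bd in psi[2]}
print(kappa[1], kappa[2])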


4.5.6 Processing Update Information

We have seen how the propagation of information is performed along the clique tree. We remember that we perform this operation on a copy of the clique tree. When it is done for the first time, it just computes a belief state that is initial to the system. Of course this state need not correspond to an initial state in an epistemic sense, but it is initial in the sense that the actual ranks are computed without any designated evidence that has to be respected. In this sense, the resulting state of the belief base is initial and the actual beliefs are prior beliefs.

But how are actual updates processed? Obviously, an update is performed by the exact same operations as explained in the two previous sections. An epistemic update in the clique tree is a set of variables X ⊆ V instantiated with a set x of actual values. Due to the fact that we set ψ_i(c_i) := κ(r_i | s_i) for each clique C_i, theorem 4.32 ensures that ⟨C_1, …, C_p; ψ⟩ is a potential representation of κ. Theorem 4.33 on page 181 shows us how to derive a potential representation of κ with some variables instantiated. Since we know the values of the variables in X, we can practically remove X from the network, thereby obtaining a new vertex set U := V \ X, and then compute the potential representation of the marginal NRF for A_U. Thus, for each i with 1 ≤ i ≤ p, we set

\[
\hat C_i := C_i \setminus X \tag{4.32}
\]
and further
\[
\hat\psi_i(\hat c_i) := \psi_i(c_i \cap x). \tag{4.33}
\]

Note that it will always hold that Ĉ_i ⊆ C_i, and hence the structure of the clique tree will not be modified, although the values stored at the vertices may change significantly in a particular application. Now theorem 4.33 ensures that ⟨Ĉ_1, …, Ĉ_p; ψ̂⟩ is a potential representation of κ(u | x), and this NRF represents the posterior belief state after the update. We have to determine the values of this NRF. This goal can be achieved by exactly the same computation steps as discussed in the previous two sections: propagation bottom-up yields the conditional ranks of the residua given the separators, and the subsequent propagation top-down yields the joint ranks of the separators, thereby completing the update. We just have to initialize the update by the steps (4.32) and (4.33). Now consider X as the representation of evidence that becomes available to the subject. We then formally express the epistemic step of “recognizing” as in procedure init-update on page 189. Consequently, the complete update algorithm is described by algorithm 6 on page 189. To obtain a more intuitive understanding of the message passing phase, consult appendix B, where an example of message passing on an actual clique tree is presented.

4.5.7 Queries on the Clique Tree

Given that the clique tree represents the current state of belief, it is an important task to query the actual values of particular sets of variables.


input: clique tree T of a ranking network ⟨V, E, κ⟩, potential representation ⟨C_1, …, C_p; ψ⟩ of κ, evidential set X ⊂ V

1  foreach C_i ∈ V do
2    Ĉ_i ← C_i \ X;
3    ψ̂_i(ĉ_i) ← ψ_i(c_i ∩ x);
4    C_i ← Ĉ_i;
5    ψ_i(c_i) ← ψ̂_i(ĉ_i);
6    S_i ← S_i \ X;
7    R_i ← R_i \ X;

Procedure init-update

input: a clique tree T := ⟨V, E⟩, a potential representation ⟨C_1, …, C_p; ψ⟩, an evidential set X ⊆ V
output: a potential representation ⟨C_1, …, C_p; ψ̂⟩

1  init-update(T);
2  foreach C_i such that ch(C_i) = ∅ do
3    send-lambda-msg(i);

Algorithm 6: Update, a method to incorporate new evidence

This task is simple since the actual rank for an instantiation of a variable X ∈ V can be computed from the marginal NRFs of precisely those cliques that contain X via:
\[
\kappa(x) = \min_{C \in \{C_j : X \in C_j\}} \; \min_{t \in rg(C \setminus \{X\})} \kappa(t \cap x). \tag{4.34}
\]

This equation was already presented on page 145 as (4.2). It is a consequence of (3.2) on page 91 and can trivially be extended to the case where X is a compound. An actual query can thus be answered by performing a set difference operation and a minimization, which is quite convenient.
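A minimal query sketch in Python, assuming the post-update clique marginals κ(c) are available (the clique layout and rank values are hypothetical):

from itertools import product

# Hypothetical post-update clique marginals; both cliques contain D.
kappa_c = {
    ("B", "D"): {bd: v for bd, v in zip(product([True, False], repeat=2),
                                        [0, 3, 2, 1])},
    ("D", "E"): {de: v for de, v in zip(product([True, False], repeat=2),
                                        [0, 2, 1, 4])},
}

def query(var, value):
    # (4.34): minimise each containing clique's marginal over the other
    # variables, then take the minimum over all cliques containing var.
    ranks = []
    for variables, table in kappa_c.items():
        if var in variables:
            i = variables.index(var)
            ranks.append(min(r for cfg, r in table.items() if cfg[i] == value))
    return min(ranks)

print(query("D", False))   # -> 1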

4.6 Conclusion

4.6.1 Achievements

This chapter described a mechanism for iterated belief change in the form of an update algorithm on multiply connected ranking networks. In the first step the network is compiled to a singly connected clique tree, on which a second step performs a message-passing update. The clique tree is the data structure that represents the belief base. While the second step has to be performed for each evidence that becomes available, the first step is only required in case of changes to the belief base itself.

The belief base could change when the system starts to observe new aspects – this is equivalent to incorporating new variables into the belief base – or by information about causal dependencies in the belief base changing. The latter can be imagined as achieving a deeper understanding of the observed domain. In both cases, the clique tree has to be recompiled since its structure has possibly changed.

Section 4.4 described how the first step, the compilation of the belief base, is to be executed. For this phase we respected the former proposal of (Neapolitan, 1990) but improved it in the following points (as already pointed out on page 157):

1. We use a minimal triangulation of the input network (instead of just an MCS-generated arbitrary triangulation).

2. We use a single execution to compute the MEO and the corresponding Fill-in.

3. We construct the clique tree “inline” while computing the triangulation.

4. We also detect the cliques, including residua and separators, “inline” while triangulating the input network. This enables us to also return a potential representation of the global NRF from the decomposition phase.

In fact, all tasks belonging to the decomposition phase can be done in one single procedure, which eases the decomposition process significantly. The message passing technique proposed in section 4.5 describes how new evidence is to be incorporated into the clique tree. Section 4.5.1 contains all assertions relevant to show that the update mechanism will work as proposed.

Joining all the arguments provides us with a generalized update mechanism for ranking networks as they are described in chapter III. This update mechanism is able to update multiply connected networks as well as polytrees, which is the main achievement and result of this thesis.

It also turned out that, once committed to the use of formal devices when theorizing about belief revision, one will hardly be able to draw a strong distinction between the philosophical questions and the questions of engineering. The absence of a definitive distinction has become most clear in this chapter. We can show that the techniques for triangulating and decomposing graphs are well-distinguishable formal problems of graph theory, clearly separated from the epistemic domain that is discussed by philosophers. These problems are interesting for the presented inquiry only through the implications of an intermediate abstraction level: the graphical model we use for belief states. One could argue that the model itself is part of the epistemic domain since it represents a formalization of a belief state. But the belief state models only represent the static aspects of a belief; they make no point about the transition between belief states. On the other hand, the triangulation is part of the formal description of this transition, hence one could argue that it is in no sense more distant from the epistemic domain than the model itself. It is hard to say where an acceptable distinction line between philosophically relevant formal models and mere applications of engineering techniques could reasonably be drawn. From a more “classical” point of view, more committed to the idea of a strict distinction between disciplines, this thesis must seem highly interdisciplinary. In the author’s view, it just combines useful facts from different contexts into an argumentation concerning a relevant problem which is also discussed by philosophers.


4.6.2 Remarks on Aspects Not Discussed

Since for polytrees the decomposition phase can be left out, the update technique proposed in this thesis will behave like classical message passing in this case, and the considerations about message passing techniques in networks are applicable. (Of course, this argument silently supposes that the message passing technique described in this chapter is transferred to polytrees. This is a quite simple task. Nonetheless, in its current presentation, the message passing technique is only suitable for trees.)

It is quite plausible that the proposed solution works efficiently in general, but we did not – as an engineer would do – engage in the analysis of runtime complexity or the comparison of the runtime behaviour between different classes of input networks. The aim of this thesis was just to show what the basis of an update algorithm for multiply connected ranking networks will look like. On this level of abstraction, no application-specific knowledge could be used for improving efficiency, which would normally play a role in each domain-specific application of the algorithm. We already know about the asymptotic time behaviour of the triangulation technique including clique recognition. An analysis of the runtime behaviour on the asymptotic level may therefore be of restricted informativeness; nonetheless, it could have been done for the sake of completeness.

An interesting aspect would be to analyze how the efficiency of the decomposition phase is influenced by different classes of input networks. For instance, we did not engage in analyzing whether the decomposition algorithm shows an acceptably efficient capability to decompose networks with a large treewidth. Serious testing in this field would require a complete implementation and test sets whose properties are well-known. Both would exceed the frame of this thesis, and this is therefore postponed to future work. It would furthermore be interesting to compare the runtime behaviour of this algorithm on ranking networks with an equivalent implementation for Bayesian networks. This would require comparable datasets with ranking data as well as relative frequency data or probabilities about the same domain. The design of such a test is highly non-trivial and, for the same reasons as above, would have exceeded the topic of this thesis.

4.7 Outlook: Learning Ranking Networks and Induction to the Unknown

We engaged in the question of how to perform iterated updates on a belief base and used ranking theory as the semantic framework for measuring the epistemic reliability of the actual values. This topic lies completely within the domain of belief revision. However, there are at least two other important questions which immediately come to mind when considering ranking networks. The first question is how the initial network is learned from observations. In chapter III, we described how to transform knowledge about causal relationships in a domain into a ranking network that can efficiently be updated. Indeed, starting at the point where the causal list is already known, the construction of the ranking network is quite trivial, as was argued in paragraph 3.6.2.


We just pretended that this situation is a perfectly normal precondition. In fact, it is not. Knowledge about causal relationships is the result of a learning process, and we did not elaborate on how to learn the network structure from the observed data. It would be a significant extension to the considerations of this thesis to describe this learning process algorithmically.

We can distinguish between the two classical cases: either (1) the network structure is already given and we have to compute the actual parameters for the variables, or (2) the network structure is unknown and we only have the data. The task in situation (1) is known as “parameter learning” while the task in situation (2) is known as “structure learning”. We did not touch on the question of which ranks are to be assigned in the initial state, and thus we did not consider parameter learning. One could also put the question the other way around and ask for an applicable measurement of ranks, which is a more general version of the same question and nearer to belief revision. The philosopher would prefer to state the question the following way: how is abductive inference to be performed with ranking networks?

Probabilities have a strong connection to observational data since they correspond with relative frequencies. There is no similarly obvious relationship between ranks and observations. Of course, it seems to be a good idea to turn relative frequencies into ranks by just counting exceptional cases, but this is a research topic of its own and deserves further engagement. It would be interesting to bring the questions concerning parameter learning and abductive inference to a concrete formal level that would enable us to describe algorithms to compute measurements for ranks.

The case of structure learning is considerably more challenging. There are at least two “classical” algorithms for structure-learning networks: the K2-algorithm and the PC-algorithm. A possible starting point would be to investigate how those algorithms can be transferred to ranking semantics.

Beyond the scope of learning networks from data, there is a second important field of questions: how can inductive inference to unknown facts be performed with ranking networks? This can be done exactly or approximatively. Exact techniques are variable elimination and clustering, to mention only two. Furthermore, there exists a manifold of approximative techniques like rejection sampling, likelihood weighting, self-importance sampling, adaptive importance sampling, Markov chains, and Monte-Carlo techniques like Gibbs sampling or Metropolis sampling (and many, many others). The availability of such algorithms for the ranking case would extend the applicability of ranking networks from a mere belief revision tool to a tool for “real” inductive inference, which, in other words, would enable the subject to anticipate beliefs about the unknown. Those topics clearly exceed the frame of this analysis, which restricts itself to the scope of belief revision. Clearly, the analysis of ranking networks would be significantly enriched by research in the fields of learning as well as induction to the unknown.

Appendix A

A Computed Example for Decomposition

Let the DAG in figure A.1 be the graphical part of some ranking network. In this appendix, we will demonstrate how to decompose this network into a clique tree by an application of the Decompose method. As a side effect, Decompose will provide us with all relevant terms for the subsequent message passing phase.

Figure A.1: The graphical part of a ranking network.

The preliminary step is the moralization, which we will consider to be already performed when starting the decomposition. The graph in figure A.2 is the moral graph of the network in figure A.1. The set of visible edges in figure A.2 is the set referenced as E ∪ E_m in the Decompose method. Although we can easily read off which edges are contained in E_m by comparing figure A.2 to figure A.1, this information is not needed to comprehend the example. The graph in figure A.2 will be the input to the Decompose method.

Figure A.2: The moral graph of the original network.

The following figures in the remainder of this appendix show the state of some relevant variables whenever line 37 in Decompose is reached. The α-number of a vertex is written next to the vertex. If a vertex has non-zero weight, the weight is written in parentheses next to the vertex. (We leave out the mark “(0)” in the figures since it is obvious that unmarked vertices are of zero weight.)

Starting with i = n = 6, each vertex has weight 0. An arbitrary unnumbered vertex is chosen in line 7. The weight of its adjacent vertices is increased by 1. No lower weight paths exist and no edges are added. The vertex receives number 6 as its label in line 17 of the algorithm.

CRD_curr = 0,  r = 1,  C_r = {v_6},  R_r = {v_6},  S_r = ∅

Figure A.3: State when reaching line 37 in loop-run i = 6.

Note that during the first run it holds that CRD_curr = CRD_prev = 0 since the set of already numbered vertices is empty. Therefore, the comparison in line 19 evaluates to TRUE and the subsequent if-block is entered. The clique count index r is set to 1 and a new clique, C_1, is opened, to which vertices are now added. Obviously, the if-block starting in line 25 is not entered since the comparison in this line explicitly filters out the case of the first clique, which cannot have any predecessor clique. For the same reason, the separator remains empty and the vertex is recognized as a vertex of the residuum of C_1.

The second run of the main for-loop starts with i = 5, and an arbitrary vertex of maximum weight is chosen to receive number 5. Note that we are free to choose one of the three vertices with weight 1, as can be recognized from figure A.3. Let the vertex in the lower right “corner” of the graph be the one actually selected to become v_5. The weights of the vertices adjacent to the vertex chosen to become v_5 are increased by 1. The vertex to become v_5 has weight 1. It can easily be recognized from figure A.3 that the designated v_5 is connected to the two other vertices of weight 1 by lower weight paths. Except for their particular start and end vertices, those paths contain only vertices of zero weight. One of those vertices is already adjacent to the vertex selected to become v_5; the other causes the insertion of an edge into the fill-in F_α, and its weight is increased by 1. The new edge is dashed in figure A.4. When reaching line 17, the selected vertex receives 5 as its α-number.

Note that line 18 sets CRD_curr := 1 since v_5 is monotonely adjacent to one numbered vertex, which is precisely v_6. Therefore, the comparison in line 19 is FALSE since 1 = CRD_curr > CRD_prev = 0, and hence clique C_1 stays current. When the loop-run is completed, v_5 is added to the current clique.

CRD_curr = 1,  r = 1,  C_r = {v_6, v_5},  R_r = {v_6, v_5},  S_r = ∅

Figure A.4: State when reaching line 37 in loop-run i = 5.

When loop-run i = 4 starts, there are two vertices with the maximum weight of 2, and hence one of them is chosen to receive number 4. Since a lower weight path exists to another vertex, a new edge is inserted, which makes the vertex on the remote end of the lower weight path adjacent to v_4. The weight of this vertex and of the vertices adjacent to the currently selected vertex are increased by 1.

It is obvious that this loop-run, as the previous one, keeps clique C1 current.

CRD_curr = 2,  r = 1,  C_r = {v_6, v_5, v_4},  R_r = {v_6, v_5, v_4},  S_r = ∅

Figure A.5: State when reaching line 37 in loop-run i = 4.

The next loop-run is i = 3. As figure A.5 shows, there is only one candidate to become v_3 since no other vertex has maximum weight. Since there is a lower weight path to a remote vertex, a new edge is added to F_α. Note the important fact that the computation in line 18 respects the edges in F_α when computing the cardinality of the set of vertices monotonely adjacent to v_3. Other than in all previous loop-runs, we chose a vertex which is incident to edges in F_α, making it adjacent to already numbered vertices. Those edges are precisely {v_4, v_3} and {v_5, v_3}, which can easily be recognized from figure A.5. (Remember that we used the fact that those edges contribute to CRD_curr in the proof of lemma 4.17 on page 159.)

Since CRDcurr > CRDprev, the if-block in line 19 is not entered and clique C1 stays current as in the two previous loop-runs.

CRD_curr = 3,  r = 1,  C_r = {v_6, v_5, v_4, v_3},  R_r = {v_6, v_5, v_4, v_3},  S_r = ∅

Figure A.6: State when reaching line 37 in loop-run i = 3.

The next loop-run is i = 2. As in the previous loop-run there is only one vertex of maximum

weight which can legally be chosen to receive number 2.

CRD_curr = 3,  r = 2,  s = 1,  C_r = {v_5, v_4, v_3, v_2},  R_r = {v_2},  S_r = {v_5, v_4, v_3}

Figure A.7: State when reaching line 37 in loop-run i = 2.

We note that v_2 is monotonely adjacent to three other, already numbered vertices. This means CRD_curr = CRD_prev. Therefore, the if-block starting in line 19 is entered. Within this block, the computations concerning the current clique are completed by computing ψ_1. Since by now we know all vertices of clique C_1, we check for each vertex v ∈ C_1 whether C_1 is the clique that also contains the parents of v. This leads to the following result:

clq(v_6) := 1 because pa(v_6) = {v_3} ⊂ C_1
clq(v_5) := 1 because pa(v_5) = {v_4, v_6} ⊂ C_1

For the other vertices in C_1, namely v_4 and v_3, the parent sets are not subsets of C_1 and thus they do not yet receive a local definition of function clq. This is perfectly acceptable since, for defining ψ_i, it is sufficient to have clq defined for those vertices v for which clq(v) = i. All those vertices are guaranteed to be elements of C_i since it is impossible to have clq(v) = i if v ∉ C_i. (Vertices v_4 and v_3, for which clq remains currently undefined, will receive a definition of clq in further loop-runs – it is obvious by now that they are also elements of other cliques.)

After the definition of ψ1, the treatment of clique C1 is completed.

Thereafter, clique index r is increased and the treatment of clique C_2 starts. We know that the currently selected vertex v_2 is part of C_2 and that currently no vertex with a smaller label is part of C_2. We therefore start collecting vertices for C_2 by adding madj(v_2) to C_2 (since the candidate set {v_2} ∪ madj(v_2) will clearly be contained in C_2).

The transition from C1 to C2 also causes the adding of the edge hC1, C2i to the edge set of the clique tree. Additionally, the vertices already part of C2 are assigned correctly to either S2 or R2.

After leaving the if-block, vertex v2 is added to C2 and, additionally, recognized as part of the residuum R2.

The last loop-run, i = 1, starts with the last vertex receiving its number. Since CRDcurr =

3 = CRDprev, a new clique C3 is started. We recognize that

clq(v_2) := 2 because pa(v_2) = {v_5} ⊂ C_2

and the potential function ψ_2 is defined accordingly. The edge ⟨C_2, C_3⟩ is added to the clique tree. All vertices which contributed to CRD_curr are added to C_3 and assigned to either S_3 or R_3.

CRD_curr = 3,  r = 3,  s = 2,  C_r = {v_4, v_3, v_2, v_1},  R_r = {v_1},  S_r = {v_4, v_3, v_2}

Figure A.8: State when reaching line 37 in loop-run i = 1.

After leaving the if-block, vertex v1 is added to C3. The run now leaves the main-loop of the algorithm and goes beyond line 37. The subsequent lines are required to complete the treatment of the last clique by computing its potential function. Analyzing the last clique we recognize:

clq(v_4) := 3 because pa(v_4) = {v_1, v_2} ⊂ C_3
clq(v_3) := 3 because pa(v_3) = {v_1} ⊂ C_3
clq(v_1) := 3 because pa(v_1) = ∅ ⊂ C_3

Looking at the result, we recognize that we have computed the following cliques including separators and residua.

C_1 = {v_6, v_5, v_4, v_3}   R_1 = {v_6, v_5, v_4, v_3}   S_1 = ∅
C_2 = {v_5, v_4, v_3, v_2}   R_2 = {v_2}   S_2 = {v_5, v_4, v_3}
C_3 = {v_4, v_3, v_2, v_1}   R_3 = {v_1}   S_3 = {v_4, v_3, v_2}

We also have the corresponding potential functions at hand.

\[
\begin{aligned}
\psi_1(c_1) &= \kappa(v_{v_6} \mid v_{v_3}) + \kappa(v_{v_5} \mid v_{v_4} \cap v_{v_6}) \\
\psi_2(c_2) &= \kappa(v_{v_2} \mid v_{v_5}) \\
\psi_3(c_3) &= \kappa(v_{v_4} \mid v_{v_1} \cap v_{v_2}) + \kappa(v_{v_3} \mid v_{v_1}) + \kappa(v_{v_1})
\end{aligned}
\]


The decomposition is completed thereafter, and the clique tree shown in figure A.9 is returned.

C_1 — C_2 — C_3

Figure A.9: The clique tree computed from the input network.

Appendix B

A Computed Example for Updating

2.1 The Ranking Network

It seems convenient to refer to a well-known example, so we choose the example introduced by (Cooper, 1984). Since it is minimal and contains all relevant structures, it has often been used in the literature. It was also referred to by (Pearl, 1988b, p. 196) to illustrate the difficulties with simple message passing on DAGs that are not polytrees. Pearl borrowed it from (Spiegelhalter, 1986). Consider the following set V := {A, B, C, D, E} of boolean variables:

A: metastatic cancer
B: increased calcium
C: brain cancer
D: coma
E: headache

For the sake of simplicity we consider each of the variables as a boolean variable, which means each variable X can take two actual values, either ẋ = TRUE or x = FALSE; consequently, x represents one of {ẋ, x}. Note that a possible value v of V is without loss of generality a vector ⟨a, b, c, d, e⟩.


Figure B.1: The cancer example network


Hence the joint rank of all variables is:
\[
\kappa(v) = \kappa(a) + \kappa(b \mid a) + \kappa(c \mid a) + \kappa(d \mid b \cap c) + \kappa(e \mid c).
\]
Therefore, the value of κ(v) is completely determined by the following terms:
\[
\begin{aligned}
\kappa(\dot a) &= 17 &
\kappa(\dot b \mid \dot a) &= 5 &
\kappa(\dot b \mid a) &= 17 \\
\kappa(\dot c \mid \dot a) &= 17 &
\kappa(\dot c \mid a) &= 36 &
\kappa(\dot d \mid \dot b \cap \dot c) &= 5 \\
\kappa(\dot d \mid \dot b \cap c) &= 0 &
\kappa(\dot d \mid b \cap \dot c) &= 7 &
\kappa(\dot d \mid b \cap c) &= 36 \\
\kappa(\dot e \mid \dot c) &= 5 &
\kappa(\dot e \mid c) &= 8
\end{aligned}
\]

Note that instead of having to compute the values of 2^5 = 32 terms to determine the joint NRF κ(v), we only need to compute 11 rank values. This is a general advantage of respecting the causal relationships. Additionally, we will later need the negated values. For probabilities, these are derivable from the given values by subtracting the known probability from 1. For ranks, this is not generally possible. Since all values except one are greater than 0, we know that the negations of all terms are 0, except for the actual value of κ(d | ḃ ∩ c), which we cannot derive from the values already given. Thus, suppose that κ(d | ḃ ∩ c) = 3.
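As a quick plausibility check, a few lines of Python (values copied from the table above, with all zero-rank negations filled in) evaluate the factorized joint rank for the state ȧ, ḃ, c, d, ė, which will be used as the starting state below:

kappa = {
    "a.": 17, "a": 0,           # kappa(a-dot) and its negation
    "b.|a.": 5, "b|a.": 0,      # kappa(b-dot | a-dot) and its negation
    "c.|a.": 17, "c|a.": 0,     # kappa(c-dot | a-dot) and its negation
    "d.|b.c": 0, "d|b.c": 3,    # kappa(d-dot | b-dot ∩ c), supposed negation 3
    "e.|c": 8,                  # kappa(e-dot | c)
}
# kappa(v) = kappa(a.) + kappa(b.|a.) + kappa(c|a.) + kappa(d|b.∩c) + kappa(e.|c)
joint = (kappa["a."] + kappa["b.|a."] + kappa["c|a."]
         + kappa["d|b.c"] + kappa["e.|c"])
print(joint)   # 17 + 5 + 0 + 3 + 8 = 33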

2.2 Initialization Phase

We now need to moralize and triangulate the graph. Moralization entails adding an edge that connects vertices B and C since they are both parents of vertex D and unconnected in the original graph.

[Figure: the moral graph of the cancer example network, with the added edge between B and C.]

Note that the moralization operation also “removes” the direction of the edges. Having constructed the moral graph of the original network, we recognize that the moral graph is already chordal, as can easily be seen. We do not need to perform a separate triangulation step. Note that this is of course just a convenient coincidence related to this particular example; it is in no way guaranteed that moralization yields a chordal graph.


An example of how Decompose works was already presented in appendix A, so we will skip most of the actual computation and just show the important intermediate results.

Let the following ordering α be established during the course of Decompose:

[Figure: the ordering α drawn on the moral graph – A = 4, B = 5, C = 3, D = 2, E = 1.]

Presupposing α to be chosen as above, Decompose will have ordered the cliques as follows:

C_1 = {v_4, v_5, v_3} = {A, B, C}   R_1 = {v_4, v_5, v_3} = {A, B, C}   S_1 = ∅
C_2 = {v_5, v_3, v_2} = {B, C, D}   R_2 = {v_2} = {D}   S_2 = {v_5, v_3} = {B, C}
C_3 = {v_3, v_1} = {C, E}   R_3 = {v_1} = {E}   S_3 = {v_3} = {C}

When the parent finding mechanism of Decompose has to determine which clique will be the parent clique of the clique whose treatment is currently to be completed, it uses function clqidx on the vertex with the lowest α-number in the current clique. For both cliques, C_2 and C_3, this vertex is v_3, and since clqidx(v_3) = 1, Decompose will make both cliques children of C_1. We therefore obtain the following clique tree:

[Figure: the clique tree – root C_1 = {A, B, C} with children C_2 = {B, C, D} (separator S_2 = {B, C}) and C_3 = {C, E} (separator S_3 = {C}).]


Analyzing the original ranking network, we recognize the following parental relationships:

pa(A) = ∅,  pa(B) = {A},  pa(C) = {A},  pa(D) = {B, C},  pa(E) = {C}

An application of Decompose derives a potential representation for κ(v) by defining function clq as follows:

clq(A) = 1,  clq(B) = 1,  clq(C) = 1,  clq(D) = 2,  clq(E) = 3

The definition of the functions ψ_i then yields:
\[
\begin{aligned}
\psi_1(c_1) &= \psi_1(a \cap b \cap c) = \kappa(a) + \kappa(b \mid a) + \kappa(c \mid a) \\
\psi_2(c_2) &= \psi_2(b \cap c \cap d) = \kappa(d \mid b \cap c) \\
\psi_3(c_3) &= \psi_3(c \cap e) = \kappa(e \mid c)
\end{aligned}
\]

Note that the above representation of the joint rank of V by the potential functions of the cliques by simple substitution:

κ(v) = ∑_{i=1}^{p} ψi(ci)
     = ψ1(a ∩ b ∩ c) + ψ2(b ∩ c ∩ d) + ψ3(c ∩ e)
     = κ(a) + κ(b | a) + κ(c | a) + κ(d | b ∩ c) + κ(e | c)

Thus, ⟨{C1, C2, C3}; ψ⟩ is a potential representation of κ(v) in accordance with definition 3.63 on page 110. Let the following state be the starting point:

A = ȧ    B = ḃ    C = c̄    D = d̄    E = ė

We will now show how the belief base is initialized with this state.

202 2.2 Initialization Phase

2.2.1 Going Bottom-Up: Computing the Conditional Ranks

We demonstrate the computations triggered by a call of procedure send-lambda-msg on page 186 for i = p = 3, which means we start on clique Cp = C3.

It was argued on page 184 that theorem 4.29 can be applied to Cp to compute (4.25), the first of the conditional ranks. This is done in line 2 of procedure send-lambda-msg and we obtain:

κ(r3 | s3) = ψ3(c3) − min_{r ∈ rg(R3)} ψ3(r ∩ s3)
           = κ(e | c) − min_{e′ ∈ rg(E)} κ(e′ | c)
           = κ(e | c) − min{κ(ė | c), κ(ē | c)}

Due to the properties of an NRF, either κ(ė | c) = 0 or κ(ē | c) = 0, and in either case the minimum is 0.

= κ(e | c)

Inserting the actual values for the variables, we obtain:

= κ(ė | c̄) = 8

After this computation has been completed, λ(c3) := min_{r ∈ rg(R3)} ψ3(r ∩ s3) is sent as a bottom-up message to all parents of clique C3, which is just one clique, namely C1. We complete the local computation on C3 by setting ψ3(c3) := κ(r3 | s3) = κ(e | c), which is legal by theorem 3.84 on page 130 (and does not imply any factual change for the particular case). When vertex C2 receives the message λ(c3), procedure process-lambda-msg (cf. page 186) is called, which computes the marginal potential representation ⟨{C1, C2}; ψ^(1)⟩ from the potential representation ⟨{C1, C2, C3}; ψ⟩ by using theorem 4.30 (cf. page 178):

ψ2^(1)(c2) = ψ2(c2) + min_{r ∈ rg(R3)} ψ3(r ∩ s3)
           = ψ2(b ∩ c ∩ d) + min_{e′ ∈ rg(E)} κ(e′ | c)
           = κ(d | b ∩ c) + min{κ(ė | c), κ(ē | c)}
           = κ(d | b ∩ c)


Inserting the actual values for the variables, we obtain:

= κ(d̄ | ḃ ∩ c̄) = 3.

Obviously it happens that ψ2^(1)(c2) = ψ2(c2), so ψ2^(1) and ψ2 are identical by coincidence. Note that ψ1^(1) = ψ1 and ψ3^(1) = ψ3 in accordance with theorem 4.30. Having derived the potential functions ψi^(1), we can access ⟨{C1, C2}; ψ^(1)⟩, which is a potential representation of the marginal NRF on C1 ∪ C2 = {A, B, C, D}. This enables us to compute the conditional rank κ(r_{p−1} | s_{p−1}) = κ(r2 | s2) by a new call of procedure send-lambda-msg on C2 as follows:

κ(r2 | s2) = ψ2^(1)(c2) − min_{r ∈ rg(R2)} ψ2^(1)(r ∩ s2)
           = ψ2^(1)(b ∩ c ∩ d) − min_{d′ ∈ rg(D)} ψ2^(1)(b ∩ c ∩ d′)
           = κ(d | b ∩ c) − min_{d′ ∈ rg(D)} κ(d′ | b ∩ c)
           = κ(d | b ∩ c) − min{κ(ḋ | b ∩ c), κ(d̄ | b ∩ c)}
           = κ(d | b ∩ c)

Inserting the actual values for the variables, we obtain:

= κ(d̄ | ḃ ∩ c̄) = 3.

Message λ(c2) := min_{r ∈ rg(R2)} ψ2^(1)(r ∩ s2) is sent to C1, the only parent vertex of C2, and thereafter the local computation on C2 is completed by setting ψ2(c2) := κ(r2 | s2) = κ(d | b ∩ c), which is 3 for the actual values.

When C1 receives the message from C2, the potential representation ⟨{C1}; ψ^(2)⟩ of the marginal NRF on C1 is computed by a call of procedure process-lambda-msg. Starting with an application of theorem 4.30, the value ψ1^(2)(c1) is computed as follows:

ψ1^(2)(c1) = ψ1^(1)(c1) + min_{r ∈ rg(R2)} ψ2^(1)(r ∩ s2)
           = ψ1^(1)(a ∩ b ∩ c) + min_{d′ ∈ rg(D)} ψ2^(1)(b ∩ c ∩ d′)
           = κ(a) + κ(b | a) + κ(c | a) + min_{d′ ∈ rg(D)} κ(d′ | b ∩ c)
           = κ(a) + κ(b | a) + κ(c | a) + min{κ(ḋ | b ∩ c), κ(d̄ | b ∩ c)}
           = κ(a) + κ(b | a) + κ(c | a)


Inserting the actual values for the variables, we obtain:

= κ(ȧ) + κ(ḃ | ȧ) + κ(c̄ | ȧ) = 17 + 5 + 0 = 22

Thus, we notice: ψ1^(2)(c1) = ψ1^(1)(c1) = ψ1(c1) = κ(r1 | ∅) = κ(c1).

After this, it is recognized that C1 has received all λ-messages from its children. Therefore κ(c1) := ψ1^(2)(c1) is set (with no factual change), and the bottom-up step of the message passing process is completed.

The result so far is the joint rank for C1:

κ(c1) = 22.
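The whole bottom-up pass can be replayed in a few lines operating directly on the clique tables; the following sketch (Python; not part of the thesis, all names are our own) uses the numbers collected in figure B.2 in the next section and absorbs λ(c3) into C2 and λ(c2) into C1, exactly as described above.

    from itertools import product

    # A minimal sketch (not part of the thesis) of the bottom-up lambda pass.
    CLIQUE_VARS = {1: ("A", "B", "C"), 2: ("B", "C", "D"), 3: ("C", "E")}
    RESIDUUM = {2: ("D",), 3: ("E",)}
    RECEIVER = {3: 2, 2: 1}   # C2 absorbs lambda(c3), C1 absorbs lambda(c2)

    def make_table(n, values):
        return dict(zip(product((True, False), repeat=n), values))

    psi = {1: make_table(3, (39, 22, 34, 17, 53, 17, 36, 0)),
           2: make_table(3, (5, 0, 0, 3, 7, 0, 36, 0)),
           3: make_table(2, (5, 0, 8, 0))}

    def restrict(i, key, keep):
        """Project a full instantiation of clique i onto the variables in keep."""
        return tuple(v for var, v in zip(CLIQUE_VARS[i], key) if var in keep)

    def absorb(i):
        """Send lambda(c_i), the min over the residuum of clique i for every
        separator instantiation, and add it into the receiving clique's table."""
        j = RECEIVER[i]
        sep_vars = tuple(v for v in CLIQUE_VARS[i] if v not in RESIDUUM[i])
        lam = {}
        for key, val in psi[i].items():
            sep = restrict(i, key, sep_vars)
            lam[sep] = min(val, lam.get(sep, float("inf")))
        for key in psi[j]:
            psi[j][key] += lam[restrict(j, key, sep_vars)]

    absorb(3)   # here every message value is 0, so nothing changes
    absorb(2)
    # Calibrated root potential for the actual state a-dot, b-dot, c-bar:
    assert psi[1][(True, True, False)] == 22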

2.2.2 Going Top-Down: Joint Ranks of the Cliques

Since it is already known that κ(c1) = ψ1^(2)(c1) = 22, the ranks κ(c2) and κ(c3) remain to be computed. Recalling (4.30) on page 186, we recognize that we need the ranks of the separators to complete the computation of the posterior NRF. Thus, we start with:

κ(ci) := ψi(ci) + κ(si).

Once we have the ranks of the separators, we can compute the joint ranks of the cliques in accordance with the argumentation in paragraph 4.5.5.

After vertex C1 has processed all its λ-messages, it starts the top-down part of the message passing, since the ranks of the separators to its children can be computed locally on C1. The first joint rank to be computed is κ(c2), which can be derived as follows:

κ(c2) := ψ2^(1)(c2) + κ(s2).

To perform this, κ(s2) has to be locally accessible on C2.

The message to be passed top-down from C1 to C2 is therefore:

π2(c1) := κ(s2)
        = min_{t ∈ rg(C1\S2)} κ(t ∩ s2)
        = min_{a′ ∈ rg(A)} κ(a′ ∩ b ∩ c)
        = min_{a′ ∈ rg(A)} ( κ(a′) + κ(b | a′) + κ(c | a′) )


We insert the actual values ḃ and c̄ for the variables B and C and resolve the minimization over the actual values for a:

= min{ κ(ȧ) + κ(ḃ | ȧ) + κ(c̄ | ȧ), κ(ā) + κ(ḃ | ā) + κ(c̄ | ā) }
= min{ 17 + 5 + 0, 0 + 17 + 0 }
= 17

On C2 the message is locally processed by calling procedure process-pi-msg, which provides the posterior rank of c2 as

κ(c2) := ψ2^(1)(c2) + π2(c1) = 3 + 17 = 20.

This computation is performed analogously for κ(c3). The message to be passed top-down from C1 to C3 is:

π3(c1) := κ(s3)
        = min_{t ∈ rg(C1\S3)} κ(t ∩ s3)
        = min_{a′∩b′ ∈ rg(A∪B)} κ(a′ ∩ b′ ∩ c)
        = min{ κ(ȧ) + κ(ḃ | ȧ) + κ(c̄ | ȧ), κ(ā) + κ(b̄ | ā) + κ(c̄ | ā),
               κ(ȧ) + κ(b̄ | ȧ) + κ(c̄ | ȧ), κ(ā) + κ(ḃ | ā) + κ(c̄ | ā) }
        = min{ 17 + 5 + 0, 0 + 0 + 0, 17 + 0 + 0, 0 + 17 + 0 }
        = 0

and a call of procedure process-pi-msg locally on C3 yields:

κ(c3) := ψ3^(1)(c3) + π3(c1) = κ(ė | c̄) + 0 = 8.

This completes the initialization of the permanent belief base. The initial state is:

κ(c1) = 22
κ(c2) = 20
κ(c3) = 8
κ(v) = 22 + 3 + 8 = 33.
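The top-down pass, too, fits into a few lines; the following sketch (Python; not part of the thesis, names are our own) computes the π-messages from the calibrated root table and reproduces the values just derived.

    from itertools import product

    # A minimal sketch (not part of the thesis) of the top-down pi pass: the
    # root C1 computes the rank of each separator by minimizing its calibrated
    # table over the remaining variables; each child adds the received message
    # to its calibrated potential.
    ROOT_VARS = ("A", "B", "C")
    KAPPA_C1 = dict(zip(product((True, False), repeat=3),
                        (39, 22, 34, 17, 53, 17, 36, 0)))

    def pi_message(sep_vars):
        """kappa(s_i): minimize kappa(c1) over the variables outside sep_vars."""
        msg = {}
        for key, val in KAPPA_C1.items():
            sep = tuple(v for var, v in zip(ROOT_VARS, key) if var in sep_vars)
            msg[sep] = min(val, msg.get(sep, float("inf")))
        return msg

    pi2 = pi_message(("B", "C"))            # kappa(s2)
    pi3 = pi_message(("C",))                # kappa(s3)
    assert pi2[(True, False)] == 17         # min{22, 17} for b-dot, c-bar
    assert pi3[(False,)] == 0               # min{22, 17, 17, 0} for c-bar

    # Joint ranks of the actual state: psi2 was calibrated to 3, psi3 to 8.
    assert 3 + pi2[(True, False)] == 20     # kappa(c2)
    assert 8 + pi3[(False,)] == 8           # kappa(c3)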

2.3 Update By New Evidence

We now consider what happens when an actual update occurs. This is the case when one or more of the variables in V are instantiated.


i = 1:  C1 = {A, B, C},  R1 = {A, B, C},  S1 = ∅

    c1      ψ1(c1)   ψ1^(1)(c1)   ψ1^(2)(c1)
    ȧḃċ     39       39           39
    ȧḃc̄     22       22           22
    ȧb̄ċ     34       34           34
    ȧb̄c̄     17       17           17
    āḃċ     53       53           53
    āḃc̄     17       17           17
    āb̄ċ     36       36           36
    āb̄c̄     0        0            0

i = 2:  C2 = {B, C, D},  R2 = {D},  S2 = {B, C}

    c2      ψ2(c2)   ψ2^(1)(c2)   ψ2^(2)(c2)
    ḃċḋ     5        5            –
    ḃċd̄     0        0            –
    ḃc̄ḋ     0        0            –
    ḃc̄d̄     3        3            –
    b̄ċḋ     7        7            –
    b̄ċd̄     0        0            –
    b̄c̄ḋ     36       36           –
    b̄c̄d̄     0        0            –

i = 3:  C3 = {C, E},  R3 = {E},  S3 = {C}

    c3      ψ3(c3)   ψ3^(1)(c3)   ψ3^(2)(c3)
    ċė      5        –            –
    ċē      0        –            –
    c̄ė      8        –            –
    c̄ē      0        –            –

Figure B.2: Numerical values for different potential functions

Suppose now that the variable D is concretely instantiated with the value ḋ; then we have an evidence that requires an update:

U := D and u := ḋ.

The task is to compute the posterior NRF κ̂ which reflects the update. Evidently, this will be done via computing a potential representation of κ̂. The process is formally equal to the initialization we have computed previously.

κ̂(a ∩ b ∩ c ∩ d ∩ e) := κ(a ∩ b ∩ c ∩ e | ḋ).
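As a cross-check of this defining equation, one can conditionalize the joint NRF directly, using the law of conditional ranks κ(B | A) = κ(A ∩ B) − κ(A); the following sketch (Python; not part of the thesis, names are our own) rebuilds the joint from the prior clique potentials of figure B.2.

    from itertools import product

    # A minimal sketch (not part of the thesis): conditionalizing the joint
    # NRF directly on D = d-dot via kappa(x | d-dot) = kappa(x and d-dot)
    # - kappa(d-dot).
    PSI1 = dict(zip(product((True, False), repeat=3),
                    (39, 22, 34, 17, 53, 17, 36, 0)))      # key: (a, b, c)
    PSI2 = dict(zip(product((True, False), repeat=3),
                    (5, 0, 0, 3, 7, 0, 36, 0)))            # key: (b, c, d)
    PSI3 = dict(zip(product((True, False), repeat=2),
                    (5, 0, 8, 0)))                         # key: (c, e)

    def joint(a, b, c, d, e):
        return PSI1[(a, b, c)] + PSI2[(b, c, d)] + PSI3[(c, e)]

    kappa_d = min(joint(a, b, c, True, e)
                  for a, b, c, e in product((True, False), repeat=4))
    assert kappa_d == 17                    # kappa(d-dot)

    kappa_hat = {(a, b, c, e): joint(a, b, c, True, e) - kappa_d
                 for a, b, c, e in product((True, False), repeat=4)}
    assert min(kappa_hat.values()) == 0     # the posterior is again an NRF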

The prior NRF κ is given by its potential representation on the cliques

C1 = {A, B, C}
C2 = {B, C, D}
C3 = {C, E}

and the potential functions

ψ1(a ∩ b ∩ c) = κ(a) + κ(b | a) + κ(c | a)
ψ2(b ∩ c ∩ d) = κ(d | b ∩ c)
ψ3(c ∩ e) = κ(e | c).

We have to run procedure init as defined on page 184 to calibrate cliques, residua and separators to the new potential representation. Note that this is the only difference from the initialization phase in the previous section. In the remainder of this section, we will only demonstrate how procedure init modifies the start situation of the update. The new sets Ĉi := Ci \ {D} are:

Ĉ1 := C1 \ {D} = {A, B, C}
Ĉ2 := C2 \ {D} = {B, C}
Ĉ3 := C3 \ {D} = {C, E}.

The potential functions ψ are transformed to ψ̂ := ψ_{D=ḋ}. Since D is contained in neither C1 nor C3, the transformation does not affect the associated potential functions:

ψ̂1(a ∩ b ∩ c) = κ(a) + κ(b | a) + κ(c | a)
ψ̂3(c ∩ e) = κ(e | c)

Since C2 contains D, the associated potential function ψ2(c2) is modified to:

ψ̂2(b ∩ c ∩ d) = κ(ḋ | b ∩ c).

Then ⟨{Ĉ1, Ĉ2, Ĉ3}; ψ̂⟩ is a potential representation of κ̂. We know the residua and the separators:

R̂1 := R1 \ {D} = {A, B, C}
R̂2 := R2 \ {D} = ∅
R̂3 := R3 \ {D} = {E}
Ŝ1 := S1 \ {D} = ∅
Ŝ2 := S2 \ {D} = {B, C}
Ŝ3 := S3 \ {D} = {C}

Now the actual values of the conditional ranks κ̂(R̂i | Ŝi) can be computed by recursively applying theorems 4.29 and 4.30, as previously presented in section 2.2.1. After this computation has been completed, the ranks of the separators κ̂(Ŝi) can be computed. From this point, the joint ranks of the cliques are easily computable, as was already shown before.
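The transformation ψ ↦ ψ_{D=ḋ} performed by procedure init amounts to selecting the table rows consistent with the evidence and dropping D; a minimal sketch (Python; not part of the thesis, with helper names of our own):

    from itertools import product

    # A minimal sketch (not part of the thesis) of instantiating the evidence
    # D = d-dot in the potential table of C2.
    VARS2 = ("B", "C", "D")
    PSI2 = dict(zip(product((True, False), repeat=3), (5, 0, 0, 3, 7, 0, 36, 0)))

    def instantiate(variables, table, var, value):
        """Restrict a potential table to var = value and remove that variable."""
        pos = variables.index(var)
        new_vars = variables[:pos] + variables[pos + 1:]
        new_table = {key[:pos] + key[pos + 1:]: val
                     for key, val in table.items() if key[pos] == value}
        return new_vars, new_table

    VARS2_HAT, PSI2_HAT = instantiate(VARS2, PSI2, "D", True)
    assert VARS2_HAT == ("B", "C")
    # The surviving rows are exactly kappa(d-dot | b, c):
    assert PSI2_HAT == {(True, True): 5, (True, False): 0,
                        (False, True): 7, (False, False): 36}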

Acknowledgements

It is a great pleasure to thank everyone who supported me in successfully writing my dissertation. I am sincerely and heartily grateful to my advisor, Professor Dr. Wolfgang Spohn, for all the support and guidance he has shown me throughout my dissertation writing. He initially confronted me with the fascinating matter of ranking theory. (My first encounter with ranking theory dates back to the late Nineties, when I was a student of philosophy.) Throughout the time this thesis was written, he always took the time to discuss my thoughts and proposals with me. He reacted to the naivety of my questions with patience and to my ideas with interest and support. I drew considerable and invaluable benefit from our discussions and tried my best to reflect this in my thesis. Furthermore, I owe sincere and earnest thankfulness to my reviewer PD Dr. Sven Kosub from the Department of Computer and Information Science. He helped me to discover and remove severe deficiencies in conceptual clarity in early versions of the graph-related parts of the thesis and supported my efforts to improve the quality of all the parts related to data structures and algorithms. He offered helpful guidance and invaluable input to me at all times, and I drew important benefit from his proposals. I am truly indebted to him for his support. I am sure it would not have been possible to successfully write my dissertation without the help of both the aforementioned persons. I would further like to show my gratitude to Anna Dowden-Williams, who carefully helped me to bring this English text to an adequate level of language. I also owe the most sincere thanks to my dear colleague Wiebke Knop from the library of the University of Konstanz, who supported me in double-checking all references in my bibliography. I also owe thanks to Annegret Brandau for test-reading chapter II. I am obliged to the many librarians of the University of Konstanz who always supported me in a friendly and unhesitating manner in the search for relevant literature, which was an intricate task on some occasions.

I am truly indebted and thankful to my wife Marina for all the support she gave me during the years this thesis was written. She always supported me with her patience, her understanding and her willingness to accept that writing this text consumed most of my time. This dissertation would not have been possible without her contribution.

—Stefan Hohenadel Konstanz, November 9, 2013

Index of Definitions

1.2 Propositional Algebra ...... 30
1.3 Proposition ...... 30
1.4 Sigma-Algebra ...... 31
1.5 Complete Algebra ...... 31
1.7 Atom ...... 32
1.8 Atomic Algebra ...... 32
1.12 Belief Set ...... 40
1.14 Complete Belief Set ...... 41
1.15 Core of a Belief Set ...... 41
2.1 Ranking Function on Possibilities ...... 48
2.2 Rank of w ...... 49
2.3 Complete Negative Ranking Function ...... 49
2.4 Negative Rank of A ...... 49
2.6 Belief Set of κ ...... 50
2.7 Core of κ ...... 50
2.8 Regularity of κ ...... 51
2.9 Naturalness of κ ...... 51
2.11 Negative Ranking Function II ...... 53
2.14 Two-sided Ranking Function ...... 56
2.15 Core of τ ...... 56
2.18 Conditional Negative Rank of w ...... 57
2.19 Conditional Negative Rank of B ...... 58
2.24 Conditional Negative Ranking Function ...... 61
2.25 Complete Conditional Negative Ranking Function ...... 62
2.26 Conditional Two-Sided Rank ...... 62
2.29 Positive Ranking Function ...... 65
2.30 Positive Rank of A ...... 65
2.31 Complementarity of β and κ ...... 66
2.32 Core of β ...... 66
2.34 Conditional Positive Rank of B ...... 67
2.38 Plain Conditionalization of κ by A ...... 68
2.40 Simple Indirect Spohn-Conditionalization of $ ...... 70
2.41 Generalized Indirect Spohn-Conditionalization of $ ...... 71


2.42 Spohn-Conditionalization of κ ...... 72
2.43 Spohn-Revision of $ ...... 73
2.44 Spohn-Revision of κ ...... 73
2.45 Transition Function for Epistemic States ...... 73
2.46 Generalized Indirect Shenoy-Conditionalization of $ ...... 73
2.47 Shenoy-Conditionalization of κ ...... 74
2.48 Conditional Independence Among Propositions in κ ...... 75
2.53 Unconditional Independence Among Propositions ...... 77
3.1 Measurability ...... 87
3.2 A-measurability of κ ...... 87
3.3 Variable ...... 87
3.8 Minimal Algebra AV over a Set of Variables V ...... 90
3.9 Marginal Negative Ranking Function ...... 90
3.11 Graph ...... 91
3.12 Directed Graph ...... 92
3.13 Undirected Graph of a Directed Graph ...... 92
3.14 Complete and Empty Graphs ...... 92
3.15 Subgraph and Induced Subgraph ...... 93
3.16 Complete Subset ...... 93
3.17 Clique ...... 93
3.18 Walk ...... 93
3.19 Path ...... 93
3.20 Directed and Undirected Paths ...... 93
3.21 Closed Path, Cycle, Polygon ...... 94
3.22 Directed Acyclic Graph ...... 94
3.23 Incidence ...... 94
3.24 Adjacency, Neighborhood of a Vertex ...... 94
3.25 Neighborhood of a Vertex Set ...... 94
3.26 Closure of a Vertex or a Vertex Set ...... 95
3.27 Relationships between Vertices ...... 95
3.28 Ancestral Set ...... 95
3.29 Relationships between Vertex Sets ...... 95
3.30 Component ...... 95
3.31 Connected Graph ...... 95
3.32 Singly Connected Graph, Polytree ...... 96
3.33 Tree ...... 96
3.34 Multiply Connected Graphs ...... 96
3.35 Moral Graph of a DAG ...... 96
3.36 Chord ...... 97
3.37 Chordal Graph ...... 97
3.38 Triangulated Graph, Triangulation of a Graph, Triangulation ...... 97
3.39 Running Intersection Property ...... 98
3.40 Clique Tree ...... 98


3.41 Decomposability, Decomposition, Decomposable Graph ...... 99
3.43 Hypergraph, Hyperedge ...... 99
3.44 Hypertree ...... 99
3.45 Conditional Independence Among Variables in κ ...... 100
3.46 Semi-Graphoid ...... 101
3.47 Graphoid ...... 102
3.49 Unconditional Independence Among Variables in κ ...... 102
3.51 Undirected Graphical Separation ...... 104
3.52 D-map ...... 104
3.53 I-map, Markov field, Global Markov property ...... 104
3.54 Minimal I-map, Markov graph ...... 105
3.55 Perfect Map ...... 105
3.57 Markov blanket, Markov boundary ...... 106
3.63 Potential Representation, Potential Function ...... 110
3.64 Residua and Separators ...... 110
3.70 Consonance of G with θ ...... 121

3.71 Predecessors of Vi in θ ...... 121
3.72 Causal List ...... 121
3.74 Ranking Network, Local Directed Markov property ...... 123
3.76 Global Directed Markov Property ...... 125
3.77 Pairwise Directed Markov Property ...... 125
3.88 Hypertree Construction Ordering ...... 134
3.89 Hierarchical Causal List ...... 134
4.2 Monotonely Adjacent Vertices of a Vertex ...... 149
4.3 Deficiency of a Vertex ...... 150
4.4 Fill-in ...... 150
4.5 Fill-in-Graph ...... 151
4.9 Perfect Elimination Ordering ...... 152
4.12 Perfect Ordering ...... 152
4.15 Maximum Cardinality Search ...... 156
4.24 Weighted Clique Intersection Graph ...... 167
4.25 Maximum Weight Spanning Tree ...... 169

Index of Symbols

Symbol: description (reference, page)

chapter I
u, v, w, ... : possibilities (in a space W of possibilities) (in text, p. 29)
A : propositional algebra (over a space W of possibilities) (Def. 1.2, p. 30)
A, B, C, ... : propositions A ∈ A (Def. 1.3, p. 30)
B : belief set B ⊆ A of an algebra A (Def. 1.12, p. 40)
core(B) : core of a belief set B (Def. 1.15, p. 41)

chapter II
$(w) : rank of a possibility w ∈ W (Def. 2.1, p. 48)
N∞ : N∞ := N ∪ {∞} (in Def. 2.1, p. 48)
κ(A) : NRF, negative rank of proposition A (Defs. 2.3, 2.4, 2.11, pp. 49, 53)
Bel(κ) : belief set of an NRF κ; Bel(κ) := {A ∈ A : κ(Ā) > 0} (Def. 2.6, p. 50)
core(κ) : core of an NRF κ; core(κ) := κ⁻¹(0) (Def. 2.7, p. 50)
τ(A) : TRF, two-sided rank of proposition A (Def. 2.14, p. 56)
core(τ) : core of τ; core(τ) = ⋃_{i>0} τ⁻¹(i) (Def. 2.15, p. 56)
$(w | A) : rank of possibility w conditional on proposition A (Def. 2.18, p. 57)
κ(B | A) : negative rank of proposition B conditional on proposition A (Defs. 2.19, 2.24, pp. 58, 61)
τ(B | A) : two-sided rank of proposition B conditional on proposition A (Def. 2.26, p. 62)
β(A) : positive rank of proposition A (Defs. 2.29, 2.30, p. 65)
core(β) : core of β; core(β) := β⁻¹(0) (Def. 2.32, p. 66)
β(B | A) : positive rank of B conditional on A (Def. 2.19, p. 67)
κ_A(B) : plain conditionalization of κ by A; κ_A(B) = κ(B | A) (Def. 2.38, p. 68)
$_{A→n}(w) : simple indirect Spohn-conditionalization of $ (Def. 2.40, p. 70)


$_{Ei→ni}(w) : generalized indirect Spohn-conditionalization of $ (Def. 2.41, p. 71)
κ_{Ei→ni}(w) : Spohn-conditionalization of κ (Def. 2.42, p. 72)
$_{E→λ}(w) : Spohn-revision of $ (Def. 2.43, p. 73)
κ_{E→λ}(A) : Spohn-revision of κ (Def. 2.44, p. 73)
$_{Ei↑ni}(w) : generalized indirect Shenoy-conditionalization of $ (Def. 2.46, p. 73)
κ_{Ei↑ni}(B) : Shenoy-conditionalization of κ (Def. 2.47, p. 74)
A ⊥κ B | C : rank-based conditional independence of (proposition) A from B given C (Def. 2.48, p. 75)

chapter III
U, V, W, X, ... : variables (singleton) (Def. 3.3, pp. 87f.)
U, V, W, X, ... : variables (compound) (in text, pp. 87f.)
rg(X), cd(X) : range and codomain of variable X (in text, pp. 87f.)
v_X, v_X : actual value of variable X (or X), i.e. some proposition from cd(X) (in text, pp. 87f.)
κ(v_X) : rank of proposition v_X (in text, pp. 87f.)
A_V : minimal algebra over a set of variables V (Def. 3.8, p. 90)
{u, v} : the undirected edge connecting vertices u and v
⟨u, v⟩ : the directed edge leading from vertex u to vertex v
u ∼ v : assertion: an undirected edge exists between vertices u and v
u → v : assertion: a directed edge leads from vertex u to vertex v
G : a graph G = ⟨V, E⟩ (Def. 3.11, p. 91)
G_D : undirected graph of a directed graph D (Def. 3.13, p. 92)
G′ ⊆ G, G′ ⊂ G : graph G′ is a subgraph of G (Def. 3.15, p. 93)
G_W : induced subgraph: the subgraph of G induced by the subset W of vertices of G (Def. 3.15, p. 93)
C_i : a clique of a graph, uniquely identified by index i (Def. 3.17, p. 93)
κ(c_i) : rank of proposition c_i, which is the actual value of variable C_i (in text)
adj(X), adj(X) : neighborhood of a vertex/variable or a set of vertices/variables (Defs. 3.24, 3.25, p. 94)
clsr(X), clsr(X) : closure of vertex X: adj(X) ∪ X (Def. 3.26, p. 95)
pa(v), ch(v) : parents/children of vertex v (Def. 3.27, p. 95)
an(v), dn(v), nd(v) : ancestors, descendants, nondescendants of vertex v (Def. 3.27, p. 95)


An(X) : smallest ancestral set containing X (Def. 3.28, p. 95)
G^m, G^m(D) : moral graph, moral graph of D (Def. 3.35, p. 96)
G^c : triangulated graph, triangulation of G (Def. 3.38, p. 97)
T_G : a clique tree of G (Def. 3.40, p. 98)
ψ : set of potential functions of a potential representation for some NRF (Def. 3.63, p. 110)
ψ_i : potential function for a particular subset W_i ⊆ V (Def. 3.63, p. 110)
K : normalization factor for potential representations of ranking functions; its default value is −min ψ_i(v_{Wi}) for the particular subset W_i ⊂ V (in text, p. 110)
V_H : vertex set of hypergraph H (in text, p. 99)
E_H : edge set of hypergraph H (in text, p. 99)
N : hypertree (Def. 3.44, p. 99)
X ⊥κ Y | Z : rank-based conditional independence of variable X from variable Y given an actual value of variable Z (Def. 3.45, p. 100)
bdry_κ(X) : Markov boundary of X (Def. 3.57, p. 106)
θ : a strict linear ordering (on the vertices V) (from Def. 3.70, p. 121)
pred_θ(V_j) : set of predecessors of variable V_j in the ordering θ (Def. 3.71, p. 121)
L_θ : causal list based on the ordering θ (Def. 3.72, p. 121)
R := ⟨V, E, κ⟩ : a ranking network with a DAG ⟨V, E⟩ and an NRF κ (Def. 3.74, p. 123)

chapter IV
madj(X) : monotone adjacency of a vertex or a set of vertices (Def. 4.2, p. 149)
D_v : deficiency of vertex v; D_v := {⟨u, w⟩ : v ∼ u ∧ v ∼ w ∧ u ≁ w ∧ u ≠ w} (Def. 4.3, p. 150)
α : a total ordering on the vertices of a graph (in text, p. 149)
F_α : fill-in of a graph G in accordance with a total vertex ordering α (Def. 4.4, p. 150)
G_α : fill-in graph of G in accordance with a total vertex ordering α (Def. 4.5, p. 151)
C_G : set of cliques of G (Def. 3.40, p. 98)
v_{ir} : representative vertex of clique C_r := {v_{ir}} ∪ madj(v_{ir}) (in text, p. 162)
clqidx(v) : the lowest index r of any clique in the ordering C_1, C_2, ..., C_p such that C_r contains v; clqidx(v) := min{r : v ∈ C_r} (in text, p. 166)


K_{G,w} : weighted clique intersection graph of G (Def. 4.24, p. 167)

Index of Algorithms

1 Algorithm for Hunter’s parallel single update on Spohnian networks ...... 141
2 EliminationGame, the basic vertex elimination algorithm ...... 151
3 MCS-M, a method to compute a minimal fill-in ...... 158
4 Prim’s (modified) algorithm, a method to compute a Max-WST ...... 170
5 Decompose, a method to decompose a ranking network to a clique tree ...... 172
- Procedure init ...... 184
- Procedure send-lambda-msg ...... 186
- Procedure process-lambda-msg ...... 186
- Procedure send-pi-msg ...... 187
- Procedure process-pi-msg ...... 187
- Procedure init-update ...... 189
6 Update, a method to incorporate new evidence ...... 189

Definitions and Theorems from “The Laws of Belief”

Item in this thesis: corresponding item in (Spohn, 2012)

References in chapter I
Definition 1.2 (p. 30): definition 2.1 (p. 17)
Definition 1.3 (p. 30): implicit in the beginning of section 2.1, p. 17
Definitions 1.4 (p. 31) and 1.5 (p. 31): definition 2.1 (p. 17)
Definitions 1.7 (p. 32) and 1.8 (p. 32): definition 2.2 (p. 18)
Definitions 1.12 (p. 40), 1.14 (p. 41) and 1.15 (p. 41): definition 4.5 (p. 50)
Postulate 1.10 (p. 38): sentence 4.1 (p. 48)
Postulate 1.11 (p. 38): sentence 4.2 (p. 48)
Postulate 1.18 (p. 43): sentence 4.8 (p. 52)
Postulate 1.20 (p. 44): sentence 4.7 (p. 51)
Postulate 1.21 (p. 44): sentence 4.9 (p. 52)
Postulate 1.22 (p. 44): sentence 4.10 (p. 52)
Corollary 1.6 (p. 31): remark on definition 2.1 (p. 18), proof by the author
Corollary 1.9 (p. 32): remark on definition 2.2 (p. 18), proof taken from unpublished draft version of (Spohn, 2012)
Corollary 1.13 (p. 40): implicit, proof by the author
Corollaries 1.16 (p. 41) and 1.17 (p. 42): sentence 4.6 (p. 51), proofs by the author

References in chapter II
Definitions 2.1 (p. 48), 2.2 (p. 49), 2.3 (p. 49), and 2.4 (p. 49): definition 5.5 (p. 70)
Corollary 2.5 (p. 49): implicit, proof by the author
Definition 2.6 (p. 50): sentence 5.6 (p. 71)
Definition 2.7 (p. 50): remark on sentence 5.6 (p. 71)
Definition 2.8 (p. 51): definition 5.27 (p. 84)
Definition 2.9 (p. 51): not used


Theorem 2.10 (p. 51): theorems 5.8b (“law of disjunction”, “finite minimitivity”) and 5.8c (“law of infinite disjunction”, “complete minimitivity”) (p. 72), proof by the author
Theorem 2.13 (p. 55): theorem 5.8a (p. 72) (“law of negation”), proof by the author
Definition 2.11 (p. 53): definition 5.5 (p. 70)
Definition 2.14 (p. 56): definition 5.12 (p. 76)
Definition 2.15 (p. 56): not used
Corollary 2.16 (p. 56): sentence 5.13 (p. 76) (“law of negation for two-sided ranks”)
Corollary 2.17 (p. 57): implicit
Definition 2.18 (p. 57): definition 5.15 (p. 78)
Definition 2.19 (p. 58): definition 5.15 (p. 78)
Corollary 2.20 (p. 59): sentence 5.16 (p. 78) (“law of conjunction for negative ranks”)
Corollary 2.21 (p. 60): implicit
Theorem 2.22 (p. 60): theorem 5.23c (p. 81)
Theorem 2.23 (p. 61): theorem 5.23d (p. 81) (“Bayes’ theorem for negative ranks”)
Definition 2.24 (p. 61): definition 5.19 (p. 79)
Definition 2.25 (p. 62): definition 5.19 (p. 79)
Definition 2.26 (p. 62): definition 5.22 (p. 80)
Theorem 2.27 (p. 63): theorem 5.23a (p. 81) (“law of disjunctive conditions”)
Theorem 2.28 (p. 64): theorem 5.23b (p. 81) (“formula of the total negative rank”)
Definition 2.30 (p. 65): definition 5.10 (p. 75)
Definition 2.31 (p. 66) and remark on induction of complementary functions: resembles sentence 5.11 (p. 75)
Definition 2.32 (p. 66): not used
Theorem 2.33 (p. 66): implicit, proof by the author
Definition 2.29 (p. 65): definition 5.10 (p. 75)
Definition 2.34 (p. 67): definition 5.20 (p. 80)
Corollary 2.35 (p. 67): sentence 5.21 (p. 80) (“law of material implication”)
Theorem 2.36 (p. 67): not used
Theorem 2.37 (p. 67): theorem 5.23e (p. 81) (“Bayes’ theorem for positive ranks”)
Definition 2.38 (p. 68): definition 5.15 (p. 78)
Corollary 2.39 (p. 69): assertion is part of definition 5.15 (p. 78)
Definition 2.40 (p. 70): definition 5.24 (p. 83)
Definition 2.41 (p. 71): definition 5.32 (p. 86)
Definition 2.42 (p. 72): not used
Definition 2.43 (p. 73): not used
Definition 2.44 (p. 73): not used


Definition 2.46 (p. 73): definition 5.29 (p. 85)
Definition 2.47 (p. 74): not used
Definition 2.48 (p. 75): definition 7.1 (p. 127)
Lemma 2.49 (p. 75): not described, proof by the author with insightful tips by Wolfgang Spohn
Lemma 2.50 (p. 76): not described, proof by the author
Lemma 2.51 (p. 77): not described, proof by the author
Theorem 2.52 (p. 77): definition 7.1 (p. 127)
Definition 2.53 (p. 77): definition 7.1 (p. 127)
Theorem 2.54 (p. 77): theorem 7.2 (p. 127)

References in chapter III
Theorem 3.50 (p. 102): theorem 7.6 (p. 130)
Theorem 3.48 (p. 102): theorem 7.10 (p. 132)

Literature

Note: all hyperlinks are supposed to either point directly to the fulltext of the bibliographic reference or to lead to sites where the fulltext is legally available for download. The linked site may nonetheless restrict online access in accordance with applicable law. In particular, a download may or may not be available when the linked site is accessed from a system which is not part of the network of a university or other educational institution. All hyperlinks were verified to be valid on September 15, 2013.

Adams, Ernest W. 1975. The Logic of Conditionals. Dordrecht: Reidel.

Alchourrón, Carlos E., Gärdenfors, Peter, & Makinson, David. 1985. On the Logic of Theory Change: Partial Meet Functions for Contraction and Revision. Journal of Symbolic Logic, 50 (2), 510–530. http://www.jstor.org/stable/2274239

Anderson, John. 1938. The Problem of Causality. Australasian Journal of Psychology and Philosophy, 16 (2), 127–142. doi:10.1080/00048403808541382

Andersson, Steen A., Madigan, David, & Perlman, Michael D. 1997. A Characterization of Markov Equivalence Classes for Acyclic Digraphs. Annals of Statistics, 25 (2), 505–541. doi:10.1214/aos/1031833662

Arnborg, Stefan, Corneil, Derek G., & Proskurowski, Andrzej. 1987. Complexity of finding embeddings in a k-tree. SIAM Journal on Algebraic and Discrete Methods, 8 (2), 277–284. doi:10.1137/0608024

Bacchus, Fahiem & Grove, Adam J. 1995. Graphical Models for Preference and Utility. In: (Besnard & Hanks, 1995), 3–10. http://uai.sis.pitt.edu/papers/95/p3-bacchus.pdf

Bacchus, Fahiem, Grove, Adam J., Halpern, Joseph Y., & Koller, Daphne. 1994a. Forming Beliefs about a Changing World. Pages 222–229 of: Hayes-Roth, Barbara & Korf, Richard E. (eds), Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94), vol. 1. Menlo Park, CA: AAAI Press. http://robotics.stanford.edu/~koller/Papers/Bacchus+al:AAAI94.pdf

. 1994b. Generating New Beliefs From Old. Pages 37–45 of: de Mántaras, Ramon López & Poole, David (eds), Proceedings of the 10th Annual Conference on Uncertainty in Artificial Intelligence (UAI-94). San Francisco, CA: Morgan Kaufmann. http://uai.sis.pitt.edu/papers/94/p37-bacchus.pdf

. 1996. From Statistical Knowledge Bases to Degrees of Belief. Artificial Intelligence, 87 (1–2), 75–143. http://ai.stanford.edu/~koller/Papers/Bacchus+al:AIJ96.pdf


Becker, Ann & Geiger, Dan. 1996. A Sufficiently Fast Algorithm for Finding Close to Optimal Junction Trees. Pages 81–89 of: Horvitz, Eric & Jensen, Finn Verner (eds), Proceedings of the 12th Annual Conference on Uncertainty in Artificial Intelligence (UAI-96). San Francisco, CA: Morgan Kaufmann. http://uai.sis.pitt.edu/papers/96/p81-becker.pdf

Beeri, Catriel, Fagin, Ronald, Maier, David, & Yannakakis, Mihalis. 1983. On the Desirability of Acyclic Database Schemes. Journal of the ACM, 30 (3), 479–513. doi:10.1145/2402.322389

Bennett, Jonathan. 2003. A Philosophical Guide to Conditionals. Oxford: Oxford University Press.

Bernstein, Philip A. & Goodman, Nathan. 1981. Power of Natural Semijoins. SIAM Journal on Computing, 10 (4), 751–771. doi:10.1137/0210059

Berry, Anne, Blair, Jean R. S., Heggernes, Pinar, & Peyton, Barry W. 2004. Maximum Cardinality Search for Computing Minimal Triangulations of Graphs. Algorithmica, 39 (4), 287–298. doi:10.1007/s00453-004-1084-3 http://www.ii.uib.no/~pinar/MCSM-r.pdf

Berry, Anne, Dahlhaus, Elias, Heggernes, Pinar, & Simonet, Geneviève. 2008. Sequential and Parallel Triangulating Algorithms for Elimination Game and New Insights on Minimum Degree. Theoretical Computer Science, 409 (3), 601–616. doi:10.1016/j.tcs.2008.09.059 http://www.ii.uib.no/~pinar/MD_Berryal_Revd.pdf

Besag, Julian. 1974. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, B (Methodological), 36 (2), 192–236. http://www.jstor.org/stable/2984812

Besnard, Phillipe & Hanks, Steve (eds). 1995. Proceedings of the 11th Annual Conference on Uncertainty in Artificial Intelligence (UAI-95). San Francisco, CA: Morgan Kaufmann.

Binkley, Robert W. 1968. The Surprise Examination in Modal Logic. Journal of Philosophy, 65 (5), 127–136. http://www.jstor.org/stable/2024556

Blair, Jean R. S., England, R. E., & Thomason, M. G. 1988. Cliques and their Separators in Triangulated Graphs. Tech. rept. CS-78-88. Dept. of Computer Science, University of Tennessee, Knoxville, TN. (Could not be acquired.).

Blair, Jean R. S. & Peyton, Barry W. 1992. An Introduction to Chordal Graphs and Clique Trees. Tech. rept. ORNL/TM-12203. Oak Ridge National Laboratory, Oak Ridge, TN. Also published as (Blair & Peyton, 1993). http://www.ornl.gov/info/reports/1992/3445603686740.pdf

. 1993. An Introduction to Chordal Graphs and Clique Trees. Pages 1–29 of: George, Alan, Gilbert, John R., & Liu, Joseph W. H. (eds), Graph Theory and Sparse Matrix Computations. IMA Volumes in Mathematics and its Applications, vol. 56. New York, NY: Springer. First published as (Blair & Peyton, 1992).

Bonissone, Piero P., Henrion, Max, Kanal, Laveen N., & Lemmer, John F. (eds). 1990. Proceedings of the 6th Annual Conference on Uncertainty in Artificial Intelligence (UAI-90). New York, NY: Elsevier Science.


Bouchitté, Vincent & Todinca, Ioan. 2002a. Listing all Potential Maximal Cliques of a Graph. Theoretical Computer Science, 276 (1–2), 17–32. doi:10.1016/S0304-3975(01)00007-X http://www.univ-orleans.fr/lifo/Members/todinca/PS/ListingPMCs.ps

. 2002b. Treewidth and Minimum Fill-in: Grouping the minimal separators. SIAM Journal on Computing, 31, 212–232. doi:10.1137/S0097539799359683

Boutilier, Craig. 1993. Revision Sequences and Nested Conditionals. Pages 519–525 of: Bajcsy, Ruzena (ed), Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93). San Mateo, CA: Morgan Kaufmann. http://ijcai.org/PastProceedings/IJCAI-93-VOL1/PDF/073.pdf

Boutilier, Craig, Bacchus, Fahiem, & Brafman, Ronen I. 2001. UCP-Networks: A Directed Graphical Representation of Conditional Utilities. Pages 56–64 of: Proceedings of the 17th Annual Conference on Uncertainty in Artificial Intelligence (UAI ’01). San Francisco, CA: Morgan Kaufmann. http://uai.sis.pitt.edu/papers/01/p56-boutilier.pdf

Bovens, Luc & Hartmann, Stephan. 2004. Bayesian Epistemology. Oxford: Oxford University Press.

Brafman, Ronen I. & Engel, Yagil. 2009. Directional Decomposition of Multiattribute Utility Functions. Pages 192–202 of: Proceedings of the 1st International Conference on Algorithmic Decision Theory (ADT ’09). Berlin: Springer. doi:10.1007/978-3-642-04428-1_17

. 2010. Decomposed Utility Functions and Graphical Models for Reasoning about Preferences. In: Proceedings of the 24th Conference on Artificial Intelligence (AAAI-10). http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/download/1630/1970

Brafman, Ronen I. & Tennenholtz, Moshe. 2000. An Axiomatic Treatment of Three Qualitative Decision Criteria. Journal of the ACM, 47 (3), 452–482. doi:10.1145/337244.337251 http://www.cs.bgu.ac.il/~brafman/jacm99.ps

Cantor, Georg. 1874. Über eine Eigenschaft des Inbegriffs aller reellen algebraischen Zahlen. Journal für die Reine und Angewandte Mathematik, 77, 258–262. (In German). http://resolver.sub.uni-goettingen.de/purl?GDZPPN002155583

. 1890/91. Über eine elementare Frage der Mannigfaltigkeitslehre. Jahresbericht der Deutschen Mathematikervereinigung, 1, 72–78. (In German).

Cantwell, John. 1997. On the Logic of Small Changes in Hypertheories. Theoria, 63 (1–2), 54–89. doi:10.1111/j.1755-2567.1997.tb00740.x

Carnap, Rudolf. 1947/56. Meaning and Necessity. 2nd edn. Chicago, IL: The University of Chicago Press.

. 1950/62. The Logical Foundations of Probability. 2nd edn. Chicago, IL: The University of Chicago Press.

. 1971a. A Basic System of Inductive Logic, Part 1. Pages 33–165 of: Carnap, Rudolf & Jeffrey, Richard C. (eds), Studies in Inductive Logic and Probability, vol. 1. Berkeley, CA: University of California Press.


Carnap, Rudolf. 1971b. Inductive Logic and Rational Decisions. Pages 5–31 of: Carnap, Rudolf &Jeffrey, Richard C. (eds), Studies in Inductive Logic and Probability, vol. 1. Berkeley, CA: University of California Press.

. 1980. A Basic System of Inductive Logic, Part 2. Pages 7–155 of: Jeffrey, Richard C. (ed), Studies in Inductive Logic and Probability, vol. 2. Berkeley, CA: University of California Press.

Chalmers, David J. 2006. The Foundations of Two-Dimensional Semantics. Pages 55–140 of: Garcia Carpintero, Manuel & Macia, Joseph (eds), Two-Dimensional Semantics: Foundations and Applications. Oxford: Oxford University Press. http://consc.net/papers/foundations.pdf

Charitos, Theodore, de Waal, Peter R., & van der Gaag, Linda C. 2006. Convergence in Markovian Models with Implications for Efficiency of Inference. Tech. rept. UU-CS-2006-038. Department of Information and Computing Sciences, Utrecht University, Utrecht. http://www.cs.uu.nl/research/techreps/repo/CS-2006/2006-038.pdf

Charniak, Eugene. 1991. Bayesian Networks Without Tears. AI Magazine, 12 (4), 50–63. http://www.aaai.org/ojs/index.php/aimagazine/article/download/918/836

Clifford, Peter E. 1990. Markov random fields in statistics. In: (Grimmett & Welsh, 1990), 19–32. http://www.statslab.cam.ac.uk/~grg/books/hammfest/3-pdc.ps

Cohen, Laurence Jonathan. 1970. The Implications of Induction. London: Methuen.

. 1977. The Probable and the Provable. Oxford: Oxford University Press.

. 1980. Some Historical Remarks on the Baconian Conception of Probability. Journal of the History of Ideas, 41 (2), 219–231. http://www.jstor.org/stable/2709457

Cooper, Gregory F. 1984. NESTOR: A computer-based medical diagnostic aid that integrates causal and probabilistic knowledge. Ph.D. thesis, Department of Computer Science, Stanford University, Stanford, CA.

Cowell, Robert G., Dawid, A. Philip, Lauritzen, Steffen L., & Spiegelhalter, David J. 1999. Probabilistic Networks and Expert Systems. Berlin: Springer.

Cox, Richard T. 1946. Probability, Frequency, and Reasonable Expectation. American Journal of Physics, 14 (1), 1–13. doi:10.1119/1.1990764

Dahlhaus, Elias & Karpinski, Marek. 1994. An Efficient Parallel Algorithm for the Minimal Elimination Ordering (MEO) of an Arbitrary Graph. Theoretical Computer Science, 134 (2), 493–528. doi:10.1016/0304-3975(94)90250-X

Darroch, John N., Lauritzen, Steffen L., & Speed, Terry P. 1980. Markov Fields and Log-Linear Interaction Models for Contingency Tables. Annals of Statistics, 8 (3), 522–539. http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/1176345006

Darwiche, Adnan & Pearl, Judea. 1997. On the Logic of Iterated Belief Revision. Artificial Intelligence, 89 (1–2), 1–29. doi:10.1016/S0004-3702(96)00038-0


Davis, Martin & Stoljar, Daniel (eds). 2004. Philosophical Studies. Vol. 118.

Dawid, A. Philip. 1979. Conditional Independence in Statistical Theory. Journal of the Royal Statistical Society, B (Methodological), 41, 1–31. http://www.jstor.org/stable/2984718

. 1980. Conditional Independence for Statistical Operations. Annals of Statistics, 8 (3), 598–617. http://www.jstor.org/stable/2240595

. 1992. Applications of a General Propagation Algorithm for Probabilistic Expert Systems. Statistics and Computing, 2 (1), 25–36. doi:10.1007/BF01890546

. 2001. Separoids: A Mathematical Framework for Conditional Independence and Irrelevance. Annals of Mathematics and Artificial Intelligence, 32 (1–4), 335–372. doi:10.1023/A:1016734104787

Diestel, Reinhard. 2010. Graph Theory. 4th edn. Graduate Texts in Mathematics, vol. 173. Heidelberg: Springer.

Dirac, G. A. 1961. On Rigid Circuit Graphs. Abhandlungen aus dem Mathematischen Seminar der Universität Hamburg, 25 (1–2), 71–76. doi:10.1007/BF02992776

Dobruschin, P. L. 1968. The Description of a Random Field by Means of Conditional Probabilities and Conditions of Its Regularity. Theory of Probability and its Applications, 13 (2), 192–224. doi:10.1137/1113026

Domotor, Zoltan. 1969. Probabilistic Relational Structures and Their Application. Tech. rept. 144. Institute for the Mathematical Studies in the Social Sciences, Stanford University, Stanford, CA. http://www.eric.ed.gov/contentdelivery/servlet/ERICServlet?accno=ED031407

Doyle, Jon. 1979. A Truth Maintenance System. Artificial Intelligence, 12 (3), 231–272.

Dubois, Didier & Prade, Henri. 1988. Possibility Theory: An Approach to Computerized Processing of Uncertainty. New York, NY: Plenum Press.

. 1998. Possibility Theory: Qualitative and Quantitative Aspects. In: (Gabbay & Smets, 1998), 169–226.

Dubois, Didier, Wellman, M. P., D’Ambrosio, B., & Smets, Philippe H. (eds). 1992. Proceedings of the 8th Annual Conference on Uncertainty in Artificial Intelligence (UAI-92). San Mateo, CA: Morgan Kaufmann.

Dubus, Jean-Philippe, Gonzales, Christophe, & Perny, Patrice. 2009. Fast Recommendations Using GAI Models. Pages 1896–1901 of: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 09). San Francisco, CA: Morgan Kaufmann. http://ijcai.org/papers09/Papers/IJCAI09-314.pdf

Earman, John. 1992. Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. Reading, MA: MIT Press.

Edgington, Dorothy. 1995. On Conditionals. Mind, 104 (414), 235–327. http://www.jstor.org/stable/2254793


Ellis, Brian. 1976. Epistemic Foundations of Logic. Journal of Philosophical Logic, 5 (2), 187–204. http://www.jstor.org/stable/30226140

. 1979. Rational Belief Systems. Oxford: Blackwell.

Festa, Roberto. 1997. Analogy and Exchangeability in Predictive Inferences. Erkenntnis, 45 (2–3), 229–252. http://www.jstor.org/stable/20012728

Field, Hartry. 1978. A Note on Jeffrey Conditionalization. Philosophy of Science, 45 (3), 361–367. http://www.jstor.org/stable/187023

. 1996. XV* – The A Prioricity of Logic. Proceedings of the Aristotelian Society, 96, 359–379. http://www.jstor.org/stable/4545244

de Finetti, Bruno. 1937. La Prévision: Ses Lois Logiques, Ses Sources Subjectives. Annales de l’Institut Henri Poincaré, 7 (1), 1–68. (In French). http://www.numdam.org/item?id=AIHP_1937__7_1_1_0

. 1964. Foresight: Its Logical Laws, Its Subjective Sources. Pages 93–158 of: Kyburg, Jr., Henry Ely & Smokler, Henry Edward (eds), Studies in Subjective Probability. New York, NY: Wiley. English translation of (de Finetti, 1937).

Fishburn, Peter C. 1964. Decision and Value Theory. New York, NY: Wiley.

. 1967. Interdependence and Additivity in Multivariate, Unidimensional Expected Utility Theory. International Economic Review, 8 (3), 335–342. http://www.jstor.org/stable/2525541

. 1968. Utility Theory. Management Science, 14 (5), 335–378. doi:10.1287/mnsc.14.5.335 http://www.jstor.org/stable/2628674

. 1970. Utility Theory for Decision Making. New York, NY: Wiley.

Fitelson, Branden. 2001. Studies in Bayesian Confirmation Theory. Ph.D. thesis, Department of Philosophy, University of Wisconsin, Madison, WI.

Fodor, Jerry A. 1987. Psychosemantics: The Problem of Meaning in the Philosophy of Mind. Reading, MA: MIT Press.

van Fraassen, Bas C. 1983. Calibration: A Frequency Justification for Personal Probability. Pages 295–319 of: Cohen, Robert S. & Laudan, Larry (eds), Physics, Philosophy, and Psychoanalysis. Dordrecht: Reidel.

. 1984. Belief and the Will. Journal of Philosophy, 81 (5), 235–256. http://www.jstor.org/stable/2026388

. 1995a. Belief and the Problem of Ulysses and the Sirens. Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition, 77 (1), 7–37. http://www.jstor.org/stable/4320551

. 1995b. Fine-Grained Opinion, Probability, and the Logic of Full Belief. Journal of Philosophical Logic, 24 (4), 349–377. http://www.jstor.org/stable/30226553

Frege, Gottlob. 1892. Über Sinn und Bedeutung. Zeitschrift für Philosophie und philosophische Kritik, NF 100, 25–50. (In German).


Frege, Gottlob. 1919. Der Gedanke. Beiträge zur Philosophie des deutschen Idealismus, 2 (1), 58–77. (In German).

Freund, John E. 1965. Puzzle or Paradox? The American Statistician, 19 (4), 29–44. http://www.jstor.org/stable/2681571

Fuhrmann, André. 1988. Relevant Logics, Modal Logics, and Theory Change. Ph.D. thesis, Australian National University, Canberra.

. 1997. An Essay on Contraction. Stanford, CA: CSLI Publications.

Fuhrmann, André & Hansson, Sven Ove. 1994. A Survey of Multiple Contractions. Journal of Logic, Language, and Information, 3 (1), 39–75. http://www.jstor.org/stable/40180040

Fulkerson, D. R. & Gross, O. A. 1965. Incidence Matrices and Interval Graphs. Pacific Journal of Mathematics, 15 (3), 835–855. http://projecteuclid.org/euclid.pjm/1102995572

Gabbay, Dov M. & Smets, Philippe H. (eds). 1998. Quantified representation of uncertainty and imprecision. Handbook of defeasible reasoning and uncertainty management systems, vol. 1. Dordrecht: Kluwer.

Gaifman, Haim. 1988. A Theory of Higher Order Probabilities. Pages 191–219 of: Harper, William Leonard & Skyrms, Brian (eds), Causation, Chance, and Credence. Dordrecht: Kluwer.

Garber, Daniel. 1980. Field and Jeffrey Conditionalization. Philosophy of Science, 47 (1), 142– 145. http://www.jstor.org/stable/187152

Gärdenfors, Peter. 1978. Conditionals and Changes of Belief. Pages 381–404 of: Niiniluoto, Ilkka & Tuomela, Raimo (eds), The Logic and Epistemology of Scientific Change. Acta Philosophica Fennica, vol. 30, nos. 2–4. Amsterdam: North-Holland.

. 1981. An Epistemic Approach to Conditionals. American Philosophical Quarterly, 18 (3), 203–211. http://www.jstor.org/stable/20013914

. 1988. Knowledge in Flux: Modelling the Dynamics of Epistemic States. Cambridge, MA: MIT Press.

Gärdenfors, Peter & Makinson, David. 1989. Relations between the Logic of Theory Change and Nonmonotonic Logic. Pages 185–205 of: Fuhrmann, André & Morreau, Michael (eds), The Logic of Theory Change: Proceedings of the Workshop on The Logic of Theory Change. Lecture Notes in Computer Science, vol. 465. Berlin: Springer.

Gärdenfors, Peter & Rott, Hans. 1995. Belief Revision. Pages 35–132 of: Gabbay, Dov M., Hogger, C. J., & Robinson, J. A. (eds), Epistemic and Temporal Reasoning. Handbook of Logic in Artificial Intelligence and Logic Programming, vol. 4. Oxford: Oxford University Press.

Garey, Michael R. & Johnson, David S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. New York, NY: W. H. Freeman & Co.

Gavril, F˘anic˘a. 1974. The Intersection Graphs of Subtrees in Trees are Exactly the Chordal Graphs. Journal of Combinatorial Theory, B, 16, 47–56. doi:10.1016/0095-8956(74)90094-X


Gavril, F˘anic˘a. 1987. Generating the Maximum Spanning Weighted Graph. Journal of Algorithms, 8 (4), 592–597. doi:10.1016/0196-6774(87)90053-8

Geiger, Dan. 1987. The Non-Axiomatizability of Dependencies in Directed Acyclic Graphs. Tech. rept. 870048 (R-83). Cognitive Systems Laboratory, University of California, Los Angeles, CA. ftp://ftp.cs.ucla.edu/tech-report/198_-reports/870048.pdf

. 1988. Towards the formalization of informational dependencies. Tech. rept. 880053 (R-102). Cognitive Systems Laboratory, University of California, Los Angeles, CA. ftp://ftp.cs.ucla.edu/tech-report/198_-reports/880053.pdf

. 1990. Graphoids: A Qualitative Framework for Probabilistic Inference. Ph.D. thesis, University of California, Los Angeles, CA. http://ftp.cs.ucla.edu/pub/stat_ser/r142-geiger-phd.pdf

Geiger, Dan, Paz, Azaria, & Pearl, Judea. 1988. Axioms and Algorithms for Inferences Involving Probabilistic Independence. Tech. rept. 890031 (R-119). Cognitive Systems Laboratory, University of California, Los Angeles, CA. Also published as (Geiger et al., 1991). ftp://ftp.cs.ucla.edu/tech-report/198_-reports/890031.pdf

. 1991. Axioms and Algorithms for Inferences Involving Probabilistic Independence. Information and Computation, 91, 128–141. First published as (Geiger et al., 1988).

Geiger, Dan & Pearl, Judea. 1988. On the Logic of Causal Models. In: (Shachter et al., 1988), 136–147. Also in (Shachter et al., 1990), 3–14. http://uai.sis.pitt.edu/papers/88/p136-geiger.pdf

. 1989. Logical and Algorithmic Properties of Conditional Independence and their Application to Bayesian Networks. Tech. rept. 890035 (R-123). Cognitive Systems Laboratory, University of California, Los Angeles, CA. A revised version was published as (Geiger & Pearl, 1993). ftp://ftp.cs.ucla.edu/tech-report/198_-reports/890035.pdf

. 1993. Logical and Algorithmic Properties of Conditional Independence and Graphical Models. Annals of Statistics, 21 (4), 2001–2021. Revised version of the earlier (Geiger & Pearl, 1989). http://www.jstor.org/stable/2242326

Geiger, Dan, Pearl, Judea, & Verma, Thomas. 1989. Identifying Independence in Bayesian Networks. Tech. rept. 890028 (R-116). Cognitive Systems Laboratory, University of California, Los Angeles, CA. Also published as (Geiger et al., 1990). ftp://ftp.cs.ucla.edu/tech-report/198_-reports/890028.pdf

. 1990. Identifying Independence in Bayesian Networks. Networks, 20 (5), 507–534. First published as (Geiger et al., 1989).

Ghysels, Eric, Swanson, Norman R., & Watson, Mark W. (eds). 2001. Essays in Econometrics. Cambridge, MA: Harvard University Press.

Giang, Phan Hong & Shenoy, Prakash P. 2000. A Qualitative Linear Utility Theory for Spohn’s Theory of Epistemic Beliefs. Pages 220–229 of: Boutilier, Craig & Goldszmidt, Moisés (eds), Proceedings of the 16th Annual Conference on Uncertainty in Artificial Intelligence (UAI- 00), vol. 16. San Francisco, CA: Morgan Kaufmann. http://uai.sis.pitt.edu/papers/00/p220-giang.pdf


Giang, Phan Hong & Shenoy, Prakash P.. 2005. Two Axiomatic Approaches to Decision Making Using Possibility Theory. European Journal of Operational Research, 162 (2), 450– 467. http://hdl.handle.net/1808/149

Gilboa, Ithzak. 1987. Expected Utility with Purely Subjective Non-Additive Probabilities. Journal of Mathematical Economics, 16 (1), 65–88. doi:10.1016/0304-4068(87)90022-X

Glymour, Clark, Scheines, Richard, & Spirtes, Peter. 1993. Causation, Prediction, and Search. Berlin: Springer.

Glymour, Clark, Scheines, Richard, Spirtes, Peter, & Kelly, Kevin. 1987. Discovering Causal Structure: Artificial Intelligence, Philosophy of Science, and Statistical Modeling. San Diego, CA: Academic Press.

Goldszmidt, Moisés & Pearl, Judea. 1992a. Rank-Based Systems: A Simple Approach to Belief Revision, Belief Update, and Reasoning About Evidence and Actions. Pages 661–672 of: Nebel, Bernhard, Rich, Charles, & Swartout, William (eds), Proceedings of the 3rd International Conference on Knowledge Representation and Reasoning (KR ’92). San Mateo, CA: Morgan Kaufmann.

. 1992b. Reasoning With Qualitative Probabilities Can Be Tractable. In: (Dubois et al., 1992), 112–120. http://uai.sis.pitt.edu/papers/92/p112-goldszmidt.pdf

. 1996. Qualitative Probabilities for Default Reasoning, Belief Revision, and Causal Modeling. Artificial Intelligence, 84 (1–2), 57–112. doi:10.1016/0004-3702(95)00090-9 ftp://ftp.cs.ucla.edu/pub/stat_ser/R161-L.pdf

Golumbic, Martin Charles. 1980. Graph Theory and Perfect Graphs. New York, NY: Academic Press.

Granger, Clive W. J. 1969. Investigating Causal Relations by Econometric Models and Cross- Spectral Methods. Econometrica, 37 (3), 424–438. http://www.jstor.org/stable/1912791

. 1980. Testing for Causality: A Personal Viewpoint. Journal of Economic Dynamics and Control, 2, 329–352. Also in (Ghysels et al., 2001), 48–70. doi:10.1016/0165-1889(80)90069-X

Grimmett, Geoffrey R. 1973. A theorem about random fields. Bulletin of the London Mathematical Society, 5 (1), 81–84. doi:10.1112/blms/5.1.81

Grimmett, Geoffrey R. & Welsh, D. J. A. 1990. Disorder in Physical Systems: A Volume in Honour of John M. Hammersley. Oxford: Clarendon Press.

Grove, Adam J. 1988. Two modellings for theory change. Journal of Philosophical Logic, 17 (2), 157–170. http://www.jstor.org/stable/30227207

Haas, Gordian. 2005. Revision und Rechtfertigung: Eine Theorie der Theorieänderung. Heidelberg: Synchron Wissenschaftsverlag der Autoren. (In German).

Haas-Spohn, Ulrike. 1995. Versteckte Indexikalität und subjektive Bedeutung. Berlin: Akademie Verlag. (In German).


Haas-Spohn, Ulrike & Spohn, Wolfgang. 2001. Concepts Are Beliefs About Essences. Pages 287–316 of: Newen, Albert, Nortmann, Ulrich, & Stuhlmann-Laeisz, Rainer (eds), Building on Frege: New Essays on Sense, Content, and Concept. Stanford, CA: CSLI Publications. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62624

Haavelmo, Trygve M. 1943. The Statistical Implications of a System of Simultaneous Equations. Econometrica, 11 (1), 1–12. http://www.jstor.org/stable/1905714

Hacking, Ian. 1967. Slightly More Realistic Personal Probability. Philosophy of Science, 34 (4), 311–325. http://www.jstor.org/stable/186120

. 1975. The Emergence of Probability. Cambridge: Cambridge University Press.

Haenni, Rolf. 2009. Non-Additive Degrees of Belief. Pages 121–159 of: Huber, Franz & Schmidt-Petri, C. (eds), Degrees of Belief. Dordrecht: Springer.

Hájek, Alan. 2003a. Conditional Probability is the Very Guide of Life. Pages 183–203 of: Kyburg, Jr., Henry Ely & Thalos, Mariam (eds), Probability is the Very Guide of Life: The Philosophical Uses of Chances. Chicago, IL: Open Court. (Abridged version in: Proceedings of the International Society for Bayesian Analysis 2002).

. 2003b. What Conditional Probability Could Not Be. Synthese, 137 (3), 273–323. http://www.jstor.org/stable/20118365

Halpern, Joseph Y. 2003. Reasoning about Uncertainty. Reading, MA: MIT Press.

Hammersley, John M. & Clifford, Peter E. 1971. Markov Fields on Finite Graphs and Lattices. (unpublished). http://www.statslab.cam.ac.uk/~grg/books/hammfest/hamm-cliff.pdf

Hansson, Sven Ove (ed). 1997. Special Issue on Non-Prioritized Belief Revision. Theoria, vol. 63, nos. 1–2.

. 1999. A Textbook of Belief Dynamics: Theory Change and Database Updating. Dordrecht: Kluwer.

Harper, William Leonard. 1975. Rational Belief Change, Popper Functions and Counterfactuals. Synthese, Methodologies: Bayesian and Popperian, 30 (1–2), 221–262. Also published as (Harper, 1976a). http://www.jstor.org/stable/20115029

. 1976a. Rational Belief Change, Popper Functions and Counterfactuals. Pages 73–115 of: Harper, William Leonard & Hooker, Clifford Alan (eds), Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, vol. 1. Dordrecht: Reidel. First published as (Harper, 1975).

. 1976b. Rational Conceptual Change. Pages 462–494 of: Suppe, F. & Asquith, P. D. (eds), Proceedings of the Biennial Meeting of the Philosophy of Science Association (PSA 1976), vol. 2 (Symposia and Invited Papers). The University of Chicago Press, for Philosophy of Science Association. http://www.jstor.org/stable/192397

Hausman, Daniel M. & Woodward, James. 1999. Independence, Invariance, and the Causal Markov Condition. The British Journal for the Philosophy of Science, 50 (4), 521–583.


Heggernes, Pinar. 2006. Minimal Triangulations of Graphs: A Survey. Discrete Mathematics, 306 (3), 297–317. doi:10.1016/j.disc.2005.12.003 http://www.ii.uib.no/~pinar/MinTriSurvey.pdf

Heggernes, Pinar, Telle, Jan Arne, & Villanger, Yngve. 2005. Computing Minimal Triangulations in Time O(n^α log n) = o(n^2.376). SIAM Journal on Discrete Mathematics, 19 (4), 900–913. http://www.ii.uib.no/~pinar/n-alpha-log-n.pdf

Hempel, Carl Gustav. 1945a. Studies in the Logic of Confirmation 1. Mind, 54 (213), 1–26. http://www.jstor.org/stable/2250886

. 1945b. Studies in the Logic of Confirmation 2. Mind, 54 (214), 97–120. http://www.jstor.org/stable/2250948

. 1952. Fundamentals of Concept Formation in Empirical Science. In: International Encyclopedia of Unified Science, vol. 2. Chicago, IL: The University of Chicago Press.

. 1961–62. Rational Action. Proceedings and Addresses of the American Philosophical Association, 35, 5–23. http://www.jstor.org/stable/3129344

. 1962. Deductive-Nomological vs. Statistical Explanation. Pages 98–169 of: Feigl, Herbert & Maxwell, Grover (eds), Scientific Explanation, Space, and Time. Minnesota Studies in the Philosophy of Science, vol. 3. Minneapolis, MN: University of Minnesota Press.

. 1965. Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. New York, NY: Free Press.

Hild, Matthias. 1998a. Auto-Epistemology and Updating. Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition, 92 (3), 321–361. http://www.jstor.org/stable/4320896

. 1998b. The Coherence Argument against Conditionalization. Synthese, 115 (2), 229– 258. http://www.jstor.org/stable/20118052

Hild, Matthias & Spohn, Wolfgang. 2008. The Measurement of Ranks and the Laws of Iterated Contraction. Artificial Intelligence, 172 (10), 1195–1218. doi:10.1016/j.artint.2008.03.002

Hintikka, Jaakko. 1962. Knowledge and Belief. Ithaca, NY: Cornell University Press.

Howard, Ronald A. & Matheson, James E. 1981. Influence Diagrams. Pages 719–762 of: Howard, Ronald A. & Matheson, James E. (eds), Readings on the Principles and Applications of Decision Analysis, vol. 2 (1984). Menlo Park, CA: Strategic Decisions Group. Also released as (Howard & Matheson, 2005).

. 2005. Influence Diagrams. Decision Analysis. Special Issue on Graph-Based Representations, Part 1 of 2: Influence Diagrams, 2 (3), 127–143. First released as (Howard & Matheson, 1981). doi:10.1287/deca.1050.0020 http://www.csun.edu/~hcmgt004/influencediagrams.pdf

Howson, Colin & Urbach, Peter. 1989/2005. Scientific Reasoning: The Bayesian Approach. 3rd edn. La Salle, IL: Open Court.


Huber, Franz. 2006. Ranking Functions and Rankings on Languages. Artificial Intelligence, 170 (4–5), 462–471. doi:10.1016/j.artint.2005.10.016 http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-83278

. 2007. The Consistency Argument for Ranking Functions. Studia Logica, 86 (2), 299–329. doi:10.1007/s11225-007-9062-9 http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-83247

. 2009. Ranking Functions. Pages 1351–1355 of: Dopico, Juan Ramón Rabuñal, de la Calle, Julián Dorado, & Sierra, Alejandro Pazos (eds), Encyclopedia of Artificial Intelligence. Hershey, PA: Information Science Reference. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-83360

Hume, David. 1748. An Inquiry Concerning Human Understanding.

Hunter, Daniel. 1988. Parallel Belief Revision. In: (Shachter et al., 1988), 241–251. http://uai.sis.pitt.edu/papers/88/p170-hunter.pdf

. 1991a. Graphoids and Natural Conditional Functions. International Journal of Approximate Reasoning, 5 (6), 489–504. doi:10.1016/0888-613X(91)90026-I

. 1991b. Maximum Entropy Updating and Conditionalization. Pages 45–57 of: Spohn, Wolfgang, van Fraassen, Bas C., & Skyrms, Brian (eds), Existence and Explanation: Essays in Honor of Karel Lambert. Dordrecht: Kluwer.

Isham, Valerie. 1981. An Introduction to Spatial Point Processes and Markov Random Fields. International Statistical Review, 49 (1), 21–43. http://www.jstor.org/stable/1403035

Jaffray, Jean-Yves. 1989. Linear Utility Theory for Belief Functions. Operations Research Letters, 8 (2), 107–112. doi:10.1016/0167-6377(89)90010-2

Jeffrey, Richard C. 1965/83. The Logic of Decision. 2nd edn. Chicago, IL: The University of Chicago Press.

. 1971. Probability Measures and Integrals. Pages 167–223 of: Carnap, Rudolf & Jeffrey, Richard C. (eds), Studies in Inductive Logic and Probability, vol. I. Berkeley, CA: University of California Press.

Jensen, Finn Verner & Jensen, Frank. 1994. Optimal Junction Trees. Pages 360–366 of: de Mántaras, Ramon López & Poole, David (eds), Proceedings of the 10th Annual Conference on Uncertainty in Artificial Intelligence (UAI-94). San Francisco, CA: Morgan Kaufmann. http://uai.sis.pitt.edu/papers/94/p360-jensen.pdf

Jensen, Finn Verner, Lauritzen, Steffen L., & Olesen, Kristian G. 1990. Bayesian updating in causal probabilistic networks by local computations. Computational Statistics Quarterly, 4, 269–282.

Jensen, Finn Verner & Nielsen, Thomas D. 2001/2007. Bayesian Networks and Decision Graphs. 2nd edn. Berlin: Springer.

Joyce, James M. 1998. A Nonpragmatic Vindication of Probabilism. Philosophy of Science, 65 (4), 575–603. doi:10.1086/392661 http://www.jstor.org/stable/188574


Joyce, James M. 1999. The Foundations of Causal Decision Theory. Cambridge: Cambridge University Press.

Kiiveri, Harri, Speed, Terry P., & Carlin, John B. 1984. Recursive Causal Models. Journal of the Australian Mathematical Society, A, 36 (1), 30–52. doi:10.1017/S1446788700027312 http://journals.cambridge.org/production/action/cjoGetFulltext?fulltextid=4980932

Kim, Jin H. & Pearl, Judea. 1983. A Computational Model for Causal and Diagnostic Reasoning in Inference Systems. Pages 190–193 of: Bundy, Alan (ed), Proceedings of the 8th International Joint Conference on Artificial Intelligence, vol. 1. San Mateo, CA: Morgan Kaufmann. http://ijcai.org/PastProceedings/IJCAI-83-VOL-1/PDF/041.pdf

Kohlas, Jürg. 2003. Information Algebras: Generic Structures For Inference. London: Springer.

Kozlov, Alexander V. & Singh, Jaswinder Pal. 1994. A Parallel Lauritzen-Spiegelhalter Algorithm for Probabilistic Inference. Pages 320–329 of: Proceedings of the 1994 Conference on Supercomputing (Supercomputing ’94). Los Alamitos, CA: IEEE Computer Society Press. doi:10.1145/602770.602830

Kratsch, Dieter & Spinrad, Jeremy. 2003. Between O(nm) and O(n^α). Pages 709–716 of: Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’03). Philadelphia, PA: Society for Industrial and Applied Mathematics. Also published as (Kratsch & Spinrad, 2006a). http://dl.acm.org/citation.cfm?id=644108.644225

. 2006a. Between O(nm) and O(n^α). SIAM Journal on Computing, 36 (2), 310–325. First published as (Kratsch & Spinrad, 2003).

. 2006b. Minimal fill in O(n^2.69) time. Discrete Mathematics, 306 (3), 366–371.

Kripke, Saul A. 1972. Naming and Necessity. Pages 253–355 of: Davidson, Donald & Harman, Gilbert (eds), Semantics of Natural Language. Dordrecht: Kluwer.

. 1979. A puzzle about belief. Pages 239–283 of: Margalit, Avishai (ed), Meaning and Use. Dordrecht: Kluwer.

Krüger, Lorenz, Daston, Lorraine J., & Heidelberger, Michael (eds). 1987a. Ideas in History. The Probabilistic Revolution, vol. 1. Cambridge, MA: MIT Press.

Krüger, Lorenz, Gigerenzer, Gerd, & Morgan, Mary S. (eds). 1987b. Ideas in the Sciences. The Probabilistic Revolution, vol. 2. Cambridge, MA: MIT Press.

Kudo, Yasuo, Murai, Tetsuya, & Da-te, Tsutomu. 1998. The Correspondence of Belief Change in Logical Settings and the Possibilistic Framework. Pages 221–229 of: Proceedings of the Second International Conference on Knowledge-Based Intelligent Electronic Systems, vol. 2. doi:10.1109/KES.1998.725915

. 1999. Iterated Belief Update Based on Ordinal Conditional Functions. Pages 526–529 of: Proceedings of the Third International Conference on Knowledge-Based Intelligent Information Engineering Systems. doi:10.1109/KES.1999.820239

Kyburg, Jr., Henry Ely. 1961. Probability and the Logic of Rational Belief. Middletown, CT: Wesleyan University Press.


Kyburg, Jr., Henry Ely. 1963. A Further Note on Rationality and Consistency. Journal of Philosophy, 60 (16), 463–465. http://www.jstor.org/stable/2022875

Lauritzen, Steffen L. 1996. Graphical Models. Oxford: Clarendon Press.

. 2002. Lectures on Contingency Tables. Electronic edition of a 1982 manuscript. http://www.stat.osu.edu/~tjs/865/SLauritzen-LectureNotes-2002.pdf

Lauritzen, Steffen L., Dawid, A. Philip, Larsen, B. N., & Leimer, H.-G. 1990. Independence Properties of Directed Markov Fields. Networks, 20 (5), 491–505. doi:10.1002/net.3230200503

Lauritzen, Steffen L. & Spiegelhalter, David J. 1988. Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society, B (Methodological), 50 (2), 157–224. (with discussions). http://www.jstor.org/stable/2345762

Lauritzen, Steffen L. & Wermuth, Nanny. 1989. Graphical Models for Associations Between Variables, Some of Which are Qualitative and Some Quantitative. Annals of Statistics, 17 (1), 31–57. doi:10.1214/aos/1176347003 http://projecteuclid.org/euclid.aos/1176347003

Lehman, R. Sherman. 1955. On Confirmation and Rational Betting. Journal of Symbolic Logic, 20 (3), 251–262. http://www.jstor.org/stable/2268221

Lehrer, Keith. 2000. Theory of Knowledge. 2nd edn. Boulder, CO: Westview Press.

Levi, Isaac. 1967a. Gambling With Truth: An Essay on Induction and the Aims of Science. New York, NY: Knopf.

. 1967b. Probability Kinematics. The British Journal for the Philosophy of Science, 18 (3), 197–209. doi:10.1093/bjps/18.3.197

. 1977. Subjunctives, Dispositions, and Chances. Synthese, 34 (4), 423–455. http://www.jstor.org/stable/20115172

. 1991. The Fixation of Belief and Its Undoing: Changing Beliefs Through Inquiry. Cambridge: Cambridge University Press.

. 1996. For the Sake of Argument: Ramsey Test Conditionals, Inductive Inference, and Nonmonotonic Reasoning. Cambridge: Cambridge University Press.

. 2004. Mild Contraction: Evaluating Loss of Information Due to Loss of Belief. Oxford: Oxford University Press.

Lewis, David K. 1973. Causation. Journal of Philosophy, 70 (17), 556–567. http://www.jstor.org/stable/2025310

. 1976. Probabilities of Conditionals and Conditional Probabilities. The Philosophical Review, 85 (3), 297–315. http://www.jstor.org/stable/2184045

. 1980. A Subjectivist’s Guide to Objective Chance. Pages 263–293 of: Jeffrey, Richard C. (ed), Studies in Inductive Logic and Probability, vol. 2. Berkeley, CA: University of California Press.


Lipton, Peter. 1991. Inference to the Best Explanation. London: Routledge.

Loar, Brian. 1988. Social Content and Psychological Content. Pages 99–110 of: Grimm, Robert H. & Merrill, Daniel D. (eds), Contents of Thought. Tucson, AZ: University of Arizona Press.

Luce, Robert Duncan & Raiffa, Howard. 1957. Games and Decisions. New York, NY: Wiley.

Madsen, Anders L. & Jensen, Finn Verner. 1998. Lazy Propagation in Junction Trees. Pages 362–369 of: Cooper, Gregory & Moral, Serafin (eds), Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98). San Francisco, CA: Morgan Kaufmann. http://uai.sis.pitt.edu/papers/98/p362-madsen.pdf

. 1999. Lazy Propagation: A Junction Tree Inference Algorithm Based on Lazy Evalua- tion. Artificial Intelligence, 113 (1–2), 203–245. doi:10.1016/S0004-3702(99)00062-4

Maher, Patrick. 1993. Betting on Theories. Cambridge: Cambridge University Press.

. 2002. Joyce’s Argument for Probabilism. Philosophy of Science, 69, 73–81. doi:10.1086/338941

Margolis, Eric & Laurence, Stephen (eds). 1999. Concepts: Core Readings. Cambridge, MA: MIT Press.

Martínez, Concha, Rivas, Uxía, & Villegas Forero, Luis (eds). 1998. Truth in Perspective: Recent Issues in Logic, Representation and Epistemology. Aldershot, GB-HAM: Ashgate.

Matúš, František. 1992. On Equivalence of Markov Properties over Undirected Graphs. Journal of Applied Probability, 29 (3), 745–749. http://staff.utia.cas.cz/matus/fmmark.ps

. 1994. Stochastic Independence, Algebraic Independence and Abstract Connectedness. Theoretical Computer Science, 134 (2), 445–471. doi:10.1016/0304-3975(94)90248-8 http://staff.utia.cas.cz/matus/fmindep.ps

. 1999. Conditional Independences Among Four Random Variables 3: Final Conclusion. Combinatorics, Probability and Computing, 8 (3), 269–276. http://staff.utia.cas.cz/matus/fmtri.ps

McGee, Vann. 1994. Learning the Impossible. Pages 179–199 of: Eells, Ellery & Skyrms, Brian (eds), Probability and Conditionals: Belief Revision and Rational Decision. Cambridge: Cambridge University Press.

Meek, Christopher. 1995. Strong Completeness and Faithfulness in Bayesian Networks. In: (Besnard & Hanks, 1995), 411–418. http://uai.sis.pitt.edu/papers/95/p411-meek.pdf

Miller, David. 1966. A Paradox of Information. The British Journal for the Philosophy of Science, 17 (1), 59–61. http://www.jstor.org/stable/686404

Mitchell, Tom M. 1997/2003. Machine Learning. New York, NY: McGraw-Hill.

Moon, J. W. & Moser, L. 1965. On Cliques in Graphs. Israel Journal of Mathematics, 3, 23–28. doi:10.1007/BF02760024


Mosteller, Frederick. 1965. Fifty Challenging Problems in Probability with Solutions. Reading, MA: Addison-Wesley.

Nayak, Abhaya C. 1994. Iterated Belief Change Based on Epistemic Entrenchment. Erkenntnis, 41 (3), 353–390. http://www.jstor.org/stable/20012590

Neapolitan, Richard E. 1990. Probabilistic Reasoning in Expert Systems: Theory and Algorithms. New York, NY: Wiley.

von Neumann, John & Morgenstern, Oskar. 1947. Theory of Games and Economic Behavior. 2nd edn. Princeton, NJ: Princeton University Press.

Niiniluoto, Ilkka. 1972. Inductive Systematization: Definition and a Critical Survey. Synthese, 25 (1–2), 25–81. http://www.jstor.org/stable/20114852

Olsson, Erik J. 2005. Against Coherence: Truth, Probability, and Justification. Oxford: Clarendon Press.

Parter, S. 1961. The Use of Linear Graphs in Gauss Elimination. SIAM Review, 3 (2), 119–130. http://www.jstor.org/stable/2027387

Pearl, Judea. 1978. On the connection between the complexity and credibility of inferred models. International Journal of General Systems, 4 (4), 255–265. doi:10.1080/03081077808960690

. 1982. Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach. Pages 133–136 of: Waltz, Daniel L. (ed), Proceedings of the 2nd National Conference on Artificial Intelligence (AAAI-82). Pittsburgh, PA: AAAI Press.

. 1985. A constraint-propagation approach to probabilistic reasoning. Pages 357–370 of: Kanal, Laveen N. & Lemmer, John F. (eds), Proceedings of the 1st Annual Conference on Uncertainty in Artificial Intelligence (UAI-85). New York, NY: Elsevier Science. http://uai.sis.pitt.edu/papers/85/p31-pearl.pdf

. 1986a. Fusion, propagation and structuring in belief networks. Artificial Intelligence, 29 (3), 241–288. doi:10.1016/0004-3702(86)90072-X

. 1986b. On evidential reasoning in a hierarchy of hypotheses. Artificial Intelligence, 28 (1), 9–15. doi:10.1016/0004-3702(86)90027-5

. 1987a. Deciding consistency in inheritance networks. Tech. rept. 870053 (R-96). Cognitive Systems Laboratory, University of California, Los Angeles, CA. ftp://ftp.cs.ucla.edu/tech-report/198_-reports/870053.pdf

. 1987b. Distributed revision on composite beliefs. Artificial Intelligence, 33 (2), 173–215. doi:10.1016/0004-3702(87)90034-8

. 1987c. Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence, 32 (2), 245–257. doi:10.1016/0004-3702(87)90012-9

. 1987d. Probabilistic Semantics for inheritance hierarchies with exceptions. Tech. rept. 870052 (R-93). Cognitive Systems Laboratory, University of California, Los Angeles, CA. ftp://ftp.cs.ucla.edu/tech-report/198_-reports/870052.pdf


Pearl, Judea. 1988a. On logic and probability. Computational Intelligence, 4 (1), 99–103. doi:10.1111/j.1467-8640.1988.tb00107.x

. 1988b. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.

. 2000. Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press.

Pearl, Judea & Paz, Azaria. 1985. Graphoids: A Graph-Based Logic for Reasoning About Relevant Relations. Tech. rept. 850038 (R-53-L). Cognitive Systems Laboratory, University of California, Los Angeles, CA. ftp://ftp.cs.ucla.edu/tech-report/198_-reports/850038.pdf

. 1986. Graphoids: A Graph-Based Logic for Reasoning About Relevancy Relations. Pages 357–363 of: du Boulay, Ben, Hogg, David, & Steels, Luc (eds), Advances in Artificial Intelligence II. Amsterdam: North-Holland. Short version of (Pearl & Paz, 1985).

Peirce, Charles S. 1877/1982. The Fixation of Belief. In: Fisch, Max (ed), Writings of Charles S. Peirce, vol. 1. Bloomington, IN: Indiana University Press.

Peppas, Pavlos & Williams, Mary-Ann. 1995. Constructive Modelings for Theory Change. Notre Dame Journal of Formal Logic, 36 (1), 120–133. http://projecteuclid.org/euclid.ndjfl/1040308831

Plantinga, Alvin. 1993a. Warrant and Proper Function. Oxford: Oxford University Press.

. 1993b. Warrant: The Current Debate. Oxford: Oxford University Press.

Pollock, John L. 1990. Nomic Probability and the Foundations of Induction. Oxford: Oxford University Press.

. 1995. Cognitive Carpentry: A Blueprint for How to Build a Person. Cambridge, MA: MIT Press.

Popper, Karl R. 1938. A Set of Independent Axioms for Probability. Mind, 47 (186), 275–277. doi:10.1093/mind/XLVII.186.275

. 1955. Two Autonomous Axiom Systems for the Calculus of Probabilities. The British Journal for the Philosophy of Science, 6 (21), 51–57. doi:10.1093/bjps/VI.21.51

Pourret, Olivier, Naim, Patrick, & Marcot, Bruce G. (eds). 2008. Bayesian Networks: A Prac- tical Guide to Applications. Chichester: Wiley.

Preston, Christopher J. 1973. Generalized Gibbs states and Markov random fields. Advances in Applied Probability, 5 (2), 242–261. doi:10.2307/1426035 http://www.jstor.org/stable/1426035

Prim, Robert C. 1957. Shortest Connection Networks and Some Generalizations. Bell System Technical Journal, 36, 1389–1401.

Putnam, Hilary. 1975. The Meaning of ‘Meaning’. Pages 131–193 of: Gunderson, Keith (ed), Language, Mind and Knowledge. Minnesota Studies in the Philosophy of Science, vol. 7. Minneapolis, MN: University of Minnesota Press.

Quine, Willard Van Orman. 1951. Two Dogmas of Empiricism. The Philosophical Review, 60 (1), 20–43. http://www.jstor.org/stable/2181906


Quine, Willard Van Orman. 1960. Word and Object. Cambridge, MA: MIT Press.

. 1986. Reply to Morton White. Pages 663–665 of: Hahn, Lewis Edwin & Schilpp, Paul Arthur (eds), The Philosophy of W. V. Quine. La Salle, IL: Open Court.

. 1995. From Stimulus to Science. Cambridge, MA: Harvard University Press.

Rabinowicz, Wlodek. 1995. Global Belief Revision Based on Similarities Between Worlds. Pages 80–105 of: Hansson, Sven Ove & Rabinowicz, Wlodek (eds), Logic for a Change: Essays Dedicated to Sten Lindström on the Occasion of His Fiftieth Birthday. Uppsala Prints and Preprints in Philosophy 1995:9. Department of Philosophy, Uppsala University.

Rebane, George & Pearl, Judea. 1989. The Recovery of Causal Polytrees From Statistical Data. Pages 222–228 of: Levitt, Tod S., Kanal, Laveen N., & Lemmer, John F. (eds), Proceedings of the 3rd Annual Conference on Uncertainty in Artificial Intelligence (UAI-87), vol. 3. Amsterdam: North-Holland. http://uai.sis.pitt.edu/papers/87/p222-rebane.pdf

Rényi, Alfréd. 1955. On a new axiomatic theory of probability. Acta Mathematica Academiae Scientiarum Hungaricae, 6 (3–4), 285–335. doi:10.1007/BF02024393

. 1962. Wahrscheinlichkeitsrechnung. Berlin: VEB Deutscher Verlag der Wissenschaften. (In German).

Rescher, Nicholas. 1964. Hypothetical Reasoning. Amsterdam: North-Holland.

Richardson, Thomas S. & Spirtes, Peter. 2002. Ancestral Graph Markov Models. Annals of Statistics, 30 (4), 962–1030. http://www.jstor.org/stable/1558693

. 2003. Causal Inference Via Ancestral Graph Models. Pages 83–106 of: Green, Peter J., Hjort, Nils Lid, & Richardson, Sylvia (eds), Highly Structured Stochastic Systems. Oxford: Oxford University Press.

Rose, Donald J. 1970. Triangulated Graphs and the Elimination Process. Journal of Mathematical Analysis and Applications, 32 (3), 597–609. doi:10.1016/0022-247X(70)90282-9

Rose, Donald J. & Tarjan, Robert E. 1976. Algorithmic Aspects of Vertex Elimination on Directed Graphs. SIAM Journal on Applied Mathematics, 34 (1), 176–197. doi:10.1137/0134014

Rose, Donald J., Tarjan, Robert E., & Lueker, G. S. 1976. Algorithmic Aspects of Vertex Elimination on Graphs. SIAM Journal on Computing, 5 (2), 266–283. doi:10.1137/0205021

Rott, Hans. 1989. Conditionals and Theory Change: Revisions, Expansions and Additions. Synthese, 81 (1), 91–113. http://www.jstor.org/stable/20116704

. 1999. Coherence and Conservatism in the Dynamics of Belief, Part 1: Finding the Right Framework. Erkenntnis, 50 (2–3), 387–412. http://www.jstor.org/stable/20012925

. 2001. Change, Choice and Inference: A Study of Belief Revision and Nonmonotonic Reasoning. Oxford: Oxford University Press.


Rott, Hans. 2003. Coherence and Conservatism in the Dynamics of Belief, Part 2: Iterated Belief Change Without Dispositional Coherence. Journal of Logic and Computation, 13 (1), 111–145. doi:10.1093/logcom/13.1.111

Salmon, Wesley C. 1975. Confirmation and Relevance. Pages 3–36 of: Maxwell, Grover & Anderson, Robert Milford (eds), Induction, probability, and confirmation: Minnesota Studies in the Philosophy of Science, vol. 6. Minneapolis, MN: University of Minnesota Press.

Samuelson, Paul A. 1938. A Note on the Pure Theory of Consumers’ Behaviour. Economica, 5 (17), 61–71. http://www.jstor.org/stable/2548836

. 1947. Foundations of Economic Analysis. Cambridge, MA: Harvard University Press.

Sarin, Rakesh K. & Wakker, Peter P. 1992. A Simple Axiomatization of Non-additive Expected Utility. Econometrica, 60 (6), 1255–1272. http://www.jstor.org/stable/2951521

Savage, Leonard J. 1972. The Foundations of Statistics. 2nd edn. New York, NY: Dover Publications. First edition published by Wiley, New York, NY, 1954.

Schmeidler, David. 1989. Subjective Probability and Expected Utility Without Additivity. Econometrica, 57, 571–587. http://www.jstor.org/stable/1911053

Seidenfeld, Teddy. 2001. Remarks on the Theory of Conditional Probability: Some Issues of Finite Versus Countable Additivity. Pages 167–178 of: Hendricks, Vincent F., Pedersen, Stig Andur, & Jørgensen, Klaus Frovin (eds), Probability Theory – Philosophy, Recent History and Relations to Science. Dordrecht: Kluwer.

Sen, Amartya K. 1970. Collective Choice and Social Welfare. San Francisco, CA: Holden-Day.

Shachter, Ross D., Levitt, Tod S., Kanal, Laveen N., & Lemmer, John F. (eds). 1988. Proceedings of the 4th Annual Conference on Uncertainty in Artificial Intelligence (UAI-88). New York, NY: Elsevier Science.

Shachter, Ross D., Levitt, Tod S., Kanal, Laveen N., & Lemmer, John F. (eds). 1990. Uncertainty in Artificial Intelligence. Vol. 4. Amsterdam: North-Holland.

Shackle, George L. S. 1949. Expectation in Economics. Cambridge: Cambridge University Press.

. 1961/69. Decision, Order, and Time in Human Affairs. 2nd edn. Cambridge: Cambridge University Press.

Shafer, Glenn. 1976. A Mathematical Theory of Evidence. Princeton, NJ: Princeton University Press.

. 1978. Non-Additive Probabilities in the Work of Bernoulli and Lambert. Archive for History of Exact Sciences, 19, 309–370. Also in (Shafer, 2008), 117–182. doi:10.1007/978-3-540-44792-4_6

. 1985. Conditional Probability. International Statistical Review, 53 (3), 261–277. http://www.jstor.org/stable/1402890

. 2008. Classic Works of the Dempster-Shafer Theory of Belief Functions. Studies in Fuzziness and Soft Computing, vol. 219. Berlin, Heidelberg, and New York, NY: Springer.


Shafer, Glenn & Shenoy, Prakash P. 1990. Probability Propagation. Annals of Mathematics and Artificial Intelligence, 2 (1–4), 327–352. doi:10.1007/BF01531015

Shenoy, Prakash P. 1991a. On Spohn’s Rule for Revision of Beliefs. International Journal of Approximate Reasoning, 5 (2), 149–181. doi:10.1016/0888-613X(91)90035-K

. 1991b. On Spohn’s Theory of Epistemic Beliefs. Pages 1–13 of: Bouchon-Meunier, B., Yager, R. R., & Zadeh, L. A. (eds), Uncertainty in Knowledge Bases. Lecture Notes in Computer Science, vol. 521. Berlin: Springer. doi:10.1007/BFb0028091

Sherman, S. 1973. Markov random fields and Gibbs random fields. Israel Journal of Mathemat- ics, 14 (1), 92–103. doi:10.1007/BF02761538

Shimony, Solomon E. & Domshlak, Carmel. 2003. Complexity of Probabilistic Reasoning in Directed-Path Singly-Connected Bayes Networks. Artificial Intelligence, 151 (1–2), 213–225. doi:10.1016/S0004-3702(03)00110-3

Shore, John E. & Johnson, Rodney E. 1980. Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy. IEEE Transactions on Information Theory, 26 (1), 26–37. doi:10.1109/TIT.1980.1056144

Skyrms, Brian. 1980. Causal Necessity: A Pragmatic Investigation of the Necessity of Laws. New Haven, CT: Yale University Press.

. 1990. The Dynamics of Rational Deliberation. Cambridge, MA: MIT Press.

. 1991. Carnapian Inductive Logic for Markov Chains. Erkenntnis, 35 (1–3), 439–460. http://www.jstor.org/stable/20012378

Smyth, Padhraic, Heckerman, David, & Jordan, Michael I. 1997. Probabilistic independence networks for hidden Markov probability models. Neural Computation, 9 (2), 227–269. First released as Tech. Report MSR-TR-96-03, Microsoft Research (1996). doi:10.1162/neco.1997.9.2.227

Speed, Terry P. 1979. A Note on Nearest-Neighbor Gibbs and Markov Distributions Over Graphs. Sankhyā: The Indian Journal of Statistics, Series A (1961–2002), 41 (3–4), 184–197. http://www.jstor.org/stable/25050194

Spiegelhalter, David J. 1986. Probabilistic reasoning in predictive expert systems. Pages 47–67 of: Kanal, Laveen N. & Lemmer, John F. (eds), Uncertainty in Artificial Intelligence, vol. 1. Amsterdam: North-Holland.

Spirtes, Peter & Glymour, Clark. 1991. An Algorithm for Fast Recovery of Sparse Causal Graphs. Social Science Computer Review, 9 (1), 62–72. doi:10.1177/089443939100900106

Spitzer, Frank. 1971. Markov Random Fields and Gibbs Ensembles. The American Mathematical Monthly, 78 (2), 142–154. http://www.jstor.org/stable/2317621

Spohn, Wolfgang. 1978. Grundlagen der Entscheidungstheorie. Ph.D. thesis, University of Munich, Kronberg i. Ts. (In German). http://nbn-resolving.de/urn:nbn:de:bsz:352-135468


Spohn, Wolfgang. 1980. Stochastic Independence, Causal Independence, and Shieldability. Journal of Philosophical Logic, 9 (1), 73–99. doi:10.1007/BF00258078

. 1983. Eine Theorie der Kausalität. Habilitationsschrift, University of Munich. (In Ger- man). http://nbn-resolving.de/urn:nbn:de:bsz:352-135451

. 1986. The Representation of Popper Measures. Topoi, 5 (1), 69–74. doi:10.1007/BF00137831

. 1988a. A general non-probabilistic theory of inductive reasoning. In: (Shachter et al., 1988), 315–322. Also in (Shachter et al., 1990), 149–158. http://uai.sis.pitt.edu/papers/88/p315-spohn.pdf

. 1988b. Ordinal Conditional Functions: A Dynamic Theory of Epistemic States. Pages 105–134 of: Harper, William Leonard & Skyrms, Brian (eds), Causation in Decision, Belief Change, and Statistics, vol. 2. Dordrecht: Kluwer. http://nbn-resolving.de/urn:nbn:de:bsz:352-136611

. 1993. Wie kann die Theorie der Rationalität normativ und empirisch zugleich sein? Pages 151–196 of: Eckensberger, Lutz H. & Gähde, Ulrich (eds), Ethische Norm und empirische Hypothese. Frankfurt a.M.: Suhrkamp. (In German).

. 1994. On the Properties of Conditional Independence. Pages 173–196 of: Humphreys, Paul & Suppes, Patrick (eds), Probability and Probabilistic Causality. Scientific Philosopher, vol. 234, Part 1. Dordrecht: Kluwer. doi:10.1007/978-94-011-0774-7_7 http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-63407

. 1997. The Intentional versus the Propositional Conception of the Objects of Belief. Pages 291–321 of: Martinez, C., Rivas, U., & Villegas Forero, L. (eds), Proceedings of the 2nd Conference “Perspectives in Analytical Philosophy”, vol. 1: Logic, Epistemology, Philosophy of Science. Berlin: de Gruyter. Also in (Martínez et al., 1998), 271–291. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62732

. 1999. Ranking Functions, AGM Style. In: Hansson, B. (ed), Internet Festschrift for Peter Gärdenfors. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62630

. 2000a. Bayesian Nets Are All There Is To Causal Dependence. Pages 157–172 of: Costantini, Domenico, Galavotti, Maria Carla, & Suppes, Patrick (eds), Stochastic Causality. CSLI Lecture Notes, vol. 131. Stanford, CA: CSLI Publications. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62399

. 2000b. Wo stehen wir heute mit dem Problem der Induktion? Pages 151–164 of: Enskat, Rainer (ed), Erfahrung und Urteilskraft. Würzburg: Königshausen & Naumann. (In German). http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62693

. 2002a. A Brief Comparison of Pollock’s Defeasible Reasoning and Ranking Functions. Synthese, 131 (1), 39–56. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62376

. 2002b. Laws, Ceteris Paribus Conditions, and the Dynamics of Belief. Erkenntnis, 57 (3), 373–394. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62291


Spohn, Wolfgang. 2002c. Lehrer Meets Ranking Theory. Pages 119–132 of: Olsson, Erik J. (ed), The Epistemology of Keith Lehrer. Dordrecht: Kluwer. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62360

. 2002d. The Many Facets of the Theory of Rationality. Croatian Journal of Philosophy, 2 (6), 247–262. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62352

. 2004. Laws are Persistent Inductive Schemes. Pages 135–150 of: Stadler, F. (ed), Induction and Deduction in the Sciences. Dordrecht: Kluwer. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62264

. 2005. Enumerative Induction and Lawlikeness. Philosophy of Science, 72 (1), 164–187. doi:10.1086/428076 http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62216

. 2006a. Causation: An Alternative. British Journal for the Philosophy of Science, 57 (1), 93–119. doi:10.1093/bjps/axi151 http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62220

. 2006b. Isaac Levi’s Potentially Surprising Epistemological Picture. Pages 125–142 of: Olsson, Erik J. (ed), Knowledge and Inquiry: Essays on the Pragmatism of Isaac Levi. Cambridge: Cambridge University Press. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62193

. 2009. A Survey of Ranking Theory. Pages 185–228 of: Huber, Franz & Schmidt-Petri, Christoph (eds), Degrees of Belief. Synthese Library, vol. 342. Berlin: Springer. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-62167

. 2010. The Structural Model and the Ranking Theoretic Approach to Causation: A Comparison. Pages 493–508 of: Dechter, Rina, Geffner, Hector, & Halpern, Joseph Y. (eds), Heuristics, Probability and Causality: A Tribute to Judea Pearl. San Mateo, CA: Kaufmann.

. 2012. The Laws of Belief: Ranking Theory and Its Philosophical Applications. Oxford: Oxford University Press.

Spohn, Wolfgang & Hild, Matthias. 2008. The Measurement of Ranks and the Laws of Iterated Contraction. Artificial Intelligence, 172 (10), 1195–1218. http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-59624

Stalnaker, Robert C. 1968. A Theory of Conditionals. Pages 98–112 of: Rescher, Nicholas (ed), Studies in Logical Theory. Oxford: Blackwell.

. 1999. Context and Content. Oxford: Oxford University Press.

Steel, Daniel. 2005. Indeterminism and the Causal Markov Condition. The British Journal for the Philosophy of Science, 56 (1), 3–26. doi:10.1093/bjps/axi101

Stuart, Alan & Ord, J. Keith. 1991. Kendall’s Advanced Theory of Statistics, Vol. 2: Classical Inference and Relationship. 5th edn. London: Edward Arnold.

Studený, Milan. 1989. Multiinformation and the Problem of Characterization of Conditional Independence Relations. Problems of Control and Information Theory, 18 (1), 3–16.


Studený, Milan. 1992. Conditional Independence Relations Have No Finite Complete Characterization. Pages 377–396 of: Visek, Jan Ámos (ed), Transactions of the 11th Prague Conference on Information Theory, Statistical Decision Functions, and Random Processes, held at Prague from August 27 to 31, 1990, vol. B. Dordrecht: Kluwer. ftp://ftp.utia.cas.cz/pub/staff/studeny/condi.ps

. 1995. Conditional Independence and Natural Conditional Functions. International Journal of Approximate Reasoning, 12 (1), 43–68. doi:10.1016/0888-613X(94)00014-T ftp://ftp.utia.cas.cz/pub/staff/studeny/ncf1.ps

. 2005. Probabilistic Conditional Independence Structures. Berlin: Springer.

Studený, Milan & Bouckaert, Remco R. 1998. On Chain Graph Models for Description of Conditional Independence Structures. Annals of Statistics, 26 (4), 1434–1495. http://www.jstor.org/stable/120007

Suppes, Patrick. 1970. A Probabilistic Theory of Causality. Amsterdam: North-Holland.

Tan, Sek-Wah & Pearl, Judea. 1994. Qualitative Decision Theory. Pages 928–933 of: Hayes-Roth, B. & Korf, R. (eds), Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94), vol. 2. Menlo Park, CA: AAAI Press. http://www.aaai.org/Papers/AAAI/1994/AAAI94-142.pdf

Tarjan, Robert E. & Yannakakis, Mihalis. 1984. Simple Linear-Time Algorithms to Test Chordality of Graphs, Test Acyclicity of Hypergraphs, and Selectively Reduce Hypergraphs. SIAM Journal on Computing, 13 (3), 566–579. doi:10.1137/0213035

Teller, Paul. 1976. Conditionalization, Observation and Change of Preference. Pages 205–259 of: Harper, William Leonard & Hooker, Clifford Alan (eds), Foundations of Probability Theory, Statistical Inference and Statistical Theories of Science. Dordrecht: Reidel.

Verma, Thomas & Pearl, Judea. 1987. Causal Networks: Semantics and Expressiveness. Tech. rept. 870032 (R-65). Cognitive Systems Laboratory, University of California, Los Angeles, CA. ftp://ftp.cs.ucla.edu/tech-report/198_-reports/870032.pdf

. 1988. Causal Networks: Semantics and Expressiveness. In: (Shachter et al., 1988), 352–359. First published as (Verma & Pearl, 1987), also contained in (Shachter et al., 1990), 69–76. http://uai.sis.pitt.edu/papers/88/p352-verma.pdf

. 1990. Equivalence and Synthesis of Causal Models. In: (Bonissone et al., 1990), 255–268. http://uai.sis.pitt.edu/papers/90/p220-verma.pdf

. 1992. An Algorithm for Deciding if a Set of Observed Independencies Has a Causal Explanation. In: (Dubois et al., 1992), 323–330. http://uai.sis.pitt.edu/papers/92/p323-verma.pdf

Villanger, Yngve. 2006. Lex M versus MCS-M. Discrete Mathematics, 306 (3), 393–400. doi:10.1016/j.disc.2005.12.005 http://www.ii.uib.no/~yngvev/publications/journals/Villanger06.pdf

Villanueva, Enrique (ed). 1998. Concepts: Philosophical Issues. Atascadero, CA: Ridgeview.

de Waal, Peter R. 2008. Marginals of DAG-Isomorphic Independence Models. Tech. rept. UU-CS-2008-050. Department of Information and Computing Sciences, Utrecht University, Utrecht. Also published as (de Waal, 2009). http://www.cs.uu.nl/research/techreps/repo/CS-2008/2008-050.pdf

. 2009. Marginals of DAG-Isomorphic Independence Models. Pages 192–203 of: Sossai, C. & Chemello, G. (eds), Proceedings of the 10th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2009). Lecture Notes in Artificial Intelligence, vol. 5590. Berlin: Springer. First published as (de Waal, 2008).

de Waal, Peter R. & van der Gaag, Linda C. 2005a. Stable Independence and Complexity of Representation. Tech. rept. UU-CS-2005-056. Department of Information and Computing Sciences, Utrecht University, Utrecht. http://www.cs.uu.nl/research/techreps/repo/CS-2005/2005-056.pdf

. 2005b. Stable independence in perfect maps. Tech. rept. UU-CS-2005-057. Department of Information and Computing Sciences, Utrecht University, Utrecht. http://www.cs.uu.nl/research/techreps/repo/CS-2005/2005-057.pdf

Wagner, Carl G. 1992. Generalized Probability Kinematics. Erkenntnis, 36 (2), 245–257. http://www.jstor.org/stable/20012404

Wakker, Peter P. 2005. Decision-Foundations for Properties of Nonadditive Measures: General State Spaces or General Outcome Spaces. Games and Economic Behavior, 50 (1), 107–125. doi:10.1016/j.geb.2003.10.007

Walley, Peter. 1991. Statistical Reasoning with Imprecise Probabilities. London: Chapman & Hall.

Weirich, Paul. 1983. Conditional Probabilities and Probabilities Given Knowledge of a Condition. Philosophy of Science, 50 (1), 82–95. doi:10.1086/289091

Wen, Wilson X. 1990. Optimal Decomposition of Belief Networks. In: (Bonissone et al., 1990), 245–256. http://uai.sis.pitt.edu/papers/90/p245-wen.pdf

. 1991. From Relational Databases to Belief Networks. Pages 406–413 of: Proceedings of the 7th Annual Conference on Uncertainty in Artificial Intelligence (UAI-91). San Mateo, CA: Morgan Kaufmann. http://uai.sis.pitt.edu/papers/91/p406-wen.pdf

Wermuth, Nanny. 1980. Linear Recursive Equations, Covariance Selection, and Path Analysis. Journal of the American Statistical Association, 75 (372), 963–997. http://www.jstor.org/stable/2287189

Wermuth, Nanny & Lauritzen, Steffen L. 1983. Graphical and Recursive Models for Contingency Tables. Biometrika, 70 (3), 537–552. http://www.jstor.org/stable/2336490

Weydert, Emil. 1996. System J – Revision Entailment: Default Reasoning Through Ranking Measure Updates. Pages 121–135 of: Gabbay, Dov M. & Ohlbach, Hans Jürgen (eds), Practical Reasoning: International Conference on Formal and Applied Practical Reasoning 1996. Berlin: Springer.

. 2003. System JLZ – Rational Default Reasoning by Minimal Ranking Constructions. Journal of Applied Logic, 1 (3–4), 273–308. doi:10.1016/S1570-8683(03)00016-8


Weydert, Emil. 2012. Conditional Ranking Revision: Iterated Revision with Sets of Conditionals. Journal of Philosophical Logic, 41 (1), 237–271. doi:10.1007/s10992-011-9204-4

Williams, Mary-Ann. 1994. Transmutations of Knowledge Systems. Pages 619–629 of: Doyle, J., Sandewall, E., & Torasso, P. (eds), Proceedings of the 4th International Conference on Principles of Knowledge Representation and Reasoning. San Mateo, CA: Morgan Kaufmann. http://research.it.uts.edu.au/magic/Mary-Anne/publications/KR94Williams.pdf

. 1995. Iterated Theory Base Change: A Computational Model. Pages 1541–1547 of: Mellish, Christopher S. (ed), Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI ’95), vol. 2. San Mateo, CA: Morgan Kaufmann. http://ijcai.org/PastProceedings/IJCAI-95-VOL2/PDF/068.pdf

Williamson, Jon (ed). 2005. Bayesian Nets and Causality: Philosophical and Computational Foundations. Oxford: Oxford University Press.

Wong, S. K. Michael, Wu, Dan, & Lin, Tao. 2002. A Structural Characterization of DAG-isomorphic Independency Models. Pages 195–209 of: Cohen, Robin & Spencer, Bruce (eds), Proceedings of the 15th Conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence (AI 2002). Lecture Notes in Computer Science, vol. 2338. New York, NY: Springer. doi:10.1007/3-540-47922-8_17

Wrenn, Chase. 2006. Epistemology as Engineering? Theoria, 72 (1), 60–79. doi:10.1111/j.1755-2567.2006.tb00943.x

Wright, Sewall. 1921. Correlation and Causation. Journal of Agricultural Research, 20 (10), 557–585. http://naldc.nal.usda.gov/catalog/IND43966364

. 1934. The Method of Path Coefficients. Annals of Mathematical Statistics, 5 (3), 161–215. http://www.jstor.org/stable/2957502

Yannakakis, Mihalis. 1981. Computing the Minimum Fill-in is NP-complete. SIAM Journal on Algebraic and Discrete Methods, 2 (1), 77–79. doi:10.1137/0602010

Index

acceptance
  in β, 65
  in κ, 55
additive decomposition term, 124, 132
adjacency, 94
  monotone, 149
algebra, 30
  atomic algebra, 32
  atomless algebra, 32
  complete algebra, 31
  minimal algebra over V, 90
  σ-algebra, 31
algorithm
  HUGIN-algorithm, 138
  Hunter’s algorithm, 141
  JLO-algorithm, 138
  of Lauritzen & Spiegelhalter, 138, 145
ancestor, 95
ancestral ordering, 123
asymmetry
  of negative ranking functions and TRFs, 57
atom, 32
atomless algebra, 32

Bayes’ theorem
  for positive ranks, 67
  for ranks, 61
Bayesian network, 83, 138, 142
belief
  conceptual nature of belief, 34
  perceptual, 34
  representation of belief, 35
belief core, 41
belief set, 40
  complete, 41
  of a negative ranking function, 50
  rational, 40
Burge, 36

causal list, 121
chain rule, 59, 123
  for ranking networks, 123
child vertex, 95
chord, 97
chordal graph, 97
clique, 93
clique ordering, 153
clique tree, 138, 139, 145
  construction, 166
  of a moralized and triangulated network, 144
  optimal clique trees, 170
closure
  of a vertex, 94
  of a vertex set, 94
  of an algebra, 31
complementarity
  of NRFs and positive ranking functions, 65
complete belief set, 41
complete conditional negative ranking function, 62
complete subset, 93
completeness
  of finite algebras, 31
component
  of a graph, 95
compound variable, 88
concept, 35
conditional independence
  among propositions in κ, 74
  among propositions in τ, 77
  among variables in κ, 100
conditional negative rank
  of a possibility, 57
conditional negative ranking function, 61
conditional two-sided rank, 62
conjunction
  law of conjunction for negative ranks, 59
connectedness
  of graphs, 95
consistency
  informal, 38
  informal explanation, 38
  of a belief core, 41
consonance, 120
content
  of a belief, 29
contradiction, 31
core
  of a belief, 41
  of a negative ranking function, 50
  of a positive ranking function, 65
  of a two-sided ranking function, 56
cycle
  in a graph, 94

D-map, 104
d-separation, 122
DAG, 94
  definition, 94
DAG-isomorphism
  axiomatization, 131
Decompose, 170, 172, 173, 182, 183
Decompose-algorithm, 171
decomposition
  criteria for optimality, 147
decomposition algorithm, 171
deductive closure
  informal, 38
  of belief sets, 40
deficiency of a vertex, 150
dependency model, 100
descendant, 95
directed Markov properties
  equivalence, 125, 127
doxastic states, 27

elementary events, 32
elimination game, 150, 153
elimination graph, 151
elimination process, 150
EliminationGame, 150, 153, 155
EliminationGame-algorithm, 151
engineering, 18, 21
epistemic
  attitudes, 26
  input, 27
  states, 27
  units, 26
  update, 27
epistemology
  normative, 16, 17
equivalence
  of the directed Markov properties, 125, 127
  of the Markov properties, 107–109
event
  in probability theory, 29
event space, 29
exhaustive independency statement, 121

factorization according to cliques, 113, 114, 128
fill-in, 150, 151
  zero fill-in, 151
fill-in graph, 151
filter
  in mathematics, 41
Frege, 28, 35, 36

Gärdenfors, 26
global directed Markov property, 125
global Markov property, 104
graph, 91
  chordal, 97
  directed acyclic, 94
  moral, 97
  multiply connected graph, 96
  singly connected, 96
  triangulated, 97
  undirected, 92
graph isomorphism
  axiomatization, 118
graphoid, 101

Hammersley-Clifford Theorem
  for ranks, 114
HCL, 134
HCO, 133
Hierarchical Causal List, 134
Hintikka, 27, 38
  rationality postulates, 27
Huber, 51, 58, 59
HUGIN-algorithm, 138
Hunter, 139, 140, 142, 143
  equivalence theorem, 131
Hunter’s algorithm, 139–141
hyperedge, 99
hypergraph, 99, 133
hypertree, 99, 134
Hypertree Construction Ordering, 133

I-map, 104
  minimal I-map, 105
incidence, 94
inclusion
  of $ in κ, 50
independence
  conditional, see conditional independence
  marginal independence, 102
  unconditional independence among variables in κ, 102
independency statement
  exhaustive, 121
induced subgraph, 93
induction
  problem of, 23, 25, 26
initial doxastic state, 24
intentionality, 36
internalism, 35

JLO-algorithm, 138

Kripke, 35
Kruskal’s algorithm, 170

Lauritzen, 13, 84, 95, 116, 127, 144, 145, 156, 173
Lauritzen-Spiegelhalter-Algorithm, 138
law of disjunctive conditions, 62
law of material implication, 66
law of negation (for κ), 55
Loar, 36
local Markov property, 107
logical omniscience
  problem of, 34
lottery paradox, 54
LS-strategy, 138

Möbius inversion lemma, 115
marginal NRF, 90
Markov blanket, 106
Markov boundary, 106
  uniqueness, 106
Markov field, 104
Markov graph, 105
Markov properties
  equivalence, 107–109
Markov property
  global directed Markov property, 125
  global Markov property, 104
  local directed Markov property, 122
  local Markov property, 107
  pairwise directed Markov property, 125
  pairwise Markov property, 105
Max-WST, 168, 169
maximum cardinality search, 155, 156
maximum-weight spanning tree, 168, 169
MCS, see maximum cardinality search
MCS ordering, 156
MCS-M, 157, 158, 160–162, 164, 167, 170, 172
MCS-M-algorithm, 158
meaning
  internalism, 35
measurability, 87
MEO, 154
message
  λ-message, 143
  π-message, 143
message passing, 137, 142, 145
  for trees, 143, 144
minimal elimination ordering, see MEO
minimal I-map, 105
minimal triangulation, 154
minimitivity, 51
minimum fill-in, 153
minimum triangulation, 153
monotonely adjacent vertices, 149
moral graph, 97
moralization, 146
MTNS-criterion, 147, 148
multiply connected network, 144

naturalness
  of negative ranking functions, 51
Neapolitan, 13, 147, 156, 157, 165, 173
negative rank
  of a proposition, 49
negative ranking function, 49
  marginal, 90
neighborhood
  of a vertex, 94
  of a vertex set, 94
network
  Bayesian network, 83, 138, 142
  multiply connected, 144
  ranking network, 86, 122, 142
  Spohnian, 140, 141, 143
non-descendant, 95

object orientation
  paradigm of, 138
ordering
  of cliques, 153
  perfect, 152

pairwise directed Markov property, 125
pairwise Markov property, 105
parent clique function, 166
parent vertex, 95
path, 93
  closed, 94
  directed, 93
  undirected, 93
Paz, 84, 100, 101, 103, 105, 144
Pearl, 13, 83, 84, 86, 100–103, 105, 106, 110, 122, 124, 131, 132, 135, 142–144, 173, 186
PEO, 152
perfect elimination ordering, 152
perfect map, 105
  of an NRF, 131
perfect ordering, 152
perspective
  descriptive, 17
  normative, 17
phases
  of an epistemic update, 139
polygon
  in a graph, 94
polytree, 96
positive ranking function, 65
possibilities
  W, set of, 29
possibility, 29
potential function, 110, 130
  definition of ψ0, 178
  for posterior NRF, 181
potential representation, 110
  of κ, 130
  of Markov fields, 113
predecessors
  of a vertex, 121
Prim’s algorithm, 169, 170
proposition, 28, 29
pseudo-graphoid, 106
Putnam, 35

Quine, 18, 28, 34

rank
  of a possibility, 49
  of a proposition, 49
  positive, 65
ranking function
  complete conditional negative, 61
  conditional negative, 61
  negative, 49
  non-natural negative ranking function, 54
  on possibilities, 49
  pointwise, 49
  two-sided ranking function, 56
ranking network, 86, 122, 137, 142
  local ranking network, 130
rational belief set
  complete, 41
  definition, 40
rational state, 27
rationality
  rationality postulates, 37
RCI, 100
regularity
  of negative ranking functions, 51
relevance boundary, 144
representative
  of a clique, 162
residual set, 176
residuum, 176
running intersection property, 98, 153

sample space, 29
semi-graphoid, 101
separation
  undirected, 104
separator, 176
separator set, 176
set of cliques, 159
singly connected graph, 96
Spiegelhalter, 13, 144, 145, 156, 173
Spohn, 9, 11, 12, 15, 16, 26, 28, 29, 33–35, 37, 39, 50, 52, 56, 61–64, 66, 70, 77, 78, 82, 86, 100, 102, 173
Spohn-conditionalization
  generalized indirect, 71
  of a negative ranking function, 71
  simple indirect, 70
Spohnian network, 140, 141, 143
Stich, 36
subgraph, 93
  induced, 93
symmetry
  of a two-sided ranking function, 56

Tarjan, 155
tautology, 31
TNS, 147
total negative rank, 60
total positive rank, 67
transition between epistemic states
  informal, 42
  postulates, 43, 44
transition function, 45
transition to posterior belief
  consistent case, 43, 44
  on inconsistent evidence, 44
tree, 96
treewidth-problem, 154
triangulated graph, 97
  characterization, 152
triangulation, 97
truth condition, 28, 35
twin-earth, 35

unconditional independence
  among propositions, 77
  among variables in κ, 102
undirected graphical separation, 104
Update-algorithm, 188

variable, 86, 87
  compound variable, 88
  random variable, 86
Verma, 84, 86
vertex elimination process, 150

walk, 93
weighted clique intersection graph, 167
Wen, 147, 148, 153, 154, 157

Yannakakis, 153, 155

zero fill-in, 151