Lecture Notes The Algebraic Theory of Information

Prof. Dr. Jürg Kohlas

Institute for Informatics IIUF
University of Fribourg
Rue P.-A. Faucigny 2
CH – 1700 Fribourg (Switzerland)
E-mail: [email protected]
http://diuf.unifr.ch/tcs

Version: November 24, 2010

Copyright © 1998 by J. Kohlas
Available from (Bezugsquelle): B. Anrig, IIUF

Preface

In today's scientific language, the concept of information is confined to Shannon's mathematical theory of communication, where the information to be transmitted is a sequence of symbols from a finite alphabet. An impressive mathematical theory has been developed around this problem, based on Shannon's seminal work. Whereas it has often been remarked that this theory, despite its importance, cannot capture all aspects of the concept of "information", little progress has been made so far in enlarging Shannon's theory.

In this study, it is attempted to develop a new mathematical theory covering or deepening other aspects of information than Shannon's theory. The theory is centered around basic operations with information, like aggregation of information and extraction of information. Further, the theory takes into account that information is an answer, possibly a partial answer, to questions. This gives the theory its algebraic flavor, and justifies speaking of an algebraic theory of information.

The mathematical theory presented here is rooted in two important developments of recent years, although these developments were not originally seen as part of a theory of information. The first development is local computation, a topic starting with the quest for efficient computation methods in probabilistic networks. It has been shown around 1990 that these computation methods are closely related to a certain algebraic structure, for which many models or instances exist. This provides the abstract base for generic inference methods covering a wide spectrum of topics like databases, different systems of logic, probability networks and many other formalisms of uncertainty. This algebraic structure, completed by the important idempotency axiom, is the base of the mathematical theory developed here. Whereas the original structure has found some interest for computation purposes, it acquires its flavor as a theory of information only with the additional idempotency axiom. This leads to information algebras, the object of the present study.

Surprisingly, the second development converging to the present work is the Dempster-Shafer Theory of Evidence. It was proposed originally as an extension of probability theory for inference under uncertainty, an important topic in Artificial Intelligence and related fields. In everyday language one often speaks of uncertain information. In our framework this concept is

rigorously captured by random variables with values in information algebras. This captures the idea that a source emits information which may be interpreted in different ways, and which, depending on various assumptions, may result in different information elements. It turns out that this results in a natural generalization or extension of the Dempster-Shafer Theory of Evidence, thus linking this theory in a very basic way to our theory of information.

In this way the algebraic theory of information is thought of as a mathematical theory forming the rigorous common background of many theories and formalisms studied and developed in Computer Science. Whereas each individual formalism has its own particularities, the algebraic theory of information helps to understand the common structure of information processing.

Contents

1 Introduction 9

I Information Algebras 13

2 Modeling Questions 15 2.1 Refining and Coarsening: the Order Structure ...... 15 2.2 Concrete Question Structures ...... 16

3 Axiomatics of Information Algebra 21 3.1 Projection Axiomatics ...... 21 3.2 Multivariate Information Algebras ...... 26 3.3 File Systems ...... 31 3.4 Variable Elimination ...... 33 3.5 The Transport Operation ...... 41 3.6 Domain-Free Algebra ...... 46

4 Order of Information 61 4.1 Partial Order of Information ...... 61 4.2 Ideal Completion of Information Algebras ...... 65 4.3 Atomic Algebras and Relational Systems ...... 71 4.4 Finite Elements and Approximation ...... 77 4.5 Compact Ideal Completion ...... 83 4.6 Labeled Compact Information Algebra ...... 85 4.7 Continuous Algebras ...... 91

5 Mappings 99 5.1 Order-Preserving Mappings ...... 99 5.2 Continuous Mappings ...... 101 5.3 Continuous Mappings Between Compact Algebras ...... 106

6 Boolean Information Algebra 111 6.1 Domain-Free Boolean Information Algebra ...... 111 6.2 Labeled Boolean Information Algebra ...... 118


6.3 Representation by Algebras ...... 125

7 Independence of Information 129 7.1 Conditional Independence of Domains ...... 129 7.2 Factorization of Information ...... 137 7.3 Atomic Algebras and Normal Forms ...... 141

8 Measure of Information 143 8.1 Information Measure ...... 143 8.2 Dual Information in Boolean Information Algebras ...... 143

9 Topology of Information Algebras 145 9.1 Observable Properties and Topology ...... 145 9.2 Alexandrov Topology of Information Algebras ...... 146 9.3 Scott Topology of Compact Information Algebras ...... 146 9.4 Topology of Labeled Algebras ...... 146 9.5 Representation Theory of Information Algebras ...... 146

10 Algebras of Subalgebras 147 10.1 Subalgebras of Domain-free Algebras ...... 147 10.2 Subalgebras of Labeled Algebras ...... 147 10.3 Compact Algebra of Subalgebras ...... 147 10.4 Subalgebras of Compact Algebras ...... 147

II Local Computation 149

11 Local Computation on Markov Trees 151 11.1 Problem and Complexity Considerations ...... 151 11.2 Markov Trees ...... 153 11.3 Local Computation ...... 158 11.4 Updating and Retracting ...... 162

12 Multivariate Systems 165 12.1 Local Computation on Covering Join Trees ...... 165 12.2 Fusion Algorithm ...... 165 12.3 Updating ...... 165

13 Boolean Information Algebra 167

14 Finite Information and Approximation 169

15 Effective Algebras 171

16 173

17 175

III Information and Language 177

18 Introduction 179

19 Contexts 181 19.1 Sentences and Models ...... 181 19.2 of Contexts ...... 181 19.3 Information Algebra of Contexts ...... 181

20 Propositional Information 183

21 Expressing Information in Predicate Logic 185

22 Linear Systems 187

23 Generating Atoms 189

24 191

IV Uncertain Information 193

25 Functional Models and Hints 195 25.1 Inference with Functional Models ...... 195 25.2 Assumption-Based Inference ...... 197 25.3 Hints ...... 199 25.4 Combination and Focussing of Hints ...... 202 25.5 Algebra of Hints ...... 205

26 Simple Random Variables 209 26.1 Domain-Free Variables ...... 209 26.2 Labeled Random Variables ...... 211 26.3 Degrees of Support and Plausibility ...... 212 26.4 Basic Probability Assignments ...... 214

27 Random Information 217 27.1 Random Mappings ...... 217 27.2 Support and Possibility ...... 219 27.3 Random Variables ...... 224 27.4 Random Variables in σ-Algebras ...... 229

28 Allocation of Probability 235 28.1 Algebra of Allocation of Probabilities ...... 235 28.2 Random Variables and Allocations of Probability ...... 242 28.3 Labeled Allocations of Probability ...... 247

29 Support Functions 253 29.1 Characterization ...... 253 29.2 Generating Support Functions ...... 257 29.3 Extension of Support Functions ...... 262

30 Random Sets 277 30.1 Set-Valued Mappings: Finite Case ...... 277 30.2 Set-Valued Mappings: General Case ...... 281 30.3 Random Mappings into Set Algebras ...... 283

31 Systems of Independent Sources 287

32 Probabilistic Argumentation Systems 289 32.1 Propositional Argumentation ...... 289 32.2 Predicate Logic and Uncertainty ...... 289 32.3 Stochastic Linear Systems ...... 289

33 Random Mappings as Information 291 33.1 Information Order ...... 291 33.2 Information Measure ...... 291 33.3 Gaussian Information ...... 291

References 293

Chapter 1

Introduction

Information is a central concept of science, especially of Computer Science. The well-developed fields of statistical and algorithmic information theory focus on measuring the information content of messages, statistical data or computational objects. There are, however, many more facets to information. We mention only three:

1. Information should always be considered relative to a given specific question. And if necessary, information must be focused onto this question. The part of information relevant to the question must be extracted.

2. Information may arise from different sources. There is therefore the need for aggregation or combination of different pieces of information. This is to ensure that the whole information available is taken into account.

3. Information can be uncertain, because its source is unreliable or be- cause it may be distorted. Therefore it must be considered as defeasible and it must be weighed according to its reliability.

The first two aspects induce a natural algebraic structure of information. The two basic operations of this algebra are combination or aggregation of information, and focusing or extraction of information. The operation of focusing, especially, relates information to a structure of interdependent questions. These notes are devoted to a discussion of this algebraic structure and its variants. They will demonstrate that many well-known, but very different formalisms, such as relational algebra and probabilistic networks, as well as many others, fit into this abstract framework. The algebra allows for a generic study of the structure of information. Furthermore, it permits the construction of generic architectures for inference.

The third item above points to a very central property of real-life information, namely uncertainty. Clearly, probability is the appropriate tool to

describe and study uncertainty. Information, however, is not only numerical. Therefore, the usual concept of random variables is not sufficient to study uncertain information. It turns out that the algebra of information is the natural framework to model uncertain information. This will be another important subject of these notes.

In the first part of these notes, the basic algebraic structure of information will be discussed. Questions are formally represented by their possible answers. However, the answers may be given with varying granularity. This introduces a partial order between questions which reflects refining and coarsening of questions. If two different questions are considered simultaneously, then this means that the coarsest set of answers which allows answers to both questions to be obtained should be considered. In terms of the partial order of questions this means that the supremum of the questions should exist. Similarly, the infimum of two questions, representing the common part of the questions, should also exist. In other words, the system of questions considered in a particular context should form a lattice. The problem of modeling questions is discussed in Chapter 2. Often, in applications, questions are related to the unknown values of variables. In this case, a question is represented by the domain of the variable considered, or by the Cartesian product of the domains of a set of variables. The lattice of questions can in this case be identified with the lattice of subsets of the set of variables considered in the given context. More generally, it is often assumed that there is a universe spanning the realm of interest in a given situation. Specific questions are then represented by partitions of the universe. Again, we then select a lattice of partitions to represent the system of questions considered in the context at hand. The existence of a global universe is not even necessary: more generally, a lattice of frames, linked by refining and coarsening relations, can be considered.

As mentioned above, information has a corpuscular nature: it comes in pieces. In one possible view, one may assume that any piece of information has a domain associated with it, which indicates the question to which the information refers. Then, of course, it must be possible to extract from it the part of the information referring to a coarser question. This is the operation of projection. Further, two or more pieces of information must be aggregated into a new, combined piece of information. Its domain is then the supremum of the domains of the parts. These operations lead to a two-sorted algebra, which satisfies some natural axioms, and which we call an information algebra. It will be shown in Chapter 3 that there are several variants of this algebraic structure. In particular, there is a related, alternative, but essentially equivalent point of view, where pieces of information are considered independently of any particular question they may relate to. It is convenient to have these two alternative views of labeled and domain-free information algebras. Depending on the problem or the application, one or the other variant may be more convenient to work with. Several models of information algebras will be presented in this chapter, ranging from relational algebra to classical logic, by way of affine spaces related to systems of linear equations, and convex sets, especially convex polytopes.
The idempotency axiom of an information algebra permits the introduction of a partial order reflecting the information content of pieces of information. With respect to combination, the algebra then becomes a semilattice. This permits several interesting extensions of the theory, which are examined in Chapter 4. One of these extensions is to introduce the concepts of finiteness and approximation: finite information elements are singled out, and conditions are formulated which allow all other elements to be approximated by finite elements. This then leads to compact information algebras, a theory which is similar and closely related to that of algebraic lattices and domain theory. In our context, we need to differentiate between compactness in the labeled and the domain-free case. The former leads to the more general case of continuous information algebras, which are similar to continuous lattices. One way to compactify an information algebra is by ideal completion, which can be considered both in the labeled and in the domain-free view. Finally, the case of information algebras with atoms is discussed: it permits us to show that information algebras are closely related to abstract versions of relational algebras. This establishes a link between information algebras and Boolean structures and algebras of subsets, a topic which will be pursued in Chapter 6.

Chapter 5 is devoted to mappings between information algebras. We study mappings which preserve the information order. Not surprisingly, it turns out that such mappings themselves form information algebras, and under some conditions of continuity even compact ones.

Many important examples of information algebras have a Boolean structure. The most prominent example is relational algebra. But the algebras associated with propositional and predicate logic also possess this Boolean structure. In Chapter 6 the impact of this particular structure is examined. Again, the case of labeled and domain-free algebras must be distinguished. In the latter case the information algebra is also a Boolean algebra, which is not true in the former case. The duality theory of Boolean algebras indicates that information also has a double interpretation: a disjunctive and a conjunctive one. This important, but mostly neglected, topic is further discussed in Chapter 8. Finally, it is well known that Boolean algebras are embedded in set algebras. Similar results can also be obtained for Boolean information algebras.

An important issue with any kind of information is the question of independence of information. This has so far been studied in particular in the realm of probability theory and of relational databases, in the latter especially with respect to normal forms. In Chapter 7 we take this subject up and discuss some aspects of it. One issue is the conditional independence of domains, which is especially simple in the case of domains related to sets of variables.

But here we treat it in the more general setting of an arbitrary lattice of domains. The topic is important for structures like Markov trees or join trees, used in computation and inference (see Part II). It is also important to single out special classes of information which factorize in some way, like normal forms in relational database theory. In the last Chapter 8 ... Part II is devoted to local computation... In Part III uncertain information is considered... Finally, Part IV considers different ways information may be represented....

Part I

Information Algebras

13

Chapter 2

Modeling Questions

Information relates to questions. There may be several, often many, questions of interest. But these questions will be related somehow. Questions also have a certain granularity. Therefore, questions can be refined, and a question may be the refinement of several questions. Questions can further be coarsened, and a certain question can be the coarsening of several other questions. We first need to define a mathematical structure capturing the essentials of a family of related questions. This will be done abstractly in the next section. In the following section, several important examples of concrete forms of systems of questions will be given.

2.1 Refining and Coarsening: the Order Structure

The varying granularity of questions can be captured by a partial order between questions. So let, in an abstract setting, D be a set whose elements represent questions. The elements of D are also called domains, for a reason which will become clear later. Its elements are typically denoted by x, y, u, v, etc. We assume that a partial order ≤ is defined on D, where x ≤ y means that x is coarser than y and y is finer than x. So ≤ satisfies the following three conditions:

1. Reflexivity: x ≤ x.

2. Antisymmetry: x ≤ y and y ≤ x implies x = y.

3. Transitivity: x ≤ y and y ≤ z implies x ≤ z.

But we need more: given two questions x and y, we want to form the combined question, i.e. the coarsest question refining both x and y. Therefore we suppose that for every pair x, y ∈ D the supremum or join x ∨ y exists in D. Similarly, for a pair of questions x and y we also want to extract their common part, i.e. the largest common coarsening of both. So we assume that for every pair x, y ∈ D the infimum or meet x ∧ y exists in D. In other words, we assume that D is a lattice. A lattice D is called modular if

∀x, y, z ∈ D, x ≥ z implies x ∧ (y ∨ z) = (x ∧ y) ∨ z.

Sometimes the lattice D is even distributive, i.e.

x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z), x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z).

We do not assume these additional properties for the lattice of domains in general. But in some considerations we do assume them in order to get stronger results. Sometimes we may also require that D contains a largest element, the top element ⊤, representing the finest question in the family of questions considered. Sometimes we require further that D contains a least or bottom element ⊥, representing the empty question, i.e. the question with only one possible answer. But in the general case the existence of these particular elements in D is not assumed. If D is a lattice and x ∈ D, then we may consider D≤x = {y ∈ D : y ≤ x}. This is clearly still a lattice, and one with a top element, namely x. Similarly D≥x = {y ∈ D : y ≥ x} is a lattice, as is {y ∈ D : x ≤ y ≤ z} for x ≤ z. The former has a bottom element x, and the latter has both a bottom element x and a top element z.

2.2 Concrete Question Structures

Here we sketch as examples a few particular question structures, which are important in applications.

Example 2.1 Multivariate Structures: Consider a countable family of variables {X1, X2, . . .}. Associate with each variable Xi its domain Di, which represents the set of possible values for the variable Xi. Define for any subset J of the natural numbers N

DJ = ∏_{j∈J} Dj.

To simplify we write Dj for D{j} and we define D∅ = {}. Then D = {DJ : J ⊆ N} is a lattice where DJ ≤ DK if J ⊆ K and

DJ ∨ DK = DJ∪K ,

DJ ∧ DK = DJ∩K .

This lattice has as top element DN and as bottom element D∅. Note further that this lattice is distributive. The idea here is that a domain DJ represents the question "What are the values of the variables Xj, j ∈ J?". Then DJ is in fact the set of possible answers. The larger the set of variables we consider, the finer is the question we ask. Sometimes we also consider the sublattice associated with finite subsets J, or the lattice D≤J associated with the subsets of some fixed set J. In all these cases we may as well identify D with the corresponding lattice of subsets of J.

Example 2.2 Partition Structures: Consider any set U. A partition P of U is a family of pairwise disjoint subsets of U whose union equals U. The members of a partition are called blocks. A partition represents the question "To which block does the unknown element of U belong?". The set Part(U) of partitions of U is partially ordered by the relation P1 ≤ P2 if for every block B2 of P2 there is a block B1 of P1 such that B2 ⊆ B1. We note that for partitions the reverse order is often considered, but this one corresponds better to our needs: a question is finer, the more blocks there are in the partition. The join of two partitions is defined as

P1 ∨ P2 = {B1 ∩ B2 ≠ ∅ : B1 ∈ P1, B2 ∈ P2}.

We may even generalize the join to arbitrary nonempty families of partitions,

⋁_{j∈J} Pj = { ⋂_{j∈J} Bj ≠ ∅ : Bj ∈ Pj, j ∈ J }.

Now, every set S of partitions of U has as a lower bound the partition consisting of just one block, namely U itself. So

Sl = {P : (∀S ∈ S) P ≤ S}

is not empty, and therefore ⋁Sl exists. We claim that ⋁Sl is the infimum of S. In fact, ⋁Sl is a lower bound of S, and if P is another lower bound of S, it must be in Sl, hence P ≤ ⋁Sl. So Part(U) is a complete lattice, where the meet is defined as

⋀_{j∈J} Pj = ⋁ {P : P ≤ Pj for all j ∈ J}.

We cite here a result (Grätzer, 1978) which shows how to obtain the meet of a collection of partitions. If two elements a and b of U belong to the same block of a partition P of U, we write a ≡ b (mod P). Then:

Theorem 2.1 a ≡ b (mod ⋀_{i∈I} Pi) if, and only if, there exist n ≥ 0, indices i0, . . . , in ∈ I and elements x0, . . . , xn+1 ∈ U such that a = x0, b = xn+1, and xj ≡ xj+1 (mod P_{ij}) for 0 ≤ j ≤ n.

We remark that the lattice Part(U) is neither distributive nor modular. But, besides a bottom element, the lattice also possesses a top element, namely the partition whose blocks are the singletons of U. Sometimes we may be interested only in partitions with a finite number of blocks. Since the meet of two partitions has no more blocks than either of the two, and the join no more blocks than the product of their numbers of blocks, these partitions form a lattice too, although not necessarily a complete one. This is called a lattice of finite partitions. Also, if U is infinite, it has no top element. The importance of partition lattices is that they are universal: every lattice can be embedded in some partition lattice, and every finite lattice can be embedded in a finite partition lattice (Grätzer, 1978). So, in some sense, we may represent all question structures by partition lattices. For example, the lattice of Cartesian products DJ considered in Example 2.1 is in fact a partition lattice with the universe

DN = ∏_{j=1}^{∞} Dj.
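For finite universes both lattice operations are easy to compute. The following Python sketch is purely illustrative; the function names, the union-find helper and the toy partitions are our own choices, not part of the text. It computes the join as the set of nonempty pairwise intersections and the meet of a family of partitions as the connected components described in Theorem 2.1.

    def join(p1, p2):
        # Supremum of two partitions (lists of sets): blocks are the
        # nonempty pairwise intersections, i.e. the common refinement.
        return [b1 & b2 for b1 in p1 for b2 in p2 if b1 & b2]

    def meet(partitions):
        # Infimum of a family of partitions: by Theorem 2.1, a and b lie in
        # the same block iff they are linked by a chain of blocks, so the
        # blocks are the connected components of the union of the
        # "same block" relations (computed here with union-find).
        universe = set().union(*partitions[0])
        parent = {u: u for u in universe}

        def find(u):
            while parent[u] != u:
                parent[u] = parent[parent[u]]
                u = parent[u]
            return u

        for p in partitions:
            for block in p:
                block = sorted(block)
                for a in block[1:]:
                    parent[find(a)] = find(block[0])

        components = {}
        for u in universe:
            components.setdefault(find(u), set()).add(u)
        return list(components.values())

    p1 = [{1, 2}, {3, 4}, {5, 6}]
    p2 = [{1, 2, 3}, {4, 5, 6}]
    print(join(p1, p2))    # [{1, 2}, {3}, {4}, {5, 6}]
    print(meet([p1, p2]))  # [{1, 2, 3, 4, 5, 6}], the one-block partition

With the order used here, the join refines and the meet coarsens, matching the convention adopted above.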

Example 2.3 Compatible Frames: Different partitions of a given universe reflect the different granularity considered in asking questions. However, a fixed universe is not really needed. There is no need to assume a finest question. In fact, consider any question whose possible answers form a set Θ. In an attempt to refine the analysis related to this question, we may partition every element of Θ into several finer subelements. This gives us a new set Λ together with a mapping τ : Θ → P(Λ) such that

1. ∀θ ∈ Θ, τ(θ) ≠ ∅,

2. θ′ ≠ θ″ implies τ(θ′) ∩ τ(θ″) = ∅,

3. ⋃_{θ∈Θ} τ(θ) = Λ.

We remark that the family τ(θ), θ ∈ Θ, is a partition of Λ, and Θ is therefore a representation of this partition. Such a τ is called a refining of Θ; Λ is a refinement of Θ, and Θ is a coarsening of Λ. We shall write in the following τ : Θ → Λ instead of τ : Θ → P(Λ). Note that a refining on Θ can be extended to P(Θ) by defining, for any subset A of Θ,

τ(A) = ⋃_{θ∈A} τ(θ).

Further, if τ : Θ → Λ and µ : Λ → Ω are two refinings, then µ ◦ τ : Θ → Ω, defined by

µ ◦ τ(θ) = µ(τ(θ)),

is clearly also a refining. We now consider a family F of sets. If Θ, Λ ∈ F, then we write Θ ≤ Λ if there is a refining τ : Θ → Λ. It is easy to verify that ≤ defines a partial order on F. We call F a family of compatible frames if F is a lattice under this order. Any lattice of partitions relative to some universe U represents a particular instance of a family of compatible frames. In fact, in Part(U), if P1 ≤ P2, then there is a refining τ : P1 → P2, and the order P1 ≤ P2 corresponds to the order induced by these refinings. Hence a family of compatible frames, as a lattice, is in general neither distributive nor modular. If it has a bottom element U, then the family of compatible frames represents a lattice of partitions with the universe U.

Example 2.4 σ-Fields: A σ-field F ⊆ P(U) is a family of subsets of U which is closed under countable unions and intersections. The intersection ⋂_{i∈I} Fi of any family of σ-fields is still a σ-field. As is well known (Davey & Priestley, 1990a), such a ∩-structure induces a complete lattice under inclusion, whose meet is just intersection and whose join is defined by

⋁_{i∈I} Fi = ⋂ {F : F ⊇ ⋃_{i∈I} Fi}.

This is a lattice of domains D, where a σ-field F ∈ D represents a question about an unknown element of U whose granularity allows as answers the sets in F. This structure is particularly important in probability theory.

Later, we shall see many more important examples of domain structures.

Chapter 3

Axiomatics of Information Algebra

Information comes in pieces, often from different sources. The pieces must be combined into an aggregated piece of information. Further, pieces of information bear on specified questions. With respect to the questions of interest, the relevant information must therefore be extracted from the given pieces of information. This can be captured by algebraic structures allowing for the operations of combination and focusing of information. There are several closely related algebras which capture these elements. We start with a basic axiomatic system of an information algebra. Several interesting models of this structure will then be presented. Finally, further different, but equivalent, axiomatic structures will be deduced.

3.1 Projection Axiomatics

At the beginning of Chapter 2 we said that information relates to questions. A structure of related and compatible questions is abstractly represented by a lattice D of domains. Consider now a domain x ∈ D. Pieces of information related to this domain will then abstractly be represented by elements φ from a set Φx. Let then

Φ = ⋃_{x∈D} Φx

be the set of all pieces of information over the whole family of domains. Suppose that three operations are defined in (Φ, D):

1. Labeling: d : Φ → D; φ ↦ d(φ),

2. Combination: ⊗ : Φ × Φ → Φ; (φ, ψ) ↦ φ ⊗ ψ,

3. Projection: ↓ : Φ × D → Φ; (φ, x) ↦ φ↓x, defined for x ≤ d(φ).


The labeling operation retrieves the domain of a piece of information φ ∈ Φ. The combination operation returns the aggregation of two pieces of information into a combined piece of information. Projection, finally, represents the operation of focusing a piece of information φ onto a smaller (coarser) domain x. Thus we have a two-sorted algebra (Φ, D) with one unary and two binary operations. We now impose the following axioms on Φ and D:

1. Semigroup: Φ is associative and commutative under combination.

2. Labeling: ∀φ, ψ ∈ Φ, d(φ) = x, d(ψ) = y imply

d(φ ⊗ ψ) = d(φ) ∨ d(ψ),

and ∀φ ∈ Φ and ∀x ∈ D, x ≤ d(φ), we have that

d(φ↓x) = x

3. Neutrality: ∀x ∈ D ∃ex ∈ Φx such that ∀φ ∈ Φx we have ex ⊗ φ = φ, and ∀y ∈ D, y ≤ x, we have ex↓y = ey.

4. Nullity: ∀x ∈ D ∃zx ∈ Φx such that ∀φ ∈ Φx we have zx ⊗ φ = zx, and ∀y ∈ D, x ≤ y, we have zx ⊗ ey = zy.

5. Projection: ∀φ ∈ Φ and ∀x, y ∈ D, x ≤ y ≤ d(φ) implies that

φ↓x = (φ↓y)↓x.

6. Combination: ∀φ, ψ ∈ Φ, d(φ) = x, d(ψ) = y implies

(φ ⊗ ψ)↓x = φ ⊗ ψ↓x∧y.

7. Idempotency: ∀φ ∈ Φ and x ∈ D, x ≤ d(φ) implies

φ ⊗ φ↓x = φ.

The semigroup axiom states that pieces of information may be combined in any sequence without changing the result. As a consequence, we do not need to write parentheses when we combine several pieces of information. The labeling axiom says that the combined information has the join of the domains of its factors as its domain: if two pieces of information on two domains x and y are aggregated, then the combined information pertains to the coarsest question refining both. Similarly, if a piece of information is focused on a coarser question, the resulting information bears on the corresponding coarser domain. The neutrality axiom stipulates the existence of neutral elements in each domain x. They represent the vacuous information relative to each question. The axiom further states that neutral elements project to neutral elements: vacuous information remains vacuous upon projection. The nullity axiom similarly requires the existence of a null element in each domain. Null elements represent contradictory ("impossible") information, which destroys any other information. The projection axiom guarantees that projection can be done in steps. The most powerful axiom is the combination axiom. It permits, in some cases, avoiding the large domains resulting from combination if a projection is performed afterwards. This has important consequences for computation, but is also important in many other respects. Idempotency, finally, tells us that the combination of a piece of information with a part of it gives nothing new: the repetition of the same information yields nothing new.

A system (Φ, D) satisfying the axioms above is called a labeled information algebra. On some occasions we shall later relax certain axioms. For instance, the idempotency axiom is not needed for developing efficient computational schemes in Part II on local computation. Also, the existence of neutral or null elements is not always required. However, these axioms reflect what is expected in general when a general concept of information is modeled. Therefore, without explicit mention to the contrary, we assume in the sequel the system of axioms above.

We now derive a few very elementary consequences from these axioms. But first we define, for all φ ∈ Φ and y ∈ D with y ≥ d(φ),

φ↑y = φ ⊗ ey.

We call φ↑y the vacuous extension of φ to domain y. In fact, as we shall see below, we transport φ by this operation from its original domain to the new, finer domain, without adding information. The following lemma collects some results on vacuous extension and related issues.

Lemma 3.1
1. x ≤ y implies ex↑y = ey.
2. d(φ), d(ψ) ≤ z implies (φ ⊗ ψ)↑z = φ↑z ⊗ ψ↑z.
3. d(φ) = x, d(ψ) = y implies φ ⊗ ψ = φ↑x∨y ⊗ ψ↑x∨y.
4. ∀x, y ∈ D we have ex ⊗ ey = ex∨y.
5. d(φ) = x implies ∀y ∈ D that φ ⊗ ey = φ↑x∨y.
6. d(φ) = x implies φ↑x = φ.
7. y ≤ d(φ) implies φ ⊗ ey = φ.
8. d(φ) ≤ x ≤ y implies φ↑y = (φ↑x)↑y.
9. d(φ) = x implies (φ↑x∨y)↓y = (φ↓x∧y)↑y.
10. d(φ) = x ≤ y implies (φ↑y)↓x = φ.

Proof. (1) We have, by definition of vacuous extension and the neutrality and idempotency axioms,

ex↑y = ex ⊗ ey = ey↓x ⊗ ey = ey.

(2) By definition of vacuous extension and the neutrality axiom,

(φ ⊗ ψ)↑z = (φ ⊗ ψ) ⊗ ez = (φ ⊗ ez) ⊗ (ψ ⊗ ez) = φ↑z ⊗ ψ↑z.

(3) By the labeling and neutrality axioms and point (2) just proved, we have

φ ⊗ ψ = (φ ⊗ ψ) ⊗ ex∨y = (φ ⊗ ψ)↑x∨y = φ↑x∨y ⊗ ψ↑x∨y.

(4) follows from (3) with φ = ex and ψ = ey, and from (1):

ex ⊗ ey = ex↑x∨y ⊗ ey↑x∨y = ex∨y ⊗ ex∨y = ex∨y.

(5) By the neutrality axiom, point (4) and the definition of vacuous extension,

φ ⊗ ey = φ ⊗ ex ⊗ ey = φ ⊗ ex∨y = φ↑x∨y.

(6) follows since φ↑x = φ ⊗ ex = φ.

(7) If d(φ) = x, then y ≤ x implies x ∨ y = x. Hence, using point (4) above,

φ ⊗ ey = φ ⊗ ex ⊗ ey = φ ⊗ ex∨y = φ ⊗ ex = φ.

(8) Once more by point (4) above, using x ∨ y = y, we obtain

φ↑y = φ ⊗ ey = φ ⊗ ex∨y = (φ ⊗ ex) ⊗ ey = (φ↑x)↑y.

(9) Using point (5) above and the combination axiom, we obtain

(φ↑x∨y)↓y = (φ ⊗ ey)↓y = φ↓x∧y ⊗ ey = (φ↓x∧y)↑y.

(10) By the definition of vacuous extension, the combination and the neutrality axioms, using x ∧ y = x,

(φ↑y)↓x = (φ ⊗ ey)↓x = φ ⊗ ey↓x∧y = φ ⊗ ey↓x = φ ⊗ ex = φ.

□

Next we prove some results concerning null elements.

Lemma 3.2
1. ∀x, y ∈ D, x ≤ y implies zy↓x = zx.
2. ∀x, y ∈ D, x ≤ y = d(φ) and φ↓x = zx imply φ = zy.
3. ∀x, y ∈ D, zx ⊗ zy = zx∨y.
4. ∀φ ∈ Φ, x, y ∈ D, d(φ) = x implies φ ⊗ zy = zx∨y.

Proof. (1) From the nullity, combination and neutrality axioms, using x ∧ y = x, we derive

zy↓x = (zx ⊗ ey)↓x = zx ⊗ ey↓x = zx ⊗ ex = zx.

(2) By the idempotency and nullity axioms we have

φ = φ ⊗ φ↓x = φ ⊗ zx = φ ⊗ ey ⊗ zx = φ ⊗ zy = zy.

(3) The labeling, neutrality and nullity axioms give

zx ⊗ zy = (zx ⊗ zy) ⊗ ex∨y = (zx ⊗ ex∨y) ⊗ (zy ⊗ ex∨y) = zx∨y ⊗ zx∨y = zx∨y.

(4) This follows from the labeling, neutrality and nullity axioms:

φ ⊗ zy = (φ ⊗ ex∨y) ⊗ (zy ⊗ ex∨y) = φ↑x∨y ⊗ zx∨y = zx∨y.

□

If the lattice D of domains is modular (e.g. distributive), then the combination axiom may be strengthened to the following result.

Lemma 3.3 Let D be modular. Then ∀φ, ψ ∈ Φ, d(φ) = x, d(ψ) = y and x ≤ z ≤ x ∨ y imply that

(φ ⊗ ψ)↓z = φ ⊗ ψ↓y∧z.

Proof. We have first

(φ ⊗ ψ)↓z = (φ ⊗ ψ)↓z ⊗ ez.

Since z ≤ x ∨ y implies that z ∧ (x ∨ y) = z, and x ≤ z implies x ∨ z = z, the labeling and combination axioms, together with the associativity of combination, give us

(φ ⊗ ψ)↓z ⊗ ez = (φ ⊗ ψ ⊗ ez)↓z = (φ ⊗ ez) ⊗ ψ↓y∧z = (φ ⊗ ψ↓y∧z) ⊗ ez.

Now, the first factor in the last expression has domain x ∨ (y ∧ z). But by the modular law this equals (x ∨ y) ∧ z = z. Thus we finally obtain

(φ ⊗ ψ)↓z = φ ⊗ ψ↓y∧z.

□

At this point a first, basic example will clarify the concept of information algebras.

Example 3.1 Subsets in a System of Compatible Frames: Let F be a family of compatible frames. In accordance with the notation of this section, we denote the frames in F by x, y, . . .. Let then Φx be the powerset of x for any x ∈ F, and

Φ = ⋃_{x∈F} Φx.

The idea is that a subset φ of x represents a piece of information about the question represented by x: namely the information that the unknown element θ ∈ x belongs to φ. Define d(φ) = x if φ ∈ Φx. If x ≤ y, denote the refining of x to y by τx→y : x → y. If now d(φ) = x and d(ψ) = y, then the combination of these two subsets is defined by

φ ⊗ ψ = τx→x∨y(φ) ∩ τy→x∨y(ψ).

This means that both subsets φ and ψ are refined to the coarsest common refinement x ∨ y of the two frames and intersected there, so that d(φ ⊗ ψ) = d(φ) ∨ d(ψ). If φ is a subset of y, and x ≤ y, then the projection of φ to x is defined by

φ↓x = {θ ∈ x : τx→y(θ) ∩ φ ≠ ∅}.

The idea behind this definition is the following: if we consider some θ ∈ x, we may ask which elements of y are compatible with θ, or, in other words, cannot be excluded if θ is assumed. These are exactly the elements of τx→y(θ). Now, given the subset φ ⊆ y, only elements θ ∈ x which are compatible with elements in φ remain possible, hence those for which τx→y(θ) ∩ φ ≠ ∅. This defines an algebra on (Φ, F). It is left as an exercise to verify that the axioms of an information algebra are satisfied. Note that this example covers, among other things, subset algebras on lattices of partitions of some universe U.
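As a small illustration of this example, the following Python sketch shows combination and projection across a single refining. The two frames, the refining dictionary and all function names are our own toy assumptions, not part of the text.

    # Two compatible frames: a coarse one and a refinement of it.
    coarse = {"warm", "cold"}
    fine = {"hot", "mild", "chilly", "freezing"}

    # The refining tau : coarse -> P(fine); its images partition 'fine'.
    tau = {"warm": {"hot", "mild"}, "cold": {"chilly", "freezing"}}

    def refine(subset):
        # Extension of the refining to subsets: union of tau(theta) over subset.
        return set().union(*(tau[theta] for theta in subset)) if subset else set()

    def combine(phi, psi):
        # phi lives on the coarse frame, psi on the fine one: refine phi to the
        # common finer frame and intersect there.
        return refine(phi) & psi

    def project(psi):
        # Projection of psi (on the fine frame) to the coarse frame: keep the
        # coarse answers compatible with at least one element of psi.
        return {theta for theta in coarse if tau[theta] & psi}

    phi = {"warm"}              # information on the coarse frame
    psi = {"mild", "chilly"}    # information on the fine frame
    print(combine(phi, psi))            # {'mild'}
    print(project(psi))                 # {'warm', 'cold'} (order may vary)
    print(project(combine(phi, psi)))   # {'warm'}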

An important special case is provided by the information algebras related to multivariate structures (Example 2.1 in Section 2.2). The following section is devoted to these particular algebras.

3.2 Multivariate Information Algebras

Multivariate structures for domains are of particular interest. Remember that the domain lattices D associated with multivariate structures are distributive. Thus, the extended combination property of Lemma 3.3 applies to them. The case of multivariate structures is rich in models. Here we introduce a few of them:

Example 3.2 Subsets in a Multivariate Structure: We refer to Example 2.1 in Section 2.2. There is a countable number of variables X1, X2, . . ., where each variable Xi has its domain Di, the set of its possible values. To a set {Xj : j ∈ J} of variables we associate their domain

DJ = ∏_{j∈J} Dj.

The elements x ∈ DJ are called J-tuples; they represent the possible common values of the variables Xj for j ∈ J. Note that a J-tuple can be seen as a mapping x : J → ⋃_{j∈J} Dj such that x(i) ∈ Di. A piece of information about the set of variables Xj can be given by a subset φ ⊆ DJ, i.e. a set of J-tuples which contains the unknown common value of the variables. So let ΦJ be the powerset of DJ, and

Φ = ⋃_{J⊆N} ΦJ.

Let D be the lattice of all subsets of ω = {1, 2, . . .}. For φ ∈ ΦJ define the labeling d(φ) = J. In order to define combination and projection within Φ, we define the projection x↓K of a J-tuple x to a K-configuration, for K ⊆ J, as the K-tuple y such that y(i) = x(i) for i ∈ K. Then the projection of a J-information φ ∈ ΦJ to K ⊆ J is defined by

φ↓K = {x↓K : x ∈ φ}.

The combination of a J-information φ with a K-information ψ is defined as

φ ⊗ ψ = {x ∈ D_{J∪K} : x↓J ∈ φ, x↓K ∈ ψ}.

Thus we have a two-sorted algebra (Φ,D) with the unary operation d :Φ → D, and the binary operations ⊗ :Φ × Φ → Φ and ↓:Φ × D → Φ, defined for K ⊆ d(φ). We leave it to the reader to verify that the axioms of an information algebra are satisfied. As a variant we may also take for D the lattice of finite subsets of ω.

A very important example of an information algebra is provided by relational databases. In fact, an important reduct of relational algebra is an information algebra. It can be seen as a particular case of Example 3.2, but it is worthwhile to develop it in its own right.

Example 3.3 Relational Algebra. Let A be a finite set of symbols, called attributes. For each α ∈ A let Dα be a non-empty set, the set of all possible values of the attribute α. The set of attributes corresponds to the set of variables in the terminology above, except that we restrict it here to a finite set. The Dα are the domains of the variables.

Let x ⊆ A. An x-tuple is a function f with domain x and f(α) ∈ Dα for each attribute α ∈ x. The set of all x-tuples is denoted by Ex. The empty relation over x, containing no x-tuple, is denoted by ∅x. For any x-tuple f and a subset y ⊆ x, the restriction f[y] is defined to be the y-tuple g such that g(α) = f(α) for all α ∈ y. Tuples correspond to the J-tuples in Example 2.1 and the restriction to a projection. A relation R over x is a set of x-tuples, i.e. a subset of Ex. The set of attributes x is called the domain of R. As usual we denote it by d(R). Relations represent subsets of domains of a group of variables. For y ⊆ d(R), the projection of R onto y is defined as follows:

πy(R) = {f[y]: f ∈ R}.

The join of a relation R over x and a relation S over y is defined as

R ⋈ S = {f ∈ E_{x∪y} : f[x] ∈ R, f[y] ∈ S}.

Clearly, the join is combination. In fact, it is easy to see that the fragment of relational algebra containing the two operations of projection and join satisfies the axioms of an information algebra. Here the axioms are expressed in the notation of relational algebra:

1. (R1 ⋈ R2) ⋈ R3 = R1 ⋈ (R2 ⋈ R3) (associativity),

2. R1 ⋈ R2 = R2 ⋈ R1 (commutativity),

3. d(R ⋈ S) = d(R) ∪ d(S) and d(πy(R)) = y (labeling),

4. R ⋈ Ex = R, if d(R) = x, and πy(Ex) = Ey, if y ⊆ x (neutrality),

5. R ⋈ ∅x = ∅x, if d(R) = x, and ∅x ⋈ Ey = ∅y, if x ⊆ y (nullity),

6. πx(R) = πx(πy(R)), if x ⊆ y ⊆ d(R) (projection),

7. πx(R ⋈ S) = R ⋈ πx∩y(S), if d(R) = x and d(S) = y (combination),

8. R ⋈ πx(R) = R, if x ⊆ d(R) (idempotency).

Relational algebra is thus a model of an information algebra, and a very fundamental one, as explained in the following Section 3.3. Of course, relational algebra contains more operations than projection and join; in fact, it is also a model of a Boolean information algebra, as we shall see later (see Section 6.2).
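These axioms are easy to check mechanically on small instances. The following Python sketch is illustrative only: the encoding of tuples as frozensets of (attribute, value) pairs and all names are our own choices. It implements the projection/join reduct and tests the combination and idempotency axioms on a toy example.

    def attrs(t):
        # Attribute set of a tuple; a tuple is a frozenset of (attribute, value) pairs.
        return {a for a, _ in t}

    def restrict(t, y):
        # f[y]: restriction of a tuple to the attributes in y.
        return frozenset((a, v) for a, v in t if a in y)

    def project(R, y):
        # pi_y(R) = {f[y] : f in R}.
        return {restrict(t, y) for t in R}

    def join(R, S):
        # R |><| S: all tuples over the union of the attributes whose restrictions
        # to the two argument domains lie in R and S respectively.
        return {r | s for r in R for s in S
                if restrict(r, attrs(r) & attrs(s)) == restrict(s, attrs(r) & attrs(s))}

    R = {frozenset({("A", 1), ("B", 1)}), frozenset({("A", 2), ("B", 2)})}   # domain {A, B}
    S = {frozenset({("B", 1), ("C", 1)}), frozenset({("B", 1), ("C", 2)})}   # domain {B, C}
    x, y = {"A", "B"}, {"B", "C"}

    # Combination axiom: pi_x(R |><| S) = R |><| pi_(x ∩ y)(S)
    print(project(join(R, S), x) == join(R, project(S, x & y)))   # True
    # Idempotency axiom: R |><| pi_z(R) = R for z a subset of d(R)
    print(join(R, project(R, {"B"})) == R)                        # True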

Example 3.4 Idempotent Semirings. A rather general family of information algebras on multivariate structures can be derived from distributive lattices with a bottom element ⊥ and a top element ⊤. Let A be a distributive lattice in which we denote the meet by × and the join by +. In this view, a distributive lattice is a particular instance of a more general algebraic structure, called a semiring. Semirings play an important role in theoretical Computer Science. In our context too, they prove to be a rich source of models of information algebras. The present example is a first hint at this fact. More examples in this framework can be found in (Pouly & Kohlas, 2010). In what follows we consistently write a join over several elements of the lattice as ∑. For any subset J of ω consider the set DJ of J-tuples introduced in Example 3.2 above and mappings ψ : DJ → A, whereby to each J-tuple x a value ψ(x) in A is assigned. Such a mapping is called an A-valuation on J. Let now ΨJ denote the set of all A-valuations on J and

Ψ = ⋃_{J⊆ω} ΨJ.

Within Ψ we define the unary operation d by d(ψ) = J if ψ is a J-valuation. Further we define two binary operations as follows:

1. Combination: ⊗ : Ψ × Ψ → Ψ, (φ, ψ) ↦ φ ⊗ ψ, defined for x ∈ D_{d(φ)∪d(ψ)} by

(φ ⊗ ψ)(x) = φ(x↓d(φ)) × ψ(x↓d(ψ)).

2. Projection: ↓ : Ψ × D → Ψ, (ψ, K) ↦ ψ↓K, defined for K ⊆ d(ψ) and x ∈ DK by

ψ↓K(x) = ∑_{y ∈ D_{d(ψ)} : y↓K = x} ψ(y).

The defining equation for projection can also be written in the following way, if we decompose the tuples y in the domain J = d(ψ) into subtuples x belonging to the domain K and subtuples z belonging to the domain J − K, so that y = (x, z):

ψ↓K(x) = ∑_{z ∈ D_{J−K}} ψ(x, z).

We claim that the two-sorted algebra (Ψ, D) with the unary operation d and the two binary operations ⊗ and ↓ as defined above forms an information algebra. The semigroup and labeling axioms are immediate consequences of the definition and of the commutativity and associativity of the meet in the underlying distributive lattice A. The neutral A-valuation on J is the constant mapping eJ(x) = ⊤ (the neutral element for ×) for all x ∈ DJ. By the idempotency of the join in the lattice A we see that, for any K ⊆ J = d(ψ),

eJ↓K(x) = ∑_{z ∈ D_{J−K}} eJ(x, z) = ⊤.

Thus we conclude that eJ↓K = eK, hence that the neutrality axiom is satisfied too. The null A-valuation on J is the constant mapping zJ(x) = ⊥, the null element for multiplication. Then we obtain, for K ⊆ J,

(zK ⊗ eJ)(x) = zK(x↓K) × eJ(x) = ⊥ × ⊤ = ⊥.

Thus we see that zK ⊗ eJ = zJ, hence the nullity axiom holds also. Transitivity of projection means simply that we can sum out variables in two steps. The combination property also follows easily, since × distributes over + in the distributive lattice A. Finally, the idempotency axiom follows if we remember that × denotes the infimum. Thus

(ψ ⊗ ψ↓K)(x) = ψ(x) × ψ↓K(x↓K) ≤ ψ(x).

Conversely, for y ∈ DK and z ∈ D_{J−K},

(ψ ⊗ ψ↓K)(y, z) = ψ(y, z) × ∑_{z′ ∈ D_{J−K}} ψ(y, z′) ≥ ψ(y, z) × ψ(y, z) = ψ(y, z),

since ψ(y, z) is a term of the sum in the first line and × is idempotent.

Now, there are many examples of distributive lattices which yield interesting information algebras. For instance, the bottleneck algebra, defined on the set of real numbers augmented with +∞ and −∞, A = R ∪ {−∞, +∞}, takes max for the +-operation and min for the ×-operation. The element −∞ is the bottom element ⊥, and the top element is +∞. More generally, distributive lattices can be used to express qualitative degrees of membership of tuples in fuzzy sets. Further, Boolean algebras are distributive lattices. Elements of Boolean algebras can be seen as describing parts of belief associated with tuples. These parts of belief may even be quantitatively measured by probabilities. Such systems will be discussed in part III on Uncertain Information. In particular, A-valuations with the binary Boolean algebra {0, 1} as A represent subsets of tuples φ = {x : ψ(x) = 1}. This information algebra is essentially identical (isomorphic) to the subset algebra of Example 3.2.
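As an illustration of the bottleneck instance, here is a small Python sketch in which combination takes the pointwise min of the two factors and projection takes max over the eliminated coordinates. The finite toy domains, the dict-based encoding of valuations and all names are our own assumptions.

    from itertools import product

    domains = {"X": [0, 1], "Y": [0, 1]}     # finite variable domains

    def tuples(variables):
        # All tuples over the given variables, as dicts variable -> value.
        vs = sorted(variables)
        return [dict(zip(vs, values)) for values in product(*(domains[v] for v in vs))]

    def key(t, variables):
        # Hashable representation of the restriction of tuple t to 'variables'.
        return frozenset((v, t[v]) for v in variables)

    def combine(phi, dphi, psi, dpsi):
        # (phi ⊗ psi)(x) = phi(x↓d(phi)) * psi(x↓d(psi)), with * = min.
        d = dphi | dpsi
        return {key(t, d): min(phi[key(t, dphi)], psi[key(t, dpsi)]) for t in tuples(d)}, d

    def project(phi, K):
        # phi↓K: the "sum" over the eliminated coordinates, with + = max.
        out = {}
        for t, value in phi.items():
            k = frozenset((v, val) for v, val in t if v in K)
            out[k] = max(out.get(k, float("-inf")), value)
        return out

    phi = {key({"X": 0, "Y": 0}, {"X", "Y"}): 0.2, key({"X": 0, "Y": 1}, {"X", "Y"}): 0.9,
           key({"X": 1, "Y": 0}, {"X", "Y"}): 0.5, key({"X": 1, "Y": 1}, {"X", "Y"}): 0.4}
    psi = {key({"Y": 0}, {"Y"}): 0.3, key({"Y": 1}, {"Y"}): 0.8}

    chi, dchi = combine(phi, {"X", "Y"}, psi, {"Y"})
    print(project(chi, {"X"}))
    # {frozenset({('X', 0)}): 0.8, frozenset({('X', 1)}): 0.4} (key order may vary)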

Example 3.5 Convex Subsets in Euclidean Vector Spaces: Let Dj = R in a multivariate structure. Here we admit, to simplify, only finite sets J in the lattice D, so that the DJ are finite-dimensional real vector spaces R^n. A subset φ of DJ is convex if x, y ∈ φ implies λ · x + (1 − λ) · y ∈ φ for all 0 ≤ λ ≤ 1. Let now ΦJ be the family of convex subsets of DJ, and

Φ = ⋃_{J⊆ω, J finite} ΦJ.

Within the two-sorted system (Φ, D) the unary operation d of labeling and the binary operations of combination and projection are defined as in the subset algebra of Example 3.2. We claim that the combination of two convex subsets is again convex, just as the projection of a convex subset is still convex, i.e. Φ is closed under combination and projection. Then it follows automatically from Example 3.2 that (Φ, D) is an information algebra, in fact a subalgebra of the algebra of subsets in the corresponding multivariate structure. Indeed, since addition in DJ is component-wise, we have, for x, y ∈ DJ and K ⊆ J, that

(λ · x + (1 − λ) · y)↓K = λ · x↓K + (1 − λ) · y↓K .

Hence, if d(φ) = K and d(ψ) = J, then, for x, y ∈ φ ⊗ ψ, consider λ · x + (1 − λ) · y. By definition of combination x↓K, y↓K ∈ φ, hence (λ · x + (1 − λ) · y)↓K = λ · x↓K + (1 − λ) · y↓K ∈ φ. Similarly we conclude that (λ · x + (1 − λ) · y)↓J ∈ ψ. Thus λ · x + (1 − λ) · y ∈ φ ⊗ ψ, which proves that φ ⊗ ψ is convex. Further, let x′, y′ ∈ φ↓K; then there exist x, y ∈ φ such that x′ = x↓K and y′ = y↓K. Then λ · x + (1 − λ) · y ∈ φ implies that λ · x′ + (1 − λ) · y′ ∈ φ↓K. This shows that φ↓K is convex. In particular, n-dimensional intervals (open, closed, half-open) as well as convex polyhedra form a subalgebra of convex sets.
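For the special case of closed n-dimensional intervals (boxes), the closure under combination and projection can be made very concrete. The following Python sketch encodes a box as a dict from variable to a (lower, upper) pair; the encoding and all names are our own assumptions. Combination intersects interval-wise on the union of the variable sets, and projection drops coordinates.

    EMPTY = None   # stands for the null element (the empty convex set)

    def interval(box, v):
        # Interval of variable v in a box; variables not mentioned are unconstrained.
        return box.get(v, (float("-inf"), float("inf")))

    def combine(phi, psi):
        # Combination: intersect coordinatewise on the union of the variable sets.
        if phi is EMPTY or psi is EMPTY:
            return EMPTY
        out = {}
        for v in set(phi) | set(psi):
            lo = max(interval(phi, v)[0], interval(psi, v)[0])
            hi = min(interval(phi, v)[1], interval(psi, v)[1])
            if lo > hi:
                return EMPTY
            out[v] = (lo, hi)
        return out

    def project(phi, K):
        # Projection of a box: keep only the intervals of the retained variables.
        return EMPTY if phi is EMPTY else {v: phi[v] for v in K}

    phi = {"X": (0.0, 2.0), "Y": (1.0, 3.0)}
    psi = {"Y": (2.0, 5.0), "Z": (0.0, 1.0)}
    print(combine(phi, psi))                       # X: (0.0, 2.0), Y: (2.0, 3.0), Z: (0.0, 1.0)
    print(project(combine(phi, psi), {"X", "Y"}))  # X: (0.0, 2.0), Y: (2.0, 3.0)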

Further examples of information algebras are provided by open, closed, compact and Borel sets in the real spaces R^n. We leave it to the reader to verify these claims.

3.3 File Systems

The example of relational algebra can be generalized to a more abstract framework. Let D be a lattice. A tuple system over D is a set T together with two operations

1. d : T → D,

2. ·[·]: T × D → T , defined for x ≤ d(f), which satisfies the following axioms for f, g ∈ T and x, y ∈ D:

1. x ≤ d(f) implies d(f[x]) = x.

2. x ≤ y ≤ d(f) implies f[y][x] = f[x].

3. d(f) = x implies f[x] = f.

4. d(f) = x, d(g) = y and f[x ∧ y] = g[x ∧ y] imply that there exists a h ∈ T such that d(h) = x ∨ y and h[x] = f, h[y] = g.

5. d(f) = x and x ≤ y imply that there exists a g ∈ T such that d(g) = y and g[x] = f.

The elements of T are called tuples, d(f) is the domain of a tuple f and f[x] its projection to the domain x. Obviously, the mathematical tuples of ordinary relational algebra (see Example 3.3 above) form a tuple system, where D is the lattice of subsets of the set of attributes A. Let (T, D) be a tuple system with the operations d and ·[·]. Generalizing ordinary relations, we define a relation over x ∈ D to be a subset R of tuples such that d(f) = x for all f ∈ R. The domain of R then equals the domain of its tuples, d(R) = d(f) if f ∈ R. As in the former example, we define Ex to be the set of all tuples f ∈ T with d(f) = x, and ∅x as the empty set with d(∅x) = x. For a relation R and x ≤ d(R), the projection of R onto x is defined as

πx(R) = {f[x]: f ∈ R}.

Similarly, the join of two relations R and S with d(R) = x and d(S) = y is defined as follows:

R ⋈ S = {f ∈ T : d(f) = x ∨ y, f[x] ∈ R, f[y] ∈ S}.

If the set on the right is empty, we identify it with ∅x∨y. Let RT be the set of all relations in T . It forms an information algebra.

Theorem 3.1 (RT, D) is an information algebra with the operations of projection and of join as combination.

Proof. (1) The join is commutative by definition. It is also associative, since f ∈ (R1 ⋈ R2) ⋈ R3 implies f[x ∨ y] ∈ R1 ⋈ R2 and f[z] ∈ R3, hence f[x] ∈ R1 and f[y] ∈ R2, if d(R1) = x, d(R2) = y and d(R3) = z. The same result is obtained if f ∈ R1 ⋈ (R2 ⋈ R3).
(2) d(R ⋈ S) = d(R) ∨ d(S) holds by definition. So does d(πx(R)) = x if x ≤ d(R), by axiom (1) for tuples.
(3) By definition πy(Ex) = {f[y] : f ∈ Ex} if y ≤ x. Now, if g ∈ Ey, then by axiom (5) for tuples there exists a tuple f ∈ Ex such that f[y] = g, hence g ∈ πy(Ex), and therefore πy(Ex) = Ey. Clearly Ex ⋈ R = R if d(R) = x. So the neutrality axiom holds.
(4) The nullity axiom holds by definition.
(5) The projection axiom follows from the definition of projection and axiom (2) for tuples.
(6) In order to prove the combination axiom, assume first that f ∈ πx(R ⋈ S), where d(R) = x and d(S) = y. Then there exists a g ∈ R ⋈ S such that g[x] = f, g[x] ∈ R and g[y] ∈ S. This implies f ∈ R. Further, g[x ∧ y] = g[x][x ∧ y] = f[x ∧ y], and also g[x ∧ y] = g[y][x ∧ y], hence g[y][x ∧ y] = f[x ∧ y]. But this shows that f[x ∧ y] ∈ πx∧y(S), hence f ∈ R ⋈ πx∧y(S). Conversely, assume f ∈ R ⋈ πx∧y(S). This means that f ∈ R and f[x ∧ y] ∈ πx∧y(S). The latter implies that there is a g ∈ S such that g[x ∧ y] = f[x ∧ y]. By property (4) of a tuple system, there is an h with d(h) = x ∨ y such that h[y] = g and h[x] = f. Thus h ∈ R ⋈ S, and hence f = h[x] ∈ πx(R ⋈ S).
(7) The idempotency axiom finally follows directly from the definition of the join. □
We shall later encounter many instances of a tuple system. In fact, remarkably, an information algebra (Φ, D) is itself a tuple system.

Lemma 3.4 Let (Φ,D) be an information algebra. Then (Φ,D) is a tuple system with domain d and projection ↓.

Proof. We have to show that the five axioms for tuples are satisfied: (1) follows from the labeling axiom. (2) is the projection axiom.
(3) Using the axioms of an information algebra, assuming d(φ) = x, we see that φ↓x = (φ ⊗ ex)↓x = φ ⊗ ex↓x = φ ⊗ ex = φ. This is axiom (3) for tuples.
(4) Assume d(φ) = x, d(ψ) = y and φ↓x∧y = ψ↓x∧y; then d(φ ⊗ ψ) = x ∨ y, and we claim that φ ⊗ ψ is the element whose existence is asserted in axiom (4) for tuples. Indeed,

(φ ⊗ ψ)↓x = φ ⊗ ψ↓x∧y = φ ⊗ φ↓x∧y = φ, (φ ⊗ ψ)↓y = φ↓x∧y ⊗ ψ = ψ↓x∧y ⊗ ψ = ψ.

(5) Assume that d(φ) = x and x ≤ y. Then d(φ↑y) = y and (φ↑y)↓x = φ (Lemma 3.1 (10)). So φ↑y is the element whose existence is asserted in axiom (5). □

3.4 Variable Elimination

In a multivariate framework we may replace projection by an alternative primitive operation. Let V be the countable set of variables {X1,X2,...} and let now d(φ) ⊆ V denote the set of variables defining the domain of φ. For an element φ ∈ Φ and a variable X ∈ d(φ) we define

φ−X = φ↓d(φ)−{X}. (3.1)

This operation is called variable elimination. Let V = {X1, X2, . . .} denote the countable set of variables. If we replace projection by variable elimination, then we obtain a system (Φ, V) with the following operations:

1. Labeling: d : Φ → P(V); φ ↦ d(φ),

2. Combination: ⊗ : Φ × Φ → Φ; (φ, ψ) ↦ φ ⊗ ψ,

3. Variable Elimination: − : Φ × V → Φ; (φ, X) ↦ φ−X, defined for X ∈ d(φ).

We shall prove below that, if (Φ, D) is an information algebra, then the following system of axioms holds in (Φ, V), and that this system is in fact equivalent to the previous one based on projection.

1. Semigroup: Φ is associative and commutative under combination.

2. Labeling: ∀φ, ψ ∈ Φ, d(φ) = x, d(ψ) = y imply d(φ ⊗ ψ) = d(φ) ∪ d(ψ), and ∀φ ∈ Φ and ∀X ∈ d(φ) we have that d(φ−X) = d(φ) − {X}.

3. Neutrality: ∀x ∈ D ∃ex ∈ Φx such that ∀φ ∈ Φx we have ex ⊗ φ = φ, and ∀X ∈ x we have ex−X = e_{x−{X}}.

4. Nullity: ∀x ∈ D ∃zx ∈ Φx such that ∀φ ∈ Φx we have zx ⊗ φ = zx, and ∀y ∈ D, x ≤ y, we have zx ⊗ ey = zy.

5. Variable Elimination: ∀φ ∈ Φ and ∀X, Y ∈ d(φ), X ≠ Y implies that (φ−X)−Y = (φ−Y)−X.

6. Combination: ∀φ, ψ ∈ Φ, d(φ) = x, d(ψ) = y, Y ∈ y and Y ∉ x imply (φ ⊗ ψ)−Y = φ ⊗ ψ−Y.

7. Idempotency: ∀φ ∈ Φ and X ∈ d(φ) imply φ ⊗ φ−X = φ.

Theorem 3.2 If (Φ, D) is an information algebra over D = P(V) and variable elimination is defined by (3.1), then the system of axioms above for variable elimination is satisfied.

Proof. Let (Φ, D) be an information algebra over D = P(V). We only have to prove axioms (2), (3) and (5) to (7) for variable elimination, the other ones being identical to those of a labeled information algebra. Axiom (2) follows from the labeling axiom, since d(φ−X) = d(φ↓d(φ)−{X}) = d(φ) − {X} by the definition of variable elimination. Axiom (3) follows in the same way from the neutrality axiom. Axiom (5) follows from the definition of variable elimination, axiom (2) just proved, and the projection axiom of information algebras:

(φ−X)−Y = (φ↓d(φ)−{X})↓d(φ)−{X,Y} = φ↓d(φ)−{X,Y} = (φ↓d(φ)−{Y})↓d(φ)−{X,Y} = (φ−Y)−X.

To derive axiom (6) note that x ⊆ (x ∪ y) − {Y } ⊆ x ∪ y. Take then z = (x ∪ y) − {Y } and apply lemma 3.3 (remember that the lattice P(V ) is distributive, hence modular),

(φ ⊗ ψ)−Y = (φ ⊗ ψ)↓(x∪y)−{Y } = φ ⊗ ψ↓y∩((x∪y)−{Y }).

Since Y ∈ d(ψ) = y and Y ∉ d(φ), we obtain y ∩ ((x ∪ y) − {Y}) = y − {Y}, hence

(φ ⊗ ψ)−Y = φ ⊗ ψ↓y−{Y } = φ ⊗ ψ−Y .

Axiom (7) finally follows from the definition of variable elimination and the idempotency axiom of information algebras. □

The axioms of variable elimination allow us to define unambiguously the elimination of two or more variables X1, X2, . . . , Xn ∈ d(φ) by

φ−{X1,X2,...,Xn} = (· · · ((φ−X1)−X2) · · ·)−Xn,   (3.2)

independently of the sequence of variable eliminations actually used. Here are a few elementary results on variable elimination:

Lemma 3.5 1. ∀φ ∈ Φ, x ⊆ d(φ) implies

d(φ−x) = d(φ) − x.

2. ∀φ ∈ Φ, x, y ⊆ d(φ), x and y disjoint, implies

(φ−x)−y = φ−(x∪y).

3. ∀φ, ψ ∈ Φ, y ⊆ d(ψ), y ∩ d(φ) = ∅, implies

(φ ⊗ ψ)−y = φ ⊗ ψ−y.

4. ∀φ ∈ Φ, x ⊆ d(φ) implies

φ ⊗ φ−x = φ.

Proof. (1) In order to prove this first item, we order the variables of the set x arbitrarily as X1, X2, . . . , Xn and apply definition (3.2) as well as axiom (2) of the system above repeatedly:

d(φ−x) = d((··· ((φ−X1 )−X2 ) ···)−Xn )

= (· · · ((d(φ) − {X1}) − {X2}) · · ·) − {Xn} = d(φ) − x.

(2) Follows directly from the definition of the elimination of subsets of variables (3.2). (3) Follows from (3.2) and a repeated application of axiom (6) of the system above. (4) By definition we have

φ ⊗ φ−x = φ ⊗ (... ((φ−X1 )−X2 ) ···)−Xn .

If we apply idempotency axiom (7) above repeatedly, then we obtain

φ ⊗ φ−x = φ ⊗ φ−X1 ⊗ (φ−X1 )−X2 ⊗ · · · ⊗ (... ((φ−X1 )−X2 ) ···)−Xn .

A second, repeated application of the idempotency axiom (7), from right to left, to the right hand side of the last equation then finally yields that the right hand side equals φ. □

Since focusing is defined for arbitrary sets x of variables, it cannot, in general, be recovered from variable elimination, since the latter operation is only defined for finite sets. However, there is an important special case where this inversion is possible: if the domain d(φ) of every element φ ∈ Φ is finite, then the information algebra (Φ, D) is called locally finite dimensional. In this case projection can conversely be expressed by variable elimination. For x ⊆ d(φ) we have

φ↓x = φ−(d(φ)−x). (3.3)
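For relations encoded as in the earlier sketch (sets of frozensets of (attribute, value) pairs; the helper names are again our own, not from the text), equations (3.1) and (3.3) amount to the following, with the elimination order being irrelevant:

    def attrs(R):
        # Domain d(R): the attribute set of (any tuple of) a nonempty relation.
        return {a for a, _ in next(iter(R))}

    def project(R, y):
        # R projected to the attribute set y.
        return {frozenset((a, v) for a, v in t if a in y) for t in R}

    def eliminate(R, X):
        # Equation (3.1): eliminating X projects onto d(R) - {X}.
        return project(R, attrs(R) - {X})

    R = {frozenset({("A", 1), ("B", 1), ("C", 2)}),
         frozenset({("A", 2), ("B", 1), ("C", 3)})}

    # Equation (3.3): projection to {A} by eliminating the other variables, in either order.
    print(eliminate(eliminate(R, "B"), "C") == project(R, {"A"}))   # True
    print(eliminate(eliminate(R, "C"), "B") == project(R, {"A"}))   # True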

The following theorem states now that the two ways of defining information algebras (Φ,D) and (Φ,V ) over variables are in fact equivalent for locally finite dimensional information algebras.

Theorem 3.3 If (Φ, V) satisfies the system of axioms for variable elimination given above, and if it is locally finite dimensional, then, with projection defined by (3.3), the algebra (Φ, D), with D the lattice of finite subsets of V, is an information algebra.

Proof. We have to verify axioms (2), (3) and (5) to (7) of an information algebra. In order to prove axiom (2), apply Lemma 3.5 (1) to the definition of projection (3.3):

d(φ↓x) = d(φ−(d(φ)−x)) = d(φ) − (d(φ) − x) = x.

Axiom (3) follows from the neutrality axiom (3) above for variable elimination and the definition of projection (3.3). In order to prove axiom (5), assume d(φ) = z and x ⊆ y ⊆ z. Then, by the definition of projection (3.3) and Lemma 3.5 (2),

(φ↓y)↓x = (φ−(z−y))−(y−x) = φ−(z−x) = φ↓x.

Next assume that d(φ) = x and d(ψ) = y. Then, by the labeling axiom we have d(φ ⊗ ψ) = x ∪ y. Thus, using the definition of projection and lemma 3.5 (3), we obtain, since (x ∪ y) − x = y − (x ∩ y)

(φ ⊗ ψ)↓x = (φ ⊗ ψ)−((x∪y)−x) = φ ⊗ ψ−(y−(x∩y)) = φ ⊗ ψ↓x∩y.

This is the combination axiom (6) for labeled information algebras. The idempotency axiom (7) for labeled information algebras finally follows from Lemma 3.5 (4). ut In certain cases variable elimination is a more natural operation than projection. Let's present two examples which illustrate this.

Example 3.6 Affine Spaces. Consider a countable set of variables X1,X2,..., each with values in a field F. For a finite subset s ⊆ ω = {1, 2,...} a mapping x : s → F is called an s-vector over the field F, and the set of all s-vectors is denoted by Fs. If i ∈ s, the value x(i) ∈ F is called the i-th component of the s-vector x. Clearly, for any s, Fs is a linear space, with addition x + y as usual defined by (x + y)(i) = x(i) + y(i) and scalar multiplication λ · x defined by (λ · x)(i) = λx(i) for any scalar λ ∈ F. The projection x↓t of an s-vector x to a subset t ⊆ s is simply the restriction of x to t. Consider now a system of linear equations over a set of variables s: Σj∈s ai,jXj = ai,0, i = 1, . . . , m.

The set of solutions φ = {x ∈ Fs : Σj∈s ai,jxj = ai,0, i = 1, . . . , m} determines an affine subspace of Fs. Of course the same affine space may be determined by different systems of linear equations. The solution set φ may also be empty. Otherwise we may assume that m ≤ |s| and that the m equations are linearly independent. Then the associated homogeneous system Σj∈s ai,jXj = 0, i = 1, . . . , m, has |s| − m linearly independent solutions b1, . . . , b|s|−m. If b0 is a solution of the original system, then

φ = {x ∈ Fs : x = b0 + λ1 · b1 + ··· + λ|s|−m · b|s|−m, λ1, . . . , λ|s|−m ∈ F} is the affine solution space of the system of equations. For t ⊇ s, we may transform any system of linear equations over s into a system of linear equations over t, Σj∈t ai,jXj = ai,0, i = 1, . . . , m, where ai,j = 0 for j ∈ t − s.

The corresponding affine space in Ft is denoted by φ↑t, if φ is the affine space associated with the original system. Similarly, if S(φ) is a system of linear equations over s, whose affine space is φ, then S(φ)↑t denotes the same system, but over t. Let now Φs be the set of affine spaces in Fs and Φ = ∪s⊆ω Φs. In Φ we may define operations of combination and variable elimination, which satisfy the axioms of an information algebra.

1. Combination: Assume φ and ψ are two affine spaces over s and t respectively, with associated systems of linear equations S(φ) and S(ψ). Then we define the combined affine space φ ⊗ ψ by φ ⊗ ψ = φ↑s∪t ∩ ψ↑s∪t. Note that this new affine space is determined by the system of linear equations S(φ)↑s∪t ∪ S(ψ)↑s∪t. In terms of systems of linear equations, combination is essentially the union of systems of equations. 2. Variable Elimination: Let φ be an affine space in Fs, defined by a system S(φ) of linear equations over s. Select k ∈ s. We want to define the operation of elimination of variable Xk from φ. There are three cases to distinguish:

(a) Either Xk appears in no equation with coefficient ai,k ≠ 0. Then let S(φ)−Xk be the system S(φ), but with the term 0 · Xk eliminated in each equation. This is a system over s − {k}.

(b) Or Xk appears in exactly one equation with a coefficient ai,k ≠ 0. Then let S(φ)−Xk be the system S(φ) with this equation eliminated, and with, in the remaining equations, the term 0 · Xk eliminated everywhere. Again S(φ)−Xk is a system over s − {k}.

(c) Else, select one equation with ai,k ≠ 0 and solve it for Xk,

Xk = ai,0/ai,k − Σh∈s−{k} (ai,h/ai,k) Xh.

Then, replace in all other equations Xk by the right hand side above, and regroup terms; Σh∈s−{k} (aj,h − aj,k ai,h/ai,k) Xh = aj,0 − aj,k ai,0/ai,k, j = 1, . . . , m, j ≠ i.

The resulting system of equations S(φ)−Xk contains one equation less than S(φ) and is a system over s − {k}.

It is easy to verify that variable elimination corresponds to a projection φ−Xk of the affine space φ in Fs to the set of variables s − {k}. It is not difficult to verify that Φ equipped with the two operations of combination and variable elimination satisfies the axioms of an information algebra. The affine space defined by a single linear equation over s is called a hyperplane in Fs. Note then that any affine space in Fs is the combination of a finite number of hyperplanes in Fs. This is a first example where the information elements φ in an algebra Φ are represented by expressions S(φ) of some formal language, in this case the language of linear equations over a field F. The association of elements φ to their representation S(φ) is nondeterministic, because different systems may define the same element φ. However the basic operations of combination and variable elimination can be expressed in terms of the representations S(φ). As we shall see later (part IV), this is a very general situation. The next example is of the same kind.
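The following Python sketch illustrates the elimination step of Example 3.6 on the level of the representations S(φ). It is a simplified illustration, not the text's algorithm: it works over the rationals instead of a general field F, and the function name and data layout (a dictionary of coefficients together with a right hand side per equation) are assumptions made for the example.

```python
# Illustrative sketch: eliminating a variable X_k from a system of linear equations
# over the rationals, following cases (a)-(c) of Example 3.6. An equation
# sum_j a[j]*X_j = a0 is stored as the pair (a, a0), with a a dict of coefficients.

from fractions import Fraction

def eliminate_variable(system, k):
    """Return a representation of S(phi)^(-X_k) over the variables without k."""
    with_k = [(a, a0) for (a, a0) in system if a.get(k, 0) != 0]
    without_k = [({v: c for v, c in a.items() if v != k}, a0)
                 for (a, a0) in system if a.get(k, 0) == 0]
    if not with_k:                       # case (a): X_k occurs in no equation
        return without_k
    (p, p0), rest = with_k[0], with_k[1:]
    new_system = without_k               # case (b): rest is empty, equation dropped
    for (a, a0) in rest:                 # case (c): substitute the solved equation
        factor = a[k] / p[k]
        coeffs = {v: a.get(v, 0) - factor * p.get(v, 0)
                  for v in (set(a) | set(p)) - {k}}
        new_system.append(({v: c for v, c in coeffs.items() if c != 0},
                           a0 - factor * p0))
    return new_system

# X1 + X2 = 3 and X1 - X2 = 1; eliminating X1 leaves -2*X2 = -2, i.e. X2 = 1.
system = [({1: Fraction(1), 2: Fraction(1)}, Fraction(3)),
          ({1: Fraction(1), 2: Fraction(-1)}, Fraction(1))]
print(eliminate_variable(system, 1))
```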

Example 3.7 Convex Polytopes and Polyhedra. Instead of linear equations as in the previous example, we may look at linear inequalities. However we fix here the field F to be the field of reals R and consider thus linear spaces Rs for finite subsets s ⊆ ω = {1, 2,...}. We consider systems of linear inequalities over s, Σj∈s ai,jXj ≤ ai,0, i = 1, . . . , m.

The set of all s-vectors x satisfying such a system of linear inequalities,

φ = {x ∈ Rs : Σj∈s ai,jxj ≤ ai,0, i = 1, . . . , m} is called a convex polyhedron. Note that different systems of linear inequalities may define the same convex polyhedron. As usual it may be convenient to use matrix notation. Let Am×s be the matrix with m rows and a column j for any j ∈ s. If Xs is the s-column vector of variables Xj, j ∈ s, and a0 the column vector with components ai,0, i = 1, . . . , m, then the system of linear inequalities above may be written as Am×sXs ≤ a0. If s ⊆ t, then Am×s↑t is the m × t matrix, where the columns for j ∈ s correspond to those of Am×s, and the columns for j ∈ t−s are 0-columns. The system Am×s↑tXt ≤ a0 is then a system of linear inequalities over t, the extension of the system Am×sXs ≤ a0. It defines a convex polyhedron over t, which clearly is nothing else than the cylindrical

extension of the polyhedron over s in Rt. This cylindrical extension of a convex polyhedron φ over s to t is denoted by φ↑t. Let S(φ) denote an arbitrary system of linear inequalities over s, which determines φ. As before, S(φ)↑t denotes the extension of S(φ) to t. We admit also the empty system of linear inequalities over s, i.e. the system consisting of no linear inequality. The corresponding convex polyhedron is then simply the whole space Rs. Also, a system of linear inequalities may be inconsistent, i.e. have no solution. The corresponding convex polyhedron is then the empty set (as a subset of Rs). Let now Φs denote the set of all convex polyhedra over s, let D denote the lattice of all finite subsets s ⊆ ω, and Φ = ∪s∈D Φs.

We are now going to define the operations of combination and variable elimination among elements of Φ and claim that with these operations Φ forms an information algebra.

1. Combination: Assume φ and ψ are convex polyhedra over s and t respectively, with systems S(φ) and S(ψ) of linear inequalities determining the two polyhedra. Then we define the combined convex polyhedron φ ⊗ ψ by

φ ⊗ ψ = φ↑s∪t ∩ ψ↑s∪t.

Note that this combined polyhedron is determined by the system

S(φ)↑s∪t ∪ S(ψ)↑s∪t.

Combination is simply union of the respective systems of linear inequalities.

2. Variable Elimination: Let φ be a convex polyhedron over s determined by a system S(φ). If k ∈ s, let S(φ)+ denote the subsystem of linear inequalities with a positive coefficient ai,k > 0 of the variable Xk, whereas S(φ)− denotes the subsystem of linear inequalities with a negative coefficient ai,k < 0 of the variable Xk. If i is the index of an inequality in S(φ)+ and j an index of an inequality in S(φ)−, then these two inequalities may be rewritten as follows:

Xk ≤ ai,0/ai,k − Σh∈s−{k} (ai,h/ai,k) Xh, Xk ≥ aj,0/aj,k − Σh∈s−{k} (aj,h/aj,k) Xh.

This leads then to the inequality, in which the variable Xk is eliminated, aj,0/aj,k − Σh∈s−{k} (aj,h/aj,k) Xh ≤ ai,0/ai,k − Σh∈s−{k} (ai,h/ai,k) Xh. If we eliminate the variable Xk in every pair of inequalities of S(φ)+ and S(φ)−, then we obtain a new system of linear inequalities, which no longer contains the variable Xk. If either S(φ)+ or S(φ)− is empty, then elimination of Xk leads to the empty system. This new system of linear inequalities corresponds to the projection φ−Xk of the original convex polyhedron φ to the set of variables s − {k}. It can be readily verified that the system Φ of convex polyhedra satisfies the axioms of an information algebra. As in the case of the previous example 3.6, the elements φ of the algebra have a representation S(φ) in terms of a formal language and the operations of the algebra (combination and variable elimination) are reflected in these representations. The convex polyhedron associated with a single linear inequality over s is called a halfspace of Rs. The discussion above makes it clear that any convex polyhedron is the intersection, i.e. the combination, of a finite number of halfspaces. Halfspaces are thus a kind of generating elements of the algebra of convex polyhedra. A bounded polyhedron is called a polytope. It is evident that combination of polytopes and variable elimination of a polytope yield again polytopes. Hence, polytopes form a subalgebra of the information algebra of convex polyhedra.
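The elimination step just described is Fourier–Motzkin elimination. The sketch below (again an illustration, not part of the text) implements it for rational coefficients; instead of dividing by the coefficients ai,k and aj,k as above, it combines each pair of inequalities with positive multipliers, which yields an equivalent inequality over the reals. Inequalities in which Xk does not occur are kept unchanged.

```python
# Illustrative sketch of the elimination step of Example 3.7 (Fourier-Motzkin).
# An inequality sum_j a[j]*X_j <= a0 is stored as (a, a0), with a a dict of
# coefficients; pairs from S(phi)+ and S(phi)- are combined so that X_k cancels.

from fractions import Fraction

def fourier_motzkin_step(system, k):
    """Eliminate X_k from a system of linear inequalities."""
    plus  = [(a, a0) for (a, a0) in system if a.get(k, 0) > 0]
    minus = [(a, a0) for (a, a0) in system if a.get(k, 0) < 0]
    kept  = [({v: c for v, c in a.items() if v != k}, a0)
             for (a, a0) in system if a.get(k, 0) == 0]
    for (p, p0) in plus:
        for (q, q0) in minus:
            lam, mu = -q[k], p[k]        # positive multipliers cancelling X_k
            coeffs = {v: lam * p.get(v, 0) + mu * q.get(v, 0)
                      for v in (set(p) | set(q)) - {k}}
            kept.append(({v: c for v, c in coeffs.items() if c != 0},
                         lam * p0 + mu * q0))
    return kept

# X1 + X2 <= 4 and -X1 + X2 <= 0; eliminating X1 yields 2*X2 <= 4.
system = [({1: Fraction(1), 2: Fraction(1)}, Fraction(4)),
          ({1: Fraction(-1), 2: Fraction(1)}, Fraction(0))]
print(fourier_motzkin_step(system, 1))
```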

3.5 The Transport Operation

When we vacuously extend an information φ from its domain x to a larger domain y ≥ x by φ↑y = φ ⊗ ey, then we do not add any information, since (φ↑y)↓x = φ (see lemma 3.1 (10)). So φ and its vacuous extensions represent the same information. Pursuing this idea we define the following equivalence in an information algebra (Φ,D): For φ, ψ ∈ Φ, if d(φ) = x and d(ψ) = y, then φ ≡ ψ (mod σ) if, and only if, φ↑x∨y = ψ↑x∨y. (3.4) For further reference we define the following operation for φ ∈ Φ with d(φ) = x and any y ∈ D, φ→y = (φ↑x∨y)↓y. This is called transport of an information φ from its domain x to another domain y. By (9) of Lemma 3.1 we have also φ→y = (φ↓x∧y)↑y.

Note that, if y ≤ x, then transport is projection, and if x ≤ y, then transport is vacuous extension. Further φ ≡ ψ (mod σ) holds if, and only if, φ→y = ψ and ψ→x = φ. Thus φ and ψ really represent the same information in this case, simply with reference to different domains. This implies also that φ↓x∧y = ψ↓x∧y. (3.5) The following property is fundamental for the theory to be developed. Lemma 3.6 The relation defined in (3.4) is a congruence relative to combination, projection and transport in the information algebra (Φ,D). Proof. We verify first that the relation is an equivalence. Reflexivity and symmetry are direct consequences from the definition. In order to prove transitivity assume d(φ) = x, d(ψ) = y, d(η) = z and φ ≡ ψ (mod σ), ψ ≡ η (mod σ). This means that φ↑x∨y = ψ↑x∨y and ψ↑y∨z = η↑y∨z. By the transitivity of vacuous extension, see lemma 3.1 (8), we obtain φ↑x∨y∨z = ψ↑x∨y∨z = η↑x∨y∨z. Point (10) of lemma 3.1 implies then φ↑x∨z = (φ↑x∨y∨z)↓x∨z = (η↑x∨y∨z)↓x∨z = η↑x∨z. Thus we have φ ≡ η (mod σ). Assume now φ1 ≡ ψ1 (mod σ) and φ2 ≡ ψ2 (mod σ) and let x1, x2, y1, y2 be the domains of φ1, φ2, ψ1, ψ2 respectively. Then, the assumed equivalences, together with lemma 3.1 (5), allow the following derivation,

(φ1 ⊗ φ2)↑(x1∨x2)∨(y1∨y2) = (φ1 ⊗ φ2) ⊗ ey1∨y2

= (φ1 ⊗ ey1 ) ⊗ (φ2 ⊗ ey2 )

= (ψ1 ⊗ ex1 ) ⊗ (ψ2 ⊗ ex2 )

= (ψ1 ⊗ ψ2) ⊗ ex1∨x2 = (ψ1 ⊗ ψ2)↑(x1∨x2)∨(y1∨y2).

Thus we conclude that φ1 ⊗ φ2 ≡ ψ1 ⊗ ψ2 (mod σ). Further we claim that φ ≡ ψ (mod σ) implies φ↓z = ψ↓z if z ≤ d(φ) = x, d(ψ) = y. Indeed, by the transitivity axiom and (3.5) above, φ↓z = (φ↓x∧y)↓z = (ψ↓x∧y)↓z = ψ↓z. So we have φ↓z ≡ ψ↓z (mod σ), whenever the two projections exist. Note that for any z ∈ D we have that φ→z = (φ ⊗ ez)↓z. By the congruence relative to combination and projection we have then that φ ≡ ψ (mod σ) implies φ→z ≡ ψ→z (mod σ). In fact, according to the proof we have even that φ→z = ψ→z. So the congruence holds also relative to transport. ut The next lemma summarizes some elementary results concerning the transport operation.

Lemma 3.7 1. d(φ) = x implies φ→x = φ. 2. d(φ) = x, and x, y ≤ z imply φ→y = (φ↑z)↓y. 3. y ≤ z implies φ→y = (φ→z)→y. 4. ∀φ, and ∀x, y ∈ D, (φ→x)→y = (φ→x∧y)→y. 5. d(φ) = x implies (φ ⊗ ψ)→x = φ ⊗ ψ→x.

Proof. (1) follows since φ→x = φ↓x = φ as we saw above. (2) Use first transitivity of vacuous extension (lemma 3.1 (8)) and projection, together with the fact that x, y ≤ z implies y ≤ x ∨ y ≤ z, and then lemma 3.1 (10) to obtain

(φ↑z)↓y = (((φ↑x∨y)↑z)↓x∨y)↓y = (φ↑x∨y)↓y = φ→y.

(3) Assume d(φ) = x. Then, use first the definition of transport, then transitivity of projection, followed by transitivity of vacuous extension (lemma 3.1 (8)) and of projection, and then lemma 3.1 (10),

(φ→z)→y = ((φ↑x∨z)↓z)↓y = (φ↑x∨z)↓y = (((φ↑x∨y)↑x∨z)↓x∨y)↓y = (φ↑x∨y)↓y = φ→y.

(4) To prove this identity, apply first (3) just proved, then lemma 3.1 (9)

(φ→x∧y)→y = ((φ→x)→x∧y)→y = ((φ→x)↓x∧y)↑y = (φ→x)→y.

(5) This follows from the combination axiom,

(φ ⊗ ψ)→x = (φ ⊗ ψ)↓x = φ ⊗ ψ↓x∧y = φ ⊗ ex ⊗ ψ↓x∧y = φ ⊗ (ψ↓x∧y)↑x = φ ⊗ ψ→x.

ut Since the relation σ is a congruence relative to combination and transport, we may consider the congruence classes [φ]σ of the elements of Φ. The elements of each such class represent the same information. Therefore these classes can be considered as a representation of information without reference to a particular domain. Because σ is a congruence, we may define the operations of combination and focusing unambiguously by

[φ]σ ⊗ [ψ]σ = [φ ⊗ ψ]σ, [φ]σ⇒x = [φ→x]σ. We may now consider the algebra of domain-free information associated to the information algebra (Φ,D). Denote the set of all congruence classes [φ]σ by Φ/σ. Note that Φ/σ is a commutative semigroup under combination. The class [ex]σ is its neutral element and the class [zx]σ its null element. Further the operations of combination and focusing have interesting properties as stated in the next theorem.

Theorem 3.4 1. ∀x, y ∈ D, ([φ]σ⇒x)⇒y = [φ]σ⇒x∧y,

2. ∀x ∈ D, ([φ]σ⇒x ⊗ [ψ]σ)⇒x = [φ]σ⇒x ⊗ [ψ]σ⇒x,

3. ∀x ∈ D, [ex]σ⇒x = [ex]σ, and [zx]σ⇒x = [zx]σ,

4. ∀[φ]σ ∈ Φ/σ, ∃x ∈ D such that [φ]σ⇒x = [φ]σ.

Proof. (1) By lemma 3.7 (4), (φ→x)→y = (φ→x∧y)→y. But (φ→x∧y)→y = (φ→x∧y)↑y, hence (φ→x∧y)→y ≡ φ→x∧y (mod σ). Therefore, ([φ]σ⇒x)⇒y = [(φ→x)→y]σ = [φ→x∧y]σ = [φ]σ⇒x∧y. (2) Since d(φ→x) = x, we obtain, using lemma 3.7 (5),

([φ]σ⇒x ⊗ [ψ]σ)⇒x = [(φ→x ⊗ ψ)→x]σ = [φ→x ⊗ ψ→x]σ = [φ]σ⇒x ⊗ [ψ]σ⇒x.

(3) Follows from ex→x = ex and zx→x = zx. (4) If d(φ) = x, then φ→x = φ, hence [φ]σ⇒x = [φ]σ. ut Let's illustrate the meaning of domain-free information by an example.

Example 3.8 Valuation and Interpretation in Propositional Logic. The language of logic can be used to describe information. As an example we consider here propositional logic. The vocabulary of the language of propositional logic consists of a countable number of propositional symbols, denoted as p1, p2,..., the logical constants ⊥ and >, the symbols ¬, ∧ and auxiliary symbols like parentheses. Propositional formulae or sentences are formed by the following inductive rules:

1. Each propositional variable is a formula.

2. The constants ⊥ and > are formulae.

3. If A is a formula, then (¬A) is a formula.

4. If A and B are formulae, then (A ∧ B) is a formula.

The information expressed by a formula of propositional logic is obtained through its semantics: A truth function is a mapping α which assigns to each propositional variable pi a truth value αi ∈ {f, t}. If α is a truth function, then define the extension α̂ of α to all propositional formulae inductively as follows:

1. If pi is a propositional variable, then α̂(pi) = αi. 2. α̂(⊥) = f and α̂(>) = t.

3. If A is a propositional formula, then α̂(¬A) = f, if α̂(A) = t, and α̂(¬A) = t, if α̂(A) = f.

4. If A and B are formulae, then α̂(A ∧ B) = t if α̂(A) = α̂(B) = t and α̂(A ∧ B) = f otherwise.

A truth function α is called a model of a propositional formula A if α̂(A) = t; then we write α |= A. The set of all models of a formula A will be denoted by r̂(A) = {α : α |= A}. Let var(A) denote the set of variables of a formula A. The restriction of a truth function α to the set of variables var(A) is called an interpretation of A. Clearly, truth functions with the same interpretation of A yield the same value α̂(A). So, if {f, t}ω is the set of all truth functions, then a propositional formula A determines the subset r̂(A) of its models. Now, consider the multivariate structure associated with the propositional variables pi, that is, Di = {f, t} and, for any finite subset J of variables, DJ = {f, t}J. If S is any subset of {f, t}ω, and J a finite subset, let

S↓J = {x↓J : x ∈ S}.

Further, let ΦJ be the family of sets S↓J for all S ⊆ {f, t}ω and

Φ = ∪J⊆ω, finite ΦJ.

Clearly Φ is an information algebra in the multivariate structure defined above. If φ = S↓J ∈ ΦJ and ψ = S↓K ∈ ΦK, then clearly

φ→K = ψ, ψ→J = φ.

We remark now that if var(A) ⊆ J, K for a propositional formula A, then φ = r̂(A)↓J and ψ = r̂(A)↓K are equivalent, φ ≡ ψ (mod σ), and indeed, they represent the same information expressed by the formula A. So a formula A is a domain-free expression of an information, which, however, by r̂(A)↓J may be focused at any time on sets of variables of interest. In this way propositional logic gives rise to an associated information algebra. If projection is neglected, the algebra corresponds to the Lindenbaum algebra of propositional logic. This is a Boolean algebra; we refer to Chapter 6 for a discussion of information algebras in relation with Boolean algebras.
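A small Python sketch may help to see the model sets r̂(A)↓J at work. It is an illustration under simplifying assumptions (only ¬ and ∧ as connectives, truth values represented by False/True); the formula encoding and function names are not taken from the text.

```python
# Illustrative sketch for Example 3.8: formulas built from ('var', i), ('not', A)
# and ('and', A, B); models over a finite variable set J containing var(A) are
# frozensets of (variable, truth value) pairs.

from itertools import product

def evaluate(formula, alpha):
    """The extended truth function alpha-hat applied to a formula."""
    tag = formula[0]
    if tag == 'var':
        return alpha[formula[1]]
    if tag == 'not':
        return not evaluate(formula[1], alpha)
    if tag == 'and':
        return evaluate(formula[1], alpha) and evaluate(formula[2], alpha)
    raise ValueError(tag)

def models(formula, J):
    """r-hat(A) restricted to the variable set J (with var(A) contained in J)."""
    J = sorted(J)
    return {frozenset(zip(J, values))
            for values in product([False, True], repeat=len(J))
            if evaluate(formula, dict(zip(J, values)))}

# A = p1 and not p2, seen over J = {1, 2, 3}; focusing on {1, 2} forgets p3.
A = ('and', ('var', 1), ('not', ('var', 2)))
phi = models(A, {1, 2, 3})
psi = {frozenset((v, b) for (v, b) in m if v in {1, 2}) for m in phi}
print(psi)   # {frozenset({(1, True), (2, False)})}
```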

Instead of starting with labeled information we may as well start with domain-free information directly and derive labeled information from it. This will be discussed in the following Section 3.6. This gives us an alternative way to look at information and its algebraic structure.

3.6 Domain-Free Algebra

Guided by the discussion in the previous Section 3.5, we introduce in this section an alternative algebra of information, which we call domain-free. We shall show how we can derive from it the old labeled variant. This demonstrates then that both domain-free and labeled information algebras are two sides of the same coin, two different representations of the same thing, i.e. information as related to questions. So, let Ψ be a set of elements, called domain-free pieces of information. Let further, as before, D be a lattice of domains. Suppose that two operations are defined,

1. Combination: ⊗ : Ψ × Ψ → Ψ; (φ, ψ) ↦ φ ⊗ ψ,

2. Focusing: ⇒ : Ψ × D → Ψ; (ψ, x) ↦ ψ⇒x.

As before, the elements of Ψ are thought to represent pieces of knowledge or information, this time independent of any domain or question, i.e. in a domain-free way. The operation of combination is again introduced in order to represent aggregation of different pieces of information into a new piece of information, called the combined information. Focusing on the other hand is thought to represent the extraction of information relative to a given domain x, i.e. removing the part not pertaining to the domain x. As we shall see in a moment, focusing is closely related to variable elimination as projection is in the labeled version. We require the system (Ψ,D) to satisfy the following system of axioms:

1. Semigroup: Ψ is associative and commutative under combination.

2. Support: ∀ψ ∈ Ψ, ∃x ∈ D such that ψ⇒x = ψ.

3. Neutrality: ∃e ∈ Ψ such that ∀ψ ∈ Ψ, ψ ⊗ e = ψ, and ∀x ∈ D, e⇒x = e.

4. Nullity: ∃z ∈ Ψ such that ∀ψ ∈ Ψ, ψ ⊗ z = z, and ∀x ∈ D, z⇒x = z.

5. Focusing: ∀ψ ∈ Ψ and ∀x, y ∈ D,(ψ⇒x)⇒y = ψ⇒x∧y.

6. Combination: ∀φ, ψ ∈ Ψ and ∀x ∈ D,(φ⇒x ⊗ ψ)⇒x = φ⇒x ⊗ ψ⇒x.

7. Idempotency: ∀ψ ∈ Ψ, and ∀x ∈ D, ψ ⊗ ψ⇒x = ψ.

These axioms require that Ψ is a commutative semigroup under combination, which contains both a neutral element e and a null element z. Both of these elements are unique. A domain x such that ψ⇒x = ψ is called a support of ψ. It means that the full content of a piece of information ψ pertains to domain x. The support axiom requires that every element of Ψ has such a support. The focusing axiom means that extracting the information pertaining to a domain x and then to a domain y is the same as extracting the information pertaining to the infimum domain x ∧ y, i.e. the “common” part of the two questions represented by x and y. The combination axiom states that combination of a piece of information pertaining to a domain x with any other information and then extracting the part pertaining to x from the aggregated information is the same as first extracting the part of the second information pertaining to x and then combining it with the part of the first one pertaining to x. Idempotency finally says that aggregating a piece of information with part of itself gives nothing new. As with labeled information algebras, we may sometimes relax some axioms. The existence of the neutral element need not be postulated. We may always adjoin a neutral element e by setting φ ⊗ e = e ⊗ φ = φ for all elements φ ∈ Ψ and e⇒x = e for all x ∈ D. The support axiom could also be omitted in certain cases. A system (Ψ,D) satisfying these axioms is called a domain-free information algebra. As we have seen in the previous Section 3.5, and in particular in Theorem 3.4, a domain-free algebra is associated with any labeled information algebra. We shall show below that the converse is true too. But before that, let's give an example of a domain-free information algebra.

Example 3.9 System of Subsets. A rather trivial example of a domain-free information algebra is given by the subsets of a universe U, together with a lattice D of partitions of U which contains U as top element. Combination is simply intersection. If P is a partition in D, and φ is any subset of the universe U, then φ⇒P is defined as φ⇒P = ∪{P ∈ P : P ∩ φ ≠ ∅}.

Each element is supported by U. It is left as an exercise to verify the axioms. If the universe does not belong to the lattice D, then the subsets must be limited to those which are unions of blocks of some partition in D.
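As a quick illustration of Example 3.9, the following Python fragment (an assumption-laden sketch, not part of the text) computes φ⇒P as the union of the blocks of a partition that meet φ.

```python
# Illustrative sketch for Example 3.9: combination of subsets is intersection, and
# focusing on a partition returns the union of the blocks that meet the subset.

def focus(phi, partition):
    """phi focused on the partition: union of all blocks P with P ∩ phi ≠ ∅."""
    return set().union(*(block for block in partition if block & phi))

U = {1, 2, 3, 4, 5, 6}
P = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]   # a partition of U
print(focus({2, 3}, P))   # {1, 2, 3, 4}: the blocks {1, 2} and {3, 4} meet {2, 3}
```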

In a domain-free information algebra, we call two elements φ, ψ ∈ Φ contradictory or incompatible, if

φ ⊗ ψ = z.

Here are a few results on incompatible elements:

Lemma 3.8 1. ∀x ∈ D and φ ∈ Φ, φ⇒x = z implies φ = z.

2. ∀x ∈ D and φ, ψ ∈ Φ we have φ⊗ψ⇒x = z if, and only if, φ⇒x⊗ψ = z.

3. ∀x, y ∈ D and φ, ψ ∈ Φ we have φ⇒x ⊗ ψ⇒y = z if, and only if, φ⇒y ⊗ ψ⇒x = z.

Proof. (1) By idempotency φ = φ ⊗ φ⇒x = φ ⊗ z = z. (2) Suppose φ ⊗ ψ⇒x = z. Then, by the combination and nullity axioms

z = (φ ⊗ ψ⇒x)⇒x = φ⇒x ⊗ ψ⇒x = (φ⇒x ⊗ ψ)⇒x

Therefore, by (1) above φ⇒x ⊗ ψ = z. The converse follows by symmetry. (3) Suppose φ⇒x ⊗ ψ⇒y = z. Note that (φ⇒x)⇒y = φ⇒x∧y = (φ⇒y)⇒x by the focusing axiom. Then, by the combination axiom and (2) above

z = (φ⇒x ⊗ ψ⇒y)⇒y = (φ⇒x)⇒y ⊗ ψ⇒y = (φ⇒y)⇒x ⊗ ψ⇒y = φ⇒y ⊗ (ψ⇒y)⇒x = φ⇒y ⊗ (ψ⇒x)⇒y = (φ⇒y ⊗ ψ⇒x)⇒y.

Therefore, by (1) above we have φ⇒y ⊗ ψ⇒x = z. The converse follows by symmetry. ut Support is an important concept of information. The following lemma collects a few properties of support.

Lemma 3.9 1. ∀ψ ∈ Ψ, and ∀x ∈ D, x is support of ψ⇒x.

2. ∀ψ ∈ Ψ, and ∀x, y ∈ D, x is support of ψ implies x is support of ψ⇒y.

3. ∀ψ ∈ Ψ, and ∀x, y ∈ D, x and y are supports of ψ imply x ∧ y is support of ψ.

4. ∀ψ ∈ Ψ, and ∀x, y ∈ D, x is support of ψ implies x ∧ y is support of ψ⇒y.

5. ∀ψ ∈ Ψ, and ∀x, y ∈ D, x is support of ψ implies ψ⇒y = ψ⇒x∧y.

6. ∀ψ ∈ Ψ, and ∀x, y ∈ D, x is support of ψ, and x ≤ y imply y is support of ψ.

7. ∀φ, ψ ∈ Ψ, and ∀x ∈ D, x is support of φ and ψ implies x is support of φ ⊗ ψ.

8. ∀φ, ψ ∈ Ψ, and ∀x ∈ D, x is support of φ, and y is support of ψ imply x ∨ y is support of φ ⊗ ψ.

Proof. (1) The focusing axiom implies that (ψ⇒x)⇒x = ψ⇒x∧x = ψ⇒x. (2) Again by the focusing axiom we derive (ψ⇒y)⇒x = ψ⇒x∧y = (ψ⇒x)⇒y = ψ⇒y since ψ⇒x = ψ. (3) By the focusing axiom again, ψ⇒x∧y = (ψ⇒x)⇒y = ψ⇒y = ψ. (4) Once more, by the focusing axiom, we conclude (ψ⇒y)⇒x∧y = ψ⇒x∧y, since y ∧ (x ∧ y) = x ∧ y. (5) Here we use the focusing axiom to deduce that ψ⇒y = (ψ⇒x)⇒y = ψ⇒x∧y. (6) We have ψ = ψ⇒x = ψ⇒x∧y since x = x ∧ y. But then the focusing axiom gives us ψ⇒x∧y = (ψ⇒x)⇒y = ψ⇒y. This proves that ψ = ψ⇒y.

(7) Here we use the combination axiom to obtain (φ ⊗ ψ)⇒x = (φ⇒x ⊗ ψ)⇒x = φ⇒x ⊗ ψ⇒x = φ ⊗ ψ. (8) Follows from (6) and (7). ut We use now these results to construct a labeled information algebra from a domain-free one: Let (Ψ,D) be a domain-free algebra. For all x ∈ D define

Φx = {(ψ, x) : ψ ∈ Ψ, ψ⇒x = ψ}.

In words: Φx contains all pairs (ψ, x) of elements ψ from Ψ which have x as a support. Let further Φ = ∪x∈D Φx. We define now in Φ the three operations of labeling, combination and projection, where labeling is simply the extraction of the support from a pair (ψ, x) and combination and projection are defined from the operations of combination and focusing in the domain-free algebra (Ψ,D):

1. Labeling: ∀(ψ, x) ∈ Φ define

d(ψ, x) = x. (3.6)

2. Combination: ∀(φ, x), (ψ, y) ∈ Φ define

(φ, x) ⊗ (ψ, y) = (φ ⊗ ψ, x ∨ y). (3.7)

3. Projection: ∀(ψ, x) ∈ Φ and ∀y ∈ D such that y ≤ x, define

(ψ, x)↓y = (ψ⇒y, y). (3.8)

Note that combination and projection give indeed results within Φ (use Lemma 3.9 to verify this). The next theorem states that (Φ,D) with the operations as defined above is a labeled information algebra.

Theorem 3.5 (Φ,D) with labeling, combination and projection defined by (3.6), (3.7), and (3.8), respectively, is a labeled information algebra.

Proof. (1) The semigroup axiom follows from the associativity and the commutativity of combination in Ψ and of join in the lattice D. (2) The labeling axiom follows from the definitions of labeling, combination and projection above. (3) Note that all x ∈ D are supports of the neutral element e ∈ Ψ. Therefore (e, x) ∈ Φ for all domains x. This is certainly a neutral element of combination in Φx. Further we have by definition of projection (e, x)↓y = (e⇒y, y) = (e, y).

(4) Similarly, all x ∈ D are supports for the null element z ∈ Ψ, hence (z, x) ∈ Φ for all x. The elements (z, x) are null elements in Φx. Further (z, x) ⊗ (e, y) = (z ⊗ e, x ∨ y) = (z, x ∨ y). (5) Using the focusing axiom, we obtain by definition of projection (3.8), assuming that z is a support of φ ∈ Φ,

((φ, z)↓y)↓x = (φ⇒y, y)↓x = ((φ⇒y)⇒x, x) = (φ⇒x∧y, x) = (φ⇒x, x), since x ≤ y implies x ∧ y = x. This is the projection axiom for labeled algebras. (6) Assume that x is a support of φ and y a support of ψ. Then the combination axiom of the domain-free algebra allows us to deduce

((φ, x) ⊗ (ψ, y))↓x = (φ ⊗ ψ, x ∨ y)↓x = ((φ ⊗ ψ)⇒x, x) = (φ ⊗ ψ⇒x, x)

But, since y is a support of ψ, we have ψ⇒x = ψ⇒x∧y (see lemma 3.9 (5)). Thus we obtain further

((φ, x) ⊗ (ψ, y))↓x = (φ ⊗ ψ⇒x∧y, x) = (φ, x) ⊗ (ψ⇒x∧y, x ∧ y) = (φ, x) ⊗ (ψ, y)↓x∧y.

This proves the combination axiom of labeled algebras. (7) Let x be a support of φ and y ≤ x. Then, by the idempotency axiom for domain-free algebras

(φ, x) ⊗ (φ, x)↓y = (φ, x) ⊗ (φ⇒y, y) = (φ ⊗ φ⇒y, x) = (φ, x).

So the idempotency axiom for labeled algebras holds too. ut If we start with a labeled information algebra (Φ,D), pass to the corresponding domain-free algebra (Φ/σ, D) and then from there on to the corresponding labeled algebra (Φ∗,D), then we would expect the latter to be essentially the original algebra. Similarly, if we start with a domain-free algebra (Ψ,D), derive the associated labeled algebra (Ψ∗,D) and construct from it the domain-free algebra (Ψ∗/σ, D), we expect the latter to be essentially equal to the first one.

Theorem 3.6 With the denotations introduced above (Φ,D) is isomorphic to (Φ∗,D) and (Ψ,D) to (Ψ∗/σ, D).

Proof. Define a mapping h from (Φ,D) to (Φ∗,D) as follows: for φ ∈ Φ let

h(φ) = ([φ]σ, d(φ)) ∈ Φ∗.

The mapping h is one-to-one since ([φ]σ, d(φ)) = ([ψ]σ, d(ψ)) implies φ ≡ ψ (mod σ) and d(φ) = d(ψ). This together implies φ = ψ. It is a mapping onto Φ∗, since for an element ([ψ]σ, x) ∈ Φ∗, there must be a φ ∈ Φ with [φ]σ = [ψ]σ (namely φ = ψ→x) and d(φ) = x. Finally h is a homomorphism, since for φ, ψ ∈ Φ and z ∈ D, if z ≤ d(φ) = x and d(ψ) = y,

h(φ ⊗ ψ) = ([φ ⊗ ψ]σ, x ∨ y) = ([φ]σ, x) ⊗ ([ψ]σ, y) = h(φ) ⊗ h(ψ), h(φ↓z) = ([φ↓z]σ, z) = ([φ]σ⇒z, z) = ([φ]σ, x)↓z = (h(φ))↓z, h(ex) = ([ex]σ, x), h(zx) = ([zx]σ, x), and the elements on the right hand side of the equations of the last two lines are the neutral and null elements in the domain x of the algebra (Φ/σ, D). This proves the first part of the theorem. The second part is proved similarly. ut Just as in labeled information algebras related to a multivariate structure we may also define the operation of variable elimination in domain-free algebras. As before denote by V the countable set of variables {X1,X2,...} and let D be the associated lattice of domains. Consider a domain-free information algebra (Φ,D). Let x be a support of φ ∈ Φ. For any variable X ∈ V we define then

φ−X = φ⇒x−{X}. (3.9)

We remark, that this definition is independent of the support of φ used. In fact, if y is another support of φ, then according to the focusing axiom we see that

φ⇒x−{X} = (φ⇒y)⇒x−{X} = φ⇒(x∩y)−{X} = (φ⇒x)⇒y−{X} = φ⇒y−{X}.

For this new operation we prove the following basic set of results:

Lemma 3.10 Let (Φ,D) be a domain-free information algebra with D the lattice of domains associated with the set of variables V . Then the following holds 1. Commutativity of Variable Elimination. ∀φ ∈ Φ and X,Y ∈ V ,

(φ−X )−Y = (φ−Y )−X .

2. Combination. ∀φ, ψ ∈ Φ and X ∈ V ,

(φ−X ⊗ ψ)−X = φ−X ⊗ ψ−X .

3. Idempotency. ∀φ ∈ Φ and X ∈ V ,

φ ⊗ φ−X = φ.

4. Neutrality and Nullity. ∀X ∈ V ,

e−X = e, z−X = z.

Proof. (1) Let x be a support of φ. Then x−{X} is a support of φ⇒x−{X} (lemma 3.9 (1)). Thus, by the definition of the variable elimination (3.9) and the focusing axiom of domain-free information algebras, we obtain

(φ−X )−Y = (φ⇒x−{X})⇒x−{X}−{Y } = (φ⇒x−{Y })⇒x−{Y }−{X} = (φ−Y )−X .

(2) Here we use the combination axiom for domain-free information al- gebras, together with the definition of variable elimination (3.9). Let x be a support of φ and y a support of ψ. Then x ∪ y is a support of φ, ψ and φ ⊗ ψ (lemma 3.9 (6) and (8)). Hence

(φ−X ⊗ ψ)−X = (φ⇒(x∪y)−{X} ⊗ ψ)⇒(x∪y)−{X} = φ⇒(x∪y)−{X} ⊗ ψ⇒(x∪y)−{X} = φ−X ⊗ ψ−X .

(3) follows directly from the idempotency axiom for domain-free algebras and the definition of variable elimination. (4) follows since by the neutrality and nullity axioms of domain-free algebras all domains x are supports of both e and z. ut As usual we identify in a domain-free information algebra (Φ,D) over a multivariate structure the lattice D with the corresponding lattice of subsets of the set V of variables. We call then a domain-free information algebra (Φ,D) locally finite dimensional, if every element φ ∈ Φ has a support x of finite cardinality. Clearly, the domain-free algebra derived from a locally finite dimensional labeled information algebra is locally finite dimensional. If x and y are two finite support sets of φ, then x ∩ y is also a finite support set of φ (lemma 3.9 (3)). Hence, in a locally finite dimensional information algebra, any information element φ ∈ Φ has a unique finite least support ∆(φ). This set is called the dimension set of φ. In terms of variable elimination, we can define this set alternatively as

∆(φ) = {X ∈ V : φ−X ≠ φ}.
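As an illustration of this characterization, the following Python sketch (not from the text) computes ∆(φ) for subsets of assignments over a fixed finite variable set V, where variable elimination is cylindrification; the variable names and the representation are assumptions made for the example.

```python
# Illustrative sketch: for subsets of assignments over a fixed finite variable set V,
# variable elimination is cylindrification, and the dimension set collects the
# variables whose elimination actually changes the element.

V = ['X', 'Y', 'Z']

def eliminate(phi, x):
    """phi^(-X): all assignments agreeing with some element of phi outside x."""
    return {frozenset([(v, a) for (v, a) in tup if v != x] + [(x, b)])
            for tup in phi for b in (False, True)}

def dimension_set(phi):
    """Delta(phi) = {X in V : phi^(-X) != phi}."""
    return {x for x in V if eliminate(phi, x) != phi}

# phi encodes "X and not Y" and does not constrain Z, so Delta(phi) = {'X', 'Y'}.
phi = {frozenset({('X', True), ('Y', False), ('Z', z)}) for z in (False, True)}
print(dimension_set(phi))   # {'X', 'Y'} (in some order)
```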

The cardinality of the dimension set is called the dimension of the information element. The neutral and the null element have dimension zero. Clearly, if V has finite cardinality, then (Φ,D) is locally finite dimensional. But even if V is not finite, there are locally finite dimensional information algebras. Here are two simple properties of dimension sets:

Lemma 3.11 1. ∀φ ∈ Φ and ∀x ∈ D,

∆(φ⇒x) ⊆ ∆(φ).

2. ∀φ, ψ ∈ Φ,

∆(φ ⊗ ψ) ⊆ ∆(φ) ∪ ∆(ψ).

Proof. (1) ∆(φ) is a support for φ. By lemma 3.9 (5) it is also a support of φ⇒x, thus ∆(φ⇒x) ⊆ ∆(φ). (2) Since ∆(φ) and ∆(ψ) are supports of φ and ψ, by lemma 3.9 (8) ∆(φ) ∪ ∆(ψ) is a support of φ ⊗ ψ, thus ∆(φ ⊗ ψ) ⊆ ∆(φ) ∪ ∆(ψ). ut In a locally finite dimensional algebra, we may recover focusing from variable elimination, just as projection can be recovered in the corresponding case of labeled information algebras. First we define for any finite set x = {X1,X2,...,Xm} of variables

φ−x = (··· ((φ−X1 )−X2 ) ···)−Xm .

According to the commutativity of variable elimination (lemma 3.10 (1)), φ−x is well defined and independent of the sequence of the variable elimination. Then define

φ⇒x = φ−(∆(φ)−x).

This definition is valid for finite sets x. But it can be extended to arbitrary sets y. Note that, if y is not finite, then φ⇒y = (φ⇒∆(φ))⇒y = φ⇒∆(φ)∩y. As in the case of labeled information algebras, we may here too recover the axioms of a domain-free information algebra. So, in a multivariate setting, we may as well work with variable elimination instead of focusing if the algebras are locally finite dimensional. But we may also consider algebras with variable elimination which are not locally finite dimensional, as the next example shows.

Example 3.10 Cylindric Algebras. In order to develop an algebraic theory of predicate logic so-called cylindric algebras were introduced (Henkin et al., 1971). The algebras have as a base a Boolean algebra Φ together with an operation of variable elimination φ−X for all φ ∈ Φ and X an element of some set. In relation to this operation the properties of commutativity of variable elimination and of combination, with combination replaced by meet, and nullity of lemma 3.10 above are required. In addition, ∀φ ∈ Φ and X,

φ ≤ φ−X is added as an axiom. This last property implies the neutrality property of lemma 3.10 above: φ ≤ ψ means that φ ∨ ψ = ψ. Hence e−X = e ∨ e−X = e.

Since φ ≤ ψ means also that φ ∧ ψ = φ, the last property implies also idempotency (with respect to meet). Cylindric algebras have an additional operation, called diagonalization. If this operation is dropped, the algebra is called a diagonal-free algebra. Thus, a diagonal-free algebra is a domain-free information algebra with meet as combination, although a particular one, since Φ is a Boolean algebra. In the cylindric algebra associated with predicate logic, variable elimination corresponds to existential quantification, see the following example.

Example 3.11 Information Algebras Related to Predicate Logic. As we have seen in Example 3.8, propositional logic can be used to express information about the truth of certain propositions or combinations of them. This is reflected in the information algebra associated with propositional logic. Now, more generally, predicate logic can be used to express information about the values of certain variables. This is explained in this example. As in propositional logic this leads to information algebras related to predicate logic. As a language, predicate logic has a vocabulary consisting of a countable family of variables, denoted by X1,X2,..., constants ⊥, >, a finite or countable family of non-logical constants denoted by P1,P2,... and symbols ∧, ∨, ¬ and ∃. To each non-logical constant Pi a definite, finite rank ρi (a non-negative integer) is associated. From this vocabulary expressions are formed, based on the following inductive rules:

1. PiXi1 . . . Xiρi is an expression for any non-logical constant Pi. 2. ⊥ and > are expressions.

3. if φ is an expression, then ¬φ and ∃Xi[φ] are expressions. 4. if φ and ψ are expressions, then so are φ ∧ ψ and φ ∨ ψ.

Expressions (1) and (2) are called atomic. The predicate language Λ is the set of expressions obtained from atomic expressions by applying a finite number of times the rules (3) and (4) above. It is convenient to introduce a number of further expressions as short hands for some often used formulas,

φ → ψ = (¬φ) ∨ ψ, φ ↔ ψ = (φ → ψ) ∧ (ψ → φ).

In an expression ∃Xi[φ] the expression φ is called the scope of the existential quantification of Xi and Xi is called quantified in ∃Xi[φ]. A variable is bounded in an expression φ, if it appears only in quantified expressions. Otherwise it is called free. For instance the variable X1 is free in the expression P1X1 ∧ ∃X1[P2X1X2 ∨ P3X2], but bounded in the expression P1X2 ∧ ∃X1[P2X1X2 ∨ P3X2]. An expression in which all occurring variables are bounded is called a sentence. Expressions of the predicate language Λ are also called formulas (including sentences). Formulae of predicate logic are interpreted using relational structures. A relational structure is defined by a set U, the domain, and a family R1,R2,... of relations in U, where to each non-logical constant Pi corresponds exactly one relation Ri with the same rank ρi¹. Such a relational structure is denoted by R = ⟨U, R1,R2, . . .⟩. Further, a valuation is a mapping v : {1, 2, ...} → U; a valuation v assigns to the variable Xi the value v(i). A valuation v can be extended to a truth valuation v̂R of formulas of Λ relative to a relational structure by the following inductive rules: 1. v̂R(PiXi1 . . . Xiρi) = t if the values v(i1), . . . , v(iρi) satisfy the relation Ri, and v̂R(PiXi1 . . . Xiρi) = f otherwise.

2. v̂R(⊥) = f and v̂R(>) = t.

3. v̂R(¬φ) = f, if v̂R(φ) = t, and v̂R(¬φ) = t, if v̂R(φ) = f.

4. v̂R(∃Xi[φ]) = t if there exists a valuation u with u(j) = v(j) for j ≠ i such that ûR(φ) = t, and v̂R(∃Xi[φ]) = f otherwise.

5. v̂R(φ ∧ ψ) = t if v̂R(φ) = t and v̂R(ψ) = t, and v̂R(φ ∧ ψ) = f otherwise.

6. v̂R(φ ∨ ψ) = t if v̂R(φ) = t or v̂R(ψ) = t, and v̂R(φ ∨ ψ) = f otherwise.

For a sentence σ, the truth value v̂R(σ) does not depend on the valuation v used, but only on the relational structure R. If v̂R(σ) = t for some relational structure R we say that σ holds in R and that the latter is a model of σ. We write then R |= σ. If Σ is a set of sentences, then we say Σ holds in R, if all σ ∈ Σ hold in R, i.e. R |= Σ, if R |= σ for all σ ∈ Σ. The relational structure R is then also called a model of Σ. If φ is a formula containing free variables Xi1, . . . , Xin, then v̂R(φ) depends only on the values v(i1), . . . , v(in). It may be that v̂R(φ) = t for all valuations v. Then we say φ holds in R. Let now Σ be a set of sentences and φ a formula of Λ. Then φ is called a logical consequence of Σ, or φ is implied by Σ, if φ holds in all models of Σ; then we write Σ |= φ. The set of all sentences,

C(Σ) = {σ : σ is a sentence, Σ |= σ}, which are logical consequences of Σ, is called a theory, or the theory axiomatized by Σ. A theory Θ is closed, that is, it contains all sentences it implies, C(Θ) = Θ. Given a relational structure R, the set

ΘR = {σ : σ is a sentence, R |= σ}

is always a theory, the (first order) theory of R. More generally, any set of relational structures induces in this way a theory, the set of all sentences which hold in all structures of the set. The theory of the empty set of structures is the set of tautologies. Any theory Θ is the theory of some (possibly empty) set of structures, e.g. the set of all its models. Now we are in a position to introduce domain-free information algebras related to predicate logic. Let Σ be a set of sentences and φ and ψ formulas of Λ. Then
¹In many-sorted logic, different domains may be associated with each variable.

φ ≡Σ ψ if Σ |= φ ↔ ψ is an equivalence in Λ. Let [φ]Σ, [ψ]Σ, etc. denote the corresponding equivalence classes and FΣ the family of these equivalence classes. Then the following operations are well-defined in FΣ:

1. [φ]Σ ∧ [ψ]Σ = [φ ∧ ψ]Σ.

2. [φ]Σ ∨ [ψ]Σ = [φ ∨ ψ]Σ.

3. ([φ]Σ)c = [¬φ]Σ.

4. ([φ]Σ)−Xi = [∃Xi[φ]]Σ.

It can easily be verified that FΣ is relative to the first three operations a Boolean algebra, and becomes by adding the fourth operation a cylindric algebra, hence an information algebra. Of course, if Θ is the theory of Σ, then FΣ = FΘ. These algebras are also called algebras of (predicate logic) formulas.

The most interesting case arises, if we consider the algebra FΘR associated with the theory ΘR of a single relational structure R. Note that in this case

φ ≡ΘR ψ if v̂R(φ) = v̂R(ψ).

We write in this case simply φ ≡R ψ,[φ]R for the corresponding equivalence classes and FR for the associated information algebra. Define the set

r̂R(φ) = {v : v̂R(φ) = t} of valuations v under which φ is true. In other words, stating the formula φ, we indicate the valuations in r̂R(φ) to be the ones which may contain the “real world”. This is the information conveyed by φ. The set r̂R(φ) is a subset of Uω. It is a cylindrical set over the free variables Xi1, . . . , Xin of φ, that is, if a valuation v belongs to r̂R(φ), then any other valuation u with u(i1) = v(i1), . . . , u(in) = v(in) belongs to r̂R(φ) too. The free variables form the base of the cylindrical set.

The cylindrical sets r̂R(φ) form a cylindrical algebra with set-intersection, union and complementation as Boolean operations and cylindrification for variable elimination, i.e. for S ⊆ Uω,

S−Xi = {u ∈ Uω : u(j) = v(j) for j ≠ i for some v ∈ S}.

It is evident that the following relations hold

r̂R(φ ∧ ψ) = r̂R(φ) ∩ r̂R(ψ),

r̂R(φ ∨ ψ) = r̂R(φ) ∪ r̂R(ψ), r̂R(¬φ) = (r̂R(φ))c, r̂R(∃Xi[φ]) = (r̂R(φ))−Xi.

The algebra of formulas and the corresponding cylindrical set algebra are essentially the same (isomorphic). In particular, existential quantification translates into cylindrification of information. Since all elements of the set algebra have a finite base, the algebra is called locally finite. Any locally finite algebra is isomorphic to an algebra of formulas. However, there are cylindrical set algebras which are not locally finite, for example the algebra of all subsets of Uω. Such an algebra does not correspond to an algebra of formulas, at least not for a language like Λ, where any formula has finite length. We refer to (Langel, 2009) for more on information algebras and predicate logic.

Example 3.12 Quantifier Algebra. The previous example can be generalized by abstracting the operation of existential quantification. If Φ is a Boolean algebra, or, more generally, a semilattice, then an existential quantifier is a mapping ∃ : Φ → Φ subject to the following conditions:

1. ∃⊥ = ⊥, if ⊥ is the least element in Φ.

2. φ ∧ ∃φ = φ,

3. ∃(φ ∧ ∃ψ) = ∃φ ∧ ∃ψ.

Note that any variable elimination defined by φ 7→ φ−X in a domain-free information algebra is an existential quantifier. More generally, let D be a lattice of subsets of some set I. Assume that there is an existential quantifier ∃(J) for every subset J ∈ D on the Boolean algebra Φ, and that

1. ∃(∅)φ = φ,

2. If J, K ∈ D, then ∃(J ∪ K)φ = ∃(J)(∃(K)φ).

Then (Φ,D) is termed a quantifier algebra over D. If we take the meet operation of the Boolean algebra Φ for combination and define φ⇒I−J = ∃(J)φ, then (Φ,Dc) is a domain-free information algebra. Here Dc is the lattice of subsets I − J for J ∈ D. This can also be seen more in the spirit of variable elimination. If, for all i ∈ I, ∃(i) is an existential quantifier, and ∃(i)∃(j) = ∃(j)∃(i) if i ≠ j, one can define ∃(J) = ∃(i1) · · · ∃(im) if J = {i1, . . . , im} and ∃(∅)φ = φ. Then (Φ,D) is a quantifier algebra. For instance, let Φ be the powerset of some cartesian product of sets Ui for i ∈ I and D a lattice of subsets of I. The mapping ∃(J) is defined by ∃(J)A = {b ∈ Πi∈I Ui : ∃a ∈ A such that bi = ai, ∀i ∉ J}.

It can be shown that this is an existential quantifier and (Φ,D) forms a quantifier algebra. It is clear, that the operation ∃({i}) is similar to variable elimination. Note also that it is sufficient for Φ to be a semilattice, i.e. a partially ordered set, where meet is defined, in order to define existential quantification and then a quantifier algebra (Φ,D).

Example 3.13 Constraints. The examples of linear equations and linear inequalities are instances of constraint systems. This can be captured by a more general and abstract formulation: Let C be a class of first order formulas, expressing constraints, and suppose that Θ is a first order theory of these constraints. Two constraints are equivalent, φ ≡Θ ψ modulo the theory Θ, if Θ |= φ ↔ ψ. This is of course just as in predicate logic in general (Example 3.11 above). We require now further that C is closed under conjunction and existential quantification and contains the constants ⊥, > (as is the case for linear equations and inequalities for instance):

1. ⊥, > ∈ C,

2. If φ, ψ ∈ C, then there exists an η ∈ C, such that φ ∧ ψ ≡Θ η,

3. If φ ∈ C and Xi is a variable, then there exists an η ∈ C such that ∃Xi[φ] ≡Θ η. As usual, the lattice D of domains consists of the finite subsets of variables in the corresponding multivariate structure. The elements of the algebra FC are the equivalence classes [φ]Θ. Within FC the following operations are defined:

1. Combination: [φ]Θ ⊗ [ψ]Θ = [φ ∧ ψ]Θ.

2. Variable Elimination: ([φ]Θ)−Xi = [∃Xi[φ]]Θ. This is a domain-free information algebra, as can be easily verified. In contrast to the algebras of formulas FΘ discussed in Example 3.11, this algebra is not Boolean in general. Examples are the algebras of linear equations and inequalities. Further examples include Herbrand constraints, Boolean constraints, and interval constraints.

Chapter 4

Order of Information

The structure of an information algebra permits the introduction of a partial order between its elements. This order reflects information content. With this order an important issue of information is addressed, which has many repercussions. After defining the order and discussing some of its elementary properties, ideals in information algebras are identified as describing consistent systems of information and as forming themselves an information algebra, into which the original algebra is embedded. Then algebras containing most informative elements, called atoms, are studied. They are shown to be closely related to relational algebras in a generalized way. Next the issue of finite information elements and the approximation of more general elements by finite ones is discussed. This leads to compact information algebras, both in a domain-free and a labeled form. A more general form of approximation, represented by the “way below” relation between information elements, is then introduced and the related concept of continuous information algebras is discussed. These last parts of the present chapter testify that compact and continuous information algebras are very close to the subject of domain theory, another theory treating the concept of information.

4.1 Partial Order of Information

A commutative idempotent semigroup like Φ in an information algebra (Φ,D) is known to be a semilattice. That is, there is a partial order between the elements of Φ. This order expresses in fact a relation with respect to “information content” of the elements of Φ, i.e. a qualitative measure of information. Due to the additional structure of information algebras, several different orders can be defined. The study of these orders is the goal of this Chapter. To start with, consider a domain-free information algebra (Φ,D). We say that an information φ ∈ Φ is more informative than another information


ψ ∈ Φ, and write φ ≥ ψ, if the latter combined with the first one does not change the first one, i.e. if ψ adds nothing to φ,

φ ⊗ ψ = φ. (4.1)

This defines a partial order in Φ:

1. Reflexivity: φ ≤ φ follows from idempotency, φ ⊗ φ = φ.

2. Antisymmetry: φ ≤ ψ and ψ ≤ φ implies φ = ψ, since φ = φ ⊗ ψ = ψ.

3. Transitivity: φ ≤ ψ and ψ ≤ η implies φ ≤ η. This follows since φ ⊗ ψ = ψ and ψ ⊗ η = η implies φ ⊗ η = φ ⊗ ψ ⊗ η = ψ ⊗ η = η.

The following lemma contains a number of simple results concerning this ordering.

Lemma 4.1

1. e ≤ φ ≤ z for all φ ∈ Φ.

2. φ, ψ ≤ φ ⊗ ψ.

3. φ ⊗ ψ = sup{φ, ψ}.

4. φ⇒x ≤ φ.

5. φ ≤ ψ implies φ⇒x ≤ ψ⇒x.

6. φ1 ≤ φ2 and ψ1 ≤ ψ2 imply φ1 ⊗ ψ1 ≤ φ2 ⊗ ψ2.

7. φ⇒x ⊗ ψ⇒x ≤ (φ ⊗ ψ)⇒x.

8. x ≤ y implies φ⇒x ≤ φ⇒y.

Proof. (1) Follows since e ⊗ φ = φ and φ ⊗ z = z. (2) φ ⊗ (φ ⊗ ψ) = (φ ⊗ φ) ⊗ ψ = φ ⊗ ψ shows that φ ≤ φ ⊗ ψ and in the same way with ψ at the place of φ we obtain ψ ≤ φ ⊗ ψ. (3) According to (2) φ ⊗ ψ is an upper bound of φ and ψ. If η is another upper bound of φ and ψ, that is, if φ ⊗ η = η and ψ ⊗ η = η, then (φ ⊗ ψ) ⊗ η = φ ⊗ (ψ ⊗ η) = φ ⊗ η = η, thus φ ⊗ ψ ≤ η and φ ⊗ ψ is in fact the least upper bound of φ and ψ. (4) This follows from idempotency φ⇒x ⊗ φ = φ. (5) Let φ ≤ ψ. Then we have (using idempotency) φ⇒x ⊗ ψ⇒x = (φ⇒x ⊗ ψ)⇒x = (φ⇒x ⊗ φ ⊗ ψ)⇒x = (φ ⊗ ψ)⇒x = ψ⇒x. (6) We have (φ1 ⊗ ψ1) ⊗ (φ2 ⊗ ψ2) = (φ1 ⊗ φ2) ⊗ (ψ1 ⊗ ψ2) = φ2 ⊗ ψ2. (7) We have by (4) φ⇒x ≤ φ. Then, by (6), φ⇒x ⊗ ψ ≤ φ ⊗ ψ and thus φ⇒x ⊗ ψ⇒x = (φ⇒x ⊗ ψ)⇒x ≤ (φ ⊗ ψ)⇒x by (5).

(8) x ≤ y implies x = x ∧ y and therefore φ⇒x ⊗ φ⇒y = φ⇒x∧y ⊗ φ⇒y = (φ⇒y)⇒x ⊗ φ⇒y = φ⇒y because of idempotency. ut Property (3) of this lemma means that any finite set of elements from Φ has a least upper bound in Φ, that is, Φ is a semilattice. We write sometimes also φ ⊗ ψ = φ ∨ ψ to emphasize the fact that the combination of two pieces of information is their join. Order may be introduced in a similar way into a labeled information algebra (Φ,D). As before we define φ ≥ ψ if, and only if, (4.1) holds. As before it can easily be verified that this defines a partial order in Φ. This order is a more technical concept; it has less meaning as an expression of information content. For example we have φ ≤ φ↑y (see point (7) of the lemma below), although φ and φ↑y represent the same information, albeit relative to different domains. In this case we have the following elementary results:

Lemma 4.2

1. φ ≤ ψ implies d(φ) ≤ d(ψ).

2. x ≤ y implies ex ≤ ey and zx ≤ zy.

3. x ≤ d(φ) implies ex ≤ φ and d(φ) ≤ x implies φ ≤ zx. 4. φ, ψ ≤ φ ⊗ ψ.

5. φ ⊗ ψ = sup{φ, ψ}.

6. x ≤ d(φ) implies φ↓x ≤ φ.

7. d(φ) ≤ y implies φ ≤ φ↑y.

8. x ≤ y = d(φ) implies (φ↓x)↑y ≤ φ.

9. φ1 ≤ φ2 and ψ1 ≤ ψ2 imply φ1 ⊗ ψ1 ≤ φ2 ⊗ ψ2. 10. φ→x ⊗ ψ→x ≤ (φ ⊗ ψ)→x.

11. φ ≤ ψ implies φ→x ≤ ψ→x.

12. If d(ψ) ≤ y, then φ ≤ ψ implies φ↑y ≤ ψ↑y.

13. x ≤ y implies φ→x ≤ φ→y.

14. If d(φ) ≤ x ≤ d(ψ), then φ ≤ ψ implies φ ≤ ψ↓x.

We leave the proof, which is similar to the one of Lemma 4.1, to the reader. We may also compare, in a labeled information algebra (Φ,D), information content of elements φ of Φ relative to a fixed domain x ∈ D (i.e. a fixed

question): For x ∈ D we write φ ≤x ψ, if φ→x ≤ ψ→x, where here the order defined above in labeled information algebras is used. This is a preorder in Φ (i.e. antisymmetry does not hold). If we restrict this order to the domain x, then on Φx it becomes a partial order coinciding with the order defined above on Φ, restricted to Φx. There is of course a strong relation between these three orders of information.

Theorem 4.1 Let (Φ,D) be a labeled information algebra and (Φ/σ, D) the associated domain-free algebra. Then φ ≤ ψ implies [φ]σ ≤ [ψ]σ, which in turn implies φ ≤x ψ for all x ∈ D. On the other hand, ∀x ∈ D, φ ≤x ψ implies [φ]σ⇒x ≤ [ψ]σ⇒x.

Proof. From φ ⊗ ψ = ψ we deduce that [φ]σ ⊗ [ψ]σ = [φ ⊗ ψ]σ = [ψ]σ, hence [φ]σ ≤ [ψ]σ. This last inequality implies that [φ]σ⇒x ≤ [ψ]σ⇒x (lemma 4.1 (5)). Thus we conclude that [φ→x]σ ≤ [ψ→x]σ, hence [φ→x]σ ⊗ [ψ→x]σ = [φ→x ⊗ ψ→x]σ = [ψ→x]σ, hence φ→x ⊗ ψ→x = ψ→x, therefore φ→x ≤ ψ→x. Conversely, φ ≤x ψ means that φ→x ≤ ψ→x, hence [φ→x]σ = [φ]σ⇒x ≤ [ψ→x]σ = [ψ]σ⇒x. ut According to Theorem 4.1 we could also define φ ≤x ψ equivalently by [φ]σ⇒x ≤ [ψ]σ⇒x through the domain-free version of the information algebra. Finally, we can also consider a domain-free algebra (Φ,D) and define φ ≤x ψ by φ⇒x ≤ ψ⇒x. To conclude we collect in the next lemma a few elementary properties of the preorder ≤x.

Lemma 4.3 ∀x, y ∈ D and ∀φ, ψ, φ1, ψ1, φ2, ψ2 ∈ Φ,

1. ey ≤x φ ≤x zy,

2. φ, ψ ≤x φ ⊗ ψ,

3. φ→y ≤x φ,

4. [φ]σ ≤ [ψ]σ implies φ→y ≤x ψ→y,

5. [φ1]σ ≤ [ψ1]σ and [φ2]σ ≤ [ψ2]σ imply φ1 ⊗ φ2 ≤x ψ1 ⊗ ψ2,

6. φ→y ⊗ ψ→y ≤x (φ ⊗ ψ)→y,

7. y ≤ z implies φ→y ≤x φ→z.

These properties can be easily proved using lemma 4.2 and theorem 4.1. We add an example to illustrate how the information order in a semiring-induced information algebra is induced by the semiring.

Example 4.1 Order Induced by a Semiring. In example 3.4 we have introduced information algebras induced by idempotent semirings. If A is a distributive lattice, a valuation ψ is a mapping from the set ΦJ of J-tuples (see example 3.2) into A. Then, by the definition of combination (see example 3.4) we have φ ≤ ψ if for all x ∈ Dd(φ)∪d(ψ),

(φ ⊗ ψ)(x) = φ(x↓d(φ)) × ψ(x↓d(ψ)) = ψ(x). (4.2)

This is the case if, and only if, d(φ) ≤ d(ψ) and for all x ∈ Dd(ψ) it holds that φ(x↓d(φ)) ≤ ψ(x), in the partial order of the lattice A. In particular, if d(φ) = d(ψ), then φ ≤ ψ if, and only if, φ(x) ≤ ψ(x) for all x ∈ Dd(ψ).

4.2 Ideal Completion of Information Algebras

Information may be asserted, that is, a piece of information may be stated to hold, to be true. Suppose that a body of information represented by an element φ from a domain-free information algebra Φ is asserted to hold. Then, intuitively, all pieces of information ψ which are part of φ, that is for which ψ ≤ φ holds, should hold too. Moreover, if two pieces of information φ and ψ are asserted, then their combination φ ⊗ ψ should also hold. So, an assertion of an information should be a set of elements from Φ, which satisfy these conditions. This leads to the following definition:

Definition 4.1 Ideals: A non-empty set I ⊆ Φ is called an ideal of the domain-free information algebra (Φ,D), if the following holds:

1. φ ∈ I and ψ ∈ Φ, ψ ≤ φ imply ψ ∈ I.

2. φ ∈ I and ψ ∈ I imply φ ⊗ ψ ∈ I.

Note that this concept is a notion related to a semi-lattice rather than to an information algebra (Davey & Priestley, 1990a). However, it follows immediately, that the neutral element e is contained in every ideal, and that, since φ⇒x ≤ φ, φ ∈ I implies also φ⇒x ∈ I for all domains x and every ideal I. This relates the notion of ideals more specifically to information algebras. Φ itself is an ideal. Any ideal I different from Φ is called a proper ideal. The set of all elements ψ ∈ Φ which are less informative than φ, including φ itself, forms also an ideal. It is called the principal ideal I(φ) of φ. An ideal, as stated above, represents a consistent and complete assertion of an information. The principal ideal I(φ) represents the simple information asserted by the element of information φ alone. The order-relation ψ ≤ φ can, in this respect, be considered as a relation stating that ψ is implied or entailed by φ.
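To make principal ideals tangible, here is a small Python sketch (an illustration under assumptions, not from the text) in the subset algebra of Example 3.9, where combination is intersection and φ ≤ ψ holds exactly when φ ⊗ ψ = ψ; the principal ideal of φ then consists of the supersets of φ within the universe.

```python
# Illustrative sketch: order and principal ideals in the subset algebra of Example
# 3.9 (combination = intersection). phi <= psi iff phi ⊗ psi = psi; the principal
# ideal I(phi) then consists of all supersets of phi within the universe U.

from itertools import combinations

U = frozenset({1, 2, 3})

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def leq(phi, psi):
    """phi <= psi in the information order: combining psi with phi changes nothing."""
    return phi & psi == psi

def principal_ideal(phi):
    """I(phi) = {psi : psi <= phi}."""
    return {psi for psi in subsets(U) if leq(psi, phi)}

print(sorted(map(sorted, principal_ideal(frozenset({1, 2})))))
# [[1, 2], [1, 2, 3]]: exactly the supersets of {1, 2}
```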

The intersection of an arbitrary number of ideals is clearly still an ideal. Thus we may define the ideal generated by a subset X of Φ by I(X) = ∩{I : I ideal in Φ, X ⊆ I}. (4.3)

This represents the full body of information generated, if the pieces of information φ in the set X are asserted. There is also an alternative representation of I(X).

Lemma 4.4 If X ⊆ Φ, then

I(X) = {φ ∈ Φ: φ ≤ φ1 ⊗ · · · ⊗ φm for some

finite set of elements φ1, . . . , φm ∈ X}. (4.4)

Proof. If φ1, . . . , φm ∈ X, then φ1, . . . , φm ∈ I(X), hence φ1 ⊗· · ·⊗φm ∈ I(X), hence φ ≤ φ1 ⊗ · · · ⊗ φm implies φ ∈ I(X). Conversely, the set on the right-hand side in the lemma is clearly an ideal which contains X. Thus this set contains I(X), hence it equals it. ut As a corollary we obtain that if the set X is finite, then

I(X) = I(∨X), (4.5)

The ideal generated from X is the principal ideal of the element ∨X of Φ. It is well known in lattice theory that a system of sets which is closed under intersection, such as ideals, forms a complete lattice (see for example (Davey & Priestley, 1990a)), where the supremum of two ideals I1 and I2 is defined as I1 ∨ I2 = I(I1 ∪ I2) = ∩{I : I ideal in Φ, I1 ∪ I2 ⊆ I}.

It is then obvious that this is the natural definition of the aggregation or the combination of two assertions of information. So we equip the family of ideals of an information algebra Φ with the operation of combination

I1 ⊗ I2 = I1 ∨ I2. (4.6)

There is an alternative, equivalent definition of this operation between ideals.

Lemma 4.5

I1 ⊗ I2 = {ψ ∈ Φ: ψ ≤ φ1 ⊗ φ2 for some φ1 ∈ I1, φ2 ∈ I2}. (4.7)

Proof. The right-hand side of equation (4.7) is clearly an ideal and it contains both I1 and I2. Hence it contains I1 ∨ I2. Assume then ψ an element of the right-hand side of equation (4.7), that is, ψ ≤ φ1 ⊗ φ2 and φ1 ∈ I1, φ2 ∈ I2. But, if I is an ideal containing both I1 and I2, we have

φ1 ∈ I and φ2 ∈ I, hence φ1 ⊗ φ2 ∈ I and therefore ψ ∈ I. This tells us that the right-hand side of equation (4.7) is contained in I1 ∨ I2. □

We may also equip the family of ideals of Φ with a focusing operation. If x ∈ D define

I⇒x = {ψ ∈ Φ: ψ ≤ φ⇒x for some φ ∈ I}. (4.8)

It is easy to verify that this is indeed an ideal. Of course, we have φ⇒x ∈ I⇒x, if φ ∈ I. The family IΦ of ideals, equipped with the operations of combination and focusing as defined above is nearly a domain-free information algebra. This statement is made more precise in the following theorem.

Theorem 4.2 The algebraic structure (IΦ,D), where the operations ⊗ and ⇒ are defined by (4.6) or (4.7) and (4.8), satisfies all axioms of a domain-free information algebra, except the support axiom. If the lattice D has a top element, then (IΦ,D) is an information algebra.

Proof. (1) The semi-lattice axioms (idempotent semi-group) are satisfied since IΦ is a lattice, as remarked above. (2) There is not necessarily a support x for an ideal I. If however D has a top element ⊤, then this is a support for any ideal. In this case the support axiom is satisfied. (3) The ideal I(e) = {e} is clearly the neutral element of combination and I(e)⇒x = I(e), since e⇒x = e for all domains x. (4) The ideal Φ is the null element and Φ⇒x = Φ since z⇒x = z and φ ≤ z for all φ ∈ Φ. (5) Here is the verification of the focusing axiom:

(I⇒x)⇒y = {ψ : ψ ≤ φ′⇒y, φ′ ∈ I⇒x} = {ψ : ψ ≤ (φ⇒x)⇒y = φ⇒x∩y, φ ∈ I} = I⇒x∩y.

(6) To verify the combination axiom, we use equation (4.7):

(I1⇒x ⊗ I2)⇒x = {ψ : ψ ≤ φ⇒x, φ ∈ I1⇒x ⊗ I2}
= {ψ : ψ ≤ (φ′1 ⊗ φ2)⇒x, φ′1 ∈ I1⇒x, φ2 ∈ I2}
= {ψ : ψ ≤ (φ1⇒x ⊗ φ2)⇒x = φ1⇒x ⊗ φ2⇒x, φ1 ∈ I1, φ2 ∈ I2}
= {ψ : ψ ≤ φ′1 ⊗ φ′2, φ′1 ∈ I1⇒x, φ′2 ∈ I2⇒x}
= I1⇒x ⊗ I2⇒x.

(7) Idempotency can be verified as follows:

I ⊗ I⇒x = {ψ : ψ ≤ φ ⊗ φ′, φ ∈ I, φ′ ∈ I⇒x} = {ψ : ψ ≤ φ ⊗ φ″⇒x, φ, φ″ ∈ I}.

Since φ ⊗ φ″⇒x ≤ φ ⊗ φ″ ∈ I, it follows that ψ ∈ I ⊗ I⇒x implies ψ ∈ I, hence I ⊗ I⇒x ⊆ I. The converse inclusion is evident. □

The structure (IΦ,D) with the operations ⊗ and ⇒ as defined above, is called the ideal completion of Φ. This terminology is justified by the following theorem:

Theorem 4.3 The domain-free information algebra (Φ,D) can be embedded into the structure (IΦ,D).

Proof. We show that the mapping I : Φ → IΦ defined by φ ↦ I(φ) is an embedding. That is, we show that

1. I(φ ⊗ ψ) = I(φ) ⊗ I(ψ),

2. I(φ⇒x) = I⇒x(φ),

3. I(e) = {e},

4. I(z) = Φ,

5. I(φ) = I(ψ) implies φ = ψ.

(1) is a direct consequence of equation (4.7) applied to the principal ideals I(φ) and I(ψ). (2) We have that ψ ∈ I⇒x(φ) if, and only if, ψ ≤ φ⇒x, which holds if, and only if, ψ ∈ I(φ⇒x). (3) holds since there are no elements different from e for which φ ≤ e holds. (4) holds since for all elements φ ∈ Φ we have φ ≤ z. (5) ψ ∈ I(φ) means that ψ ≤ φ and φ ∈ I(ψ) means that φ ≤ ψ, thus φ = ψ. □

We shall reconsider the structure (IΦ,D) in Section 4.5. If the support axiom does not hold, we cannot derive a labeled version from the domain-free algebra (IΦ,D). However, in many cases in the original algebra (Φ,D), D has a top element, or it can be adjoined. This is for example the case for algebras based on a multivariate structure, or if D is a partition lattice. Ideal completion is essentially an order-theoretic concept. Therefore, it can also be defined for a labeled information algebra (Φ,D). Here, we consider ideals in Φx for any domain x. Thus, we change Definition 4.1 slightly:

Definition 4.2 Labeled Ideals: A non-empty set I ⊆ Φx is called an ideal of the domain x, if the following holds:

1. φ ∈ I and ψ ∈ Φx, ψ ≤ φ imply ψ ∈ I.

2. φ ∈ I and ψ ∈ I imply φ ⊗ ψ ∈ I.

Again as before, an ideal is a collection of consistent and complete information, this time with respect to a specified domain or question. An ideal

I in Φx is labeled with the domain d(I) = x. Let IΦx be the set of ideals in Φx. An ideal in Φx is called proper if it is different from Φx. For φ ∈ Φx, the set I(φ) = {ψ ∈ Φx : ψ ≤ φ} is an ideal, called a principal ideal. Define

IΦ = ∪x∈D IΦx.

Within IΦ we define the operations of combination between ideals and pro- jection of ideals according to the models of (4.7) and (4.8):

1. Combination of Ideals: Let I1 ∈ IΦx and I2 ∈ IΦy . Then we define

I1 ⊗ I2 = {ψ ∈ Φx∨y : ψ ≤ φ1 ⊗ φ2 for some φ1 ∈ I1, φ2 ∈ I2}. (4.9)

2. Projection of ideals: Let I ∈ IΦy and x ≤ y. Then we define

↓x ↓x I = {ψ ∈ Φx : ψ ≤ φ for some φ ∈ I}. (4.10)

It is easy to verify that both I1 ⊗ I2 and I↓x are indeed ideals. So the definitions above are well founded. Note also that φ↓x ∈ I↓x, if φ ∈ I. As is to be expected, (IΦ,D) with labeling, combination and projection as defined above is a labeled information algebra:

Theorem 4.4 The algebraic structure (IΦ,D) with labeling, combination and projection as defined above is a labeled information algebra.

Proof. (1) Labeling follows directly from the definition of combination and projection. (2) Semigroup: Commutativity of combination follows directly from the definition. To verify associativity we invoke labeling and obtain for I1 ∈ IΦx ,

I2 ∈ IΦy and I3 ∈ IΦz

I1 ⊗ (I2 ⊗ I3)

= {ψ ∈ Φx∨y∨z : ψ ≤ φ1 ⊗ η, for some φ1 ∈ I1, η ∈ I2 ⊗ I3}

= {ψ ∈ Φx∨y∨z : ψ ≤ φ1 ⊗ η, for some φ1 ∈ I1,

η ≤ φ2 ⊗ φ3 for some φ2 ∈ I2, φ3 ∈ I3}

= {ψ ∈ Φx∨y∨z : ψ ≤ φ1 ⊗ φ2 ⊗ φ3, for some φ1 ∈ I1, φ2 ∈ I2, φ3 ∈ I3}.

The same result can also be derived for (I1 ⊗ I2) ⊗ I3. This proves associativity of combination.

(3) Neutrality: The principal ideal {ex} is the neutral element of IΦx , since for any I ∈ IΦx ,

{ex} ⊗ I = {ψ ∈ Φx : ψ ≤ ex ⊗ φ = φ for some φ ∈ I} = I, and, for y ≤ x,

{ex}↓y = {ψ ∈ Φy : ψ ≤ ex↓y = ey} = {ey}.

(4) Nullity: The null element in IΦx is Φx. Certainly we have Φx⊗I = Φx for any I ∈ IΦx , since I ⊆ Φx. Further, if x ≤ y,

Φx ⊗ {ey} = {ψ ∈ Φy : ψ ≤ φ ⊗ ey = φ↑y for some φ ∈ Φx} = Φy, since for all ψ ∈ Φy it holds that ψ ≤ zx ⊗ ey = zy. (5) Projection: Let x ≤ y ≤ z = d(I). Then,

(I↓y)↓x = {ψ ∈ Φy : ψ ≤ η↓y for some η ∈ I}↓x
= {ψ ∈ Φx : ψ ≤ φ↓x for some φ ≤ η↓y for some η ∈ I}
= {ψ ∈ Φx : ψ ≤ η↓x for some η ∈ I} = I↓x.

(6) Combination: Let I1 ∈ IΦx and I2 ∈ IΦy . Then,

(I1 ⊗ I2)↓x = {ψ ∈ Φx : ψ ≤ φ↓x for some φ ∈ I1 ⊗ I2}
= {ψ ∈ Φx : ψ ≤ φ↓x for some φ ≤ φ1 ⊗ φ2 for some φ1 ∈ I1, φ2 ∈ I2}
= {ψ ∈ Φx : ψ ≤ (φ1 ⊗ φ2)↓x = φ1 ⊗ φ2↓x∧y for some φ1 ∈ I1, φ2 ∈ I2}
= {ψ ∈ Φx : ψ ≤ φ1 ⊗ η2 for some φ1 ∈ I1, η2 ∈ I2↓x∧y}
= I1 ⊗ I2↓x∧y.

(7) Idempotency: Let I ∈ IΦy and x ≤ y. Then

I ⊗ I↓x = {ψ ∈ Φy : ψ ≤ φ ⊗ η for some φ ∈ I, η ∈ I↓x}
= {ψ ∈ Φy : ψ ≤ φ ⊗ η↓x for some φ, η ∈ I} ⊆ I,

since φ ⊗ η↓x ≤ φ ⊗ η ∈ I. On the other hand certainly I ⊆ I ⊗ I↓x, since we may select η = ey; hence I ⊗ I↓x = I. □

Just as with domain-free algebras, the labeled algebra (Φ,D) can be embedded into the labeled algebra (IΦ,D) by the mapping φ ↦ I(φ). We formulate the corresponding theorem; its proof is similar to the proof of Theorem 4.3 and is therefore omitted.

Theorem 4.5 The labeled information algebra (Φ,D) can be embedded into the labeled information algebra (IΦ,D).

Again, this justifies to call (IΦ,D) the ideal completion of the algebra (Φ,D). We leave it to the reader to examine the relation between the ideal completions of associated labeled and domain-free information algebras.

4.3 Atomic Algebras and Relational Systems

Many information algebras contain, for any domain x, most informative elements different from the null element. In relational algebra (example 3.3), for example, the one-tuple relations on a certain set of attributes are, in the sense of the information ordering, the most informative elements. In subset systems these are the one-element sets. Such an information is called atomic. This notion can be defined relative to any labeled information algebra:

Definition 4.3 Atoms: An element α ∈ Φ with d(α) = x is called an atom on x, if

1. α 6= zx.

2. For all φ ∈ Φ, d(φ) = x and α ≤ φ implies either α = φ or φ = zx.

This says that no information in a domain, except the null information, can be more informative than an atom. Clearly, the one-tuple relations are exactly the atoms in the relational algebra RT (see Section 3.3). By abuse of language, we shall simply say that tuples are atoms in the relational algebras. On the other hand, an information algebra may well have no atoms. Here are some elementary results about atoms:

Lemma 4.6 1. If α is an atom on x, φ ∈ Φ with d(φ) ≤ x, then α ⊗ φ = α or zx.

2. If α is an atom on x, and y ≤ x, then α↓y is an atom on y.

3. If α is an atom on x and φ ∈ Φ with d(φ) = x, then either φ ≤ α or α ⊗ φ = zx.

4. If α, β are atoms on x, then either α = β or α ⊗ β = zx.

Proof. (1) Let α ⊗ φ = α ⊗ φ↑x = η such that, multiplying both sides by φ↑x and η, we obtain α ⊗ (φ↑x ⊗ η) = φ↑x ⊗ η. This means that α ≤ φ↑x ⊗ η. Since α is an atom we have either

α = φ↑x ⊗ η or φ↑x ⊗ η = zx.

In the first case we obtain that α ⊗ φ = α, in the second case we obtain that α ⊗ φ = zx. (2) Since α ≠ zx we have also that α↓y ≠ zy (Lemma 3.2 (2)). Assume that α↓y ≤ φ and d(φ) = y. We have (α ⊗ φ)↓y = α↓y ⊗ φ = φ. From (1) of this lemma it follows that either

α ⊗ φ = α or α ⊗ φ = zx. In the first case we obtain α↓y = (α ⊗ φ)↓y = φ and in the second case φ = (α ⊗ φ)↓y = zx↓y = zy (see Lemma 3.2 (1)). This proves that α↓y is an atom on y. (3) is an immediate consequence of (1) of this lemma. (4) Since α is an atom it follows from (1) of this lemma that either α ⊗ β = α or α ⊗ β = zx. In the first case we have that β ≤ α, hence α = β, since β is also an atom. □

Denote by Atx(Φ) the set of all atoms on the domain x in the information algebra Φ and denote by At(Φ) the set of all atoms in Φ. Furthermore define, for φ ∈ Φ,

At(φ) = {α ∈ At(Φ) : d(α) = d(φ), φ ≤ α}. (4.11)

If α ∈ At(φ) we say also that α is an atom in φ or contained in φ. Clearly, in the relational algebra over a tuple system, a relation R over domain x contains all the tuples f ∈ R. Strictly speaking this means that At(R) = {{f} : f ∈ R}. In the example of the relational algebra RT , any element contains at least one atom, except the empty relations zx. More than that, any relation R which is not empty is a lower bound of all the tuples it contains. More precisely, R ≤ {f} for all f ∈ R. But if a relation S on the same domain is another lower bound of At(R), then this means that S ≤ {f}, that is, f ∈ S, for all f ∈ R, hence R ⊆ S. So we have S ≤ R for any other lower bound S. Hence, R is the largest lower bound, the infimum of At(R), R = ∧At(R). Moreover, any set of tuples R has an infimum in RT , namely ∧{{f} : f ∈ R} = R.

Any element of the algebra RT is determined by the atoms it contains, and any set of atoms defines an element of RT . These are strong properties which show that the relational algebra RT really is essentially an algebra of sets. This motivates the following definitions:

Definition 4.4 Atomic Algebras.

1. A labeled information algebra Φ is called atomic, if for all φ ∈ Φ which are not null elements, the set of atoms At(φ) is not empty.

2. It is called atomic composed, if it is atomic and if for all φ ∈ Φ

φ = ∧At(φ). (4.12)

3. It is called atomic closed, if it is atomic composed and if for every subset A ⊆ Atx(Φ) the infimum exists and belongs to Φ.

Example 4.2 Systems of Subsets. We have seen before several examples of information algebras whose elements are subsets of some frames. For instance in Example 3.1 subsets of frames forming a system of compatible frames constitute an information algebra. Clearly, the one-element sets on any frame are atoms in that frame. The algebra is atomic closed. In fact, this algebra is an example of a generalized relational algebra (see also file systems, Section 3.3). The same is the case of the algebra of subsets in a multivariate structure (see Example 3.2). The information algebras associated with propositional logic (see Example 3.8) and predicate logic (see Example 3.11) are particular cases of subset algebras in multivariate structures. All of them are atomic closed and in fact instances of a relational algebra.
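In the subset picture the atomic-composed property can be checked mechanically; the following sketch assumes a small finite frame and uses the fact that, under the reversed-inclusion order of the subset algebra, the infimum of a family of subsets is their union.

OMEGA = {1, 2, 3, 4}                      # hypothetical frame

def atoms_in(phi):
    # At(phi): the one-element subsets below phi in the information order
    return [frozenset({a}) for a in phi]

def infimum(subsets):
    # infimum = union, since the order is reversed inclusion
    out = set()
    for s in subsets:
        out |= s
    return frozenset(out)

phi = frozenset({2, 3})
print(infimum(atoms_in(phi)) == phi)      # True: phi = ∧ At(phi), atomic composed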

Example 4.3 Affine Spaces. Any one-vector set in a linear space F^s (see Example 3.6) is an affine space in that linear space and it is clearly an atom of the algebra of affine spaces. This algebra is not atomic closed, but atomic composed, since an affine space is the set of all vectors it contains. The vectors of F^s are s-tuples, and the information algebra of affine spaces is embedded into the relational algebra of these tuples. An interesting property of this atomic composed algebra is that a finite number of atoms of an affine space φ suffice to determine φ. In fact, an affine space φ, defined by a system of m ≤ |s| linearly independent linear equations S(φ), is determined by n = |s| − m + 1 vectors v1, . . . , vn in the space. If αi = {vi} are the associated atomic affine spaces in F^s, then

φ = α1 ∧ · · · ∧ αn. (4.13)

Hence, affine spaces are finitely generated by atoms they contain. Thus an affine space, which in general is an infinite set, can either be represented by a finite system of linear equations or, alternatively, by a finite set of atoms, which span the space.
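A rough numerical sketch of this finite generation, using NumPy and a hypothetical membership test (not the author's construction): the affine space spanned by the atoms {v1}, . . . , {vn} is recovered as the set of points expressible as v1 plus a linear combination of the differences vi − v1.

import numpy as np

def in_affine_hull(v, generators, tol=1e-9):
    # Test whether v lies in the affine space generated by the given vectors.
    vs = np.asarray(generators, dtype=float)
    base, dirs = vs[0], (vs[1:] - vs[0]).T
    rhs = np.asarray(v, dtype=float) - base
    if dirs.size == 0:
        return bool(np.allclose(rhs, 0.0, atol=tol))
    t, *_ = np.linalg.lstsq(dirs, rhs, rcond=None)
    return bool(np.allclose(dirs @ t, rhs, atol=tol))

atoms = [[0, 0, 1], [1, 1, 1], [2, 2, 1]]       # three points on a line
print(in_affine_hull([5, 5, 1], atoms))         # True: on the same line
print(in_affine_hull([1, 2, 1], atoms))         # False: off the line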

Example 4.4 Convex Polyhedra and Sets. Like affine spaces (previous Example 4.3), convex polyhedra, or more generally, convex sets, are composed

of vectors in R^s, which, as one-vector sets, are atoms in the respective information algebras of convex polyhedra or convex sets. Both algebras are therefore atomic composed, but not atomic closed, since not every set of vectors is convex. It is well known that a polytope is the convex hull of its extremal points. This means that a polytope is determined by a finite number of certain atoms it contains (its extremal points). Thus polytopes are, like affine spaces (Example 4.3), finitely generated by some atoms. This does not hold for polyhedra in general.

Here are a few simple results on sets of atoms.

Lemma 4.7 Let (Φ,D) be an atomic, labeled information algebra. Then the following results hold.

1. ∀φ, ψ ∈ Φx, φ ≤ ψ implies At(φ) ⊇ At(ψ).

2. ∀φ, ψ ∈ Φ and ∀x ∈ D, φ ≤ ψ implies At(φ→x) ⊇ At(ψ→x).

3. ∀φ, ψ ∈ Φx, At(φ ⊗ ψ) = At(φ) ∩ At(ψ).

Proof. (1) Let α ∈ At(ψ), i.e. φ ≤ ψ ≤ α. Hence α ∈ At(φ). (2) φ ≤ ψ implies φ→x ≤ ψ→x (Lemma 4.2 (11)). Then (2) follows from (1) above. (3) Assume α ∈ At(φ ⊗ ψ), i.e. α ≥ φ ⊗ ψ ≥ φ, ψ. Then α ∈ At(φ) and α ∈ At(ψ), hence α ∈ At(φ) ∩ At(ψ). Conversely, assume α ∈ At(φ) ∩ At(ψ), i.e. α ≥ φ, ψ, hence α ≥ φ ⊗ ψ, therefore α ∈ At(φ ⊗ ψ). □

A relational information algebra RT on a tuple system T is atomic closed and the tuples (or the one-tuple relations) are its atoms. Not surprisingly, there are close relations between relational information algebras and atomic information algebras and between tuples and atoms. We begin by showing that if an information algebra is atomic, then its atoms form a tuple system.

Lemma 4.8 If the labeled information algebra (Φ,D) is atomic, then (At(Φ),D) is a tuple system with domain d and projection ↓.

Proof. We have to prove the five axioms of a tuple system (see Section 3.3). The first two axioms correspond simply to the marginalization and transitivity axioms of the information algebra and the third one is a direct consequence of the labeling axiom. In order to prove axiom (4) assume d(α) = x, d(β) = y and α↓x∧y = β↓x∧y for two atoms α and β. Suppose first that α ⊗ β = zx∨y. Then,

α = (α ⊗ β)↓x = zx∨y↓x = zx.

But α ≠ zx and therefore α ⊗ β ≠ zx∨y.

Since Φ is atomic, the set At(α ⊗ β) is not empty. Let γ ∈ At(α ⊗ β). Then d(γ) = x ∨ y and α ⊗ β ≤ γ. It follows, using Lemma 4.2 (11), that

α = (α ⊗ β)↓x ≤ γ↓x.

But α is an atom, hence γ↓x = α or γ↓x = zx. The latter case is excluded, since γ ≠ zx∨y (Lemma 3.2 (2)). So we have γ↓x = α. γ↓y = β is obtained in the same way. So axiom (4) of a tuple system is satisfied. In order to verify axiom (5) let α be an atom with d(α) = x ≤ y. Suppose that α ⊗ ey = α↑y = zy. But then we obtain that α = (α ⊗ ey)↓x = zy↓x = zx. But this is not possible since α is an atom. Therefore α↑y ≠ zy. Because Φ is atomic, the set At(α↑y) is not empty. Let β ∈ At(α↑y). Then d(β) = y and α↑y ≤ β. It follows that

α = (α↑y)↓x ≤ β↓x.

But α is an atom. Therefore we have either β↓x = α or β↓x = zx. The latter case is excluded since β ≠ zy (Lemma 3.2 (2)). This proves axiom (5) of tuple systems. □

The tuple system (At(Φ), D, d, ↓) is a subsystem of the tuple system (Φ, D, d, ↓) (see Lemma 3.4). In other words, if Φ is atomic, the atoms in Φ already form a tuple system themselves. We call the relational algebra associated with this tuple system RAt(Φ). The sets At(φ) belong to RAt(Φ) for every element φ in Φ. So, At can be seen as a mapping from Φ into RAt(Φ).

Theorem 4.6 If (Φ,D) is an atomic information algebra, then At :Φ → RAt(Φ) is a homomorphism.

Proof. We have to show the following:

1. At(φ ⊗ ψ) = At(φ) ⋈ At(ψ).

2. At(φ↓x) = πx(At(φ)).

3. At(ex) = Ex.

(1) Let d(φ) = x and d(ψ) = y. Assume that α ∈ At(φ ⊗ ψ). This means that φ ⊗ ψ ≤ α, hence φ ≤ α and therefore by the monotonicity of marginalization φ = φ↓x ≤ α↓x. But α↓x is an atom on x (Lemma 4.6 (2)) and thus α↓x ∈ At(φ). In the same way we find that α↓y ∈ At(ψ). Since d(α) = x ∨ y this shows that α ∈ At(φ) ⋈ At(ψ). Conversely, assume that α ∈ At(φ) ⋈ At(ψ). Then d(α) = x ∨ y, α↓x ∈ At(φ), and φ ≤ α↓x ≤ α and also ψ ≤ α↓y ≤ α. This implies φ ⊗ ψ ≤ α and hence α ∈ At(φ ⊗ ψ). This proves (1).

(2) Assume first that α ∈ At(φ↓x). This means that d(α) = x and φ↓x ≤ α. Let d(φ) = y. Then we have φ ≤ φ ⊗ α. Suppose that φ ⊗ α = zy. Then we obtain

α = φ↓x ⊗ α = (φ ⊗ α)↓x = zy↓x = zx.

But this is not possible, since α is an atom. Thus φ ⊗ α ≠ zy. Hence, because Φ is atomic, At(φ ⊗ α) ≠ ∅. Let β ∈ At(φ ⊗ α) such that d(β) = y and φ ≤ φ ⊗ α ≤ β. This shows that β ∈ At(φ). But we have also that

α = (φ ⊗ α)↓x ≤ β↓x.

Since α and β↓x are both atoms, we must have α = β↓x. This shows that α ∈ πx(At(φ)). Conversely, assume α ∈ πx(At(φ)), i.e. there is a β ∈ At(φ) such that α = β↓x. We have φ ≤ β, hence φ↓x ≤ β↓x and thus α = β↓x ∈ At(φ↓x). This proves (2). (3) follows directly from the definition of At. □

This theorem has two important results as corollaries:

Corollary 4.1

1. If Φ is atomic composed, then At is an embedding.

2. If Φ is atomic closed and if for X,Y ⊆ Atx(Φ), X 6= Y implies ∧X 6= ∧Y , then At is an isomorphism.

Proof. (1) Assume that At(φ) = At(ψ). Then φ = ∧At(φ) = ∧At(ψ) = ψ. (2) Let X ⊆ Atx(Φ) for some domain x. That is, X is a relation in RAt(Φ). Since Φ is atomic closed, ∧X exists in Φ and ∧X = ∧At(∧X). But this implies that X = At(∧X), hence the mapping At is onto RAt(Φ). □

The second part of Corollary 4.1 above says that an atomic closed information algebra is, under some additional condition, essentially identical to a relational algebra over a tuple system. It is a system of subsets. As such it has much more structure than an information algebra. The first part of the corollary says that an atomic composed information algebra is embedded in such a subset system. Atomicity is also closely related to partitions or compatible frames, as the following theorem shows:

Theorem 4.7 If (Φ,D) is an atomic information algebra, then, for x ≤ y, Aty(Φ) is a refining of Atx(Φ) and the latter is a coarsening of the former one.

Proof. Define τ : Atx(Φ) → P(Aty(Φ)) for α ∈ Atx(Φ) by

τ(α) = {β ∈ Aty(Φ) : β↓x = α}.

We claim that τ is a refinement (see Example 2.3 in Section 2.2). First we note that, since Φ is atomic, there is a β ∈ At(α↑y). But then α = (α↑y)↓x ≤ β↓x, hence β↓x = α and β ∈ τ(α). Thus, ∀α ∈ Atx(Φ), the set τ(α) is not empty. Next let α′ ≠ α″ ∈ Atx(Φ). Assume β ∈ τ(α′) ∩ τ(α″), i.e. β↓x = α′ and β↓x = α″. But this implies α′ = α″, contrary to the assumption. So τ(α′) ∩ τ(α″) = ∅. Finally we have

∪{τ(α) : α ∈ Atx(Φ)} ⊆ Aty(Φ).

Consider β ∈ Aty(Φ). Then β↓x ∈ Atx(Φ), hence β ∈ τ(β↓x). Thus we have

∪{τ(α) : α ∈ Atx(Φ)} = Aty(Φ).

This concludes the proof that τ is a refinement of Atx(Φ). □

Let F = {Atx(Φ) : x ∈ D}. Then we may consider F as a family of frames, with the order Atx(Φ) ≤ Aty(Φ) if x ≤ y. F is in fact a lattice with Atx(Φ) ∨ Aty(Φ) = Atx∨y(Φ) and Atx(Φ) ∧ Aty(Φ) = Atx∧y(Φ), which is isomorphic to the lattice D. Therefore F is a compatible family of frames (see Example 2.3). Hence RAt(Φ) can be seen as an information algebra of subsets of frames from F and (Φ,D) is homomorphic to this algebra. Clearly, if D has a top element ⊤, then all Atx(Φ) are essentially partitions of At⊤(Φ) and F is a partition lattice.
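In the multivariate subset picture this refinement is simply cylindrical splitting of atoms; the tiny sketch below, with hypothetical frames, maps each atom on domain {X} to the block of atoms on {X, Y} projecting onto it.

FRAMES = {"X": [0, 1], "Y": ["a", "b"]}   # hypothetical frames

def tau(alpha_x):
    # refinement: atoms on {X, Y} whose projection to {X} is alpha_x
    return [(alpha_x, b) for b in FRAMES["Y"]]

print(tau(0))   # [(0, 'a'), (0, 'b')]
print(tau(1))   # [(1, 'a'), (1, 'b')] -- disjoint blocks covering At_{X,Y}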

4.4 Finite Elements and Approximation

In this section we model another aspect of information, which has not been considered so far in this text. Computers can treat only “finite” information. “Infinite” information can however often be approximated by “finite” elements. We shall try to model this aspect of finiteness in this section. It must be stressed that we will not capture every aspect of finiteness. For example, we shall not treat questions of computability and related issues, which are also linked to the notion of finiteness. This is postponed to part II. The structures considered here bring the subject of information algebras close to domain theory. Many aspects we treat in this section are treated in domain theory, and in much more detail than here. Therefore we shall keep the presentation short. However, the one crucial feature not addressed in domain theory is focusing. Also domain theory places more emphasis on order, approximation and convergence of information and less on combination. So, although the subject is similar to domain theory, it is treated here with a different emphasis and goal.

We consider a domain-free information algebra (Φ,D). In the set Φ of pieces of information we single out a subset Φf of elements which are considered to be finite. We shall see in a moment in what respect these elements can be considered to be finite. In any case, combination of finite information should again yield finite information. So Φf is assumed to be closed under combination. The neutral element e is considered to be finite and belongs thus to Φf . Maybe one would expect that focusing of finite information yields also finite information. Although this is reasonable enough in many cases, we do not assume it for the purposes of this section. Any information φ in Φ should be approximated with an arbitrary degree of precision by a system of finite pieces of information. Each approximating finite information should be coarser than φ. The latter must then be the supremum of the approximating elements. Inversely, the approximating system of finite elements must, for any element of it, or more generally for any finite subset of it, contain a finer finite information. This means that φ is approximated by more and more informative finite elements and this captures the idea of convergence to the supremum. This concept corresponds to a directed set:

Definition 4.5 Directed Set. A nonempty subset X of Φ is called a directed set, if it contains with any finite subset {φ1, φ2, . . . , φm} ⊆ X, an element φ ∈ X finer than all the elements of the subset,

φ1 ⊗ φ2 ⊗ · · · ⊗ φm ≤ φ.

So, X is directed, if it is non-empty and for any two pieces of information φ and ψ in X there exists an η ∈ X such that φ ≤ η and ψ ≤ η. A directed set X of Φ is said to converge in Φ, if the supremum of X exists in Φ, that is ∨X ∈ Φ. In order to model approximation of an information φ in Φ by finite elements, we require first that all directed sets of finite pieces of information converge in Φ. This means that the respective suprema are elements of Φ which are approximated by finite elements. But we want more: any element of Φ must be approximated in this way. The finite elements must be dense in Φ, that is any element of Φ is the supremum of a directed set of finite elements. It seems natural to take the directed set of all finite elements smaller than φ as the converging set,

Aφ = {ψ ∈ Φf : ψ ≤ φ}. (4.14) and require that

φ = ∨Aφ. (4.15)

These two requirements do not yet say much about finiteness. One thing they say is that if φ is finite, then, according to equation (4.14), it belongs itself to its converging set. This is surely an important property of finiteness. But again we want more. We may possibly approximate an element by a directed set X of finite elements which is smaller than the set of all finite elements smaller than ∨X. But then, if φ is a finite element, φ ≤ ∨X, and φ does not belong to X, there must at least exist an element ψ in X finer than φ. This is a so-called compactness property. So we impose the following conditions on Φ and Φf .

1. Convergence: If X ⊆ Φf is a directed set, then the supremum ∨X over X exists and belongs to Φ.

2. (Weak) Density: For φ ∈ Φ

φ = ∨{ψ ∈ Φf : ψ ≤ φ}. (4.16)

3. Compactness: If X ⊆ Φf is a directed set, and φ ∈ Φf such that φ ≤ ∨X, then there exists a ψ ∈ X such that φ ≤ ψ.

Whereas these requirements clearly must be satisfied, we need sometimes a stronger requirement regarding density, namely, that an information φ supported by some domain x can be approximated by finite pieces of information supported by the same domain. This is expressed by the following strong version of density: For φ ∈ Φ and x ∈ D,

φ⇒x = ∨{ψ ∈ Φf : ψ ≤ φ, ψ = ψ⇒x}. (4.17)

Below we shall see examples of information algebras satisfying the requirement of density, but not the strong density. This shows that (4.16) does not imply (4.17).

Definition 4.6 Compact Domain-Free Information Algebra. A system (Φ, Φf ,D), where (Φ,D) is a domain-free information algebra, Φf is closed under combination and contains the empty information e, satisfying the three conditions of convergence, density and compactness formulated above, is called a compact information algebra. If moreover the requirement of strong density is satisfied, the algebra is called strongly compact.

Here are a few examples of compact information algebras:

Example 4.5 Finite Information Algebras: If the information algebra (Φ,D) is finite, i.e. if Φ has only finitely many elements, then it is (trivially) compact. In fact all elements are finite, Φf = Φ. Convergence and density, even strong density, hold clearly. If X is a (necessarily finite) directed set, X = {φ1, . . . , φm}, then φ1 ⊗ · · · ⊗ φm ∈ X, by the directedness of X. This implies compactness. Of course this is not a particularly interesting example of a compact algebra.

Example 4.6 Convex Sets: The convex polyhedra are the finite elements of the information algebra of convex sets in R^n. In this example again strong density holds: Clearly, the projections of convex polyhedra are still convex polyhedra and convex polyhedra on a given domain x can be used to approximate convex sets on the same domain. In fact, convex polyhedra form a subalgebra of convex sets. We refer to Section 4.6 for a discussion of such an algebra in its labeled version. The full proof is left to the reader.

Example 4.7 Cofinite Sets: Consider the domain-free information algebra of subsets of R^n. A cofinite set in R^n is the complement of a finite set. Cofinite sets are closed under intersection. Any subset of R^n is the intersection of all cofinite sets it is contained in. This indicates that the cofinite sets of R^n are the finite elements of the domain-free information algebra of subsets of R^n. But strong density does not hold, since projections of cofinite sets to subspaces of R^n are no longer cofinite. So, this algebra is not strongly compact.

We remark that the strong density equation (4.17) implies equation (4.16) because of compactness. So in a compact information algebra, any element is approximated by all the finite elements it dominates. However, the converse, namely that equation (4.16) implies strong density, equation (4.17), is not true in general, as the example of cofinite sets above shows. Here is an even more explicit counter-example: Take for Φ the set {0, 1, 2, . . .} ∪ {ω}. Define combination as follows: φ ⊗ ψ = max(φ, ψ) if φ ≠ ω and ψ ≠ ω, and φ ⊗ ψ = ω otherwise.

For the lattice take D = {0, 1}, where 0 < 1. The focusing operation is defined as follows:

φ⇒1 = φ, and φ⇒0 = 0 if φ ≠ ω, φ⇒0 = ω otherwise.
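The little sketch below (with ω represented by a string, a purely illustrative choice) makes the failure of strong density at φ = ω explicit; it anticipates the claim made in the next paragraph.

OMEGA = "omega"                     # stands for the top element ω

def combine(a, b):
    return OMEGA if OMEGA in (a, b) else max(a, b)

def focus(phi, x):
    if x == 1:
        return phi
    return OMEGA if phi == OMEGA else 0      # focusing to domain 0

# Weak density at ω: the directed set {0, 1, 2, ...} of finite elements
# below ω has supremum ω, so (4.15) holds.
# Strong density at ω and x = 0: the finite elements psi with psi = focus(psi, 0)
# are only psi = 0, whose supremum is 0, while focus(ω, 0) = ω.
print([psi for psi in range(5) if focus(psi, 0) == psi])   # [0]
print(focus(OMEGA, 0))                                     # 'omega'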

As finite elements take the set {0, 1, 2, . . .}. Then equation (4.15) is satisfied, but (4.17) is not. However, in case Φf is closed also under focussing, that is, (Φf ,D) is a subalgebra of (Φ,D), then weak density does imply strong density. This will be stated and proved later in Lemma 4.9.

The first main result is stated in the next theorem. It is less a theorem of information algebra than one of lattice theory. We refer therefore to (Davey & Priestley, 1990a) for similar results and more details.

Theorem 4.8 Let (Φ, Φf ,D) be a compact information algebra. Then:

1. Φ is a complete lattice.

2. An information φ ∈ Φ belongs to Φf if, and only if, for all directed subsets X ⊆ Φ, if φ ≤ ∨X, then there exists a ψ ∈ X such that φ ≤ ψ.

3. An information φ ∈ Φ belongs to Φf if, and only if, for all subsets X ⊆ Φ, if φ ≤ ∨X, then there exists a finite subset Y ⊆ X such that φ ≤ ∨Y .

Proof. (1) Let X be a subset of Φ. We show first that X has a greatest lower bound, an infimum. To obtain the greatest lower bound of X we define Y to be the set of all the finite lower bounds of X, i.e.

Y = {ψ ∈ Φf : ψ ≤ φ for all φ ∈ X}.

It is easy to see that Y is a directed subset of Φf and therefore, by the convergence axiom, the supremum of Y exists in Φ. We claim that ∨Y is the infimum of X. Obviously, ∨Y is a lower bound of X, since every element of X is an upper bound of Y . Let ψ be another lower bound of X, and by the density axiom ψ = ∨Aψ. But Aψ is contained in Y and thus ψ = ∨Aψ ≤ ∨Y . So ∨Y is the greatest lower bound of X. To prove that each subset X of Φ has a supremum, we consider the set of its upper bounds. The infimum of this set exists, by what we have just proved. With standard methods of lattice theory one proves now that the infimum of the upper bounds of a set X is its supremum (see (Davey & Priestley, 1990a)). So each set X has a supremum. Thus Φ is a complete lattice. (2) Assume that φ ∈ Φf ,X ⊆ Φ is a directed set and φ ≤ ∨X. We define

Y = {ψ ∈ Φf : there exists a χ ∈ X such that ψ ≤ χ}.

Since X is directed, Y is directed too. We claim that ∨Y is an upper bound of X. Let χ be an element of X. Then Aχ is contained in Y and therefore χ = ∨Aχ ≤ ∨Y . So we obtain φ ≤ ∨X ≤ ∨Y and, by the compactness axiom, it follows that there exists a ψ ∈ Y such that φ ≤ ψ. By the definition of the set Y there exists a χ ∈ X such that φ ≤ ψ ≤ χ. For the converse direction assume that for all directed sets X ⊆ Φ, if φ ≤ ∨X, then there exists a ψ ∈ X such that φ ≤ ψ. We take the directed set Aφ and, since φ = ∨Aφ, there exists a ψ ∈ Aφ such that φ ≤ ψ. Because, by the definition of Aφ, ψ ≤ φ and ψ ∈ Φf , we obtain that φ = ψ and thus φ belongs to Φf . (3) The third assertion of the theorem follows from (2) by the following observation. Let X be an arbitrary, not necessarily directed, subset of Φ. Define

Z = {∨Y : Y ⊆ X, Y finite}.

Then we claim that Z is directed and ∨X = ∨Z. That Z is directed can be seen as follows: First, e belongs to Z, since e = ∨∅. If Y1 and Y2 are finite subsets of X, then Y1 ∪ Y2 is finite too, and ∨(Y1 ∪ Y2) is an upper bound of ∨Y1 and ∨Y2 in Z. It is obvious that ∨Z ≤ ∨X, since for each subset Y of X, ∨Y ≤ ∨X. But X is contained in Z since φ = ∨{φ} ∈ Z for all φ ∈ X. Hence we obtain that ∨X ≤ ∨Z, thus ∨Z = ∨X. Assume then that φ ∈ Φf and φ ≤ ∨X = ∨Z. By (2) there is a ψ ∈ Z such that φ ≤ ψ. But we have ψ = ∨Y for some finite subset Y of X. Conversely, assume φ ≤ ∨X = ∨Z and that Y is a finite subset of X such that φ ≤ ∨Y . Since ∨Y ∈ Z and Z is directed, it follows from (2) that φ ∈ Φf . □

This theorem says that the finite elements of a compact information algebra are characterized by the partial order in Φ alone. In lattice theory, elements of a complete lattice which satisfy condition (2) of the theorem are called finite and those which satisfy condition (3) are called compact. Thus, the elements of Φf are both “finite” and “compact” in this sense. Furthermore, lattices in which each element is the supremum of the compact elements it dominates are called algebraic. Hence, a compact information algebra Φ is an algebraic lattice. Further Φ − {z} is an algebraic complete partial order such that φ ∨ ψ ∈ Φ − {z} if φ ⊗ ψ ≠ z. Such a structure is called a domain or more precisely a Scott-Ershov domain. We refer to (Davey & Priestley, 1990a) for a discussion of these subjects. So, compact information algebras are closely related to domains and share many properties with them.

At this point we prove a first important theorem about the interchangeability of supremum and focusing in the case of strongly compact information algebras:

Theorem 4.9 Let (Φ, Φf ,D) be a strongly compact domain-free information algebra. If X is a directed subset of Φ and x ∈ D, then

(∨X)⇒x = ∨φ∈X φ⇒x. (4.18)

Proof. First, note that φ ∈ X implies φ ≤ ∨X, hence φ⇒x ≤ (∨X)⇒x and ∨φ∈X φ⇒x ≤ (∨X)⇒x. Conversely, by strong density,

(∨X)⇒x = ∨{ψ ∈ Φf : ψ = ψ⇒x ≤ ∨X}.

But, if ψ = ψ⇒x ∈ Φf such that ψ ≤ ∨X, then, since X is directed, there is a φ ∈ X such that ψ ≤ φ (see Theorem 4.8 (2)). For this element φ we have ψ = ψ⇒x ≤ φ⇒x. Hence (∨X)⇒x ≤ ∨φ∈X φ⇒x. □

This allows us to show that weak density implies strong density in an important case:

Lemma 4.9 If (Φf ,D) is a subalgebra of (Φ,D), weak density implies strong density.

Proof. By Theorem 4.9, join and focussing may be exchanged in this case because the set {ψ ∈ Φf : ψ ≤ φ} is directed. Therefore, from (4.16) it follows that

φ⇒x = (∨{ψ ∈ Φf : ψ ≤ φ})⇒x = ∨{ψ⇒x : ψ ∈ Φf , ψ ≤ φ} = ∨{ψ ∈ Φf : ψ ≤ φ, ψ = ψ⇒x}.

This proves the lemma. □ So, in case (Φf ,D) is a subalgebra of (Φ,D), a compact information algebra (Φ, Φf ,D) is automatically strongly compact.

4.5 Compact Ideal Completion

Compact information algebras can be obtained from any information algebra (Φ,D) by ideal completion (see Section 4.2). The elements of Φ are then the finite elements of the compact algebra.

Theorem 4.10 Let (Φ,D) be a domain-free information algebra and IΦ the ideals of Φ with the operation of combination and focusing defined by (4.6) and (4.8). If D has a top element, and I :Φ → IΦ is the embedding of Φ in IΦ (see Theorem 4.3), then (IΦ,I(Φ),D) is a strongly compact information algebra with finite elements I(Φ).

Proof. IΦ is an information algebra according to Theorem 4.2. It remains to show that I(Φ) are its finite elements, satisfying the convergence, density and compactness axioms. To simplify notation we identify Φ with I(Φ) and write simply φ instead of I(φ). Note then, that φ ≤ I if, and only if, φ ∈ I. More generally, we have I1 ≤ I2 if, and only if I1 ⊆ I2. Convergence: Suppose X ⊆ Φ to be a directed subset. We claim that I(X) (the ideal generated from X, see Section 4.2) equals ∨X in IΦ. That is,

I(X) = {φ ∈ Φ : φ ≤ φ1 ⊗ · · · ⊗ φm for some finite set of elements φ1, . . . , φm ∈ X}.

So, if φ ∈ X, then φ ∈ I(X), hence φ ≤ I(X). Thus I(X) is an upper bound of X. Assume that I is another upper bound of X. Then X ⊆ I, hence I(X) ⊆ I or I(X) ≤ I. Thus we have proved that ∨X = I(X) ∈ IΦ. Density: We have

I⇒x = {ψ ∈ Φ: ψ ≤ φ⇒x for some φ ∈ I}.

We have to prove I⇒x = ∨X = I(X) for the set X = {φ ∈ Φ: φ = φ⇒x ∈ I}. Suppose ψ ∈ I(X) such that

ψ ≤ φ1 ⊗ · · · ⊗ φm = φ1⇒x ⊗ · · · ⊗ φm⇒x = (φ1 ⊗ · · · ⊗ φm)⇒x = φ⇒x = φ ∈ I.

So we have ψ ≤ φ⇒x for some φ ∈ I. Hence ψ ∈ I⇒x and we have shown that I(X) ⊆ I⇒x. Conversely, assume ψ ∈ I⇒x, that is ψ ≤ φ⇒x, φ ∈ I. Then φ⇒x ∈ I, since φ⇒x ≤ φ. We conclude that ψ ∈ I(X), since φ⇒x = (φ⇒x)⇒x ∈ I, hence φ⇒x ∈ X. So we have proved that I⇒x = I(X). Compactness: Let X ⊆ Φ be a directed subset, and φ ≤ ∨X. Define I = ∨X = I(X). Then φ ∈ I(X). But this means that

φ ≤ φ1 ⊗ · · · ⊗ φm, for φ1, . . . , φm ∈ X.

Since X is directed, there is some ψ ∈ X such that

φ1 ⊗ · · · ⊗ φm ≤ ψ.

Hence we have φ ≤ ψ ∈ X. □

We shall see in Part III that compact information algebras can be obtained in other ways. Let now, conversely, (Φ, Φf ,D) be a compact information algebra. Does it hold that Φ is the ideal completion of Φf ? For this it is of course necessary that (Φf ,D) is itself an information algebra, that is a subalgebra of (Φ,D). Or, in other words, it is necessary that Φf is closed not only under combination, but also under focussing. This turns out also to be sufficient, and this is then the representation theorem for information algebras, a result which parallels and extends a basic theorem of domain theory.

Theorem 4.11 Representation Theorem for Domain-Free Information Algebras. Let (Φ, Φf ,D) be a compact information algebra and (Φf ,D) itself an information algebra. Then (Φ,D) is isomorphic to the ideal completion of (Φf ,D).

Proof. Let I be an ideal in Φf . Then I is a directed set, and hence ∨I ∈ Φ by convergence in the compact algebra (Φ, Φf ,D). Let φ = ∨I and ψ ≤ φ, ψ ∈ Φf . By compactness there is then an η ∈ I with ψ ≤ η, and therefore ψ ∈ I. This shows that I = Aφ. So, there is a one-to-one correspondence between the ideals I in Φf and the sets Aφ, and hence the elements φ ∈ Φ. It is an isomorphism since

Aφ⊗ψ = Aφ ⊗ Aψ, Aφ⇒x = (Aφ)⇒x. □

Note that even in a compact information algebra, IΦ ≠ Φ, in general, because there may exist ideals I different from Aφ, such that nevertheless ∨I = φ holds. We have seen in Section 4.2 that there is also a concept of ideal completion in labeled information algebras. But so far we have no compactness notion for the labeled information algebras. This gap will be filled in the next section. There it will become clear that for labeled ideal completion a similar compactness result holds.
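A toy illustration of this completion, under the assumption of a one-domain algebra on the natural numbers with combination max: the ideals are the principal ideals plus the whole set, so the completion adds exactly one non-finite element.

def principal(n):
    # principal ideal I(n) in (N, max): the down-set {0, 1, ..., n}
    return frozenset(range(n + 1))

def combine_ideals(i1, i2):
    # eq. (4.7) specialised to a chain: the union of two down-sets
    return i1 | i2

assert combine_ideals(principal(3), principal(5)) == principal(5)
# The directed family {principal(n) : n = 0, 1, 2, ...} has no supremum among
# the principal ideals; its supremum in the ideal completion is the new,
# non-finite ideal consisting of all natural numbers.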

4.6 Labeled Compact Information Algebra

In a compact information algebra, the approximation of elements by finite ones takes place globally, i.e. domain-free. In practice however, finiteness and approximation is often considered per domain. In other words, a piece of information φ with domain d(φ) is approximated by finite elements of this domain. The strong density condition (Section 4.4) reflects this idea partially. In this section a corresponding theory based on labeled information algebras is proposed. We start with a labeled information algebra (Φ,D) and assume that for every domain x ∈ D there exists a set of finite elements Φf,x, which is closed under combination and whose elements approximate every element in Φx. More precisely, we require for all x ∈ D, the convergence, density and compactness property of a compact algebra as introduced in Section 4.4:

1. Convergence: If X ⊆ Φf,x is a directed set, then the supremum ∨X over X exists and belongs to Φx.

2. Density: For φ ∈ Φx,

φ = ∨{ψ ∈ Φf,x : ψ ≤ φ}.

3. Compactness: If X ⊆ Φf,x is a directed set, and φ ∈ Φf,x such that φ ≤ ∨X, then there exists a ψ ∈ X such that φ ≤ ψ.

We extend now Definition 4.6 in the following way:

Definition 4.7 Labeled Compact Information Algebra. A system (Φ, Φf ,D), where (Φ,D) is a labeled information algebra, Φf = ∪x∈D Φf,x, where the sets Φf,x ⊆ Φx are closed under combination, contain the neutral element ex, and satisfy the three conditions of convergence, density and compactness formulated above for all x ∈ D, is called a labeled compact information algebra.

Here are two examples of labeled compact information algebras:

Example 4.8 Cofinite Sets: In Example 4.7 we have considered the compact domain-free information algebra of subsets of R^n with cofinite sets of R^n as finite elements. Let now D be the lattice of finite subsets of ω = {1, 2, . . .}. Consider, for any s ∈ D, the s-tuples t : s → R. They form the domain R^s. And the subsets of R^s for all s ∈ D form a labeled information algebra Φ. The cofinite sets in R^s are the finite elements forming Φf,s of the domain R^s. Any subset of R^s is the intersection of all the cofinite sets of the domain it is contained in. Thus, (Φ, Φf ,D) with Φf = ∪s∈D Φf,s is an example of a compact labeled information algebra.

Example 4.9 Convex Sets: The domain-free information algebra of convex sets in R^n has been shown to be compact with convex polyhedra as finite elements in Example 4.6. This may be extended to a labeled algebra with convex sets in domains R^s for any s ∈ D, where D is the lattice of finite subsets of ω = {1, 2, . . .} (see the previous example). The convex polyhedra in domain R^s are the finite elements of the domain. This is then a compact labeled information algebra. Note that projections of convex polyhedra remain convex polyhedra, and, further, combinations of convex polyhedra yield again convex polyhedra. Thus, the finite elements form a subalgebra of the information algebra of convex sets. This is sufficient for the domain-free algebra associated with the labeled one to be strongly compact, as we shall see (Theorem 4.16). This is in contrast to the cofinite sets of the previous Example 4.8, where the cofinite sets do not form a subalgebra.

It is evident that Theorem 4.8 carries over with respect to every semi-lattice Φx.

Theorem 4.12 Labeled Compact Information Algebra. Let (Φ, Φf ,D) be a labeled compact information algebra. Then, for all x ∈ D:

1. Φx is a complete lattice.

2. An information φ ∈ Φx belongs to Φf,x if, and only if, for all directed subsets X ⊆ Φx, if φ ≤ ∨X, then there exists a ψ ∈ X such that φ ≤ ψ.

3. An information φ ∈ Φx belongs to Φf,x if, and only if, for all subsets X ⊆ Φx, if φ ≤ ∨X, then there exists a finite subset Y ⊆ X such that φ ≤ ∨Y .

The proof of Theorem 4.8 carries over to this theorem and is therefore not repeated here. The whole algebra Φ is in general not a complete lattice. If, however, the lattice D is complete, then so is Φ in a labeled compact information algebra:

Theorem 4.13 Let (Φ, Φf ,D) be a labeled compact information algebra, where D is a complete lattice. Then Φ is a complete lattice.

Proof. Consider a subset X ⊆ Φ. Let y = ∨ψ∈X d(ψ). Then we have for every ψ ∈ X that ψ ≤ ψ↑y, hence ∨ψ∈X ψ↑y is an upper bound of X. On the other hand, if η is an upper bound of X, then ψ ≤ η for all ψ ∈ X implies y ≤ d(η). Therefore we conclude that ψ↑y ≤ η for all ψ ∈ X, hence ∨ψ∈X ψ↑y ≤ η. Thus ∨ψ∈X ψ↑y is the least upper bound of X, i.e. ∨X exists in Φ and ∨X = ∨ψ∈X ψ↑y. But then the existence of the infimum for any set X ⊆ Φ follows by standard methods of lattice theory: Consider the set Y of all lower bounds of X. There is at least one lower bound, namely ez with z = ∧ψ∈X d(ψ). Now, ∨Y ∈ Φ exists by what we have just proved. This supremum ∨Y is a lower bound of X, and if η is another lower bound of X, then η ∈ Y , hence η ≤ ∨Y . Therefore ∨Y is the infimum of X. Thus ∧X = ∨Y exists for any subset X ⊆ Φ. Hence Φ is a complete lattice. □
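The first step of this proof, forming the supremum by vacuously extending every element to the joined domain, can be mimicked concretely in the labeled subset algebra; the following sketch over two binary variables is only an illustration of that step, with hypothetical frames and helper names.

from itertools import product

FRAMES = {"X": [0, 1], "Y": [0, 1]}          # hypothetical frames

def extend(rel, dom, target):
    # vacuous extension of a relation from dom to the larger domain target
    target = sorted(target)
    extra = [v for v in target if v not in dom]
    out = set()
    for t in rel:
        base = dict(zip(sorted(dom), t))
        for vals in product(*(FRAMES[v] for v in extra)):
            row = {**base, **dict(zip(extra, vals))}
            out.add(tuple(row[v] for v in target))
    return out

def supremum(labeled):
    # labeled: list of (relation, domain); supremum = combination on the joined domain
    dom = sorted(set().union(*(d for _, d in labeled)))
    out = set(product(*(FRAMES[v] for v in dom)))
    for rel, d in labeled:
        out &= extend(rel, d, dom)
    return out, dom

print(supremum([({(0,), (1,)}, ["X"]), ({(1,)}, ["Y"])]))
# expected: ({(0, 1), (1, 1)}, ['X', 'Y'])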

Theorem 4.14 Let (Φ,D) be a labeled information algebra and IΦx the ideals of Φx, x ∈ D, with the operations of combination and projection defined by (4.9) and (4.10). Let IΦ = ∪x∈D IΦx be the set of all ideals in Φ. Then (IΦ, Φ,D) is a labeled compact information algebra with the elements of Φx as finite elements on the domain x, for every x ∈ D.

The proof of this theorem is similar to the proof of Theorem 4.10 and is left to the reader.

We are now going to examine the relations between compact labeled and domain-free algebras. Under what conditions is the labeled version of a compact domain-free algebra itself compact, and, vice versa, is the domain-free version of a compact labeled algebra also compact? The next theorem shows that a strongly compact domain-free information algebra induces a compact labeled information algebra. The example of the compact algebra of cofinite sets (Example 4.7) on the other hand indicates that weak density alone is not sufficient for this.

Theorem 4.15 If (Φ, Φf ,D) is a domain-free strongly compact information algebra, then its associated labeled information algebra is compact too.

Proof. Let (Ψ,D) with Ψ = {(φ, x) : φ ∈ Φ, φ = φ⇒x} be the labeled information algebra associated with the information algebra (Φ,D). With Ψx we denote the subset of all elements (φ, x) of Ψ with domain x. We prove first a lemma:

Lemma 4.10 For any set X ⊆ Ψx, ∨(ψ,x)∈X (ψ, x) ∈ Ψx and

∨(ψ,x)∈X (ψ, x) = (∨(ψ,x)∈X ψ, x). (4.19)

Proof. To prove (4.19), define X′ = {ψ : (ψ, x) ∈ X}, and φ = ∨X′. But then, for ψ ∈ X′, ψ ≤ φ, we have ψ = ψ⇒x ≤ φ⇒x ≤ φ. So φ⇒x is an upper bound of X′, hence φ ≤ φ⇒x and therefore φ = φ⇒x. This shows that x is a support of φ. Then it follows that (ψ, x) ≤ (φ, x) for all (ψ, x) ∈ X. Let (χ, x) be another upper bound of X. Then ψ ≤ χ for all ψ ∈ X′, hence φ ≤ χ and therefore (φ, x) is the supremum of X. As we have seen, x is a support of (φ, x), hence ∨(ψ,x)∈X (ψ, x) ∈ Ψx. This proves (4.19). □

Now, to prove the theorem, we define

Ψf,x = {(ψ, x) : ψ ∈ Φf , ψ = ψ⇒x}.

This will be the set of finite elements on domain x. Clearly, Ψf,x is closed under combination, since Φf is so, and the combination of two elements with support x has support x too. Also (e, x) ∈ Ψf,x. We have to verify the conditions of convergence, density and compactness. In order to show convergence let X ⊆ Ψf,x be a directed set. Then X′ = {ψ : (ψ, x) ∈ X} is directed too, and by convergence in the compact algebra (Φ, Φf ,D), ∨X′ exists and belongs to Φ. By Lemma 4.10

∨X = (∨η∈X′ η, x) ∈ Ψx.

This proves convergence.

To verify density, we select an element (φ, x) ∈ Ψx. Then, again using Lemma 4.10 and the strong density property in the compact algebra (Φ, Φf ,D), we obtain

∨{(ψ, x) : (ψ, x) ∈ Ψf,x, (ψ, x) ≤ (φ, x)}
= ∨{(ψ, x) : ψ ∈ Φf , ψ = ψ⇒x ≤ φ}
= (∨{ψ : ψ ∈ Φf , ψ = ψ⇒x ≤ φ}, x) = (φ, x).

This proves density. For compactness consider a directed set X ⊆ Ψf,x and define X′ ⊆ Φf as above. Let (φ, x) ∈ Ψf,x with (φ, x) ≤ ∨X = (∨X′, x), hence φ ≤ ∨X′. By compactness in the algebra (Φ, Φf ,D) there is a ψ ∈ X′ with φ ≤ ψ. But then (φ, x) ≤ (ψ, x) ∈ X. This proves compactness. We have thus shown that (Ψ, Ψf,x,D), the labeled information algebra associated with (Φ, Φf ,D), is indeed labeled compact. □

As the example of labeled cofinite sets shows (Example 4.8), the domain-free version of a labeled compact information algebra is not necessarily itself compact. On the other hand, the example of the labeled algebra of convex sets (Example 4.9) indicates that the domain-free version may be compact too, if the finite elements (the convex polyhedra in the Example) form a subalgebra. This holds in general, as the following theorem proves.

Theorem 4.16 Let (Φ, Φf ,D) be a labeled compact information algebra. Then, if (Φf ,D) is a subalgebra of (Φ,D), and D has a top element ⊤, the associated domain-free information algebra (Φ/σ, D) is a strongly compact information algebra with Φf /σ as finite elements.

Proof. We remark first that Φf /σ is closed under combination. Here we use the assumption that Φf , as a subalgebra of Φ, is closed under combination. Hence, if [φ]σ, [ψ]σ ∈ Φf /σ, that is, φ, ψ ∈ Φf , then φ ⊗ ψ ∈ Φf and thus finally [φ]σ ⊗ [ψ]σ = [φ ⊗ ψ]σ ∈ Φf /σ. It remains to show that convergence, density and compactness hold with respect to the elements of Φf /σ. Consider then a directed set X ⊆ Φf /σ. We need to show that ∨X exists in Φ/σ. To prove this, we use the following lemma, which is of independent interest:

Lemma 4.11 Let (Φ,D) be a labeled information algebra. For all X ⊆ Φ, for which ∨X ∈ Φ exists, it holds that

[∨X]σ = ∨[X]σ, where [X]σ = {[η]σ : η ∈ X}.

Proof. Define φ = ∨X ∈ Φ, such that [φ]σ = [∨X]σ. If d(φ) = x, we claim that

∨X↑x = φ, where X↑x = {η↑x : η ∈ X}. In fact, η ≤ η↑x ≤ φ↑x = φ, hence φ = ∨X ≤ ∨X↑x ≤ φ↑x = φ. This proves the claim. Now, η ∈ X implies η ≤ φ, hence [η]σ ≤ [φ]σ and therefore ∨[X]σ ≤ [φ]σ. Assume [χ]σ to be an upper bound of [X]σ. Then η↑x ≤ χ↑x, thus ∨X↑x ≤ χ↑x. But then we have φ = ∨X↑x ≤ χ↑x. This proves that [φ]σ is the supremum of [X]σ. □

Note now that ⊤ is a support of every element in Φ/σ. In particular any element [φ]σ ∈ X has a representative φ with d(φ) = ⊤. Let then

X′ = {φ ∈ Φf,⊤ : [φ]σ ∈ X}.

Since (Φ, Φf ,D) is compact, ∨X′ exists in Φ. By Lemma 4.11 we obtain then ∨X = ∨[X′]σ = [∨X′]σ ∈ Φ/σ. This proves convergence. Next consider an element [φ]σ ∈ Φ/σ and select a representative φ ∈ Φ. Then [φ]σ⇒x = [φ→x]σ. By the compactness of (Φ, Φf ,D) we have

φ→x = ∨{ψ ∈ Φf,x : ψ ≤ φ→x}.

By Lemma 4.11 we obtain from this

[φ]σ⇒x = [∨{ψ ∈ Φf,x : ψ ≤ φ→x}]σ = ∨{[ψ]σ : ψ ∈ Φf,x, ψ ≤ φ→x}. (4.20)

But we have ψ ∈ Φf,x, ψ ≤ φ→x if, and only if, [ψ]σ ∈ Φf /σ and [ψ]σ = [ψ]σ⇒x ≤ [φ]σ. This follows, since from [ψ]σ = [ψ]σ⇒x = [ψ→x]σ we may select a representative ψ of [ψ]σ in Φf,x. Therefore, from (4.20) we obtain

[φ]σ⇒x = ∨{[ψ]σ ∈ Φf /σ : [ψ]σ = [ψ]σ⇒x ≤ [φ]σ}. This proves strong density. Finally consider a directed set X ⊆ Φf /σ and [φ]σ ∈ Φf /σ such that [φ]σ ≤ ∨X. Select as above X′ to be the set of representatives of elements of X with domain ⊤. Then, if φ is a representative of [φ]σ with d(φ) = ⊤, we have that φ ≤ ∨X′. Now let [η]σ = ∨X, such that according to Lemma 4.11,

[η]σ = [∨{ψ : d(ψ) = ⊤, ψ ∈ X′}]σ.

By the compactness of the algebra (Φ, Φf ,D) there is a ψ ∈ X′ such that φ ≤ ψ. But then [φ]σ ≤ [ψ]σ and [ψ]σ ∈ X. This shows that compactness holds also. □

The requirement that the lattice D has a top element (or alternatively that it is a complete lattice) seems to be necessary. Otherwise a directed set in Φf /σ does not necessarily have a corresponding directed set of representatives whose supremum exists in Φ. Labeled compact algebras, although they do not necessarily induce associated compact domain-free algebras, as we have seen in the example of cofinite sets, have nevertheless interesting properties, as will be discussed in the following Section 4.7.

4.7 Continuous Algebras

This section is based on a yet unpublished paper by Xuechong Guan¹ and Yongmin Li² (Guan & Li, 2010). All main results in this section are from this paper. The finiteness or compactness property of finite elements in a compact algebra, as exhibited in Theorem 4.8 (2) and (3), can be weakened, and replaced by a more general approximation concept.

Definition 4.8 Way Below. Let Φ be a partially ordered set. For φ, ψ ∈ Φ we write ψ ≪ φ, and say ψ is way below φ if, for any directed set X ⊆ Φ, from φ ≤ ∨X it follows that there is an η ∈ X such that ψ ≤ η.

Note that according to this definition, condition (2) of Theorem 4.8 can be rephrased as φ ∈ Φf if, and only if, φ ≪ φ. Here are a few simple consequences of this definition:

Lemma 4.12 The following hold for φ, ψ ∈ Φ:

1. ψ ≪ φ implies ψ ≤ φ.

2. ψ ≪ φ and φ ≤ η imply ψ ≪ η.

3. χ ≤ ψ and ψ ≪ φ imply χ ≪ φ.

Proof. (1) Since {φ} is a directed set with φ ≤ ∨{φ}, ψ ≪ φ implies ψ ≤ φ. (2) If X ⊆ Φ is a directed set and η ≤ ∨X, then φ ≤ ∨X, and hence from ψ ≪ φ it follows that there is an element χ ∈ X with ψ ≤ χ. Hence ψ ≪ η.

¹College of Mathematics and Information Science, Shaanxi Normal University, Xi'an 710062, China, and School of Mathematical Science, Xuzhou Normal University, Xuzhou 221116, China. ²College of Mathematics and Information Science and College of Computer Science, Shaanxi Normal University, Xi'an 710062, China.

(3) Again, if X ⊆ Φ is a directed set with φ ≤ ∨X, then ψ ≪ φ implies that there is an η ∈ X such that ψ ≤ η, hence χ ≤ η. This shows that χ ≪ φ. □

From now on, Φ will be part of an information algebra (Φ,D), either labeled or domain-free. Then the following holds:

Lemma 4.13 Assume that (Φ,D) is a domain-free information algebra. Then the following holds:

1. ∀φ ∈ Φ, the set {ψ : ψ ≪ φ} is an ideal.

2. ψ ≪ φ if, and only if, for all X ⊆ Φ such that ∨X exists and φ ≤ ∨X, there exists a finite subset F ⊆ X such that ψ ≤ ∨F .

Proof. (1) Assume ψ ≪ φ and χ ≤ ψ. Then by (3) of Lemma 4.12 it follows that χ ≪ φ. Further suppose ψ1 ≪ φ and ψ2 ≪ φ. Consider any directed set X such that φ ≤ ∨X. Then there exist η1 ∈ X and η2 ∈ X such that ψ1 ≤ η1 and ψ2 ≤ η2. Since X is directed there exists also an element η ∈ X such that η1, η2 ≤ η. Therefore we conclude that ψ1 ⊗ ψ2 ≤ η1 ⊗ η2 ≤ η, hence ψ1 ⊗ ψ2 ≪ φ. This proves that {ψ : ψ ≪ φ} is an ideal.

(2) Suppose ψ ≪ φ. Let X ⊆ Φ be a subset of Φ such that ∨X exists in Φ and φ ≤ ∨X. Let Y be the set of all finite joins of elements of X. Then we have X ⊆ Y . Also every element of Y is bounded above by ∨X. Conversely, if η is an upper bound of Y , then it certainly is also an upper bound of X, hence ∨X ≤ η. So, ∨X is the least upper bound of Y , or ∨X = ∨Y . Furthermore, Y is directed. So there is an element η ∈ Y such that ψ ≤ η. But η = ∨F for some finite subset F ⊆ X. Conversely, let X be a directed subset such that ∨X exists in Φ and φ ≤ ∨X. Then there is a finite subset F ⊆ X such that ψ ≤ ∨F . Since X is directed, there is an element η ∈ X such that ∨F ≤ η, hence ψ ≤ η. So ψ ≪ φ, completing the other side of the proof. □

We define now what we mean by continuous information algebras.

Definition 4.9 Domain-Free Continuous Information Algebra. Let (Φ,D) be a domain-free information algebra. If there exists a set B ⊆ Φ which is closed under combination, and such that it holds that

1. Convergence: If X ⊆ B is directed, then ∨X exists in Φ.

2. (Weak) Density: For all φ ∈ Φ

φ = ∨{ψ ∈ B : ψ ≪ φ}, (4.21)

then (Φ,D) is called continuous. If instead of (2) it holds that, for all φ ∈ Φ and x ∈ D,

φ⇒x = ∨{ψ ∈ B : ψ = ψ⇒x ≪ φ}, (4.22)

then (Φ,D) is called strongly continuous.

In the strong version of density, approximation of elements of Φ by elements way below is required to be possible on domains which support φ. This is similar to the case of compact information algebras, where strong density requires that elements may be approximated by finite elements of the same domain. A similar definition can also be given for labeled information algebras. There it is convenient and natural to consider the way-below relation per domain x ∈ D. Thus we consider for any x ∈ D the partially ordered set Φx of elements with domain x and define in this set the way-below relation ≪x. This leads then to the following definition:

Definition 4.10 Labeled Continuous Information Algebra. Let (Φ,D) be a labeled information algebra. If there exists a set Bx ⊆ Φx for all domains x ∈ D which is closed under combination, and such that it holds that for all x ∈ D

1. Convergence: If X ⊆ Bx is directed, then ∨X exists in Φx.

2. Density: For all φ ∈ Φx

φ = ∨{ψ ∈ Bx : ψ ≪x φ}, (4.23)

then (Φ,D) is called continuous.

Note that it is not excluded that the basis B of a continuous information algebra equals Φ. Here is an example which illustrates this.

Example 4.10 (Guan & Li, 2010). Let Φ = [0, 1] and D = {0, 1}. The operations of combination and focusing are defined as follows:

∀φ, ψ ∈ Φ, φ ⊗ ψ = max{φ, ψ}, ∀φ ∈ Φ, φ⇒1 = φ, and φ⇒0 = φ if φ ∈ [0, 1/2], φ⇒0 = 1/2 if φ ∈ (1/2, 1]. (4.24)

The reader may verify that this defines indeed a domain-free information algebra. We have φ ≪ ψ if, and only if, φ < ψ or φ = ψ = 0. As a basis we may take here B = Φ. Then, we claim that

φ⇒x = ∨{ψ ∈ B : ψ = ψ⇒x ≪ φ}. (4.25)

In fact, it is obvious that this identity holds for x = 1. To verify it for x = 0, assume first φ ∈ (1/2, 1]. In this case

∨{ψ ∈ B : ψ = ψ⇒0 ≪ φ} = ∨{ψ ∈ [0, 1/2] : ψ < φ} = 1/2 = φ⇒0.

Otherwise, if φ ∈ [0, 1/2], we have

∨{ψ ∈ B : ψ = ψ⇒0 ≪ φ} = ∨{ψ ∈ [0, 1/2] : ψ < φ} = φ = φ⇒0.

Convergence is evident. So (Φ,D) is indeed a strongly continuous information algebra. It is not compact, because the only finite element, defined by φ ≪ φ, is 0.
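To make Example 4.10 concrete, here is a small Python sketch (an illustration added here, not part of the original text) that models Φ = [0, 1] with combination as maximum, the two focusing operations and the claimed way-below relation, and checks the strong density identity (4.25) for x = 0 on a finite grid of sample points; all function names are purely illustrative.

    # Sketch of Example 4.10: Phi = [0, 1], D = {0, 1}.
    def combine(phi, psi):
        return max(phi, psi)                 # combination is the maximum

    def focus(phi, x):
        if x == 1:
            return phi                       # focusing to domain 1 changes nothing
        return phi if phi <= 0.5 else 0.5    # focusing to domain 0 caps at 1/2

    def way_below(phi, psi):
        # claimed characterization: phi << psi iff phi < psi or phi = psi = 0
        return phi < psi or (phi == psi == 0)

    # Check (4.25) for x = 0 on a grid: focus(phi, 0) should be the supremum of
    # the grid points psi with psi = focus(psi, 0) and psi way below phi.
    grid = [i / 1000 for i in range(1001)]
    for phi in grid:
        approx = [psi for psi in grid if focus(psi, 0) == psi and way_below(psi, phi)]
        sup = max(approx) if approx else 0.0
        assert abs(sup - focus(phi, 0)) < 1e-2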

So, as the example shows, a continuous information algebra, in general, is not compact. But, inversely, a compact information algebra is always continuous.

Theorem 4.17 A (strongly) continuous domain-free information algebra (Φ,D) is (strongly) compact if, and only if, the set {φ ∈ Φ : φ ≪ φ} is a basis for (Φ,D).

Proof. Let first (Φ,D) be continuous with basis {φ ∈ Φ: φ  φ}. With this set as finite elements, (Φ,D) becomes in fact compact: Conver- gence and (strong) density are inherited from the continuity of the algebra. Compactness follows from the definition of the way-below relation φ  φ. Conversely, the relation φ  φ defines compactness of elements φ. So (Φ, Φf ,D) is compact with Φf = {φ ∈ Φ: φ  φ}, if additionally conver- gence and (strong) density hold. Note that φ ∈ Φf and ψ ≤ φ} implies ψ  φ (Lemma 4.12 (3)). But then {φ ∈ Φ: φ  φ} is in fact a basis and (Φ,D) is (strongly) continuous: ut So, any (strongly) compact domain-free information algebra is (strongly) continuous with the finite elements as basis. The converse holds only if, {φ ∈ Φ: φ  φ} is a basis of the continuous algebra. This clarifies the relation between compactness and continuity. In a compact information algebra, a further property of the way-down relation can be proved:

Theorem 4.18 If (Φ, Φf ,D) is a compact, domain-free information alge- bra, then χ  φ implies the existence of an element ψ ∈ Φf such that χ ≤ ψ ≤ φ.

Proof. Consider the set Aφ = {ψ ∈ Φf : ψ ≤ φ} which is directed and for which φ ≤ ∨Aφ. Then χ  φ implies that there is a ψ ∈ X such that χ ≤ ψ, hence χ ≤ ψ ≤ φ. ut A set of elements S having the property that χ  φ implies the existence of an element ψ ∈ S such that χ ≤ ψ ≤ φ is called separating. Thus, the set of finite elements Φf in a domain-free compact information algebra are separating. When we go over to labeled compact information algebras (Φ, Φf,x,D), then Theorem 4.17 applies to the algebras (Φx, Φf,x,D) for all x ∈ D, if the 4.7. CONTINUOUS ALGEBRAS 95

relation x is defined relative to Φx. In particular, Φf,x is separating in Φx. Further the labeled information algebra (Φ,D) is continuous. So, the situation is like with domain-free algebras. Just as a compact information algebra is a complete lattice (Theorem 4.8), continuous information algebras are also closely related to complete lattices. This is the issue we examine next.

Theorem 4.19 Let (Φ,D) be a domain-free information algebra. Then the following are equivalent:

1. (Φ,D) is continuous.

2. Φ is a complete lattice and for all φ ∈ D,

φ = ∨{ψ ∈ Φ: ψ  φ}. (4.26)

Or, alternatively the following is also equivalent:

1. (Φ,D) is strongly continuous.

2. Φ is a complete lattice and for all φ ∈ D and x ∈ D,

φ⇒x = ∨{ψ ∈ Φ: ψ = ψ⇒x  φ}. (4.27)

Proof. We prove first the case of a continuous information algebra.
(1) ⇒ (2): Suppose that B is a base for (Φ,D). Consider a set X ⊆ Φ. We claim that ∧X exists. Define Y = {ψ ∈ B : ψ ≤ φ for all φ ∈ X}. Since B is closed under combination, Y is a directed set. Hence ∨Y exists and is a lower bound for X. Assume that ψ ∈ Φ is another lower bound of X and define Aψ = {η ∈ B : η ≪ ψ}. Then Aψ ⊆ Y, since η ≪ ψ implies η ≤ ψ (Lemma 4.12 (1)). Therefore, ψ = ∨Aψ ≤ ∨Y and ∨Y is the infimum of X, ∨Y = ∧X. By a standard result of lattice theory, Φ is then a complete lattice. Identity (4.26) is simply the density property of the continuous algebra (Φ,D).
(2) ⇒ (1): Take B = Φ as a base. Convergence follows then from completeness of the lattice Φ and density is assured by (4.26).
The second part, concerning strongly continuous information algebras, is proved in the same way. □

At this place we are in a position to state and prove a similar theorem for compact information algebras. The following theorem reinforces Theorem 4.8:

Theorem 4.20 Let (Φ,D) be a domain-free information algebra. The the following is equivalent:

1. (Φ, Φf ,D) is compact.

2. Φ is a complete lattice and for all φ ∈ D,

φ = ∨{ψ ∈ Φf : ψ  φ}, (4.28)

where Φf = {φ ∈ φ : φ  φ}. Or, alternatively the following is also equivalent:

1. (Φ, Φf ,D) is strongly compact.

2. Φ is a complete lattice and for all φ ∈ Φ and x ∈ D,

φ⇒x = ∨{ψ ∈ Φf : ψ = ψ⇒x ≪ φ}, (4.29)

where Φf is defined as above.

Proof. (1) ⇒ (2): Compact algebras are also continuous. Therefore this implication follows from Theorem 4.19.
(2) ⇒ (1): Convergence follows again from completeness of the lattice Φ. If the finite elements Φf are defined as {φ ∈ Φ : φ ≪ φ}, then density or strong density follow from (4.28) or (4.29) respectively. Compactness is a consequence of the way-below relation φ ≪ φ. □

The last two theorems state, in other words, that a domain-free information algebra (Φ,D) is a continuous or compact information algebra if, and only if, Φ is a continuous or algebraic lattice! For labeled information algebras, analogous theorems apply for the domains Φx for all x ∈ D. We leave it to the reader to write out the corresponding statements.
Next we examine the domain-free algebra (Φ/σ, D) associated with the labeled compact information algebra (Φ, Φf,x, D). In general this algebra is not compact, as the example of cofinite sets indicates. But it turns out that it is at least continuous, as the following discussion shows. As usual, [φ]σ denotes the equivalence class of elements of Φ representing the same (domain-free) information, that is, the elements of Φ/σ. Using Lemma 4.11, the following lemma can be proved.

Lemma 4.14 If [ψ]σ ≪ [φ]σ in Φ/σ and d(ψ) = d(φ) = x, then ψ ≪x φ. Conversely, if D has a top element ⊤, then ψ ≪⊤ φ implies [ψ]σ ≪ [φ]σ.

Proof. Assume X ⊆ Φx directed and φ ≤ ∨X. Consider then [φ]σ ≤ ∨[X]σ with [X]σ defined as in Lemma 4.11. The set [X]σ is directed, therefore [ψ]σ ≪ [φ]σ implies that there is an η ∈ X such that [ψ]σ ≤ [η]σ, hence ψ ≤ η. This proves that ψ ≪x φ.
For the second part, assume ψ ≪⊤ φ and consider a directed set X in Φ/σ. We may take as representants of the elements [η]σ of X elements in Φ⊤. Let then X′ = {η ∈ Φ⊤ : [η]σ ∈ X}. X′ is then still directed. Now, if [φ]σ ≤ ∨X and φ is again a representant of [φ]σ in Φ⊤, then also φ ≤ ∨X′. Since ψ ≪⊤ φ, there is an element η ∈ X′ such that ψ ≤ η. But then [η]σ ∈ X and [ψ]σ ≤ [η]σ. This shows that [ψ]σ ≪ [φ]σ. □

This permits us to prove the main theorem.

Theorem 4.21 If (Φ,D) is a continuous labeled information algebra, and the lattice D has a top element ⊤, then (Φ/σ, D) is strongly continuous. Conversely, if the domain-free information algebra (Φ,D) is strongly continuous, then the associated labeled information algebra (Ψ,D) is continuous.

Proof. We assume first that (Φ,D) is a continuous labeled information algebra, and that the lattice D has a top element ⊤. We are going to show that Φ/σ is complete. Consider any subset X ⊆ Φ/σ. Since D has a top element ⊤, we may for all elements [φ]σ of X take a representant φ ∈ Φ⊤. Then let X′ = {φ ∈ Φ⊤ : [φ]σ ∈ X}. Since (Φ,D) is continuous, Φ⊤ is a complete lattice (by the labeled version of Theorem 4.19), hence ∨X′ exists. But by Lemma 4.11 we have [∨X′]σ = ∨X, hence by standard results of lattice theory Φ/σ is indeed a complete lattice.
Next we show that strong density holds in Φ/σ. Fix a domain x ∈ D and consider [φ]σ⇒x. We may take a representant φ′ ∈ Φx for this equivalence class. Also in the set {[ψ]σ ∈ Φ/σ : [ψ]σ = [ψ]σ⇒x ≪ [φ]σ} we may take representants ψ ∈ Φx. Further, we have that [ψ]σ = [ψ]σ⇒x ≪ [φ]σ if, and only if, [ψ]σ = [ψ]σ⇒x ≪ [φ]σ⇒x. Therefore we obtain from the continuity of Φx, and by Lemma 4.14,

φ′ = ∨{ψ ∈ Φx : ψ ≪x φ′}.

Applying Lemma 4.11, and again Lemma 4.14, this in turn gives us

[φ]σ⇒x = [∨{ψ ∈ Φx : ψ ≪x φ′}]σ = ∨{[ψ]σ ∈ Φ/σ : [ψ]σ = [ψ]σ⇒x ≪ [φ]σ⇒x}.

This is the required strong density in the complete lattice Φ/σ. By Theorem 4.19, (Φ/σ, D) is then strongly continuous.
Assume next that (Φ,D) is a strongly continuous domain-free algebra. By definition we have Ψ = {(φ, x) : φ ∈ Φ, φ = φ⇒x}. Let B be a base in the s-continuous algebra (Φ,D) and define Bx = {(φ, x) : φ ∈ B, φ = φ⇒x}. We claim that Bx is a base in Ψx. In fact, if (φ1, x) and (φ2, x) belong to Bx, then, since φ1 ⊗ φ2 ∈ B and φ1 ⊗ φ2 = (φ1 ⊗ φ2)⇒x, it follows that (φ1, x) ⊗ (φ2, x) = (φ1 ⊗ φ2, x) belongs to Bx too. Further, let X ⊆ Bx be directed. Then X′ = {φ : (φ, x) ∈ X} is directed too, and ∨X′ exists in Φ by the convergence property of the base B. But by Lemma 4.10, ∨X = (∨X′, x), hence we conclude that ∨X ∈ Ψx. Again by Lemma 4.10 and also Lemma 4.14,

∨{(ψ, x) ∈ Bx : (ψ, x) ≪x (φ, x)}
= ∨{(ψ, x) : ψ ∈ B, ψ = ψ⇒x, (ψ, x) ≪x (φ, x)}
≥ ∨{(ψ, x) : ψ ∈ B, ψ = ψ⇒x ≪ φ = φ⇒x}
= (∨{ψ : ψ ∈ B, ψ = ψ⇒x ≪ φ = φ⇒x}, x)
= (φ, x).

Since (φ, x) ≥ ∨{(ψ, x) ∈ Bx : (ψ, x) ≪x (φ, x)}, we conclude that equality must hold, and this shows that Ψx is a continuous lattice with base Bx. □

The case of the domain-free algebra associated with a compact labeled information algebra allows also to determine a separating set in the domain-free algebra:

Theorem 4.22 If (Φ, Φf ,D) is a compact labeled information algebra, and D has a top element, then [χ]σ  [φ]σ implies the existence of an element ψ ∈ Φf,x for some x ∈ D such that [χ]σ ≤ [ψ]σ ≤ [φ]σ.

Proof. Let x be a support of [χ]σ and [φ]σ. Then we may assume d(χ) = d(φ) = x and by Lemma 4.14 it follows that χ ≪x φ. From this, by the labeled version of Theorem 4.18 applied to Φx, we conclude that there is an element ψ ∈ Φf,x with χ ≤ ψ ≤ φ. But this implies [χ]σ ≤ [ψ]σ ≤ [φ]σ. □
This shows that we may consider the elements [ψ]σ with a representant ψ ∈ Φf,x for some x ∈ D in some sense as a generalization of finite elements in a compact domain-free information algebra.

Example 4.11 Cofinite Sets. In this example the equivalence classes of cofinite sets on some domain x form a separative set in the domain-free version of the set algebra. Let’s call such classes cofinite too. Note that in this example χ ≤ ψ, d(χ) = d(ψ) and ψ cofinite implies that χ is cofinite too. Theorem 4.22 says then that [ψ]σ  [φ]σ implies that [ψ]σ is cofinite, i.e. all elements way below another one are cofinite.

This completes the picture of continuous and compact information alge- bras and the relations between their labeled and domain-free versions. The situation is summarized in the following table

                Labeled algebra                 Domain-free algebra

  compact       ←— Th. 4.15/4.16 —→             s-compact
    ↑ ↓  Bx = Φf,x / Th. 4.17                      ↑ ↓  B = Φf / Th. 4.17
  continuous    ←— Th. 4.21 —→                   s-continuous

Chapter 5

Mappings

5.1 Order-Preserving Mappings

Let now (Φ,DΦ) and (Ψ,DΨ) be two information algebras. To simplify notation we denote combination and focusing in the two algebras Φ and Ψ by the same symbols, and the same holds for the order, meets and joins in the two lattices DΦ and DΨ. It will always be clear from the context where the operations take place. We study order-preserving or monotone mappings between these two algebras. A mapping o : Φ → Ψ is called order-preserving, if

φ ≤ ψ implies o(φ) ≤ o(ψ). (5.1)

Such a mapping can be thought of as a processing which takes an element of Φ as input and produces an element of Ψ as output. (5.1) says then that less informative inputs produce less informative outputs. As an immediate consequence we see that, since φ, ψ ≤ φ ⊗ ψ, we have also o(φ)⊗o(ψ) ≤ o(φ⊗ψ). If equality holds here, we have a semi-group ho- momorphism. Inversely, a semi-group homomorphism is an order-preserving mapping. A few simple examples of monotone mappings from (Φ,DΦ) into itself are as follows:

1. Identity mapping: o(φ) = φ.

2. Constant mapping: o(φ) = ψ.

3. Focusing: o(φ) = φ⇒x

4. Conditioning: o(φ) = ψ ⊗ φ for a fixed ψ ∈ Φ.

In the first and third example we have that o(e) = e and if φ 6= z then o(φ) 6= z. Such mappings are called strict. These properties do not hold

in the second and fourth example, hence these mappings are not strict, but they are semi-group homomorphisms. Furthermore, compositions of monotone mappings are monotone. That is, if o1 is an order-preserving mapping from (Φ1,DΦ1) into (Φ2,DΦ2) and o2 is an order-preserving mapping from (Φ2,DΦ2) into (Φ3,DΦ3), then o1 ◦ o2 is a monotone mapping from (Φ1,DΦ1) into (Φ3,DΦ3). We denote the set of all monotone mappings from (Φ,DΦ) into (Ψ,DΨ) by [Φ → Ψ]. We are going to define in this set an operation of combination. Thus, if o1, o2 ∈ [Φ → Ψ], then o1 ⊗ o2 is defined by

o1 ⊗ o2(φ) = o1(φ) ⊗ o2(φ). (5.2)

If we look at o1 and o2 as two processing procedures, which apply to the same input, then they can be combined into one procedure, simply by combining the outputs. If φ ≤ ψ, then o1 ⊗ o2(φ) = o1(φ) ⊗ o2(φ) ≤ o1(ψ) ⊗ o2(ψ) = o1 ⊗ o2(ψ), hence o1 ⊗ o2 ∈ [Φ → Ψ]. Clearly, this operation is associative, commutative and the constant mapping defined by o(φ) = e for all φ ∈ Φ is the neutral element of this operation. Combinations of strict mappings are strict. We are next going to define focusing of monotone mappings. Note that the cartesian product of two lattices DΦ × DΨ becomes also a lattice, if we define (x, y) ≤ (x0, y0) if, and only if, x ≤ x0 and y ≤ y0. We then have (x, y) ∧ (x0, y0) = (x ∧ x0, y ∧ y0). So we define, with respect to this product lattice, o⇒(x,y) by

o⇒(x,y)(φ) = (o(φ⇒x))⇒y. (5.3)

We remark that φ ≤ ψ implies φ⇒x ≤ ψ⇒x. Hence we conclude that o⇒(x,y)(φ) = (o(φ⇒x))⇒y ≤ (o(ψ⇒x))⇒y = o⇒(x,y)(ψ). This shows that o⇒(x,y) ∈ [Φ → Ψ]. Note that we write also o⇒y(φ⇒x) for (o(φ⇒x))⇒y. It turns out that [Φ → Ψ] together with DΦ×DΨ becomes an information algebra with combination and focusing defined as above.
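As an illustration of definitions (5.2) and (5.3) (this sketch is added here and is not part of the original text), the following Python fragment combines and focuses monotone mappings on the small algebra of Example 4.10, taken as both Φ and Ψ; all names are purely illustrative.

    # Monotone mappings on the algebra of Example 4.10 (Phi = Psi = [0, 1], D = E = {0, 1}).
    def combine(phi, psi):                    # combination in Phi and in Psi
        return max(phi, psi)

    def focus(phi, x):                        # focusing in Phi and in Psi
        return phi if x == 1 else min(phi, 0.5)

    def combine_maps(o1, o2):                 # (5.2): (o1 (x) o2)(phi) = o1(phi) (x) o2(phi)
        return lambda phi: combine(o1(phi), o2(phi))

    def focus_map(o, x, y):                   # (5.3): o^{=>(x,y)}(phi) = (o(phi^{=>x}))^{=>y}
        return lambda phi: focus(o(focus(phi, x)), y)

    identity = lambda phi: phi                       # example 1: identity mapping
    condition = lambda phi: combine(0.7, phi)        # example 4: conditioning with psi = 0.7

    o = combine_maps(identity, condition)
    print(o(0.3))                    # 0.7: the two "processing procedures" combined
    print(focus_map(o, 0, 0)(0.9))   # 0.5: focusing the input and the output to domain 0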

Theorem 5.1 Suppose that DΦ and DΨ have both a top element. Then ([Φ → Ψ],DΦ × DΨ), where the operations ⊗ and ⇒ are defined by (5.2) and (5.3), is a domain-free information algebra.

Proof. (1) The semigroup properties hold as noted above.
(2) The pair (⊤, ⊤) is the top element of the lattice DΦ × DΨ. This element is a support of every mapping o ∈ [Φ → Ψ] since ⊤ is a support for every element in Φ and Ψ. Thus the support axiom holds.
(3) The neutral element is the mapping e(φ) = e. Since e⇒(x,y)(φ) = (e(φ⇒x))⇒y = e⇒y = e, the neutrality axiom is satisfied.
(4) The null element is the mapping z(φ) = z and as before we obtain z⇒(x,y)(φ) = z. Thus the nullity axiom holds.

(5) In order to verify focusing assume x, x′ ∈ DΦ and y, y′ ∈ DΨ. Then, using the transitivity in the two algebras (Φ,DΦ) and (Ψ,DΨ), we obtain

(o⇒(x,y))⇒(x′,y′)(φ) = (o⇒(x,y)(φ⇒x′))⇒y′ = ((o((φ⇒x′)⇒x))⇒y)⇒y′ = (o(φ⇒x∧x′))⇒y∧y′ = o⇒(x∧x′,y∧y′)(φ) = o⇒((x,y)∧(x′,y′))(φ).

(6) The combination axiom is verified as follows:

(o1⇒(x,y) ⊗ o2)⇒(x,y)(φ) = ((o1⇒(x,y) ⊗ o2)(φ⇒x))⇒y
= (o1⇒(x,y)(φ⇒x) ⊗ o2(φ⇒x))⇒y
= ((o1(φ⇒x))⇒y ⊗ o2(φ⇒x))⇒y
= (o1(φ⇒x))⇒y ⊗ (o2(φ⇒x))⇒y
= o1⇒(x,y)(φ) ⊗ o2⇒(x,y)(φ)
= (o1⇒(x,y) ⊗ o2⇒(x,y))(φ).

Here we have used the combination axiom in the algebra (Ψ,DΨ). (7) We have

o(φ) ≤ o(φ) ⊗ o⇒(x,y)(φ) = o(φ) ⊗ (o(φ⇒x))⇒y.

But φ⇒x ≤ φ implies that o(φ⇒x) ≤ o(φ). Thus we have by the idempotency axiom in Ψ

o(φ) ⊗ o⇒(x,y) ≤ o(φ) ⊗ (o(φ))⇒y = o(φ).

Hence we obtain that

o(φ) ≤ o(φ) ⊗ o⇒(x,y)(φ) ≤ o(φ).

This proves the idempotency axiom. ut Not surprisingly, mappings which describe some form of information pro- cessing represent a form of information and indeed they form also an infor- mation algebra.

5.2 Continuous Mappings

We turn now to continuous and also compact, domain-free information algebras (Φ,DΦ) and (Ψ,DΨ). In this context we consider continuous mappings, that is, mappings which preserve limits.

Definition 5.1 Continuous Mappings. A mapping f : Φ → Ψ from a continuous domain-free information algebra (Φ,D) into another continuous domain-free information algebra (Ψ,E) is called continuous, if for any directed set X ⊆ Φ it holds that

f(∨X) = ∨φ∈X f(φ). (5.4)

We note that a continuous mapping f is order preserving.

Lemma 5.1 A continuous mapping f from a continuous domain-free information algebra (Φ,D) into another continuous domain-free information algebra (Ψ,E) preserves order, that is, f ∈ [Φ → Ψ].

Proof. Assume φ1 ≤ φ2 ∈ Φ and let B be the basis in Φ. Then, by the continuity of f,

f(φ1) = f(∨{ψ ∈ B, ψ  φ1})

= ∨{f(ψ): ψ ∈ B, ψ  φ1}

≤ ∨{f(ψ): ψ ∈ B, ψ  φ2}

= f(∨{ψ ∈ B, ψ  φ2})

= f(φ2). ut The following equivalent definitions of continuity are well known in lat- tice theory.

Lemma 5.2 Let f :Φ → Ψ be an order preserving mapping from a con- tinuous, domain-free information algebra (Φ,D) with basis BΦ into another continuous, domain-free information algebra (Ψ,E) with basis BΨ. Then, the following conditions are equivalent: 1. f is continuous.

2. f(φ) = ∨{f(ψ): ψ ∈ BΦ, ψ  φ}.

3. Af(φ) ⊆ ↓f(Aφ), where Af(φ) = {ψ ∈ BΨ : ψ  f(φ)} and Aφ = {ψ ∈ BΦ : ψ  φ}

Proof. (1) ⇒ (2): The set {ψ : ψ ∈ BΦ, ψ  φ} is directed, and φ = ∨{ψ : ψ ∈ BΦ, ψ  φ}. By the continuity of f,

f(φ) = ∨{f(ψ): ψ ∈ BΦ, ψ  φ}.

(2) ⇒ (3): Let ψ′ ∈ Af(φ), that is, ψ′ ∈ BΨ and ψ′ ≪ f(φ). Then we have, by (2),

ψ′ ≤ f(φ) = ∨{f(ψ) : ψ ∈ BΦ, ψ ≪ φ}.

The set {f(ψ) : ψ ∈ BΦ, ψ ≪ φ} is directed. By the definition of the way-below relation there is an element ψ ∈ BΦ, ψ ≪ φ, such that ψ′ ≤ f(ψ). But this means that ψ′ ∈ ↓f(Aφ).
(3) ⇒ (1): Let X ⊆ Φ be a directed set and define φ = ∨X. Then φ belongs to Φ, since the continuity of (Φ,D) implies that Φ is a complete lattice (Theorem 4.19). Let ψ′ ∈ Af(φ). Then, by (3), there is a ψ ∈ Aφ such that ψ′ ≤ f(ψ). We have ψ ∈ BΦ and ψ ≪ φ = ∨X, hence, X being directed, there is an η ∈ X such that ψ ≤ η. Thus we finally conclude that ψ′ ≤ f(ψ) ≤ f(η) ≤ ∨f(X), since f is order-preserving. Then ∨f(X) is an upper bound of Af(φ), hence

f(φ) = ∨Af(φ) ≤ ∨f(X).

But, on the other hand, from ψ ∈ X it follows that ψ ≤ φ, hence f(ψ) ≤ f(φ), since f is order-preserving, and therefore

∨f(X) ≤ f(φ),

such that finally f(φ) = f(∨X) = ∨f(X), which means that f is continuous. □

Theorem 4.9 stated that in a strongly compact information algebra focusing is continuous. The next theorem extends this statement to continuous algebras.

Theorem 5.2 If (Φ,D) is a continuous, domain-free information algebra, then it is strongly continuous if, and only if, for any directed subset X ⊆ Φ and for all x ∈ D

(∨X)⇒x = ∨φ∈X φ⇒x. (5.5)

Proof. Assume first that (Φ,D) is strongly continuous. From φ ≥ φ⇒x it follows that

∨X ≥ ∨φ∈X φ⇒x.

Therefore it holds also that

(∨X)⇒x ≥ (∨φ∈X φ⇒x)⇒x.

We claim that x is a support for the right hand side of the inequality. In fact, let η = ∨φ∈X φ⇒x. Then η ≥ φ⇒x implies η⇒x ≥ (φ⇒x)⇒x = φ⇒x. From this we see that η⇒x ≥ ∨φ∈X φ⇒x = η, hence η⇒x = η. So, we have

(∨X)⇒x ≥ ∨φ∈X φ⇒x.

From strong density we have

(∨X)⇒x = ∨{ψ ∈ B : ψ = ψ⇒x ≪ ∨X}.

Since X is directed, ψ ≪ ∨X implies that there is a φ ∈ X such that ψ ≤ φ. But then ψ = ψ⇒x ≤ φ⇒x. This in turn shows that

(∨X)⇒x ≤ ∨φ∈X φ⇒x,

hence (∨X)⇒x = ∨φ∈X φ⇒x, and identity (5.5) holds.
Conversely, assume now that identity (5.5) holds and that the algebra (Φ,D) is continuous. In this case, according to Theorem 4.19, φ = ∨{ψ ∈ Φ : ψ ≪ φ}. Then, by (5.5), φ⇒x = ∨{ψ⇒x : ψ ∈ Φ, ψ ≪ φ}. Since (ψ⇒x)⇒x = ψ⇒x and ψ⇒x ≤ ψ ≪ φ implies ψ⇒x ≪ φ, we conclude that

φ⇒x = ∨{ψ′ : ψ′ = ψ′⇒x ≪ φ}.

Furthermore, Φ is a complete lattice since (Φ,D) is continuous. But then, by Theorem 4.19, (Φ,D) is strongly continuous. □

Let [Φ → Ψ]c denote the set of continuous mappings from (Φ,D) into (Ψ,E). Between continuous mappings the operations of combination and focusing are defined as above for monotone mappings. We show next that these operations again yield continuous mappings. This means then that ([Φ → Ψ]c, D × E) is a subalgebra of the algebra ([Φ → Ψ], D × E) of monotone mappings.

Theorem 5.3 If (Φ,D) and (Ψ,E) are strongly continuous, then, for f, g ∈ [Φ → Ψ]c, x ∈ D and y ∈ E, also f ⊗ g ∈ [Φ → Ψ]c and f⇒(x,y) ∈ [Φ → Ψ]c.

Proof. First, we consider combination. Let X ⊆ Φ be a directed subset. Then, using continuity of f and g and associativity of the join, we obtain

(f ⊗ g)(∨X) = f(∨X) ⊗ g(∨X) = (∨φ∈X f(φ)) ⊗ (∨φ∈X g(φ)) = ∨φ∈X (f(φ) ⊗ g(φ)) = ∨φ∈X (f ⊗ g)(φ).

This shows that the mapping f ⊗ g is continuous, f ⊗ g ∈ [Φ → Ψ]c. Next we examine focusing. Let X as before be a directed subset of Φ. Let x ∈ D and y ∈ E. If X is directed, then so is {f(φ)⇒x : φ ∈ X}. Therefore, again using continuity of f, and also Theorem 5.2

f⇒(x,y)(∨X) = (f((∨X)⇒x))⇒y = (∨φ∈X f(φ⇒x))⇒y = ∨φ∈X (f(φ⇒x))⇒y = ∨φ∈X f⇒(x,y)(φ).

⇒(x,y) ⇒(x,y) Thus, the mapping f is continuous, f ∈ [Φ → Ψ]c. ut Since, by this last theorem, [Φ → Ψ]c is closed under the operations of combination and focusing, if (Φ,D) and (Ψ,E) are strongly continuous, and the neutral mapping is continuous, [Φ → Ψ]c must be an information algebra, namely a subalgebra of the information algebra [Φ → Ψ] of order- preserving mappings. We are now going to show that it is a continuous information algebra.

Theorem 5.4 If (Φ,D) and (Ψ,E) are strongly continuous, domain-free information algebras, then ([Φ → Ψ]c,D × E) is a continuous domain-free information algebra.

Proof. We show that [Φ → Ψ]c is a continuous lattice in the order induced by the combination operation. Then, since ([Φ → Ψ]c, D × E) is a domain-free information algebra, as noted above, the theorem follows as a consequence of Theorem 4.19.
More precisely, according to (Gierz, 2003)1, [Φ → Ψ]c is a continuous lattice under the point-wise order, i.e. f ≤p g if f(φ) ≤ g(φ) for all φ ∈ Φ. But this is also the order induced by combination. In fact, f ≤comb g if f ⊗ g = g, or f(φ) ⊗ g(φ) = g(φ) for all φ ∈ Φ. But this means exactly f(φ) ≤ g(φ) for all φ ∈ Φ. So the two orders coincide and we may drop the index and simply write f ≤ g. Thus [Φ → Ψ]c is a complete lattice and it holds for all f ∈ [Φ → Ψ]c that

f = ∨{g ∈ [Φ → Ψ]c : g ≪ f}.

1 See Exercise I.-2.22.

Therefore, from Theorem 4.19 it follows that ([Φ → Ψ]c,D × E) is a contin- uous, domain-free information algebra. ut This result leaves two questions open: Is ([Φ → Ψ]c,D × E) strongly continuous or not and how can the way-below relation be characterized in ([Φ → Ψ]c in terms of the relations in (Φ,D) and (Ψ,E). In the case of strongly compact information algebras these questions are answered in the next section.

5.3 Continuous Mappings Between Compact Al- gebras

In this section we assume that (Φ, Φf, D) and (Ψ, Ψf, E) are two strongly compact, domain-free information algebras. According to Theorem 4.17 both algebras are then strongly continuous. Therefore, ([Φ → Ψ]c, D × E) is a continuous information algebra, according to Theorem 5.4. In fact, in this section we show that it is a strongly compact information algebra. The first question is, what are the finite elements of this algebra?
Let Y be a finite subset of finite elements from Φf. A mapping s : Y → Ψf is called a simple function. Let S be the set of all simple functions. For s ∈ S and φ ∈ Φ we define Y(s) to be the domain of s and

ŝ(φ) = ∨{s(ψ) : ψ ∈ Y(s), ψ ≤ φ}.

We have ŝ(φ) = e if the set on the right hand side of this definition is empty.
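The following Python sketch (added as an illustration, not part of the original text) computes ŝ for a simple function s on the finite subset algebra over a four-element universe, where the information order is reversed inclusion, combination is intersection and e is the whole universe; every element of this algebra is finite, and the chosen s and its domain Y(s) are hypothetical.

    # s-hat for a simple function s on the subset algebra over U = {1, 2, 3, 4}.
    U = frozenset({1, 2, 3, 4})          # neutral element e; combination is intersection

    def leq(phi, psi):                   # information order: phi <= psi iff psi is a subset of phi
        return psi <= phi

    def combine(phi, psi):
        return phi & psi

    def s_hat(s, phi):
        """s is a dict Y(s) -> Psi_f; return the join of s(psi) over psi in Y(s) with psi <= phi."""
        out = U                          # the empty join is the neutral element e
        for psi, val in s.items():
            if leq(psi, phi):
                out = combine(out, val)
        return out

    s = {frozenset({1, 2, 3}): frozenset({1, 2}),     # a simple function with two-element domain Y(s)
         frozenset({2, 3, 4}): frozenset({2, 4})}

    print(s_hat(s, frozenset({2, 3})))   # both domain elements are below {2, 3}: result {2}
    print(s_hat(s, U))                   # nothing is below the vacuous element: result e = U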

Lemma 5.3 Let s ∈ S. Then

1. sˆ :Φ → Ψf . 2. sˆ is continuous.

Proof. (1) Let X = {s(ψ): ψ ∈ Y (s), ψ ≤ φ}. Then X ⊆ Ψf is a finite set. Hences ˆ(φ) = ∨X ∈ Ψf . (2) Assume φ1 ≤ φ2. Then we have that

{s(ψ): ψ ∈ Y (s), ψ ≤ φ1} ⊆ {s(ψ): ψ ∈ Y (s), ψ ≤ φ2}.

Hences ˆ(φ1) ≤ sˆ(φ2) which shows thats ˆ is an order-preserving mapping. Let now X ⊆ Φ be a directed set. Thens ˆ(φ) ≤ sˆ(∨X) if φ ∈ X, hence ∨φ∈X sˆ(φ) ≤ sˆ(∨X). We claim that the inverse inequalitys ˆ(∨X) ≤ ∨φ∈X sˆ(φ) holds also. Let ψ ∈ Y (s), ψ ≤ ∨X. Note that ψ ∈ Φf . Since X is directed, there is a φ ∈ X such that ψ ≤ φ. Thus we have that s(ψ) ≤ sˆ(ψ) ≤ sˆ(φ), so _ sˆ(∨X) = ∨{s(ψ): ψ ∈ Y (s), ψ ≤ ∨X} ≤ sˆ(φ). φ∈X 5.3. CONTINUOUS MAPPINGS BETWEEN COMPACT ALGEBRAS107

This shows that ∨φ∈X sˆ(φ) =s ˆ(∨X), which, by definition means thats ˆ is continuous. ut Let Sˆ = {sˆ : s ∈ S}. We have shown that Sˆ ⊆ [Φ → Ψ]c. We claim that Sˆ are the finite elements of [Φ → Ψ]c. We show first that Sˆ is closed under combination. Let s1 and s2 be two simple mappings and Y1 = Y (s1), Y2 = Y (s2). Define s : Y1 ∪ Y2 → Ψf by ψ 7→ sˆ1(ψ) ∨ sˆ2(ψ). Clearly, s is a simple mapping. We claim that sˆ =s ˆ1 ⊗ sˆ2. In fact,

ŝ(φ) = ∨{ŝ1(ψ) ∨ ŝ2(ψ) : ψ ∈ Y1 ∪ Y2, ψ ≤ φ}
= ∨{(∨{s1(ψ1) : ψ1 ∈ Y1, ψ1 ≤ ψ}) ∨ (∨{s2(ψ2) : ψ2 ∈ Y2, ψ2 ≤ ψ}) : ψ ∈ Y1 ∪ Y2, ψ ≤ φ}
= (∨{s1(ψ1) : ψ1 ∈ Y1, ψ1 ≤ ψ ∈ Y1 ∪ Y2, ψ ≤ φ}) ∨ (∨{s2(ψ2) : ψ2 ∈ Y2, ψ2 ≤ ψ ∈ Y1 ∪ Y2, ψ ≤ φ})
= (∨{s1(ψ1) : ψ1 ∈ Y1, ψ1 ≤ φ}) ∨ (∨{s2(ψ2) : ψ2 ∈ Y2, ψ2 ≤ φ})
= ŝ1(φ) ∨ ŝ2(φ). (5.6)

The vacuous mapping e(φ) = e belongs to Ŝ since e = f̂ where Y(f) = {e}, f(e) = e. Next we verify the axioms of convergence, density and compactness for Ŝ relative to [Φ → Ψ]c.
Convergence follows from the fact that for all X ⊆ [Φ → Ψ]c we have ∨X ∈ [Φ → Ψ]c. In fact, let f = ∨X, that is, f(φ) = ∨g∈X g(φ). If Y ⊆ Φ is directed, then, by the continuity of g and the associative law for the supremum in the lattice Ψ,

f(∨Y) = ∨g∈X g(∨Y) = ∨g∈X (∨φ∈Y g(φ)) = ∨φ∈Y (∨g∈X g(φ)) = ∨φ∈Y f(φ).

So we conclude that f ∈ [Φ → Ψ]c. Density: We have to show that

f ⇒(x,y) = ∨{sˆ : s ∈ S, sˆ =s ˆ⇒(x,y) ≤ f}.

By definition and the continuity of f,

f⇒(x,y)(φ) = (f(φ⇒x))⇒y = (f(∨{α ∈ Φf : α = α⇒x ≤ φ}))⇒y = (∨{f(α) : α ∈ Φf, α = α⇒x ≤ φ})⇒y.

This set is directed, hence, by Theorem 4.9,

f⇒(x,y)(φ) = ∨{(f(α))⇒y : α ∈ Φf, α = α⇒x ≤ φ}.

We are going to show that for α ∈ Φf, α = α⇒x ≤ φ, we have

(f(α))⇒y = ∨{ŝ(α) : s ∈ S, ŝ = ŝ⇒(x,y) ≤ f}. (5.7)

This implies then

f⇒(x,y)(φ) = ∨{∨{ŝ(α) : s ∈ S, ŝ = ŝ⇒(x,y) ≤ f} : α ∈ Φf, α = α⇒x ≤ φ}
= ∨{∨{ŝ(α) : α ∈ Φf, α = α⇒x ≤ φ} : s ∈ S, ŝ = ŝ⇒(x,y) ≤ f}
≤ ∨{ŝ(φ) : s ∈ S, ŝ = ŝ⇒(x,y) ≤ f}.

This shows that f⇒(x,y) ≤ ∨{ŝ : s ∈ S, ŝ = ŝ⇒(x,y) ≤ f}. Since the inverse inequality clearly holds, we are done.
To prove (5.7) note that by the density in Ψ we have

(f(α))⇒y = ∨{β ∈ Ψf : β = β⇒y ≤ f(α)}.

Consider β ∈ Ψf such that β = β⇒y ≤ f(α). Let s : {α} → Ψf be defined by s(α) = β. Then s is a simple mapping in S. And,

ŝ(φ) = β if α ≤ φ, and ŝ(φ) = e otherwise.

So we have ŝ(φ) ≤ f(φ) for all φ, that is, ŝ ≤ f. It remains to show that ŝ⇒(x,y) = ŝ. Consider γ ∈ Φ. We have to show that (ŝ(γ⇒x))⇒y = ŝ(γ). If α = α⇒x, then α = α⇒x ≤ γ holds if, and only if, α ≤ γ⇒x. We have two cases:
Case 1: α ≤ γ. Then (ŝ(γ⇒x))⇒y = (s(α))⇒y = β⇒y = β = ŝ(γ), since α ≤ γ⇒x implies ŝ(γ⇒x) = s(α).
Case 2: α ≰ γ. In this case (ŝ(γ⇒x))⇒y = e = ŝ(γ).
So we have indeed ŝ⇒(x,y) = ŝ. This proves that for α = α⇒x ≤ φ,

(f(α))⇒y ≤ ∨{ŝ(α) : s ∈ S, ŝ = ŝ⇒(x,y) ≤ f}.

The inverse inequality is obvious. This proves (5.7).
Compactness: Let X ⊆ [Φ → Ψ]c be a directed set and ŝ ≤ ∨X. If ψ ∈ Y(s), then s(ψ) ∈ Ψf and s(ψ) = ŝ(ψ) ≤ (∨X)(ψ) = ∨f∈X f(ψ). The set of f(ψ) for f ∈ X is directed in Ψ. Therefore, there is an f ∈ X such that s(ψ) ≤ f(ψ). Since Y(s) is finite and X directed, there is a g ∈ X such that s(ψ) ≤ g(ψ) for all ψ ∈ Y(s). Hence, ŝ ≤ g ∈ X.
Thus, we have proved the following theorem:

Theorem 5.5 If (Φ, Φf, D) and (Ψ, Ψf, E) are two strongly compact, domain-free information algebras, then ([Φ → Ψ]c, Ŝ, D × E) is a strongly compact information algebra with Ŝ as finite elements.

Chapter 6

Boolean Information Algebra

6.1 Domain-Free Boolean Information Algebra

We have seen that in a domain-free information algebra (Φ,D) the set Φ is a semilattice. And we have already seen, that it can even be a Boolean algebra (see e.g. example 3.10). This case is of some interest and will therefore be studied in this Chapter.

Definition 6.1 Boolean Information Algebras: A domain-free infor- mation algebra (Φ,D) is called Boolean, if

1. Φ is a distributive lattice. That is, besides the supremum (combina- tion) of two elements φ and ψ, the infimum φ ∧ ψ exists too, and the distributive laws hold:

φ ∧ (ψ ∨ η) = (φ ∧ ψ) ∨ (φ ∧ η), φ ∨ (ψ ∧ η) = (φ ∨ ψ) ∧ (φ ∨ η).

2. For all φ ∈ Φ exists an element φc, called the complement of φ such that

φ ∧ φc = e, φ ∨ φc = z.

This means essentially, that Φ is a Boolean lattice, see (Davey & Priestley, 1990a). The following results are well known for Boolean lattices (Davey & Priestley, 1990a):

1. φ ≤ ψ if, and only if, φ ∧ ψ = φ.

2. Every φ has a unique complement.

3. (φc)c = φ.

111 112 CHAPTER 6. BOOLEAN INFORMATION ALGEBRA

4. ec = z and zc = e.

5. φ ≤ ψ if, and only if, φc ≥ ψc.

6. φ ∨ ψc = z if, and only if, φ ≥ ψ.

7. De Morgan laws:

(φ ∧ ψ)c = φc ∨ ψc, (φ ∨ ψ)c = φc ∧ ψc.

Here follows first two results on the interplay of meet and focusing in Boolean information algebras:

Lemma 6.1

1. ∀φ ∈ Φ and ∀x ∈ D, φ ∧ φ⇒x = φ⇒x.

2. If ∧i∈I φi exists, then so does ∧i∈I φi⇒x and

∧i∈I φi⇒x = (∧i∈I φi)⇒x.

In particular, ∀φ, ψ ∈ Φ and x ∈ D,

(φ ∧ ψ)⇒x = φ⇒x ∧ ψ⇒x.

Proof. (1) Since φ ≥ φ⇒x, φ⇒x is a lower bound of φ and φ⇒x. But φ⇒x ≥ φ ∧ φ⇒x, hence φ ∧ φ⇒x = φ⇒x.
(2) Let ψ = ∧i∈I φi. Then ψ⇒x ⊗ (ψ⇒x)c = z. Lemma 3.8 (2) implies then that ψ⇒x ⊗ ((ψ⇒x)c)⇒x = z. From φi ≥ ψ ≥ ψ⇒x it follows then that φi ⊗ ((ψ⇒x)c)⇒x = z for all i ∈ I, hence (Lemma 3.8 (2)) φi⇒x ⊗ ((ψ⇒x)c)⇒x = z. This means that φi⇒x ≥ ψ⇒x. So, ψ⇒x is a lower bound of the φi⇒x.
Let t be another lower bound of the φi⇒x. Then φi⇒x ⊗ tc = z, hence (Lemma 3.8 (2)) φi⇒x ⊗ (tc)⇒x = z, and therefore also φi ⊗ (tc)⇒x = z. This implies also ψ ⊗ (tc)⇒x = z, thus (Lemma 3.8 (2)) ψ⇒x ⊗ (tc)⇒x = z, and so ψ⇒x ⊗ tc = z. But this implies that ψ⇒x ≥ t. Therefore ψ⇒x is the greatest lower bound of the φi⇒x for all i ∈ I. □

φ − ψ = φ ⊗ ψc, φ∆ψ = (φ − ψ) ∧ (ψ − φ).

The first operation is called difference, the second one symmetric difference. Note however that usually these two operation are defined dually in Boolean algebras, with join and meet exchanged, which corresponds also to a dual order, ≤ changed into ≥ (for a duality theory see below). Then we have the following results concerning complementation and these new operations: 6.1. DOMAIN-FREE BOOLEAN INFORMATION ALGEBRA 113

Lemma 6.2 Let (Φ,D) be a Boolean information algebra. Then, ∀φ, ψ ∈ Φ and x, y ∈ D: 1. (φ⇒x)c ⇒x = (φ⇒x)c,

2. φ = φ⇒x if, and only if, φc = φc ⇒x,

3. φ⇒x ∧ φc ⇒x = e.

4. (φ − ψ⇒x)⇒x = φ⇒x − ψ⇒x,

5. (φ − ψ)⇒x ≤ φ⇒x − ψ⇒x,

6. (φ∆ψ)⇒x ≤ φ⇒x∆ψ⇒x,

7. (φ⇒x)c ⇒y ⊗ (φc ⇒y)c ⇒x = z.

Proof. (1) From φ⇒x ⊗ (φ⇒x)c = z we deduce (φ⇒x)⇒x ⊗ (φ⇒x)c = z, hence, by Lemma 3.8 (2), φ⇒x ⊗ ((φ⇒x)c)⇒x = z. This implies that ((φ⇒x)c)⇒x ≥ (φ⇒x)c. The inverse inequality holds always, so we have equality.
(2) Assume φ = φ⇒x. Then (1) just proved implies that φc = (φ⇒x)c = ((φ⇒x)c)⇒x = (φc)⇒x. The other direction follows by symmetry.
(3) Since φ⇒x ≤ φ and (φc)⇒x ≤ φc we have φ⇒x ∧ (φc)⇒x ≤ φ ∧ φc = e. Because e is the bottom element we must have equality.
(4) Here we use again (1) above, together with the combination axiom, and obtain

(φ − ψ⇒x)⇒x = (φ ⊗ (ψ⇒x)c)⇒x = (φ ⊗ (ψ⇒x)c ⇒x)⇒x = φ⇒x ⊗ (ψ⇒x)c ⇒x = φ⇒x ⊗ (ψ⇒x)c = φ⇒x − ψ⇒x.

(5) Note that

φ = φ ⊗ e = φ ⊗ (ψ ∧ ψc) = (φ ⊗ ψ) ∧ (φ ⊗ ψc).

We have then by lemma 6.1 (2)

φ⇒x = ((φ ⊗ ψ) ∧ (φ ⊗ ψc))⇒x = (φ ⊗ ψ)⇒x ∧ (φ ⊗ ψc)⇒x.

This implies φ⇒x ≥ ψ⇒x ∧ (φ − ψ)⇒x. From this we deduce that

φ⇒x ⊗ (psi⇒x)c = φ⇒x − ψ⇒x ≥ z ∧ (φ − ψ)⇒x = (φ − ψ)⇒x.

(6) Using (4) just proved, together with lemma 6.1 (2) we obtain

φ⇒x∆ψ⇒x = (φ⇒x − ψ⇒x) ∧ (ψ⇒x − φ⇒x) ≥ (φ − ψ)⇒x ∧ (ψ − φ)⇒x = ((φ − ψ) ∧ (ψ − φ))⇒x = (φ∆ψ)⇒x. 114 CHAPTER 6. BOOLEAN INFORMATION ALGEBRA

(7) Using (3) above with De Morgan’s law we obtain (φ⇒x)c ⊗(φc ⇒y)c = z. From (1) above it follows then (φ⇒x)c ⇒x ⊗(φc ⇒y)c ⇒y = z. Now, lemma 3.8 (3) gives (φ⇒x)c ⇒y ⊗ (φc ⇒y)c ⇒x = z ut The classical example of a Boolean information algebra is the algebra of subsets of a cartesian product. We give several examples, which are variants of this basic system.

Example 6.1 Multivariate Subset System. The subset system introduced in Example 3.9 is a Boolean information algebra. Complements of subsets of a universe are the usual set-theoretic complements relative to the universe. The infimum of two sets is their union. This is contrary to the usual view of lattices of subset systems, where, corresponding to the order of set inclusion, meet corresponds to intersection and join to union. We however use the information order, which is dual to set inclusion, i.e. φ ≤ ψ means φ ⊇ ψ in terms of set inclusion. This explains the difference.
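As a small illustration of Definition 6.1 (this Python sketch is added here and is not part of the original text), the following fragment models the subset algebra over a toy universe of pairs of two binary variables, with combination as intersection, meet as union, the usual set-theoretic complement, and the information order dual to inclusion; the variable names are hypothetical.

    # Toy multivariate subset system over two binary variables X and Y.
    from itertools import product

    UNIVERSE = set(product([0, 1], [0, 1]))    # all pairs (x, y); this is the neutral element e

    def combine(phi, psi):                     # combination = join = intersection
        return phi & psi

    def meet(phi, psi):                        # infimum in the information order = union
        return phi | psi

    def complement(phi):                       # set-theoretic complement relative to the universe
        return UNIVERSE - phi

    def leq(phi, psi):                         # information order: phi <= psi iff psi is a subset of phi
        return psi <= phi

    phi = {(0, 0), (0, 1)}                     # the information "X = 0"
    e, z = UNIVERSE, set()                     # vacuous and null (contradictory) information
    assert meet(phi, complement(phi)) == e     # phi meet phi^c = e
    assert combine(phi, complement(phi)) == z  # phi join phi^c = z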

Example 6.2 Propositional Logic. Classical logic systems such as propositional logic, Example 3.8, and predicate logic, Example 3.11 and Example 6.3 below, provide further instances. In Example 3.8 we mentioned that a formula of propositional logic is a domain-free expression of information. In fact, it is somehow more natural to derive directly a domain-free algebra associated with propositional logic: We refer to Example 3.8 for the definition of the propositional language and the interpretation of formulae by truth functions, which are elements of Dω with D = {t, f}. The set of models of a propositional formula A is denoted by r̂(A), which is a cylindric subset of Dω with a finite dimensional base. This means that there is a finite set J of variables such that α ∈ r̂(A) implies β|α↓J ∈ r̂(A) for any β ∈ Dω, if β|α↓J denotes the truth function β where β(j) is replaced by α(j) for j ∈ J. Clearly the cylindric sets with finite bases form a Boolean information algebra, with respect to the lattice of finite subsets of ω as domains. This is the domain-free algebra associated with propositional logic. Clearly the following relations hold between language and algebra:

rˆ(A ∧ B) =r ˆ(A) ∩ rˆ(B), rˆ(A ∨ B) =r ˆ(A) ∪ rˆ(B), rˆ(¬A) = (ˆr(A))c.

This reflects the Boolean structure of the doamin-free informaiton algebra associated with propositional logic. Focusing, or rather variable elimination, is expressed by existential quantification (see also Example 6.4 below)

rˆ(∃XA) =r ˆ(A)−X .

The partial order in the information algebra of models corresponds to logical consequence: We have A |= B if r̂(A) ⊆ r̂(B), or, in terms of the information order, r̂(A) ≥ r̂(B). Of course, different formulae A and B may have the same set of models, i.e. express the same information. This is the case if r̂(A) = r̂(B), or, in the usual logical notation, A |= B and B |= A.

Example 6.3 Many-Sorted Structures. In Example 3.11 it has been shown how predicate logic is related to a domain-free information algebras, in fact an algebra of formulae and one of structures. This is very similar to propo- sitional logic, Example 6.2 above. For applications many sorted predicate logic is more natural, although this is not really an extension of usual predi- cate logic. Nevertheless workingin the many-sorted framework is somewhat simpler than reducing it to usual predicate logic.

Example 6.4 Quantor Algebra. In Section 3.6 we introduced the opera- tion of variable elimination for the case of a domain-free information algebra (Φ,D) related to a finite multivariate structure, i.e. where D is the lattice of domains associated with a set of variables V . Let now Φ be a Boolean algebra, such that (Φ,D) is a domain-free Boolean information algebra rel- ative to a multivariate structure. Then, for any variable X ∈ V we define the mapping ∃X :Φ → Φ by

∃Xφ = φ−X .

This mapping has the following properties:

Lemma 6.3 The following properties hold ∀X ∈ V for the mapping ∃X:

1. ∃Xz = z,

2. ∀φ ∈ Φ, ∃Xφ ≤ φ,

3. ∀φ, ψ ∈ Φ, ∃X(φ ∨ ∃Xψ) = ∃Xφ ∨ ∃Xψ.

Proof. (1) and (3) correspond to properties (4) and (2) of Lemma 3.10. Property (2) above follows from property (3) of Lemma 3.10. ut A mapping from a boolean algebra Φ into itself satisfying properties (1) to (3) of the lemma above is called an existential quantifier 1 If x ⊆ V is any finite set of variables, then the same properties hold also for the mapping ∃x defined by ∃xφ = φ−x:

Lemma 6.4 The following properties hold ∀x ⊆ V , x finite, for the map- ping ∃x:

1. ∃xz = z,

1In the literature however, in general the dual order is considered, such that (1) reads ∃Xe = e, (2) φ ≤ ∃Xφ and (3) ∃X(φ∧∃ψ) = ∃Xφ∧∃ψ. But this is no essential difference. 116 CHAPTER 6. BOOLEAN INFORMATION ALGEBRA

2. ∀φ ∈ Φ, ∃xφ ≤ φ,

3. ∀φ, ψ ∈ Φ, ∃x(φ ∨ ∃xψ) = ∃xφ ∨ ∃xψ.

Proof. These properties follow immediately from Lemma 6.3 above, together with property (1) of Lemma 3.10. □
Thus, ∃x is an existential quantifier too. A Boolean algebra Φ, together with a set V such that, ∀x ⊆ V, ∃x is an existential quantifier, such that

1. ∃∅φ = φ,

2. ∃(x1 ∪ x2)φ = ∃x1(∃x2φ), is called a quantifier algebra over V . Thus, a Boolean information algebra (Φ,D) relative to a multivariate structure D based on a finite set of variables V is a quantifier algebra over V . Inversely, every quantifier algebra Φ over a set V is equivalent to a Boolean information algebra. In fact, for J ⊆ V define

φ⇒J = ∃(V \ J)φ.

We are going to verify the axioms of a domain-free information algebra. (1) The semigroup axiom holds clearly, since it holds for the join in the Boolean algebra Φ. (2) The set V is a support of every φ ∈ Φ, since φ⇒V = ∃(V \ V )φ = ∃∅φ = φ. (3) The bottom element e of a Boolean algebra is its neutral element with respect to join. Further, e⇒J = ∃(V \ J)φ ≤ e since ∃(V \ J) is an existential quantifier. This shows that e⇒J = e. (4) The top element z is a null element relative to the operation of a join. Further z⇒J = ∃(V \J)z = z by the properties of an existential quantifier. (5) We have by property 2 above

(φ⇒J )⇒K = ∃(V \ K)(∃(V \ J)φ) = ∃((V \ K) ∪ (V \ J))φ = ∃(V \ (J ∩ K))φ = φ⇒J∩K .

For (6) we see that by property 3 of an existential quantifier

(φ⇒J ⊗ ψ)⇒J = ∃(V \ J)(∃(V \ J)φ ⊗ ψ) = ∃(V \ J)φ ⊗ ∃(V \ J)ψ = φ⇒J ⊗ ψ⇒J .

(7) finally follows from the property φ⇒J = ∃(V \J)φ ≤ φ of the existen- tial quantifier. We remark that the equivalence between quantifier algebras and information algebras is not confined to Boolean information algebras, but holds in general for a semilattice Φ. Quantifier algebras are introduced 6.1. DOMAIN-FREE BOOLEAN INFORMATION ALGEBRA 117 almost exclusively with respect to Boolean algebras. The reason is of course that then the dual forall-quantifier can be defined: ∀Xφ = (∃Xφc)c. Fur- thermore, quantifier algebras are reducts of algebras with more structure such as Halmos algebras or cylindrical algebras (Halmos, 1962; Henkin et al., 1971), which are introduced in order to algebrize first order logic, which again explains their underlying Boolean structure. First order logic will be examined with respect to information and information algebra in the examples below.

Example 6.5 A Special Algebra. A particular algebra, which is closely related to predicate logic (and first order logic more generally), is a generalization of Example 6.1 to infinite dimension. Consider a set U and infinite sequences v = (v1, v2, ...) of values in U. We denote by Uω the set of all these sequences and by M = P(Uω) the power set of Uω. Now, M is clearly a Boolean algebra. It is also a quantifier algebra, hence a Boolean information algebra, with variable elimination (or existential quantification) defined for any i = 1, 2, ... as

A−Xi = {v : ∃v′ ∈ A such that v′j = vj for all j ≠ i}.

This example can be generalized to more general ordinals than ω and extended to cylindric algebras or Halmos algebras (Henkin et al., 1971) (see Example 3.10).
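In the spirit of Examples 6.4 and 6.5, the following Python sketch (added here as an illustration, not part of the original text) realizes existential quantification as variable elimination on finite sets of tuples over two variables and checks the three quantifier properties of Lemma 6.3 on a toy case; the frames and relations are hypothetical.

    # Existential quantification (variable elimination) on subsets of tuples.
    VARS = ("X", "Y")
    FRAME = {"X": (0, 1), "Y": (0, 1, 2)}

    def eliminate(phi, i):
        """Existential quantifier for variable i: cylindrify phi over coordinate i."""
        out = set()
        for t in phi:
            for val in FRAME[VARS[i]]:
                s = list(t); s[i] = val
                out.add(tuple(s))
        return out

    phi = {(0, 1), (1, 2)}
    psi = {(0, 1)}
    z = set()                                      # the null element (empty relation)

    # Lemma 6.3 (information order is reversed inclusion, combination = intersection):
    assert eliminate(z, 0) == z                    # (1) 'exists X' z = z
    assert eliminate(phi, 0) >= phi                # (2) 'exists X' phi <= phi (i.e. a superset)
    lhs = eliminate(phi & eliminate(psi, 0), 0)    # (3) 'exists X'(phi join 'exists X' psi)
    rhs = eliminate(phi, 0) & eliminate(psi, 0)    #     = 'exists X' phi join 'exists X' psi
    assert lhs == rhs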

In Boolean lattices there is a duality theory, which can be carried over to Boolean information algebras. If (Φ,D) is a Boolean information algebra, then define the following dual operations of combination ⊗d and focusing ⇒d:

φ ⊗d ψ = (φc ⊗ ψc)c, φ⇒dx = ((φc)⇒x)c.

We note that, by de Morgan’s law,

φ ⊗d ψ = φ ∧ ψ.

This defines a new Boolean information algebra, which is called the dual of the original one.

Theorem 6.1 (Φ,D) with the operations ⊗d and ⇒d is a Boolean informa- tion algebra. The mapping φ 7→ φc is an isomorphism between dual Boolean information algebras.

Proof. We note that the mapping φ 7→ φc is one-to-one, since every φ has a unique complement φc. We have the following relations: 118 CHAPTER 6. BOOLEAN INFORMATION ALGEBRA

1. φc 7→ (φc)c = φ,

2. φ ∨ ψ 7→ (φ ∨ ψ)c = φc ∧ ψc,

3. φ ∧ ψ 7→ (φ ∧ ψ)c = φc ∨ ψc,

4. φ⇒x 7→ (φ⇒x)c = (φc)⇒dx.

This shows that the mapping is isomorph relative to the operations of com- plementation, supremum and infimum and focusing. This proves that (Φ,D) is a Boolean information algebra with respect to operations ⊗d and ⇒d. ut In the case of subset systems, dual combination corresponds to union. Focusing consists of projecting the complement and taking the complement of the result. This illustrates that dual focussing is not very attractive. More on Boolean information algebras, and more specifically on the related cylindric algebras, can be found in (Henkin et al., 1971).

6.2 Labeled Boolean Information Algebra

It is also possible to define a labeled version of a Boolean information alge- bra. Whereas domain-free Boolean information algebras are well present in the literature as reducts of Halmos, polyadic or cylindric algebras, labeled algebras seem not to have found much attention so far. We shall see below that relational algebra in database theory may be considered as a labeled Boolean information algebra. But again this is not the point of view adapted in relational algebra. We shall first define the labeled version of a Boolean information algebra and derive a few elementary results associated with it. We then show that its derived domain-free version is in fact, as to be ex- pected, a Boolean information algebra. In turn, the labeled algebra derived from the domain-free version turns out to be a labeled Boolean information algebra. A labeled Boolean information algebra is an information algebra, where the restriction of the algebra to each domain is a Boolean algebra and a cer- tain distributive law is valid also between different domains. The following definition specifies this more precisely.

Definition 6.2 Labeled Boolean Information Algebra. A labeled informa- tion algebra (Φ,D) is called a labeled Boolean information algebra, if

1. ∀x ∈ D, the algebra Φx is Boolean.

2. ∀x, y ∈ D and φ, ψ ∈ Φx it holds that

(φ ∧ ψ) ⊗ ey = ((φ ⊗ ey) ∧ (ψ ⊗ ey)). 6.2. LABELED BOOLEAN INFORMATION ALGEBRA 119

Note that we denote the meet in Φx as usual with the operator symbol ∧. We remark that the meet between elements with different domains is not defined so far! As in an ordinary information algebra, combination is the same as supremum or join and will be denoted, depending on the case by the combination symbol ⊗ or the join symbol ∨. It is defined between arbitrary elements. We prove first a useful lemma connecting complementation with vacuous extension.

Lemma 6.5 Let (Φ,D) be a labeled Boolean information algebra. Then, ∀x, y ∈ D, x ≤ y and φ ∈ Φx,

(φc)↑y = (φ↑y)c.

Proof. From φ ∨ φc = φ ⊗ φc = zx we conclude that

(φ ⊗ φc)↑y = φ↑y ⊗ (φc)↑y = zy,

hence φ↑y ∨ (φc)↑y = zy. On the other hand φ ∧ φc = ex implies, using property (2) in the definition above,

ey = (φ ∧ φc) ⊗ ey = (φ ⊗ ey) ∧ (φc ⊗ ey) = φ↑y ∧ (φc)↑y.

This shows then that (φc)↑y = (φ↑y)c. □

φ ∧ ψ = (φc ∨ ψc)c. (6.1)

The meet operation is certainly an extension of the local meet operation within the algebras Φx. The following lemma collects some properties of the meet operation.

Lemma 6.6 Let (Φ,D) be a labeled Boolean information algebra, with meet defined by (6.1). Then the following properties hold:

1. If d(φ) = x and d(ψ) = y, then φ ∧ ψ = φ↑x∨y ∧ ψ↑x∨y.

2. d(φ ∧ ψ) = d(φ) ∨ d(ψ).

3. ∀φ, ψ ∈ Φ and z ∈ D we have (φ ∧ ψ)→z = φ→z ∧ ψ→z.

4. ∀φ, ψ, η ∈ Φ we have (φ ∧ ψ) ∨ η = (φ ∨ η) ∧ (ψ ∨ η) and (φ ∨ ψ) ∧ η = (φ ∧ η) ∨ (ψ ∧ η) ( distributive laws).

5. ∀φ, ψ ∈ Φ we have φ ∧ ψ = (φc ∨ ψc)c and φ ∨ ψ = (φc ∧ ψc)c ( De Morgan laws). 120 CHAPTER 6. BOOLEAN INFORMATION ALGEBRA

↑x∨y 6. ∀φ ∈ Φ with d(φ) = x, and ∀y ∈ D we have φ ∧ zy = φ and φ ∧ ey = ex∨y.

Proof. (1) We use Lemma 6.5 and the definition of the meet operation (6.1), as well as (3) of Lemma 3.1,

φ ∧ ψ = (φc ∨ ψc)c = ((φc)↑x∨y ∨ (ψc)↑x∨y)c = ((φ↑x∨y)c ∨ (ψ↑x∨y)c)c = φ↑x∨y ∧ ψ↑x∨y.

(2) follows from d(φ ∧ ψ) = d((φc ∨ ψc)c) = d(φc ∨ ψc) = d(φc) ∨ d(ψc) = d(φ) ∨ d(ψ). (3) Let d(φ) = x and d(ψ) = y. Then by definition of the transport operation, and (1) proved above

(φ ∧ ψ)→z = (φ↑x∨y ∧ ψ↑x∨y)→z = ((φ↑x∨y ∧ ψ↑x∨y)↑x∨y∨z)↓z = (φ↑x∨y∨z ∧ ψ↑x∨y∨z)↓z.

The last identity follows from property (2) of a labeled Boolean information algebra. Let η = φ↑x∨y∨z ∧ ψ↑x∨y∨z. Since this is a meet within the Boolean ↑x∨y∨z ↑x∨y∨z algebra Φx∨y∨z, η is the infimum of φ and ψ . In particular, η ≤ φ↑x∨y∨z and η ≤ ψ↑x∨y∨z. This implies that η↓z ≤ φ→z and η↓z ≤ ψ→z, hence η↓z ≤ φ→z ∧ ψ→z. Let ν be another lower bound of φ→z and ψ→z. Then we conclude that ν↑x∨y∨z ≤ (φ→z)↑x∨y∨z ≤ φ↑x∨y∨z and, in the same way, ν↑x∨y∨z ≤ ψ↑x∨y∨z. This implies that ν↑x∨y∨z ≤ η. But then (ν↑x∨y∨z)↓z = ν ≤ η↓z. This shows that η↓z is the infimum of φ→z and ψ→z. Since this are both elements ↓z →z →z of the same Boolean algebra Φz we conclude that η = φ ∧ ψ . As η↓z = (φ ∧ ψ)→z, point (3) of the lemma is proved. (4) Let d(φ) = x, d(ψ) = y and d(η) = z. Then, using (1) proved above and the distributive law within the Boolean algebra Φx∨y∨z, we obtain (writing ⊗ instead of ∨),

(φ ∧ ψ) ⊗ η = (φ↑x∨y ∧ ψ↑x∨y) ⊗ η = (φ↑x∨y ∧ ψ↑x∨y)↑x∨y∨z ⊗ η↑x∨y∨z = (φ↑x∨y∨z ∧ ψ↑x∨y∨z) ⊗ η↑x∨y∨z = (φ↑x∨y∨z ⊗ η↑x∨y∨z) ∧ (ψ↑x∨y∨z ⊗ η↑x∨y∨z) = (φ ⊗ η)↑x∨y∨z ∧ (ψ ⊗ η)↑x∨y∨z = (φ ⊗ η) ∧ (ψ ⊗ η).

The dual distributive law is proved similarly. (5) Let d(φ) = x and d(ψ) = y. The first de Morgan law is the definition of the meet operation. The second law follows from De Morgan’s law within 6.2. LABELED BOOLEAN INFORMATION ALGEBRA 121 the Boolean algebra Φx∨y, Lemma 6.5 and (1) of this lemma proved above:

φ ⊗ ψ = φ↑x∨y ⊗ ψ↑x∨y = ((φ↑x∨y)c ∧ (ψ↑x∨y)c)c = ((φc)↑x∨y ∧ (ψc)↑x∨y)c = (φc ∧ ψc)c.

↑x∨y ↑x∨y (6) By (1) of this lemma φ ∧ zy = φ ∧ zx∨y = φ , since zx∨y is the top element in the Boolean algebra Φx∨y. The second equation is obtained similarly, noting that ex∨y is the least element in Φx∨y. ut Remark that φ ∧ ψ is not the infimum of φ and ψ, if the two elements do not belong to the same domain. In fact, we have, assuming d(φ) = x and d(ψ) = y,

φ ⊗ (φ ∧ ψ) = φ↑x∨y ⊗ (φ↑x∨y ∧ ψ↑x∨y) = φ↑x∨y ∧ (φ↑x∨y ⊗ ψ↑x∨y) = φ↑x∨y, since the distributive law holds and the meet within Φx∨y is the infimum. But in general, φ↑x∨y is not equal to φ. It holds only that φ ≤ φ↑x∨y. Hence φ is not even an upper bound of φ ∧ ψ. This means that Φ is not a Boolean algebra! Another deviation from a Boolean algebra is that, within Φ, complement is not defined relative to an absolute zero and one element, but only relative to those elements within Φ. In fact, if D does not have a bottom and top element, there exist no zero and one elements in Φ.

Example 6.6 Relational Algebra. Relational algebra is a model of an infor- mation algebra, as we have seen (see example 3.3 in Section 3.2). This is so, if we consider join and projection. But relational algebra contains more op- erations like union, intersection, difference, divide, product and restriction. Here we show that this defines essentially a labeled Boolean information algebra. Note first that intersection is defined only between relations with the same schema (or domain in our language). But then it corresponds to join. The Cartesian product is defined between relations with disjoint schema or domains. Again then, it corresponds to a join. Really new operations are union and complementation. As the intersec- tion, union is defined only between relation with the same domain. And it is the classical set theoretic union

R ∪ S = {f : f ∈ R or f ∈ S}, and d(R ∪ S) = d(R) = d(S). For a relation R with domain x complementa- tion is the usual set theoretic complementation with respect to the neutral relation in domain x,

c R = {f ∈ Ex : f 6∈ R}, 122 CHAPTER 6. BOOLEAN INFORMATION ALGEBRA

Therefore, we see that d(Rc) = d(R). Complementation is usually not con- sidered as an operation of relational algebra. But it serves to define other relational operations, such as difference, which is also defined only between relations on identical domains,

R \ S = R ./ Sc = {f : f ∈ R, f 6∈ S}.

Again, d(R \ S) = d(R) = d(S). The operation divide is defined for relations R, S and T with d(R) ∩ d(S) = ∅ and d(T) = d(R) ∪ d(S). Let d(R) = x, d(S) = y, x ∩ y = ∅ and d(T) = x ∪ y. Then R divided by S relative to T is defined as

R ÷ S : T = {f : d(f) = x, ∀g ∈ S, (f, g) ∈ T} = R \ πx((R ⋈ S) \ T).

Union can also be defined using complementation and join,

R ∪ S = (Rc ./ Sc)c. (6.2)

Clearly, for any schema or domain, the relations form a Boolean algebra. It remains to verify property (2) of the definition of a labeled Boolean information algebra above. But this means only that the union of the cylin- dric extensions of two relations on the same domain equals the cylindric extension of the union of the two relations, which clearly holds. Now, union can be generalized by (6.2) to an operation between arbitrary relations, not necessarily on the same domain. Then we have d(R ∪ S) = d(R) ∪ d(S) by the labeling operations for the join and complementation. In this way relational algebra is a reduct of a labeled Boolean information algebra. We have however so far neglected the relational operation of restriction or selection. For a set of attributes x, σx is a relation between the values of the attributes in x, i.e. σx ⊆ Ex. If the domain of a relation R covers x, then the selection σx applied to R returns all tuples of R which satisfy the relation. This is simply the join

σx ./ R.

So selection can be reduced to join too. This completes the relational algebra. Its basic operations are projection, join and (generalized) union. Relational algebra is essential to study various important problems of databases: data retrieval or query processing, database updates, defining integrity constraints, deriving database views or snapshots, defining stability requirements and defining security constraints. Relational expressions serve as a high-level representation of the user's intent (Date, 1975a). Most of these problems are general problems of information processing. Thus the abstraction of information algebra serves to unify and generalize analysis and solutions to these problems.

In practical computations it is important that relations remain finite (i.e. contain a finite number of tuples). If the domains of attributes are infinite, then complementation does not maintain finiteness, but all other operations do. This is even true for selection, which itself may define an infinite relation. That is fundamentally the reason why derived operations, based on complementation but maintaining finiteness of relations, are introduced.
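To illustrate the relational operations of Example 6.6 (this Python sketch is an addition, not part of the original text), the following fragment implements natural join, complementation relative to the neutral relation, and the generalized union (6.2) for finite relations over named attributes with small frames; all attribute names and frames are hypothetical.

    # Finite relations as (domain, set of rows); the frames of the attributes are finite.
    from itertools import product

    FRAMES = {"A": (0, 1), "B": (0, 1)}

    def full(dom):                               # the neutral relation E_dom: all tuples on dom
        dom = tuple(sorted(dom))
        return dom, set(product(*(FRAMES[a] for a in dom)))

    def complement(rel):                         # complement relative to E_{d(R)}
        dom, rows = rel
        return dom, full(dom)[1] - rows

    def join(r, s):                              # natural join on the union of the domains
        (dr, rr), (ds, rs) = r, s
        dom, univ = full(set(dr) | set(ds))
        rows = set()
        for t in univ:
            asg = dict(zip(dom, t))
            if tuple(asg[a] for a in dr) in rr and tuple(asg[a] for a in ds) in rs:
                rows.add(t)
        return dom, rows

    def union(r, s):                             # generalized union via (6.2)
        return complement(join(complement(r), complement(s)))

    R = (("A",), {(0,)})                         # the relation "A = 0"
    S = (("A",), {(1,)})                         # the relation "A = 1"
    assert union(R, S) == (("A",), {(0,), (1,)}) # on a common domain it is the usual union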

Example 6.7 File Systems. In Section 3.3 we have introduced tuple systems T over a lattice D, and an abstract relational information algebra (RT, D) over the tuple system T (see Theorem 3.1). A relation R over a domain x ∈ D is a subset of tuples with domain x. Since join within a fixed domain reduces to intersection, the relations over any domain x ∈ D form a Boolean algebra with meet equal to union. Certainly we have also, for two relations R and S over domain x and any y ∈ D,

(R ∪ S) ./ Ey = ((R ./ Ey) ∪ (S ./ Ey)).

So, the second condition in Definition 6.2 is satisfied too and the algebra (RT ,D) is a labeled Boolean information algebra. As a consequence, if (Φ,D) is an atomic labeled information algebra, then (RAt(Φ,D) is a labeled Boolean information algebra (see Section 4.3). By Corollary 4.1, if the algebra (Φ,D) is atomic composed, then it is em- bedded (as an information algebra) into a Boolean information algebra, or even isomorph to one (if it is atomic closed).

From a labeled Boolean information we may derive by the usual proce- dure the corresponding domain-free algebra. It is to be expected that it will be a Boolean information algebra. The domain-free algebra is the quotient (Φ/σ, D) defined by the congruence φ ≡ ψ (mod σ), if for d(φ) = x and d(ψ) = y, we have φ→y = ψ and ψ→x = φ (see Section 3.5). In the case of a labeled Boolean information algebra, this equivalence is also a congruence relative to meet. In fact, assume φ1 ≡ ψ1 (mod σ) and φ2 ≡ ψ2 (mod σ) and let x1, x2 and y1, y2 be the domains of φ1, φ2 and ψ1, ψ2 respectively. Then

(φ1 ∧ φ2)↑(x1∨x2)∨(y1∨y2) = φ1↑(x1∨x2)∨(y1∨y2) ∧ φ2↑(x1∨x2)∨(y1∨y2)
= (φ1↑x1∨y1)↑(x1∨x2)∨(y1∨y2) ∧ (φ2↑x2∨y2)↑(x1∨x2)∨(y1∨y2)
= (ψ1↑x1∨y1)↑(x1∨x2)∨(y1∨y2) ∧ (ψ2↑x2∨y2)↑(x1∨x2)∨(y1∨y2)
= ψ1↑(x1∨x2)∨(y1∨y2) ∧ ψ2↑(x1∨x2)∨(y1∨y2)
= (ψ1 ∧ ψ2)↑(x1∨x2)∨(y1∨y2).

This shows that φ1 ∧ φ2 ≡ ψ1 ∧ ψ2 (mod σ). We may therefore define [φ]σ ∧ [ψ]σ = [φ ∧ ψ]σ. 124 CHAPTER 6. BOOLEAN INFORMATION ALGEBRA

In the same way, φ ≡ ψ (mod σ) implies φc ≡ ψc (mod σ). In fact, let d(φ) = x, d(ψ) = y. Then d(φc) = x, d(ψc) = y and (φc)↑x∨y = (φ↑x∨y)c = (ψ↑x∨y)c = (ψc)↑x∨y. Thus we can define ([φ]σ)c = [φc]σ. It remains to show that these operations in Φ/σ define in fact a Boolean algebra. We have clearly commutativity and associativity of meet (for join this is inherited from the information algebra). The distributive laws hold too, as may easily be demonstrated from the distributive laws of the labeled Boolean information algebra, for instance

([φ]σ ∧ [ψ]σ) ⊗ [η]σ = [(φ ∧ ψ) ⊗ η]σ

= [(φ ⊗ η) ∧ (ψ ⊗ η)]σ

= ([φ]σ ⊗ [η]σ) ∧ ([ψ]σ ⊗ [η)]σ.

The elements [ex]σ and [zy]σ are the null and one elements of the algebra. In fact, [φ]σ ⊗ [ex]σ = [φ]σ (as already in the information algebra) and [φ]σ ∧ [zx]σ = [φ ∧ zy]σ = [φ]σ (see lemma 6.6 (6)). Finally, we have

c c [φ]σ ∨ ([φ]σ) = [φ ∨ φ ]σ = [zx]σ. and

c c [φ]σ ∧ ([φ]σ) = [φ ∧ φ ]σ = [ex]σ.

c So, ([φ]σ) is the complement of [φ]σ. These are all the properties needed for a Boolean algebra. Hence (Φ/σ, D) is indeed a domain-free Boolean information algebra. We may finally derive a labeled version from a domain-free Boolean information algebra (Φ,D). Indeed, let x be a support of φ and y a support of ψ. Then (φ ∧ ψ)⇒x∨y = φ⇒x∨y ∧ ψ⇒x∨y = φ ∧ ψ. And we have seen that x is also a support of φc (Lemma 6.2 (2)). Hence in Φ∗ = {(φ, x): φ ∈ Φ, φ⇒x = φ} we may define

(φ, x) ∨ (ψ, y) = (φ ∨ ψ, x ∨ y) (φ, x) ∧ (ψ, y) = (φ ∧ ψ, x ∨ y) (φ, x)c = (φc, x).

Then Φ∗ is clearly a labeled Boolean information algebra. In fact, if we start with a labeled Boolean information algebra (Φ,D), then ((Φ/σ)∗,D) is isomorph to the labeled Boolean information algebra (Φ,D) (see Theorem 3.6). The other way round the same holds: If (Φ,D) is a domain-free Boolean information algebra, then (Φ∗/σ, D) is an isomorph domain-free Boolean algebra. 6.3. REPRESENTATION BY SET ALGEBRAS 125

6.3 Representation by Set Algebras

The case of finite Boolean information algebras is both important from a practical view, and particularly simple. In fact, it is well known that a finite Boolean algebra is isomorph to the subset algebra of a finite set. In this section we examine what this means for a finite Boolean information algebra. We shall show these algebras are essentially identical to subset algebras in a system of finite compatible frames (see Example 3.1). So, in some sense, there is not much room for the variety of finite Boolean information algebras. A labeled information algebra (Φ,D) is called finite, if, for all x ∈ D, the set Φx of elements with domain x is finite. If the information algebra (Φ,D) is in addition Boolean, then Φx is a finite Boolean algebra. We con- sider therefore first a number of well-known results regarding finite Boolean algebras (Davey & Priestley, 1990b). The first result is, that such a Boolean information algebra is atomic composed (see Section 4.3).

Theorem 6.2 If (Φ,D) is a finite, labeled Boolean information algebra, then ∀φ ∈ Φ,

φ = ∧At(φ).

Proof. Let x = d(φ). Clearly, there is an atom α ∈ Atx(Φ) such that φ ≤ α since the Φx is finite. Certainly φ is then a lower bound for the set At(φ). Let ψ be any lower bound of At(φ). We claim that ψ ≤ φ. If this c c were not true, then φ ⊗ ψ < zx. Choose an atom β ∈ At(φ ⊗ ψ ). Since φ ≤ φ ⊗ ψc ≤ β we have also β ∈ At(φ). So ψ ≤ β, but ψc ≤ β too. Hence ψ ⊗ ψc = z ≤ β which is a contradiction. So ψ ≤ φ and φ is the greatest lower bound of At(φ). ut Since Φx is a Boolean algebra, it follows that for any subset of atoms S ⊆ Atx(Φ) it holds that ∧S ∈ Φx. So any finite Boolean information alge- bra (Φ,D) is atomic closed. From theorem 4.7 in Section 4.3, we conclude then that a finite Boolean information algebra is isomorph to the relational algebra RAt(Φ with the the lattice F = {Atx(Φ) : x ∈ D} as domains. This is also equivalent to a subset algebra in a family of compatible frames, see Example 3.1 in Section 3.1. This is a representation theorem for finite Boolean information algebras, which is an extension of the corresponding representation theorem for finite Boolean algebras as the power set algebra of its atoms. Next we try to extend this result to a general, not necessarily finite, Boolean algebra. The way to do this, is using the ideal completion of the Boolean information algebra (see Section 4.2). So, let (Φ,D) be a labeled Boolean information algebra. Ideal completion has been defined in Section 4.2 both for domain-free and labeled information algebras. Here we con- sider a labeled Boolean information algebra (Φ,D) and consider its ideal 126 CHAPTER 6. BOOLEAN INFORMATION ALGEBRA completion (IΦ,D) (see Section 4.2). Well known results from the theory of Boolean algebra permit to conclude that the information algebra (IΦ,D) is atomic. The atoms of IΦ are maximal ideals, a concept to be defined in the following definition:

Definition 6.3 Maximal Ideal. A proper ideal I ∈ IΦx is called maximal in Φx, if J ∈ IΦx and I ⊆ J implies J = I or J = Φx.

Maximal ideals are proper ideals which are not properly contained in another proper ideal. Now, I ⊆ J is equivalent to I ≤ J in terms of the partial order in the information algebra IΦ. This shows that maximal ideals correspond to atoms in the algebra (IΦ,D). In Boolean algebras maximal ideals have nice properties. Here is a first result (Davey & Priestley, 1990b):

Theorem 6.3 If (Φ,D) is a labeled Boolean information algebra, then the following are equivalent:

1. I is a maximal ideal in Φx.

2. I is a proper ideal, and ∀φ, ψ ∈ Φx, φ ∧ ψ ∈ I implies φ ∈ I or ψ ∈ I.

3. ∀φ ∈ Φx, it is the case that φ^c ∈ I if, and only if, φ ∉ I.

Proof. This is a theorem of the theory of Boolean algebras (Davey & Priestley, 1990b). We prove first (1) ⇒ (2): Let I be a maximal ideal in Φx, and φ ∧ ψ ∈ I for some φ, ψ ∈ Φx. Assume φ ∉ I. Define Iφ = {η : η ≤ φ ∨ χ for some χ ∈ I}. This is an ideal, φ ∈ Iφ, and it contains I. Since I is maximal and I ≠ Iφ, it must be that Iφ = Φx. In particular, we have zx ∈ Iφ. So, zx = φ ∨ χ for some χ ∈ I. Then

I ∋ (φ ∧ ψ) ∨ χ = (φ ∨ χ) ∧ (ψ ∨ χ) = ψ ∨ χ.

Since ψ ≤ ψ ∨ χ, we conclude that ψ ∈ I as required. To prove (2) ⇒ (3) assume I is a proper ideal, and note that, for φ ∈ Φx, φ ∧ φ^c = ex ∈ I. By (2) then either φ ∈ I or φ^c ∈ I. If both belong to I, then I ∋ φ ∨ φ^c = zx and I is not proper, contrary to the assumption. So (3) holds. Finally we prove (3) ⇒ (1): Let J be an ideal properly containing I. Select a φ ∈ J \ I. Then φ^c ∈ I ⊆ J, so zx = φ ∨ φ^c ∈ J, hence J = Φx. Therefore, I is maximal. ⊓⊔
We are now in a position to show that a labeled Boolean information algebra (Φ, D) is embedded in a set algebra.

Theorem 6.4 If the labeled information algebra (Φ, D) is Boolean, then it is embedded in the relational algebra (R_At(Φ), D).

Proof. By Theorem 4.3 the mapping φ ↦ I(φ) is an information algebra homomorphism between (Φ, D) and its ideal completion (IΦ, D). Further, according to (1) of Theorem 6.3 above, the information algebra (IΦ, D) is atomic. Therefore, by Theorem 4.6, the mapping I ↦ At(I) is a homomorphism between (IΦ, D) and (R_At(Φ), D). Therefore the mapping defined by φ ↦ At(I(φ)) is an information algebra homomorphism too. Now, by (2) of Theorem 6.3, if φ ≠ ψ, then At(I(φ)) ≠ At(I(ψ)). The mapping is thus one-to-one, hence an embedding. ⊓⊔
We remark that the information-algebra homomorphism considered in the proof above is in fact a Boolean algebra homomorphism for every Boolean algebra Φx. This last theorem shows that a labeled Boolean information algebra is essentially equivalent to a certain abstract relational algebra. For domain-free Boolean information algebras, similar results can be obtained. Also Stone's representation theorem could be extended along these lines to Boolean information algebras. These issues will not be developed here. Chapter 7

Independence of Information

7.1 Conditional Independence of Domains

The concepts of independence relative to information is an important struc- tural issue in the algebraic theory of information. Roughly speaking, in- dependence is thought to capture certain irrelevance relations. For exam- ple an information concerning a certain question may be entirely irrelevant with regard to another question. Or, given a certain information relative to a certain question, new information relative to a second question may be irrelevant relative to a third question, i.e. may not change the infor- mation concerning this third question. Such considerations prove not only important from a structural point of view, but are also fundamental for for- mulating requirements for a measure of information. Furthermore, it has also important consequences for computational purposes, i.e. focusing in- formation on specified questions. In this Chapter we define different notions of independence and study their properties. We start in this first section by defining conditional inde- pendence of domains or questions. This is a general and basic independence relation. In the following Section 7.2 we use this concept to define a notion of conditional independence of information which is related to a certain fac- torization of a piece or a class of information. This restricts the structure of the information to be expected in certain circumstances. The issue of factor- ization will be deepened in Section 7.3 in the context of relational algebra, or more generally, atomic algebras. It will be shown that factorization is closely related to the concept of normal forms and some forms of depen- dency defined on the level of tuples, as for example discussed in relational database theory. Let then D be lattice of domains. Then we define conditional indepen- dence of two domains x, y ∈ D given a third domain z ∈ D as follows:

Definition 7.1 Conditional Independence of Domains. Domains x, y ∈ D

are called conditionally independent, given a domain z ∈ D, if (x ∨ z) ∧ (y ∨ z) = z. We then write x⊥y|z.
Clearly, x⊥y|z implies x ∧ y ≤ z. Note that, if the lattice D is distributive, then (x ∨ z) ∧ (y ∨ z) = (x ∧ y) ∨ z. Hence, in this case, x⊥y|z if, and only if, x ∧ y ≤ z. If the lattice D is at least modular, then the modular law implies (x ∨ z) ∧ (y ∨ z) = (x ∧ (y ∨ z)) ∨ z. Thus x⊥y|z holds in modular lattices if, and only if, x ∧ (y ∨ z) ≤ z. Let now D be the domain lattice of an information algebra (Φ, D). The conditional independence relation between domains translates into certain irrelevance statements. Here is a first result of this kind.

Theorem 7.1 If x⊥y|z, then, ∀φ, ψ ∈ Φ, d(φ) = x and d(ψ) = y imply

φ→y = (φ→z)→y, ψ→x = (ψ→z)→x.   (7.1)

Proof. Using Lemma 3.7 (3) and (4), the definition of transport, and (x ∨ z) ∧ (y ∨ z) = z, we see that

φ→y = (φ→y∨z)↓y = ((φ↓(x∨z)∧(y∨z))↑y∨z)↓y = ((φ↓z)↑y∨z)↓y = (φ→z)→y.

The second part of (7.1) is proved similarly. ⊓⊔
This theorem says that, for an information relative to a domain x, only the part concerning z is relevant for the domain y, if x and y are conditionally independent given z. Conditional independence is also related to factorization of information. A result illustrating this is given in the next theorem.

Theorem 7.2 Let φ, ψ ∈ Φ with d(φ) = x ∨ z and d(ψ) = y ∨ z. Then x⊥y|z implies

(φ ⊗ ψ)↓z = φ↓z ⊗ ψ↓z.

Proof. Assume x⊥y|z, that is, (x ∨ z) ∧ (y ∨ z) = z. Then, by the combination axiom,

(φ ⊗ ψ)↓z = ((φ ⊗ ψ)↓x∨z)↓z = (φ ⊗ ψ↓z)↓z = φ↓z ⊗ ψ↓z. ⊓⊔

Again, only the parts concerning z are relevant when two pieces of information relating to conditionally independent domains given z are projected to z. Such results are of utmost importance for computation, as will be discussed in Part II. Further results of this type will be given below and in the following sections. The following theorem collects a few basic properties of the ternary relation x⊥y|z:

Theorem 7.3 The conditional independence relation x⊥y|z in a lattice D satisfies the following properties

1. (P1) x⊥y|x,

2. (P2) x⊥y|z implies y⊥x|z,

3. (P3) x⊥y|z and w ≤ y imply x⊥w|z.

If the lattice D is modular, then following further properties hold:

1. (P4) x⊥y|z and w ≤ y imply x⊥y|(z ∨ w),

2. (P5) x⊥y|z and x⊥w|(y ∨ z) imply x⊥(y ∨ w)|z.

3. (P6) If z ≤ y and w ≤ y, then x⊥y|z and x⊥y|w imply x⊥y|(z ∧ w).

Proof. (P1) We have (x ∨ x) ∧ (y ∨ x) = x, hence x⊥y|x. (P2) Symmetry is a direct consequence of the definition. (P3) Since w ≤ y we obtain z ≤ (x ∨ z) ∧ (w ∨ z) ≤ (x ∨ z) ∧ (y ∨ z) = z, hence (x ∨ z) ∧ (w ∨ z) = z, or x⊥w|z. (P4) Now we assume D to be modular. As seen above, x⊥y|z holds if, and only if, x ∧ (y ∨ z) ≤ z. But then, since w ≤ y, we have that x ∧ (y ∨ z ∨ w) = x ∧ (y ∨ z) ≤ z ≤ z ∨ w. This shows that x⊥y|(z ∨ w). (P6) Assume x⊥y|z and x⊥y|w and z, w ≤ y. Then x ∧ (y ∨ z) = x ∧ y ≤ z and x ∧ (y ∨ w) = x ∧ y ≤ w. Thus, x ∧ (y ∨ (w ∧ z)) = x ∧ y ≤ w ∧ z, and (P6) follows. ⊓⊔
A ternary relation x⊥y|z on a lattice is called a separoid if properties (P1) to (P5) are satisfied, and a strong separoid if furthermore (P6) holds (Dawid, 2001). Many different forms of independence occurring in domains like probability theory, statistics, logic, relational databases and elsewhere induce relations which are separoids. In particular, when the lattice of domains in an information algebra is distributive, like, for instance, in multivariate systems, then our conditional independence relation between domains forms a strong separoid. This is because a distributive lattice is also modular. If the lattice is even Boolean, then separoids are equivalent to semi-graphoids (Dawid, 2001; ?). Conversely, (Dawid, 2001) shows that if the relation x⊥y|z in a lattice is a separoid, then the lattice is modular. Now, the lattice part(S) of the partitions of a set S is not modular. Thus, in general, we have less structure than a separoid in our conditional independence relation between domains. Interestingly, this is sufficient for our purposes, for example for local computation schemes, see Part II. The benefit of the further separoid properties is, in this context, merely to simplify concepts and proofs sometimes. We generalize now the notion of conditional independence from pairs of domains to arbitrary sets of domains:

Definition 7.2 Conditional Independence of Families of Domains. Let {xi : i ∈ I} be a family of domains from a domain lattice D, where I is a finite set of cardinality at least 2. If z ∈ D, we call the family {xi : i ∈ I} conditionally independent given z, if for all disjoint, non-empty subsets J, K ⊆ I we have that

∨_{j∈J} xj ⊥ ∨_{k∈K} xk | z.   (7.2)

We then write

⊥{xi : i ∈ I}|z.

We recall that by Definition 7.1 condition (7.2) means that

(∨_{j∈J} xj ∨ z) ∧ (∨_{k∈K} xk ∨ z) = z.   (7.3)

In fact, it is sufficient that condition (7.2) holds for disjoint sets J and K which cover I, J ∪ K = I. This is so because of property (P3) of Theorem 7.3. Further, clearly, Definition 7.2 generalizes Definition 7.1, since in the case of a two-element set I the former reduces to the latter definition. By convention, for all x, z ∈ D, we have ⊥{x}|z, since there are no pairs of disjoint non-empty subsets of a one-element set. Similarly, we admit also that ⊥∅|z holds for all z ∈ D. If the lattice D is distributive, then

(∨_{j∈J} xj ∨ z) ∧ (∨_{k∈K} xk ∨ z) = ∨_{j∈J,k∈K} (xj ∧ xk) ∨ z = z.

Therefore, xj ∧ xk ≤ z for all j, k ∈ I, j ≠ k, i.e. xj⊥xk|z, holds if, and only if, ⊥{xi : i ∈ I}|z. In distributive lattices, a family of domains is conditionally independent exactly if all pairs of elements of the family are so. We collect a few properties of conditional independence of families of domains in the next theorem.

Theorem 7.4 Assume ⊥{x1, . . . , xn}|z. Then the following properties hold:

1. If σ is a permutation of {1, . . . , n}, then ⊥{xσ(1), . . . , xσ(n)}|z.

2. If J ⊆ {1, . . . , n}, and |J| ≥ 2, then ⊥{xj : j ∈ J}|z.

3. If y1 ≤ x1 then ⊥{y1, x2, . . . , xn}|z.

4. ⊥{x1 ∨ x2, x3, . . . , xn}|z.

5. ⊥{(x1 ∨ z), . . . , (xn ∨ z)}|z.

Proof. All these properties follow immediately from (7.3). ⊓⊔ We may also generalize Theorem 7.2.

Theorem 7.5 If ⊥{x1, . . . , xn}|z and

φ = ψ1 ⊗ · · · ⊗ ψn, d(ψi) = xi ∨ z for i = 1, . . . n, then

φ↓z = ψ1↓z ⊗ · · · ⊗ ψn↓z.

Proof. Define for J ⊆ {1, . . . , n},

φJ = ⊗_{j∈J} ψj.

Then we have φ = ψ1 ⊗ φI\{1} and, by Theorem 7.4 (4)

x1⊥(x2 ∨ · · · ∨ xn)|z.

Thus by Theorem 7.2 we have

φ↓z = ψ1↓z ⊗ (φI\{1})↓z,

The theorem follows then by recursion on the second factor in this formula. ut We illustrate these notions with two of our basic examples of lattices of domains.

Example 7.1 Multivariate Systems. In this example we may identify the lattice D of domains with the lattice of subsets of variables (see Example 2.1). This lattice is distributive. Thus, if I, J, K are sets of variables, we have I⊥J|K if, and only if, I ∩ J ⊆ K. The multivariate system is the case most often considered when discussing conditional independence, see for example (Studeny, 1993).
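A small Python sketch (our illustration, not part of the text; the function and variable names are ours) makes this criterion executable for the multivariate case:

```python
def cond_indep(x, y, z):
    """Test x ⊥ y | z for domains given as sets of variables.

    In the distributive lattice of subsets of variables, join is union and
    meet is intersection, so (x ∨ z) ∧ (y ∨ z) = z reduces to x ∩ y ⊆ z.
    """
    x, y, z = set(x), set(y), set(z)
    return ((x | z) & (y | z)) == z

# The two formulations agree on small examples:
assert cond_indep({'A', 'B'}, {'B', 'C'}, {'B'})          # {A,B} ∩ {B,C} = {B} ⊆ {B}
assert not cond_indep({'A', 'B'}, {'B', 'C'}, {'C'})      # {B} is not contained in {C}
```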

Example 7.2 Partitions. Partition lattices are the most general lattices. Hence it is interesting to see how conditional independence presents itself in partition structures (see Example 2.2). The following theorem shows how conditional independence of partition can be expressed in terms of the blocks of partitions.

Theorem 7.6 Let P, P1 and P2 be partitions of a universe S. Then P1⊥P2|P if, and only if, for all P ∈ P, P1 ∈ P1 and P2 ∈ P2, P1 ∩ P ≠ ∅ and P2 ∩ P ≠ ∅ imply P1 ∩ P2 ∩ P ≠ ∅.

Proof. We have to prove that (P1 ∨ P) ∧ (P2 ∨ P) = P. It is sufficient to show that (P1 ∨ P) ∧ (P2 ∨ P) ≤ P, since the other inequality always holds. This means that any block of the left-hand side of the inequality is contained in a block of the right-hand side. This in turn holds if, whenever two elements a, b are in a block of (P1 ∨ P) ∧ (P2 ∨ P), they are also in a block of P. We use Theorem 2.1. So, if a ∈ P1 ∩ P and b ∈ P2 ∩ P for some blocks P ∈ P, P1 ∈ P1 and P2 ∈ P2, then a and b are in the same block of the meet (P1 ∨ P) ∧ (P2 ∨ P) if, and only if, there is an element c ∈ P1 ∩ P2 ∩ P. So, if both P1 ∩ P and P2 ∩ P are not empty, then P1 ∩ P2 ∩ P must not be empty either. Hence, this condition is necessary and sufficient for (P1 ∨ P) ∧ (P2 ∨ P) ≤ P. This proves the theorem. ⊓⊔
Note that blocks of a partition correspond to atoms of the domain. Theorem 7.6 is an example of a result which relates conditional independence to atoms. More results of this kind are presented in Section 7.3. The conditions of Theorem 7.6 have an interpretation in terms of irrelevance: Given the information that the unknown element of S belongs to the block P of the partition P, the further information that the element belongs to a certain block P1 of partition P1 restricts in no way the possible blocks of partition P2, and vice versa. In other words, given an information on partition P, an information on partition P1 gives no information on P2 and vice versa, if P1⊥P2|P. We remark that Theorem 7.6 and the ideas related to it can be extended to families of compatible frames (see Example 2.3).
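The block criterion of Theorem 7.6 is easy to check mechanically. The following Python sketch (ours, purely illustrative; names are of our choosing) tests it for partitions given as lists of blocks:

```python
from itertools import product

def partitions_cond_indep(p1, p2, p):
    """Check P1 ⊥ P2 | P via the block criterion of Theorem 7.6.

    Each partition is a list of pairwise disjoint blocks (Python sets).
    The criterion: for all blocks B1 ∈ P1, B2 ∈ P2, B ∈ P,
    B1 ∩ B ≠ ∅ and B2 ∩ B ≠ ∅ imply B1 ∩ B2 ∩ B ≠ ∅.
    """
    for b1, b2, b in product(p1, p2, p):
        if (b1 & b) and (b2 & b) and not (b1 & b2 & b):
            return False
    return True

# Universe S = {1,2,3,4}; P has a single block, so the condition reduces to
# pairwise non-empty intersections of the blocks of P1 and P2.
p  = [{1, 2, 3, 4}]
p1 = [{1, 2}, {3, 4}]
p2 = [{1, 3}, {2, 4}]
print(partitions_cond_indep(p1, p2, p))   # True
```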

Suppose now that the lattice D has a bottom element ⊥ and that Φ⊥ = {e⊥, z⊥}. Important examples like multivariate systems or the lattice part(S) satisfy these conditions. If then x⊥y|⊥, i.e. if

x ∧ y = ⊥, we call x and y independent and write x⊥y. We have then for all φ ∈ Φ that φ ≠ z_d(φ) implies φ↓⊥ = e⊥. Thus, if x⊥y, by Theorem 7.1 we have, if d(φ) = x and d(ψ) = y,

φ→y = ey, ψ→x = ex, and, if η = φ ⊗ ψ,

η↓x = φ, η↓y = ψ,

Hence

η = η↓x ⊗ η↓y.

Thus, if x⊥y, then an information on x contains no information on y and vice versa.

This can be extended to families of domains. We say that {x1, . . . , xn} are independent, and write ⊥{x1, . . . , xn}, if ⊥{x1, . . . , xn}|⊥. It follows from (7.3), if z = ⊥, that this implies

xj ∧ xk = ⊥   (7.4)

for all 1 ≤ j < k ≤ n, or that xj⊥xk. If the lattice D is distributive, then ⊥{x1, . . . , xn} is equivalent to the pairwise independence condition (7.4). Theorem 7.4 carries immediately over to independence:

Corollary 7.1 Assume ⊥{x1, . . . , xn}. Then the following properties hold:

1. If σ is a permutation of {1, . . . , n}, then ⊥{xσ(1), . . . , xσ(n)}.

2. If J ⊆ {1, . . . , n}, and |J| ≥ 2, then ⊥{xj : j ∈ J}.

3. ⊥{x1 ∨ x2, x3, . . . , xn}.

4. ⊥{(x1 ∨ z),..., (xn ∨ z)}. Let’s again consider our basic examples of lattices of domains.

Example 7.3 Multivariate Systems. In multivariate systems, we have I⊥J if, and only if, I ∩ J = ∅ (compare Example 7.1).

Example 7.4 Partitions. For partition structures it follows from Theorem 7.6 (see Example 7.2) that P1⊥P2 if, and only if, for all blocks P1 ∈ P1 and P2 ∈ P2 it holds that P1 ∩ P2 ≠ ∅. If P1⊥P2, then an information on one partition gives no information on the other one.

We turn now to the study of certain families of conditional independence relations, which are in particular important for computational purposes (see Part II). Consider a tree T = (V, E) with set of nodes V and set of edges E ⊆ V 2, where V 2 is the family of two-element subsets of V. Let λ : V → D be a labeling of the nodes of T with domains from the lattice D. The domain λ(v) is called the label of node v. The pair (T, λ) is called a labeled tree. By ne(v) we denote the neighbors of node v, i.e. ne(v) = {w ∈ V : {v, w} ∈ E}. When we eliminate a node v from the tree (together with the edges incident to v), then a family of subtrees {Tv,w = (Vv,w, Ev,w) : w ∈ ne(v)} of T is created. Further, for any subset U ⊆ V of nodes define

λ(U) = ∨v∈U λ(v). This leads then to the following definition:

Definition 7.3 Markov Tree. A labeled tree (T, λ) with T = (V,E) is called a Markov Tree, if ∀v ∈ V ,

⊥{λ(Vv,w) : w ∈ ne(v)}|λ(v).   (7.5)

The term Markov tree has been introduced in (Shafer et al., 1987) with respect to partition lattices and for the purpose of efficient information processing. We refer also to (Kohlas & Monney, 1995) for a discussion of this concept in the context of families of compatible frames. We prove two fundamental theorems, whose proofs are adapted from (Kohlas & Monney, 1995). The first theorem tells us that there is conditional independence between the domain of node v and the domains covered by a neighboring subtree Tv,w, given the domain of the associated neighbor w.

Theorem 7.7 If (T, λ) is a Markov tree, then, ∀w ∈ ne(v),

λ(Vv,w)⊥λ(v)|λ(w). (7.6)

Proof. For w ∈ ne(v), the Markov property (7.5) reads

⊥{λ(Vw,u): u ∈ ne(w)}|λ(w).

Note further that

λ(Vv,w) = ∨_{u∈ne(w)\{v}} λ(Vw,u) ∨ λ(w).

Thus, using Theorem 7.4 (4) we derive

∨_{u∈ne(w)\{v}} λ(Vw,u) ⊥ λ(Vw,v) | λ(w).

Further, by Theorem 7.4 (5) we obtain that

λ(Vv,w)⊥λ(Vw,v)|λ(w).

Finally, since λ(v) ≤ λ(Vw,v), we conclude (7.6) from Theorem 7.4 (3), which proves the theorem. ⊓⊔ The fact, expressed in the next theorem, that any subtree of a Markov tree remains a Markov tree, is important for purposes of induction and recursion, as will be seen later.

Theorem 7.8 If (T, λ) is a Markov tree, then every subtree is also a Markov tree.

Proof. Let T′ = (V′, E′) be a subtree of T and λ′ the restriction of λ to V′. We prove that (T′, λ′) is a Markov tree if (T, λ) is a Markov tree. Consider a node v ∈ V′ and let ne′(v) be the set of its neighbors in T′. Also consider the subtrees T′v,w for w ∈ ne′(v) relative to the tree T′ and let V′v,w be the node sets of these subtrees. Note that ne′(v) ⊆ ne(v) and V′v,w ⊆ Vv,w,

such that λ′(V′v,w) ≤ λ(Vv,w) for all w ∈ ne′(v). Therefore, by (2) and (3) of Theorem 7.4 we conclude that for all v ∈ V′

⊥{λ′(V′v,w) : w ∈ ne′(v)}|λ′(v).

This proves that (T′, λ′) is a Markov tree. ⊓⊔ This Markov tree structure of conditional independence becomes somewhat simpler in the case of multivariate structures.

Example 7.5 Multivariate Structures: Join Trees. As usual in multivariate structures we may work with the lattice of subsets of variables. The labeling function of a tree then maps nodes into sets of variables. The label λ(v) of a node is thus a set of variables associated with this node, and λ(Vv,w) for a neighbor w ∈ ne(v) is the set of all variables contained in the label of one of the nodes of Vv,w. Then, a tree T = (V, E) with a labeling function λ is a Markov tree, if, for all nodes v ∈ V and distinct neighbors w, u of v,

λ(Vv,w) ∩ λ(Vv,u) ⊆ λ(v). (7.7)

In the case of multivariate structures Markov trees are usually called join trees, sometimes also junction trees. They can also be characterized in a different way:

Theorem 7.9 A couple (T, λ) of a tree T = (V,E) with a labeling function in a lattice of subsets is a join tree if, and only if, ∀v, w ∈ V , X ∈ λ(v) and X ∈ λ(w) implies X ∈ λ(u) for all nodes u on the path from v to w (this is called the running intersection property).

Proof. Assume (T, λ) to be a join tree. Consider any node u on the path from v to w and let Tu,r and Tu,s be the subtrees containing nodes v and w respectively. If X ∈ λ(v) and X ∈ λ(w), then X ∈ λ(Vu,r) and X ∈ λ(Vu,s). By (7.7) it follows that X ∈ λ(u). Thus the running intersection property is satisfied. Conversely, assume that the running intersection property is satisfied and fix a node v together with two neighboring nodes w and u. If X ∈ λ(Vv,w) and X ∈ λ(Vv,u), then, by the running intersection property, X ∈ λ(v). Thus (7.7) holds and (T, λ) is a join tree. ⊓⊔
Join trees are conditional independence structures which are used in many fields, including probabilistic networks (Lauritzen & Spiegelhalter, 1988) and relational databases (Maier, 1983).
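Condition (7.7), equivalently the running intersection property, can be verified directly on a labeled tree. The following Python sketch (an illustration of ours, with names of our choosing) does so by computing the subtree labels λ(Vv,w):

```python
from collections import defaultdict

def is_join_tree(edges, labels):
    """Check condition (7.7) for a labeled tree.

    edges:  list of 2-tuples (v, w)
    labels: dict node -> set of variables, the labeling function λ
    For every node v and distinct neighbors w, u it checks
    λ(V_{v,w}) ∩ λ(V_{v,u}) ⊆ λ(v).
    """
    nbrs = defaultdict(set)
    for v, w in edges:
        nbrs[v].add(w)
        nbrs[w].add(v)

    def subtree_label(v, w):
        # union of labels over the subtree hanging at w once v is removed
        seen, stack, acc = {v, w}, [w], set(labels[w])
        while stack:
            u = stack.pop()
            for t in nbrs[u] - seen:
                seen.add(t)
                acc |= labels[t]
                stack.append(t)
        return acc

    for v in labels:
        ns = list(nbrs[v])
        for i in range(len(ns)):
            for j in range(i + 1, len(ns)):
                if not (subtree_label(v, ns[i]) & subtree_label(v, ns[j])) <= labels[v]:
                    return False
    return True

# A path 1 - 2 - 3 with labels {A,B}, {B,C}, {C,D} satisfies the running
# intersection property, hence is a join tree.
edges = [(1, 2), (2, 3)]
labels = {1: {'A', 'B'}, 2: {'B', 'C'}, 3: {'C', 'D'}}
print(is_join_tree(edges, labels))   # True
```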

7.2 Factorization of Information

In this section we will relate conditional independence structure to certain factorizations of information. This is motivated by fields like probability theory and relational databases, where conditional independence is defined in terms of decompositions of information into certain factors. Also, conditional independence and the corresponding factorizations of information are, in the case of multivariate systems, related to certain graphical structures. The latter were a strong motivation, in domains like probabilistic expert systems, to develop the theory of graphical representation of independence structures. The main point is that conditional independence of information is information about information. It is meta-information. It tells us that the information to be considered satisfies certain independence constraints, which are given by the semantics of the problem at hand. In probability networks, for instance, one often starts qualitatively by specifying that certain variables are conditionally independent, given some other variables. This implies then that the only numerical probability distributions to be considered are those satisfying these constraints, which can be expressed by a certain factorization of the distribution (see for example (Shafer, 1996; Cowell et al., 1999)). A similar situation arises in relational databases. This will be discussed in the following Section 7.3. We are going to capture this idea by considering conditionally independent domains and then constraining information relative to these domains to respect this conditional independence. Let thus (Φ, D) be an information algebra, and x⊥y|z in D. Then we define

Φx⊥y|z = {φ ∈ Φ : φ→x∨y∨z = ψ1 ⊗ ψ2, d(ψ1) = x ∨ z, d(ψ2) = y ∨ z}.

This delimits all pieces of information which respect the conditional independence of the domains, in a sense which generalizes the familiar notion of conditional independence in probability theory (Cowell et al., 1999). The generalization is twofold: First, we do not limit ourselves to lattices of subsets of variables as usual in probability theory, but consider general lattices D of domains; second, we do not limit ourselves to a particular formalism like probability theory, but try to work out the general, generic properties for information algebras. The latter point however means that idempotency is added, a property not present in probability theory. For a general theory without this additional property, i.e. a theory of conditional independence in valuation algebras, see (Kohlas, 2003).

Theorem 7.10 If x⊥y|z, then ∀φ ∈ Φx⊥y|z

φ→x∨z = ψ1 ⊗ ψ2↓z,
φ→y∨z = ψ1↓z ⊗ ψ2,
φ→z = ψ1↓z ⊗ ψ2↓z.   (7.8)

Proof. Here we use the combination axiom and (x ∨ z) ∧ (y ∨ z) = z to obtain

φ→x∨z = (φ→x∨y∨z)↓x∨z = (ψ1 ⊗ ψ2)↓x∨z = ψ1 ⊗ ψ2↓(x∨z)∧(y∨z) = ψ1 ⊗ ψ2↓z.   (7.9)

The second identity of the theorem is obtained in the same way. Then, using these results and once more the combination axiom

φ→z = (φ→x∨z)↓z = (ψ1 ⊗ ψ2↓z)↓z = ψ1↓z ⊗ ψ2↓z.

⊓⊔ If x⊥y|z, then, for the combination of an information relative to x with another one relative to y, only the parts relative to z are relevant. This result is particularly important for computational purposes. If an information is given as a combination over conditionally independent domains, then in order to focus it to the conditioning domain it is not necessary to first compute the combination. It is sufficient to focus both factors and then combine them. This will be exploited in Part II for so-called local computation schemes. We have the following fundamental theorem about factorizations:

Theorem 7.11 If x⊥y|z, then ∀φ ∈ Φx⊥y|z

φ→x∨y∨z = φ→x∨z ⊗ φ→y∨z. (7.10)

Proof. By idempotency we have

φ→x∨y∨z = ψ1 ⊗ ψ2↓z ⊗ ψ1↓z ⊗ ψ2.

Then (7.10) follows from this together with (7.9) and the second identity of Theorem 7.10. ⊓⊔ In relational database theory similar properties are called lossless decompositions (Maier, 1983; Date, 1975b). The results above can be generalized to families of conditionally independent domains. Assume ⊥{x1, . . . , xn}|z and define

Φ⊥{x1,...,xn}|z = {φ ∈ Φ : φ→x1∨...∨xn∨z = ψ1 ⊗ · · · ⊗ ψn, d(ψi) = xi ∨ z, i = 1, . . . , n}.

Then the following irrelevance result holds:

Theorem 7.12 If ⊥{x1, . . . , xn}|z, then ∀φ ∈ Φ⊥{x1,...,xn}|z

φ→z = ψ1↓z ⊗ · · · ⊗ ψn↓z.   (7.11)

Proof. The conditional independence relation ⊥{x1, . . . , xn}|z implies also the relations x1⊥(x2 ∨ . . . ∨ xn)|z and ⊥{x2, . . . , xn}|z. Let

ψ1^n = ψ2 ⊗ · · · ⊗ ψn.   (7.12)

Note that d(ψ1^n) = x2 ∨ . . . ∨ xn ∨ z. Then, by Theorem 7.10 we obtain

φ↓z = ψ1↓z ⊗ (ψ1^n)↓z.

The identity (7.11) follows from this by recursion. ut The result on lossless decomposition, Theorem 7.11, generalizes also:

Theorem 7.13 If ⊥{x1, . . . , xn}|z, then ∀φ ∈ Φ⊥{x1,...,xn}|z

φ→x1∨...∨xn∨z = φ→x1∨z ⊗ · · · ⊗ φ→xn∨z.   (7.13)

Proof. Let ψ1^n be defined as in (7.12). Then, since ⊥{x1, . . . , xn}|z implies x1⊥(x2 ∨ . . . ∨ xn)|z, hence (x1 ∨ z) ∧ (x2 ∨ . . . ∨ xn ∨ z) = z, it follows from the combination axiom that

φ→x1∨z = ψ1 ⊗ (ψ1^n)↓z.

But since ⊥{x1, . . . , xn}|z implies also ⊥{x2, . . . , xn}|z, we conclude by The- orem 7.12 that

(ψ1^n)↓z = ψ2↓z ⊗ · · · ⊗ ψn↓z.

We obtain similar results for i = 2, . . . , n,

φ→xi∨z = ψi ⊗ (ψi^n)↓z,

where ψi^n is the combination of all ψj for j from 1 to n, except ψi. Therefore, using idempotency, we obtain

φ→x1∨...∨xn∨z = (ψ1 ⊗ (ψ1^n)↓z) ⊗ · · · ⊗ (ψn ⊗ (ψn^n)↓z) = φ→x1∨z ⊗ · · · ⊗ φ→xn∨z.

This concludes the proof. ⊓⊔ We may of course apply these results to Markov trees. For instance, if M = (T, λ) is a Markov tree we define

ΦM = {φ ∈ Φ : φ→∨v∈V λ(v) = ⊗_{v∈V} ψv, d(ψv) = λ(v)}.

Then the following irrelevance theorem holds:

Theorem 7.14 Let M = (T, λ) be a Markov tree, with T = (V, E). If φ ∈ ΦM , i.e.

φ→∨v∈V λ(v) = ⊗_{v∈V} ψv, with d(ψv) = λ(v), then, ∀v ∈ V,

φ→λ(v) = ψv ⊗ ⊗_{w∈ne(v)} φv,w→λ(v) = ψv ⊗ ⊗_{w∈ne(v)} (φv,w↓λ(w))→λ(v),   (7.14)

where

φv,w = ⊗_{u∈Vv,w} ψu.

Proof. In a Markov tree (T, λ) we have by definition ⊥{λ(Vv,w) : w ∈ ne(v)}|λ(v). Note then that, by idempotency,

φ→∨v∈V λ(v) = ψv ⊗ ⊗_{w∈ne(v)} φv,w = ⊗_{w∈ne(v)} (ψv ⊗ φv,w).

This shows that φ ∈ Φ⊥{λ(Vv,w):w∈ne(v)}|λ(v). It follows therefore from Theorem 7.12 that

φ→λ(v) = ⊗_{w∈ne(v)} (ψv ⊗ φv,w)↓λ(v).

Applying the combination axiom and once more idempotency, we obtain from this

φ→λ(v) = ψv ⊗ ⊗_{w∈ne(v)} φv,w↓λ(v)∧λ(Vv,w) = ψv ⊗ ⊗_{w∈ne(v)} φv,w→λ(v).

This proves the first part of (7.14). Further, by Theorem 7.7 we have also λ(Vv,w)⊥λ(v)|λ(w). This implies then (see Theorem 7.1)

φv,w→λ(v) = (φv,w→λ(w))→λ(v).

This proves the second part of (7.14). ⊓⊔ This irrelevance theorem proves to be the basis of local computation, see Part II.

7.3 Atomic Algebras and Normal Forms

Chapter 8

Measure of Information

8.1 Information Measure

In the literature the term "information theory" is almost exclusively used as a theory for the measurement of information content. This is clearly the case for Shannon's theory of information (Shannon, 1948), where the concept of entropy is used to measure information content. It is also the case for algorithmic information theory (Chaitin, 2001; Kolmogorov, 1965; Kolmogorov, 1968; Li & Vitanyi, 1993), where the length of a shortest program for computing an object is considered as a measure of the information content of the object. In statistics still other measures are proposed (Kullback, 1959). A newer theory going beyond these classical measures is proposed in (Kahre, 2002), which allows for many different measures as long as they satisfy the law of diminishing information. Essentially we shall base our considerations on Shannon's concept of information, where information is measured by the amount by which it reduces uncertainty. Shannon's theory is a statistical one, where uncertainty is measured by entropy. At this point however, no probabilistic elements are present in our theory of information algebras. Therefore, the measure of uncertainty we shall use is Hartley's measure rather than entropy. We expect our quantitative measure of information to respect the partial order of information introduced in Chapter 4.
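To anticipate the idea in concrete terms (a sketch only, under the additional assumption, made here purely for illustration, that the algebra is atomic with finitely many atoms per domain): Hartley's measure assigns to a set of n mutually exclusive possibilities the uncertainty log2 n. A natural candidate for the information content of an element φ with domain x is then the reduction of uncertainty

i(φ) = log2 |Atx(Φ)| − log2 |At(φ)|,

so that the vacuous information on x receives measure 0 and the measure increases as φ becomes more informative, i.e. as the set of atoms At(φ) shrinks. The precise definition and its compatibility with the partial order of Chapter 4 are the subject of this chapter.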

8.2 Dual Information in Boolean Information Algebras

Chapter 9

Topology of Information Algebras

9.1 Observable Properties and Topology

It is noted in (Smyth, 1992) that topology ties up with the idea of observable properties. In more concrete terms, in (Smyth, 1992) a device which outputs binary symbols is considered. An observer inspects the output sequence as it proceeds. He wants to detect certain properties of the output sequence, but his decision whether a property is present or not must depend on the finite sequence output so far. Obviously there are properties of the output sequence which can never be decided on the basis of finite sequences only: for example the property that there are only finitely many occurrences of the symbol '0'. On the other hand, a property like the one that there are at least a hundred occurrences of the symbol '0' can be detected in a finite sequence by the observer. Note however that the absence of this property cannot be detected in finite sequences. We do not require that both the presence and the absence of a property be observable. Technically, the condition of finite observability of a property A in a binary output sequence may be stated as follows: We consider infinite sequences σ ∈ {0, 1}∞ of binary symbols. An observable property A is then a subset of sequences in {0, 1}∞ such that if σ ∈ A, then there is some finite initial segment σ↓n of n symbols such that any infinite extension of σ↓n belongs to A. This says that if property A holds on some output sequence, then some finite initial segment of the sequence is sufficient to detect the property A. From this consideration two important closure properties of observable properties are derived in (Smyth, 1992):

1. Finite Conjunction: Suppose that A1,...,An are observable proper- ties, and A1 ∩...∩An holds on a sequence σ. Then σ ∈ A1 ∩...∩An is clearly secured in a finite initial subsequence of σ, since each property


A1, . . . , An is so. Hence, if A1, . . . , An are observable properties, then A1 ∩ . . . ∩ An is observable too.

2. Arbitrary Disjunction: Suppose A is a collection of observable properties. If σ ∈ ∪A, then σ ∈ A for some A ∈ A. Since A is observable, σ ∈ A is detected in some finite initial segment. But then σ ∈ ∪A is secured too. Thus ∪A is observable too.
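As a small illustration (ours, not taken from (Smyth, 1992)), an observable property can be represented by a monotone test on finite prefixes that reports whether the property is already secured:

```python
def at_least_100_zeros(prefix):
    """Securing test for the property 'at least a hundred occurrences of 0'.

    Returns True as soon as a finite prefix already guarantees that every
    infinite extension has the property; False means 'not yet secured'
    (the absence of the property can never be detected on a finite prefix).
    """
    return prefix.count('0') >= 100

def conjunction(*tests):
    """Finite conjunctions of observable properties remain observable:
    the intersection is secured once every conjunct is secured."""
    return lambda prefix: all(t(prefix) for t in tests)

print(at_least_100_zeros('0' * 99))    # False: not yet secured
print(at_least_100_zeros('0' * 150))   # True: secured by this finite prefix
```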

But the two closure properties above are exactly the properties of the open sets of a topology. In other words, observable properties as defined above form the open sets of a topology on the set {0, 1}∞ of infinite binary sequences. Finitely observable properties are somehow reminiscent of (finite) information. In fact, topology enriches domain theory considerably, as is well known. Domain theory is about information, although concentrating on approximation issues. Information algebras are about information in more respects, adding information aggregation and information extraction, but are otherwise closely related to domains. So it will be worthwhile to examine topology in relation to information algebras, and this is what will be done in this chapter. To start with, we put the example above into a more formal perspective by considering strings as elements of an information algebra.

Example 9.1 An Algebra of Strings.

9.2 Alexandrov Topology of Information Algebras

9.3 Scott Topology of Compact Information Algebras

9.4 Topology of Labeled Algebras

9.5 Representation Theory of Information Algebras

Chapter 10

Algebras of Subalgebras

10.1 Subalgebras of Domain-free Algebras

10.2 Subalgebras of Labeled Algebras

10.3 Compact Algebra of Subalgebras

10.4 Subalgebras of Compact Algebras

Part II

Local Computation


Chapter 11

Local Computation on Markov Trees

11.1 Problem and Complexity Considerations

Information usually comes in pieces. It must be combined and then focused onto the domains of interest. This situation can be formulated within an abstract information algebra (?). Let therefore (Φ, D) be an information algebra with information elements φ ∈ Φ and a lattice of domains D. Suppose n pieces of information φi, i = 1, . . . , n, are given. The combined, total information is then

φ = φ1 ⊗ · · · ⊗ φn.

Suppose s ∈ D is a domain of interest, representing a question of interest. If we want to extract the information relative to domain s, then we focus φ onto s, i.e. we compute

φ→s.

If we view this problem from a computational point of view, then the straightforward approach is to compute sequentially the partial combinations

φ1 ⊗ · · · ⊗ φi for i = 2, . . . , n and then, at the end, the focus φ→s. These partial combi- nations yield informations on increasing domains by the labeling axiom:

d(φ1) ≤ d(φ1) ∨ d(φ2) ≤ · · · ≤ d(φ1) ∨ d(φ2) ∨ · · · ∨ d(φi) ≤ · · · .

At the end we arrive at the domain d(φ1) ∨ d(φ2) ∨ · · · ∨ d(φn). Now we may assume that both the storage required as well as the time of elementary operations like combination and projection grow with increasing domains


(see examples below). The growth may even be such that storage or computation on big domains becomes infeasible. This indicates that the naive, straightforward solution of the projection problem posed above becomes inefficient or even infeasible. It is also conceivable that in practical problems the pieces of information φi are stored on distributed processors. The direct method just described may then also generate costly or infeasible communication between processors. To clarify the situation let us consider some instances of information algebras which have been introduced in (?).

Example 11.1 Subsets and Indicator Functions

Example 11.2 Propositional Logic

Example 11.3 Relational Databases

Example 11.4 Partitions

Example 11.5 Linear Systems and Linear Manifolds

Example 11.6 Convex Polyhedra

In a simplifying view we may, at least in some cases, assume that the lattice D decomposes into two disjoint subsets Df and Di, where for x ∈ Df and y ∈ Di we never have y ≤ x. Intuitively, Df contains the smaller domains on which storing information and computing is feasible (whence the index f), whereas Di contains the larger domains where storing and computing becomes infeasible (whence the index i). In a multivariate system we may, for example, admit that for domains with up to n variables storage and computation is still feasible, whereas for domains with more than n variables this becomes infeasible. We need then computations whose intermediate results never leave the feasible part Df of the lattice. We shall see that, if the domains d(φi) as well as the query domain s of a projection problem as defined above all belong to Df, then in many cases we can arrange the computations so as to satisfy these feasibility requirements. This is true even if the domain of the total information

d(φ1) ∨ d(φ2) ∨ · · · ∨ d(φn) is not feasible, i.e. belongs to Di. This is achieved by so-called local computation techniques. They make use of conditional independence relations and the corresponding factorization theorem (see Theorem 17 in (?)). These conditional independence relations are in turn associated with certain tree structures, called Markov trees or, in more particular cases, also join trees. In the present chapter we define the concept of Markov trees and develop the local computation techniques based on this concept. In the following Chapter 12 the particular case of multivariate systems is considered. There are specific methods and results for this case. Next, local computation in Boolean information algebras is studied. In this case the basic projection problem may be generalized. Also, certain information algebras like atomic ones can be embedded into Boolean information algebras, and special representations (normal forms) of information are then available. Local computation can then be executed using these normal forms.

11.2 Markov Trees

In this section we introduce the concept of a Markov tree. This is a graphical structure, representing an organization of domains, which proves fundamental for local computation techniques. In the multivariate setting it corresponds to the concepts of join trees, junction trees or hypertrees, which are well known in different fields like probabilistic networks, relational databases and constraint solving. This particular case will be treated in the following Chapter 12. Let T = (V, E) be a tree with set of nodes V and edges E ⊆ V 2, where V 2 is the family of all two-element subsets of V. We recall a certain number of notions related to trees: A path between two nodes v, u ∈ V is a sequence of distinct nodes v1, v2, . . . , vm from V, such that v1 = v, vm = u and {vi, vi+1} ∈ E. In a tree there is always exactly one path between any pair of nodes. A node u is a neighbor of another node v ∈ V, if {u, v} ∈ E. In a tree with at least two nodes, every node has at least one neighbor. The set of all neighbors of a node v is denoted by ne(v). If a node v has only one neighbor, it is called a leaf. A tree always has leaves. A subtree of a tree T = (V, E) is a tree T′ = (V′, E′) with V′ ⊆ V and E′ ⊆ E. If a node v ∈ V is eliminated from a tree (together with all edges it is contained in), then the tree decomposes into one or several (sub-)trees, one for every neighbor u ∈ ne(v). We denote these subtrees by Tvu = (Vvu, Evu). Paths in T between nodes in different subtrees Tvu pass through node v. Let now D be any lattice (of domains). If T = (V, E) is a tree, we introduce a mapping λ : V → D, which assigns a domain λ(v) to every node v ∈ V. This is called a labeling function. If U is a subset of nodes, U ⊆ V, then we define

λ(U) = ∨_{u∈U} λ(u).

Now we are in a position to define the concept of a Markov tree relative to such a lattice as follows:

Definition 11.1 Markov Tree. Let T = (V,E) be a tree and λ : V → D a labeling function, which assigns a domain from D to every node in V . Then TM = (V, E, λ) is called a Markov tree, if

⊥{λ(Vvu): u ∈ ne(v)}|λ(v), ∀v ∈ V. (11.1)

We recall from (?) the definition of conditional independence. So, if ne(v) contains two or more nodes, and U1 and U2 are disjoint subsets of ne(v), then

λ(∨u∈U1 Vvu)⊥λ(∨u∈U2 Vvu)|λ(v).

In particular, if w1 ∈ Vvu1 and w2 ∈ Vvu2 are nodes in two different subtrees, then

λ(w1)⊥λ(w2)|λ(v). (11.2)

We remark also that in the case of a single neighbor (i.e. for leafs), condition (11.1) is satisfied. We remind also that always ⊥∅|z holds. If D is the lattice of subsets of a reference set S, then a Markov tree with reference to this lattice is also called a join tree. This is the case in multivariate systems, as discussed later in Chapter 12. In this case (11.2) becomes

λ(w1) ∩ λ(w2) ⊆ λ(v). (11.3)

And this condition is now equivalent to (11.1). Therefore, a labeled tree is a join tree, if for any pair of nodes w1 and w2 condition (11.3) holds for any node on the path between w1 and w2. The condition (11.3) is also called the running intersection property. We shall see later that join trees are in many respects easier to handle than Markov trees. But they are also less general. We formulate and prove two basic theorems related to Markov trees.

Theorem 11.1 Any subtree of a Markov tree is also a Markov tree.

Proof. Let TM = (V, E, λ) be a Markov tree and T′M = (V′, E′, λ′) a subtree of TM such that V′ ⊆ V, E′ ⊆ E and λ′ is the restriction of λ to V′. Consider a node v ∈ V′ and define ne′(v) and V′vu relative to the subtree T′M. Then we have ne′(v) ⊆ ne(v) and V′vu ⊆ Vvu for all u ∈ ne′(v). This

implies also λ′(V′vu) ≤ λ(Vvu) for all u ∈ ne′(v). But then the independence relation (11.1) implies that (Theorem 16 (2) in (?))

⊥{λ(V′vu) : u ∈ ne′(v)}|λ(v), ∀v ∈ V′.

This in turn implies

⊥{λ′(V′vu) : u ∈ ne′(v)}|λ′(v), ∀v ∈ V′

since λ′(v) = λ(v) and λ′(V′vu) = λ(V′vu) ≤ λ(Vvu). This shows that T′M is a Markov tree. ⊓⊔ Another theorem derives for Markov trees a new independence relation, which is important for local computation (see the next Section 11.3).

Theorem 11.2 Let TM = (V, E, λ) be a Markov tree. Then ∀v ∈ V and ∀u ∈ ne(v), it holds that

λ(Vvu)⊥λ(v)|λ(u).

Proof. By the Markov property we have

⊥{λ(Vuw) : w ∈ ne(u)}|λ(u) and

λ(Vvu) = ∨_{w∈ne(u)\{v}} λ(Vuw) ∨ λ(u).   (11.4)

Using Theorem 16 (4) in Chapter 5 of (?) we deduce

∨_{w∈ne(u)\{v}} λ(Vuw) ⊥ λ(Vuv) | λ(u).

By Theorem 16 (5) in Chapter 5 of (?)) we can then conclude, using (11.4), that

λ(Vvu)⊥λ(Vuv)|λ(u).

Finally, since λ(v) ≤ λ(Vuv) we infer with the aid of Theorem 16 (3) in Chapter 5 of (?)) that

λ(Vvu)⊥λ(v)|λ(u), and this concludes the proof. ⊓⊔ Given a Markov tree TM = (V, E, λ), we consider now information φ ∈ Φ which factors according to the nodes of a Markov tree. Thus we define

ΦM = {φ ∈ Φ : φ→λ(V) = ⊗_{v∈V} ψv, d(ψv) = λ(v)}.

We shall examine in the next Section 11.3 the problem of computing φ→λ(v) for one or all nodes v ∈ V and for φ ∈ ΦM . Note that for all nodes v ∈ V

φ→λ(v) = (φ→λ(V))↓λ(v) = (⊗_{w∈V} ψw)↓λ(v).

So, it is actually the problem of projecting a certain factorization according to a Markov tree to one or all nodes of the tree. We call this a projection problem. This problem has a so-called local computation solution, which organizes the computation in such a way that information never has to be stored on domains larger than the domains λ(v), and no operations, i.e. combinations and projections, on domains larger than λ(v) have to be executed, see Section 11.3. So, if all λ(v) belong to the feasible domains Df, the problem is efficiently solvable, although the domain of φ itself, i.e. λ(V), may be in Di. For φ ∈ ΦM the following basic factorization theorem holds:

Theorem 11.3 Markov Tree Factorization Theorem. Let TM = (V, E, λ) be a Markov tree. Then, ∀φ ∈ ΦM and ∀v ∈ V,

φ→λ(v) = ψv ⊗ ⊗_{u∈ne(v)} (φvu↓λ(u))→λ(v),   (11.5)

where

φvu = ⊗_{w∈Vvu} ψw.

Proof. Note first that, because of idempotency, we have

φ = ψv ⊗ ⊗_{u∈ne(v)} (⊗_{w∈Vvu} ψw) = ⊗_{u∈ne(v)} (ψv ⊗ ⊗_{w∈Vvu} ψw).

This shows that φ ∈ Φ⊥{λ(Vvu):u∈ne(v)}|λ(v), since d(ψv ⊗ ⊗_{w∈Vvu} ψw) = λ(v) ∨ λ(Vvu). Therefore, Theorem 17 in (?) implies that

φ→λ(v) = ⊗_{u∈ne(v)} (ψv ⊗ ⊗_{w∈Vvu} ψw)↓λ(v)
= ⊗_{u∈ne(v)} (ψv ⊗ (⊗_{w∈Vvu} ψw)↓λ(v)∧λ(Vvu))
= ψv ⊗ ⊗_{u∈ne(v)} φvu↓λ(v)∧λ(Vvu)
= ψv ⊗ ⊗_{u∈ne(v)} φvu→λ(v).

Next, we remark that by Theorem 11.2 λ(Vvu)⊥λ(v)|λ(u). Therefore Theorem 14 in (?) shows that φvu→λ(v) = (φvu→λ(u))→λ(v). So we obtain finally

φ→λ(v) = ψv ⊗ ⊗_{u∈ne(v)} (φvu↓λ(u))→λ(v),   (11.6)

and this concludes the proof. ⊓⊔
This theorem provides the basis for local computation procedures. In fact, according to (11.6), once φvu↓λ(u) is computed for all neighbors u of node v, the remaining combinations and projections take place within the domain λ(v). But, since the subtrees Tvu = (Vvu, Evu, λ) are still Markov trees, for the computation of any of the φvu↓λ(u) we have a similar factorization theorem, such that, by recursion, all combinations and projections can finally be constrained to the domains λ(w) of the nodes w of the Markov tree. This computational scheme will be worked out in detail in Section 11.3.
The membership of φ in ΦM for some Markov tree TM may appear to be very restrictive. It is however much less so than it appears at first sight. In applications we may assume that a total information φ is given by pieces, say φ1, . . . , φn with domains d(φi) = xi, such that

φ = φ1 ⊗ · · · ⊗ φn.

On the other hand we may be interested in focusing the total information φ to a certain number of domains s1, . . . , sm. We may assume without loss of generality that

si ≤ z = ∨_{j=1}^{n} xj, for i = 1, . . . , m.

Otherwise we could replace si by s′i = si ∧ z, and focus φ to s′i. The information φ→si then equals simply the vacuous extension (φ↓s′i)↑si. A Markov tree TM is called a covering Markov tree for the domains xj, j = 1, . . . , n, and si, i = 1, . . . , m, if for all j ∈ J = {1, . . . , n} there is a v ∈ V such that xj ≤ λ(v) and for all i ∈ I = {1, . . . , m} there is a v such that si ≤ λ(v). Note that there always exists a trivial covering Markov tree, namely the tree consisting of a single node v with λ(v) equal to the join of all xj and all si. There exists now an assignment mapping

a : J ∪ I → V such that xj ≤ λ(a(j)) for all j ∈ J and si ≤ λ(a(i)) for all i ∈ I. Now we define for all v ∈ V

ψv = e_λ(v) ⊗ ⊗_{j : a(j)=v} φj.

Note then, that

d(ψv) = λ(v), ∀v ∈ V, and furthermore

φ→λ(V) = ⊗_{v∈V} ψv.

Thus, we have φ ∈ ΦM . This means that we can apply local computation techniques with respect to the covering Markov tree TM . The projection of φ to a domain si is then simply given by the projection of φ→λ(a(i)) to si. This is again a local operation. However, with the trivial covering Markov tree, nothing is gained. Everything depends on finding a good covering Markov tree, with node domains at least all in Df and, more generally, as small as possible. Techniques for finding such covering Markov trees will be examined in Chapter 12 for the particular case of multivariate systems.

11.3 Local Computation

We have seen in the previous section that the Markov Tree Factorization Theorem 11.3 provides a recursive procedure for computing φ→λ(v) for some node v of a Markov tree TM, if φ ∈ ΦM. In this scheme every combination or projection operation takes place on a domain λ(w) of the Markov tree. In this section we are going to work out this procedure in detail. In a tree T = (V, E) with m = |V| nodes it is always possible to number the nodes from i = 1 to m, such that a selected node v obtains number m and from any node u the numbers increase on the path to v. We identify from now on the nodes with their numbers i and write accordingly λ(i) for the domain associated with node i. The set of neighbor nodes of node i with numbers j < i is denoted by pa(i) (pa for parents). If i is a leaf, then pa(i) = ∅. If i < m, then node i has a unique neighbor with number j > i. It is denoted by ch(i) (ch for child). The root node m has no child. We denote further, as before, by Ti = (Vi, Ei) the subtree of T rooted at node i. Define

φi = ⊗_{k∈Vi} ψk.

Note that the tree Ti is a Markov tree. Hence, the Markov Tree Factorization Theorem 11.3 tells us that for φ ∈ ΦM

φi↓λ(i) = ψi ⊗ ⊗_{j∈pa(i)} (φj↓λ(j))→λ(i).   (11.7)

We may now describe the computation represented by this result in the following way: We associate with any node j a memory and consider a stepwise algorithm for i = 1, . . . , m − 1. The memory content at node j before step i is denoted by ψj^(i). At the beginning, for i = 1, we set ψj^(1) = ψj. At step i, node i sends a message

µi→ch(i) = (ψi^(i))↓λ(i)∧λ(ch(i))

to its child ch(i). The receiving node ch(i) updates its memory content to

ψch(i)^(i+1) = ψch(i)^(i) ⊗ µi→ch(i).

For all other nodes j ≠ ch(i) we put

ψj^(i+1) = ψj^(i).

We claim that at the end, after step m − 1, node m contains φ→λ(m).

Theorem 11.4 Let φ ∈ ΦM and define ψj^(i) as above. Then

ψm^(m) = φ→λ(m).   (11.8)

Proof. We remark that ψi^(i) = φch(i),i↓λ(i). Indeed, this holds clearly for i = 1. Assume it holds for all j < i. Then by (11.7) and the stepwise procedure above it holds for i too, and the proposition is proved by induction. Then (11.8) follows for i = m, since φm = φ. ⊓⊔
This gives us a stepwise algorithm to compute φ→λ(m). At all steps in this procedure we combine or project elements on node domains λ(i). At no step do we need to go beyond these domains. Therefore, this procedure is called local computation. If, for all nodes i of the Markov tree, we have λ(i) ∈ Df, local computation provides a feasible procedure. In particular, we call this local algorithm to compute φ→λ(m) the collect algorithm, since it collects all information in order to compute its projection to λ(m). We may extend this algorithm to compute φ→λ(i) for all nodes i of the Markov tree. In fact, if we want to compute φ→λ(i) for some node i ≠ m, we may redirect the edges of the Markov tree towards node i and repeat the collect algorithm with these new orientations of the edges. Clearly, only the messages on the edges from node m to node i have to be newly computed.

Only these edges have changed direction. So, the messages along all other edges do not change. We may therefore assign the two messages µi→j and µj→i to an edge {i, j} of the Markov tree. According to the collect algorithm the messages themselves are defined as

µi→j = (ψi ⊗ ⊗_{k∈ne(i),k≠j} µk→i)↓λ(i)∧λ(j).

As a corollary of the correctness of the collect algorithm, Theorem 11.4, it follows that

φ↓λ(i) = ψi ⊗ ⊗_{j∈ne(i)} µj→i.

We claim now that

µi→j ≤ φ↓λ(i).   (11.9)

This holds since from the collect algorithm we have µi→j = (ψi^(i))↓λ(i)∧λ(j) ≤ φ↓λ(i)∧λ(j) ≤ φ↓λ(i). But then we obtain (using the combination axiom) that

φ↓λ(i)∧λ(j) = (ψi ⊗ ⊗_{k∈ne(i)} µk→i)↓λ(i)∧λ(j)
= (ψi ⊗ ⊗_{k∈ne(i),k≠j} µk→i)↓λ(i)∧λ(j) ⊗ µj→i
= µi→j ⊗ µj→i.

Assume then, that in node i we have stored φ↓λ(i). If we send the message φ↓λ(i)∧λ(j) to node j and combine there this message with the information stored from the collect algorithm, then we obtain

ψj^(j) ⊗ φ↓λ(i)∧λ(j) = ψj^(j) ⊗ µi→j ⊗ µj→i
= (ψj ⊗ ⊗_{k∈ne(j)} µk→j) ⊗ µj→i
= φ↓λ(j) ⊗ µj→i
= φ↓λ(j)

according to (11.9). This result shows that, once the collect algorithm towards some root node m has terminated, we may just send messages φ↓λ(i)∧λ(j) from node i to all its

parents j ∈ pa(i) and combine it with the stored information ψj^(j) in these nodes. This will give us all projections φ↓λ(j) in all nodes of the Markov tree. This second local algorithm is called the distribute algorithm. Hence we can compute the projections to all nodes of the Markov tree by just passing through each node (except the root node m) twice and sending messages to all its neighbors in due order. The following lemma shows now that at any time during the collect, as well as during the distribute algorithm, the information φ factors into the storages at the nodes of the Markov tree.

Lemma 11.1 Let M be a Markov tree and φ ∈ ΦM . Then, ∀i = 1, . . . m

φ = ⊗_{j=1}^{m} ψj^(i)   (11.10)
  = ⊗_{j=1}^{i} ψj^(j) ⊗ ⊗_{j=i+1}^{m} φ↓λ(j).   (11.11)

Proof. We prove the first part by induction over the steps of the collect algorithm. At the beginning of the collect phase we have ψj^(1) = ψj and (11.10) holds since φ ∈ ΦM . Assume then that it holds for some i < m,

φ = ⊗_{j=1}^{m} ψj^(i) = ⊗_{j=1, j≠ch(i)}^{m} ψj^(i+1) ⊗ ψch(i)^(i)
  = ⊗_{j=1, j≠ch(i)}^{m} ψj^(i+1) ⊗ ψch(i)^(i) ⊗ µi→ch(i)

since µi→ch(i) = (ψi^(i))↓λ(i)∧λ(ch(i)) ≤ φ↓λ(i) ≤ φ. But then, as ψch(i)^(i+1) = ψch(i)^(i) ⊗ µi→ch(i), this shows that

φ = ⊗_{j=1}^{m} ψj^(i+1).

In particular, at the end of the collect algorithm, we have

φ = ⊗_{j=1}^{m−1} ψj^(j) ⊗ φ↓λ(m)

because ψj^(i) = ψj^(j) for i ≥ j, i.e. the storage of node j does not change after step j in the collect algorithm. Therefore, (11.11) holds at the beginning of the distribute algorithm, for i = m − 1. We prove (11.11) by induction over the steps of the distribute algorithm. Assume (11.11) holds for some i. Then

φ = ⊗_{j=1}^{i} ψj^(j) ⊗ ⊗_{j=i+1}^{m} φ↓λ(j).

Select k such that k = ch(i), hence i < k. Then the storage of k contains φ↓λ(k), node k sends the message φ↓λ(k)∧λ(i) to node i, whose storage ψi^(i) changes to φ↓λ(i) according to the distribute algorithm. Furthermore we note that ψi^(i) ≤ φ↓λ(i) ≤ φ. Therefore,

φ = ⊗_{j=1}^{i−1} ψj^(j) ⊗ ψi^(i) ⊗ φ↓λ(i) ⊗ ⊗_{j=i+1}^{m} φ↓λ(j)
  = ⊗_{j=1}^{i−1} ψj^(j) ⊗ ⊗_{j=i}^{m} φ↓λ(j).

This proves (11.11). ⊓⊔ As a corollary of (11.11) we note that, at the end of the distribute algorithm, the information φ ∈ ΦM factors into its projections to the nodes of the Markov tree.

Corollary 11.1 Let M be a Markov tree and φ ∈ ΦM . Then

φ = ⊗_{j=1}^{m} φ↓λ(j).   (11.12)

This is an important result. We recall in particular that in relational database theory normalization often requires that the database decomposes according to (11.12) (see Part I, Chapter 5). Clearly, applying the collect or distribute algorithm to a factorization (11.12) does not change the storage contents of the Markov tree, i.e. it has no effect. As we shall see in the next section, this factorization allows for simple updating algorithms. Query processing in relational databases can be reduced to updating. This is one of the facets of this theorem.
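To make the collect and distribute algorithms concrete, the following Python sketch (ours, for illustration only; all names are of our choosing) runs both phases in the relational instance of an information algebra, where an element with domain s is a set of tuples over the variables in s, combination is natural join, and projection restricts the tuples to a subdomain:

```python
from itertools import product

def combine(phi, psi):
    """Combination = natural join of two sets of tuples."""
    out = set()
    for a, b in product(phi, psi):
        da, db = dict(a), dict(b)
        if all(da[k] == db[k] for k in da.keys() & db.keys()):
            out.add(frozenset({**da, **db}.items()))
    return out

def project(phi, dom):
    """Projection (focusing) of a set of tuples to the sub-domain dom."""
    return {frozenset((k, v) for k, v in t if k in dom) for t in phi}

def collect_distribute(parent, labels, factors):
    """Collect/distribute on a covering join tree.

    Nodes are numbered 1..m with m the root and numbers increasing towards it.
    parent:  dict i -> ch(i) for i < m
    labels:  dict i -> set of variables λ(i)
    factors: dict i -> information element ψ_i with domain contained in λ(i)
    Returns the projections φ↓λ(i) of φ = ψ_1 ⊗ ... ⊗ ψ_m for every node i.
    """
    m = max(labels)
    store = dict(factors)
    # Collect phase: every node sends one message towards the root.
    for i in sorted(n for n in labels if n != m):
        j = parent[i]
        store[j] = combine(store[j], project(store[i], labels[i] & labels[j]))
    result = {m: store[m]}
    # Distribute phase: walk back from the root, sending messages outwards.
    for i in sorted((n for n in labels if n != m), reverse=True):
        j = parent[i]
        result[i] = combine(store[i], project(result[j], labels[i] & labels[j]))
    return result

# Join tree 1 - 2 - 3 (root 3) with labels {A,B}, {B,C}, {C,D}.  The three
# factors state A = B, B = C and C = D = 0 respectively.
labels = {1: {'A', 'B'}, 2: {'B', 'C'}, 3: {'C', 'D'}}
parent = {1: 2, 2: 3}
factors = {
    1: {frozenset({'A': v, 'B': v}.items()) for v in (0, 1)},
    2: {frozenset({'B': v, 'C': v}.items()) for v in (0, 1)},
    3: {frozenset({'C': 0, 'D': 0}.items())},
}
answers = collect_distribute(parent, labels, factors)
print([dict(t) for t in answers[1]])   # the constraint propagates: only A = B = 0 remains
```

Every combination and projection above takes place on a node label λ(i) or an edge label λ(i) ∧ λ(j), never on the full domain {A, B, C, D}; this is exactly the locality guaranteed by the collect and distribute arguments of this section.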

11.4 Updating and Retracting

Once the projections of an information φ ∈ ΦM with respect to a Markov tree have been computed, a new piece of information may be added, and the projections computed again. In this section we show that, if the new information is covered by a node of the Markov tree, then we do not need to recompute the projections from scratch. In fact, the collect phase need not be redone. Only a new distribute computation has to be added.

According to Corollary 11.1, at the end of a collect/distribute computation for a φ ∈ ΦM , the information φ factors into its projections to the nodes of the Markov tree M. Assume that an additional piece of information ψ must be added, where the domain of ψ is covered by a node of the Markov tree M. We take the corresponding covering node as the root of a directed tree. Without loss of generality we may assume that node m covers ψ, i.e. d(ψ) ≤ λ(m). Then we have the new information

φ′ = φ ⊗ ψ = φ ⊗ (ψ ⊗ eλ(m)).

Since d(ψ ⊗ eλ(m)) = λ(m) we obtain

φ′↓λ(m) = (φ ⊗ (ψ ⊗ eλ(m)))↓λ(m) = φ↓λ(m) ⊗ (ψ ⊗ eλ(m)) = φ↓λ(m) ⊗ ψ.

So, the updating of the projection in the root node m is done simply by combining the new information with the old projection to the node domain λ(m). We note that the messages µi→ch(i) in the direction of the root node do not change with this updating operation on the root node. This shows already that the collect phase does not have to be recomputed. We show now that we only need to recompute the distribute phase, starting with φ′↓λ(m) in the root node. Indeed, we have by Corollary 11.1 that

φ′ = φ ⊗ ψ = ⊗_{i=1}^{m−1} φ↓λ(i) ⊗ φ′↓λ(m).

Clearly φ′ ∈ ΦM . So we may apply the collect/distribute algorithm in order to compute the projections of φ′ with respect to the nodes of the Markov tree M. However, as noted above, in the collect phase nothing changes. So we may directly start the distribute algorithm with the new content of the root node m.

Chapter 12

Multivariate Systems

12.1 Local Computation on Covering Join Trees

12.2 Fusion Algorithm

12.3 Updating

Chapter 13

Boolean Information Algebra

Chapter 14

Finite Information and Approximation

Chapter 15

Effective Algebras

Chapter 16

Chapter 17

Part III

Information and Language


Chapter 18

Introduction

Chapter 19

Contexts

19.1 Sentences and Models

19.2 Lattice of Contexts

19.3 Information Algebra of Contexts

Chapter 20

Propositional Information

Chapter 21

Expressing Information in Predicate Logic

Chapter 22

Linear Systems

Chapter 23

Generating Atoms

Chapter 24

Part IV

Uncertain Information


Chapter 25

Functional Models and Hints

25.1 Inference with Functional Models

In this section we examine, by way of an introductory example, a particular way in which uncertain information can arise. This should serve as a motivation and illustration for the general, abstract model presented in the subsequent sections. The example is drawn from statistics. Based on a functional model, which describes how data is generated in some experiment or process, an assumption-based inference approach is presented. This represents the logical part of the inference. On top of this inference, a probability analysis allows us to measure the likelihood of the possible deductions. The inference can be subsumed in an object called a hint, which represents the information drawn from the experiment.

Functional models describe the process by which data x is generated from a parameter θ and some random element ω. The set of possible values, i.e. the domain, of the data x is denoted by X, whereas the domain of the parameter θ is denoted by Θ and the domain of the random element ω is denoted by Ω. Unless otherwise stated, we assume that the sets X, Θ and Ω are finite. This is in order to simplify the mathematics, but is not a necessary assumption. The data generation process is specified by a function

f : Θ × Ω → X    (25.1)

that relates the data with the perturbation and the parameter. In other words, if θ ∈ Θ is the correct value of the parameter and the random element ω ∈ Ω occurs, then the data x is uniquely determined by the function f according to the equation

x = f(θ, ω).    (25.2)

We assume that the function f is known and we also assume that the probability distribution of the random element ω is known. This probability distribution is denoted by p(ω) and the corresponding probability measure on Ω is denoted by P. Note that the probability measure P does not depend

on θ. The function f together with the probability measure P constitute a functional model for a statistical experiment E. In a functional model, if we assume a parameter θ, then, from the probabilities p(ω), we can compute the probabilities for the data x, namely

pθ(x) = Σ_{ω : x = f(θ,ω)} p(ω).

This shows that a functional model induces a parametric family of probability distributions on the sample space X, an object which is usually assumed a priori in modelling statistical experiments. These probability measures are the statistical specifications associated with the functional model. We emphasize that different functional models may induce the same statistical specifications pθ(x). This means that functional models contain more information than the family of distributions pθ(x). Let's illustrate the idea of functional models by two examples.

Example 25.1 Sensor. Consider a sensor which supervises a dangerous event e. If this event occurs, it sends an alarm signal a. However, there is the possibility that the sensor fails, a possibility denoted by ω. To complete the notation, let's denote the non-occurrence of event e by ¬e, the non-occurrence of the alarm signal by ¬a and finally the case of an intact sensor by ¬ω. Now, we may assume that if event e occurs and the sensor is intact, then the alarm sounds. On the other hand, if event e did not occur and the sensor is intact, then no alarm occurs. Finally, if the sensor fails, no alarm is possible. If we consider the set X = {a, ¬a} as the observation space, the set Θ = {e, ¬e} as the parameter space and Ω = {ω, ¬ω} as the disturbance space, then these assumptions can be captured in the following functional model:

f(e, ¬ω) = a, f(¬e, ¬ω) = ¬a, f(e, ω) = f(¬e, ω) = ¬a.

If we assume a probability p for the failure of the sensor, p(ω) = p, then the functional model is complete.
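To make the notion of a functional model concrete, here is a minimal computational sketch of the sensor model, with an assumed failure probability p = 0.1 (an arbitrary illustrative value); it also computes the induced statistical specification pθ(x). The identifiers are of course only illustrative.

p = 0.1                                   # assumed failure probability p(ω)
p_omega = {"fail": p, "intact": 1.0 - p}

def f(theta, omega):
    # x = f(θ, ω): alarm only if the event occurs and the sensor is intact
    return "alarm" if (theta == "event" and omega == "intact") else "no alarm"

def statistical_specification(theta):
    # p_θ(x) = Σ_{ω : x = f(θ,ω)} p(ω)
    dist = {}
    for omega, prob in p_omega.items():
        x = f(theta, omega)
        dist[x] = dist.get(x, 0.0) + prob
    return dist

# statistical_specification("event")    -> {'alarm': 0.9, 'no alarm': 0.1}
# statistical_specification("no event") -> {'no alarm': 1.0}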

Example 25.2 Poisson Process. Suppose that traffic is observed at a given point over time. An average number of λ · ∆t cars are passing during a time interval ∆t, where λ is unknown. This corresponds to a Poisson process where the time interval between two subsequent cars is exponentially distributed with parameter λ. In order to get information about the unknown parameter λ, we may fix a time interval of some given length, say ∆t, and then observe the number of cars passing in this time interval. Here we have already two ingredients of a functional model describing this process, namely the parameter and the observation. The random elements ω are responsible for the random time intervals between passing cars. In fact, if ω is a random variable distributed according to the exponential distribution with unit parameter, given by the density function

e^{−t},    (25.3)

then µ = ω/λ is a random variable with exponential distribution with parameter λ, which means a mean time of 1/λ between two subsequent cars. So, the count of cars passing during the time interval ∆t is given by the largest value i such that

Σ_{k=1}^{i} µk ≤ ∆t.    (25.4)

Here the µk are stochastically independent random variables, each with exponential distribution with parameter λ. Thus, we may define the following functional model for the counting x of passing cars in a Poisson process:

x = i,  if  (1/λ) Σ_{k=1}^{i} ωk ≤ ∆t < (1/λ) Σ_{k=1}^{i+1} ωk,  for i = 0, 1, . . . .    (25.5)

Here the ωk are assumed to be stochastically independent and exponentially distributed with parameter 1. This model explains how the counts in a time interval of length ∆t are generated. In fact, in this model, the random element ω is given by an infinite sequence of independent random variables ω1, ω2, . . ., all with the same exponential distribution.

25.2 Assumption-Based Inference

Consider an experiment E described by a functional model x = f(θ, ω) with given probabilities p(ω) for the random elements. Suppose that the outcome of the experiment is observed to be x. Given this data x and the experiment E, what can be inferred about the value of the unknown parameter θ ? To answer this question, we use assumption-based reasoning. The basic idea of assumption-based reasoning is to assume that a random element ω generated the data and then determine the consequences on the unknown parameter which can be drawn from this assumption. The consequences about θ are then evaluated according to the probabilities p(ω) of the assumptions ω in Ω. Some random elements ω in Ω may become impossible after x has been observed. In fact, if, for an ω ∈ Ω, there is no θ ∈ Θ such that x = f(θ, ω), then this ω is clearly impossible: it cannot have generated the actual observation x. Therefore, the observation x defines an event in Ω, namely the event

vx = {ω ∈ Ω : there is a θ ∈ Θ such that x = f(θ, ω)}.    (25.6)

Since it is known that vx has occurred, we condition the probability measure P on the event vx, which leads to the revised probabilities

p′(ω) = p(ω) / P(vx)

for all ω ∈ vx. These probabilities define a probability measure P′ on vx given by the equation

P′(A) = Σ_{ω∈A} p′(ω)

for all subsets A ⊆ vx. It is still unknown which random element in vx generated the observation x, but p′(ω) is the probability that this element is ω. Nevertheless, let us assume for the time being that ω ∈ vx caused the observation x. Then, according to the function f relating the parameter and the random element with the data, the possible values for the parameter θ can be logically restricted to the set

Tx(ω) = {θ ∈ Θ : x = f(θ, ω)}.    (25.7)

Note that in general Tx(ω) may contain several elements, but it could also be a one-element subset of Θ. It is also possible that Tx(ω) = Θ, in which case the observation x does not carry any information about θ, assuming that ω caused the observation x. Therefore, in general, even if the chance element generating the observation x were known, it would still not be possible to identify the value of the parameter unambiguously. This analysis shows that an observation x in a functional model generates the structure

Hx = (vx, P′, Tx, Θ),    (25.8)

which we call a hint. This approach to inference can be illustrated by the two examples introduced at the end of Section 25.1.

Example 25.3 Sensor. Assume that in the sensor described in Example 25.1 an alarm a sounds. Then we may use the functional model specified there to judge the likelihood that event e occurred. First it is clear that the sensor must be intact, since otherwise there would be no alarm. Hence we have va = {¬ω}. But then Ta(¬ω) = {e}, hence event e has occurred for sure. So this is a very reliable inference. If, on the other hand, no alarm sounds, how sure can we be that there is no event e present? We remark that the observation ¬a is compatible both with ω (sensor failed) and with ¬ω (sensor intact). Hence we have v¬a = Ω = {ω, ¬ω}. If we assume that the system is intact, then we conclude that there is no event e, i.e. T¬a(¬ω) = {¬e}. If we assume, on the other hand, that the sensor failed, then both possibilities e and ¬e remain possible, T¬a(ω) = {e, ¬e}. So in this case we arrive at no unique and clear conclusion. We remark, however, that it may be safe to assume that the sensor is usually functioning correctly, such that we have some guarantee that event e did not occur. This informal consideration will be made more precise below.
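The assumption-based inference of this example can be spelled out computationally. The following sketch repeats the hypothetical sensor model from above for self-containedness and derives, for an observation x, the components vx, p′ and Tx of the hint (25.8); the identifiers are illustrative only.

p = 0.1
p_omega = {"fail": p, "intact": 1.0 - p}
thetas = ["event", "no event"]

def f(theta, omega):
    return "alarm" if (theta == "event" and omega == "intact") else "no alarm"

def hint_from_observation(x):
    # v_x: assumptions compatible with the observation (25.6)
    v_x = [om for om in p_omega if any(f(th, om) == x for th in thetas)]
    norm = sum(p_omega[om] for om in v_x)
    p_revised = {om: p_omega[om] / norm for om in v_x}                  # p'(ω)
    T_x = {om: {th for th in thetas if f(th, om) == x} for om in v_x}   # (25.7)
    return v_x, p_revised, T_x

# hint_from_observation("alarm")    -> v_a = ['intact'], T_a(intact) = {'event'}
# hint_from_observation("no alarm") -> v_¬a = ['fail', 'intact'],
#   T_¬a(intact) = {'no event'}, T_¬a(fail) = {'event', 'no event'}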

Example 25.4 Inference for a Poisson Process. We refer to the functional model (25.5) defined in Example 25.2 for the counting of passing cars. Although in this case neither the parameter space nor the space of disturbances ω is finite, the inference can nevertheless be carried out in the way described in this section. If we observe a given value x, then, according to (25.5), we must have

Σ_{k=1}^{x} ωk ≤ ∆t · λ < Σ_{k=1}^{x+1} ωk.

First, we note that the observation x excludes no sequence ω1, ω2, . . ., hence no conditioning of the underlying probabilities is needed. If we assume a sequence ω = ω1, ω2, . . . we thus conclude that the unknown parameter must lie in the interval

Tx(ω) = [ (1/∆t) Σ_{k=1}^{x} ωk , (1/∆t) Σ_{k=1}^{x+1} ωk ).

Thus the elements of the hint Hx associated with the count of x cars passing during the interval ∆t are fixed: vx is still the whole set of all sequences ω = ω1, ω2, . . ., none being excluded by the observation x. Thus the original probability distribution does not change. The multivalued mapping Tx is defined by the intervals obtained above.

25.3 Hints

The hint defined in (25.8) is an instance of a more general notion of a hint. In general, if Θ denotes the set of possible answers to a question of interest, then a hint on Θ is a quadruple of the form

H = (Ω,P, Γ, Θ), where Ω is a set of assumptions, P is a probability measure on Ω reflecting the probability of the different assumptions and Γ is a mapping between the assumptions and the power set of Θ,

Γ : Ω −→ 2^Θ.

For a fixed assumption ω ∈ Ω, the subset Γ(ω) is the smallest subset of Θ that is known for sure to contain the correct answer to the question of interest. In other words, if the assumption ω is correct, then the answer is certainly within Γ(ω). The theory of hints is presented in detail in the reference (Kohlas & Monney, 1995). Intuitively, a hint represents a piece of information regarding the unknown value of the parameter in Θ. This information can then be used to evaluate the validity of certain hypotheses regarding Θ. A hypothesis is a subset H of Θ and it is correct if, and only if, it contains the answer to the question of interest. The most important concept for the evaluation of the hypothesis H is the degree of support of H induced by the hint H, which is defined by

sp(H) = P ({ω ∈ Ω : Γ(ω) ⊆ H}). (25.9)

If the assumption ω is the true one and Γ(ω) ⊆ H, then H is necessarily true because the correct answer is in Γ(ω) by definition of the mapping Γ. In other words, under the assumption ω the hypothesis H can be derived or proved. The degree of support of H is the probability of the assumptions that are capable of proving H. Such assumptions are called arguments for the validity of H and sp(H) measures the strength of these arguments. The arguments represent the logical evaluation of H, whereas sp(H) represents the quantitative evaluation of H. Similarly, the degree of plausibility of H, which is defined as

pl(H) = P({ω ∈ Ω : Γ(ω) ∩ H ≠ ∅}),    (25.10)

measures the level of compatibility between the hypothesis H and the assumptions. Obviously, degrees of support and plausibility for hypotheses define functions

sp : 2^Θ −→ [0, 1], pl : 2^Θ −→ [0, 1],

which are called the support and plausibility functions associated with the hint. It can easily be shown that

pl(H) = 1 − sp(H^c)

and sp is a belief function as defined by Shafer (Shafer, 1976). Let's illustrate these concepts by our examples.

Example 25.5 Sensor. We continue the analysis of the sensor example started in Example 25.3. We noted that if an alarm sounds, then event e occurred for sure. This translates into sp({e}) = 1, since Ta(¬ω) = {e} and p′(¬ω) = 1. More interesting is the case of no alarm. In this case sp({¬e}) = 1 − p, since only the assumption that the sensor is intact supports the hypothesis that event e did not occur. There is no assumption supporting the hypothesis that event e occurred. However, this hypothesis is not entirely excluded; in fact it has some plausibility, namely pl({e}) = 1 − sp({¬e}) = p. In practice the reliability of a sensor will be high, which means that the probability of failure p is small. So there is much support for the hypothesis that the event e did not occur, and there is only a small degree of plausibility that it did occur.
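For finite hints, the degrees of support (25.9) and plausibility (25.10) are simple sums over assumptions. The following sketch evaluates them for the no-alarm sensor hint, again with an assumed failure probability p = 0.1; the representation of a hint as a dictionary ω ↦ (Γ(ω), p(ω)) is a choice made here for illustration, not taken from the text.

def support(hint, H):
    # sp(H) = P({ω : Γ(ω) ⊆ H})   (25.9)
    return sum(p for gamma, p in hint.values() if gamma <= H)

def plausibility(hint, H):
    # pl(H) = P({ω : Γ(ω) ∩ H ≠ ∅})   (25.10)
    return sum(p for gamma, p in hint.values() if gamma & H)

p = 0.1
hint_no_alarm = {
    "intact": (frozenset({"no event"}), 1.0 - p),
    "fail":   (frozenset({"event", "no event"}), p),
}

# support(hint_no_alarm, frozenset({"no event"}))   -> 0.9  (= 1 - p)
# plausibility(hint_no_alarm, frozenset({"event"})) -> 0.1  (= p)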

Example 25.6 Inference for a Poisson Process. Here we go on with an analysis of the hint developed in Example 25.4 for the Poisson process. The considerations there allow us to determine the degree of support for intervals as

spx(u ≤ λ ≤ v) = P( u ≤ Σ_{k=1}^{x} ωk < Σ_{k=1}^{x+1} ωk ≤ v ).

It is convenient to introduce the distribution function

Qx(u, v) = P( Σ_{k=1}^{x} ωk ≤ u ≤ v ≤ Σ_{k=1}^{x+1} ωk ) = (1/x!) u^x e^{−v}  for 0 ≤ u ≤ v.

This distribution function has a density function

qx(u, v) = ∂²/∂u∂v Qx(u, v) = Qx−1(u, v)  for x ≥ 1.

For x = 0 we have Q0(u, v) = e^{−v} with a density q0(u, v) = e^{−v}. Further we define

Fx(u) = P( Σ_{k=1}^{x} ωk ≤ u ),
Gx(v) = P( v ≤ Σ_{k=1}^{x} ωk ).

Both these functions have densities too, namely

fx(u) = ∂/∂u Qx(u, v)|_{u=v} = qx(u, u),
gx(v) = ∂/∂v Qx(u, v)|_{u=v} = qx+1(u, u).

This permits us to compute spx(u ≤ λ ≤ v), since

spx(u ≤ λ ≤ v) = 1 − (Fx(u) + Gx+1(v) − Qx(u, v)).

Similarly, we compute the degree of plausibility for the unknown parameter as

plx(u ≤ λ ≤ v) = 1 − ( P( v < Σ_{k=1}^{x} ωk ) + P( Σ_{k=1}^{x+1} ωk < u ) )
= 1 − ((1 − Fx(v)) + (1 − Gx+1(u)))
= Fx(v) + Gx+1(u) − 1.

In particular, we may determine the plausibility of singletons,

plx(u ≤ λ ≤ u) = P( Σ_{k=1}^{x} ωk ≤ u ≤ Σ_{k=1}^{x+1} ωk ) = (1/x!) u^x e^{−u}.

This could be used to determine the most plausible estimate of the unknown parameter λ, i.e. for maximum likelihood estimation.
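As a numerical illustration (with ∆t taken as 1, so that λ is measured in counts per interval), the singleton plausibility can be evaluated and maximized on a grid; the function names below are only illustrative.

import math

def singleton_plausibility(x, u):
    # pl_x(λ = u) = u^x e^{-u} / x!
    return (u ** x) * math.exp(-u) / math.factorial(x)

def most_plausible_estimate(x, step=0.01, u_max=50.0):
    grid = [k * step for k in range(1, int(u_max / step))]
    return max(grid, key=lambda u: singleton_plausibility(x, u))

# most_plausible_estimate(7) is approximately 7.0: the most plausible value of
# λ equals the observed count, as expected for maximum likelihood estimation.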

25.4 Combination and Focussing of Hints

When there are several hints relative to the same domain Θ, the question is how we should combine these hints in order to obtain a single hint reflecting the aggregated information they contain. For the sake of simplicity, we only consider the case where two hints are combined; the generalization to more than two hints is straightforward. So let

H1 = (Ω1, P1, Γ1, Θ), H2 = (Ω2, P2, Γ2, Θ)    (25.11)

be two hints referring to Θ. If the assumptions of the two hints are independent, then the prior probability of a pair of assumptions (ω1, ω2) in Ω1 × Ω2 is P1(ω1) · P2(ω2). If the intersection Γ1(ω1) ∩ Γ2(ω2) is empty, then this pair is called contradictory because, by construction of the mappings Γ1 and Γ2, it is impossible that both ω1 and ω2 are simultaneously correct. Therefore, if

C = {(ω1, ω2) ∈ Ω1 × Ω2 : Γ1(ω1) ∩ Γ2(ω2) = ∅}

denotes the set of contradictory pairs of assumptions, then the correct pair must be in the set

Ω = (Ω1 × Ω2) \ C.    (25.12)

Since the correct pair of assumptions is in Ω, we must condition the product probability measure P1P2 on the set Ω. Specifically, if we define

K = Σ {P1(ω1)P2(ω2) : (ω1, ω2) ∈ C},

then the conditioning operation results in a new probability space (Ω, P) where

P(ω1, ω2) = P1(ω1)P2(ω2) / (1 − K)    (25.13)

for all (ω1, ω2) ∈ Ω. Here we assume that K ≠ 1. Furthermore, if (ω1, ω2) is the correct pair of assumptions, then the correct value in Θ must necessarily be in the set

Γ(ω1, ω2) = Γ1(ω1) ∩ Γ2(ω2).    (25.14)

This is the smallest subset of Θ that is known for sure to contain the correct answer if both ω1 and ω2 are assumed to be correct. Therefore, we define the combination of the two hints H1 and H2 as the hint

H1 ⊕ H2 = (Ω, P, Γ, Θ),

where Ω is defined in (25.12), P is defined in (25.13) and Γ is defined in (25.14). Clearly, this method for combining hints corresponds to Dempster's rule for combining belief functions (Shafer, 1976). If K = 1, then the combination H1 ⊕ H2 is not defined.
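The combination rule (25.12)-(25.14) is easy to state computationally for finite hints. The sketch below assumes the same dictionary representation of a hint as before and is not meant as more than an illustration of Dempster's rule.

def combine_hints(hint1, hint2):
    combined, K = {}, 0.0
    for om1, (g1, p1) in hint1.items():
        for om2, (g2, p2) in hint2.items():
            gamma = g1 & g2
            if gamma:
                combined[(om1, om2)] = (gamma, p1 * p2)
            else:
                K += p1 * p2          # mass of contradictory pairs (ω1, ω2) ∈ C
    if K == 1.0:
        raise ValueError("completely contradictory hints: combination undefined")
    # condition on the non-contradictory pairs, (25.13)
    return {om: (g, p / (1.0 - K)) for om, (g, p) in combined.items()}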

There is another notion regarding hints that we need to introduce, namely the operation of marginalization of a hint. Consider a hint on a domain that is the Cartesian product of two domains, namely a hint of the form

H = (Ω, P, Γ, Θ1 × Θ2).

In order to evaluate hypotheses regarding the domain Θ1, we define the projection of H to Θ1 as the hint

H↓Θ1 = (Ω, P, Γ1, Θ1), where

Γ1(ω) = Γ↓Θ1(ω) = {θ1 ∈ Θ1 : ∃ θ2 ∈ Θ2 such that (θ1, θ2) ∈ Γ(ω)}.

The set Γ1(ω) is the projection of Γ(ω) to the domain Θ1. The definition of the mapping Γ1 is justified by the fact that the projection Γ1(ω) is the smallest subset of Θ1 that is known for sure to contain the correct value in Θ1 when we know for sure that the correct joint value is in Γ(ω). The marginalized hint H↓Θ1 can then be used as usual to evaluate hypotheses H that are subsets of Θ1. Next we present some notions related to partitions that are used to generalize the operations of combination and projection introduced above and that will be useful in the development of the algebraic theory of hints. Consider a general domain Θ representing the possible answers to a question of interest. A partition P of Θ is a family of non-empty subsets of Θ which are mutually disjoint and cover Θ, namely

∪_{A∈P} A = Θ,  A ∩ A′ = ∅ for all A, A′ ∈ P satisfying A ≠ A′.

Now consider a hint H = (Ω,P, Γ, Θ) on Θ and a partition P of Θ. This partition represents a coarser granularity for the possible answers to the question of interest and we want to express the information contained in the hint with respect to this coarser granularity. By construction, under the assumption ω ∈ Ω, it is known for sure that the correct answer to the question is within the subset Γ(ω). Then every element A of the partition P that intersects with Γ(ω) might contain the answer. Therefore, the set

Γ→P(ω) = ∪ {A ∈ P : A ∩ Γ(ω) ≠ ∅}

is the smallest union of elements of P that necessarily contains the correct answer to the question. This shows that the hint H is represented by the hint H→P = (Ω, P, Γ→P, Θ) when the coarser domain P is used instead of the original domain Θ. This operation of coarsening the information contained in the hint H is called focussing. It generalizes the projection of hints. We say that a hint H is carried by a partition P if

H→P = H.

Since (H→P)→P = H→P, it follows that the coarsened hint H→P is carried by the partition P. It is possible to define a partial order in the set D consisting of all partitions of Θ. Indeed, if P and P′ are two partitions in D, then we say that P ≤ P′ if, for all A ∈ P, there is an A′ ∈ P′ such that A ⊆ A′. In this case we say that P is finer than P′, or, alternatively, that P′ is coarser than P. This terminology is justified by the fact that, for every A′ ∈ P′, we have

A′ = ∪ {A ∈ P : A ⊆ A′}.

The partition consisting of all one-element subsets of Θ is the bottom element of this partial order. We identify Θ with this bottom partition, which allows us to consider that Θ is in D. It can be shown that the pair (D, ≤) is actually a lattice with meet and join given by

P ∧ P′ = {A ∩ A′ ≠ ∅ : A ∈ P, A′ ∈ P′},

P ∨ P′ = ∧ {Q ∈ D : Q ≥ P, Q ≥ P′}.
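Before moving on, here is a minimal sketch of the focussing operation Γ→P defined above: Γ→P(ω) is the union of the blocks of P that intersect Γ(ω). Hints are again assumed to be dictionaries ω ↦ (Γ(ω), p(ω)), and a partition is assumed to be given as a list of blocks; these representations are assumptions of the sketch.

def focus(hint, partition):
    # Γ→P(ω): union of the blocks of P that intersect Γ(ω)
    focussed = {}
    for om, (gamma, p) in hint.items():
        coarse = frozenset().union(*(block for block in partition if block & gamma))
        focussed[om] = (coarse, p)
    return focussed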

Now consider three partitions P1, P2 and P in D. Then we say that P1 and P2 are conditionally independent given P if, for all blocks B1 ∈ P1, B2 ∈ P2 and B ∈ P, the condition Bi ∩ B ≠ ∅ for i = 1, 2 implies B1 ∩ B2 ∩ B ≠ ∅. In this case we write (P1, P2) ⊣ P. Let us now consider a hint H = (Ω, P, Γ, Θ) and define the partition W of Ω that is obtained by grouping the elements ω in Ω that map to the same set Γ(ω). If we define Ω′ = W and, for W ∈ W, if we define Γ′(W) = Γ(ω), where ω is any element of W, and if we define

P′(W) = Σ_{ω∈W} P(ω),

then the hint H^c = (Ω′, P′, Γ′, Θ) is called the canonical version of H. Clearly, both H and its canonical version H^c have the same support and plausibility functions. A hint satisfying Γ(ω) ≠ Γ(ω′) whenever ω ≠ ω′ is its own canonical version, and such a hint is called canonical. Let V denote the set of all canonical hints on Θ. In order to make V closed under the operations of focussing and combination, we slightly modify these operations by taking the canonical version of the resulting hint at the end. In other words, we define

H1 ⊕ H2 = (H1 ⊕ H2)^c,

H→P = (H→P)^c.

Furthermore, given two hints H1 = (Ω1,P1, Γ1, Θ) and H2 = (Ω2,P2, Γ2, Θ), it can happen that

{(ω1, ω2) ∈ Ω1 × Ω2 : Γ1(ω1) ∩ Γ2(ω2) ≠ ∅} = ∅.

In this case the two hints cannot be combined because they are completely contradictory. However, we nevertheless formally define the combined hint as the zero hint Z, which is not further specified. In addition, for every hint H and every partition P, we define H ⊕ Z = Z, Z→P = Z, and we consider that the set V also contains the zero hint Z.

25.5 Algebra of Hints

The set V endowed with the operation of combination has some interesting algebraic properties, some of which are related to the focussing operation. These properties are presented in the following theorem.

Theorem 1. The set V with the operations of combination and focussing has the following properties:

1. Semigroup. The operation of combination is associative and commutative. There is a neutral hint E such that H ⊕ E = H for every H ∈ V. There is a null element Z such that H ⊕ Z = Z for every H ∈ V.

2. Transitivity. If H is carried by a partition P1 and (P1, P2) ⊣ P, then

H→P2 = (H→P)→P2.

3. Combination. If H1 is carried by a partition P1 and H2 is carried by a partition P2 and (P1, P2) ⊣ P, then

(H1 ⊕ H2)→P = H1→P ⊕ H2→P.

4. Neutrality. For all P ∈ D we have

E→P = E.

5. Nullity. For all P ∈ D we have

H→P = Z ⇐⇒ H = Z.

6. Focussing. For all H ∈ V we have

H→Θ = H.

7. Idempotency. For all H ∈ V, if P1 ≤ P2, then

H→P1 ⊕ H→P2 = H→P1.

Proof. Assertions (1) to (3) are proved in (Kohlas & Monney, 1995). Assertions (4) to (7) are immediate consequences of the definition of focussing. □

Let us now consider the particular case where Θ is a Cartesian product, namely

Θ = Θ1 × . . . × Θm.    (25.15)

Then we define the set M = {1, . . . , m} and, for every subset J ⊆ M, we define

ΘJ = Π_{j∈J} Θj.

Note that we clearly have Θ = ΘM and each ΘJ corresponds to a partition of Θ, namely the partition

PJ = {{θJ} × ΘM−J : θJ ∈ ΘJ}.

It is shown in (Kohlas & Monney, 1995) that PJ ≤ PK if and only if K ⊆ J, and that (PI, PJ) ⊣ PK ⇐⇒ I ∩ J ⊆ K.

If a partition PJ is simply denoted by J, then we obtain the following corollary of Theorem 1.

Corollary 1. The set V consisting of all canonical hints on the domain Θ given in (25.15) has the following properties:

1. Transitivity. Let H be a hint in V that is carried by I. If I ∩ J ⊆ K, then H→J = (H→K)→J.

2. Combination. Let H1 be a hint in V that is carried by I and let H2 be a hint in V that is carried by J. If I ∩ J ⊆ K, then

(H1 ⊕ H2)→K = H1→K ⊕ H2→K.

Note that the operation of projection defined in Section 25.4 is a special case of the focussing operation. The following theorem presents algebraic properties of hints with respect to the partitions PJ, which are again simply denoted by J.

Theorem 2. The set V consisting of all canonical hints on the domain Θ given in (25.15) has the following properties:

1. Semigroup. The operation of combination is associative and commutative. There is a neutral hint E such that H ⊕ E = H for every H ∈ V. There is a null element Z such that H ⊕ Z = Z for every H ∈ V.

2. Transitivity. For all J, K ⊆ M and all H ∈ V, we have

(H→J)→K = H→J∩K.

3. Combination. For all K ⊆ M and all H1, H2 ∈ V, we have

(H1 ⊕ H2)→K = H1→K ⊕ H2→K.

4. Neutrality. For all J ⊆ M, we have

E→J = E.

5. Nullity. For all J ⊆ M, we have

H→J = Z ⇐⇒ H = Z.

6. Focussing. For all H ∈ V, we have

H→M = H.

7. Idempotency. For all H ∈ V, if J ⊆ K, then

H→J ⊕ H→K = H→K.

Proof. Assertion (1) is the same as in Theorem 1 and assertions (4) to (7) are simple reformulations of assertions (4) to (7) in Theorem 1. In order to prove assertion (2), we first show that, for J ⊆ K, we always have

H→J = (H→K)→J.    (25.16)

Any hint H is carried by M according to property (6) of the present theorem and we have M ∩ J ⊆ K. Therefore, (25.16) follows from property (1) of Corollary 1. Further, note that (H→J)→J = H→J by property (1) of Corollary 1, which means that H→J is carried by J. Then, by assertion (1) of Corollary 1, we have (H→J)→K = ((H→J)→J∩K)→K. Using (25.16) and assertion (1) of Corollary 1, we finally obtain

(H→J)→K = (H→J∩K)→K = H→J∩K.

In order to prove assertion (3) we use again the fact that H1→K is carried by K and J ∩ K ⊆ K. Therefore, assuming that H2 is carried by J, assertion (2) of Corollary 1 implies that

(H1→K ⊕ H2)→K = (H1→K)→K ⊕ H2→K = H1→K ⊕ H2→K.

□ This algebraic structure with two operations, namely combination and focussing, that satisfies the properties of Theorem 2 is very similar to an information algebra (see Part I). We are going to define and study such structures of uncertain information, as exemplified by hints, more generally in relation to information algebras in this part.

Chapter 26

Simple Random Variables

26.1 Domain-Free Variables

We consider a domain-free information algebra (Φ, D) (see Part I). Uncertain information can be represented by random variables taking values in such an information algebra. For example, probabilistic assumption-based reasoning as discussed in the previous Chapter 25 leads to random sets in the information algebra of subsets of a set of parameters. Therefore, we develop in this section the elements of a theory of such random variables. Let (Ω, A, P) be a probability space. It is well known that a mapping from such a space to any algebraic structure inherits this structure (Kappos, 1969). This will be exploited here. Let B = {B1, . . . , Bn} be any finite partition of Ω such that all Bi belong to A. A mapping ∆ : Ω → Φ such that

∆(ω) = φi for all ω ∈ Bi, i = 1, . . . , n,

is called a simple random variable in (Φ, D). Among simple random variables in (Φ, D), we can easily define the operations of combination and focussing:

1. Combination: Let ∆1 and ∆2 be simple random variables in (Φ,D). Then let ∆1 ⊗ ∆2 be defined by

(∆1 ⊗ ∆2)(ω) = ∆1(ω) ⊗ ∆2(ω).

2. Focussing: Let ∆ be a simple random variable in (Φ,D) and x ∈ D, then let ∆⇒x be defined by

(∆⇒x)(ω) = (∆(ω))⇒x.

Clearly, both ∆1 ⊗ ∆2 and ∆⇒x are simple random variables in (Φ, D). Let Rs denote the set of simple random variables in (Φ, D). With these operations of combination and focussing, (Rs, D) is an information algebra.


The axioms are simply inherited from (Φ, D) and easily verified. The neutral element of this algebra is the random variable E(ω) = e for all ω ∈ Ω, the vacuous random variable. Further, since for any φ ∈ Φ the mapping Dφ(ω) = φ for all ω ∈ Ω is a simple random variable, the algebra (Φ, D) is embedded in the algebra (Rs, D). Since the simple random variables form themselves an information algebra, containing the original one, it follows that simple random variables are pieces of information in this sense. Note that there is a partial order in the information algebra of simple random variables, as in any information algebra. As we shall argue later (Section ??) this order reflects the information content of simple random variables. In fact it holds that ∆1 ≤ ∆2 in Rs if, and only if, for all ω ∈ Ω it holds that ∆1(ω) ≤ ∆2(ω). There are two important special classes of simple random variables: If for a random variable ∆ defined relative to a partition B = {B1, . . . , Bn} it holds that φi ≠ φj for i ≠ j, the variable is called canonical. It is a simple matter to transform any random variable ∆ into an associated canonical one: take the union of all elements Bi ∈ B with the same value φj, which yields a new partition B′ of Ω, and define ∆′(ω) = ∆(ω). Then ∆′ is the canonical version of ∆ and we write ∆′ = ∆→. We may consider the set of canonical random variables, Rs,c, and define between elements of this set combination and focussing as follows:

∆1 ⊗c ∆2 = (∆1 ⊗ ∆2)→,  ∆⇒cx = (∆⇒x)→.

Then (Rs,c, D) is still an information algebra under these modified operations. We remark also that (∆1 ⊗ ∆2)→ = (∆1→ ⊗ ∆2→)→ holds. Secondly, if ∆(ω) ≠ z for all ω ∈ Ω, then ∆ is called normalized. We can associate a normalized random variable ∆↓ with any simple random variable ∆, provided ∆(ω) ≠ z with positive probability. In fact, let Ω↓ = {ω ∈ Ω : ∆(ω) ≠ z}. This is a measurable set with probability P(Ω↓) = 1 − P{ω ∈ Ω : ∆(ω) = z} > 0. We consider then the new probability space (Ω, A, P′), where P′ is the probability measure on A defined by

P′(A) = P(A ∩ Ω↓) / P(Ω↓),    (26.1)

if A ∩ Ω↓ ≠ ∅, and P′(A) = 0 otherwise. On this new probability space define ∆↓(ω) = ∆(ω). Clearly, it holds that (∆→)↓ = (∆↓)→. The idea behind normalization becomes clear when we consider combination of random variables: each of two (normalized) random variables ∆1 and ∆2 represents some (uncertain) information with the following interpretation: one of the ω ∈ Ω must be the correct, but unknown assumption of the information. However, if ω happens to be the correct assumption, then under the first random variable the information ∆1(ω) can be asserted, and under the second variable the information ∆2(ω). Thus, together, still under the assumption ω, the information ∆1(ω) ⊗ ∆2(ω) can be asserted. However, it is possible that ∆1(ω) ⊗ ∆2(ω) = z, even if both ∆1 and ∆2 are normalized. But the null information z represents a contradiction; thus, in view of the information given by the variables ∆1 and ∆2, the assumption ω cannot hold, it can (and must) be excluded. This amounts to normalizing the random variable ∆1 ⊗ ∆2, by excluding all ω ∈ Ω for which the combination results in a contradiction, and then conditioning (i.e. normalizing) the probability on the non-contradictory assumptions. Two partitions B1 and B2 of Ω are called independent if B1,i ∩ B2,j ≠ ∅ for all blocks B1,i ∈ B1 and B2,j ∈ B2. If furthermore P(B1,i ∩ B2,j) = P(B1,i) · P(B2,j) for all those pairs of blocks, then the two partitions B1 and B2 are called stochastically independent. In addition, if ∆1 and ∆2 are two simple random variables defined on these two partitions respectively, then these random variables are called stochastically independent too. Note that if ∆1 and ∆2 are stochastically independent, then their canonical versions ∆1→ and ∆2→ are stochastically independent too.
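In the instance where Φ is the algebra of subsets of a finite frame (combination = intersection, null element z = ∅), simple random variables and the operations of combination and normalization can be sketched as follows; the dictionary representation ω ↦ (∆(ω), p(ω)) is an assumption made only for this illustration.

def combine(delta1, delta2):
    # (∆1 ⊗ ∆2)(ω) = ∆1(ω) ⊗ ∆2(ω); both variables are assumed to be defined
    # on the same finite set of assumptions ω
    return {om: (delta1[om][0] & delta2[om][0], p) for om, (_, p) in delta1.items()}

def normalize(delta):
    # ∆↓: remove assumptions with ∆(ω) = z = ∅ and condition the probabilities
    kept = {om: (val, p) for om, (val, p) in delta.items() if val}
    total = sum(p for _, p in kept.values())
    return {om: (val, p / total) for om, (val, p) in kept.items()}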

26.2 Labeled Random Variables

To the domain-free information algebra (Rs, D) we can, in the usual way, associate an equivalent labeled version (see Part I). Note that for a simple random variable ∆ any possible value φi has some support xi. Then x = ∨i xi is a support for all possible values of ∆. Hence we see that ∆⇒x = ∆. Thus any simple random variable has a support. We define Rs* to be the set of all pairs (∆, x), where ∆ ∈ Rs is a simple random variable and x a support of ∆. Then (Rs*, D) is a labeled information algebra where x is the label d(∆, x) of (∆, x), combination is defined by (∆1, x) ⊗ (∆2, y) = (∆1 ⊗ ∆2, x ∨ y) and projection for y ≤ x by (∆, x)↓y = (∆⇒y, y). If (Φ, D) is a domain-free information algebra, and (Φ*, D) the associated labeled algebra, i.e. the algebra of pairs (φ, x) with x a support of φ, then the labeled simple random variables (∆, x) can also be considered as mappings ∆* : Ω → Φx*, where Φx* is the set of all (φ, x) ∈ Φ*. The mapping is defined by ∆*(ω) = (∆(ω), x). This is well defined, since x being a support of ∆ means that x is a support of ∆(ω) for all ω ∈ Ω. This observation indicates that we may define labeled random variables also directly: Let (Φ, D) now be a labeled information algebra and Φx the set of members of Φ with label d(φ) = x. A simple random variable with label x is then a mapping ∆ : Ω → Φx such that there is a finite partition B = {B1, . . . , Bn} of Ω such that all Bi belong to A and

∆(ω) = φi ∈ Φx for all ω ∈ Bi, i = 1, . . . , n.

The label of such a random variable is d(∆) = x. As before, it is easily verified that such labeled random variables form themselves a labeled information algebra. In fact, define the operations of combination and projection in the natural way as follows:

1. Combination: Let ∆1 and ∆2 be simple random variables in (Φ,D). Then let ∆1 ⊗ ∆2 be defined by

(∆1 ⊗ ∆2)(ω) = ∆1(ω) ⊗ ∆2(ω).

2. Projection: Let ∆ be a simple random variable in (Φ, D) and x ∈ D, such that x ≤ d(∆), then let ∆↓x be defined by

(∆↓x)(ω) = (∆(ω))↓x.

In the same way as the domain-free version of the labeled algebra (Φ, D) is obtained, the domain-free version of the algebra of simple random variables can be obtained. We leave the details to the reader; see also Section 27.1. What is important is only that, concerning random variables, we may switch as conveniently between the two equivalent domain-free and labeled versions as with ordinary information algebras. Some issues are more conveniently discussed in the domain-free version, for others the labeled version is better adapted.

26.3 Degrees of Support and Plausibility

This section is devoted to the study of the probability distribution of simple random variables. The starting point is the following question: Given a random variable ∆ on an information algebra and an element φ ∈ Φ, under what assumptions can the information φ be asserted? And how likely is it that these assumptions hold? If ω ∈ Ω is an assumption such that ∆(ω) ≥ φ, then ∆(ω) implies φ. In this case we may say that ω is an assumption supporting φ, given ∆. Therefore we define for every φ ∈ Φ the set

qs∆(φ) = {ω ∈ Ω : φ ≤ ∆(ω)}

of assumptions supporting φ. However, if ∆(ω) = z, then ω supports every φ ∈ Φ, since φ ≤ z. The null element z represents the contradiction, which implies everything. In a consistent theory, contradictions must be excluded. Thus, assuming that our information is consistent, we conclude that assumptions such that ∆(ω) = z are not really possible assumptions and must be excluded. We refer once more to Chapter 25 for an illustration and justification of this consideration. Let

qs∆(z) = {ω ∈ Ω : ∆(ω) = z}.

We assume that qs∆(z) is not equal to Ω; otherwise ∆ is called the null variable, representing fully contradictory “information”. In other words, we assume that proper information is never fully contradictory. If we eliminate the contradictory assumptions from qs∆(φ), we obtain the support set

s∆(φ) = {ω ∈ Ω : φ ≤ ∆(ω) ≠ z} = qs∆(φ) − qs∆(z)

of φ, which is the set of assumptions properly supporting φ, and the mapping s∆ : Φ → 2^Ω is called the allocation of support induced by ∆. The set qs∆(φ) is called the quasi-support set to underline that it may contain contradictory assumptions. This set has little interest from a semantic point of view, but it is important for technical and especially for computational reasons. These concepts capture the essence of probabilistic assumption-based reasoning as exemplified in the context of functional models and hints in the previous Chapter 25. Here are the basic properties of allocations of support:

Theorem 26.1 If ∆ is a simple random variable on an information algebra (Φ, D), then the following holds for the associated allocation of support s∆:

1. qs∆(e) = Ω, s∆(z) = ∅.

2. If ∆ is normalized, then qs∆(z) = ∅.

3. For any pair φ, ψ ∈ Φ,

qs∆(φ ⊗ ψ) = qs∆(φ) ∩ qs∆(ψ),

s∆(φ ⊗ ψ) = s∆(φ) ∩ s∆(ψ).

Proof. (1) and (2) follow immediately from the definition of the allocation of support. (3) follows since φ ⊗ ψ ≤ ∆(ω) if, and only if, φ ≤ ∆(ω) and ψ ≤ ∆(ω). □

Knowing assumptions supporting a hypothesis φ is already interesting and important. It is the part logic can provide. On top of this, it is important to know how likely it is that a supporting assumption holds. This is the part probability adds. If we know or may assume that the information is consistent, then we should condition the original probability measure P on Ω on the event qs∆(z)^c. This leads to the probability space (qs∆(z)^c, A ∩ qs∆(z)^c, P′), where P′(A) = P(A)/P(qs∆(z)^c). The likelihood of supporting assumptions for φ ∈ Φ can then be measured by

sp∆(φ) = P′(s∆(φ)).

The value sp∆(φ) is called the degree of support of φ associated with the random variable ∆. The function sp∆ : Φ → [0, 1] is called the support function of ∆. It corresponds to the distribution function of ordinary random variables.

It is convenient for technical reasons to define also the degree of quasi-support

qsp∆(φ) = P(qs∆(φ)).

Then, the degree of support can also be expressed in terms of degrees of quasi-support

sp∆(φ) = (qsp∆(φ) − qsp∆(z)) / (1 − qsp∆(z)).

This is the form which is usually used in applications (Kohlas et al., 1999). In another consideration, we can also ask for assumptions ω ∈ Ω under which ∆ shows φ to be possible, i.e. not excluded, although not necessarily supported. If ∆(ω) is such that, combined with φ, it leads to a contradiction, i.e. if ∆(ω) ⊗ φ = z, then under ω the information φ is excluded by a consistency consideration as above. So we define the set

p∆(φ) = {ω ∈ Ω : ∆(ω) ⊗ φ ≠ z}.

This is the set of assumptions under which φ is not excluded. Therefore we call it the possibility set of φ. We can define the degree of possibility, also sometimes called the degree of plausibility (e.g. in (Shafer, 1976)), as

pl∆(φ) = P′(p∆(φ)).

If ω ∈ p∆(φ)^c, then, under this assumption, φ is impossible, i.e. excluded. So the set p∆(φ)^c contains arguments against φ and

do∆(φ) = P′(p∆(φ)^c) = 1 − pl∆(φ)

can be called the degree of doubt in φ. Note that s∆(φ) ⊆ p∆(φ), since φ ≤ ∆(ω) ≠ z implies φ ⊗ ∆(ω) = ∆(ω) ≠ z. Hence, we see that for all φ ∈ Φ we have that sp∆(φ) ≤ pl∆(φ).
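In the same finite subset instance as above, the degrees of support and plausibility of a simple random variable reduce to sums over the non-contradictory assumptions. A minimal sketch (recall that in this instance φ ≤ ψ means ψ ⊆ φ, so φ ≤ ∆(ω) is tested as ∆(ω) ⊆ φ):

def degree_of_support(delta, phi):
    # sp(φ) = P'({ω : φ ≤ ∆(ω) ≠ z}), conditioned on the non-contradictory ω
    consistent = sum(p for val, p in delta.values() if val)
    supporting = sum(p for val, p in delta.values() if val and val <= phi)
    return supporting / consistent

def degree_of_plausibility(delta, phi):
    # pl(φ) = P'({ω : ∆(ω) ⊗ φ ≠ z}); the degree of doubt is 1 - pl(φ)
    consistent = sum(p for val, p in delta.values() if val)
    possible = sum(p for val, p in delta.values() if val and (val & phi))
    return possible / consistent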

26.4 Basic Probability Assignments

Consider the canonical version ∆→ of a random variable ∆ with a corresponding partition B = {B1, . . . , Bn} of Ω and ∆(ω) = φi for ω ∈ Bi. Then we define the probabilities

m∆(φi) = P (Bi), for i = 1, . . . , n.

The m∆(φi) are called basic probability assignments (bpa). This term comes from the Dempster-Shafer theory of evidence (DS theory) (Shafer, 1976), which corresponds to our theory of random variables when the information algebra of subsets of a finite set is considered. It follows then immediately that

qsp∆(φ) = Σ_{i : φ ≤ φi} m∆(φi).

So, it is sufficient to know the bpa of a random variable to determine its support function. In many applications only the bpa is used and no attention is paid to the underlying random variables. However, it is then generally assumed implicitly that the underlying variables are stochastically independent, although this is often not rigorously and explicitly stated. We may also consider normalized bpas, corresponding to the normalized variable ∆↓,

m∆↓(φi) = m∆(φi) / (1 − m∆(z))

for φi ≠ z. In terms of this normalized bpa, the degrees of support become

sp∆(φ) = Σ_{i : φ ≤ φi} m∆↓(φi).

Similarly, the degree of plausibility may also be obtained from the normalized bpa,

pl∆(φ) = Σ_{ψ : φ ⊗ ψ ≠ z} m∆↓(ψ).

The bpa, like the degrees of quasi-support, is not semantically important, but a convenient concept especially for computational purposes. Assume then that the two random variables ∆1 and ∆2 are stochastically independent. Then ∆1→ and ∆2→ are also stochastically independent. The following theorem shows how the bpa of the combined random variable ∆1 ⊗ ∆2 can be computed from the bpas of the individual variables.

Theorem 26.2 Let m∆1(φ1,i) for i = 1, . . . , m and m∆2(φ2,j) for j = 1, . . . , n be the bpas of the two stochastically independent random variables ∆1 and ∆2. Then the bpa of ∆1 ⊗ ∆2 is given by

m∆1⊗∆2(φ) = Σ_{i,j : φ1,i ⊗ φ2,j = φ} m∆1(φ1,i) · m∆2(φ2,j).    (26.2)

Proof. The canonical version of the combined random variables (∆1 ⊗ ∆2)→ = (∆1→ ⊗ ∆2→)→ has as possible values the combinations φ1,i ⊗ φ2,j.

Each such combination is assigned the probability m∆1(φ1,i) · m∆2(φ2,j) of the underlying intersection B1,i ∩ B2,j of the blocks of the two orthogonal partitions associated with ∆1→ and ∆2→. So, when the canonical version of

∆1→ ⊗ ∆2→ is taken, these probabilities sum up over all pairs i, j such that φ1,i ⊗ φ2,j take the same value φ. □

Of course, in (26.2), only a finite number of φ have a non-empty sum on the right. The method to combine the bpas proposed in Theorem 26.2 is called the (non-normalized) Dempster rule, since it has first been proposed in the context of multivalued mappings in (Dempster, 1967). More precisely, Dempster proposed a normalized version of this rule, where the combined variable is normalized, (∆1 ⊗ ∆2)↓. Then the rule is

m(∆1⊗∆2)↓(φ) = ( Σ_{i,j : φ1,i ⊗ φ2,j = φ} m∆1(φ1,i) · m∆2(φ2,j) ) / ( 1 − Σ_{i,j : φ1,i ⊗ φ2,j = z} m∆1(φ1,i) · m∆2(φ2,j) ).    (26.3)

Here, the combinations which result in a contradiction are excluded and the probability is renormalized. A particular case is the combination of a random variable with a degenerate variable, i.e. an element of Φ. Let ∆ be a random variable on Φ with bpa m∆(φi) for i = 1, . . . , m, and φ ∈ Φ. To φ corresponds a degenerate random variable with bpa m(φ) = 1. Then we get by (26.2)

m∆⊗φ(ψ) = Σ_{i : φi ⊗ φ = ψ} m∆(φi).    (26.4)

This is Dempster's (non-normalized) rule of conditioning. In the normalized case the rule becomes

m∆⊗φ(ψ) = ( Σ_{i : φi ⊗ φ = ψ} m∆(φi) ) / ( 1 − Σ_{i : φi ⊗ φ = z} m∆(φi) ).    (26.5)

We can also compute the bpa of a focussed random variable from the bpa of the original variable. This is shown in the next theorem.

Theorem 26.3 Let m∆(φi) for i = 1, . . . , m, be the bpa of a random variable ∆. Then the bpa of ∆⇒x is given by

m∆⇒x(φ) = Σ_{i : φi⇒x = φ} m∆(φi).    (26.6)

Proof. This follows, since the partition of (∆⇒x)→ consists of the unions of the blocks Bi of the partition of ∆→ for which φi⇒x = φ. □

Again, in (26.6), only a finite number of φ have a non-empty sum on the right. So, we see that the operations with simple random variables are reflected by their associated bpas. In fact later, in Chapter 31, we shall consider an algebra of bpas, similar to an information algebra, based on these operations.
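For bpas over the subset algebra of a finite frame, the non-normalized and normalized Dempster rules (26.2) and (26.3) take the following form; the representation of a bpa as a dictionary from frozensets to masses is an assumption of this sketch.

def combine_bpa(m1, m2, normalized=True):
    combined = {}
    for a, p in m1.items():
        for b, q in m2.items():
            c = a & b
            combined[c] = combined.get(c, 0.0) + p * q      # rule (26.2)
    if not normalized:
        return combined
    K = combined.pop(frozenset(), 0.0)                      # mass of z = ∅
    if K == 1.0:
        raise ValueError("totally conflicting bpas")
    return {c: mass / (1.0 - K) for c, mass in combined.items()}   # rule (26.3)

# Two independent sensors that both stay silent (no-alarm bpa of Example 25.5):
m = {frozenset({"no event"}): 0.9, frozenset({"event", "no event"}): 0.1}
# combine_bpa(m, m) -> {frozenset({'no event'}): 0.99,
#                       frozenset({'event', 'no event'}): 0.01}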

Chapter 27

Random Information

27.1 Random Mappings

When we want to go beyond simple random variables, we may consider any mapping Γ : Ω → Φ from a probability space (Ω, A, P) into a domain-free information algebra (Φ, D). Let us call such mappings random mappings in (Φ, D). As before, in the case of simple random variables, we may define between random mappings in (Φ, D) the operations of combination and focussing:

1. Combination: Let Γ1 and Γ2 be two random mappings in (Φ,D), then Γ1 ⊗ Γ2 is the random mapping in (Φ,D) defined by

(Γ1 ⊗ Γ2)(ω) = Γ1(ω) ⊗ Γ2(ω). (27.1)

2. Focussing: Let Γ be a random mapping in (Φ,D) and x ∈ D, then Γ⇒x is the random mapping in (Φ,D) defined by

Γ⇒x(ω) = (Γ(ω))⇒x. (27.2)

For a fixed probability space (Ω, A, P), let RΦ denote the set of all random mappings in (Φ, D). With the two operations defined above, (RΦ, D) inherits all axioms of a domain-free information algebra from (Φ, D), with the exception of the support axiom. A domain x ∈ D is a support for a random mapping Γ in (Φ, D) if Γ = Γ⇒x, i.e. Γ(ω) = Γ⇒x(ω) for all ω ∈ Ω (or at least for almost all ω ∈ Ω, i.e. for all ω except a set of probability zero). Now, every upper bound of the supports of the Γ(ω) would be a support of Γ. But, except if D has a top element or is a complete lattice, such an upper bound does not necessarily exist. Therefore, we need some additional condition, such as D having a top element, to guarantee that (RΦ, D) is an information algebra. Or we may restrict ourselves to random mappings having a support. This set is closed under combination and focussing and forms therefore an information algebra.


We may also define a labeled version of random mappings in a labeled information algebra (Φ,D) as mappings Γ : Ω → Φx for some x ∈ D. The label of such a random mapping Γ is d(Γ) = x. Between such labeled random mappings, defined on a fixed probability space (Ω, A,P ), combination and projection can be defined as above,

1. Combination: Let Γ1 and Γ2 be two random mappings in (Φ,D), then Γ1 ⊗ Γ2 is the random mapping in (Φ,D) defined by

(Γ1 ⊗ Γ2)(ω) = Γ1(ω) ⊗ Γ2(ω). (27.3)

2. Projection: Let Γ be a random mapping in (Φ, D) and x ∈ D, x ≤ d(Γ), then Γ↓x is the random mapping in (Φ, D) defined by

Γ↓x(ω) = (Γ(ω))↓x.    (27.4)

The system (RΦ, D) of labeled random mappings in the labeled information algebra (Φ, D) inherits all axioms from the latter and is thus itself a labeled information algebra. We may derive from the labeled information algebra of random mappings in the usual way the associated domain-free algebra. Its elements are equivalence classes [Γ] of random mappings. If Γ1, Γ2 ∈ [Γ], then, if x = d(Γ1) and y = d(Γ2), we have Γ1→y = Γ2 and Γ2→x = Γ1. But this means that for all ω ∈ Ω we have Γ1→y(ω) = Γ2(ω) and Γ2→x(ω) = Γ1(ω), hence [Γ1(ω)] = [Γ2(ω)]. Thus, we may consider [Γ] as a mapping from Ω into Φ/σ defined by [Γ](ω) = [Γ(ω)]. Therefore, the derived domain-free information algebra is an algebra of random mappings in (Φ/σ, D). This algebra will be denoted by (RΦ/σ, D). Conversely, from an information algebra of random mappings in a domain-free information algebra (Ψ, D), where all mappings have a support, we may obtain the corresponding labeled version. The elements of the labeled algebra are pairs (Γ, x), where Γ is a random mapping in Ψ and x a support of Γ. To such a pair we may associate a mapping Γx : Ω → Φx defined by Γx(ω) = (Γ(ω), x), where (Φ, D) is the labeled information algebra derived from (Ψ, D). Evidently, combination in the labeled information algebra can be translated into a similar operation between mappings: in fact, let (Γ1, x) and (Γ2, y) be two labeled random mappings which combine into (Γ1 ⊗ Γ2, x ∨ y). Then the associated mapping Γx∨y : Ω → Φx∨y is defined by

Γx∨y(ω) = ((Γ1 ⊗ Γ2)(ω), x ∨ y)

= (Γ1(ω), x) ⊗ (Γ2(ω), y)

= Γ1,x(ω) ⊗ Γ2,y(ω).    (27.5)

Similarly, the focussed random mapping Γx = (Γ⇒x, x), where Γ has support y and x ≤ y, generates the mapping

Γx(ω) = (Γ⇒x(ω), x) = (Γ(ω), y)↓x.    (27.6)

This amounts to saying that the labeled algebra derived from an information algebra of random mappings in the domain-free algebra (Ψ, D) is in fact equivalent (isomorphic) to an algebra of labeled random mappings in the labeled algebra (Φ, D) derived from (Ψ, D). So, finally, as with ordinary information algebras, we may switch freely between labeled and domain-free versions of algebras of random mappings too, assuming the support axiom in the latter case. A random mapping Γ : Ω → Φ can be seen as uncertain information in the following way: the information represented by Γ can be interpreted in different ways. If an element ω ∈ Ω is assumed to hold, then the information Γ(ω) ∈ Φ is asserted. The uncertainty arises from the fact that it is not known with certainty which element ω ∈ Ω holds. This view of a random mapping corresponds to the concept of a hint as explained in Chapter 25. Now, if the information Γ(ω) is asserted, then all implied pieces of information ψ ≤ Γ(ω) are asserted too. This leads to an important generalization of a random mapping. In Part I of these Lectures it has been argued that ideals I in Φ represent consistent assertions of information. Thus, it is natural to extend the concept of a random mapping from maps into Φ more generally to maps into the ideal completion IΦ of Φ. Thus, we call, more generally, a mapping Γ : Ω → IΦ a random mapping. Since (IΦ, D) is an information algebra, the set of random mappings into IΦ forms an information algebra

(RIΦ, D) too.

Clearly (RΦ, D) is a subalgebra of (RIΦ, D), or rather, it is embedded into (RIΦ, D) by the mapping Γ ∈ RΦ ↦ I(Γ) ∈ RIΦ, where I(Γ) is the principal ideal of all random mappings Γ′ ∈ RΦ such that Γ′ ≤ Γ. For a random mapping Γ ∈ (RIΦ, D) we have also that for all ω ∈ Ω, in the ideal completion IΦ of Φ,

Γ(ω) = ∨{φ : φ ∈ Φ, φ ≤ Γ(ω)}.

Now, let ∆ ∈ RΦ be such that ∆ ≤ Γ, i.e. ∆(ω) ≤ Γ(ω) for all ω ∈ Ω. Then

Γ = ∨{∆ : ∆ ∈ RΦ, ∆ ≤ Γ}.

This shows that (RIΦ, D) is the ideal completion of the algebra (RΦ, D).

27.2 Support and Possibility

As in the case of simple random variables (Section 26.3) we may define the allocation of support sΓ of a random mapping by

sΓ(φ) = {ω ∈ Ω : φ ≤ Γ(ω)}.    (27.7)

We no longer distinguish between the semantic categories of support and quasi-support as in Section 26.3 and speak simply of support, even though (27.7) is, strictly speaking, a quasi-support.

This support as defined in (27.7) has the same properties as the support of simple random variables; in particular, Theorem 26.1 still holds (see also Section 29.1). Again, as before with simple random variables, we may define the degree of support of a piece of information φ induced by a random mapping Γ by

spΓ(φ) = P (sΓ(φ)). (27.8)

Now however, this probability is only defined if sΓ(φ) ∈ A. For this to hold we have no guarantee whatsoever. The only element which we know for sure to be measurable is sΓ(e) = Ω. A simple way out of this problem would be to limit random mappings to mappings Γ for which sΓ(φ) ∈ A for all φ ∈ Φ or even for all elements of the ideal completion IΦ. There is however the question whether such random mappings exist for all information algebras. Even if they do, there is no reason why we should restrict ourselves exactly to those mappings. Therefore we prefer other, more rational approaches to overcome the difficulty of an only partial definition of degrees of support. Here we propose a first solution. Later we present some alternatives. In the paper (Shafer, 1979) the use of probability algebras instead of probability spaces was proposed as a natural framework for studying belief functions. Since degrees of support are like belief functions (see Chapter 29) we adapt this idea here. First, we introduce the probability algebra associated with a probability space. Let J be the σ-ideal of P-null sets in the σ-algebra A of the probability space. Two sets A′, A″ ∈ A are equivalent modulo J if A′ − A″ ∈ J and A″ − A′ ∈ J. This means that the two sets have the same probability measure P(A′) = P(A″). This equivalence is a congruence in the Boolean algebra A. Hence the quotient algebra B = A/J is a Boolean σ-algebra too. If [A] denotes the equivalence class of A, then, for any countable family of sets Ai, i ∈ I,

[A]^c = [A^c],
∨_{i∈I} [Ai] = [∪_{i∈I} Ai],
∧_{i∈I} [Ai] = [∩_{i∈I} Ai].    (27.9)

So [A] defines a Boolean homomorphism from A onto B, called projection. We denote [Ω] by ⊤ and [∅] by ⊥. These are of course the top and bottom elements of B. Now, as is well known, B has some further important properties (see (Halmos, 1963)): it satisfies the countable chain condition, which means that any family of disjoint elements of B is countable. Further, any Boolean algebra B satisfying the countable chain condition is complete.

That is, any subset E ⊆ B has a supremum ∨E and an infimum ∧E in B. Furthermore, the countable chain condition also implies that there is always a countable subset D of E with the same supremum and infimum, i.e. ∨D = ∨E and ∧D = ∧E. We refer to (Halmos, 1963) for these results. Finally, by µ([A]) = P(A) a normalized, positive measure µ is defined on B. Positive means here that µ(b) = 0 implies b = ⊥. A pair (B, µ) of a Boolean σ-algebra B satisfying the countable chain condition and a normalized, positive measure µ on it is called a probability algebra (see also (Kappos, 1969)). We now use this construction of a probability algebra out of the probability space to extend the definition of the degrees of support beyond elements φ for which sΓ(φ) is measurable. Even if sΓ(φ) is not measurable, any A ∈ A such that A ⊆ sΓ(φ) represents an argument which supports φ. To exploit this remark, define for every set H ∈ P(Ω)

ρ0(H) = ∨{[A]: A ⊆ H,A ∈ A}. (27.10)

This mapping has interesting properties as the following theorem shows.

Theorem 27.1 The mapping ρ0 : P(Ω) → A/J as defined in (27.10) has the following properties:

ρ0(Ω) = ⊤,
ρ0(∅) = ⊥,
ρ0(∩_{i∈I} Hi) = ∧_{i∈I} ρ0(Hi),    (27.11)

if {Hi, i ∈ I} is a countable family of subsets of Ω.

Proof. Clearly, ρ0(Ω) = [Ω] = ⊤ ∈ A/J. Similarly, ρ0(∅) = [∅] = ⊥ ∈ A/J. In order to prove the remaining identity, let Hi, i ∈ I, be a countable family of subsets of Ω. For every index i, there is a countable family of sets H′j ∈ A such that H′j ⊆ Hi and ∨[H′j] = [∪H′j] = ρ0(Hi), since B satisfies the countable chain condition. Take Ai = ∪H′j. Then Ai ⊆ Hi, Ai ∈ A and P(Ai) = µ(ρ0(Hi)). Define A = ∩i∈I Ai ∈ A. It follows that A ⊆ ∩i∈I Hi and, because the projection is a σ-homomorphism, we obtain [A] = ∧i∈I [Ai] = ∧i∈I ρ0(Hi). We are going to show now that [A] = ρ0(∩i∈I Hi), which then proves the theorem. For this it is sufficient to show that P(A) = µ(ρ0(∩i∈I Hi)). This is so, since P(A) = µ([A]) and A ⊆ ∩Hi, hence [A] ≤ ρ0(∩Hi). Therefore, if µ([A]) = µ(ρ0(∩Hi)) we must indeed have [A] = ρ0(∩Hi), since µ is positive. Now, clearly P(A) ≤ µ(ρ0(∩Hi)). As above, we conclude that there is an A′ ∈ A, A′ ⊆ ∩Hi, such that P(A′) = µ(ρ0(∩Hi)). A′ ∪ (A − A′) ⊆ ∩Hi implies that P(A′ ∪ (A − A′)) = P(A′), hence P(A − A′) = 0. Define A′i = Ai ∪ (A′ − A) ⊆ Hi. Then Ai − A′i = ∅ and therefore,

µ(ρ0(Hi)) = P(Ai) ≤ P(A′i) = P(Ai) + P(A′i − Ai) ≤ µ(ρ0(Hi)).    (27.12)

This implies that P(A′i − Ai) = 0, therefore we have [A′i] = [Ai]. From this we obtain that ∩A′i = ∩(Ai ∪ (A′ − A)) = (A′ − A) ∪ (∩Ai) = (A′ − A) ∪ A = A′ ∪ A = A′ ∪ (A − A′). But ∩Ai and ∩A′i are equivalent, since [∩A′i] = ∧[A′i] = ∧[Ai] = [∩Ai]. This implies finally that P(A) = P(∩Ai) = P(∩A′i) = P(A′) + P(A − A′) = P(A′) = µ(ρ0(∩Hi)). This is what we wanted to prove. □

Take now B = A/J and consider the probability algebra (B, µ). Then we can compose the allocation of support s from Φ into the power set P(Ω) with the mapping ρ0 from P(Ω) into B to obtain a mapping ρ = ρ0 ◦ s : Φ → B. Now we see that

ρ(e) = ρ0(s(e)) = ρ0(Ω) = ⊤,

ρ(φ ⊗ ψ) = ρ0(s(φ ⊗ ψ)) = ρ0(s(φ) ∩ s(ψ))

= ρ0(s(φ)) ∧ ρ0(s(ψ)) = ρ(φ) ∧ ρ(ψ). (27.13)

A mapping ρ satisfying these properties is called a normalized allocation of probability on the information algebra Φ. In fact, it allocates an element of the probability algebra B to any element of the algebra Φ. In this way, a random mapping always leads to an allocation of probability, once a probability measure on the assumptions is introduced. In particular, we may now define, with ρΓ = ρ0 ◦ sΓ,

spΓ(φ) = µ(ρΓ(φ)). (27.14)

This extends the support function (27.8) to all elements φ of Φ. This extension has an interesting interpretation. In order to see this, we add an important property of probability algebras: clearly µ(∧bi) ≤ inf_i µ(bi) and µ(∨bi) ≥ sup_i µ(bi) hold for any family of elements {bi}. But there are important cases where equality holds. A subset D of B is called downward (upward) directed if, for every pair b′, b″ ∈ D, there is an element b ∈ D such that b ≤ b′ ∧ b″ (b ≥ b′ ∨ b″).

Lemma 27.1 If D is a downward (upward) directed subset of B, then

µ(∧_{i∈D} bi) = inf_{i∈D} µ(bi),   (µ(∨_{i∈D} bi) = sup_{i∈D} µ(bi))    (27.15)

Proof. There is a countable subfamily of elements bi which have the 0 same meet. If ci, i = 1, 2,... are these elements, then define c1 = c1 and 27.2. SUPPORT AND POSSIBILITY 223

0 0 0 select elements ci in the downward directed set such that c2 ≤ c1 ∧ c2, 0 0 0 0 0 c3 ≤ c2 ∧ c3,.... Then c1 ≥ c2 ≥ c3 ≥ ... and this sequence has still the same infimum. However, by the continuity of probability we have now

0 0 µ(∧bi) = µ(∧ci) = lim µ(ci) ≥ inf µ(bi). (27.16) i→∞ i

But this implies µ(∧bi) = infi µ(bi). The case of upwards directed sets is treated in the same way. ut Note now that {[H0]: H0 ⊆ H,H0 ∈ A} is an upward directed family. Therefore, according to Lemma 27.1 we have

spΓ(φ) = µ(ρ(φ)) = µ(∨{[A]: A ∈ A,A ⊆ sΓ(φ)})

= sup{µ([A]) : A ∈ A,A ⊆ sΓ(φ)}

= sup{P (A): A ∈ A,A ⊆ sΓ(φ)}

= P∗(sΓ(φ)), where P∗ is the inner probability associated with P . This shows, that the degree of support of a piece of information φ is the inner probability of the allocated support sΓ(φ). Note that definitions (27.14) and (27.8) coincide, if sΓ(φ) ∈ A. Support functions and inner probability measures are thus closely related. The result above is very appealing: any measurable set A, which is contained in sh(φ) supports φ. So we expect P (A) ≤ spΓ(φ). In the absence of further information, it is reasonable to take spΓ(φ) to be the least upper bound of the probabilities of A supporting φ. A similar consideration can be made with respect to the possibility sets associated with elements of Φ with respect to a random mapping Γ. As in Section 26.3 we define the possibility set of φ as

pΓ(φ) = {ω ∈ Ω : Γ(ω) ⊗ φ 6= z}.

This set contains all assumptions ω which do not lead to a contradiction with φ under the mapping Γ. Thus, the probability of this set, if it is defined, measures the degree of possibility or the degree of plausibility of φ,

plΓ(φ) = P (pΓ(φ)). (27.17)

As in the case of the degree of support, there is no guarantee that pΓ(φ) is measurable. But we can solve this problem in a way similar to the case of the degree of support. Under any measurable event A ⊇ pΓ(φ) the element φ remains possible. Therefore we define for every set H ∈ P

ξ0(H) = ∧{[A]: A ⊇ H,A ∈ H}. (27.18)

c c Note that A ⊇ H if, and only if A ⊆ H . This implies that ξ0(H) = c c (ρ0(H )) . From this in turn we conclude that the following corollary to Theorem 27.1 holds: 224 CHAPTER 27. RANDOM INFORMATION

Corollary 27.1 The application ξ0 : P(Ω) → A/J as defined in (27.18) has the following properties:

ξ0(Ω) = >,

ξ0(∅) = ⊥, ! [ _ ξ0 Hi = ξ0(Hi). (27.19) i∈I i∈I if {Hi, i ∈ I} is a countable family of subsets of Ω.

As before we can now compose pΓ with ξ0 to obtain a mapping ξΓ = ξ0 ◦ pΓ :Φ → B = A/J . We may then define for any φ ∈ Φ a degree of plausibility by

plΓ(φ) = µ(ξΓ(φ)). (27.20) Using Lemma 27.1 we obtain also ∗ plΓ(φ) = inf{P (A): A ∈ A,A ⊇ pΓ(φ)} = P (pΓ(φ)). ∗ Here P is the outer probability of the set pΓ(φ). Thus, if pΓ(φ) is mea- ∗ surable, then P (pΓ(φ)) = P (pΓ(φ)), which shows that (27.20) defines in fact an extension of the plausibility defined by (27.17). We remark that the plausibility takes its full flavour only in Boolean information algebras (see Chapter 29).

27.3 Random Variables

Here we propose an alternative approach to define random variables in an in- formation algebra. We start with the information algebra (Rs,D) of simple random variables with values in a domain free information algebra (Φ,D) and defined on a sample space (Ω, A,P ), and consider the ideal completion of this algebra, rather than of the algebra (RΦ,D) considered in Section 27.1. Let R denote the ideal completion of Rs. Then (R, Rs,D) is a com- pact information algebra (see part I) with simple random variables as finite elements. We call the elements of R generalized random variables. A gener- alized random variable is thus an ideal of simple random variables. For any Γ ∈ R we may within the algebra (R, Rs,D) write Γ = ∨{∆ ∈ Rs : ∆ ≤ Γ}. To any Γ ∈ R we may associate a random mapping Γ : Ω → IΦ from the underlying sample space into the ideal completion of (Φ,D) by defining Γ(ω) = ∨{∆(ω) : ∆ ∈ Γ}.

This random mapping is defined by a sort of pointwise limit within IΦ. We denote the random mapping Γ deliberately with the same symbol as the generalized random variable Γ. The reason is that the two concept can essentially by identified as the following lemmata show: 27.3. RANDOM VARIABLES 225

Lemma 27.2 1. ∀ Γ1, Γ2 ∈ R,

(Γ1 ⊗ Γ2)(ω) = Γ1(ω) ⊗ Γ2(ω).

2. ∀ Γ ∈ R and ∀x ∈ D, (Γ⇒x)(ω) = (Γ(ω))⇒x. Proof. (1) By definition we have

(Γ1 ⊗ Γ2)(ω) = ∨{∆(ω) : ∆ ∈ Γ1 ⊗ Γ2}

= {φ ∈ Φ: ∃ ∆ ∈ Γ1 ⊗ Γ2 such that φ ≤ ∆(ω)}.

Assume then φ ∈ (Γ1 ⊗Γ2)(ω), hence φ ≤ ∆(ω) for some ∆ ∈ Γ1 ⊗Γ2. Then, by the definition of combination of ideals, ∆ ≤ ∆1⊗∆2 for some ∆1 ∈ Γ1 and some ∆2 ∈ Γ2. This implies φ ≤ ∆(ω) ≤ (∆1 ⊗ ∆2)(ω) = ∆1(ω) ⊗ ∆2(ω). Thus φ ∈ Γ1(ω) ⊗ Γ2(ω). Conversely, again by the definition of combination of ideals

Γ1(ω) ⊗ Γ2(ω) = {φ ∈ Φ: φ ≤ ψ1 ⊗ ψ2, ψ1 ∈ Γ1(ω), ψ2 ∈ Γ2(ω)}.

Consider a φ ∈ Γ1(ω) ⊗ Γ2(ω) and assume φ ≤ ψ1 ⊗ ψ2 for some ψ1 ∈ Γ1(ω) and ψ2 ∈ Γ2(ω). Since

Γ1(ω) = ∨{∆(ω) : ∆ ∈ Γ1}. it follows from the compactness that there is a ∆1 ∈ Γ1 such that ψ1 ≤ ∆1(ω) and similarly it follows that there is a ∆2 ∈ Γ2 such that ψ2 ≤ ∆2(ω). Hence φ ≤ ∆1(ω)⊗∆2(ω) = (∆1 ⊗∆2)(ω) and ∆1 ⊗∆2 ∈ Γ1 ⊗Γ2 such that finally φ ∈ (Γ1 ⊗ Γ2)(ω), and therefore φ ∈ (Γ1 ⊗ Γ2)(ω). (2) Assume φ ∈ (Γ⇒x)(ω). Note that (Γ⇒x)(ω) = ∨{∆(ω) : ∆ ∈ Γ⇒x}, and 0 Γ⇒x = ∨{∆ : ∆ ≤ ∆ ⇒x for some ∆0 ∈ Γ}, Therefore, there is a ∆ ∈ Γ⇒x such that φ ≤ ∆(ω). But then there is further a ∆0 ∈ Γ such that ∆ ≤ ∆0⇒x, hence φ ≤ (∆0⇒x)(ω) = (∆0 (ω))⇒x. Now, from ∆0 ∈ Γ we conclude that ∆0(ω) ∈ Γ(ω). This implies then finally φ ∈ (Γ(ω))⇒x. Conversely, let φ ∈ (Γ(ω))⇒x, i.e. φ ≤ ψ⇒x for some ψ ∈ Γ(ω), i.e. ψ ≤ ∨{∆(ω) : ∆ ∈ Γ}. By compactness it follows that ψ ≤ ∆(ω) for some ∆ ∈ Γ. Hence we conclude that φ ≤ (∆(ω))⇒x = (∆⇒x)(ω). This shows that φ ∈ (Γ⇒x)(ω). ut According to this lemma we have a homomorphism between the algebra of generalized random variables and random mappings. In fact it is an embedding, since Γ1(ω) = Γ2(ω) for all ω ∈ Ω implies Γ1 = Γ2. as the lemma can be strengthened in the following way: follows from the next lemma, which strengthens Lemma 27.2. 226 CHAPTER 27. RANDOM INFORMATION

Lemma 27.3 If X ⊆ R is a directed set, then

(∨Γ∈X Γ)(ω) = ∨Γ∈X Γ(ω).

Proof. If Γ ∈ X, then Γ ≤ ∨Γ∈X Γ, hence Γ(ω) ≤ (∨Γ∈X Γ)(ω) and therefore

∨Γ∈X Γ(ω) ≤ (∨Γ∈X Γ)(ω).

Conversely, assume φ ∈ (∨Γ∈X Γ)(ω). Since

(∨Γ∈X Γ)(ω) = ∨{∆(ω) : ∆ ≤ ∨Γ∈X Γ} we have φ ≤ ∆(ω) for some simple random variable ∆ ≤ ∨Γ∈X Γ. Now, since X is a directed set, by compactness (more precisely, Theorem 15 (2) in Part I), there is a Γ ∈ X such that ∆ ≤ Γ, that is ∆(ω) ≤ Γ(ω). It follows then that φ ≤ ∨Γ∈X Γ(ω), which in turn implies

(∨Γ∈X Γ)(ω) ≤ ∨Γ∈X Γ(ω).

This concludes the proof of the lemma. ut As before, in the previous Section 27.1, there is no guarantee that the support sΓ(φ) is measurable for every φ ∈ Φ. Of course we can extend the support function to all of Φ by the inner probability. Here however we propose alternatively to limit the class of random variables to the reasonable class of mappings which are limits in some sense of countable sequences of simple random variables. This can be accomplished in the framework of the algebra of generalized variables. In order to do this we need to introduce a new concept. Let (Φ,D) be a domain-free information algebra and (IΦ,I(Φ),D) its ideal completion. A subset S ⊆ IΦ is called σ-closed, if it is closed under countable combinations ∞ or joins, i.e. if {I1,I2,...} is a countable subset of S, then ∨i=1Ii belongs to S too. The intersection of any family of σ-closed sets is also σ-closed. Further the set IΦ itself is σ-closed. Therefore, for any subset X ⊆ IΦ we may define the σ-closure σ(X) as the intersection of σ-closed sets containing X. We are particularly interested in σ(Φ), the σ-closure of Φ in IΦ. Note that here, as in the sequel, we identify Φ with I(Φ) for simplicity of notation. Also we shall write φ rather than I(φ), even if we operate within IΦ. The σ-closure of Φ can be characterized as follows:

Theorem 27.2 If (Φ,D) is an information algebra, then

∞ σ(Φ) = {φ ∈ IΦ : φ = ∨i=1ψi, ψi ∈ Φ, ∀i = 1, 2,...}. (27.21) 27.3. RANDOM VARIABLES 227

Proof. Clearly the set on the right of equation (27.21) contains Φ and is contained in σ(Φ). We claim that this set is itself σ-closed. In fact, consider a countable set φj of elements of this set, such that

∞ φj = ∨i=1ψj,i with ψj,i ∈ Φ. Define the set I = {(j, i): j = 1, 2 ... ; i = 1, 2 ...} and the sets Ij = {(j, i); i = 1, 2,...} for j = 1, 2,..., and Ki = {(h, j) : 1 ≤ h, j ≤ i} for i = 1, 2,.... Then we have

∞ ∞ I = ∪j=1Ij = ∪i=1Ki.

By the laws of associativity in the complete lattice IΦ we obtain then

∞ ∞ ∨j=1φj = ∨j=1(∨(j,i)∈Ij ψj,i) = ∨(j,i)∈I ψj,i ∞ = ∨i=1(∨(h,j)∈Ki ψh,j). ∞ But ∨(h,j)∈Ki ψh,j ∈ Φ for i = 1, 2,.... Hence ∨j=1φj belongs itself to the set on the right hand side of (27.21). This means that this set is indeed σ-closed. Since the set contains Φ, it contains also σ(Φ), hence it equals σ(Φ). ut Consider now a monotone sequence φ1 ≤ φ2 ≤ ... of elements of Φ. In the following theorem we show that for such sequences join commutes with focussing:

Theorem 27.3 For a monotone sequence φ1 ≤ φ2 ≤ ... of elements of Φ, and for any x ∈ D,

∞ ⇒x ∞ ⇒x (∨i=1φi) = ∨i=1φi . (27.22) Proof. By the density property of a compact information algebra (see Part I), we have that

∞ ⇒x ⇒x ∞ (∨i=1φi) = ∨{ψ : ψ ∈ Φ, ψ ≤ ∨i=1φi}. Certainly, the left hand side of equation (27.22) is an upper bound of the ∞ right hand side. Consider a ψ ∈ Φ such that ψ ≤ ∨i=1φi. Note that the monotone sequence φ1, φ2,... is a directed set in Φ. Therefore, by the compactness property of a compact information algebra (see part I), there ⇒x ⇒x is a φi in the sequence such that ψ ≤ φi. This implies that ψ ≤ φi . From this we conclude that

⇒x ∞ ∞ ⇒x ∨{ψ : ψ ∈ Φ, ψ ≤ ∨i=1φi} ≤ ∨i=1φi . This proves the equality in (27.22). ut Theorem 27.3 shows in particular that σ(Φ) is closed under focussing. In fact, if φi is any sequence of elements of Φ, then we may define ψi = 228 CHAPTER 27. RANDOM INFORMATION

i ∨k=1φk ∈ Φ, such that ψk for k = 1, 2,... is a monotone sequence and ∞ ∞ φ = ∨i=1φi = ∨i=1ψi. So, for φ ∈ σ(Φ) and any x ∈ D by Theorem 27.3

⇒x ∞ ⇒x φ = ∨i=1ψi , (27.23)

⇒x ⇒x ψi ∈ Φ and hence φ ∈ σ(Φ) by Theorem 27.2. As a σ-closed set σ(Φ) is closed under combination. Therefore (σ(Φ),D) is itself an information algebra. Since it is closed under combination (i.e. join) of countable sets and satisfies condition (27.22) we call it the σ-information algebra induced by (Φ,D). We refer however to Example ?? below to clear this concept. We now take the σ-closure of Rs in the compact algebra (R, Rs,D) of generalized random variables. The elements of σ(Rs) are called proper random variables or simply random variables. According to Theorem 27.2 random variables are defined as

∞ Γ = ∨i=1∆i, with ∆i ∈ Rs, ∀I = 1, 2,....

Then (σ(Rs),D) is a σ-information algebra, containing Rs, i.e. the simple random variables. To random variables we can, just as with generalized random variables, associate random mappings, defined by

∞ Γ(ω) = ∨i=1∆i(ω), with ∆i ∈ Rs, ∀I = 1, 2,....

Random variables are thus within the ideal completion IΦ of Φ pointwise limits of sequences of simple random variables. Note that Γ(ω) ∈ σ(Φ) by Theorem 27.2. According to Lemma 27.2 combination and focusing can also be defined pointwise within the algebra (IΦ,D), or more precisely, within the σ-information algebra (σ(Φ),D). For σ-random variables Lemma 27.2 can be extended in the following way:

Lemma 27.4 If Γi ∈ σ(Rs), for i = 1, 2,..., then

∞ ∞ (∨i=1Γi)(ω) = ∨i=1Γi(ω). (27.24)

i Proof. Define Σi = ∨j=1Γi. Then Σi ∈ σ(Rs) is a monotone sequence of ∞ ∞ random variables and ∨i=1Σi = ∨i=1Γi. The same holds for Σi(ω) ∈ σ(Φ) ∞ ∞ and ∨i=1Σi(ω) = ∨i=1Γi(ω). Since a monotone sequence is a directed set, applying Lemma 27.3, we obtain

∞ ∞ ∞ ∞ (∨i=1Γi)(ω) = (∨i=1Σi)(ω) = ∨i=1Σi(ω) = ∨i=1Γi(ω).

Thus (27.24) holds indeed. ut In the next section we examine a variant of the theory above of random variables in algebras which are already themselves closed under countable combination. 27.4. RANDOM VARIABLES IN σ-ALGEBRAS 229

27.4 Random Variables in σ-Algebras

In the previous Section 27.3 we have seen, that the ideal completion of an information algebra contains a σ-algebra. There are some information alge- bras, like for example Borel algebras, which are themselves already closed under countable combination. It should be noted however that (Φ,D) is em- bedded into the ideal completion (IΦ,D) only by a homomorphism φ 7→ I(φ), respecting finite combination only. Thus, if φ1, φ2,... is a countable set of ∞ elements of Φ and φ = ∨i=1φ, then I(φ), the principal ideal of φ in Φ is not ∞ equal to ∨i=1I(φ). This is so, because the principal ideal I(φ) is a σ-ideal, i.e. an ideal closed under countable combination of its elements. The ideal completion on the other hand contains ordinary ideals, closed only under finite combinations in general. Therefore, by the definition of combination of ideals in IΦ, \ I({φ1, φ2,...}) = {I : I ideal, {φ1, φ2,...} ⊆ I}.

Certainly {φ1, φ2,...} ⊆ I(φ)}, but there may be ordinary ideals I such that {φ1, φ2,...} ⊂ I ⊂ I(φ)}, hence we can only conclude that I({φ1, φ2,...}) ⊆ I(φ). So, even if Φ is closed under countable combinations, it is not true that σ(Φ) = Φ. Nevertheless, a theory of random variables may be defined in such algebras along similar lines as in Section 27.3 above. A domain-free information algebra (Φ,D) is called a σ-information alge- bra, if 1. Countable Combination: Φ is closed under countable combination.

2. Continuity of Focussing: For every montone sequence φ1 ≤ φ2 ≤ ... ∈ Φ, and for any x ∈ D, it holds that ∞ ∞ O ⇒x O ⇒x ( φi) = φi . i=1 i=1

In this section, information algebras (Φ,D) are always assumed to be σ- algebras. Certainly, the countable combination equals also the countable supremum, ∞ ∞ O _ φi = φi. i=1 i=1 Simple random variables ∆ on a probability space (Ω, A,P ) with values in a domain-free information algebra (Φ,D) are defined as in Section 26.1 and they form still an information algebra (Rs,D). Since (Φ.D) is a σ- algebra, we may define a random mapping Γ : Ω → Φ by ∞ _ Γ(ω) = ∆i(ω). i=1 230 CHAPTER 27. RANDOM INFORMATION

We call such random mappings Γ random variables in the σ-algebra (Φ,D). Let now Rσ be the family of random variables in the σ-algebra (Φ,D). We are going to define the operations of combination and focusing in Rσ as usual by pointwise combination and focussing:

1. Combination: (Γ1 ⊗ Γ2)(ω) = Γ1(ω) ⊗ Γ2(ω),

2. Focussing: (Γ⇒x)(ω) = (Γ(ω))⇒x.

We have to verify that the resulting random mappings still belong to Rσ. So, let

∞ ∞ _ _ Γ1 = ∆1,i, Γ2 = ∆2,i. i=1 i=1 Then we obtain, using associativity of the supremum

∞ ∞ _ _ (Γ1 ⊗ Γ2)(ω) = Γ1(ω) ⊗ Γ2)(ω) = ( ∆1,i) ∨ ( ∆2,i) i=1 i=1 ∞ ∞ _ _ = (∆1,i(ω) ∨ ∆2,i(ω)) = (∆1,i ∨ ∆2,i)(ω). i=1 i=1

Since ∆1,i ∨ ∆2,i ∈ Rs this proves that Γ1 ⊗ Γ2 ∈ Rσ. Similarly, let

∞ _ Γ(ω) = ∆i(ω). i=1 Define then

n _ Γn(ω) = ∆i(ω), i=1 such that Γn(ω) is an increassing sequence in Φ. Then, by the continuity of focussing,

∞ ⇒x ⇒x _ ⇒x (Γ )(ω) = (Γ(ω)) = ( Γi) i=1 ∞ ∞ _ ⇒x _ ⇒x = (Γi(ω)) = (Γi(ω)) . i=1 i=1

Again, ∆i ∈ Rs implies Γi ∈ Rs for all i = 1, 2,.... Thus this shows that ⇒x Γ ∈ Rσ. We expect (Rσ,D) to form an information algebra, even a σ-algebra. This is indeed essentially true. 27.4. RANDOM VARIABLES IN σ-ALGEBRAS 231

Theorem 27.4 The system (Rσ,D) of random variables, with combina- tion and focussing defined as above, satisfies all axioms of a domain-free σ-information algebra, except the support axiom.

Proof. We show that (Rσ) is σ-closed: Let

∞ _ Γj(ω) = ∆j,i(ω), for j = 1, 2,..., i=1 and define the random mapping Γ by

∞ ∞ ∞ ! _ _ _ Γ(ω) = Γj(ω) = ∆j,i . j=1 j=1 i=1

Consider the sets Ij = {(j, i): i = 1, 2,...} and

∞ [ I = {(j, i): j, i = 1, 2,...} = Ij. j=1

Then we have ∞ _ Γ(ω) = ∆j,i. (j,i)∈I

Let further Ki = {(h, j): h, j ≤ i} and define

i i _ _ Γi(ω) = ∆h,j(ω). h=1 j=1

Note that Γi are simple random variables. Then we have

∞ [ I = Ki i=1 and therefore, by the general law of associativity of supremum,

∞   ∞ _ _ _ Γi(ω) =  ∆h,j(ω) = Γi(ω). i=1 (h,j)∈Ki i=1

Thus the random mapping Γ is indeed a random variable and Rσ is closed under countable combination. The axioms of the information algebra (Rσ,D) of are as usual all in- herited from the underlying information algebra (Φ,D), except the support axiom. 232 CHAPTER 27. RANDOM INFORMATION

It remains to verify the continuity of focusing. Note that in the infor- mation algebra (Rσ,D) we have Γ1 ≤ Γ2 if, and only if, Γ1(ω) ≤ Γ2(ω). So, assume Γ1 ≤ Γ2 ≤ ... be a monotone sequence of random variables in Rσ and x ∈ D. Then continuity of focusing in (Rσ,D) follows from this property in (Φ,D) as follows: ∞ ∞ ∞ _ ⇒x _ ⇒x _ ⇒x (( Γi) )(ω) = ( Γi(ω)) = (Γi(ω)) i=1 i=1 i=1 ∞ _ ⇒x = ( Γi )(ω). i=1 This ends the proof. ut The support axiom is not guaranteed, because the sequence of simple random variables defining the random variable may have supports without an upper bound in D. Certainly, (Rs,D) is a subalgebra of (Rσ,D). Within the algebra (Rσ,D), each element of Rσ is the supremum of the simple random variables, which it dominates.

Lemma 27.5 Let Γ ∈ Rσ, defined by ∞ _ Γ(ω) = ∆i(ω). i=1

Then, in the information algebra (Rσ,D) ∞ _ _ Γ = ∆i = {∆ : ∆ ∈ Rs, ∆ ≤ Γ}. (27.25) i=1 Proof. The first equality in (27.25) follows directly from the definition of Γ. Trivially, Γ is an upper bound of the set {∆ : ∆ ≤ Γ}. If Γ0 is another upper bound of this set, then it is also an upper bound of the ∆i, hence Γ ≤ Γ0. Therefore. Γ is the least upper bound of the set {∆ : ∆ ≤ Γ}. ut

Example 27.1 Random Variables in Compact-Algebras. The theory of ran- dom variables developed above may be presented in a similar way in the framework of compact information algebras. Here is a sketch of the ap- proach: Let (Φ, Φf ,D) be a compact information algebra. Define simple random variables ∆ with finite elements from Φf as values. They form an information algebra (Rs; D) with combination and focusing defined point- wise. Order between simple random variables is also defined pointwise, i.e. ∆1 ≤ ∆2 holds if, and only if ∆1(ω) ≤ ∆2(ω) for all sample points ω of the underlying probability space. If X is a directed set of simple random variables, then the random map- ping Γ = ∨X :Ω → Φ is also defined pointwise:

Γ(ω) = ∨∆∈X ∆(ω). 27.4. RANDOM VARIABLES IN σ-ALGEBRAS 233

In particular, if ∆1, ∆2,... is a countable family of simple random variables, define

i Γi = ∨j=1∆j.

∞ Then the mappings Γ = ∨i=1∆i define a σ-information algebra of random mappings. This method can, in an obvious way, also be applied for labeled compact information algebras. 234 CHAPTER 27. RANDOM INFORMATION Chapter 28

Allocation of Probability

28.1 Algebra of Allocation of Probabilities

In Section 27.1 we have introduced allocations of probability (a.o.p.) as a means to extend the support function of a random mapping beyond the measurable elements φ, i.e. the elements for which sΓ(φ) ∈ A. These allocations of probability play an important role in the theory of random information. Therefore, we start here with a study of this concept, first independent of its relation to random mappings and random variables. Random variables or hints and probabilistic argumentation systems pro- vide means to model explicitly the mechanisms which generate uncertain information. Alternatively, allocations of probability may serve to directly assign beliefs to pieces of information. This is more in the spirit of a subjec- tive description of belief, advocated especially by G. Shafer (Shafer, 1973; Shafer, 1976). In this view, allocations of probability are taken as the prim- itive elements, rather than random variables or hints. This is the point of view developed in this section (see also (Kohlas, 1995; Kohlas, 1997)). Consider a domain-free information algebra (Φ,D) and a probability algebra (B, µ). Allocations of probability are mappingρ :Φ → B such that (A1) ρ(e) = >,

(A2) ρ(φ ⊗ ψ) = ρ(φ) ∧ ρ(ψ). If furthermore ρ(z) = ⊥ holds, then the allocation is called normalized. Note, that if φ ≤ ψ, that is, φ ⊗ ψ = ψ, then ρ(φ ⊗ ψ) = ρ(φ) ∧ ρ(ψ) = ρ(ψ), hence ρ(ψ) ≤ ρ(φ). A particular a.o.p. is defined by ν(φ) = ⊥, unless φ = e, in which case ν(e) = >. This is called the vacuous allocation. It is associated with the vacuous information associated wit the random variable Γ(ω) = e for all ω ∈ Ω. We may think of an allocation of probability as the description of a body of belief relative to pieces information in an information algebra (Φ,D) ob- tained from a source of information. Two (or more) distinct sources of

235 236 CHAPTER 28. ALLOCATION OF PROBABILITY information will lead to the definition of two (or more) corresponding allo- cations of probability. Thus, in a general setting let AΦ be the set of all allocations of probability on Φ in (B, µ). Select two allocations ρi, i = 1, 2, from AΦ. How can they be combined in order to synthesize the two bodies of information they represent into a single body? The basic idea is as follows: consider a piece of information φ in Φ. If now φ1 and φ2 are two other pieces of information in Φ, such that φ ≤ φ1 ⊗ φ2, then any belief allocated to φ1 and to φ2 by the two allocations ρ1 and ρ2 respectively, is a belief allocated to φ by the two allocations simultaneously. That is, the total belief ρ(φ) to be allocated to φ by the two allocations ρ1 and ρ2 together must be at least the combined, common support allocated to φ1 and φ2 by each of the two allocations respectively,

ρ(φ) ≥ ρ1(φ1) ∧ ρ2(φ2). (28.1)

In the absence of other information, it seems then reasonable to define the combined belief in φ, as obtained from the two sources of information, as the least upper bound of all these implied beliefs,

ρ(φ) = ∨{ρ1(φ1) ∧ ρ2(φ2): φ ≤ φ1 ⊗ φ2}. (28.2)

This defines indeed a new allocation of probability.

Theorem 28.1 ρ as defined by (28.2) is an allocation of probability.

Proof. First, we have

ρ(e) = ∨{ρ1(φ1) ∧ ρ2(φ2): e ≤ φ1 ⊗ φ2}

= ρ1(e) ∧ ρ2(e) = >.

So (A1) is satisfied. Next, let ψ1, ψ2 ∈ Φ. By definition we have

ρ(ψ1 ⊗ ψ2) = ∨{ρ1(φ1) ∧ ρ2(φ2): ψ1 ⊗ ψ2 ≤ φ1 ⊗ φ2}.

Now, ψ1 ≤ ψ1 ⊗ ψ2 implies that

∨{ρ1(φ1) ∧ ρ2(φ2): ψ1 ⊗ ψ2 ≤ φ1 ⊗ φ2}

≤ ∨{ρ1(φ1) ∧ ρ2(φ2): ψ1 ≤ φ1 ⊗ φ2} and similarly for ψ2. Thus, we have ρ(ψ1 ⊗ ψ2) ≤ ρ(ψ1), ρ(ψ2), that is ρ(ψ1 ⊗ ψ2) ≤ ρ(ψ1) ∧ ρ(ψ2). On the other hand,

{(φ1, φ2): ψ1 ⊗ ψ2 ≤ φ1 ⊗ φ2} 0 00 0 00 0 0 00 00 ⊇ {(φ1, φ2): φ1 = φ1 ⊗ φ1, φ2 = φ2 ⊗ φ2, ψ1 ≤ φ1 ⊗ φ2, ψ2 ≤ φ1 ⊗ φ2}. 28.1. ALGEBRA OF ALLOCATION OF PROBABILITIES 237

By the distributive law for complete Boolean algebras we obtain then

ρ(ψ1 ⊗ ψ2) 0 00 0 00 0 0 00 00 ≥ ∨{ρ1(φ1 ⊗ φ1) ∧ ρ2(φ2 ⊗ φ2): ψ1 ≤ φ1 ⊗ φ2, ψ2 ≤ φ1 ⊗ φ2} 0 00 0 00 0 0 00 00 = ∨{ρ1(φ1) ∧ ρ1(φ1) ∧ ρ2(φ2) ∧ ρ2(φ2): ψ1 ≤ φ1 ⊗ φ2, ψ2 ≤ φ1 ⊗ φ2} 0 0 0 0  = ∨{ρ1(φ1) ∧ ρ2(φ2): ψ1 ≤ φ1 ⊗ φ2} ∧ 00 00 00 00  ∨{ρ1(φ1) ∧ ρ2(φ2): ψ2 ≤ φ1 ⊗ φ2} = ρ(ψ1) ∧ ρ(ψ2). (28.3)

This implies finally that ρ(ψ1 ⊗ ψ2) = ρ(ψ1) ∧ ρ(ψ2). Thus (A2) holds too and ρ is indeed an allocation of probability. ut In this way, in the set of allocations of probability AΦ a binary com- bination operation is defined. We denote this operation by ⊗. Thus, ρ as defined by (28.2) is written as ρ = ρ1 ⊗ ρ2. The following theorem gives us the elementary properties of this operation.

Theorem 28.2 The combination operation, as defined by (28.2), is com- mutative, associative, idempotent and the vacuous allocation is the neutral element of this operation.

Proof. The commutativity of (28.2) is evident. For the associativity note that for a φ ∈ Φ we have, due to the associativity and distributivity of complete Boolean algebras,

((ρ1 ⊗ ρ2) ⊗ ρ3)(φ)

= ∨{ρ1 ⊗ ρ2(φ12) ∧ ρ3(φ3): φ ≤ φ12 ⊗ φ3}

= ∨{∨{ρ1(φ1) ∧ ρ2(φ2): φ12 ≤ φ1 ⊗ φ2} ∧ ρ3(φ3): φ ≤ φ12 ⊗ φ3}

= ∨{ρ1(φ1) ∧ ρ2(φ2) ∧ ρ3(φ3): φ ≤ φ1 ⊗ φ2 ⊗ φ3}.

For (ρ1 ⊗ (ρ2 ⊗ ρ3))(φ) we obtain exactly the same result in the same way. This proves associativity. To show idempotency consider

ρ ⊗ ρ(φ) = ∨{ρ(φ1) ∧ ρ(φ2): φ ≤ φ1 ⊗ φ2}

= ∨{ρ(φ1 ⊗ φ2): φ ≤ φ1 ⊗ φ2} = ρ(φ) since the last supremum is attained for φ1 = φ2 = φ. Finally let ν denote the vacuous allocation. Then, for any allocation ρ and any φ ∈ Φ we have, noting that ν(ψ) = ⊥, unless ψ = e, in which case ν(e) = >,

ρ ⊗ ν(φ) = ∨{ρ(φ1) ∧ ν(φ2): φ ≤ φ1 ⊗ φ2} = ρ(φ). This shows that ν is the neutral element for combination. ut This theorem shows that AΦ is a commutative and idempotent semi- group, that is, a semilattice. Indeed, a partial order between allocations can 238 CHAPTER 28. ALLOCATION OF PROBABILITY be introduced as usual by defining ρ1 ≤ ρ2 if ρ1 ⊗ ρ2 = ρ2. This means that for all φ ∈ Φ,

ρ1 ⊗ ρ2(φ) = ∨{ρ1(φ1) ∧ ρ2(φ2): φ ≤ φ1 ⊗ φ2} = ρ2(φ).

We have therefore always ρ1(φ1) ∧ ρ2(φ2) ≤ ρ2(φ) if φ ≤ φ1 ⊗ φ2. Take now φ1 = φ and φ2 = e, such that φ ≤ φ ⊗ e = φ, to obtain ρ1(φ) ∧ ρ2(e) = ρ1(φ) ≤ ρ2(φ). Thus we have ρ1 ≤ ρ2 if, and only if, ρ1(φ) ≤ ρ2(φ) for all φ ∈ Φ. As an example consider allocations ρ of the form

 > if ψ ≤ φ, ρ (ψ) = (28.4) φ ⊥ otherwise, for some φ ∈ Φ. Such allocations are called deterministic. They express the fact that full belief is allocated to the information φ. Now, for φ1, φ2 ∈ Φ we have

ρφ1 ⊗ ρφ2 (ψ) = ∨{ρφ1 (ψ1) ∧ ρφ2 (ψ2): ψ ≤ ψ1 ⊗ ψ2}  > if ψ ≤ φ ⊗ φ ,  = 1 2 = ρ (ψ). (28.5) ⊥ otherwise φ1⊗φ2

So, the combination of allocations produces the expected result. Note also that ρe = ν. This shows that the mapping φ 7→ ρφ is a ⊗-homomorphism. Next we turn to the operation of focusing of an allocation of probability. Let ρ be an allocation of probability on an information algebra (Φ,D) and x a domain in D. Just as it is possible to focus an information φ from Φ to the domain x, it should also be possible to focus the belief represented by the allocation ρ to the information supported by the domain x. This means to extract the information related to x from ρ. Thus, for a φ ∈ Φ consider the beliefs allocated to pieces of information ψ which are supported by x and which entail φ, i.e. φ ≤ ψ = ψ⇒x. The part of the belief allocated to φ and relating to the domain x, ρ⇒x(φ) must then be at least ρ(ψ),

ρ⇒x(φ) ≥ ρ(ψ) for any ψ = ψ⇒x ≥ φ. (28.6)

In the absence of other information, it seems again reasonable to define ρ⇒x(φ) to be the least upper bound of all these implied supports,

ρ⇒x(φ) = ∨{ρ(ψ): ψ = ψ⇒x ≥ φ}. (28.7)

This defines indeed an allocation of probability.

Theorem 28.3 ρ⇒x(φ) as defined by (28.7) is an allocation of probability. 28.1. ALGEBRA OF ALLOCATION OF PROBABILITIES 239

Proof. We have by definition

ρ⇒x(e) = ∨{ρ(ψ): ψ = ψ⇒x ≥ e} = ρ(e) = >, since e = e⇒x. Thus (A1) is verified. Again by definition,

⇒x ⇒x ρ (φ1 ⊗ φ2) = ∨{ρ(ψ): ψ = ψ ≥ φ1 ⊗ φ2}.

⇒x ⇒x ⇒x From φ1, φ2 ≤ φ1 ⊗ φ2 it follows that ρ (φ1 ⊗ φ2) ≤ ρ (φ1), ρ (φ2) and ⇒x ⇒x ⇒x thus ρ (φ1 ⊗ φ2) ≤ ρ (φ1) ∧ ρ (φ2). On the other hand,

⇒x {ψ : ψ = ψ ≥ φ1 ⊗ φ2} ⇒x ⇒x ⊇ {ψ = ψ1 ⊗ ψ2 : ψ1 = ψ1 ≥ φ1, ψ2 = ψ2 ≥ φ2}.

From this we obtain, using the distributive law for complete Boolean alge- bras,

⇒x ρ (φ1 ⊗ φ2) ⇒x ⇒x ≥ ∨{ρ(ψ1 ⊗ ψ2): ψ1 = ψ1 ≥ φ1, ψ2 = ψ2 ≥ φ2} ⇒x ⇒x = ∨{ρ(ψ1) ∧ ρ(ψ2): ψ1 = ψ1 ≥ φ1, ψ2 = ψ2 ≥ φ2} ⇒x ⇒x = (∨{ρ(ψ1): ψ1 = ψ1 ≥ φ1}) ∧ (∨{ρ(ψ2): ψ2 = ψ2 ≥ φ2}) = ρ(ψ1) ∧ ρ(ψ2).

This proves property (A2) for an allocation of support. ut The following remarks are easily verified:

ρ⇒x(φ) ≤ ρ(φ) for all φ ∈ Φ, φ = φ⇒x implies ρ⇒x(φ) = ρ(φ). (28.8)

Thus we see that ρ⇒x ≤ ρ. Furthermore, if ρ⇒x = ρ, then the allocation of probability ρ is said to be supported by the domain x. The following theorem shows that allocations of probability, furnished with the operations of combination and focusing as defined so far, satisfy the axioms of domain- free information algebras.

Theorem 28.4 1. Transitivity. For ρ ∈ AΦ and x, y ∈ D,

(ρ⇒x)⇒y = ρ⇒x∧y. (28.9)

2. Combination. For ρ1, ρ2 ∈ AΦ and x ∈ D

⇒x ⇒x ⇒x ⇒x (ρ1 ⊗ ρ2) = ρ1 ⊗ ρ2 . (28.10) 240 CHAPTER 28. ALLOCATION OF PROBABILITY

3. Idempotency. For ρ ∈ AΦ and x ∈ D ρ ⊗ ρ⇒x = ρ. (28.11)

Proof. (1) By definition and the associative law for complete Boolean algebras

(ρ⇒x)⇒y(φ) = ∨{ρ⇒x(ψ): ψ = ψ⇒y ≥ φ} = ∨{∨{ρ(ψ0): ψ0 = ψ0⇒x ≥ ψ} : ψ = ψ⇒y ≥ φ} = ∨{ρ(ψ0): ψ0 = ψ0⇒x ≥ ψ = ψ⇒y ≥ φ}.

Also by definition, we have

ρ⇒x∧y(φ) = ∨{ρ(ψ): ψ = ψ⇒x∧y ≥ φ} = ∨{ρ(ψ): ψ = (ψ⇒x)⇒y ≥ φ}.

If ψ is supported by x ∧ y, then it is supported both by x and by y. Take here now ψ0 = ψ. Then it follows that ψ0 = ψ0⇒x ≥ ψ = ψ⇒y ≥ φ. This implies that

(ρ⇒x)⇒y(φ) ≥ ρ⇒x∧y(φ).

On the other hand, ψ0 = ψ0⇒x ≥ ψ = ψ⇒y ≥ φ implies that ψ0⇒y = (ψ0⇒x)⇒y ≥ ψ⇒y ≥ φ. Furthermore, ψ0⇒y ≤ ψ0 implies ρ(ψ0) ≤ ρ(ψ0⇒y). Thus we obtain

(ρ⇒x)⇒y(φ) ≤ ∨{ρ(ψ0⇒y): ψ0⇒y = (ψ0⇒y)⇒x ≥ φ}.

Now put in this formula ψ0⇒y = ψ, such that ψ⇒y = ψ. Then we obtain

(ρ⇒x)⇒y(φ) ≤ ∨{ρ(ψ): ψ = ψ⇒y = (ψ⇒y)⇒x ≥ φ} = ∨{ρ(ψ): ψ = ψ⇒y = ψ⇒x∧y ≥ φ} = ρ⇒x∧y(φ).

This proves transitivity of focusing of allocations of probability. (2) Again, by definition and the distributive and associative laws of com- plete Boolean algebras, for any φ ∈ Φ

⇒x ⇒x ⇒x ⇒x ρ1 ⊗ ρ2 (φ) = ∨{ρ1 (φ1) ∧ ρ2 (φ2): φ ≤ φ1 ⊗ φ2} ⇒x = ∨{(∨{ρ1(ψ1): ψ1 = ψ1 ≥ φ1}) ⇒x ∧ (∨{ρ2(ψ2): ψ2 = ψ2 ≥ φ2}): φ ≤ φ1 ⊗ φ2} ⇒x ⇒x = ∨{ρ1(ψ1) ∧ ρ2(ψ2): ψ1 = ψ1 ≥ φ1, ψ2 = ψ2 ≥ φ2, φ ≤ φ1 ⊗ φ2} ⇒x ⇒x = ∨{ρ1(ψ1) ∧ ρ2(ψ2): ψ1 = ψ1 , ψ2 = ψ2 , φ ≤ ψ1 ⊗ ψ2}. Also by definition,

⇒x ⇒x ρ1 ⊗ ρ2(φ) = ∨{ρ1 (φ1) ∧ ρ2(φ2): φ ≤ φ1 ⊗ φ2} 28.1. ALGEBRA OF ALLOCATION OF PROBABILITIES 241 and therefore, again by definition, and the associative and distributive laws of complete Boolean algebras ⇒x ⇒x (ρ1 ⊗ ρ2) (φ) ⇒x ⇒x = ∨{(∨{ρ1 (φ1) ∧ ρ2(φ2): ψ ≤ φ1 ⊗ φ2}): φ ≤ ψ = ψ } ⇒x ⇒x = ∨{ρ1 (φ1) ∧ ρ2(φ2): φ ≤ ψ = ψ ≤ φ1 ⊗ φ2} ⇒x ⇒x = ∨{(∨{ρ1(ψ1): φ1 ≤ ψ1 = ψ1 }) ∧ ρ2(φ2): φ ≤ ψ = ψ ≤ φ1 ⊗ φ2} ⇒x ⇒x = ∨{ρ1(ψ1) ∧ ρ2(φ2): φ1 ≤ ψ1 = ψ1 , φ ≤ ψ = ψ ≤ φ1 ⊗ φ2} ⇒x ⇒x = ∨{ρ1(ψ1) ∧ ρ2(ψ2): ψ1 = ψ1 , φ ≤ ψ = ψ ≤ ψ1 ⊗ ψ2}. ⇒x ⇒x Now, whenever ψ1 = ψ1 , ψ2 = ψ2 , φ ≤ ψ1 ⊗ ψ2, take ψ = ψ1 ⊗ ψ2. Then ⇒x ⇒x ⇒x ψ = ψ and φ ≤ ψ = ψ ≤ ψ1 ⊗ ψ2, ψ1 = ψ1 . This shows that ⇒x ⇒x ⇒x ⇒x ρ1 ⊗ ρ2 (φ) ≤ (ρ1 ⊗ ρ2) (φ). ⇒x ⇒x On the other hand, whenever ψ1 = ψ1 , φ ≤ ψ = ψ ≤ ψ1 ⊗ ψ2, then ⇒x ⇒x ⇒x ⇒x ⇒x φ ≤ ψ = ψ ≤ (ψ1⊗ψ2) = ψ1⊗ψ2 and ψ2 ≥ ψ2 . Further, ψ2 ≥ ψ2 ⇒x ⇒x implies that ρ2(ψ2) ≤ ρ2(ψ2 ). Therefore (renaming ψ2 by ψ2), we obtain ⇒x ⇒x (ρ1 ⊗ ρ2) (φ) ⇒x ⇒x ⇒x ≤ ∨{ρ1(ψ1) ∧ ρ2(ψ2): ψ1 = ψ1 , ψ2 = ψ2 , φ ≤ ψ = ψ ≤ ψ1 ⊗ ψ2} ⇒x ⇒x = ∨{ρ1(ψ1) ∧ ρ2(ψ2): ψ1 = ψ1 , ψ2 = ψ2 , φ ≤ ψ1 ⊗ ψ2}, ⇒x ⇒x since we may always take ψ = ψ1 ⊗ ψ2 such that ψ = (ψ1 ⊗ ψ2) = ψ1 ⊗ ψ2. This shows that ⇒x ⇒x ⇒x ⇒x ρ1 ⊗ ρ2 (φ) ≥ (ρ1 ⊗ ρ2) (φ). And this proves (2). (3) We have ρ⇒x ≤ ρ, hence ρ ⊗ ρ⇒x = ρ by the idempotency of combi- nation. This proves the idempotency. ut As an example consider for a φ ∈ Φ the deterministic allocation, ⇒x 0 0 0⇒x ρφ (ψ) = ∨{ρφ(ψ ): ψ = ψ ≥ ψ}. This equals >, if there is a ψ0 = ψ0⇒x > ψ such that ψ0 ≤ φ, and ⊥ otherwise. But, we have ψ0 = ψ0⇒x ≤ φ if, and only if, ψ0 = ψ0⇒x ≤ φ⇒x. ⇒x ⇒x This shows that ρφ (ψ) = ρφ⇒x (ψ) or ρφ = ρφ⇒x . The last theorem, together with Theorem 28.2 shows that (AΦ,D) is an information algebra, except that the existence of a support is not necessarily guaranteed. The existence of a support is guaranteed, if D has a top element. So, the algebraic structure of (Φ,D) carries over to AΦ. In fact, this algebra can be seen as an extension of Φ since the elements of Φ can be regarded as deterministic allocations. The mapping φ 7→ ρφ is an embedding of (Φ,D) in (AΦ,D). The allocation ζ(φ) = ⊥ for all φ is a null element in (AΦ,D), since ρ ⊗ ζ = ζ for all allocations ρ. Also clearly ζ⇒x = ζ. So the algebra satisfies the nullity axiom. 242 CHAPTER 28. ALLOCATION OF PROBABILITY

28.2 Random Variables and Allocations of Proba- bility

In Section 27.2 we have seen, that we may extend the degree of support induced by a random mapping by means of the mapping ρ to all of Φ or even its ideal completion IΦ. Now, random variables can identified with random mappings as we have seen in Section 27.3. Thus, this result applies to random variables as well. For the latter we can however obtain sharper results on this extension beyond. We start with simple random variables. For any simple random variable ∆ ∈ Rs, defined on a probability space (Ω, A,P ), we have seen that all elements of Φ and even of IΦ have measurable allocations of support s∆(φ) ∈ A and their degree of support is well defined. If we pass from the probability space (Ω, A,P ) to its associated probability algebra (B, µ) (see Section 27.1), then we can define the allocation of probability (a.o.p.) associated with the random variable ∆,

ρ∆(φ) = [s∆(φ)] for all elements φ ∈ Φ and even for all elements in IΦ. Thus, we obtain for the degree of support induced by the random variable ∆,

sp∆(φ) = P (s∆(φ)) = µ(ρ∆(φ)).

Again this holds on all of Φ and even its ideal completion IΦ. The map- ping ρ∆ :Φ → B satisfies the properties (A1) and A(2) of an allocation of probability in the previous Section 28.1. A simple random variable ∆ is defined by a partition {B1,...,Bn} and a mapping defined by ∆(ω) = φi for all ω ∈ Bi, and i = 1, . . . , n. We write ∆(ω) = ∆(Bi), if ω ∈ Bi. To the partition {B1,...,Bn} of Ω corresponds a partition {[B1],..., [Bn]} of the probability algebra B. The simple random variable ∆ can also be defined by a mapping ∆([Bi]) = φi from the partition of B into Φ. Its allocation of probability can then also be determined as

ρ∆(φ) = ∨{[Bi]: φ ≤ ∆([Bi])}.

We note that ρ∆ = ρ∆→ . As far as allocation of probability (and support) is concerned we may as well restrict ourselves to considering canonical simple random variables and their information algebra (Rs,c,D) (see Section 26.1). So, in the sequel of this section all simple random variables are assumed to be canonical. We now consider the mapping ρ : ∆ 7→ ρ∆ which maps simple random variables into a.o.p.s We show that this mapping is a homomorphism: 28.2. RANDOM VARIABLES AND ALLOCATIONS OF PROBABILITY243

Theorem 28.5 Let ∆1, ∆2, ∆ ∈ Rs,c be simple random variables, defined on partitions in a probability algebra (B, µ) with values in a domain-free information algebra (Φ,D). Then, for all φ ∈ Φ and x ∈ D,

ρ∆1⊗∆2 (φ) = (ρ∆1 ⊗ ρ∆2 )(φ) (28.12)

⇒x ρ∆⇒x (φ) = (ρ∆ )(φ). (28.13)

Proof. (1) Assume that ∆1 is defined on the partition {B1,1,...,B1,n} and ∆2 on the partition {B2,1,...,B2,m} of B. From the definition of an allo- cation of probability, and the distributive and associative laws for complete Boolean algebra, we obtain

∨{ρ∆1 (φ1) ∧ ρ∆2 (φ2): φ ≤ φ1 ⊗ φ2} = ∨{(∨{B1,i : φ1 ≤ ∆1(B1,i}))

∧ (∨{B2,j : φ2 ≤ ∆2(B2,j})) : φ ≤ φ1 ⊗ φ2}

= ∨{∨{B1,i ∧ B2,j 6= ⊥ : φ1 ≤ ∆1(B1,i), φ2 ≤ ∆2(B2,j)} : φ ≤ φ1 ⊗ φ2}

= ∨{B1,i ∧ B2,j 6= ⊥ : φ1 ≤ ∆1(B1,i), φ2 ≤ ∆2(B2,j), φ ≤ φ1 ⊗ φ2}.

But φ ≤ φ1 ⊗ φ2, φ1 ≤ ∆1(B1,i) and φ2 ≤ ∆2(B2,j) if, and only if, φ ≤ ∆1(B1,i) ⊗ ∆2(B2,j). So we conclude that

∨{ρ∆1 (φ1) ∧ ρ∆2 (φ2): φ ≤ φ1 ⊗ φ2} = ∨{B1,i ∧ B2,j 6= ⊥ : φ ≤ ∆1(B1,i) ⊗ ∆2(B2,j)}

= ∨{B1,i ∧ B2,j 6= ⊥ : φ ≤ (∆1 ⊗ ∆2)(B1,i ∧ B2,j)} (28.14)

= ρ∆1⊗∆2 (φ).

(2) Assume that ∆ is defined on the partition B1,...,Bn of B. Then ⇒x ∆ is also defined on B1,...,Bn. The associative law of complete Boolean algebra gives us then,

⇒x ⇒x ∨{ρ∆(ψ): φ ≤ ψ = ψ } = ∨ {∨{Bi : ψ ≤ ∆(Bi)} : φ ≤ ψ = ψ } ⇒x = ∨{Bi : φ ≤ ψ = ψ ≤ ∆(Bi)}.

⇒x ⇒x But, φ ≤ ψ = ψ ≤ ∆(Bi) holds if, and only if, φ ≤ (∆(Bi)) = ⇒x ∆ (Bi). Hence we see that

⇒x ⇒x ∨{ρ∆(ψ): φ ≤ ψ = ψ } = ∨{Bi : φ ≤ ∆ (Bi)} = ρ∆⇒x (φ).

ut As far as allocations of probability, which are induced by simple random variables, are concerned, this theorem shows that the combination and fo- cusing of allocations reflects correctly the corresponding operations on the underlying random variables. Let Υs be the image of Rs,c under the map- ping ρ. That is Υs is the set of all allocations of probability which are 244 CHAPTER 28. ALLOCATION OF PROBABILITY

induced by simple random variables in (B, µ). The mapping h 7→ ρh is a homomorphism, since

ρ∆1⊗∆2 = ρ∆1 ⊗ ρ∆2 , ⇒x ρ∆⇒x = ρ∆ . On the right hand side we understand the operations of combination and focusing as those defined in the algebra of a.o.p.s. Also the vacuous random variable maps to the vacuous allocation. Thus we conclude that Υs is a subalgebra of the information algebra AΦ. We remark that, if we restrict the mapping ρ to canonical random variables, then the mapping becomes an isomorphism. In any case the mapping is order-preserving. This implies that ideals Γ ∈ R, that is, generalized random variables, are mapped into ideals {ρ∆ : ∆ ∈ Γ}. Thus, we can extend the mapping from Rs to Υs to a mapping from the ideal completion R of Rs to the ideal completion Υ of Υs,

ρΓ = ∨{ρ∆ : ∆ ≤ Γ} (28.15)

Note that, for an a.o.p., if ∆ ≤ Γ, we should expect that ρ∆(φ) ≤ ρΓ(φ) in the domain of these a.o.p.s. Therefore, it makes sense to associate with ρΓ the mapping

ρΓ(φ) = ∨{ρ∆(φ) : ∆ ≤ Γ} (28.16) from IΦ into the Boolean algebra B. Does this mapping define an a.o.p.? Yes, it does indeed.

Theorem 28.6 The mapping ρΓ defined in (28.16) is an allocation of prob- ability, defined of IΦ.

Proof. (1) We have

ρΓ(e) = ∨{ρ∆(e) : ∆ ≤ Γ}.

Since ρ∆ is an a.o.p., it holds that ρ∆(e) = >. Thus it follows that ρΓ(e) = >. (2) Let φ1, φ2 ∈ IΦ. We use the definition of ρΓ(φ) and the distributive law of Boolean algebras:

ρΓ(φ1) ∧ ρΓ(φ2) 0  00  = ∨{ρ∆0 (φ1) : ∆ ≤ Γ} ∧ ∨{ρ∆00 (φ2) : ∆ ≤ Γ} 0 0 00 = ∨{ρ∆0 (φ1) ∧ ρ∆00 (φ2) : ∆ , ∆ , ∆ ≤ Γ}. But ∆0, ∆00 ≤ Γ implies that ∆0, ∆00 ≤ ∆0 ⊗ ∆00 = ∆ ≤ Γ and ∆ is still a simple random variable. Therefore we conclude that

ρ∆0 (φ1) ≤ ρ∆(φ1), ρ∆00 (φ2) ≤ ρ∆(φ2) 28.2. RANDOM VARIABLES AND ALLOCATIONS OF PROBABILITY245 such that

ρ∆0 (φ1) ∧ ρ∆00 (φ2) ≤ ρ∆(φ1) ∧ ρ∆(φ2) = ρ∆(φ1 ⊗ φ2).

This implies that

ρΓ(φ1) ∧ ρΓ(φ2) ≤ ∨{ρ∆(φ1 ⊗ φ2) : ∆s ≤ Γ} = ρΓ(φ1 ⊗ φ2).

But ρ∆(φ1) ≥ ρ∆(φ1 ⊗ φ2). Hence

ρΓ(φ1) = ∨{ρ∆(φ1) : ∆ ≤ Γ} ≥ ∨{ρ∆(φ1 ⊗ φ2) : ∆ ≤ Γ}

= ρΓ(φ1 ⊗ φ2).

Similarly, we obtain ρΓ(φ2) ≥ ρΓ(φ1 ⊗ φ2), hence ρΓ(φ1 ⊗ φ2) ≤ ρΓ(φ1) ∧ ρΓ(φ2) and therefore ρΓ(φ1 ⊗ φ2) = ρΓ(φ1) ∧ ρΓ(φ2). ut Note that the image of R under the mapping Γ 7→ ρΓ corresponds to the ideal completion I(Υs) of a.o.p.s associated with simple random variables. Further, we remark that, according to (28.16) this mapping is continuous. There is more: The mapping Γ 7→ ρΓ is a homomorphism of R into the information algebra Υ of a.o.p.s as the following theorem shows. This result justifies also the association of the mapping defined in (28.16) with the a.o.p. of the generalized random variable Γ as defined in (28.15).

Theorem 28.7 Let Γ, Γ1, Γ2 ∈ R and x ∈ D. Then

ρΓ1⊗Γ2 = ρΓ1 ⊗ ρΓ2 , ⇒x ρΓ⇒x = ρΓ , where ρΓ is defined by (28.16) and the operations of combination and fo- cusing on the right hand sides are those of the information algebra Υ of a.o.p.s.

Proof. We have to show that

ρΓ1⊗Γ2 (φ) = (ρΓ1 ⊗ ρΓ2 )(φ), ⇒x ρΓ⇒x (φ) = (ρΓ )(φ), for all φ ∈ I(Φ). (1) We noted above that the mapping Γ 7→ ρΓ is continuous. Therefore, by well-known results about continuous mappings (see Part I, Lemma 25 (2) in Section 5.2),

ρΓ1⊗Γ2 = ρ∨{∆1⊗∆2:∆1∈Γ1,∆2∈Γ2} = ∨{ρ∆1⊗∆2 : ∆1 ∈ Γ1, ∆2 ∈ Γ2}. 246 CHAPTER 28. ALLOCATION OF PROBABILITY

On the other hand, for every φ ∈ I(Φ), we obtain, using the associative and distributive laws of Boolean algebra,

(ρΓ1 ⊗ ρΓ2 )(φ)

= ∨{ρΓ1 (φ1) ∧ ρΓ2 (φ1): φ ≤ φ1 ⊗ φ2}

= ∨{(∨{ρ∆1 (φ1) : ∆1 ∈ Γ1})

∧(∨{ρ∆2 (φ2) : ∆2 ∈ Γ2}): φ ≤ φ1 ⊗ φ2}

= ∨{ρ∆1 (φ1) ∧ ρ∆2 (φ2) : ∆1 ∈ Γ1, ∆2 ∈ Γ2, φ ≤ φ1 ⊗ φ2}

= ∨{∨{ρ∆1 (φ1) ∧ ρ∆2 (φ2): φ ≤ φ1 ⊗ φ2} : ∆1 ∈ Γ1, ∆2 ∈ Γ2}

= ∨{(ρ∆1 ⊗ ρ∆2 )(φ) : ∆1 ∈ Γ1, ∆2 ∈ Γ2}

= ∨{ρ∆1⊗∆2 (φ) : ∆1 ∈ Γ1, ∆2 ∈ Γ2}.

The last equality follows from Theorem 28.5 (28.12). This proves that

ρΓ1⊗Γ2 = ρΓ1 ⊗ ρΓ2 . (2) Again by continuity, we obtain from the definition of focusing of generalized random variables

ρΓ⇒x = ρ∨{∆⇒x:∆∈Γ} = ∨{ρ∆⇒x : ∆ ∈ Γ}.

But, we have also, by Theorem 28.5 (28.13),

⇒x ⇒x ρΓ (φ) = ∨{ρΓ(ψ): φ ≤ ψ = ψ } ⇒x = ∨{∨{ρ∆(ψ) : ∆ ∈ Γ} : φ ≤ ψ = ψ } ⇒x = ∨{∨{ρ∆(ψ): φ ≤ ψ = ψ } : ∆ ∈ Γ} ⇒x = ∨{(ρ∆ )(φ) : ∆ ∈ Γ} = ∨{ρ∆⇒x (φ) : ∆ ∈ Γ}. (28.17)

⇒x This proves that ρΓ⇒x = ρh . ut Now we claim that the a.o.p. ρΓ as defined by (28.16) coincides for generalized random variables with the a.o.p. ρ0 ◦ sΓ defined in Section 27.1 for random mappings.

Theorem 28.8 For all φ ∈ Φ and for all generalized random variables Γ ∈ R, it holds that

ρΓ(φ) = ρ0(sΓ(φ)).

Further, for all φ such that sΓ(φ) is measurable,

ρΓ(φ) = [sΓ(φ)]. 28.3. LABELED ALLOCATIONS OF PROBABILITY 247

Proof. Fix an element φ ∈ Φ and consider a measurable subset A ⊆ sΓ(φ). We define s simple random variable  φ if ω ∈ A, ∆(ω) = e otherwise.

Then certainly ∆(ω) ≤ Γ(ω) for all ω ∈ Ω, hence ∆ ≤ Γ. Furthermore we have ρ∆(φ) = [A]. This implies that

∨{ρ∆(φ) : ∆ ≤ Γ} ≥ ∨{[A]: A ⊆ sΓ(φ),A ∈ A} = ρ0(sΓ(φ)).

Conversely, for all ∆ ≤ Γ it holds that s∆(φ) ⊆ sΓ(φ) and that s∆(φ) ∈ A. Therefore, we conclude that

∨{ρ∆(φ) : ∆ ≤ Γ} ≤ ∨{[A]: A ⊆ sΓ(φ),A ∈ A} = ρ0(sΓ(φ)).

This proves that ρΓ(φ) = ρ0(sΓ(φ)). If, finally, sΓ(φ) ∈ A, then ρΓ(φ) = [sΓ(φ)]. This proves the second part of the theorem. ut So, Theorem 28.8 shows that ρΓ is an extension of the a.o.p. defined on the measurable elements of Φ. The identity of two extensions ρΓ and ρ0 ◦ sΓ on Φ is a further justification for accepting ρΓ as the a.o.p. associated with the generalized random variable Γ. In Section 29.3 we further discuss the problem of extending a.o.p.s.

28.3 Labeled Allocations of Probability

. To any domain-free information algebra (Φ,D) a corresponding labeled algebra can be associated by considering labeled pieces of information (φ, x), where x is a support of φ (see Part I, Section 3.6). This can also be done with respect to the information algebra (AΦ,D) of allocations. Any allocation ρ can be labeled by a support x of it, provided it has a support. Alternatively, the restriction ρx of ρ to the domain x can be considered. It will be shown in this section that this latter approach leads in a natural way to an algebra isomorphic to the labeled information algebra associated with (AΦ,D). To start with we consider a labeled information algebra (Φ,D). As usual, Φx denotes the set of valuations with domain x. Consider now an allocation ρ :Φx → B, such that

(A1) ρ(ex) = >,

(A2) for φ, ψ ∈ Φx, ρ(φ ⊗ ψ) = ρ(φ) ∧ ρ(ψ). Here (B, µ) is as usual a probability algebra. Such a mapping will be called a labeled allocation with domain x. We define as its label d(ρ) its domain x. We note that, as before, for φ, ψ ∈ Φx, φ ≤ ψ implies ρ(φ) ≥ ρ(ψ). 248 CHAPTER 28. ALLOCATION OF PROBABILITY

Like with the former domain-free allocations we define some operations with labeled allocations. First, the labeled allocation νx defined by

νx(φ) = ⊥ for all φ ∈ Φx, φ 6= ex,

νx(ex) = > (28.18) is called the vacuous allocation on domain x. Next, if ρx and ρy are two labeled allocations on domains x and y respectively, then we define their combination ρ in analogy to (28.2) for φ ∈ Φx∨y

ρ(φ) = ∨{ρx(φx) ∧ ρy(φy): d(φx) = x, d(φy) = y, φ ≤ φx ⊗ φy}(28.19). Note that here, as well as in the sequel, the partial order of labeled infor- mation within a domain is used. We denote this mapping also as a product, ρ = ρx ⊗ ρy and write ρ(φ) = (ρx ⊗ ρy)(φ). Furthermore, if ρ is a labeled allocation with domain x and y a domain with y ≤ x, then a mapping ρ↓ is defined by

↓y ↑x ρ (φ) = ρ(φ ) for all φ ∈ Φy. (28.20)

↑x Here φ = φ ⊗ ex is the vacuous extension of φ to domain x. The element ρ↓y is called the projection of ρ to y. ↓y The following theorem confirms that ρx ⊗ ρy and ρ are indeed again labeled allocations.

↓y Theorem 28.9 ρx ⊗ ρy is a labeled allocation with domain x ∨ y and ρ is a labeled allocation with domain y.

Proof. The first statement is proved exactly as Theorem 28.1 noting that d(φx ⊗ φy) = x ∨ y. ↓y ↑x As for the second statement, note first that ρ (ey) = ρ(ey ) = ρ(ex) = >. This proves property (A1) of a labeled allocation. Furthermore, for φ, ψ ∈ Φy, we obtain ρ↓y(φ ⊗ ψ) = ρ((φ ⊗ ψ)↑x) = ρ(φ↑x ⊗ ψ↑x) = ρ(φ↑x) ∧ ρ(ψ↑x) = ρ↓y(φ) ∧ ρ↓y(ψ).

This proves property (A2) for labeled allocations. ut As an inverse to the operation of projection, which restricts a labeled allocation to a smaller domain, we may also define the operation of extension of a labeled allocation to a finer domain. Formally, if ρ is a labeled allocation with domain x and y a domain such that x ≤ y, then we define

↑y ρ = ρ ⊗ νy.

This gives then, according to (28.19) for a φ ∈ Φy

↑y ↑y ρ (φ) = ∨{ρ(ψ): ψ ∈ Φx, φ ≤ ψ }. 28.3. LABELED ALLOCATIONS OF PROBABILITY 249

This makes sense: the support allocated to a φ in Φy by a labeled allocation with domain x ≤ y is the supremum of all supports allocated to pieces of information ψ in domain x which imply φ. By Theorem 28.9 above, ρ↑y is a labeled allocation with domain y; it is called the vacuous extension of ρ to y. The next theorems show that the labeled allocations form themselves a labeled information algebra.

Theorem 28.10 Let (Φ,D) be a labeled information algebra. Then the following hold

1. Semigroup: Combination of labeled allocations of probability is asso- ciative and commutative.

2. Labeling: For two labeled allocations ρ1, ρ2 on (Φ,D) we have d(ρ1 ⊗ ρ2) = d(ρ1) ∨ d(ρ2), and ∀ρ ∈ AΦ and ∀x ∈ D, x ≤ d(ρ), we have that d(ρ↓x) = x.

3. Neutrality: For all x ∈ D there exists an element νx such that ρ⊗νx = ↓y ρ if d(ρ) = x, and ∀y ∈ D. y ≤ x we have νx = νy.

4. Nullity: For all x ∈ D there exists an element χx such that ρ⊗χx = χx if d(ρ) = x, and and ∀y ∈ D. y ≥ x we have χx ⊗ νy = χy. 5. Projection: For a labeled allocation ρ and domains x, y ∈ D such that x ≤ y ≤ d(ρ) we have (ρ↓y)↓x = ρ↓x.

6. Combination: For two labeled allocations ρx, ρy with d(ρx) = x, d(ρy) = ↓x ↓x∧y y we have (ρx ⊗ ρy) = ρx ⊗ ρy . 7. Idempotency: For a labeled allocation ρ and x ∈ D with x ≤ d(ρ) we have ρ ⊗ ρ↓x = ρ.

Proof. (1) Commutativity of combination is evident. Associativity is proved as in Theorem 28.2. (2) follows from the definition of combination and projection. (3) If νx is the vacuous allocation on domain x and ρ a labeled allocation with domain x, then for any φ ∈ Φx

ρ ⊗ νx(φ) = ∨{ρ(ψ): d(ψ) = x, φ ≤ ψ} = ρ(φ). since νx(ψ) = ⊥ unless ψ = ex, in which case νx(ex) = >. Further, for ↓y ↑x ↑x φ ∈ Φy, we have νx (φ) = νx(φ ). Since φ = ex if, and only if, φ = ey, ↓y this shows that νx = νy. (4) Define χx(φ) = > for all φ ∈ Φx. Then ρ ⊗ χx(φ) = > for all φ ∈ Φx. This proves that ρ ⊗ χx = χx. Similarly, we have χx ⊗ νy(φ) = > for all φ ∈ Φy, hence χx ⊗ νy = χy. 250 CHAPTER 28. ALLOCATION OF PROBABILITY

(5) For a φ ∈ Φx we have, by definition, if d(ρ) = z, (ρ↓y)↓x(φ) = ρ↓y(φ↑y) = ρ((φ↑y)↑z) = ρ(φ↑z) = ρ↓x(φ).

(6) For a φ ∈ Φx we have

↓x ↑x∨y (ρx ⊗ ρy) (φ) = (ρx ⊗ ρy)(φ ) ↑x∨y = ∨{ρx(φx) ∧ ρy(φy): d(φx) = x, d(φy) = y, φ ≤ φx ⊗ φy}. On the other hand,

↓x∧y (ρx ⊗ ρy )(φ) ↓x∧y = ∨{ρx(φx) ∧ ρy (φx∧y): d(φx) = x, d(φx∧y) = x ∧ y, φ ≤ φx ⊗ φx∧y} ↑y = ∨{ρx(φx) ∧ ρy(φx∧y): d(φx) = x, d(φx∧y) = x ∧ y, φ ≤ φx ⊗ φx∧y}.

↑x∨y ↓x ↓x∧y Now, for φ ∈ Φx, φ ≤ φx ⊗ φy implies φ ≤ (φx ⊗ φy) = φx ⊗ φy ↓x∧y ↑y ↓x∧y ↑y and, for φ ∈ Φy,(φ ) ≤ φ, hence ρy((φ ) ) ≥ ρy(φ). It follows ↓x ↓x∧y that (ρx ⊗ ρy) (φ) ≤ (ρx ⊗ ρy )(φ). But φ ≤ φx ⊗ φx∧y implies that ↑x∨y ↑y φ = φ ⊗ ex∨y ≤ φx ⊗ φx∧y ⊗ ex∨y = φx ⊗ φx∧y. This shows then ↓x ↓x∧y ↓x that (ρx ⊗ ρy) (φ) ≥ (ρx ⊗ ρy )(φ) and thus finally (ρx ⊗ ρy) (φ) = ↓x∧y (ρx ⊗ ρy )(φ). (7) Let d(ρ) = y and φ ∈ Φy. Then we have

↓x ↓x ρ ⊗ ρ (φ) = ∨{ρ(φy) ∧ ρ (φx): d(φy) = y, d(φx) = x, φ ≤ φy ⊗ φx} ↑y = ∨{ρ(φy) ∧ ρ(φx ): d(φy) = y, d(φx) = x, φ ≤ φy ⊗ φx} ↑y = ∨{ρ(φy ⊗ φx ): d(φy) = y, d(φx) = x, φ ≤ φy ⊗ φx} ↑y ↑y = ∨{ρ(φy ⊗ φx ): d(φy) = y, d(φx) = x, φ ≤ φy ⊗ φx } ≤ ρ(φ).

But we have also φ = φ ⊗ φ↓x = φ ⊗ (φ↓x)↑y and therefore ρ(φ) ≤ ρ ⊗ ρ↓x(φ) which proves that ρ(φ) = ρ ⊗ ρ↓x(φ). ut It has been noted in Part I that to any domain-free information algebra (Φ,D) a labeled information algebra (Φ∗,D) with

Φ∗ = {(φ, x): φ ∈ Φ, x ∈ D support of φ}. can be associated, such that d(φ0) = x if φ0 = (φ, x) and combination and projection are defined by

(φ1, x) ⊗ (φ2, y) = (φ1 ⊗ φ2, x ∨ y), (φ, x)↓y = (φ⇒y, y) for y ≤ x. (28.21)

The same construction applies in particular to the information algebra (AΦ,D) of allocations. Let thus

∗ AΦ = {(ρ, x): ρ ∈ AΦ, x ∈ D support of ρ}. 28.3. LABELED ALLOCATIONS OF PROBABILITY 251

∗ Then combination and projection can be defined as in (28.21) and (AΦ,D) becomes in this way a labeled information algebra. Only allocations ρ which ∗ have a support are in AΦ. Note however that the elements (ρ, x) are not labeled allocations in the 0 sense defined before. But a labeled allocation ρx defined by ρx(φ ) = ρ(φ) 0 ∗ if φ = (φ, x) ∈ Φx can be associated with it. This defines in fact an isomorphism between the two labeled information algebras. That is what the following theorem says.

(1) (2) ∗ Theorem 28.11 1. If (ρ , x) and (ρ , y) belong to (AΦ,D), then

(1) (2) (1) (2) ρx ⊗ ρy = (ρ ⊗ ρ )x∨y. (28.22)

2. If ν is the vacuous allocation in (AΦ,D), then νx is the vacuous allo- cation on Φx for all x ∈ D.

∗ 3. If (ρ, x) belongs to (AΦ,D), and y ≤ x, y ∈ D, then

↓y ⇒y ρx = (ρ )y. (28.23)

0 ∗ Proof. (1) Let φ = (φ, x ∨ y) belong to Φx∨y and consider

(1) (2) 0 ρx ⊗ ρy (φ ) (1) 0 (2) 0 0 0 0 0 0 = ∨{ρx (φx) ∧ ρy (φy): φx = (φx, x), φy = (φy, y), φ ≤ φx ⊗ φy} (1) (2) ⇒x ⇒y = ∨{ρ (φx) ∧ ρ (φy): φx = φx , φy = φy , φ ≤ φx ⊗ φy}.

By assumption, x is a support of ρ(1) that is

⇒x ⇒x ρ1(φx) = ρ1 (φx) = ∨{ρ1(ψx): φx ≤ ψx = ψx } and similarly for ρ(2) which has y as a support. This implies then,

(1) (2) 0 (1) (2) (ρ ⊗ ρ )x∨y(φ ) = ρ ⊗ ρ (φ) (1) (2) = ∨{ρ (φx) ∧ ρ (φy): φ ≤ φx ⊗ φy}  (1) ⇒x  = ∨{ ∨{ρ (ψx): φx ≤ ψx = ψx }

 (2) ⇒y  ∧ ∨{ρ (ψy): φy ≤ ψy = ψy } : φ ≤ φx ⊗ φy} (1) (2) ⇒x ⇒y = ∨{ρ (ψx) ∧ ρ (ψy): φ ≤ ψx ⊗ ψy, ψx = ψx , ψy = ψy } (1) (2) 0 = ρx ⊗ ρy (φ ).

0 0 (2) We have, for φ = (φ, x), that νx(φ ) = ν(φ) = ⊥, if φ 6= e. And we have νx(e, x) = ν(e) = >. 252 CHAPTER 28. ALLOCATION OF PROBABILITY

⇒x 0 ↓y (3) Let ρ = ρ , y ≤ x and φ = (φ, y). We have to prove that ρx (φ) = ρ⇒y(φ). Then

(ρ⇒y)(φ) = ρ(φ)

⇒y 0 0 since φ = φ . On the other hand, for ψ = (ψ, x), we have ρx(ψ ) = ρ(ψ). We obtain therefore

↓y 0 0↑x ⇒y ρx (φ ) = ρx(φ ) = ρ(φ) = ρ (φ) (28.24)

⇒x ⇒y ↓y since y ≤ x implies that φ = φ . Thus we have in fact (ρ )y = ρx . ut This theorem shows that we have two essentially equivalent ways to construct labeled information algebras of allocations from algebras of allo- cations. Chapter 29

Support Functions

29.1 Characterization

As we have noted before, we may consider a random mapping Γ as an information available to us: Γ(ω) is a certain “information”, which can be asserted, provided ω is the sample element chosen by a chance process, or the “correct”, but unknown assumption in a set of possible assumptions Ω. Here, information Γ(ω) is to be interpreted in the sense of random mappings as an ideal in a domain-free information algebra (Φ,D), as explained in Section 27.1. We have then in Section 27.2 (27.7) defined the allocation of support sΓ(φ) of a random mapping to an element φ ∈ Φ as the set of elements ω ∈ Ω, which imply φ, i.e. such that φ belongs to the ideal Γ(ω), φ ∈ Γ(ω) or φ ≤ Γ(ω). Then any ω ∈ sΓ is an assumption, i.e. an argument, which permits to infer the piece of information φ in the light of the random mapping Γ. So the larger the set sΓ(φ), the more arguments are present to support φ. Or, more to the point, the more probable, the more likely it is that the correct assumption ω belongs to sΓ(φ), the stronger the hypothesis of information φ is supported. We refer back to Chapter 25 for a discussion of this point of view. The allocation of support can be seen as a mapping sΓ :Φ → P(ω) from the information algebra, or its ideal completion sΓ : IΦ → P(ω) into the sample space. This mapping has some interesting properties as stated in the next theorem. Before we formulate the theorem, let’s outline some special cases of in- formation algebras and also of random mappings: The algebra (Φ,D) may by a σ-algebra and the the ideals Γ(ω) may be σ-ideals, i.e. closed under countable combination, for all ω ∈ Ω. Then Γ is called a σ-random map- ping. Further Φ may be a complete semi-lattice. If, in this case, Γ(ω) is a principal ideal for each ω ∈ Ω, then Γ is called a complete random mapping. For a complete random mapping we have that ∨Γ(ω) ∈ Φ. Note that the ideal completion IΦ of a complete algebra Φ is still a proper superset of Φ, not all ideals in Φ are proper ideals.

253 254 CHAPTER 29. SUPPORT FUNCTIONS

We do not exclude that Γ(ω) = z for some ω. This represents improper information, which can be interpreted as contradictory information. Under semantic aspects such improper information could and should be excluded. We refer to Chapter 25 for a discussion in the context of hints. If Γ(ω) 6= z for all ω, the random mapping is called normalized.

Theorem 29.1 Let Γ ∈ R be a generalized random variable with values in an information algebra (Φ,D), then

1. sΓ(e) = Ω,

2. sΓ(z) = ∅, if Γ is normalized,

3. If ∨i∈I φi exists in Φ, then \ sΓ(∨i∈I φi) ⊆ sΓ(φi), i∈I

4. If I is finite, or Φ is a σ-algebra, I is countable and Γ a σ-random mapping, or if Φ is a complete semi-lattice and Γ a complete random mapping, then \ sΓ(∨i∈I φi) = sΓ(φi), (29.1) i∈I

Proof. (1) The first identity follows since e is the least element in Φ (or IΦ). (2) follows from the definition of a normalized variable. (3) Suppose ∨i∈I φi exists in Φ, then φi ≤ ∨i∈I φi ∈ Γ(ω) implies φi ∈ Γ(ω), because the latter is an ideal in Φ. It follows therefore that ω ∈ T spΓ(∨i∈I φi) implies ω ∈ sΓ(φi) for all i ∈ I, hence ω ∈ i∈I sΓ(φi). (4) Consider ω ∈ sΓ(φi) for all i ∈ I. This means that φi ∈ Γ(ω). If I is finite, then ∨i∈I φi ∈ Γ(ω) too, since Γ(ω) is an ideal, hence ω ∈ sΓ(∨i∈I φi). Together with (3) just proved this shows that (29.1) holds. Similarly, if Φ is a σ-algebra, I is countable, then ∨i∈I φi exists and if Γ is a σ-random mapping, then ∨i∈I φi ∈ Γ(ω) too. Finally, if Φ is complete and Γ a complete random mapping, then again ∨i∈I φi exists and ∨i∈I φi ∈ Γ(ω). Thus (29.1) holds in these cases too. ut The degree of support of an piece of information φ ∈ Φ can thus be measured by the the probability P (sΓ(φ)) of the support allocated to φ, provided this probability is defined. An element φ ∈ Φ is called measurable relative to Γ (or Γ-measurable or also simply measurable if the context is clear), if sΓ(φ) ∈ A. Let’s formally define the degree of support

spΓ(φ) = P (sΓ(φ)), 29.1. CHARACTERIZATION 255 for all measurable φ ∈ Φ. Let EΓ be the subset of Γ-measurable elements of (Φ). As the following lemma shows, the set of Γ-measurable elements is closed under combination, or even countable combination, if Φ is a σ-algebra and Γ a σ-random mapping.

Lemma 29.1 For any random mapping Γ, e ∈ E and z ∈ E, if Γ is normalized, and if φ, ψ ∈ EΓ, then φ ⊗ ψ ∈ EΓ. If Φ is a σ-algebra and Γ a ∞ σ-random mapping, then φ1, φ2,... ∈ EΓ implies ∨i=1φi ∈ EΓ. Proof. By Theorem 29.1 (4)

sΓ(φ ⊗ ψ) = sΓ(φ ∨ ψ) = sΓ(φ) ∩ sΓ(ψ).

Thus, sΓ(φ), sΓ(ψ) ∈ A implies sΓ(φ ⊗ ψ) ∈ A. Similarly, if Φ is a σ-algebra and Γ a σ-random mapping, then by Theorem 29.1 (4)

∞ ∞ \ spΓ(∨i=1φi) = spΓ(φi). i=1 ∞ Hence spΓ(φi) ∈ A implies spΓ(∨i=1φi) ∈ A. ut As a corollary, it follows from Theorem 29.1 that EΓ is σ-closed for proper random variables. The degree of support can be seen as a mapping spΓ : EΓ → [0, 1]. This mapping has the following fundamental properties:

Theorem 29.2 Let Γ ∈ R. Then the mapping spΓ has the following prop- erties:

1. The neutral element e belongs to EΓ and spΓ(e) = 1. If Γ is normalized, z belongs to EΓ and spΓ(z) = 0.

2. Assume φ1, . . . , φn ≥ φ, n ≥ 1 and φ1, . . . , φn, φ ∈ EΓ. Then

X |I|+1 spΓ(φ) ≥ (−1) spΓ(⊗i∈I φi). (29.2) ∅6=I⊆{1,...,n}

3. If Φ is a σ-algebra and Γ a σ-random mapping, then, if φ1 ≤ φ1 ≤ · · · ∈ EΓ is a monotone sequence,

∞ spΓ(∨i=1φi) = lim spΓ(φi). (29.3) i→∞

Proof. (1) follows directly from Theorem 29.1 (1) and (2). (2) On the right hand side of (29.2) we have by the inclusion-exclusion formula of probability theory,

X |I|+1 m (−1) P (∩i∈I sΓ(φi)) = P (∪i=1sΓ(φi)). ∅6=I⊆{1,...,m} 256 CHAPTER 29. SUPPORT FUNCTIONS

But φ ≤ φ1, . . . , φm implies sΓ(φ) ⊇ sΓ(φi), hence m sΓ(φ) ⊇ ∪i=1sΓ(φi) This proves (29.2) (3) By Theorem 29.1 (4) we have

∞ ∞ sΓ(∨i=1φi) = ∩i=1sΓ(φi).

Further we have that sΓ(φ1) ⊇ sΓ(φ2) ⊇ · · ·. Then, by the continuity of probability, we have

∞ ∞ P (sΓ(∨i=1φi)) = P (∩i=1sΓ(φi)) = lim P (sΓ(φi)). i→∞ This proves (29.4). ut According to Lemma 29.1 the measurable elements EΓ form a sub-semi- lattice of Φ. A function on such a semi-lattice, satisfying (2) of Theorem 29.2 above is called monotone of infinite order. If it satisfies in addition property (3), then it is called continuous. We call a function sp defined on a semi- lattice E satisfying properties (1) and (2) of Theorem 29.2 a support function on E. If, in addition, sp(z) = 0, then the support function is called normal- ized. Support functions were studied by (Shafer, 1976; ?) under the name of belief functions on subset families closed under intersection. The pre- sentation here is a generalization of the theory of hints (Kohlas & Monney, 1995), as introduced in Chapter 25. Hints were motivated by Dempster’s multivalued mappings (Dempster, 1967), but with the interpretation given here as uncertain information, allowing different, uncertain interpretations. As the discussion above and the following show, information algebras seem to provide the natural framework to develop the theory of hints. We are next going to examine the support functions of random variables. It turns out, and this is the main justification to consider the elements of σ(Rs) as proper random variables, that all elements of Φ are measurable.

Theorem 29.3 Let Γ ∈ σ(Rs) be a random variable with values in the compact information algebra (IΦ, Φ,D). Then Φ ⊆ EΓ, where EΓ is the set of Γ-measurable elements of IΦ.

∞ Proof. Let φ ∈ Φ and Γ = ∨i=1Γi, where Γi are simple random variables. ∞ i Note that Γ = ∨i=1∆i, where ∆i = ∨j=1Γi. All ∆i are simple random variables and for all ω the sequence ∆i(ω) is monotone. Then ∞ sΓ(φ) = {ω ∈ Ω: φ ≤ ∨i=1∆i(ω)}.

Fix a ω ∈ sΓ(φ). The subset {∆i(ω), i = 1, 2,...} of Φ is directed. Hence, by compactness of the information algebra (IΦ,I(Φ),D), there is a ∆j(ω) such that φ ≤ ∆j(ω), hence ω ∈ s∆j (φ). This shows that ∞ sΓ(φ) ⊆ ∪i=1s∆i (φ). 29.2. GENERATING SUPPORT FUNCTIONS 257

∞ The converse inclusion is evident. Hence sΓ(φ) = ∪i=1s∆i (φ). But s∆i (φ) ∈ A for all i, since ∆i are simple random variables. Therefore, we conclude that sΓ(φ) ∈ A ut In fact, we may, as a corollary, extend this result to the σ-closure of Φ.

Corollary 29.1 Let Γ ∈ σ(Rs) be a random variable with values in the compact information algebra (IΦ, Φ,D). Then σ(Φ) ⊆ EΓ, where EΓ is the set of Γ-measurable elements of IΦ.

∞ Proof. Assume φ ∈ σ(Φ). Then, by Theorem 27.2 φ = ∨i=1φi, with ψi ∈ Φ. It follows that

∞ ∞ ∞ \ \ sΓ(φ) = {ω : ∨i=1φi ≤ Γ(ω)} = {ω : ψi ≤ Γ(ω)} = sΓ(ψi) ∈ A, i=1 i=1 since, by Theorem 29.3, sΓ(ψi) ∈ A. ut So in these cases the support function spΓ is defined on the whole of E. In the general case, extensions of the support function from E to Φ are discussed in Section 29.3.

29.2 Generating Support Functions

In the previous section, we have seen that random mappings induce support functions on some sub-semi-lattices of an information algebra. The question arises whether any support sp function on some semi-lattice E is generated by some random mapping. The answer is affirmative (Shafer, 1979; ?; Kohlas & Monney, 1995). It follows from the Krein-Milman theorem. The Krein- Milman theorem states that a compact convex subset S of a locally convex space is the closed convex hull of its extreme points. We are first going to reformulate the Krein-Milman theorem in a form adapted to our needs, i.e. in the form of an integral representation theorem. Let V be a locally convex space and S a nonempty compact subset of V , and P a probability measure on S, i.e. a nonnegative measure on the Borel sets of V with P (S) = 1. Then, a point v ∈ V is said to be represented by P if, for every continuous linear function f : V → R, Z f(v) = f(x)dP (x). S Sometimes it is also said, that in this case v is the barycenter or the resultant of P . We cite the following useful lemma:

Lemma 29.2 Suppose that C is a compact subset of a locally convex space V . A point v ∈ V is in the closed convex hull H of C if, and only if, there exists a probability measure P on C which represents v. 258 CHAPTER 29. SUPPORT FUNCTIONS

For the proof, which depends on the Riesz theorem, we refer to (?). If P is a probability measure on a compact Hausdorf space V and B is a Borel subset of V , then P is said to be supported by B if P (V \ B) = 0. Now, with the help of this lemma we reformulate the Krein-Milman theorem as follows:

Theorem 29.4 Every point of a compact convex subset S of a locally con- vex space V is the barycenter of a probability measure on S which is supported by the closure of the extreme points of S.

Proof. Suppose v ∈ S, and let T be the closure of the extreme points of S; then, by the Krein-Milman theorem, v is in the closed convex hull H of T . By Lemma 29.2 above, then, v is the barycenter of a probability measure P on T . If we extend this measure P in the obvious way to S, we get the desired result. ut We remark the Theorem 29.4 conversely implies the Krein-Milman the- orem too. But we only need the one direction. Thus, this version of the Krein-Milman theorem is used to prove the following theorem which states that any support function in an information algebra is generated by some random mapping.

Theorem 29.5 Let (Φ,D) be a domain-free information algebra and E ⊆ Φ be a sub-semi-lattice of Φ. If s is a support function on E, satisfying (1) and (2) of Theorem 29.2, then there exists a random mapping Γ in Φ, such that sΓ = s. If Φ is a σ-algebra, E ⊂ Φ is closed under countable combination, and s a support function on E which satisfies also (3) of Theorem 29.2, then there exists a σ-random mapping Γ in Φ, such that sΓ = s.

Proof. We start be defining the vector space V of functionals f : E → R where addition and scalar mutliplication is defined by (f + g)(φ) = f(φ) + g(φ), and (λf)(φ) = λf(φ) respectively. Define, pφ(f) = |f(φ)|, for any φ ∈ E. Clearly pφ satisfies the following conditions:

1. pφ is positive semidefinite, i.e. pφ(f) ≥ 0 for any f ∈ V ,

2. pφ is positive homogeneous, i.e. pφ(λf) = |λ|pφ(f),

3. pφ satisfies the triangle inequality, i.e. pφ(f + g) ≤ pφ(f) + pφ(g).

This means that pφ(f) is a family of semi norms for φ ∈ E. This implies that V is a locally convex topological vector space, where convergence of sequences fi is defined pointwise, limi→∞ fi = f means that limi→∞ fi(φ) = f(φ) in R for every φ ∈ E. The set S of support functions s on E, satisfying (1) and (2) of Theorem 29.2, is convex in V . Further S is closed, since the limit of a sequence of support functions is a support function. Finally, the sets 29.2. GENERATING SUPPORT FUNCTIONS 259

S[φ] = {f(φ): f ∈ S} are bounded, hence their closures S[φ]− are compact for all φ ∈ E. Since

S ⊆ Π{S[φ]− : φ ∈ E} and the product of compact sets S[φ]− is compact in V by Tychonov’s theorem, it follows that S is compact. Therefore, by Theorem 29.4 any support function s ∈ S is the barycenter of a probability measure Ps which is supported by the closure of the extreme points of S. What are the extreme point of S? By a deep theorem of Choquet (Choquet, 1953–1954) concerning functions monotone of order ∞ on ordered semi-groups, the extreme points of S are exactly the exponentials on E, i.e. functions e : E → R such that e(φ ⊗ ψ) = e(φ) · e(ψ) for all φ, ψ ∈ E. In the case of a semi-lattice E it follows from φ = φ ⊗ φ that e(φ) = e(φ) ⊗ e(φ), hence e(φ) = 0 or e(φ) = 1. Let then I = {φ ∈ E : e(φ) = 1}. Then ψ ≤ φ ∈ I implies 1 = e(φ) = e(ψ) · e(φ), hence e(ψ) = 1 or ψ ∈ I too. Further, let φ, ψ ∈ I, then e(φ ⊗ ψ) = e(φ) · e(ψ) = 1, therefore φ ⊗ ψ ∈ I. In ther words, I is an ideal in E. Conversely, assume I ⊆ E to be an ideal in E and define e(φ) = 1 for φ ∈ I, e(φ) = 0, otherwise. Assume φ, ψ ∈ I, such that φ ⊗ ψ ∈ I too. Then e(φ ⊗ ψ) = 1 = e(φ) · e(ψ). Otherwise, if φ 6∈ I, then φ ⊗ ψ 6∈ I too. Hence, again, e(φ ⊗ ψ) = 0 = e(φ) · e(ψ). So the function e is an exponential, hence an extreme point of S. This shows that the set of extreme points of S is in one-to-one correspondence with the ideals of E. In other words, the indicator functions iI of the ideals in E are the extreme points of S. Further the pointwise limit e ∈ S of a sequence of {0, 1}-support function is still an extreme point. Hence the set of extreme points of S is closed. Consider then a support function s ∈ S and the probability measure Ps supported by the set X of extreme points of S, which represents s by the Krein-Milman theorem. Then s 7→ s(φ) is a linear, continuous function V → R. Thus, it follows that Z s(φ) = e(φ)dPs(e) = Ps{e ∈ X : e(φ) = 1} = Ps{iI : I ideal of E, φ ∈ I}. X for all φ ∈ E. Let A be the algebra of Borel sets in X. Further, note that an ideal I in E can be extended to an ideal in Φ, namely I0 = {φ ∈ Φ: ∃ψ ∈ I, φ ≤ ψ}. For any extreme point e of S let Ie be the ideal in Φ associated to the ideal in E corresponding to e. Define then Γ : X → IΦ by Γ(e) = Ie. Then Γ is a random mapping in Φ, defined on the probability space (X, A,Ps), and its support function sΓ equals s. This is the desired result corresponding to the first part of the theorem. The second part follows in a similar way: The set s of support func- tions on E satisfying (1) to (3) of Theorem 29.2 is still a convex subset of V . Its extreme points are still exponentials on E. They are in one-to-one 260 CHAPTER 29. SUPPORT FUNCTIONS correspondance with the σ-ideals I of E, which can be extended to σ-ideals of Φ. The elements ss of S are represented by a probability measure Ps on the set X of extreme points. So, Γ(e) = Ie, where Ie is the σ-ideal of Φ corresponding to the extreme point e of S, defines a σ-random mapping, such that spΓ = s. This proves the second part of the theorem. ut For continuous support functions there is an alternative theory by Nor- berg for characterizing support functions based solely on lattice theory (Nor- berg, 1989). We shall apply this theory first to domain-free compact infor- mation algebras (Φ, Φf ,D). We have seen in Part I of these Lecture Notes (Section 4.7) that Φ is a continuous lattice. Further, the relation φ  ψ holds for φ, ψ ∈ Φ if, and only if, φ ∈ Φf and φ ≤ ψ. 
Norberg (Norberg, 1989) introduces a σ-field of subsets of Φ, on which a probability measure can be defined such that the identity mapping on Φ becomes a random mapping generating a given continuous support function on Φ. As a preparation let’s introduce Scott topology into Φ. A subset U of Φ is called Scott-open, if 1. U is upward closed, i.e. if φ ∈ U and φ ≤ ψ, then ψ ∈ U.

2. ∀φ ∈ U there exists a ψ ∈ Φf ∩ U such that ψ ≤ φ. The second condition says equivalently that for all φ ∈ U, there is a ψ ∈ U such that ψ  φ. The family of Scott-open τ sets are the open sets of a topology, since, as can be easily verified, 1. Φ, ∅ ∈ τ,

2. if U, V ∈ τ, then U ∩ V ∈ τ, S 3. if Ui ∈ τ for i in some set I, then i∈I Ui ∈ τ. The topology τ is called the Scott topology in the compact information algebra (Φ, Φf ,D). The sets

↑ ψ = {φ ∈ Φ: ψ ≤ φ} for ψ ∈ Φf form a base for the topology τ. This means in this particular case that S 1. If U ∈ τ, then U = {↑ ψ : ψ ∈ Φf ∩ U}, S 2. Φ = {↑ ψ : ψ ∈ Φf }, 3. ↑ φ ∩ ↑ ψ =↑ (φ ⊗ ψ). Finally, a set F ⊆ Φ is a filter, if it is upward closed and if, for any φ, ψ ∈ F there is a η ∈ F such that η ≤ φ, ψ. Thus, a Scott-open filter is a filter which is Scott-open. This is in our case of a compact information algebra equivalent to 29.2. GENERATING SUPPORT FUNCTIONS 261

1. F is upwards closed,

2. ∀φ, ψ ∈ F there exists a η ∈ Φf ∩ F such that η ≤ φ, ψ. Denote by F the family of Scott-open filters in Φ and let σ(F) be the minimal σ-field in Φ generated by F. From now on we assume that Φf is countable. This means that τ has a countable base. A topology with a countable base is called second countable. Since ↑ ψ ∈ F, hence belongs to σ(F) and any U ∈ τ is the countable union of sets ↑ ψ for ψ ∈ Φf , it follows that U ∈ σ(F). Thus, σ(F) is the Borel algebra with respect to the Scott topology. Now, the following theorem is an immediate consequence of Theorem 3.3 in (Norberg, 1989) (see alternatively Theorem 3.15 in (Molchanov, 2005)).

Theorem 29.6 Let (Φ, Φf ,D) be a domain-free compact information alge- bra with a countable set Φf of finite elements. Let further s :Φ → [0, 1] be a continuous support function on Φ. Then, there exists a probability measure Ps on the algebra σ(F), such that for all φ ∈ Φ,

s(φ) = Ps(↑ φ).

Consider the probability space (Φ, σ(F),P ) with P the probability mea- sure of Theorem 29.6. The identity mapping id(φ) = φ is a random vari- able in the information algebra Φ. Its support functions is defined by spid(φ) = Ps(↑ φ) = s(φ). Therefore, according to Theorem 29.6 any sup- port function on a compact information algebra with countably many finite elements is generated by a random variable in this algebra. This is a first existence theorem for random variables. Note that for any φ ∈ Φ we have \ ↑ φ = ↑ ψ.

ψ∈Φf ,ψ≤φ

Since Φf is countable, so is the intersection above, and we may enumerate i the elements ψ ≤ φ, ψ ∈ Φf by ψ1, ψ2,.... Define then ηi = ∨j=1ψj such ∞ that η1 ≤ η2 ≤ · · · and φ = ∨i=1ηi. Also ↑ η1 ⊇↑ η2 ⊇ · · · and

∞ \ ↑ φ = ↑ ηi. i=1 It follows that

s(φ) = Ps(↑ φ) = lim Ps(↑ ηi) = lim s(ηi) (29.4) i→∞ i→∞ So, the support function on Φ is in fact determined by its values for the finite elements in Φf . 262 CHAPTER 29. SUPPORT FUNCTIONS

The result above (Theorem 29.6) can be slightly generalized. Let (Φ,D) be a domain-free information algebra and E ⊆ Φ a sub-semi-lattice of Φ. Then we may consider the family IE of ideals in E. The theory of ideal completion of an information algebra applies essentially to E as to Φ: First, IE is a complete lattice. Second the elements of E (or more precisely, the principal ideals I(φ) for φ ∈ E) are the finite elements of the lattice IE . So, if E is countable, and s a continuous support function on E, which can be extended by (29.4) to IE , then there is, just as in Theorem 29.6, a probability measure Ps on E such that

s(φ) = Ps(↑ φ)

Note that E and IE are no more necessarily information algebras (i.e. closed under focusing).

29.3 Extension of Support Functions

The natural way in the context of these notes to look at extension both of sp- fcts and a.o.p.s beyond their domain of definition is to use the information in the generating random mapping which is in these notes considered as the primary element describing uncertain information. This is essentially done in the previous Section 28.2, cumulating in Theorem 28.8 showing the equivalence of the two natural approaches introduced before for random variables. However, when considering a.o.p.s as primary objects, it makes sense to study extensions of them without assuming the generating random mapping as given `apriori. Then we may follow Shafer’s approach (Shafer, 1979). Al- ternatively we may use the method proposed in (Kohlas & Monney, 1995), i.e. studying the family of random mappings generating a given sp-fct. or a.o.p., delimiting canonical random mappings among them and looking at the extension they create as random mappings. That’s what we are dis- cussing in this section. Note that in (Kohlas & Monney, 1995) the problem is treated in the context of random sets. It turns out that the main results can be generalized without difficulty to the more general case of random mappings into information algebras. As a preparation we note the close connection between support functions and allocations of probability (a.o.p.). In fact, as the following lemma shows, from an a.o.p. we may easily obtain the associated support function.

Lemma 29.3 Let ρ : E → B be an a.o.p. from a sub-semi-lattice E ⊆ Φ into a probability algebra (B, µ). Then µ◦ρ : E → [0, 1] is a support function on E. If furthermore E is closed under countable combinations, and ρ is a σ-a.o.p., i.e. for any countable family ψi ∈ E, i = 1, 2,..., it holds that

∞ ∞ ρ(⊗i=1ψi) = ∧i=1ρ(ψi), 29.3. EXTENSION OF SUPPORT FUNCTIONS 263 then µ ◦ ρ is a continuous support function on E.

Proof. We have to verify properties (1) and (2) of Theorem 29.2 for the application µ◦ρ. Since ρ(e) = >, it follows that µ(ρ(e)) = 1. Thus condition (1) is satisfied. Further, let φ1, . . . , φn ∈ E, n ≥ 1 and φ1, . . . , φn ≥ φ ∈ E. Then we have ρ(φi) ≤ ρ(φ) for i = 1, . . . , n, hence

n _ ρ(φi) ≤ ρ(φ). i=1 Hence, by the inclusion-exclusion formula of probability theory we obtain

X |I|−1 µ(ρ(φ)) ≥ (−1) µ(∧i∈I ρ(φi)). ∅6=I⊆{1,...,n}

Since ρ is an a.o.p., we have that ∧i∈I ρ(φi) = ρ(⊗i∈I φi), and it follows therefore that

X |I|−1 µ(ρ(φ)) ≥ (−1) µ(ρ(⊗i∈I φi)). ∅6=I⊆{1,...,n}

But this is property (2) of Theorem 29.2. If ρ is a σ-a.o.p. and ψ1 ≤ ψ2 ≤ ... ∈ E, then by the continuity of probabilty it follows that

∞ ∞ µ(ρ(⊗i=1ψi)) = µ(∧i=1ρ(ψi)) = lim µ(ρ(ψi)), i→∞ since ρ(ψ1) ≥ ρ(ψ2) ≥ .... ut Let (Φ,D) be a domain-free information algebra and E ⊆ Φ a sub- semi-lattice contained in Φ. Further let (B, ν) be a probability algebra and ρ : E → B an a.o.p. defined on E. The mapping sp = ν ◦ ρ defines then a support function on E. On the other hand, assume Γ a random mapping Γ:Ω → Φ on a probability space (Ω, A,P ) whose support function on E coincides with sp. Let (B, ν) be the probability algebra associated with the probability space (Ω, A,P ), i.e. B = A/J where J is the σ-ideal of P -null sets in A and ν([A]) = P (A) (see Section 27.1). Then by ρ(φ) = [sΓ(φ)] for φ ∈ E an a.o.p. defined on E is defined. So, if a support function sp on E is given, we may consider any generating random mapping for sp and then obtain in the way just described the associated a.o.p. Now, for a support function sp on E by Theorem 29.5 there exists a random mapping which generates sp on E. But this random mapping is not unique. For instance, assume E ⊆ E0, where E0 ⊆ Φ is another semi-lattice in Φ and sp0 a support function on E0 whose restriction to E equals sp, then a random mapping which generates sp0 on E0 generates also sp on E. Or, looked the other way round, any random mapping Γ, which generates the 264 CHAPTER 29. SUPPORT FUNCTIONS support function sp on E, i.e., for which spΓ = sp on E, has an extension spΓ(φ) = P∗(sΓ(φ)) to the whole of Φ (see Section 27.1). So, there are many extension of a support function sp on E to Φ. The goal is to compare these extension somehow and to single out particularly interesting ones. In order to do that, we first single out particular random mappings which generate a support function sp on E, so-called canonical mappings. According to the proof Theorem 29.5 the random mapping generating a given support function sp on a sub-semi-lattice E ⊆ Φ is defined on a proba- bility space (X, A,Psp), where X is the set of binary support functions with values in {0, 1} on E and A the Borel algebra in X. We noted already in Section 29.2, that the elements of X are in a one-to-one correspondence with ideals in E. Therefore, if IE denotes the set of ideals in E, we may as well consider the probability space (IE , A,Psp), where A, by abuse of notation, now denotes the image of the former Borel algebra A in X and Psp the corresponding probability measure. The random mapping generating sp is then defined as ν(I) = {φ ∈ Φ: ∃ψ ∈ I, φ ≤ ψ}. That is, the ideal I ∈ IE is extended to an ideal ν(I) ∈ IΦ. Any ideal J ∈ IΦ becomes by restriction to E an ideal in J/E ∈ IE . So, we have a mapping π : IΦ → IE defined by π(J) = J/E. This mapping is called the projection of IΦ to IJ/E . It is onto: In fact, if I ∈ IE , then ν(I) ∈ IΦ and π(ν(I)) = I. Let

−1 π (I) = {J ∈ IΦ; π(J) = I}.

−1 Then the family of sets π (I) for all I ∈ IE forms a partition of IΦ. Further, if K ∈ π−1(I), then we claim that ν(I) ⊆ K. In fact, consider φ ∈ ν(I). Then there is a ψ ∈ I such that φ ≤ ψ. But we must also have ψ ∈ K, since K/E = I, hence φ ∈ K. Thus, ν(I) is the minimal ideal in π−1(I), minimal in the sense of the order in the information algebra IΦ. In view of these considerations, by Theorem 29.5, the random mapping 0 ν, defined on a probability space (IE , A,Psp) generates the support function sp on E, that is

0 Psp{I ∈ IE : φ ∈ ν(I)} = sp(φ) for all φ ∈ E. Of course this implies that E is contained in the set Eν of ν-measurable elements of Φ, i.e. E ⊆ Eν. As before we denote by sν the allocation of support associated with the random mapping ν,

sν(φ) = {I ∈ IE : φ ∈ ν(I)}.

Then we have sν(φ) ∈ A for all φ ∈ E. Let then Asp denote the σ-algebra generated by the sets sν(φ) for φ ∈ E. Thus, Asp ⊆ A is the smallest sub- σ-algebra in A, with respect to which the allocations of support sν(φ) are measurable for all φ ∈ E. Let finally Psp be the restriction of the probability 29.3. EXTENSION OF SUPPORT FUNCTIONS 265

0 measure Psp to Asp. The random mapping ν defined on the new probability space (IE , Asp,Psp) still generates the support function sp on E. It is convenient, for the comparison of random mappings, to transport −1 the probability space to the set IΦ. As we have seen π (I) defines for I ∈ IE −1 a partition of IΦ. The family of subsets π (A) of IΦ for A ∈ Asp form a σ-algebra of subsets of IΦ, and by Psp(A) a probability measure is defined on this σ-algebra of subsets of IΦ. By abuse of notation we denote this new probability space by (IΦ, Asp,Psp). The random mapping Γsp = ν ◦ π defined on this new probability space still generates the support function sp on E. We call this random mapping Γsp canonical and emphasize the dependence of the mapping on the support function sp and its domain E. In fact, the σ-algebra Asp is uniquely determined by the domain E of sp. At this point however, it is not clear whether the probability measure also is uniquely determined by sp. Of course it is uniquely determined on the sets −1 π (sν(φ)) for φ ∈ E. This is sufficient for the time being. Later it will turn out that it is in fact uniquely determined on the whole of Asp. As in Section 27.1, we may define the probability algebra (Asp/Jsp, µsp) associated with the probability space (IΦ, Asp,Psp), the application ρ0 : IΦ → Asp/Jsp and then finally

spΓsp (φ) = µsp(ρ0(sΓsp (φ))) = Psp∗ (spΓsp (φ)). (29.5)

The mapping spΓsp extends sp from E to Φ. We shall show that spΓsp is in fact a support function, and indeed the smallest support function, which extends sp to Φ. Consider two sub-semi-lattices E and E0 of Φ, such that E ⊆ E0 and further support functions sp and sp0 with domains E and E0 respectively, such that sp0 is an extension of sp from E to E0, i.e. sp0(φ) = sp(φ) for all φ ∈ E. We want to compare the canonical random mappings associated with the two support functions.

Lemma 29.4 Let (IΦ, Asp,Psp) and (IΦ, Asp0 ,Psp0 ) associated with the canon- 0 ical random mappings Γsp and Γsp of the support functions sp and sp . Then

1. Γsp(I) ≤ Γsp0 (I) for all I ∈ IΦ.

2. Asp ⊆ Asp0 .

0 Proof. From E ⊆ E it follows that νsp(I) ⊆ νsp0 (I). This implies (1). 0 0 Also, if φ ∈ E, then φ ∈ E , hence sΓsp (φ) ∈ Asp , which implies (2). ut In some sense the canonical random mapping associated with the exten- sion sp0 is a refinement of the mapping corresponding to sp: It focuses to richer information Γsp(I) ≤ Γsp0 (I). Further the probability measure Psp0 is an extension of Psp, hence the probability information of the mapping Γsp0 is also richer than the one of Γsp. This issue will be further discussed in a later 266 CHAPTER 29. SUPPORT FUNCTIONS chapter, when the information content of random mappings is examined. Lemma 29.4 permits now to compare the extensions sp and sp of the Γsp Γsp0 support functions sp and sp0. In fact, the refined information allocates a larger degree of support to any hypothesis.

Lemma 29.5 Let sp and sp be the extensions defined in (29.5). Then, Γsp Γsp0 for all φ ∈ Φ,

sp (φ) ≤ sp (φ). Γsp Γsp0

Proof. By Lemma 29.4 (1) we have Γsp(I) ⊆ Γsp0 (I) for all ideals I ∈ IΦ. This implies for the allocations of support that

s (φ) = {I ∈ I : φ ∈ Γ (I)} ⊆ {I ∈ I : φ ∈ Γ 0 (I)} = s (φ). Γsp φ sp φ sp Γsp0 Further, also by Lemma 29.4 (2) we have that the inner probability with respect to Psp equals at most the inner probability with respect to Psp0 , 0 i.e. Psp∗ (A) ≤ Psp∗ (A) for all subsets A of IΦ. These two considerations together imply for all φ ∈ Φ, that

sp (φ) = P (s (φ)) ≤ P 0 (s (φ)) = sp (φ). Γsp sp∗ Γsp sp∗ Γsp0 Γsp0 This proves the lemma. ut Next, we introduce an alternative extension of a support function on E to the whole of Φ, which is shown to be indeed a support function on Φ. This extension has been introduced in (Shafer, 1979) and has there been called the canonical extension, because it is the minimal extension of sp to Φ.

Theorem 29.7 Let sp be a support function on a sub-semi-lattice E of Φ. Then

X |I|−1 sp¯ (φ) = sup { {(−1) sp(⊗i∈I φi) } ∅6=I⊆{1,...,n}

φi ∈ E, φi ≥ φ, i = 1, . . . , n; n = 1, 2,...}. (29.6) is a support function on Φ, and sp¯ (φ) = sp(φ) for all φ ∈ E. Further, sp¯ (φ) is the minimal extension of sp to Φ, i.e. if sp0 is another extension of sp to Φ, then

sp¯ (φ) ≤ sp0(φ) (29.7) for all φ ∈ Φ.

Proof. Let ρ be the a.o.p. associated to the support function sp via the canonical random mapping Γsp corresponding to sp and µ the probability 29.3. EXTENSION OF SUPPORT FUNCTIONS 267 measure in the probability algebra associated with the probability space of the canonical mapping Γsp. This means that sp(φ) = µ(ρ(φ)) for all φ ∈ E. We define for φ ∈ Φ ρ¯(φ) = ∨{ρ(ψ): ψ ∈ E, φ ≤ ψ}. We claim thatρ ¯ is an a.o.p. on Φ, which extends ρ on E. Certainly, if φ ∈ E, thenρ ¯(φ) = ρ(φ). So, in particular,ρ ¯(e) = >. Further,ρ ¯(φ1) ≥ ρ¯(φ1 ⊗ φ2), ρ¯(φ2) ≥ ρ¯(φ1 ⊗ φ2), henceρ ¯(φ1) ∧ ρ¯(φ2) ≥ ρ¯(φ1 ⊗ φ2). On the other hand, let ψi ∈ E, φi ≤ ψi, for i = 1, 2. Then

ρ(ψ1) ∧ ρ(ψ2) = ρ(ψ1 ⊗ ψ2), and ψ1 ⊗ ψ2 ∈ E, as well as φ1 ⊗ φ2 ≤ ψ1 ⊗ ψ2. From this we conclude that

ρ(ψ1) ∧ ρ(ψ2) ≤ ρ¯(φ1 ⊗ φ2),

Nowρ ¯(φi) ≤ ρ(ψi) for i = 1, 2. Therefore, we obtainρ ¯(φ1) ∧ ρ¯(φ2) ≤ ρ(ψ1) ∧ ρ(ψ2) ≤ ρ¯(φ1 ⊗ φ2), so, finally,ρ ¯(φ1) ∧ ρ¯(φ2) =ρ ¯(φ1 ⊗ φ2). This proves thatρ ¯ is an a.o.p. on Φ. Next we show thatsp ¯ = µ ◦ ρ¯. This proves then by Lemma 29.3 that sp¯ is indeed a support function. Now, if we replace the terms sp(⊗i∈I φi) on the right hand side of 29.6 by their identical values in terms of the a.o.p. ρ, we obtain, by the inclusion-exclusion formula of probability,

X |I|−1 sp¯ (φ) = sup { (−1) µ(∧i∈I ρ(φi)) : ∅6=I⊆{1,...,n}

φi ∈ E, φi ≥ φ, i = 1, . . . , n; n = 1, 2,...} n = sup{µ(∨i=1ρ(φi)) : φi ∈ E, φi ≥ φ, i = 1, . . . , n; n = 1, 2,...}. (29.8)

n The family of elements ∨i=1ρ(φi) of the probability algebra B, for φi ∈ n E, φi ≥ φ forms an upward directed subset in B, because (∨i=1ρ(φi)) ∨ m m (∨i=n+1ρ(φi)) ≤ ∨i=1ρ(φi). Therefore, by Lemma 27.1 in Section 27.2, the last term in (29.8) above equals

n µ(∨{∨i=1ρ(φi): φi ∈ E, φi ≥ φ, i = 1, . . . , n; n = 1, 2,...}) = µ(∨{ρ(ψ): ψ ∈ E, φ ≤ ψ}) = µ(¯ρ(φ)). Here the associative law for joins in complete lattices is used. So,sp ¯ = µ ◦ ρ¯ as claimed above. Sinceρ ¯ is an extension of ρ, it follows thatsp ¯ is an extension of sp to Φ. Finally, let sp0 be any support function on Φ, which on E coincides with sp. Consider any φ ∈ Φ and collection φ1, . . . , φn ∈ E such that φ ≤ φ1, . . . , φn. Then, by Theorem 29.2

0 X |I|+1 sp (φ) ≥ (−1) sp(⊗i∈I φi). ∅6=I⊆{1,...,n} 268 CHAPTER 29. SUPPORT FUNCTIONS

This implies

0 X |I|+1 sp (φ) ≥ sup{ (−1) sp(⊗i∈I φi): ∅6=I⊆{1,...,n}

φi ∈ E, φi ≥ φ, i = 1, . . . , n; n = 1, 2,...} =sp ¯ (φ).

This proves the inequality (29.7), which means thatsp ¯ is the minimal sup- port function, which extends sp to Φ. ut The main result is now that the canonical extension introduced above is identical to the extension spΓsp of sp. This proves at the same time that spΓsp is indeed a support function, and it is the minimal extension of sp to Φ. It proves by the way also that the probability Psp underlying the canonical random mapping associated with sp must be unique, since the support functionsp ¯ depends only on sp and its domain E.

Theorem 29.8 Let sp be a support function on a sub-semi-lattice E of Φ. Then

spΓsp (φ) =sp ¯ (φ) for all φ ∈ Φ.

Proof. Sincesp ¯ is a support function on Φ, there is an associated canon- ical random mapping Γsp¯ defined on a probability space (IΦ, Asp¯ ,Psp¯ ). Note that we have spΓsp¯ =sp ¯ . According to Lemma 29.5 we have then

spΓsp (φ) ≤ sp¯ (φ).

Note further that, if φ ≤ ψi for i = 1, . . . , n, then sΓsp (ψi) ⊆ sΓsp (φ), therefore

n ∪i=1sΓsp (ψi) ⊆ sΓsp (φ) Therefore, if the canonical random mapping generating sp is defined on the probability space (IΦ, Asp,Psp), then, for all φ ∈ Φ we have spΓsp (φ) = Psp∗ (sΓsp (φ)) n ≥ sup{Psp(∪i=1sΓsp (ψi)) : φ ≤ ψi, ψi ∈ E, i = 1, . . . , n, n = 1, 2,...} X |I|−1 = sup{ (−1) Psp(∩i∈I sΓsp (ψi)) : ∅6=I⊆{1,...,n}

φ ≤ ψi, ψi ∈ E, i = 1, . . . , n, n = 1, 2,...} X |I|−1 = sup (−1) sp(⊗i∈I ψi): ∅6=I⊆{1,...,n}

φ ≤ ψi, ψi ∈ E, i = 1, . . . , n, n = 1, 2,...} =sp ¯ (φ). 29.3. EXTENSION OF SUPPORT FUNCTIONS 269

So finally it follows that spΓsp (φ) =sp ¯ (φ) for all φ ∈ Φ ut This result is a further justification of the extension of a support function of a random mapping by the inner probability of the underlying probability space. The results obtained so far may be extended to the case of continuous support functions, although only under the additional assumption that the domain E of the support function is a lattice. In this case, there is first a simpler representation of the canonical extension of a support function.

Theorem 29.9 Let sp be a support function defined on E ⊆ Φ. If E is a lattice then for all φ ∈ Φ

sp¯ (φ) = sup{sp(ψ): ψ ∈ E, φ ≤ ψ}. (29.9)

Proof. We have by (29.6) for ψ ∈ E such that ψ ≥ φ thatsp ¯ (φ) ≥ sp(ψ), hence

sp¯ (φ) ≤ sup{sp(ψ): ψ ∈ E, φ ≤ ψ}.

In order to prove the converse inequality let ρ be the a.o.p. associ- ated with the support function sp via the canonical random mapping Γsp corresponding to sp and µ the probability measure in the probability al- gebra associated with the canonical random mapping Γsp. Then we have sp(φ) = µ(ρ(φ)) for all φ ∈ E. In addition let sΓsp denote the allocation of support associated with the random mapping Γsp. Further we note that (using the assumption that E is a lattice), for ψ1, . . . , ψn ∈ E,

n n n ρ(∧i=1ψi) = [sΓsp (∧i=1ψi)] ≥ [(∪i=1(sΓsp (ψi)] n n = ∨i=1[(sΓsp (ψi] = ∨i=1ρ(ψi).

Now, using this inequality in (29.8), we obtain that

n sp¯ (φ) = sup{µ(∨i=1ρ(ψi)) : ψi ∈ E, ψi ≥ φ, i = 1, . . . , n; n = 1,...} n ≥ sup{µ(ρ(∪i=1ψi)) : ψi ∈ E, ψi ≥ φ, i = 1, . . . , n; n = 1,...} ≥ sup{µ(ρ(ψ)) : ψ ∈ E, ψ ≥ φ} = sup{sp(ψ): ψ ∈ E, ψ ≥ φ}.

This proves the theorem. ut Now, in the case of continuous support functions, the results on the extension of a support function to Φ can be extended as follows:

Theorem 29.10 Assume the information algebra Φ to be a σ-algebra, and sp a continuous support function defined on a distributive lattice E ⊆ Φ, which is closed under countable combinations. Then 270 CHAPTER 29. SUPPORT FUNCTIONS

1. The canonical extension sp¯ is continuous, i.e.

∞ sp¯ (⊗i=1φi) = lim sp¯ (φi), i→∞

for every monotone sequence φ1 ≤ φ2 ≤ ... ∈ Φ.

2. For all φ ∈ Φ,

∞ sp¯ (φ) = sup{ lim sp(φi): φ1 ≤ φ2 ≤ ... ∈ E, φ ≤ ⊗i=1φi}. (29.10) i→∞

Proof. As in the proof of the preceding theorem let ρ be the a.o.p. as- sociated with the support function sp via the canonical random mapping Γsp corresponding to sp and µ the probability measure in the probabil- ity algebra associated with the canonical random mapping Γsp, such that sp(φ) = µ(ρ(φ)) for all φ ∈ E. In addition let sΓsp again denote the allo- cation of support associated with the random mapping Γsp. Note that for ψi ∈ E for i = 1,..., by Theorem 29.1

∞ ∞ ∞ ρ(∨i=1ψi) = [sΓsp (∨i=1ψi)] = [∩i=1sΓsp (ψi)] ∞ ∞ = ∧i=[sΓsp (ψi)] = ∧i=ρ(ψi). (29.11)

We define first for any φ ∈ Φ

ρ¯(φ) = ∨{ρ(ψ): ψ ∈ E, φ ≤ ψ}.

Note then that

ρ¯(φ) = ∨{ρ(∨i∈I ψi): I countable set ψi ∈ E, φ ≤ ∨i∈I ψi}

= ∨{ ∧i∈I ρ(ψi): I countable set ψi ∈ E, φ ≤ ⊗i∈I ψi}.

i With ψi, i ∈ I we may associate the monotone sequence ηi = ∨j=1ψj such ∞ that ∨i∈I ψi = ∨i=1ηi. This shows finally that

∞ ∞ ρ¯(φ) = ∨{ ∧i=1 ρ(ψi): ψ1 ≤ ψ2 ≤ ... ∈ E, φ ≤ ∨i=1ψi} holds too. Then the proof proceeds as follows: We show thatρ ¯ is a σ-a.o.p. and that µ ◦ ρ¯ equals the right hand side of (29.10). Applying Lemma 29.3 we conclude then that the right hand side of (29.10) defines a continuous support function on Φ. Finally we are going to show thatsp ¯ equals the right hand side of (29.10). This proves then the two parts of the theorem. We are now going to show that

∞ ∞ ρ¯(∨i=1φi) = ∧i=1ρ(φi). (29.12) 29.3. EXTENSION OF SUPPORT FUNCTIONS 271

Define the sets

∞ D = {ρ(ψ): ψ ∈ E, ∨i=1φi ≤ ψ}, Di = {ρ(ψ): ψ ∈ E, φi ≤ ψ}.

∞ Then it holds that ∨D =ρ ¯(∨i=1φi) and ∨Di =ρ ¯(φi) such that we have to ∞ ∞ prove ∨D = ∧i=1(∨Di). Since φi ≤ ∨i=1φi ≤ ψ it follows that D ⊆ Di for all i, hence ∨D ≤ ∨Di for all i = 1,..., therefore

∞ ∨D ≤ ∧i=1(∨Di) = M.

It remains to show that ∨D ≥ M. We remark that the set Di is upwards directed: If ρ(ψ1), ρ(ψ2) ∈ Di, then ψ1, ψ2 ∈ E and φi ≤ ψ1, ψ2, therefore ψ1 ∧ ψ2 ∈ E (since E is assumed to be a lattice) and φi ≤ ψ1 ∧ ψ2. This implies that ρ(ψ1), ρ(ψ2) ≤ ρ(ψ1 ∧ ψ2) ∈ Di. It follows then from Lemma 27.1 that

µ(∨Di) = sup{µ(ρ(ψ)) : ψ ∈ E, φi ≤ ψ}.

Therefore, for any  > 0 there exists an element ρ(ψi,) ∈ Di such that  ≥ µ(∨D ) − µ(ρ(ψ )) = µ(∨D − ρ(ψ )). 2 · i i i, i i,

Since M ≤ ∨Di we obtain

c c µ(M − ρ(ψi,)) = µ(M ∧ (ρ(ψi,)) ) ≤ µ(∨Di ∧ (ρ(ψi,)) )  = µ(∨D − ρ(ψ ))) ≤ . i i, 2 · i ∞ ∞ ∞ Now, φi ≤ ψi, implies ∨i=1φi ≤ ∨i=1ψi, and ∨i=1ψi, ∈ E because E is supposed to be closed under countable combinations. This implies by (29.11) that

∞ ∞ M = ∧i=1ρ(ψi,) = ρ(∨i=1ψi,) ∈ D. Further,

c ∞ c ∞ c M − M = M ∧ M = M ∧ (∧i=1ρ(ψi,)) = M ∧ (∨i=1ρ(ψi,) ) ∞ c ∞ = ∨i=1 (M − ρ(ψi,) ) = ∨i=1(M − ρ(ψi,)). From this we obtain ∞ X  µ(M − M ) = µ(∨∞ (M − ρ(ψ ))) ≤ = .  i=1 i, 2 · i i=1

c c Then, finally, M ≤ ∨D or M ≥ (∨D) and therefore

c c c M − ∨D = M ∧ (∨D) ≤ M ∧ M = M − M . 272 CHAPTER 29. SUPPORT FUNCTIONS

c Thus we conclude that µ(M − ∨D) ≤ µ(M − M ) ≤ . Since  is arbitrarily small, it follows µ(M − ∨D) = 0, hence from ∨D ≤ M it follows that ∨D = M. This proves (29.12). Let now spcm denote the right hand side of (29.10). We are next going to show that spcm = µ ◦ ρ¯. In fact, ∞ spcm(φ) = sup{ lim sp(φi): φ1 ≤ φ2 ≤ ... ∈ E, φ ≤ ⊗i=1φi} i→∞ ∞ ∞ = sup{sp(⊗i=1φi): φ1 ≤ φ2 ≤ ... ∈ E, φ ≤ ⊗i=1φi} ∞ ∞ = sup{µ(ρ(⊗i=1φi)) : φ1 ≤ φ2 ≤ ... ∈ E, φ ≤ ⊗i=1φi} ∞ ∞ Let’s show now that the set {ρ(⊗i=1φi)) : φ1 ≤ φ2 ≤ ... ∈ E, φ ≤ ⊗i=1φi} is ∞ ∞ 0 upward directed. Let ρ(⊗i=1φi) and ρ(⊗i=1φi) be two elements of this set. Since E is assumed to be lattice, and ρ(φ) ∨ ρ(ψ) ≤ ρ(φ ∧ ψ), it follows that ∞ ∞ ∞ 0 ρ(⊗i=1φi) ≤ ρ(⊗i=1φi) ∨ ρ(⊗i=1φi) ∞ ∞ 0 ≤ ρ((⊗i=1φi) ∧ (⊗i=1φi)) ∞ 0 = ρ(⊗i=1(φi ∧ φi)) ∞ 0 ∞ 0 and similarily ρ(⊗i=1φi) ≤ ρ(⊗i=1(φi ∧ φi)). Here we use the assumption that the lattice E is distributive. So we conclude that ∞ ∞ 0 ∞ 0 ρ(⊗i=1φi) ∨ ρ(⊗i=1φi) ≤ ρ(⊗i=1(φi ∧ φi)). ∞ 0 0 The element ρ(⊗i=1(φi ∧ φi)) belongs to the set, since φi ∧ φi ∈ E, further 0 0 ∞ 0 φ1 ∧ φ1 ≤ φ2 ∧ φ2 ≤ ... and φ ≤ ⊗i=1(φi ∧ φi). Then by Lemma 27.1 that ∞ ∞ spcm(φ) = sup{µ(ρ(⊗i=1φi)) : φ1 ≤ φ2 ≤ ... ∈ E, φ ≤ ⊗i=1φi} ∞ ∞ = µ(∨{ρ(⊗i=1φi)) : φ1 ≤ φ2 ≤ ... ∈ E, φ ≤ ⊗i=1φi}) = µ(¯ρ(φ)).

Therefore spcm is indeed a support function, which extends sp from E to all of Φ. This implies thatsp ¯ (φ) ≤ spcm(φ for all φ ∈ Φ, sincesp ¯ is the minimal extension of sp (Theorem 29.7). Finally we sow thatsp ¯ = spcm. As sp is a continuous support func- tion on the lattice E, it follows that the allocation of support sΓsp satisfies ∞ ∞ sΓsp (⊗i=1φi) = ∩i=1sΓsp (φi). Let A be the σ algebra associated with the canonical ran dom mapping induce by sp and P the associated probability measure. Then we have

sp¯ (φ) = sup{P (A): A ∈ A,A ⊆ sΓsp (φ)}

≥ sup{P (sΓsp (ψ)) : ψ ∈ E, φ ≤ ψ} ∞ ∞ ≥ sup{P (sΓsp (⊗i=1ψi)) : ψ1 ≤ ψ2 ≤ ... ∈ E, φ ≤ ⊗i=1ψi} ∞ ∞ = sup{P (∩i=1sΓsp (ψi)) : ψ1 ≤ ψ2 ≤ ... ∈ E, φ ≤ ⊗i=1ψi} ∞ = sup{P ( lim sΓsp (ψi)) : ψ1 ≤ ψ2 ≤ ... ∈ E, φ ≤ ⊗i=1ψi} i→∞ ∞ = sup{P ( lim sp(ψi)) : ψ1 ≤ ψ2 ≤ ... ∈ E, φ ≤ ⊗i=1ψi} i→∞ = spcm(φ). 29.3. EXTENSION OF SUPPORT FUNCTIONS 273

So,sp ¯ = spcm, which is continuous and satifies (refeq:ContExt). this com- pletes the proof. ut

Of course, it still holds that spΓsp =sp ¯ . So the property stated in

Theorem 29.9 apply to the extension spΓsp too. If the domain E of a support function is part of a compact information algebra (Φ, Φf ,D), then, as we know, Φ is a complete lattice. If, in addition, E is a complete lattice too, then even more stringent statements can be made about the canonical extension of sp to Φ. In particular, we claim that, under some circumstances, the finite elements determine in a simple way the canonical extension of the support function to Φ. This holds, if any element of E is in some sense arbitrarily near a coarser finite and measurable element in E ∩ Φf . In this case the canonical extensionsp ¯ to all of Φ is in fact an extension of the restriction ofsp ¯ to the finite elements.

Theorem 29.11 Let (Φ, Φf ,D) be a compact information algebra, and sp a support function with domain E ⊆ Φ, where E is a complete lattice, and for all φ ∈ E and  > 0 there is a ψ ∈ E ∩ Φf such that ψ ≤ φ and sp(η) − sp(φ) <  for all η ∈ E such that ψ ≤ η ≤ φ. Then

sp¯ (∨φ∈X φ) = inf sp¯ (φ) (29.13) φ∈X for any directed subset X of Φ. In particular, for all φ ∈ Φ, it holds that

sp¯ (φ) = inf sp¯ (ψ). ψ∈Φf ,ψ≤φ

Proof. Let ρ be the a.o.p. associated with the canoncical random map- ping corresponding to the support function sp defined on E and let (B, µ) be the probability algebra derived from the canonical random mapping. Then it holds that sp = µ ◦ ρ. We define now

ρˆ(φ) = ∧{ρ¯(ψ): ψ ∈ Φf , ψ ≤ φ}, andsp ˆ = µ ◦ ρˆ. Now, the set of finite elements ψ ≤ φ is an directed set, hence the set ρ(ψ) for finite elements ψ ≤ φ is downwards directed in B. Hence, by Lemma 27.1 we obtain

µ(ˆρ(φ)) = µ(∧ψ∈Φf ,ψ≤φρ¯(ψ)) = inf µ(¯ρ(ψ)) = inf sp¯ (ψ). ψ∈Φf ,ψ≤φ ψ∈Φf ,ψ≤φ

The proof goes as follows: We show thatρ ˆ|E = ρ thatρ ˆ is an a.o.p., and thatρ ˆ(∨X) = ∨φ∈X ρˆ(φ) for any subset X ⊆ Φ. This proves then thatsp ˆ is a support function, which is an extension of sp and fulfills (29.13). Then we show thatsp ¯ =sp ˆ . This proves (29.13). The second part of the theorem is an immediate consequence of (29.13). 274 CHAPTER 29. SUPPORT FUNCTIONS

Suppose then φ ∈ E. For any ψ ∈ Φf such that ψ ≤ φ we have also ρ¯(ψ) ≥ ρ¯(φ) = ρ(φ), henceρ ˆ(φ) ≥ ρ(φ). In order to prove thatρ ˆ(φ) = ρ(φ), we fix an  > 0 and choose a ψ ∈ E ∩ Φf such that ψ ≤ φ and sp(η) − sp(φ) < /2 for all η ∈ E such that ψ ≤ η ≤ φ. Now, since φ ∈ E,

ρˆ(φ) = ∧{∨{ρ(η): η ∈ E, ψ ≤ η} : ψ ∈ Φf , ψ ≤ φ}

= ∧{∨{ρ(η): η ∈ E, ψ ≤ η ≤ φ} : ψ ∈ Φf , ψ ≤ φ}

≤ ∨{ρ(η): η ∈ E, ψ ≤ η ≤ φ}.

Denote this last element of B by m. The family {ρ(η): η ∈ E, ψ ≤ η ≤ φ} is an upward directed set in B: Indeed, if ρ(η1) and ρ(η2) are both in the set, then so is ρ(η1 ∧ η2), which is larger than both ρ(η1) and ρ(η2). Note that here we use the assumption that E is a lattice. This implies that

µ(m) = sup{ρ(η): η ∈ E, ψ ≤ η ≤ φ}.

Hence there is a η ∈ E such that ψ ≤ η ≤ φ and  µ(m ) − µ(ρ(η )) = µ(m − ρ(η )) < .     2

Since ρ(φ) ≤ ρˆ(φ) ≤ m it follows that

µ(ˆρ(φ) − ρ(φ)) ≤ µ(m) − µ(ρ(φ)) = |µ(m) − sp(φ)|   = |µ(m ) − sp(η ) + sp(η ) − sp(φ)| < + = .    2 2 Since  > 0 is arbitrary, it follows that µ(ˆρ(φ) − ρ(φ)) = 0 which, together withρ ˆ(φ) ≥ ρ(φ), implies thatρ ˆ(φ) = ρ(φ), henceρ ˆ|E = ρ. Since e ∈ E and Φf it follows thatρ ˆ(e) = ρ(e) = >. Further, let φ1, φ2 ∈ Φ. Then

ρˆ(φ1) = ∧{ρ¯(ψ): ψ ∈ Φf , ψ ≤ φ1}

≥ ∧{ρ¯(ψ): ψ ∈ Φf , ψ ≤ φ1 ⊗ φ2}

=ρ ˆ(φ1 ⊗ φ2).

Similarily, it holds also thatρ ˆ(φ2) ≥ ρˆ(φ1 ⊗ φ2), henceρ ˆ(φ1) ∧ ρˆ(φ2) ≥ ρˆ(φ1 ⊗ φ2). On the other hand,

ρˆ(φ1 ⊗ φ2) = ∧{ρ¯(ψ): ψ ∈ Φf , ψ ≤ φ1 ⊗ φ2}

≤ ∧{ρ¯(ψ1 ⊗ ψ2): ψ1, ψ2 ∈ Φf , ψ1 ≤ φ1, ψ2 ≤ φ2}

= ∧{ρ¯(ψ1) ∧ ρˆ(ψ2): ψ1, ψ2 ∈ Φf , ψ1 ≤ φ1, ψ2 ≤ φ2}

= (∧{ρˆ(ψ1): ψ1 ∈ Φf , ψ1 ≤ φ1}) ∧ (∧{ρˆ(ψ2): ψ2 ∈ Φf , ψ2 ≤ φ2})

=ρ ˆ(φ1) ∧ ρˆ(φ2).

So, we conclude thatρ ˆ(φ1 ⊗ φ2) =ρ ˆ(φ1) ∧ ρˆ(φ2) and this shows thatρ ˆ is an a.o.p. 29.3. EXTENSION OF SUPPORT FUNCTIONS 275

Let X be an arbitrary subset of Φ. We claim thatρ ˆ(∨X) = ∧φ∈X ρˆ(φ). Using Lemma 27.1 it follows then thatsp ˆ (∨X) = infφ∈X spˆ (φ), if X is directed. Now, by compactness, if ψ ∈ Φf and ψ ≤ ∨X, if, and only if, there is a finite subset Y in X such that ψ ≤ ∨Y . Therefore,

ρˆ(∨X) = ∧{ρ¯(ψ): ψ ∈ Φf , ψ ≤ ∨X}

= ∧{ρ¯(ψ): ψ ∈ Φf , ψ ≤ ∨Y, for some finite Y ⊆ X}

= ∧ (∧{ρ¯(ψ): ψ ∈ Φf , ψ ≤ ∨Y }): Y ⊆ X,Y finite} = ∧{ρˆ(∨Y ): Y ⊆ X,Y finite} = ∧ (∧{ρˆ(φ): φ ∈ Y }): Y ⊆ X,Y finite} = ∧{ρˆ(φ): φ ∈ X}.

So,sp ˆ is indeed a support function on Φ satisfying (??). We remark that so far we did not yet use the assumption that E is a complete lattice. Thus, this result holds without this assumption. Finally we are going to prove thatsp ˆ =sp ¯ if E is a complete lattice. Note in particular, that in this case sp satisfies (29.13) on E, sincesp ˆ is an extension of sp. Assume that sp0 is an support function on Φ, which satisfies (29.13) and whose restriction to E equals sp. Then fromTheorem 29.7 we know thatsp ¯ ≤ sp0. Further, since sp0 is assumed to satisfy (29.13), we obtain

sp0(φ) = inf sp0(ψ) ≥ inf sp¯ (ψ) =sp ˆ (φ). ψ∈Φf ,ψ≤φ ψ∈Φf ,ψ≤φ

This shows thatsp ˆ is the minimal extension of sp from E to Φ, which satisfies (29.13). We show next thatsp ¯ satisfies (29.13) if E is a complete lattice. Thensp ¯ ≥ spˆ . But the converse inequality holds as we have seen above, whencesp ¯ =sp ˆ as claimed. This concludes then the proof. If E is a complete lattice, then a mapping θ :Φ → E can be defined as follows:

θ(φ) = ∧{ψ : ψ ∈ E, φ ≤ ψ}.

Then it follows thatsp ¯ = sp ◦ θ, since the set {ρ(ψ): ψ ∈ E, φ ≤ ψ} is upwards directed (Lemma 27.1),

sp(θ(φ)) = sp(∧{ψ : ψ ∈ E, φ ≤ ψ}) = µ(ρ(∧{ψ : ψ ∈ E, φ ≤ ψ})) = µ(∨{ρ(ψ): ψ ∈ E, φ ≤ ψ}) = sup{µ(ρ(ψ)) : ψ ∈ E, φ ≤ ψ} = sup{sp(ψ): ψ ∈ E, φ ≤ ψ} =sp ¯ (φ), by Theorem 29.9. Further, for an arbitrary subset X ⊆ Φ, certainly θ(φ) ≤ θ(∨X) if φ ∈ X and therefore ∨φ∈X θ(φ) ≤ θ(∨X). Let η ∈ E be any upper bound of θ(φ) for all φ ∈ X. Then φ ≤ θ(φ) = ∧{ψ : ψ ∈ E, φ ≤ ψ} ≤ η for any φ ∈ X. Therefore we conclude that ∨X ≤ η, hence θ(∨X) ≤ η, 276 CHAPTER 29. SUPPORT FUNCTIONS which shows that θ(∨X) is the lowest upper bound of the θ(φ) for φ ∈ Φ, i.e. ∨φ∈X θ(φ) = θ(∨X). Let now X ⊆ Φ be a directed set. Then the set θ(φ) for φ ∈ X is directed too. This implies that

sp¯ (∨X) = sp(θ(∨X)) = sp(∨φ∈X θ(φ)) = inf sp(θ(φ)) = inf sp¯ (φ). φ∈X φ∈X

Sosp ¯ fulfills indeed (29.13) and this concludes the proof. ut A support function sp which can be extended to a support function, which satisfies (29.13), is called condensable (Shafer, 1979). Thus, Theorem 29.11 gives a sufficient condition for a support function to be condensable. If Φf ⊆ E, and the conditions of Theorem 29.11 are satisfied, then it follows that the canonical extension of sp is fully determined by the values of sp on the finite elements,

sp¯ (φ) = inf sp(ψ). ψ∈Φf ,ψ≤φ

In this casesp ¯ depends only on the allocation of support to the finite ele- ments Φf of the algebra. Chapter 30

Random Sets

30.1 Set-Valued Mappings: Finite Case

Subset algebras are important instances of information algebras. Therefore, we are going to study in this chapter random mappings into subset algebras. In particular, we examine how the general theory of Chapters 26 to 29 specializes to this important case and in which way the theory may be developed further, taking advantage of the additional structure of subset algebras. In particular in this first section we discuss the elementary case of the subset algebra of a finite set. In the next Section 30.2 general subset algebras will be considered. There is a well-known theory of random sets (?) which is originally not related to out theory based on information algebras. In Section ?? we discuss how this theory is related to ours. Finally, in Sections ?? and ?? two important special cases will be considered. Consider a finite set U, called the universe. The associated power set P(U) forms a Boolean algebra. We may consider it as an information algebra with intersection as combination and a single domain D = {U}. This corre- sponds to the view that the elements of U constitute the possible answers to a fixed question; and its subsets, i.e. the elements of the power algebra, are possible pieces of information relative to this question. The partial order of information A ≤ B means then B ⊆ A. Uncertain information relative to U is then given by a random mapping into U. Without loss of generality we may consider a finite sample space Ω = {ω1, . . . , ωn} together with a probability distribution pi = P (ωi) for i = 1, . . . , n. Let then Γ : Ω → P(U) denote such a random mapping into U. In terms of the general theory this represents a simple random variable in the algebra (P(U), {U}).

If Γ(ωi) 6= ∅ for all ωi ∈ Ω, then we call the random variable normalized. If Γ(ωi) 6= Γ(ωj), whenever i 6= j, then the variable is called canonical (see also Section26.1). To any random variable Γ we may associate a correspond- ↓ ↓ ing normalized variable Γ , where Γ (ωi) = Γ(ωi) for all ωi ∈ Ω such that

277 278 CHAPTER 30. RANDOM SETS

↓ Γ(ωi) 6= ∅. Further define Ω = {ωi ∈ Ω : Γ(ωi) 6= ∅} and p p↓ = i . i P (Ω↓) This can be interpreted as follows: Sample points or assumptions ω, which map to an empty set are not really possible assumptions. Therefore they can be eliminated and the probability conditioned on the event or informa- tion that only assumptions which refer to a proper information Γ(ωi 6= ∅ are possible. Similarly to any random variable Γ we can associate a corre- sponding canonical variable: Consider the equivalence relation ωi ≡ ωj if → Γ(ωi)) = Γ(ωj). Let then Ω be the set of equivalence classes and define

→ X p ([ωi]) = pj,

j:ωj ∈[ωi]

→ ↓ → → ↓ and finally Γ ([ωi]) = Γ(ωi). Note that (Γ ) = (Γ ) . As we have shown in Section 26.4 we may assign a basic probability assignment (bpa) to a simple random variable Γ by X mΓ(A) = P (Γ(ω) = A) = pi,

Γ(ωi)=A for any A ⊆ U. Subsets A of U for which mΓ(A) > 0 are also called focal sets of Γ. The bpa allows to compute the degrees of support associated with the random variable Γ, X spΓ(A) = P (Γ(ω) ⊆ A) = mΓ(B). B⊆A In Section 26.3 we introduced the notion of the possibility of a piece of information. In the case of subsets, the possibility of a subset A relative to a random variable Γ is defined as

c c pΓ(A) = {ωi ∈ Ω : Γ(ωi) ∩ A 6= ∅}{ωi ∈ Ω : Γ(ωi) ⊆ A } . (30.1)

From this it follows for the degrees of possibility

c plΓ(A) = 1 − spΓ(A ).

As noted in Section 26.4 the bpa serves also to compute the degrees of possibility X plΓ(A) = mΓ(B). B∩A6=∅

The case of finite subsets discussed here formally corresponds to the clas- sical Dempster-Shafer theory of evidence as presented in (Shafer, 1976). 30.1. SET-VALUED MAPPINGS: FINITE CASE 279

There the degrees of possibility are also called upper probabilities (a term going back to (Dempster, 1967) and also degree of plausibility. In Dempster- Shafer theory of evidence yet another concept, the commonality numbers are introduced X qΓ(A) = mΓ(B). A⊆B

It has been shown in (Shafer, 1976) that any one of the functions mΓ(A), spΓ(A), plΓ(A) or qΓ(A) suffices to determine the other ones. It has already been shown how the bpa determines the support, possibility and common- ality functions. The following theorem specifies the other relations

Theorem 30.1 Let Γ be a random variable with subsets of an universe U as values. Then the following equations hold for all subsets A of the universe U,

X |A\B| mΓ(A) = (−1) spΓ(B), B⊂A X |B| spΓ(A) = (−1) qΓ(B), B⊂Ac X |B| c qΓ(A) = (−1) spΓ(B ), B⊂A X |B|+1 plΓ(A) = (−1) qΓ(B), B⊂A,B6=∅ X |B|+1 qΓ(A) = (−1) plΓ(B), B⊂A The proofs of these relations are of purely combinatorial nature and will not be reproduced here. We refer to (Shafer, 1976; Kohlas & Monney, 1995) for more details and the proofs. Since the sample space Ω is finite its power set algebra ⊗ together with the probability measure P form a probability algebra. The allocation of support sΓ : U → ⊗ is then at the same time an allocation of probability (a.o.p.). In the same way the mapping pΓ : U → ⊗ defined by pΓ(H) = c c sΓ(H ) (see (30.1) abvove) is an allowment of probability. This means it designates the amount of probability mass under which hypothesis H ⊆ U remains possible. An a.o.p. is characterized by properties (A1) and (A2) of Section28.1. For an allowment of probability as exemplified by pΓ we derive two similar conditions:

c c pΓ(∅) = sΓ(U) = Ω = ∅, c c c c c pΓ(H1 ∪ H2) = sΓ((H1 ∪ H2) ) = sΓ(H1 ∩ H2) c c c c c c c = (sΓ(H1) ∩ sΓ(H2)) = sΓ(H1) ∪ sΓ(H2) = pΓ(H1) ∪ pΓ(H2). 280 CHAPTER 30. RANDOM SETS

Further, if Γ is normalized, then pΓ(U) = Ω. The support function spΓ satisfies of course the characterizing properties of Theorem 29.2 in Section 29.1. In the present case of a subset algebra this translates as follows:

1. spΓ(U) = 1, and, if Γ is normalized, spΓ(∅) = 0.

2. Assume H1,...,Hn ⊆ H ⊆ U, n ≥ 1. Then

X |I|+1 spΓ(H) ≥ (−1) spΓ(∩i∈I Hi). (30.2) ∅6=I⊆{1,...,n}

Property (3) of Theorem 29.2 makes no sense in a finite algebra. Simi- lar properties can also be derived for the plausibility function plΓ, as the following theorem shows.

Theorem 30.2 Let Γ be a random mapping into the power algebra of a finite set U. Then the mapping plΓ has the following properties

1. plΓ(∅) = 0, and, if Γ is normalized, spΓ(U) = 1.

2. Assume U ⊆ H1,...,Hn ⊇ H, n ≥ 1. Then

X |I|+1 plΓ(H) ≥ (−1) plΓ(∪i∈I Hi). (30.3) ∅6=I⊆{1,...,n}

c Proof. Both (1) and (2) follow from plΓ(H) = 1 − spΓ(H ). In fact, c plΓ(∅) = 1 − sp(Ω) = 0. Or, from (30.2), since Hi ⊇ H implies Hi ⊆ H,

c X |I|+1 c plΓ(H) = 1 − spΓ(H ) ≥ 1 − (−1) spΓ(∩i∈I Hi ) ∅6=I⊆{1,...,n} X |I|+1 c = 1 − (−1) spΓ((∪i∈I Hi) ) ∅6=I⊆{1,...,n} X |I|+1 = 1 − (−1) (1 − plΓ(∪i∈I Hi)). ∅6=I⊆{1,...,n}

The inequality (30.3) follows finally from   X |I|+1 X |I| (−1) = −  (−1) − 1 = 1. (30.4) ∅6=I⊆{1,...,n} I⊆{1,...,n} ut A function satisfying (2) of the theorem above is called alternating of infinite order. An important particular case arises, when all focal sets of a random mapping Γ : Ω → P(U) are singleton sets, Γ(ωi) = {ui}. Such a random 30.2. SET-VALUED MAPPINGS: GENERAL CASE 281 mapping is called precise, and it corresponds in fact to a classical random variable γ :Ω → U, defined by γ(ωi) = ui. Clearly in this case for all subsets H of U we obtain sΓ(H) = pΓ(H), hence spΓ(H) = plΓ(H). Furthermore, properties (30.3) and (30.3) become

n X |I|+1 spΓ(∪i=1Hi) = (−1) spΓ(∩i∈I Hi), ∅6=I⊆{1,...,n} and

n X |I|+1 plΓ(∩i=1Hi) = (−1) plΓ(∩i∈I Hi). ∅6=I⊆{1,...,n}

But this is equivalent to additivity of both functions,

n n X spΓ(∪i=1Hi) = spΓ(Hi), i=1 if the sets Hi are mutually disjoint. This means that spΓ = plΓ become a probability measure on the the power set of U.

30.2 Set-Valued Mappings: General Case

The results of the previous section can in many respect easily generalized to random mappings into general sets. The main new issue is measurability. As in the previous section consider a set U, the universe, only that it is no more assumed to be finite. Again the power set P(U) is a Boolean algebra,which can be considered as an information algebra with a single domain. So the results of Chapter 27 apply. We are going to review the results of this chapter with respect to the algebra

P(U) as well as those of Chapters 28.1 29 briefly in this section. A random mapping is now a mapping Γ : Ω → P(U) of a probability space (Ω, A.P ) into the power set of the universe U. If the subsets of U are considered as possible pieces of information in the spirit of information algebras, then, information may be more generally be represented by filters in the Boolean lattice P(U), i.e. families of subsets F such that

1. H ∈ F and H ⊆ H0 implies H0 ∈ F .

2. H1,H2 ∈ F implies H1 ∩ H2 ∈ F . Since information order in the Booelan information algebra P(U) is the inverse order to inclusion, filters of Boolean algebra correspond to ideals in 282 CHAPTER 30. RANDOM SETS the information algebra. So, as in Section 27.1 we may generalize random mappings to fitler-valued maps. So, in general, for all ω ∈ Ω we consider Γ(ω) to be a filter of subsets of U. As usual, random mappings with subsets as values can be considered as filter-valued mappings whose values are the principal filters corresponding to the set values. A random mapping Γ induces then an allocation of support

sΓ(H) = {ω ∈ Ω: H ∈ Γ(ω)}.

Note that H ∈ Γ(ω) is equivalent to H ⊆ Γ(ω), if Γ is set-valued. As in Section 27.2 we may define, if Γ is set-valued.

pΓ(H) = {ω ∈ Ω: H ∩ Γ(ω) 6= ∅}, to be the et of assumptions ω under which H, if not necessarily true, is at least possible. We call the mapping pΓ : P(U) → P(Ω) the allowment of possibility associated with the random mapping Γ. Note that

c c c pΓ(H) = {ω ∈ Ω: H ∈ Γ(ω)} = sΓ(H ).

So all ω allow H if they do not imply its complement (negation) Hc. This definition applies more generally also to filter-valued random mappings Γ. So in a Boolean algebra there is a strong duality between support and plau- sibility, which in this form does not exist in general information algebras. We denote as in Chapter 29 the family subsets H of U for which the allocationof support sΓ(H) ∈ (A) is measurable by EΓ. The duality between c c support and possibility implies that EΓ = {H : H ∈ EΓ} contains all the subsets of U for which the allowment of possibility are measurable. Lemma 29.1 of Section 29.1 can be extended to the present case as follows:

Lemma 30.1 For any random mapping Γ, U ∈ E and ∅ ∈ E if Γ is nor- malized, and if H1,H2 ∈ E, then H1 ∩ H2 ∈ E. If Γ is set-valued, then ∞ c c H1,H2,... ∈ E implies ∩i=1Hi ∈ E. And dually: ∅ ∈ E and U ∈ E if Γ is c c normalized, and if H1,H2 ∈ E , then H1 ∪ H2 ∈ E . If Γ is set-valued, then c ∞ c H1,H2,... ∈ E implies ∪i=1Hi ∈ E .

This is an immediate consequence of Lemma 29.1 and the duality between support and possibility. So E is closed under finite intersections, it is a semilattice under intersections, whereas Ec is closed under finite unions. This are minimal properties. Often E enjoys stonger properties, as already the lemma above illustrates. We shall come back to this issue below. On E we define the support function spΓ(H) = P (sΓ(H)) as usual and c on E the plausibility function plΓ(H) = P (pΓ(H)). As in the case of a finite universe, for H ∈ Ec we have still

c plΓ(H) = 1 − spΓ(H ), 30.3. RANDOM MAPPINGS INTO SET ALGEBRAS 283 which is a further element of duality between support and possibility. Fur- ther, the support function is still monotone of infinite order and the plau- sibility function alternating of infinite order, that is (30.2) and (30.3) hold for elements in E and Ec respectively. If Γ is a set valued random mapping, then, as shown in Lemma 30.1, E is closed under countable intersections, and Ec under countable unions. In these cases, both the support function and the plausibility function are continuous, as the next lemma shows

Lemma 30.2 Let Γ be a random mapping such that E is closed under count- c able intersections and E under countable unions. Then, for H1 ⊇ H2 ⊇ · · · ∈ E,

∞ spΓ(∩i=1Hi) = lim spΓ(Hi). i→∞ c Further, for H1 ⊆ H2 ⊆ · · · ∈ E , ∞ plΓ(∪i=1Hi) = lim plΓ(Hi). i→∞ The proof of the first part goes as the proof of Theorem 29.2 and the second part follows from duality. In random set theory (Molchanov, 2005; Nguyen, 1978) usually restric- tive assumptions on the random mappings are made. For instance U is assumed to be a , all Γ(ω) are closed sets and EΓ is sup- posed to contain all Borel sets (Molchanov, 2005). This leads then to a rich theory exploiting the interplay between measurability and topology. In another direction, Probabilistic argumentation systems combine logic and probability in order to derive probabilities of provability of hypotheses us- ing assumption-based reasoning. This has been developed in a context of propositional logic in (?). In another context this approach has been ap- plied to statistical inference (?). It has been shown in Chapter 25 how this is related to random mappings, namely how random mappings can be derived from the assumption-based inference. So, random mappings, in par- ticular set-valued random mappings, provide for the abstract background of probabilistic argumentation systems.

30.3 Random Mappings into Set Algebras

In Chapter 27 we have seen that random mappings into an information algebras form themeselves an information algebra. In this section we con- sider the algebra of random mapping into a set-information algebra. This is an instance of a Boolean information algebra. We use it to illustrate the specificities of uncertain information in Boolean structures. In particular we discuss the transport of uncertain information from one domain to another one. 284 CHAPTER 30. RANDOM SETS

As a base for the present discussion, we consider a compatible family of frames (see Example 5 in Section 2.2 of Part I). Remind that a frame is simply a set x. Different frames x and y may be linked by a refining, that is a mapping τ : x → P(y), such that 1. ∀θ ∈ x τ(x) 6= ∅,

2. θ0 6= θ00 implies τ(θ0) ∩ τ(θ00) = ∅, S 3. θ∈x τ(θ) = y. A compatible family of frames is a family F of sets, with partial order x ≤ y if there is a refining τ : x → P(y), and which is a lattice under this order. It has been remarked in Example 5 of Section 3.1 of Part I that the subsets of frames in a compatible family of frames form an information algebra. In fact, let Φx denote the power set of frame x, and [ Φ = Φx. x∈F

If x ≤ y denote the refining of x to y be τx→y. If φ is an element of Φx, that is a subset of frame x, then, as usual, define [ τx→y(φ) = τx→y(θ). θ∈x

Labeling is defined by d(φ) = x if φ ∈ Φx. Combination is defined for φ ∈ Φx and ψ ∈ Φy by

φ ⊗ ψ = τx→x∨y(φ) ∩ τy→x∨y(ψ).

Projection finally is defined for φ ∈ Φy and x ≤ y as

↓x φ (φ) = {θ ∈ x : τx→y(θ) ∩ φ 6= ∅}.

Let’s call this information algebra (Φ, F). Consider now random mappings Γ : Ω → Φx from a probability space (Ω, A,P ) into Φx for some frame x ∈ F. Such a mapping Γ is labeled with d(Γ) = x. These random mappings inherit as usual the structure of the underlying information algebra, a labeled algebra tis time. So combination of random mappings Γ0 and Γ00 with d(Γ0) = x and d(Γ00) = y results in the mapping

(Γ' ⊗ Γ'')(ω) = Γ'(ω) ⊗ Γ''(ω),

with domain x ∨ y. Similarly, if Γ is a random mapping with domain y and x ≤ y, then the projection of Γ to x is a random mapping

Γ^{↓x}(ω) = (Γ(ω))^{↓x},

with domain x. The set R of these labeled random mappings, together with the lattice of frames F, clearly forms a labeled information algebra (R, F). Now, as usual, a random mapping Γ ∈ R is considered as some uncertain information about the elements of Φ. Assume d(Γ) = x and consider an element φ ∈ Φ_y. What is the support allocated by Γ to φ? An element ω ∈ Ω supports φ, given the mapping Γ, if (Γ(ω))^{→y} implies φ, that is, (Γ(ω))^{→y} ⊆ φ. Remember that ψ^{→y} = (ψ^{↑x∨y})^{↓y}. Now, ψ^{→y} ⊆ φ if, and only if, ψ^{↑x∨y} ⊆ φ^{↑x∨y}, which in turn holds if, and only if, ψ ⊆ φ^{→x}. So, it follows also that (Γ(ω))^{→y} ⊆ φ if, and only if, Γ(ω) ⊆ φ^{→x}. Therefore, we obtain for the allocation of support induced by Γ,

s_Γ(φ) = {ω ∈ Ω : Γ(ω) ⊆ φ^{→x}} = {ω ∈ Ω : (Γ(ω))^{→y} ⊆ φ},

if d(Γ) = x. Similarly, an element ω ∈ Ω allows for a φ ∈ Φ_y if it does not support φ^c. Hence the allowment of possibility associated with Γ is

p_Γ(φ) = (s_Γ(φ^c))^c = {ω ∈ Ω : (Γ(ω))^{→y} ⊆ φ^c}^c = {ω ∈ Ω : (Γ(ω))^{→y} ∩ φ ≠ ∅}.
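As an illustration of these two formulas, the following sketch computes the support set and the allowment set for a small invented set-valued random mapping on the frames of the previous sketch; Ω, the probability P and Γ are hypothetical.

    # Hypothetical frames and refining, as in the previous sketch.
    x = {'low', 'high'}
    y = {1, 2, 3, 4, 5, 6}
    tau = {'low': {1, 2, 3}, 'high': {4, 5, 6}}

    def to_x(phi):
        """Transport phi in Phi_y to x: phi^(->x) = {theta in x : tau(theta) meets phi}."""
        return {theta for theta in x if tau[theta] & phi}

    # An invented random mapping Gamma : Omega -> Phi_x with d(Gamma) = x,
    # together with a probability P on the finite sample space Omega.
    Omega = {'w1', 'w2', 'w3'}
    P = {'w1': 0.5, 'w2': 0.3, 'w3': 0.2}
    Gamma = {'w1': {'low'}, 'w2': {'high'}, 'w3': {'low', 'high'}}

    def support(phi):
        """s_Gamma(phi) = {w : Gamma(w) is a subset of phi^(->x)}, for phi in Phi_y."""
        return {w for w in Omega if Gamma[w] <= to_x(phi)}

    def allowment(phi):
        """p_Gamma(phi): complement of the support of the complement of phi."""
        return Omega - support(y - phi)

    phi = {1, 2, 3}
    print(support(phi), sum(P[w] for w in support(phi)))      # {'w1'}, degree 0.5
    print(allowment(phi), sum(P[w] for w in allowment(phi)))  # {'w1', 'w3'}, degree 0.7

The degrees of support and of plausibility are then obtained, as before, by applying the probability P to these sets.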

Just as we may pass from the labeled information algebra (Φ, F) to the associated domain-free information algebra (Φ/σ, F) by the congruence σ defined as φ ≡_σ ψ if d(φ) = x, d(ψ) = y and φ^{↑x∨y} = ψ^{↑x∨y} (see Chapter 3, Part I), we may do the same with the labeled algebra (R, F). Recall also that φ ≡_σ ψ if, and only if, φ = ψ^{→x} and φ^{→y} = ψ. So, we define Γ' ≡_σ Γ'' if d(Γ') = x, d(Γ'') = y and Γ'^{↑x∨y} = Γ''^{↑x∨y}, or equivalently Γ' = Γ''^{→x} and Γ'^{→y} = Γ''. This leads then to the domain-free information algebra (R/σ, F). Then, to the equivalence class [Γ]_σ of random mappings we may assign a random mapping [Γ]_σ : Ω → Φ/σ, defined by

[Γ]_σ(ω) = [Γ(ω)]_σ.

This is well defined since Γ' ≡_σ Γ'' implies Γ'(ω) ≡_σ Γ''(ω) for all ω ∈ Ω. Now it becomes evident that, for a given random mapping Γ, two elements φ and ψ have the same support if φ ≡_σ ψ, and further, if Γ' ≡_σ Γ'', then an element φ has the same support induced by Γ' as by Γ''. So we obtain finally, for any Γ ∈ R and φ ∈ Φ, that

s_Γ(φ) = s_{[Γ]_σ}([φ]_σ), p_Γ(φ) = p_{[Γ]_σ}([φ]_σ).

In particular, degrees of support and degrees of plausibility are the same for equivalent elements φ and ψ, and are further identical for equivalent random mappings. So the labeled and domain-free versions of the theory of random mappings are equivalent, and we may pass at convenience from one to the other.

Chapter 31

Systems of Independent Sources

Chapter 32

Probabilistic Argumentation Systems

32.1 Propositional Argumentation

32.2 Predicate Logic and Uncertainty

32.3 Stochastic Linear Systems

Chapter 33

Random Mappings as Information

33.1 Information Order

33.2 Information Measure

33.3 Gaussian Information

References

Chaitin, G. J. 2001. Exploring Randomness. Berlin: Springer-Verlag.

Choquet, G. 1953–1954. Theory of Capacities. Annales de l’Institut Fourier, 5, 131–295.

Cowell, R. G., Dawid, A. P., Lauritzen, S. L., & Spiegelhalter, D. J. 1999. Probabilistic Networks and Expert Systems. Information Sci. and Stats. Springer, New York.

Date, C.J. 1975a. An Introduction to Database Systems. Addison-Wesley.

Date, C.J. 1975b. An Introduction to Database Systems. Addison-Wesley.

Davey, B.A., & Priestley, H.A. 1990a. Introduction to Lattices and Order. Cambridge University Press.

Davey, B.A., & Priestley, H.A. 1990b. Introduction to Lattices and Order. Cambridge University Press.

Dawid, A. P. 2001. Separoids: A Mathematical Framework for Conditional Independence and Irrelevance. Ann. Math. Artif. Intell., 32(1–4), 335–372.

Dempster, A. 1967. Upper and Lower Probabilities Induced by a Multivalued Mapping. Ann. Math. Stat., 38, 325–339.

Gierz, G., et al. 2003. Continuous Lattices and Domains. Cambridge University Press.

Grätzer, G. 1978. General Lattice Theory. Academic Press.

Guan, Xuechong, & Li, Yongming. 2010. The Continuity of Information Algebra. Unpublished paper, 1–13.

Halmos, Paul R. 1962. Algebraic Logic. New York: Chelsea.

Halmos, Paul R. 1963. Lectures on Boolean Algebras. Van Nostrand- Reinhold.


Henkin, L., Monk, J. D., & Tarski, A. 1971. Cylindric Algebras. Amsterdam: North-Holland.

Kahre, Jan. 2002. The Mathematical Theory of Information. Kluwer.

Kappos, D. A. 1969. Probability Algebras and Stochastic Spaces. New York: Academic Press.

Kohlas, J. 1995. Mathematical Foundations of Evidence Theory. Pages 31–64 of: Coletti, G., Dubois, D., & Scozzafava, R. (eds), Mathematical Models for Handling Partial Knowledge in Artificial Intelligence. Plenum Press.

Kohlas, J. 1997. Computational Theory for Information Systems. Tech. rept. 97–07. Institute of Informatics, University of Fribourg.

Kohlas, J. 2003. Information Algebras: Generic Structures for Inference. Springer-Verlag.

Kohlas, J., & Monney, P.A. 1995. A Mathematical Theory of Hints. An Approach to the Dempster-Shafer Theory of Evidence. Lecture Notes in Economics and Mathematical Systems, vol. 425. Springer.

Kohlas, J., Haenni, R., & Lehmann, N. 1999. Assumption-Based Reasoning and Probabilistic Argumentation Systems. Accepted for Publication in the Final DRUMS Handbook: Kohlas J. and Moral S. (eds), Algorithms for Uncertainty and Defeasible Reasoning.

Kolmogorov, A. N. 1965. Three Approaches to the Quantitative Definition of Information. Problems Inform. Transmission, 1, 1–7.

Kolmogorov, A. N. 1968. Logical basis for information theory and probability. IEEE Transactions on Information Theory, 14, 662–664.

Kullback, S. 1959. Information Theory and Statistics. Wiley.

Langel, J. 2009. An Algebraic Theory of Semantic Information and Questions. PhD thesis, Dept. of Informatics, University of Fribourg.

Lauritzen, S. L., & Spiegelhalter, D. J. 1988. Local computations with probabilities on graphical structures and their application to expert systems. J. Royal Statis. Soc. B, 50, 157–224.

Li, M., & Vitanyi, P. 1993. An introduction to Kolmogorov complexity and its applications. New York, NY, USA: Springer-Verlag New York, Inc.

Maier, D. 1983. The Theory of Relational Databases. London: Pitman.

Molchanov, I. 2005. Theory of Random Sets. London: Springer.

Nguyen, H. 1978. On random sets and belief functions. Journal of Mathe- matical Analysis and Applications, 65, 531–542.

Norberg, T. 1989. Existence theorems for measures on continuous posets, with applications to random set theory. Math. Scand., 64, 15–51.

Pouly, M., & Kohlas, J. 2010. Generic Inference. John Wiley.

Shafer, G. 1973. Allocation of Probability: A Theory of Partial Belief. Ph.D. thesis, Princeton University.

Shafer, G. 1976. A Mathematical Theory of Evidence. Princeton University Press.

Shafer, G. 1979. Allocations of Probability. Ann. of Prob., 7, 827–839.

Shafer, G. 1996. Probabilistic Expert Systems. CBMS-NSF Regional Confer- ence Series in Applied Mathematics, no. 67. Philadelphia, PA: SIAM.

Shafer, G., Shenoy, P.P., & Mellouli, K. 1987. Propagating Belief Functions in Qualitative Markov Trees. Int. J. of Approximate Reasoning, 1(4), 349–400.

Shannon, C.E. 1948. A Mathematical Theory of Communication. The Bell System Technical Journal, 27, 379–432.

Smyth, M.B. 1992. Topology. Pages 642–761 of: Abramsky, S., Gabbay, D., & Maibaum, T.S.E. (eds), Handbook of Logic in Computer Science. Oxford Science Publications.

Studeny, M. 1993. Formal Properties of Conditional Independence in Different Calculi of AI. Pages 341–348 of: Clarke, Michael, Kruse, Rudolf, & Moral, Serafín (eds), Symbolic and Quantitative Approaches to Reasoning and Uncertainty. Lecture Notes in Computer Science, vol. 747. Springer, Berlin.