Lecture Notes
The Algebraic Theory of Information

Prof. Dr. Jürg Kohlas
Institute for Informatics IIUF
University of Fribourg
Rue P.-A. Faucigny 2
CH-1700 Fribourg (Switzerland)
E-mail: [email protected]
http://diuf.unifr.ch/tcs

Version: November 24, 2010

Copyright © 1998 by J. Kohlas
Available from: B. Anrig, IIUF

Preface

In today's scientific language, information theory is confined to Shannon's mathematical theory of communication, where the information to be transmitted is a sequence of symbols from a finite alphabet. An impressive mathematical theory has been developed around this problem, based on Shannon's seminal work. Although it has often been remarked that this theory, for all its importance, cannot capture every aspect of the concept of "information", little progress has been made so far in enlarging Shannon's theory.

This study attempts to develop a new mathematical theory covering or deepening aspects of information other than those treated by Shannon's theory. The theory is centered around basic operations on information, such as the aggregation of information and the extraction of information. Further, the theory takes into account that information is an answer, possibly a partial answer, to questions. This gives the theory its algebraic flavor and justifies speaking of an algebraic theory of information.

The mathematical theory presented here is rooted in two important developments of recent years, although these developments were not originally seen as part of a theory of information. The first development is local computation, a topic that started with the quest for efficient computation methods in probabilistic networks. It was shown around 1990 that these computation methods are closely related to a certain algebraic structure, of which many models or instances exist.
This provides the abstract basis for generic inference methods covering a wide spectrum of topics such as databases, different systems of logic, probabilistic networks and many other formalisms of uncertainty. This algebraic structure, completed by the important idempotency axiom, is the basis of the mathematical theory developed here. Whereas the original structure has attracted some interest for computational purposes, it acquires its flavor as a theory of information only with the additional idempotency axiom. This leads to information algebras, the object of the present study.

Surprisingly, the second development converging in the present work is the Dempster-Shafer theory of evidence. It was originally proposed as an extension of probability theory for inference under uncertainty, an important topic in Artificial Intelligence and related fields. In everyday language one often speaks of uncertain information. In our framework this concept is rigorously captured by random variables with values in information algebras. This captures the idea that a source emits information which may be interpreted in different ways and which, depending on various assumptions, may result in different information elements. It turns out that this yields a natural generalization or extension of the Dempster-Shafer theory of evidence, thus linking this theory in a very basic way to our theory of information.

In this way the algebraic theory of information is thought of as a mathematical theory forming the rigorous common background of many theories and formalisms studied and developed in Computer Science. Whereas each individual formalism has its own particularities, the algebraic theory of information helps to understand the common structure of information processing.

Contents

1 Introduction

I Information Algebras

2 Modeling Questions
2.1 Refining and Coarsening: the Order Structure
2.2 Concrete Question Structures
3 Axiomatics of Information Algebra
3.1 Projection Axiomatics
3.2 Multivariate Information Algebras
3.3 File Systems
3.4 Variable Elimination
3.5 The Transport Operation
3.6 Domain-Free Algebra

4 Order of Information
4.1 Partial Order of Information
4.2 Ideal Completion of Information Algebras
4.3 Atomic Algebras and Relational Systems
4.4 Finite Elements and Approximation
4.5 Compact Ideal Completion
4.6 Labeled Compact Information Algebra
4.7 Continuous Algebras

5 Mappings
5.1 Order-Preserving Mappings
5.2 Continuous Mappings
5.3 Continuous Mappings Between Compact Algebras

6 Boolean Information Algebra
6.1 Domain-Free Boolean Information Algebra
6.2 Labeled Boolean Information Algebra
6.3 Representation by Set Algebras

7 Independence of Information
7.1 Conditional Independence of Domains
7.2 Factorization of Information
7.3 Atomic Algebras and Normal Forms

8 Measure of Information
8.1 Information Measure
8.2 Dual Information in Boolean Information Algebras

9 Topology of Information Algebras
9.1 Observable Properties and Topology
9.2 Alexandrov Topology of Information Algebras
9.3 Scott Topology of Compact Information Algebras
9.4 Topology of Labeled Algebras
9.5 Representation Theory of Information Algebras

10 Algebras of Subalgebras
10.1 Subalgebras of Domain-free Algebras
10.2 Subalgebras of Labeled Algebras
10.3 Compact Algebra of Subalgebras
10.4 Subalgebras of Compact Algebras

II Local Computation

11 Local Computation on Markov Trees
11.1 Problem and Complexity Considerations
11.2 Markov Trees
11.3 Local Computation
11.4 Updating and Retracting

12 Multivariate Systems
12.1 Local Computation on Covering Join Trees
12.2 Fusion Algorithm
12.3 Updating
13 Boolean Information Algebra

14 Finite Information and Approximation

15 Effective Algebras

16

17

III Information and Language

18 Introduction

19 Contexts
19.1 Sentences and Models
19.2 Lattice of Contexts
19.3 Information Algebra of Contexts

20 Propositional Information

21 Expressing Information in Predicate Logic

22 Linear Systems

23 Generating Atoms

24

IV Uncertain Information

25 Functional Models and Hints
25.1 Inference with Functional Models
25.2 Assumption-Based Inference
25.3 Hints
25.4 Combination and Focussing of Hints
25.5 Algebra of Hints

26 Simple Random Variables
26.1 Domain-Free Variables
26.2 Labeled Random Variables
26.3 Degrees of Support and Plausibility
26.4 Basic Probability Assignments

27 Random Information
27.1 Random Mappings
27.2 Support and Possibility
27.3 Random Variables
27.4 Random Variables in σ-Algebras

28 Allocation of Probability
28.1 Algebra of Allocation of Probabilities
28.2 Random Variables and Allocations of Probability
28.3 Labeled Allocations of Probability

29 Support Functions
29.1 Characterization
29.2 Generating Support Functions
29.3 Extension of Support Functions

30 Random Sets
30.1 Set-Valued Mappings: Finite Case
30.2 Set-Valued Mappings: General Case
30.3 Random Mappings into Set Algebras

31 Systems of Independent Sources

32 Probabilistic Argumentation Systems
32.1 Propositional Argumentation
32.2 Predicate Logic and Uncertainty
32.3 Stochastic Linear Systems

33 Random Mappings as Information
33.1 Information Order
33.2 Information Measure
33.3 Gaussian Information

References

Chapter 1

Introduction

Information is a central concept of science, especially of Computer Science.
The well-developed fields of statistical and algorithmic information theory focus on measuring the information content of messages, statistical data or computational objects. There are, however, many more facets to information. We mention only three:

1. Information should always be considered relative to a given specific question, and, if necessary, information must be focused onto this question. The part of the information relevant to the question must be extracted.

2. Information may arise from different sources. There is therefore a need for aggregation or combination of different pieces of information. This ensures that the whole of the available information is taken into account.

3. Information can be uncertain, because its source is unreliable or because it may be distorted. It must therefore be considered defeasible, and it must be weighed according to its reliability.

The first two aspects induce a natural algebraic structure of information. The two basic operations of this algebra are combination, or aggregation, of information and focusing, or extraction, of information. The operation of focusing in particular relates information to a structure of interdependent questions. These notes are devoted to a discussion of this algebraic structure and its variants. They will demonstrate that many well-known but very different formalisms, such as relational algebra and probabilistic networks among many others, fit into this abstract framework. The algebra allows for a generic study of the structure of information. Furthermore, it permits the construction of generic architectures for inference.

The third item above points to a very central property of real-life information, namely uncertainty. Clearly, probability is the appropriate tool to describe and study uncertainty. Information, however, is not only numerical. Therefore, the usual concept of random variables is not sufficient to study uncertain information.
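The two basic operations introduced above can be illustrated on relational algebra, one of the formalisms mentioned as an instance of the abstract framework: combination corresponds to the natural join of relations, and focusing to projection onto a set of attributes. The following is only an illustrative sketch; the names `combine` and `focus` are chosen here for exposition and are not taken from the notes.

```python
# Illustrative sketch: relations (lists of attribute-value dicts) as pieces
# of information. Combination = natural join, focusing = projection.
# The names combine/focus are illustrative, not the notes' notation.

def combine(r1, r2):
    """Natural join: merge pairs of tuples that agree on shared attributes."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in t1.keys() & t2.keys()):
                merged = dict(t1)
                merged.update(t2)
                joined.add(frozenset(merged.items()))
    return [dict(t) for t in joined]

def focus(r, attrs):
    """Projection: restrict every tuple to the attributes in attrs."""
    return [dict(t) for t in
            {frozenset((a, v) for a, v in t.items() if a in attrs) for t in r}]

# Two sources of information about connections:
flights = [{"frm": "FRA", "to": "ZRH"}, {"frm": "ZRH", "to": "GVA"}]
trains = [{"to": "ZRH", "line": "IC"}]

# Aggregate the sources, then extract the part relevant to one question.
both = combine(flights, trains)
print(focus(both, {"to"}))
```

Note that combining `both` with a focused part of itself yields `both` again; this is the behavior the idempotency axiom of the preface singles out as characteristic of information, as opposed to mere valuations.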
It turns out that the algebra of information is the natural framework in which to model uncertain information. This will be another important subject of these notes.

In the first part of these notes, the basic algebraic structure of information will be discussed. Questions are formally represented by their possible answers. However, the answers may be given at varying granularity. This introduces a partial order between questions which reflects the refining and coarsening of questions. If two different questions are considered simultaneously, this means that the coarsest set of answers which allows answers to both questions to be obtained should be considered.
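One concrete way to read the last paragraph is to model a question by a partition of a frame of possibilities, each block being one possible answer: finer partitions correspond to finer questions, and considering two questions together amounts to taking the coarsest common refinement of their partitions. The following sketch illustrates this partition reading; the function names are illustrative, not the notes' notation.

```python
# Questions modeled as partitions of a frame of possibilities; each block
# is one possible answer. Illustrative sketch of the refinement order and
# of the coarsest common refinement of two questions.

def refines(p, q):
    """p refines q if every block of p lies inside some block of q."""
    return all(any(b <= c for c in q) for b in p)

def common_refinement(p, q):
    """Coarsest partition refining both p and q: intersect their blocks."""
    return {b & c for b in p for c in q} - {frozenset()}

# Two questions about an unknown number in {0, ..., 5}:
parity = {frozenset({0, 2, 4}), frozenset({1, 3, 5})}  # "even or odd?"
size = {frozenset({0, 1, 2}), frozenset({3, 4, 5})}    # "small or large?"

# Each block of the common refinement answers both questions at once.
joint = common_refinement(parity, size)
print(sorted(sorted(b) for b in joint))
```

Any partition refining both `parity` and `size` must refine `joint` as well, which is exactly the "coarsest set of answers" singled out in the text.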