Algebraic Information Theory
Marc Pouly
Apsia Breakfast Seminar Interdisciplinary Centre for Security, Reliability and Trust University of Luxembourg
June 2011
Hartley’s Measure (1928)
Given a set S = {s1, ..., sn}, how can we measure its uncertainty u(S)?
1 uncertainty is a non-negative value
2 monotone: |S1| ≤ |S2| ⇒ u(S1) ≤ u(S2)
3 additive: u(S1 × S2) = u(S1) + u(S2)
Theorem: There is only one function that satisfies these properties
u(S) = log |S|
From Uncertainty to Information
The uncertainty of S = {s1,..., s8} is log 8 = 3 bits
Assume now that someone gives more precise information, for example that either s3 or s7 has been transmitted
We are left with S = {s3, s7} and a remaining uncertainty of log 2 = 1 bit
The information reduces uncertainty by log 8 − log 2 = 2 bits
Information is the Reduction of Uncertainty!
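A minimal Python sketch (not from the slides; the helper name hartley is ours) that checks the worked example and the additivity property:

```python
# Hartley's measure u(S) = log |S|, in bits, and information as
# reduction of uncertainty.
from itertools import product
from math import isclose, log2

def hartley(s):
    """Hartley's measure u(S) = log |S| (base 2, i.e. in bits)."""
    return log2(len(s))

S = {f"s{i}" for i in range(1, 9)}          # S = {s1, ..., s8}
assert hartley(S) == 3.0                    # log 8 = 3 bits

# Additivity: u(S1 x S2) = u(S1) + u(S2)
S1, S2 = {"a", "b"}, {"x", "y", "z"}
assert isclose(hartley(set(product(S1, S2))), hartley(S1) + hartley(S2))

# Learning "either s3 or s7 was transmitted" reduces uncertainty:
print(hartley(S) - hartley({"s3", "s7"}))   # 2.0 bits of information
```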
Shannon’s Measure (1948)
How much uncertainty is contained in a set S = {s1, ..., sn} if the probability pi = p(si) of each element is known?
S(p1, ..., pn) = − ∑_{i=1}^{n} pi log pi
We have a similar uniqueness result for specific properties
An information theory is derived by the same principle
This is what people call classical or statistical information theory
Shannon generalizes Hartley: S(1/n, ..., 1/n) = log n
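A companion sketch (again not from the slides): Shannon's measure, and the check that it reduces to Hartley's on uniform distributions:

```python
# Shannon's measure S(p1, ..., pn) = -sum_i pi log pi, in bits.
from math import isclose, log2

def shannon(ps):
    """Entropy of a probability distribution, with 0 log 0 read as 0."""
    return -sum(p * log2(p) for p in ps if p > 0)

n = 8
assert isclose(shannon([1 / n] * n), log2(n))  # S(1/n, ..., 1/n) = log n
print(shannon([0.5, 0.25, 0.25]))              # 1.5 bits
```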
How do we represent Information?
Thanks to Hartley we have an information theory for sets and
thanks to Shannon an information theory for probabilities
But there are other ways of representing information (on computers)
databases (relations), systems of equations and inequalities, constraint systems, possibilistic formalisms, formalisms for imprecise probabilities, Spohn potentials, graphs, logics, ...
Statistical information theory is not enough!
Extending Hartley’s Theory
[Diagram: Hartley's Information Theory, built on Alphabets, extends in two directions: via Probabilistic Sources to Shannon's Information Theory, and via Isomorphisms to Relational Information Theory]
The Fundamental Theorem of Lattice Theory
Hartley’s information theory assumes a finite alphabet S and assigns values to subsets, i.e. u : P(S) → R≥0.
P(S) is a distributive lattice with meet ∩ and join ∪
Theorem (Fundamental Theorem of Lattice Theory) Every distributive lattice is isomorphic to a lattice of subsets
We can carry over Hartley’s measure to isomorphic formalisms
for example to the relational algebra used in databases
Relational Information Theory
We can therefore measure the uncertainty in relations
R =
  Destination  Departure  Gate
  Heathrow     10:00      7
  Heathrow     14:00      9
  Gatwick      08:30      4
  City         11:15      5
  City         15:20      7
and obtain u(R) = log 5 bits
If we agree on the three properties stated by Hartley then u(S) = log |S| is the only correct way of measuring uncertainty in subset systems and hence also the right way for isomorphic formalisms such as the relational algebra.
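Measured in code, with the relation represented as a set of tuples (a sketch, not from the slides):

```python
# Uncertainty of a relation = log of its number of tuples.
from math import log2

R = {("Heathrow", "10:00", 7), ("Heathrow", "14:00", 9),
     ("Gatwick", "08:30", 4), ("City", "11:15", 5), ("City", "15:20", 7)}
print(log2(len(R)))   # u(R) = log 5, roughly 2.32 bits
```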
Duality of Information
R =
  Destination  Departure  Gate
  Heathrow     10:00      7
  Heathrow     14:00      9
  Gatwick      08:30      4
  City         11:15      5
  City         15:20      7
1 How can I get to London? The more tuples, the more information.
2 I am waiting for my friend; which flight might she have taken? The fewer tuples, the more information.
Such a duality is always present in order theory, but not for measures
Is measuring information sometimes too restrictive?
Linear Equation Systems
The solution sets of linear equation systems form affine spaces
X1 − 2X2 + 2X3 = −1
3X1 + 5X2 − 3X3 = 8
4X1 + 3X2 − X3 = 7
The null space of the equation matrix A is
N(A) = {(x1, x2, x3) ∈ R³ : x1 = −(4/11) x3, x2 = (9/11) x3}
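This null space can be reproduced in a few lines; the following sketch assumes SymPy is available and is not part of the slides:

```python
# Reproduce N(A) for the equation matrix A above.
from sympy import Matrix, Rational

A = Matrix([[1, -2, 2],
            [3,  5, -3],
            [4,  3, -1]])

basis = A.nullspace()                    # basis of N(A)
print(basis[0].T)                        # Matrix([[-4/11, 9/11, 1]])
assert basis[0][0] == Rational(-4, 11)   # x1 = -(4/11) x3
assert basis[0][1] == Rational(9, 11)    # x2 =  (9/11) x3
```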
How much uncertainty is contained in an equation system?
Can we treat this just as another subset system?
Equational Information Theory
Linear equation systems can have no, one or infinitely many solutions.
Hence, the uncertainty is either log 0, log 1 or log ∞
Here, a (quantitative) measure of information is not appropriate
A first Summary
A theory of information should explain what information is
Hartley & Shannon: information = reduction of uncertainty
Rely on the assumption that information can be measured
There are many formalisms for representing information on computers that are not covered by this theory
Does this theory reflect our daily perception of information?
What is our perception of information?
What is Information?
... information exists in pieces
... information comes from different sources
... information refers to questions
... pieces of information can be combined
... information can be focussed on the questions of interest
Towards a formal Framework
information exists in pieces φ, ψ ∈ Φ
there is a universe of questions r and every piece of information φ ∈ Φ refers to a finite set of questions d(φ) ⊆ r
combination of information φ ⊗ ψ
focussing of information: if d(φ) = x and y ⊆ x then φ↓y ∈ Φ
... and the same again for nerds ...
This is a two-sorted algebra (Φ, r) with universe of questions r and information pieces Φ:
labeling operator d : Φ → P(r)
combination operator ⊗ : Φ × Φ → Φ
focussing operator ↓ : Φ × P(r) → Φ
But the operations cannot be arbitrary: they must satisfy some rules!
Axioms of Information
1 it should not matter in which order information is combined
φ ⊗ ψ = ψ ⊗ φ and (φ ⊗ ψ) ⊗ ν = φ ⊗ (ψ ⊗ ν)
2 a combination refers to the union of the question sets
d(φ ⊗ ψ) = d(φ) ∪ d(ψ)
3 focussing information on x ⊆ d(φ) gives information about x
d(φ↓x) = x
4 focussing can be done step-wise, i.e. if x ⊆ y ⊆ d(φ)
φ↓x = (φ↓y )↓x
5 combining a piece of information with a part of itself gives nothing new
φ ⊗ φ↓x = φ
The Combination Axiom
How shall ⊗ and ↓ behave with respect to each other?
6 If φ, ψ ∈ Φ with d(φ) = x and d(ψ) = y then
(φ ⊗ ψ)↓x = φ ⊗ ψ↓x∩y
Compare with the distributive law: (a × b) + (a × c) = a × (b + c)
Definition (Kohlas, 2003): A system (Φ, r) satisfying the six axioms is called an information algebra
Relational Databases
Relations are pieces of information
φ =
  Player      Club       Goals
  Ronaldinho  Barcelona  7
  Eto’o       Barcelona  5
  Henry       Arsenal    5
  Pires       Arsenal    2

ψ =
  Player      Nationality
  Ronaldinho  Brazil
  Eto’o       Cameroon
  Henry       France
  Pires       France
Combination is natural join and focussing is projection:
φ ⊗ ψ =
  Player      Club       Goals  Nationality
  Ronaldinho  Barcelona  7      Brazil
  Eto’o       Barcelona  5      Cameroon
  Henry       Arsenal    5      France
  Pires       Arsenal    2      France

(φ ⊗ ψ)↓{Goals, Nationality} =
  Goals  Nationality
  7      Brazil
  5      Cameroon
  5      France
  2      France
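A minimal Python sketch of this algebra (not from the talk; the helpers rel, d, combine, and focus are ours), with relations as sets of sorted attribute/value tuples:

```python
# Relational information algebra: combination = natural join,
# focussing = projection.
def rel(rows):
    """Build a relation from dicts; tuples stored as sorted item-tuples."""
    return {tuple(sorted(r.items())) for r in rows}

def d(phi):
    """Labeling operator: the attributes (questions) phi refers to."""
    return {a for t in phi for a, _ in t}

def combine(phi, psi):
    """Combination = natural join of two relations."""
    return {tuple(sorted({**dict(s), **dict(t)}.items()))
            for s in phi for t in psi
            if all(dict(s)[a] == dict(t)[a]
                   for a in dict(s).keys() & dict(t).keys())}

def focus(phi, x):
    """Focussing = projection onto the attribute set x."""
    return {tuple((a, v) for a, v in t if a in x) for t in phi}

phi = rel([{"Player": "Ronaldinho", "Club": "Barcelona", "Goals": 7},
           {"Player": "Eto'o",      "Club": "Barcelona", "Goals": 5},
           {"Player": "Henry",      "Club": "Arsenal",   "Goals": 5},
           {"Player": "Pires",      "Club": "Arsenal",   "Goals": 2}])
psi = rel([{"Player": "Ronaldinho", "Nationality": "Brazil"},
           {"Player": "Eto'o",      "Nationality": "Cameroon"},
           {"Player": "Henry",      "Nationality": "France"},
           {"Player": "Pires",      "Nationality": "France"}])

print(focus(combine(phi, psi), {"Goals", "Nationality"}))

# Axiom 5 (idempotency): combining phi with a part of itself adds nothing.
assert combine(phi, focus(phi, {"Player"})) == phi

# Axiom 6 (combination): (phi ⊗ psi)↓x = phi ⊗ psi↓(x ∩ y)
x, y = d(phi), d(psi)
assert focus(combine(phi, psi), x) == combine(phi, focus(psi, x & y))
```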
Other Examples
Linear Equation Systems:
systems of linear equations are pieces of information
combination is union of equation systems
focussing is Gaussian variable elimination (see the sketch after this list)

Logic:
logical sentences are pieces of information
combination is conjunction
focussing is existential quantification
and many more ...
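For the first example, a sketch (assuming SymPy; not from the talk) of combination as union of systems and focussing by eliminating a variable:

```python
# Equation systems as information: combine by union, focus by elimination.
from sympy import Eq, expand, linsolve, symbols

x1, x2, x3 = symbols("x1 x2 x3")
phi = [Eq(x1 - 2*x2 + 2*x3, -1)]           # one piece of information
psi = [Eq(3*x1 + 5*x2 - 3*x3, 8)]          # another piece

combined = phi + psi                       # combination = union of systems
print(linsolve(combined, [x1, x2, x3]))    # parametric solution set

# Focussing onto {x1, x2}: eliminate x3 with one Gaussian step
# (3 * first equation + 2 * second equation cancels x3).
e1, e2 = phi[0], psi[0]
print(Eq(expand(3*e1.lhs + 2*e2.lhs), 3*e1.rhs + 2*e2.rhs))
# -> Eq(9*x1 + 4*x2, 13)
```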
Algebraic Information Theory (2003)
Given an information algebra (Φ, r) we define
φ ≤ ψ if and only if φ ⊗ ψ = ψ
φ is less informative than ψ if it is absorbed by ψ
this relation forms a partial order called the order of information
Algebraic information theory does not measure information but compares information pieces based on how informative they are with respect to each other
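One concrete instance (a sketch, not from the talk): subsets of a fixed frame, with combination taken as intersection; then φ ≤ ψ means ψ ⊆ φ, i.e. the smaller set is the more informative one, matching the s3/s7 example from the Hartley slides:

```python
# Order of information on constraint sets: phi ≤ psi iff phi ⊗ psi = psi.
def combine(phi, psi):
    return phi & psi                  # combination = intersection

def leq(phi, psi):
    return combine(phi, psi) == psi   # psi absorbs phi

phi = {"s1", "s3", "s7"}              # coarser information
psi = {"s3", "s7"}                    # finer information
assert leq(phi, psi) and not leq(psi, phi)
```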
Neutral Information
Some information pieces are special. Usually, exactly one such element of each type exists for every set of questions s ⊆ r.
Neutral Information: for x ⊆ r there exists ex ∈ Φ such that
φ ⊗ ex = φ (for d(φ) = x) and ex↓y = ey (for y ⊆ x)
Combination with neutral information has no effect
From neutral info we can only extract neutral info
Contradictory Information
Contradictory Info: for x ⊆ y ⊆ r there exists zx ∈ Φ s.t.
φ ⊗ zx = zx, and if φ↓x = zx then φ = zy
Contradictory information absorbs everything
Contradictions can only be derived from contradictions
Some interesting Properties
1 if φ ≤ ψ then d(φ) ⊆ d(ψ)
2 if x ⊆ d(φ) then ex ≤ φ
3 if d(φ) ⊆ x then φ ≤ zx
4 φ, ψ ≤ φ ⊗ ψ
5 φ ⊗ ψ = sup{φ, ψ}
6 if x ⊆ d(φ) then φ↓x ≤ φ
7 φ1 ≤ φ2 and ψ1 ≤ ψ2 imply φ1 ⊗ ψ1 ≤ φ2 ⊗ ψ2
8 if x ⊆ d(φ) ∩ d(ψ) then φ↓x ⊗ ψ↓x ≤ (φ ⊗ ψ)↓x
9 if x ⊆ d(φ) then φ ≤ ψ implies φ↓x ≤ ψ↓x
Conclusion
Quantitative information theory is a success story!
But we cannot measure everything
Quantitative theories do not reflect our perception of information
Algebraic information theory can be defined in a generic way on every formalism that satisfies the axioms of an information algebra
The basic concept of algebraic information theory is a partial order of information