
Algebraic Information Theory

Marc Pouly

[email protected]

Apsia Breakfast Seminar
Interdisciplinary Centre for Security, Reliability and Trust
University of Luxembourg

June 2011

Hartley’s Measure (1928)

Given a set S = {s1, ..., sn}, how can we measure its uncertainty u(S)?

1 uncertainty is a non-negative value

2 monotone: |S1| ≤ |S2| ⇒ u(S1) ≤ u(S2)

3 additive: u(S1 × S2) = u(S1) + u(S2)

Theorem: There is only one function (up to the base of the logarithm) that satisfies these properties

u(S) = log |S|

From Uncertainty to Information

The uncertainty of S = {s1,..., s8} is log 8 = 3 bits

Assume now that someone gives more precise information, for example that either s3 or s7 has been transmitted

We have S′ = {s3, s7} with a remaining uncertainty of log 2 = 1 bit

The information reduces uncertainty by log 8 − log 2 = 2 bits

Information is the Reduction of Uncertainty!
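
The worked example above reads as follows in code (a minimal Python sketch; the set names are illustrative):

```python
import math

def hartley(S):
    """Hartley's measure u(S) = log2(|S|), in bits."""
    return math.log2(len(S))

S = {f"s{i}" for i in range(1, 9)}   # {s1, ..., s8}: uncertainty log 8 = 3 bits
refined = {"s3", "s7"}               # "either s3 or s7 has been transmitted"

# Information is the reduction of uncertainty: log 8 - log 2 = 2 bits
print(hartley(S) - hartley(refined))  # 2.0
```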

Shannon’s Measure (1948)

How much uncertainty is contained in a set S = {s1, ..., sn} if the probability pi = p(si) of each element is known?

S(p1, ..., pn) = − ∑_{i=1}^{n} pi log pi

A similar uniqueness result holds for a specific set of properties

An information theory is derived by the same principle: information is the reduction of uncertainty

This is what people call classical or statistical information theory

Shannon generalizes Hartley: S(1/n, ..., 1/n) = log n
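
A small Python sketch of Shannon's measure, confirming that the uniform case recovers Hartley (the example distributions are illustrative):

```python
import math

def shannon(probabilities):
    """Shannon's measure S(p1, ..., pn) = -sum(pi * log2(pi)), in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(shannon([1/8] * 8))           # 3.0 bits = log2(8): the Hartley case
print(shannon([0.5, 0.25, 0.25]))   # 1.5 bits for a non-uniform source
```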

How do we represent Information?

Thanks to Hartley we have an information theory for sets, and thanks to Shannon an information theory for probabilities

But there are other ways of representing information (on computers)

databases (relations), systems of equations and inequalities, constraint systems, possibilistic formalisms, formalisms for imprecise probabilities, Spohn potentials, graphs, ...

Statistical information theory is not enough!

Extending Hartley’s Theory

[Diagram: Hartley's Information Theory (over alphabets) extends in two directions — via probabilistic sources to Shannon's Information Theory, and via isomorphisms to Relational Information Theory.]

The Fundamental Theorem of Lattice Theory

Hartley’s information theory assumes a finite alphabet S and assigns values to subsets, i.e. u : P(S) → R≥0.

P(S) is a distributive lattice with meet ∩ and join ∪

Theorem (Fundamental Theorem of Lattice Theory): Every distributive lattice is isomorphic to a lattice of subsets

We can carry over Hartley’s measure to isomorphic formalisms, for example to the relational algebra used in databases

Relational Information Theory

We can therefore measure the uncertainty in relations

R =
    Destination   Departure   Gate
    Heathrow      10:00       7
    Heathrow      14:00       9
    Gatwick       08:30       4
    City          11:15       5
    City          15:20       7

and obtain u(R) = log 5 bits
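
A quick check in Python (encoding R as a set of tuples is one possible representation):

```python
import math

# The relation R above, encoded as a set of (Destination, Departure, Gate) tuples
R = {("Heathrow", "10:00", 7), ("Heathrow", "14:00", 9),
     ("Gatwick", "08:30", 4), ("City", "11:15", 5), ("City", "15:20", 7)}

print(math.log2(len(R)))  # u(R) = log2(5) ≈ 2.32 bits
```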

If we agree on the three properties stated by Hartley then u(S) = log |S| is the only correct way of measuring uncertainty in subset systems and hence also the right way for isomorphic formalisms such as the relational algebra.

Duality of Information

R =
    Destination   Departure   Gate
    Heathrow      10:00       7
    Heathrow      14:00       9
    Gatwick       08:30       4
    City          11:15       5
    City          15:20       7

1 How do I get to London? The more tuples, the more information

2 I am waiting for my friend; which flight might she have taken? The fewer tuples, the more information

Such a dualism is always present in order theory but not for measures

Is measuring information sometimes too restrictive?

Linear Equation Systems

Solution sets of linear equation systems form affine spaces

X1 − 2X2 + 2X3 = −1
3X1 + 5X2 − 3X3 = 8
4X1 + 3X2 − X3 = 7

The null space of the equation matrix A is

N(A) = {(x1, x2, x3) ∈ R³ : x1 = −(4/11) x3, x2 = (9/11) x3}
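
This can be verified mechanically, for instance with sympy (a minimal sketch; the right-hand side of the system plays no role for the null space):

```python
from sympy import Matrix

# Coefficient matrix of the system above
A = Matrix([[1, -2,  2],
            [3,  5, -3],
            [4,  3, -1]])

# A basis of the null space N(A)
print(A.nullspace())  # [Matrix([[-4/11], [9/11], [1]])]
```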

How much uncertainty is contained in an equation system?

Can we treat this just as another subset system?

Equational Information Theory

Linear equation systems can have no, one or infinitely many solutions.

Hence, the uncertainty is either log 0, log 1 or log ∞

Here, a (quantitative) measure of information is not appropriate

A first Summary

A theory of information should explain what information is

Hartley & Shannon: information = reduction of uncertainty

Both rely on the assumption that information can be measured

There are many formalisms for representing information on computers that are not covered by this theory

Does this theory reflect our daily perception of information?

What is our perception of information?

What is Information?

... information exists in pieces

... information comes from different sources

... information refers to questions

... pieces of information can be combined

... information can be focussed on the questions of interest

Towards a formal Framework

information exists in pieces φ, ψ ∈ Φ

there is a universe of questions r and every piece of information φ ∈ Φ refers to a finite set of questions d(φ) ⊆ r

combination of information φ ⊗ ψ

focussing of information: if d(φ) = x and y ⊆ x then φ↓y ∈ Φ

... and the same again for nerds ...

This is a two-sorted algebra (Φ, r) with universe of questions r and information pieces Φ:
labeling operator d : Φ → P(r)
combination operator ⊗ : Φ × Φ → Φ
focussing operator ↓ : Φ × P(r) → Φ
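
One way to transcribe this signature into code is as an abstract interface (a hypothetical Python sketch; all names are illustrative, not from the source):

```python
from abc import ABC, abstractmethod

class Information(ABC):
    """One piece of information φ in a two-sorted algebra (Φ, r)."""

    @abstractmethod
    def domain(self) -> frozenset:
        """Labeling d(φ): the finite set of questions this piece refers to."""

    @abstractmethod
    def combine(self, other: "Information") -> "Information":
        """Combination φ ⊗ ψ."""

    @abstractmethod
    def focus(self, questions: frozenset) -> "Information":
        """Focussing φ↓y for y ⊆ d(φ)."""
```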

But the operations cannot be arbitrary: they must satisfy some rules!

Axioms of Information

1 it should not matter in which order information is combined

φ ⊗ ψ = ψ ⊗ φ and (φ ⊗ ψ) ⊗ ν = φ ⊗ (ψ ⊗ ν)

2 a combination refers to the union of the question sets

d(φ ⊗ ψ) = d(φ) ∪ d(ψ)

3 focussing information on x ⊆ d(φ) gives information about x

d(φ↓x ) = x

4 focussing can be done step-wise, i.e. if x ⊆ y ⊆ d(φ)

φ↓x = (φ↓y )↓x

5 combining a piece of information with a part of itself gives nothing new

φ ⊗ φ↓x = φ

The Combination Axiom

How shall ⊗ and ↓ behave with respect to each other?

6 If φ, ψ ∈ Φ with d(φ) = x and d(ψ) = y then

(φ ⊗ ψ)↓x = φ ⊗ ψ↓x∩y

Compare with the distributive law: (a × b) + (a × c) = a × (b + c)

Definition (Kohlas, 2003): A system (Φ, r) satisfying the six axioms is called an information algebra

Relational Databases

Relations are pieces of information

φ =
    Player       Club        Goals
    Ronaldinho   Barcelona   7
    Eto’o        Barcelona   5
    Henry        Arsenal     5
    Pires        Arsenal     2

ψ =
    Player       Nationality
    Ronaldinho   Brazil
    Eto’o        Cameroon
    Henry        France
    Pires        France

Combination is natural join and focussing is projection:

φ ⊗ ψ =
    Player       Club        Goals   Nationality
    Ronaldinho   Barcelona   7       Brazil
    Eto’o        Barcelona   5       Cameroon
    Henry        Arsenal     5       France
    Pires        Arsenal     2       France

(φ ⊗ ψ)↓{Goals, Nationality} =
    Goals   Nationality
    7       Brazil
    5       Cameroon
    5       France
    2       France
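
A minimal Python sketch of this instance, representing relations as lists of attribute-to-value dictionaries (the encoding is illustrative):

```python
def join(phi, psi):
    """Combination ⊗: natural join of two relations."""
    seen, out = set(), []
    for s in phi:
        for t in psi:
            if all(s[a] == t[a] for a in s.keys() & t.keys()):
                u = {**s, **t}
                key = tuple(sorted(u.items()))
                if key not in seen:
                    seen.add(key)
                    out.append(u)
    return out

def project(phi, attrs):
    """Focussing ↓: projection onto a set of attributes."""
    rows = {tuple(sorted((a, t[a]) for a in attrs)) for t in phi}
    return [dict(r) for r in rows]

phi = [{"Player": "Ronaldinho", "Club": "Barcelona", "Goals": 7},
       {"Player": "Eto'o",      "Club": "Barcelona", "Goals": 5},
       {"Player": "Henry",      "Club": "Arsenal",   "Goals": 5},
       {"Player": "Pires",      "Club": "Arsenal",   "Goals": 2}]
psi = [{"Player": "Ronaldinho", "Nationality": "Brazil"},
       {"Player": "Eto'o",      "Nationality": "Cameroon"},
       {"Player": "Henry",      "Nationality": "France"},
       {"Player": "Pires",      "Nationality": "France"}]

print(project(join(phi, psi), {"Goals", "Nationality"}))
```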

Other Examples

Linear Equation Systems:
Systems of linear equations are pieces of information
Combination is union of equation systems
Focussing is Gaussian variable elimination

Logic:
Logical sentences are pieces of information
Combination is conjunction
Focussing is existential quantification (see the sketch after this list)

and many more ...
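
As a small illustration of the logic instance above, one can encode a sentence by its set of models; combination is then intersection of model sets and focussing projects the models (an assumed encoding, for a fixed variable set):

```python
from itertools import product

def models(variables, predicate):
    """All models of a sentence: each model is a frozenset of (var, value) pairs."""
    return {frozenset(zip(variables, values))
            for values in product([False, True], repeat=len(variables))
            if predicate(dict(zip(variables, values)))}

phi = models(("A", "B"), lambda m: m["A"] or m["B"])   # the sentence A ∨ B
psi = models(("A", "B"), lambda m: not m["B"])         # the sentence ¬B

conjunction = phi & psi                                # combination = conjunction
# Focussing on {A} = existential quantification of B
focussed = {frozenset((v, b) for v, b in m if v == "A") for m in conjunction}
print(focussed)  # {frozenset({('A', True)})}, i.e. ∃B ((A ∨ B) ∧ ¬B) ≡ A
```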

Algebraic Information Theory (2003)

Given an information algebra (Φ, r) we define

φ ≤ ψ if and only if φ ⊗ ψ = ψ

φ is less informative than ψ if it is absorbed by ψ

This relation forms a partial order, called the order of information

Algebraic information theory does not measure information but compares information pieces based on how informative they are with respect to each other
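
In the plain set instance of Hartley's theory, where combining constraints means intersecting the admitted possibilities, the order of information can be computed directly (a minimal sketch; names are illustrative):

```python
def combine(phi, psi):
    """Combination in the set instance: intersect the admitted possibilities."""
    return phi & psi

def less_informative(phi, psi):
    """φ ≤ ψ if and only if φ ⊗ ψ = ψ, i.e. φ is absorbed by ψ."""
    return combine(phi, psi) == psi

S = {f"s{i}" for i in range(1, 9)}   # the full frame: no information at all
phi = {"s3", "s7"}                   # "either s3 or s7 was transmitted"

print(less_informative(S, phi))   # True: the full frame adds nothing to φ
print(less_informative(phi, S))   # False: φ is strictly more informative
```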

Neutral Information

Some information pieces are special. Usually, exactly one such element of each type exists for every set of questions x ⊆ r.

Neutral Information: for x ⊆ r there exists ex ∈ Φ such that

φ ⊗ ex = φ and ex↓y = ey

Combination with neutral information has no effect
From neutral information we can only extract neutral information

Contradictory Information

Contradictory Info: for x ⊆ y ⊆ r there exists zx ∈ Φ s.t.

φ ⊗ zx = zx, and if φ↓x = zx then φ = zy

Contradictory information absorbs everything
Contradictions can only be derived from contradictions
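
In the same set instance, the neutral element is the full frame and the contradiction is the empty set; a two-line check (assumed encoding):

```python
def combine(phi, psi):
    """Combination in the set instance: intersection."""
    return phi & psi

frame = {"s1", "s2", "s3"}   # neutral element: it excludes nothing
phi = {"s1", "s3"}

assert combine(phi, frame) == phi       # neutral information has no effect
assert combine(phi, set()) == set()     # the contradiction absorbs everything
```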

Some interesting Properties

1 if φ ≤ ψ then d(φ) ⊆ d(ψ)

2 if x ⊆ d(φ) then ex ≤ φ

3 if d(φ) ⊆ x then φ ≤ zx

4 φ, ψ ≤ φ ⊗ ψ

5 φ ⊗ ψ = sup{φ, ψ}

6 if x ⊆ d(φ) then φ↓x ≤ φ

7 φ1 ≤ φ2 and ψ1 ≤ ψ2 imply φ1 ⊗ ψ1 ≤ φ2 ⊗ ψ2

8 if x ⊆ d(φ) ∩ d(ψ) then φ↓x ⊗ ψ↓x ≤ (φ ⊗ ψ)↓x

9 if x ⊆ d(φ) then φ ≤ ψ implies φ↓x ≤ ψ↓x
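
Such properties lend themselves to brute-force verification on small instances. Here is a check of property 4 above in the set instance (a sketch, not a proof; the frame is illustrative):

```python
from itertools import combinations

frame = ("a", "b", "c")
pieces = [set(c) for r in range(len(frame) + 1) for c in combinations(frame, r)]

def combine(phi, psi):
    return phi & psi

def leq(phi, psi):
    """φ ≤ ψ iff φ ⊗ ψ = ψ."""
    return combine(phi, psi) == psi

# Property 4: φ, ψ ≤ φ ⊗ ψ, checked over all pairs of pieces
for phi in pieces:
    for psi in pieces:
        both = combine(phi, psi)
        assert leq(phi, both) and leq(psi, both)
print("property 4 holds on this instance")
```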

Conclusion

Quantitative information theory is a success story!

But we cannot measure everything

Quantitative theories do not reflect our perception of information

Algebraic information theory can be defined in a generic way on every formalism that satisfies the axioms of an information algebra

The basic concept of algebraic information theory is a partial order of information
