Institute of Information Theory and Automation

Academy of Sciences of the Czech Republic

On mathematical description of probabilistic conditional independence structures

Milan Studený

Prague May

The thesis for the degree Doctor of Science in the field mathematical informatics and theoretical cybernetics

The thesis was written in the Department of Decision-Making Theory of the Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, and was supported by the projects K and GACR n. It is a result of long-term research performed also within the framework of the ESF research program Highly Structured Stochastic Systems for and the program Causal Interpretation and Identification of Conditional Independence Structures of the Fields Institute for Research in Mathematical Sciences, University of Toronto, Canada (September to November).

Address of the author

Milan Studený

Institute of Information Theory and Automation

Academy of Sciences of the Czech Republic

Pod vodárenskou věží

Prague, Czech Republic

e-mail: studeny@utia.cas.cz

web page: http://www.utia.cas.cz/user_data/studeny/studeny_home.html

Contents

Introduction

Motivation account

Goals of the work

Structure of the work

Basic concepts

Conditional independence

Semigraphoid properties

Formal independence models

Semigraphoids

Elementary independence statements

Problem of axiomatic characterization

Classes of probability measures

Marginally continuous measures

Factorizable measures

Multiinformation and conditional product

Properties of multiinformation function

Positive measures

Gaussian measures

Basic construction

Imsets

Graphical methods

Undirected graphs

Acyclic directed graphs

Classic chain graphs

Within classic graphical models

Decomposable models

Recursive causal graphs

Lattice conditional independence models

Bubble graphs

Advanced graphical models

General directed graphs

Reciprocal graphs

Joint-response chain graphs

Covariance graphs

Alternative chain graphs

Annotated graphs

Hidden variables

Ancestral graphs

MC graphs

Incompleteness of graphical approaches

Structural imsets: fundamentals

Basic class of distributions

Discrete measures

Non-degenerate Gaussian measures

Non-degenerate conditional Gaussian measures

Classes of structural imsets

Elementary imsets

Semi-elementary and combinatorial imsets

Structural imsets

Product formula induced by a structural imset

Examples of reference systems of measures

Topological assumptions

Markov condition

Semigraphoid induced by a structural imset

Markovian measures

Equivalence result

Description of probabilistic models

Supermodular functions

Semigraphoid produced by a supermodular function

Strong equivalence of supermodular functions

Skeletal supermodular functions

Skeleton

Significance of skeletal imsets

Description of models by structural imsets

Galois connection

Formal concept analysis

Lattice of structural models

Markov equivalence

Two concepts of equivalence

Facial and Markov equivalence

Facial implication

Direct characterization of facial implication

Skeletal characterization of facial implication

Adaptation to a distribution framework

Testing facial implication

Testing structural imsets

Grade

Invariants of facial equivalence

The problem of representative choice

Baricentral imsets

Standard imsets

Translation of DAG models

Translation of decomposable models

Imsets of the least degree

Strong facial implication

Minimal generators

Width

Determining and unimarginal classes

Imsets with the least lower class

Exclusivity of standard imsets

Other ways of representation

Pattern

Dual description

Open problems

Unsolved theoretical problems

Miscellaneous topics

Classification of skeletal imsets

Operations with structural models

Reductive operations

Expansive operations

Accumulative operations

Implementation tasks

Interpretation and learning tasks

Meaningful description of structural models

Distribution frameworks and learning

Conclusions

Appendix

Classes of sets

Posets and lattices

Graphs

Topological concepts

Measure-theoretical concepts

Conditional independence of σ-algebras

Relative entropy

Finite-dimensional subspaces and cones

Linear subspaces

Convex cones

Concepts from multivariate analysis

Matrices

Statistical characteristics of probability measures

Multivariate Gaussian distributions

Index

Bibliography

Acknowledgements

Chapter

Introduction

The central topic of this work is how to describe the structures of probabilistic conditional independence in such a way that the corresponding mathematical model both has a relevant interpretation and offers the possibility of computer implementation.

It is a mathematical work which, however, found its motivation in artificial intelligence and statistics. In fact, these two fields are the main areas where the concept of conditional independence has been successfully applied. More specifically, graphical models of conditional independence structure are widely used in

analysis of contingency tables, which is an area of discrete statistics dealing with categorical data,

multivariate analysis, which is a branch of statistics investigating mutual relationships among continuous real-valued variables,

probabilistic reasoning, which is an area of artificial intelligence where decision-making under uncertainty is done on the basis of probabilistic models.

Moreover, non-probabilistic concepts of conditional independence were introduced and studied in several other calculi for dealing with knowledge and uncertainty in artificial intelligence (e.g. relational databases, possibility theory, Spohn's kappa-calculus, Dempster-Shafer's theory of evidence). Thus, the presented work has a multidisciplinary flavour. Nevertheless, it certainly falls within the scope of informatics or theoretical cybernetics, and the main emphasis is put on mathematical groundings.

The work uses concepts from several branches of mathematics, in particular measure theory, discrete mathematics, information theory and algebra. Occasional links to further areas of mathematics occur throughout the work, e.g. to probability theory, mathematical statistics, topology and mathematical logic.

Motivation account

The reader is asked to excuse the following methodological consideration, which perhaps explains my motivation. In the sequel I formulate six general questions of interest which may arise in connection with every particular method of description of conditional independence structures. I think that these questions should be answered in order to judge fairly and carefully the quality and suitability of every particular considered method.

Figure: Theoretical groundings (informal illustration). Labels: knowledge representatives (probability distributions); objects of discrete mathematics; formal independence models.

To be more specific, one can assume a general situation illustrated by Figure. One would like to describe conditional independence structures (shortly CI structures) induced by probability distributions from a given fixed class of distributions over a set of variables N. For example, one can consider the class of discrete measures over N (see p.), the class of non-degenerate Gaussian measures over N (see p.), the class of CG measures over N (see p.), or any specific parametrized class of distributions. In probabilistic reasoning, every particular discrete probability measure over N represents global knowledge about a random system involving the variables of N. That means it serves as a knowledge representative. Thus, one can take an even more general point of view and consider a general class of knowledge representatives within an alternative uncertainty calculus of artificial intelligence instead of the class of probability distributions (e.g. a class of possibilistic distributions over N, a class of relational databases over N, etc.).

Every knowledge representative of this kind induces a formal independence model over N (for the definition see Section on p.). Thus, the class of induced conditional independence models is defined, or in other words, the class of CI structures to be described is specified (the shaded, respectively coloured, area in Figure). Well, one has in mind a method of description of CI structures in which objects of discrete mathematics (for example graphs, finite lattices or discrete functions) are used to describe CI structures. Typical examples are classic graphical models, widely used in multivariate statistics and probabilistic reasoning (for details see Chapter). It is supposed that every object of this type induces a formal independence model over N. The intended interpretation is that the object then describes the induced independence model, so that it can possibly describe a conditional independence model, that is, one of the CI structures to be described.

The definition of the induced model depends on the type of considered objects. Every class of objects has its specific criterion according to which a formal independence model is ascribed to a particular object. For example, various separation criteria for classic graphical models were obtained as a result of the development of miscellaneous Markov properties (see Remark in Section). The evolution ended with the concept of the global Markov condition, which establishes a graphical criterion for determining the maximal set of conditional independence statements represented in a given graph. This set is then the induced formal independence model. The above-mentioned implicit assumption is a basic requirement of consistency, that is, the requirement that a certain formal independence model is unambiguously ascribed to every object in the considered class of objects. Note that some recently developed graphical approaches (see Section) still need to be developed up to the concept of the global Markov condition, so that they will comply with the basic requirement of consistency.

In the situation above I can formulate the first three questions of interest which, in my opinion, are the most important theoretical questions in this general context.

Faithfulness is the question whether every considered object indeed describes one of the CI structures to be described.

Completeness is the question whether every CI structure to be described is described by one of the considered objects. If this is not the case, an advanced subtask occurs, namely to characterize conveniently those formal independence models which can be described by the considered objects.

The equivalence question involves the task of characterizing, in a suitable way, equivalent objects, that is, objects describing the same CI structure. An advanced subquestion is whether one can find a suitable representative for every class of equivalent objects.

The term faithfulness was inspired by terminology used in the literature, where it has a similar meaning for graphical objects. Of course, the above notions depend on the considered class of knowledge representatives, so that one can differentiate between faithfulness in a discrete framework (relative to the class of discrete measures) and faithfulness in a Gaussian framework. Note that in the case of classic graphical models faithfulness is usually ensured, while completeness is not (see Section). To avoid misunderstanding, let me explain that some authors in the area of classic graphical models, including myself, have also used the traditional term strong completeness of a separation graphical criterion. However, according to the classification above, results of this type belong to the results gathered under the label faithfulness (the customary reasons for the traditional terminology are explained in Remark on p.). Thus, I distinguish between completeness of a criterion on the one hand and completeness of a class of objects for description of a class of CI structures on the other hand. Let me remark that not all relevant theoretical questions can be included in the above classification, e.g. the inclusion problem (see p.), which can perhaps be regarded as a specific extension of the equivalence question motivated by additional practical questions.

Let me formulate the three remaining questions of interest which, in my opinion, are the most important practical questions in this context (for an informal illustrative picture see Figure).

Interpretability is the question whether the considered objects of discrete mathematics can be conveyed to humans in an acceptable way. That usually means whether they can be visualized in such a way that they are understood easily and interpreted correctly as CI structures.

Figure: Practical questions (informal illustration). Labels: DATA, HUMAN, COMPUTER and THEORETICAL GROUNDINGS (see Figure), connected by learning, interpretation and implementation.

Learning, or identification, is the question how to determine the most suitable CI structure, either on the basis of statistical data (the estimation problem) or on the basis of expert knowledge provided by human experts. An advanced statistical subtask is to determine even a particular probability distribution inducing the CI structure.

Implementation is the question how to manage the corresponding computational tasks. An advanced subquestion is whether acceptance of a particular CI structure allows one to do the respective subsequent calculations with probability distributions effectively, namely whether the describing objects provide a clue for the calculation.

Classic graphical models are easily accepted by humans, but their pictorial representation may sometimes lead to another interpretation. For example, acyclic directed graphs can either be interpreted as CI structures, or one can prefer a causal or deterministic interpretation of their edges, which is different. Concerning computational aspects, an almost ideal framework is provided by the class of decomposable models, which is a special class of graphical models (see Section). This is the basis of the well-known method of local computation, which is behind several working probabilistic expert systems.

Of course, the presented questions are connected to each other. For example, structure learning from experts certainly depends on interpretation, while advanced distribution learning is closely related to the parametrization problem (see p.), which has a strong computational aspect.

The point of this motivation account is the idea that the practical questions are also strongly connected with theoretical groundings. Thus, in my opinion, before inspecting practical questions one should first solve the related theoretical questions thoroughly. Regrettably, some researchers in artificial intelligence (and marginally in statistics) do not pay enough attention to theoretical groundings and concentrate mainly on practical issues like simplicity of accepted models, either from the point of view of computation or of visualization. They usually settle on a certain class of nice graphical models (e.g. Bayesian networks, see p.) and do not realize that their later technical problems are caused by this limitation.

Even worse, limitation to a small class of models may lead to serious methodological errors. Let me give an example which is my main source of motivation. Consider a hypothetical situation when one is trying to learn the CI structure induced by a discrete distribution on the basis of statistical data. Suppose, moreover, that one is limited to a certain class of classic graphical models, say Bayesian networks. It is known that this class is not complete in the discrete framework (see Chapter). Therefore, one searches for the best approximation. Well, some of the learning algorithms for graphical models browse through the class of possible graphs as follows. One starts with a graph with the maximal number of edges, performs certain statistical tests for conditional independence statements, and represents the acceptance of these statements by removal of certain edges in the graph. Well, this is a correct procedure in case the underlying probability distribution indeed induces a CI structure which can be described by a graph within the considered class of graphs. However, in general, this edge removal represents acceptance of a new graphical model together with all other conditional independence statements which are represented in the new graph but which may not be valid with respect to the underlying distribution. Let me emphasize once more that this erroneous acceptance of further conditional independence statements is made on the basis of a correctly recognized conditional independence statement.

Thus, this error is indeed forced by the limitation to a certain class of graphical models which is not complete. Note that an attitude like this has already been criticized several times in the literature. In my opinion, these repeated problems in solving the practical question of learning are inevitable consequences of the omission of theoretical groundings, namely of the question of completeness. This may have motivated several recent attempts to introduce wider and wider classes of graphs which, however, lose easy interpretation and do not achieve completeness. Therefore, in this work I propose a non-graphical method of description of probabilistic CI structures which primarily solves the completeness question and has the potential to take care of the practical questions.

Goals of the work

The aim of the present work is threefold. The first goal is to give an overview of classic methods of description of probabilistic CI structures. These methods mainly use graphs whose nodes correspond to variables as a basic tool for visualization and interpretation. The overview involves basic results about conditional independence, including those published in my earlier papers.

The second goal is to present a mathematical basis of an alternative method of description of probabilistic CI structures. My alternative method removes certain basic defects of the classic methods.

The third goal is an outline of those directions in which the presented method needs to be developed in order to satisfy the requirements of practical applicability. It involves a list of open problems and promising directions of research.

The work is perhaps longer and more detailed than it could be. The reason is that the expected audience is not limited to experts in the field and mathematicians. My intention was to write a report which can be read and understood by PhD students in computer science and statistics. This was the main stimulus which made me resolve the dilemma of understandability versus conciseness in favour of preciseness and understandability.

Structure of the work

Chapter is an overview of basic definitions, tools and results concerning the concept of conditional independence. These notions, including the notion of an imset, which is a certain integer-valued discrete function, are supposed to be the theoretical basis of the rest of the work.

Chapter is an overview of graphical methods for description of CI structures. Both classic approaches (undirected graphs, acyclic directed graphs and chain graphs) and recent attempts are included. The chapter leads to the conclusion that a non-graphical method achieving completeness (see Section, p.) is needed.

Chapter introduces a method of this type. The method uses certain imsets, called structural imsets, to describe probabilistic CI structures. It is shown that three possible ways of associating probability distributions and structural imsets are equivalent.

Chapter compares two different but equivalent ways of describing CI structures by means of imsets. It is shown that every probabilistic CI structure can be described using this approach, and a duality relation between these two ways of description is established.

Chapter is devoted to the advanced question of Markov equivalence (see Section, p.) within the framework of structural imsets. A certain characterization of equivalent imsets is given and related implementation tasks are discussed.

Chapter deals with the problem of choosing a suitable representative of a class of equivalent structural imsets. Possible approaches to this problem are proposed and roughly compared.

Chapter is an overview of open problems to be studied in order to tackle the practical questions (see Section, p.). Chapter (Conclusions) summarizes the presented method.

The Appendix (Chapter) is an overview of concepts and facts which are supposed to be elementary and can be omitted by an advanced reader. They are added for several minor reasons: to clarify and unify terminology, to broaden the readership, and to make reading comfortable as well. It can be used with the help of the Index. References conclude the work.

Chapter

Basic concepts

Throughout the work the symbol N will denote a non-empty finite set of variables. The intended interpretation is that the variables correspond to primitive factors described by random variables. In Chapter, variables will be represented by nodes of graphs. The set N will also serve as the basic set for non-graphical tools of discrete mathematics introduced in this work (semigraphoids, imsets, etc.).

The following convention will be used throughout the work: given $A, B \subseteq N$, the juxtaposition $AB$ will denote their union $A \cup B$. Moreover, the following symbols will be reserved for sets of numbers: $\mathbb{R}$ will denote real numbers, $\mathbb{Q}$ rational numbers, $\mathbb{Z}$ integers, $\mathbb{Z}^{+}$ non-negative integers (including $0$), and $\mathbb{N}$ natural numbers (that is, positive integers without $0$). The symbol $|A|$ will be used to denote the number of elements of a finite set $A$, that is, its cardinality. Moreover, the symbol $|x|$ will also denote the absolute value of a real number $x$, that is, $|x| = \max\{x, -x\}$.

Conditional independence

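The basic notion of this work is a probability measure over N. This phrase will be used to describe the situation when a measurable space $(X_i, \mathcal{X}_i)$ is given for every $i \in N$ and a probability measure $P$ is defined on the Cartesian product $(\prod_{i \in N} X_i, \prod_{i \in N} \mathcal{X}_i)$. In this case I will use $(X_A, \mathcal{X}_A)$ as a shorthand for $(\prod_{i \in A} X_i, \prod_{i \in A} \mathcal{X}_i)$ for every $\emptyset \neq A \subseteq N$. The marginal of $P$ for $\emptyset \neq A \subset N$, denoted by $P^{A}$, is defined by the formula

\[ P^{A}(\mathsf{A}) = P(\mathsf{A} \times X_{N \setminus A}) \quad \text{for } \mathsf{A} \in \mathcal{X}_A . \]

Moreover, let us accept two conventions. First, the marginal of $P$ for $A = N$ is $P$ itself, that is, $P^{N} \equiv P$. Second, a fully formal convention is that the marginal of $P$ for $A = \emptyset$ is a probability measure on a fixed appended measurable space $(X_{\emptyset}, \mathcal{X}_{\emptyset})$ with the trivial σ-algebra $\mathcal{X}_{\emptyset} = \{\emptyset, X_{\emptyset}\}$. Observe that a measurable space of this kind admits only one probability measure $P^{\emptyset}$.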
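For a discrete measure the marginal formula above reduces to summing probabilities over the coordinates outside A. The following minimal Python sketch illustrates this and the two conventions (A = N and A = ∅). The function name `marginal`, the dictionary representation of P and the toy measure are illustrative assumptions, not notation from this work.

```python
from itertools import product

def marginal(P, variables, A):
    """Return the marginal of a discrete measure P for the variable set A.

    P maps full configurations (tuples ordered like `variables`) to
    probabilities; the result maps configurations of A to probabilities.
    """
    keep = [i for i, v in enumerate(variables) if v in A]
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in keep)   # projection of x onto A
        out[key] = out.get(key, 0.0) + p  # sum over the remaining coordinates
    return out

variables = ("a", "b", "c")
# Toy measure (an assumption): uniform on binary configurations
# with an even number of ones.
P = {x: 0.25 for x in product((0, 1), repeat=3) if sum(x) % 2 == 0}

P_ab = marginal(P, variables, {"a", "b"})        # marginal for A = {a, b}
P_all = marginal(P, variables, set(variables))   # A = N gives P itself
P_empty = marginal(P, variables, set())          # A = empty set: one-point space
```

In this representation the appended space for the empty set has the single configuration `()`, which carries the whole mass, in agreement with the second convention.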

To give the definition of conditional independence within this framework one needs a certain general understanding of the concept of conditional probability. Given a probability measure $P$ over N and disjoint sets $A, C \subseteq N$, by conditional probability on $X_A$ given $C$ (more specifically given $\mathcal{X}_C$) will be understood a function of two arguments $P_{A|C} : \mathcal{X}_A \times X_C \to [0,1]$ which ascribes an $\mathcal{X}_C$-measurable function $P_{A|C}(\mathsf{A}|\cdot)$ to every $\mathsf{A} \in \mathcal{X}_A$ such that

\[ P^{AC}(\mathsf{A} \times \mathsf{C}) = \int_{\mathsf{C}} P_{A|C}(\mathsf{A}|x) \, dP^{C}(x) \quad \text{for every } \mathsf{C} \in \mathcal{X}_C . \]

Note that no restriction concerning the mappings $\mathsf{A} \mapsto P_{A|C}(\mathsf{A}|x)$, $x \in X_C$ (often called the regularity requirement, see Section, Remark on p.) is needed within this general approach. Let me emphasize that $P_{A|C}$ depends on the marginal $P^{AC}$ only and that it is defined, for a fixed $\mathsf{A} \in \mathcal{X}_A$, uniquely within the equivalence $P^{C}$-almost everywhere. Observe that, owing to the convention above, in case $C = \emptyset$ the conditional probability $P_{A|C}$ coincides in fact with the marginal for $A$, that means one has $P_{A|\emptyset} \equiv P^{A}$, because a constant function can be identified with its value.

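Remark. The conventions above are in concordance with the following unifying perspective. Realize that for every $A \subseteq N$ the measurable space $(X_A, \mathcal{X}_A)$ is isomorphic to the space $(X_N, \bar{\mathcal{X}}_A)$, where $\bar{\mathcal{X}}_A \subseteq \mathcal{X}_N$ is a certain σ-algebra representing the set $A$, so that inclusion of sets is reflected, namely

\[ \bar{\mathcal{X}}_A = \{ \mathsf{A} \times X_{N \setminus A} ;\ \mathsf{A} \in \mathcal{X}_A \} = \{ \mathsf{B} \in \mathcal{X}_N ;\ \mathsf{B} = \mathsf{A} \times X_{N \setminus A} \ \text{for } \mathsf{A} \subseteq X_A \} . \]

It is natural to require then that the empty set $\emptyset$ is represented by the trivial σ-algebra $\bar{\mathcal{X}}_{\emptyset}$ over $X_N$ and $N$ is represented by $\bar{\mathcal{X}}_N = \mathcal{X}_N$. Using this point of view, the marginal $P^{A}$ corresponds to the restriction of $P$ to $\bar{\mathcal{X}}_A$, and $P_{A|C}$ corresponds to the concept of conditional probability with respect to the σ-algebra $\bar{\mathcal{X}}_C$. Thus, the existence and the above-mentioned uniqueness of $P_{A|C}$ follow from basic measure-theoretical facts (for details see the Appendix, Section).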

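Given a probability measure $P$ over N and pairwise disjoint subsets $A, B, C \subseteq N$, one says that $A$ is conditionally independent of $B$ given $C$ with respect to $P$, and writes $A \perp\!\!\!\perp B \mid C\ [P]$, if for every $\mathsf{A} \in \mathcal{X}_A$ and $\mathsf{B} \in \mathcal{X}_B$

\[ P_{AB|C}(\mathsf{A} \times \mathsf{B}|x) = P_{A|C}(\mathsf{A}|x) \cdot P_{B|C}(\mathsf{B}|x) \quad \text{for } P^{C}\text{-a.e. } x \in X_C . \]

Observe that in case $C = \emptyset$ it collapses to the simple equality $P^{AB}(\mathsf{A} \times \mathsf{B}) = P^{A}(\mathsf{A}) \cdot P^{B}(\mathsf{B})$, that is, to the classic independence concept. Note that the validity of the condition above does not depend on the choice of versions of conditional probabilities given $C$, since these are determined uniquely just within the equivalence $P^{C}$-almost everywhere.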

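Remark. Let me specify the definition for the case of discrete measures over N, when $X_i$ is a finite non-empty set and $\mathcal{X}_i = \mathcal{P}(X_i)$ for every $i \in N$. Then $P_{A|C}$ is determined uniquely exactly on the set $\{ x \in X_C ;\ P^{C}(\{x\}) > 0 \}$ by means of the formula

\[ P_{A|C}(\mathsf{A}|x) = \frac{P^{AC}(\mathsf{A} \times \{x\})}{P^{C}(\{x\})} \quad \text{for every } \mathsf{A} \subseteq X_A , \]

so that $A \perp\!\!\!\perp B \mid C\ [P]$ is defined as follows:

\[ P_{AB|C}(\mathsf{A} \times \mathsf{B}|x) = P_{A|C}(\mathsf{A}|x) \cdot P_{B|C}(\mathsf{B}|x) \]

for every $\mathsf{A} \subseteq X_A$, $\mathsf{B} \subseteq X_B$ and $x \in X_C$ with $P^{C}(\{x\}) > 0$. Of course, $\mathsf{A}$ and $\mathsf{B}$ can be replaced by singletons in this case. Note that the fact that the equality $P^{C}$-a.e. coincides with the equality on a certain fixed set is a speciality of the discrete case. Other common equivalent definitions of conditional independence will be mentioned in Section.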
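The discrete definition in the remark above can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: the names `proj`, `marginal` and `is_ci`, the dictionary representation of P and the numerical tolerance are all hypothetical. Instead of dividing by $P^{C}(\{x\})$, it tests the equivalent product form $P^{ABC} \cdot P^{C} = P^{AC} \cdot P^{BC}$ on singletons, which holds trivially wherever $P^{C}$ vanishes.

```python
from itertools import product

def proj(x, variables, S):
    """Project a full configuration x onto the variables in S."""
    return tuple(x[i] for i, v in enumerate(variables) if v in S)

def marginal(P, variables, S):
    """Marginal of the discrete measure P for the variable set S."""
    out = {}
    for x, p in P.items():
        k = proj(x, variables, S)
        out[k] = out.get(k, 0.0) + p
    return out

def is_ci(P, variables, states, A, B, C, tol=1e-12):
    """Test A _||_ B | C [P] for pairwise disjoint A, B, C (discrete case)."""
    A, B, C = set(A), set(B), set(C)
    p_abc = marginal(P, variables, A | B | C)
    p_ac = marginal(P, variables, A | C)
    p_bc = marginal(P, variables, B | C)
    p_c = marginal(P, variables, C)
    for x in product(*(states[v] for v in variables)):
        # Product form of the defining equality, evaluated on singletons.
        lhs = (p_abc.get(proj(x, variables, A | B | C), 0.0)
               * p_c.get(proj(x, variables, C), 0.0))
        rhs = (p_ac.get(proj(x, variables, A | C), 0.0)
               * p_bc.get(proj(x, variables, B | C), 0.0))
        if abs(lhs - rhs) > tol:
            return False
    return True

variables = ("a", "b", "c")
states = {v: (0, 1) for v in variables}
# Toy measure (an assumption): a and b are independent fair bits, c = a XOR b.
P = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

marginally_independent = is_ci(P, variables, states, {"a"}, {"b"}, set())
independent_given_c = is_ci(P, variables, states, {"a"}, {"b"}, {"c"})
```

In this toy measure a and b are independent given the empty set, but become dependent once c is known, so the first check succeeds and the second fails.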

However, the concept of conditional independence is not exclusively a probabilistic concept. It was introduced in several non-probabilistic frameworks, namely in various calculi for dealing with uncertainty in artificial intelligence (for details and an overview see the literature). Formal properties of the respective conditional independence concepts may differ in general, but an important fact is that certain basic properties of conditional independence appear to be valid in all these frameworks.

Semigraphoid properties

Several authors independently drew attention to these basic formal properties of conditional independence. In modern statistics they were first accentuated by Dawid, then mentioned by Mouchart and Rolin, and by van Putten and van Schuppen. Spohn interpreted them in the context of philosophical logic. Finally, their significance in the probabilistic approach to artificial intelligence was discerned and highlighted by Pearl and Paz. Their terminology was later widely accepted, so that researchers in artificial intelligence started to call them the semigraphoid properties.

Formal independence models

Formally, a conditional independence statement over N is a statement of the form "A is conditionally independent of B given C", where $A, B, C \subseteq N$ are pairwise disjoint subsets of N. A statement of this kind should always be understood with respect to a certain mathematical object $o$ over N, for example a probability measure over N. However, several other objects can occur in place of $o$, for example a graph over N (see Chapter), possibility distributions over N, relational databases over N, or a structural imset over N (see Section). The notation $A \perp\!\!\!\perp B \mid C\ [o]$ will be used then, but the symbol $[o]$ can be omitted when it is suitable.

Thus, every conditional independence statement corresponds to a disjoint triplet over N, that is, a triplet $\langle A, B \mid C \rangle$ of pairwise disjoint subsets of N. Here the punctuation anticipates the intended role of the component sets. The third component, put after the straight line, is the conditioning set, while the two former components are independent areas, usually interchangeable. The formal difference is that a triplet of this kind can be interpreted either as the corresponding independence statement or, alternatively, as its negation, that is, the corresponding dependence statement. Occasionally I will use the symbol $A \not\perp\!\!\!\perp B \mid C\ [o]$ to denote the dependence statement which corresponds to $\langle A, B \mid C \rangle$. The class of all disjoint triplets over N will be denoted by $\mathcal{T}(N)$.

Having established the concept of conditional independence within a certain framework of mathematical objects over N, every object $o$ of this kind defines a certain set of disjoint triplets over N, namely

\[ \mathcal{M}_o = \{ \langle A, B \mid C \rangle \in \mathcal{T}(N) ;\ A \perp\!\!\!\perp B \mid C\ [o] \} . \]

Let us call it the conditional independence model induced by $o$. This phrase is used to indicate that the involved triplets are interpreted as independence statements, although from a purely mathematical point of view it is nothing but a subset of $\mathcal{T}(N)$. Thus, the conditional independence model induced by a probability measure $P$ over N (according to the definition from Section) is a special case. Conversely, any class $\mathcal{M} \subseteq \mathcal{T}(N)$ of disjoint triplets over N can be formally interpreted as a conditional independence model if one defines

\[ A \perp\!\!\!\perp B \mid C\ [\mathcal{M}] \ \equiv\ \langle A, B \mid C \rangle \in \mathcal{M} . \]

By the restriction of a formal independence model $\mathcal{M}$ over N to a set $\emptyset \neq T \subseteq N$ will be understood the class $\mathcal{M} \cap \mathcal{T}(T)$, denoted by $\mathcal{M}_T$. Evidently, the restriction of a probabilistic conditional independence model is again a conditional independence model.
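The notions of the class of disjoint triplets, the induced model and the restriction can be made concrete by brute-force enumeration over a small N. The following Python sketch is only an illustration: the function names and the toy oracle `holds` are hypothetical, and triplets are represented as tuples of frozensets.

```python
from itertools import product

def disjoint_triplets(N):
    """Yield all triplets (A, B, C) of pairwise disjoint subsets of N."""
    N = sorted(N)
    # Each variable goes to A, to B, to C, or to none of them ("-").
    for labels in product("ABC-", repeat=len(N)):
        yield tuple(
            frozenset(v for v, l in zip(N, labels) if l == part)
            for part in "ABC"
        )

def induced_model(N, holds):
    """Model induced by an object whose CI statements are decided by `holds`."""
    return {t for t in disjoint_triplets(N) if holds(*t)}

def restrict(M, T):
    """Restriction of the model M to the variable set T."""
    T = frozenset(T)
    return {(A, B, C) for (A, B, C) in M if (A | B | C) <= T}

N = {"a", "b", "c"}
all_triplets = set(disjoint_triplets(N))

# Hypothetical oracle: only statements with an empty A or B are accepted.
M = induced_model(N, lambda A, B, C: not A or not B)
M_ab = restrict(M, {"a", "b"})
```

Since every variable is placed independently in one of four positions, the enumeration produces $4^{|N|}$ triplets, which is 64 for three variables.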

Remark. This is to explain my limitation to disjoint triplets over N, because some authors do not make this restriction at all. For simplicity of explanation, consider the discrete probabilistic framework. Indeed, one can introduce, for a discrete probability measure $P$ over N, the statement $A \perp\!\!\!\perp B \mid C\ [P]$ even for non-disjoint triplets $A, B, C \subseteq N$ in a reasonable way. However, then the statement $A \perp\!\!\!\perp A \mid C\ [P]$ has a specific interpretation, namely that the variables in $A$ are functionally dependent on the variables in $C$ with respect to $P$, so that it can be interpreted as a functional dependence statement. Let us note that one can easily derive that

\[ A \perp\!\!\!\perp B \mid C\ [P] \iff \{\, (A \cap B) \setminus C \perp\!\!\!\perp (A \cap B) \setminus C \mid C \cup (B \setminus A)\ [P] \ \ \&\ \ A \setminus C \perp\!\!\!\perp B \setminus AC \mid C\ [P] \,\} . \]

Thus, every statement $A \perp\!\!\!\perp B \mid C$ of general type can be reconstructed from functional dependence statements and from pure conditional independence statements described by disjoint triplets. The topic of this work is pure conditional independence structures; therefore, I limit myself to pure conditional independence statements.

Semigraphoids

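By a disjoint semigraphoid over N is understood any set $\mathcal{M} \subseteq \mathcal{T}(N)$ of disjoint triplets over N (interpreted as independence statements) such that the following conditions hold for every collection of pairwise disjoint sets $A, B, C, D \subseteq N$:

triviality: $A \perp\!\!\!\perp \emptyset \mid C\ [\mathcal{M}]$,

symmetry: $A \perp\!\!\!\perp B \mid C\ [\mathcal{M}]$ implies $B \perp\!\!\!\perp A \mid C\ [\mathcal{M}]$,

decomposition: $A \perp\!\!\!\perp BD \mid C\ [\mathcal{M}]$ implies $A \perp\!\!\!\perp D \mid C\ [\mathcal{M}]$,

weak union: $A \perp\!\!\!\perp BD \mid C\ [\mathcal{M}]$ implies $A \perp\!\!\!\perp B \mid DC\ [\mathcal{M}]$,

contraction: $A \perp\!\!\!\perp B \mid DC\ [\mathcal{M}]$ and $A \perp\!\!\!\perp D \mid C\ [\mathcal{M}]$ implies $A \perp\!\!\!\perp BD \mid C\ [\mathcal{M}]$.

Note that the terminology above was proposed by Pearl, who formulated the formal properties above in the form of inference rules, gave them special names and interpretation, and called them the semigraphoid axioms. Of course, the restriction of a semigraphoid is a semigraphoid. An important fact is the following one.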

Lemma. Every conditional independence model induced by a probability measure over N is a disjoint semigraphoid over N.

Proof. This can be derived easily from Consequence proved in the Appendix (see p.). Indeed, having a probability measure $P$ over N defined on a measurable space $(X_N, \mathcal{X}_N)$, one can identify every subset $A \subseteq N$ with a σ-algebra $\bar{\mathcal{X}}_A \subseteq \mathcal{X}_N$ in the way described in Remark. Then, for a disjoint triplet $\langle A, B \mid C \rangle$ over N, the statement $A \perp\!\!\!\perp B \mid C\ [P]$ is equivalent to the requirement $\bar{\mathcal{X}}_A \perp\!\!\!\perp \bar{\mathcal{X}}_B \mid \bar{\mathcal{X}}_C\ [P]$ introduced in Section. Having in mind that $\bar{\mathcal{X}}_{AB} = \bar{\mathcal{X}}_A \vee \bar{\mathcal{X}}_B$ for $A, B \subseteq N$, the rest follows from Consequence.

Note that the above-mentioned fact is not a special feature of the probabilistic framework. Conditional independence models occurring within the other uncertainty calculi mentioned at the end of Section are also disjoint semigraphoids. Well, even various graphs over N induce semigraphoids, as explained in Chapter.
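The five semigraphoid conditions above can be verified by exhaustive search when N is small. The following Python sketch is a brute-force illustration only: the names `disjoint_tuples` and `is_semigraphoid` and the toy models are assumptions made for the example, and the loop visits $5^{|N|}$ quadruples, so it is not meant for larger N.

```python
from itertools import product

def disjoint_tuples(N, k):
    """Yield all k-tuples of pairwise disjoint subsets of N (as frozensets)."""
    N = sorted(N)
    for labels in product(range(k + 1), repeat=len(N)):  # label k means "unused"
        yield tuple(
            frozenset(v for v, l in zip(N, labels) if l == part)
            for part in range(k)
        )

def is_semigraphoid(M, N):
    """Check the five semigraphoid conditions for a set M of triplets over N."""
    empty = frozenset()
    for A, B, C, D in disjoint_tuples(N, 4):
        if (A, empty, C) not in M:                           # triviality
            return False
        if (A, B, C) in M and (B, A, C) not in M:            # symmetry
            return False
        if (A, B | D, C) in M:
            if (A, D, C) not in M:                           # decomposition
                return False
            if (A, B, D | C) not in M:                       # weak union
                return False
        if ((A, B, D | C) in M and (A, D, C) in M
                and (A, B | D, C) not in M):                 # contraction
            return False
    return True

N = {"a", "b", "c"}
a, b, empty = frozenset("a"), frozenset("b"), frozenset()
# Toy models (assumptions): the trivial triplets, plus one or two extra statements.
trivial = {(A, B, C) for A, B, C in disjoint_tuples(N, 3) if not A or not B}
one_sided = trivial | {(a, b, empty)}
two_sided = one_sided | {(b, a, empty)}
```

Here `trivial` and `two_sided` pass the check, while `one_sided` fails because it violates symmetry.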

Remark. The limitation to disjoint triplets in the definition of a semigraphoid is not substantial. One can introduce an abstract semigraphoid on a join semilattice $(S, \vee)$ as a ternary relation $\cdot \perp\!\!\!\perp \cdot \mid \cdot$ over elements $A, B, C, D$ of $S$ satisfying

$A \perp\!\!\!\perp B \mid C$ whenever $B \vee C = C$,

$A \perp\!\!\!\perp B \mid C$ iff $B \perp\!\!\!\perp A \mid C$,

$A \perp\!\!\!\perp B \vee D \mid C$ iff [ $A \perp\!\!\!\perp B \mid D \vee C$ and $A \perp\!\!\!\perp D \mid C$ ].

Taking $S = (\mathcal{P}(N), \cup)$, one obtains the definition of a non-disjoint semigraphoid over N. A more complicated example is the semilattice of all σ-algebras $\mathcal{A} \subseteq \mathcal{X}$ in a measurable space $(X, \mathcal{X})$ and the relation of conditional independence of σ-algebras with respect to a probability measure over $(X, \mathcal{X})$ (see Consequence). This perspective leads to the general notion of a separoid, introduced in the literature, which is a mathematical structure unifying a variety of notions of irrelevance arising in probability, statistics, artificial intelligence and other fields.

Elementary independence statements

To store a semigraphoid over N in the memory of a computer one need not allocate all $|\mathcal{T}(N)| = 4^{|N|}$ bits. A more economical representation is feasible. Of course, one can omit trivial statements, which correspond to triplets $\langle A, B \mid C\rangle$ over N with $A = \emptyset$ or $B = \emptyset$. Let us denote the class of respective trivial disjoint triplets over N by $\mathcal{T}_{\emptyset}(N)$.

However, of principal importance are elementary statements (or triplets), that is, disjoint triplets $\langle A, B \mid C\rangle$ over N where both A and B are singletons. A simplifying convention will be used in that case: braces in singleton notation will be omitted, so that only $\langle i, j \mid K\rangle$ or $i \perp\!\!\!\perp j \mid K$ will be written. The class of elementary triplets over N will be denoted by $\mathcal{T}_{\epsilon}(N)$.

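Lemma. Suppose that $\mathcal{M}$ is a semigraphoid over N. Then, for every disjoint triplet $\langle A, B \mid C\rangle$ over N, one has $A \perp\!\!\!\perp B \mid C\ [\mathcal{M}]$ iff the following condition holds:

$\forall\, i \in A\ \ \forall\, j \in B\ \ \forall\, K$ with $C \subseteq K \subseteq ABC \setminus \{i, j\}$: $\quad i \perp\!\!\!\perp j \mid K\ [\mathcal{M}]$.

In particular, every semigraphoid is determined by its trace within the class of elementary statements, i.e. by the intersection with $\mathcal{T}_{\epsilon}(N)$.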

Proof. The necessity of the condition is easily derivable using decomposition and weak union combined with symmetry.

For the converse implication suppose that the condition holds and that $\langle A, B \mid C\rangle$ is not a trivial triplet over N (otherwise the claim is evident). Use induction on $|AB|$; the case $|AB| = 2$ is evident. Supposing $|AB| > 2$, either A or B is not a singleton. Owing to symmetry one can consider, without loss of generality, $|B| \geq 2$; choose $j \in B$ and put $B' = B \setminus \{j\}$. By the induction assumption, the condition implies both $A \perp\!\!\!\perp j \mid B'C\ [\mathcal{M}]$ and $A \perp\!\!\!\perp B' \mid C\ [\mathcal{M}]$. Hence, by application of the contraction property, $A \perp\!\!\!\perp B \mid C\ [\mathcal{M}]$ is derived.
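The condition of the lemma can be spelled out by enumeration. The sketch below is illustrative only: it assumes that elements of N are integers and that an elementary statement is stored as a tuple `(i, j, K)` with `K` a frozenset; the function names are hypothetical.

```python
from itertools import combinations


def elementary_conditions(A, B, C):
    """All elementary statements (i, j, K) with i in A, j in B and
    C <= K <= ABC minus {i, j}, as required by the lemma."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    out = set()
    for i in A:
        for j in B:
            free = sorted((A | B) - {i, j})
            for r in range(len(free) + 1):
                for extra in combinations(free, r):
                    out.add((i, j, C | frozenset(extra)))
    return out


def implied_by_trace(A, B, C, trace):
    """Decide A _||_ B | C from the elementary trace of a semigraphoid.
    A symmetric pair may be stored through either of its two members."""
    return all((i, j, K) in trace or (j, i, K) in trace
               for i, j, K in elementary_conditions(A, B, C))
```

For a trivial triplet the list of conditions is empty, so `implied_by_trace` returns `True`, in agreement with the triviality property. The result is only meaningful if `trace` really is the elementary trace of a semigraphoid.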

Sometimes an elementary mode of representation of semigraphoids, that is, by the list of contained elementary statements, is more suitable. A characterization of those collections of elementary triplets which represent semigraphoids is available in the literature.

Remark. Another reduction of memory demands for semigraphoid representation follows from symmetry. Instead of keeping a pair of mutually symmetric statements $i \perp\!\!\!\perp j \mid K$ and $j \perp\!\!\!\perp i \mid K$, one can choose only one of them according to a suitable criterion. In particular, to represent a semigraphoid over N with $|N| = n$ it suffices to have only $n \cdot (n-1) \cdot 2^{n-3}$ bits. Note that the idea above is also reflected in a later section, where just one function corresponds to a symmetric pair of elementary statements.
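The memory counts can be confirmed by direct enumeration. This is a small sketch under the assumption that the two counts in question are $4^{n}$ (all disjoint triplets over an n-element set) and $n(n-1)2^{n-3}$ (elementary statements up to symmetry); the function names are made up for the illustration.

```python
from itertools import combinations, product


def count_disjoint_triplets(n):
    """Each element of N goes to A, B, C or to none of them."""
    return sum(1 for _ in product("ABC-", repeat=n))


def count_elementary_up_to_symmetry(n):
    """Unordered pairs {i, j} combined with every K inside N minus {i, j}."""
    total = 0
    for i, j in combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        total += sum(1 for r in range(len(rest) + 1)
                     for _ in combinations(rest, r))
    return total
```

Already for n = 4 the reduction is from 256 bits to 24 bits.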

However, further reduction of the class of considered statements is not possible. The reason is as follows: every elementary triplet $\langle i, j \mid K\rangle$ over N generates a semigraphoid over N consisting of $\langle i, j \mid K\rangle$, its symmetric image $\langle j, i \mid K\rangle$ and the trivial triplets over N (cf. the preceding Lemma). In fact, these are the minimal non-trivial semigraphoids over N, and one has to distinguish them from other semigraphoids over N. Of course, the above-mentioned fact motivated the terminology.

Problem of axiomatic characterization

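Pearl and Paz formulated a conjecture that semigraphoids coincide with conditional independence models induced by discrete probability measures. However, this conjecture was refuted by finding a further formal property of these models, not derivable from semigraphoid properties, namely

[ $A \perp\!\!\!\perp B \mid CD$ and $C \perp\!\!\!\perp D \mid A$ and $C \perp\!\!\!\perp D \mid B$ and $A \perp\!\!\!\perp B \mid \emptyset$ ] iff [ $C \perp\!\!\!\perp D \mid AB$ and $A \perp\!\!\!\perp B \mid C$ and $A \perp\!\!\!\perp B \mid D$ and $C \perp\!\!\!\perp D \mid \emptyset$ ].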

Another formal property of this sort was derived later. Consequently, a natural question arose: can conditional independence models arising in the discrete probabilistic setting be characterized in terms of a finite number of formal properties of this type? This question is known as the problem of axiomatic characterization, since a result of this kind would have been a substantial step towards a syntactic description of these models in the sense of mathematical logic. Indeed, it would then have been possible to construct a deductive system which is an analogue of the notion of a formal axiomatic theory. The desired formal properties would then have played the role of syntactic inference rules of an axiomatic theory of this sort. Unfortunately, the answer to the question above is also negative. It was shown that for every $n \in \mathbb{N}$ there exists a formal property of discrete probabilistic conditional independence models which applies to a set of variables N with $|N| = n$ but which cannot be revealed on a set of smaller cardinality. Note that a basic tool for the derivation of these properties was the multiinformation function introduced later in this chapter.

On the other hand, having fixed N, the finite number of possible conditional independence models over N suggests that they can be characterized in terms of a finite number of formal properties. Thus, a related task is to characterize them in that way for a small cardinality of N. It is not difficult to verify that in case $|N| = 3$ they coincide with semigraphoids (see the Figure for illustration). Discrete probabilistic conditional independence models over N with $|N| = 4$ were characterized in a recently completed series of papers. An overview is available where the respective formal properties are explicitly formulated, together with the number of different conditional independence models over N that they characterize.

Remark. On the other hand, several results on relative completeness of semigraphoid properties were achieved. Models of unconditional stochastic independence (that is, submodels consisting of unconditioned independence statements, i.e. statements of the form $A \perp\!\!\!\perp B \mid \emptyset$) were characterized, in two independent works, by means of properties derivable from semigraphoid properties. An analogous result for the class of saturated or fixed-context conditional independence statements, that is, statements $A \perp\!\!\!\perp B \mid C$ with $ABC = N$, was also achieved independently by several authors. The result saying that the semigraphoid generated by a couple of conditional independence statements is always a conditional independence model induced by a discrete probability measure can be interpreted as a specific relative completeness result. Note that the problem of axiomatic characterization of CI models mentioned above differs from the problem of axiomatization (in the sense of mathematical logic) of a single CI structure over an infinite set of variables N.

Classes of probability measures

There is no uniform conception of the notion of probability distribution in the literature. In probability theory, authors usually understand by a distribution of an n-dimensional real random vector an induced probability measure on the respective sample space $\mathbb{R}^n$ endowed with the Borel $\sigma$-algebra, that is, a set function on the sample measurable space. On the other hand, authors in artificial intelligence usually identify a distribution of a finitely-valued random vector with a pointwise function on the respective finite sample space, ascribing probability to every configuration of values (that is, to every element of the sample space $\prod_{i \in N} X_i$, where $X_i$ are finite sets). In statistics, either the meaning wavers between these two basic approaches, or authors even avoid the dilemma by describing specific distributions directly by their parameters (e.g. the covariance matrix of a Gaussian distribution). Therefore, no exact meaning is assigned to the phrase 'probability distribution' in this work; it is used only in a general sense, mainly in vague motivational parts. Moreover, a terminological distinction is made between the two above-mentioned approaches. The concept of probability measure over N introduced earlier rather reflects the first approach, which is more general. To relate this to the second approach, one has to make an additional assumption on a probability measure P so that it can also be described by a pointwise function, called the density of P. Note that many authors simply make an assumption of this type implicitly, without mentioning it.

In this section, basic facts about these special probability measures are recalled and several important subclasses of the class of measures having density, called marginally continuous measures, are introduced. One of them, the class of measures with finite multiinformation, is strongly related to the method described in later chapters. The information-theoretical methods are applicable to measures belonging to this class, which fortunately involves typical measures used in practice. Mutual relationships among the introduced classes of measures are depicted in the Figure.

[Figure: The relation of basic classes of probability measures over N. Classes shown: all probability measures; marginally continuous measures; measures with finite multiinformation; discrete measures; positive measures; non-degenerate Gaussian measures.]

Marginally continuous measures

A probability measure P over N is marginally continuous if it is absolutely continuous with respect to the product of its one-dimensional marginals, that is, $P \ll \prod_{i \in N} P^{\{i\}}$. The following lemma contains an apparently weaker equivalent definition.

Lemma. A probability measure P on $(X_N, \mathcal{X}_N)$ is marginally continuous iff there exists a collection of $\sigma$-finite measures $\mu_i$ on $(X_i, \mathcal{X}_i)$, $i \in N$, such that $P \ll \prod_{i \in N} \mu_i$.

Proof. It is a known result that in case $|N| = 2$ one has $P \ll \prod_{i \in N} P^{\{i\}}$ iff there are probability measures $\lambda_i$ on $(X_i, \mathcal{X}_i)$ with $P \ll \prod_{i \in N} \lambda_i$. One can easily show that for every $\sigma$-finite measure $\mu_i$ on $(X_i, \mathcal{X}_i)$ a probability measure $\lambda_i$ on $(X_i, \mathcal{X}_i)$ with $\lambda_i \ll \mu_i \ll \lambda_i$ exists. Hence, the condition above is equivalent to the requirement of the existence of $\sigma$-finite measures $\mu_i$ with $P \ll \prod_{i \in N} \mu_i$. Finally, one can use induction on $|N|$ to get the desired conclusion.

Thus, marginal continuity of P is equivalent to the existence of a dominating measure $\mu$ for P, that is, the product $\mu = \prod_{i \in N} \mu_i$ of some $\sigma$-finite measures $\mu_i$ on $(X_i, \mathcal{X}_i)$, $i \in N$, such that $P \ll \mu$. In particular, every discrete measure over N is marginally continuous, since the counting measure on $X_N$ can serve as a dominating measure. Having fixed a dominating measure $\mu$, by a density of P with respect to $\mu$ will be understood every version of the Radon-Nikodym derivative of P with respect to $\mu$.

Remark. Let us note without details that the assumption that a probability measure P over N is marginally continuous also implies that for every disjoint $A, C \subseteq N$ there exists a regular version of conditional probability $P_{A|C}$ on $X_A$ given $X_C$ in the sense of Loève. Regularity of conditional probability is usually derived as a consequence of specific topological assumptions on $(X_i, \mathcal{X}_i)$, $i \in N$ (see the Appendix). Thus, marginal continuity is a non-topological assumption implying regularity of conditional probabilities. The concept of marginal continuity is closely related to the concept of a dominated experiment in Bayesian statistics.

The next step will be an equivalent definition of conditional independence for marginally continuous measures in terms of densities. To formulate it in an elegant way, let us accept the following notational convention.

Convention. Suppose that a probability measure P on $(X_N, \mathcal{X}_N)$ together with a fixed dominating measure $\mu$ is given. More specifically, $P \ll \mu \equiv \prod_{i \in N} \mu_i$, where $\mu_i$ is a $\sigma$-finite measure on $(X_i, \mathcal{X}_i)$ for every $i \in N$.

Then, for every $\emptyset \neq A \subseteq N$, let us put $\mu_A = \prod_{i \in A} \mu_i$, choose a version $f_A$ of the Radon-Nikodym derivative $\frac{dP^A}{d\mu_A}$ and fix it. The function $f_A$ will be called a marginal density of P for A. It is an $\mathcal{X}_A$-measurable function on $X_A$.

In order to be able to understand it as a function on $X_N$ as well, let us accept the following notation. Given $\emptyset \neq A \subseteq B \subseteq N$ and $x \in X_B$, the symbol $x_A$ will denote the projection of x onto A, that is, $x_A = [x_i]_{i \in A}$ whenever $x = [x_i]_{i \in B}$.

The last formal convention concerns the marginal density $f_{\emptyset}$ for the empty set. It should be a constant function on an appended trivial measurable space $(X_{\emptyset}, \mathcal{X}_{\emptyset})$. Thus, in the formulas below one can simply put $f_{\emptyset}(x_{\emptyset}) = 1$ for every $x \in X_B$, $B \subseteq N$.

Remark. This is to explain the way marginal densities are defined in the Convention. First, let me emphasize that the marginal density is not the Radon-Nikodym derivative of the respective marginals, since $\mu_A = \prod_{i \in A} \mu_i$ need not coincide with the marginal $\mu^A$ of $\mu = \prod_{i \in N} \mu_i$ unless every $\mu_i$ is a probability measure. Indeed, the marginal of a $\sigma$-finite measure may not be a $\sigma$-finite measure (e.g. in case $\mu(X_N) = \infty$), so that the Radon-Nikodym derivative $\frac{dP^A}{d\mu^A}$ may not exist. Instead, one can take the following point of view. Let us fix a density $f = \frac{dP}{d\mu}$ and introduce for every $\emptyset \neq A \subseteq N$ its projection $f^A$ as a function on $X_A$ defined ($\mu_A$-a.e.) as follows:

$f^A(y) = \int_{X_{N \setminus A}} f(y, z)\, d\mu_{N \setminus A}(z)$ for $y \in X_A$.

One can easily conclude, using the Fubini theorem, that $f^A = \frac{dP^A}{d\mu_A}$ ($\mu_A$-a.e.), so that there is no substantial difference between $f^A$ and any version of the marginal density $f_A$. The convention for the empty set follows this line, since one has

$f^{\emptyset} = \int_{X_N} f(x)\, d\mu(x) = 1$.

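Lemma. Let P be a marginally continuous measure over N. Let us accept the Convention above. Given $\langle A, B \mid C\rangle \in \mathcal{T}(N)$, one has $A \perp\!\!\!\perp B \mid C\ [P]$ iff the following equality holds:

$f_{ABC}(x_{ABC}) \cdot f_C(x_C) = f_{AC}(x_{AC}) \cdot f_{BC}(x_{BC})$ for $\mu$-a.e. $x \in X_N$.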
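In the discrete case, with the counting measure as dominating measure, marginal densities are just marginal probability tables, and the equality of the lemma can be tested pointwise. The following is a minimal sketch, assuming a distribution is given as a dictionary mapping value tuples (indexed by 0, ..., n-1) to probabilities; the names `marginal` and `ci_holds` are illustrative.

```python
from itertools import product


def marginal(p, S):
    """Marginal table of p for the coordinates in S (key order = sorted S)."""
    S = sorted(S)
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in S)
        out[key] = out.get(key, 0.0) + px
    return out


def ci_holds(p, A, B, C, tol=1e-12):
    """Test f_ABC * f_C == f_AC * f_BC on every configuration of ABC."""
    A, B, C = set(A), set(B), set(C)
    ABC = sorted(A | B | C)
    f_abc, f_c = marginal(p, ABC), marginal(p, C)
    f_ac, f_bc = marginal(p, A | C), marginal(p, B | C)
    values = {i: sorted({x[i] for x in p}) for i in ABC}
    for combo in product(*(values[i] for i in ABC)):
        x = dict(zip(ABC, combo))
        pick = lambda S: tuple(x[i] for i in sorted(S))
        lhs = f_abc.get(pick(ABC), 0.0) * f_c.get(pick(C), 0.0)
        rhs = f_ac.get(pick(A | C), 0.0) * f_bc.get(pick(B | C), 0.0)
        if abs(lhs - rhs) > tol:
            return False
    return True


# Two independent fair bits and their XOR as the third variable.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
```

For the empty set the marginal table has the single key `()` with value 1, which matches the convention $f_{\emptyset} = 1$. In the `xor` example the first two variables are independent marginally but not conditionally on the third one.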

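Proof. Note that minor omitted details of the proof (e.g. verification of equalities $\mu$-a.e.) can be verified with the help of basic measure-theoretical facts gathered in the Appendix.

As a preparatory step, choose and fix a density $f: X_N \to [0, \infty)$ such that

$\forall\, \emptyset \neq A \subset N\ \ \forall\, x \in X_N \quad f^A(x_A) \equiv \int_{X_{N \setminus A}} f(x_A, y)\, d\mu_{N \setminus A}(y) < \infty$,

and, moreover, for every disjoint $A, C \subseteq N$ (with the convention $f^N \equiv f$, $f^{\emptyset} \equiv 1$) one has

$\forall\, x \in X_N \quad f^C(x_C) = 0 \ \Rightarrow\ f^{AC}(x_{AC}) = 0$.

Indeed, these relationships hold $\mu$-a.e. for every version f of $\frac{dP}{d\mu}$, and every version can be overdefined by 0 whenever these relationships do not hold. It makes no problem to verify that $f^A = \frac{dP^A}{d\mu_A}$ for every $\emptyset \neq A \subseteq N$. Then, for every disjoint $A, C \subseteq N$, one can introduce the function $h_{A|C}: X_A \times X_C \to [0, \infty)$ as follows:

$h_{A|C}(x \mid z) = \frac{f^{AC}(xz)}{f^C(z)}$ if $f^C(z) > 0$, and $h_{A|C}(x \mid z) = 0$ if $f^C(z) = 0$, for $x \in X_A$, $z \in X_C$.

One can verify, using the Fubini theorem (for $\mu_A \times P^C$), the Radon-Nikodym theorem (for $f^C = \frac{dP^C}{d\mu_C}$) and again the Fubini theorem (for $\mu_C \times \mu_A$), that the function

$(\mathsf{A}, z) \mapsto P_{A|C}(\mathsf{A} \mid z) \equiv \int_{\mathsf{A}} h_{A|C}(x \mid z)\, d\mu_A(x)$, where $\mathsf{A} \in \mathcal{X}_A$, $z \in X_C$,

is (a version of) the conditional probability on $X_A$ given $X_C$.

After this preparatory stage, realize that the equality from the lemma can be written as follows:

$f^{ABC}(x_{ABC}) \cdot f^C(x_C) = f^{AC}(x_{AC}) \cdot f^{BC}(x_{BC})$ for $\mu$-a.e. $x \in X_N$.

Further, this can be rewritten in the form

$h_{AB|C}(x_{AB} \mid x_C) \cdot f^C(x_C) = h_{A|C}(x_A \mid x_C) \cdot h_{B|C}(x_B \mid x_C) \cdot f^C(x_C)$ for $\mu$-a.e. $x \in X_N$.

Indeed, owing to the second preparatory condition, both of the last two equalities are trivially valid on $\{x \in X_N;\ f^C(x_C) = 0\}$, while they are equivalent on the complement of this set. The next step is to observe that the latter equality is equivalent to the requirement that, for all $\mathsf{A} \in \mathcal{X}_A$, $\mathsf{B} \in \mathcal{X}_B$, $\mathsf{C} \in \mathcal{X}_C$, it holds

$\int_{\mathsf{C}} \int_{\mathsf{A} \times \mathsf{B}} h_{AB|C}(x_{AB} \mid x_C)\, d\mu_{AB}(x_{AB})\, dP^C(x_C) = \int_{\mathsf{C}} \int_{\mathsf{A}} h_{A|C}(x_A \mid x_C)\, d\mu_A(x_A) \cdot \int_{\mathsf{B}} h_{B|C}(x_B \mid x_C)\, d\mu_B(x_B)\, dP^C(x_C)$.

Indeed, as mentioned in the Appendix, the equality of the two functions is equivalent to the requirement that their integrals with respect to $\mu_{ABC}$ through measurable rectangles $\mathsf{A} \times \mathsf{B} \times \mathsf{C}$ coincide. This can be rewritten, using the Fubini theorem, the Radon-Nikodym theorem and basic properties of the integral, in the form above. But, as explained in the preparatory stage, it can be understood as follows:

$\int_{\mathsf{C}} P_{AB|C}(\mathsf{A} \times \mathsf{B} \mid z)\, dP^C(z) = \int_{\mathsf{C}} P_{A|C}(\mathsf{A} \mid z) \cdot P_{B|C}(\mathsf{B} \mid z)\, dP^C(z)$.

Having fixed $\mathsf{A} \in \mathcal{X}_A$ and $\mathsf{B} \in \mathcal{X}_B$, the equality for every $\mathsf{C} \in \mathcal{X}_C$ is equivalent to the requirement that the integrated functions are equal $P^C$-a.e. Hence, one obtains the condition that the defining equality of conditional independence holds for every $\mathsf{A} \in \mathcal{X}_A$ and $\mathsf{B} \in \mathcal{X}_B$, i.e. $A \perp\!\!\!\perp B \mid C\ [P]$.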

Let us observe that in the equality of the Lemma one can write 'for $\mu_{ABC}$-a.e. $x \in X_{ABC}$' instead. Of course, the validity of the equality trivially does not depend on the choice of versions of densities. The point of the Lemma is that it does not even depend on the choice of the dominating measure $\mu$, since $A \perp\!\!\!\perp B \mid C\ [P]$ does not depend on it either. Note that this fact may not be so apparent when one tries to introduce the concept of conditional independence directly by means of densities.

Factorizable measures

Let $\mathcal{D} \subseteq \mathcal{P}(N) \setminus \{\emptyset\}$ be a non-empty class of non-empty subsets of N and $D = \bigcup_{T \in \mathcal{D}} T$. We say that a marginally continuous measure P over N is factorizable after $\mathcal{D}$ (with respect to a dominating measure $\mu$) if the respective marginal density of P for D can be expressed in the form

$f_D(x_D) = \prod_{S \in \mathcal{D}} g_S(x_S)$ for $\mu$-a.e. $x \in X_N$,

where $g_S: X_S \to [0, \infty)$, $S \in \mathcal{D}$, are $\mathcal{X}_S$-measurable functions, called potentials. In fact, factorization does not depend on the choice of a dominating measure. One can show that the validity of the factorization with respect to a general dominating product measure $\mu = \prod_{i \in N} \mu_i$, where all $\mu_i$ are $\sigma$-finite, is equivalent to its validity with respect to $\prod_{i \in N} P^{\{i\}}$ (and with other potentials). Of course, factorization after $\mathcal{D}$ is equivalent to factorization after $\mathcal{D}^{\max}$, and potentials are not unique unless $|\mathcal{D}| = 1$.

A further equivalent definition of conditional independence for marginally continuous measures is formulated in terms of factorization.

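Lemma. Let P be a marginally continuous measure over N and $\langle A, B \mid C\rangle \in \mathcal{T}(N)$. Then $A \perp\!\!\!\perp B \mid C\ [P]$ if and only if P is factorizable after $\mathcal{D} = \{AC, BC\}$. More specifically, under the Convention above, one has $A \perp\!\!\!\perp B \mid C\ [P]$ iff there exist an $\mathcal{X}_{AC}$-measurable function $g: X_{AC} \to [0, \infty)$ and an $\mathcal{X}_{BC}$-measurable function $h: X_{BC} \to [0, \infty)$ such that

$f_{ABC}(x_{ABC}) = g(x_{AC}) \cdot h(x_{BC})$ for $\mu$-a.e. $x \in X_N$.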

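Proof. One can use the preceding Lemma. Clearly, the density equality of that Lemma implies the factorization, where $g = f_{AC}$ and

$h(x_{BC}) = \frac{f_{BC}(x_{BC})}{f_C(x_C)}$ if $f_C(x_C) > 0$, and $h(x_{BC}) = 0$ if $f_C(x_C) = 0$, for $x \in X_N$,

because for $\mu$-a.e. $x \in X_N$ one has $f_C(x_C) = 0 \Rightarrow f_{BC}(x_{BC}) = 0$.

For the proof of the converse implication one can first repeat the preparatory step of the proof of the preceding Lemma, that is, choose a suitable version f of the density. Then the factorization can be rewritten in the form

$f^{ABC}(x_{ABC}) = g(x_{AC}) \cdot h(x_{BC})$ for $\mu$-a.e. $x \in X_N$.

Now, using the Fubini theorem and basic properties of the integral, one can derive from it by integrating

$f^{AC}(x_{AC}) = g(x_{AC}) \cdot h^C(x_C)$,

$f^{BC}(x_{BC}) = g^C(x_C) \cdot h(x_{BC})$,

$f^C(x_C) = g^C(x_C) \cdot h^C(x_C)$,

for $\mu$-a.e. $x \in X_N$, where the functions

$g^C(x_C) = \int_{X_A} g(y, x_C)\, d\mu_A(y)$, $\quad h^C(x_C) = \int_{X_B} h(z, x_C)\, d\mu_B(z)$ for $x_C \in X_C$,

are finite $\mu_C$-a.e. (according to the Fubini theorem, owing to the factorization and the fact that $f^{ABC}$ is $\mu_{ABC}$-integrable). Thus, the factorization and the three integrated equalities give together

$f^{ABC}(x_{ABC}) \cdot f^C(x_C) = g(x_{AC}) \cdot h(x_{BC}) \cdot g^C(x_C) \cdot h^C(x_C) = f^{AC}(x_{AC}) \cdot f^{BC}(x_{BC})$ for $\mu$-a.e. $x \in X_N$,

which is equivalent to the density equality of the preceding Lemma.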

As a consequence, one can derive a certain formal property of conditional independence which was already mentioned in the discrete case.

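Consequence. Suppose that P is a marginally continuous measure over N and $A, B, C, D \subseteq N$ are pairwise disjoint sets. Then

$C \perp\!\!\!\perp D \mid AB\ [P]$, $A \perp\!\!\!\perp B \mid \emptyset\ [P]$, $A \perp\!\!\!\perp B \mid C\ [P]$, $A \perp\!\!\!\perp B \mid D\ [P]$ implies $A \perp\!\!\!\perp B \mid CD\ [P]$.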

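Proof. It follows from the density Lemma that the assumption $C \perp\!\!\!\perp D \mid AB$ can be rewritten in terms of marginal densities as follows (throughout this proof I write $f_S(x)$ instead of $f_S(x_S)$ for any $S \subseteq N$):

$f_{ABCD}(x) \cdot f_{AB}(x) \cdot f_C(x) \cdot f_D(x) = f_{ABC}(x) \cdot f_{ABD}(x) \cdot f_C(x) \cdot f_D(x)$

for $\mu$-a.e. $x \in X_N$. Now, again using the density Lemma, the assumptions $A \perp\!\!\!\perp B \mid \emptyset$, $A \perp\!\!\!\perp B \mid C$ and $A \perp\!\!\!\perp B \mid D$ imply that

$f_{ABCD}(x) \cdot f_A(x) \cdot f_B(x) \cdot f_C(x) \cdot f_D(x) = f_{AC}(x) \cdot f_{BC}(x) \cdot f_{AD}(x) \cdot f_{BD}(x)$

for $\mu$-a.e. $x \in X_N$. Since $f_A(x) = 0 \Rightarrow f_{ABCD}(x) = 0$ for $\mu$-a.e. $x \in X_N$ (and similarly for B, C, D), one can accept the convention $f_A^{-1}(x) = 0$ whenever $f_A(x) = 0$ and obtain

$f_{ABCD}(x) = \underbrace{f_A^{-1}(x) \cdot f_{AC}(x) \cdot f_{AD}(x)}_{g(x_{ACD})} \cdot \underbrace{f_{BC}(x) \cdot f_{BD}(x) \cdot f_B^{-1}(x) \cdot f_C^{-1}(x) \cdot f_D^{-1}(x)}_{h(x_{BCD})}$ for $\mu$-a.e. $x \in X_N$.

Hence, by the factorization Lemma, one has $A \perp\!\!\!\perp B \mid CD$.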

Multiinformation and conditional product

Let P be a marginally continuous measure over N. The multiinformation of P is the relative entropy $H(P \mid \prod_{i \in N} P^{\{i\}})$ of P with respect to the product of its one-dimensional marginals. It is always a value in $[0, \infty]$ (see the respective Lemma in the Appendix). A common formal convention is that the multiinformation of P is $\infty$ in case P is not marginally continuous.
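For a discrete measure the multiinformation reduces to a finite sum, $\sum_x P(x) \ln \frac{P(x)}{\prod_{i} P^{\{i\}}(x_i)}$. The sketch below computes it under the assumption that a distribution is a dictionary from value tuples to probabilities and that natural logarithms are used; the names are illustrative only.

```python
from math import log


def marginal(p, S):
    """Marginal table of p for the coordinates in S (key order = sorted S)."""
    S = sorted(S)
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in S)
        out[key] = out.get(key, 0.0) + px
    return out


def multiinformation(p, S=None):
    """Relative entropy of the marginal P^S with respect to the product
    of its one-dimensional marginals (S defaults to all coordinates)."""
    n = len(next(iter(p)))
    S = sorted(range(n) if S is None else S)
    joint = marginal(p, S)
    singles = {i: marginal(p, [i]) for i in S}
    total = 0.0
    for x, px in joint.items():
        if px > 0:
            prod = 1.0
            for pos, i in enumerate(S):
                prod *= singles[i][(x[pos],)]
            total += px * log(px / prod)
    return total


# Two independent fair bits and their XOR as the third variable.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
```

In the `xor` example every pair of variables is independent, so the pairwise multiinformation vanishes, while the three variables jointly carry multiinformation $\ln 2$.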

Remark. The term 'multiinformation' was proposed by my PhD supervisor Albert Perez in the late eighties. Note that miscellaneous other terms were used earlier in the literature (even by Perez himself), for example 'total correlation', 'dependence tightness' or 'entaxy'. The main reason for Perez's later terminology is that it directly generalizes the widely accepted information-theoretical concept of mutual information of two random variables to the case of any finite number of random variables. Indeed, it can serve as a measure of global stochastic dependence among a finite collection of random variables. The asymptotic behaviour of empirical multiinformation, which can be used as a statistical estimate of multiinformation on the basis of data, has also been examined.

To clarify the significance of multiinformation for the study of conditional independence, I need the following lemma.

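Lemma. Let P be a marginally continuous probability measure on $(X_N, \mathcal{X}_N)$ and $\langle A, B \mid C\rangle \in \mathcal{T}(N)$. Then there exists a unique probability measure Q on $(X_{ABC}, \mathcal{X}_{ABC})$ such that

$Q^{AC} = P^{AC}$, $\quad Q^{BC} = P^{BC}$ and $A \perp\!\!\!\perp B \mid C\ [Q]$.

Moreover, $P^{ABC} \ll Q \ll \prod_{i \in ABC} P^{\{i\}}$ and the following equality holds (the symbol H denotes the relative entropy introduced in the Appendix):

$H(P^{ABC} \mid \prod_{i \in ABC} P^{\{i\}}) + H(P^{C} \mid \prod_{i \in C} P^{\{i\}}) = H(P^{ABC} \mid Q) + H(P^{AC} \mid \prod_{i \in AC} P^{\{i\}}) + H(P^{BC} \mid \prod_{i \in BC} P^{\{i\}})$.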
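In the discrete case the measure Q can be written down explicitly as $Q(x) = P^{AC}(x_{AC}) \cdot P^{BC}(x_{BC}) / P^{C}(x_C)$, and the entropy equality can be verified numerically. The following sketch does so for one strictly positive example; the dictionary encoding, the function names and the particular weights are assumptions made for the illustration.

```python
from itertools import product
from math import log


def marginal(p, S):
    S = sorted(S)
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in S)
        out[key] = out.get(key, 0.0) + px
    return out


def multiinformation(p, S):
    S = sorted(S)
    joint = marginal(p, S)
    singles = {i: marginal(p, [i]) for i in S}
    total = 0.0
    for x, px in joint.items():
        if px > 0:
            prod = 1.0
            for pos, i in enumerate(S):
                prod *= singles[i][(x[pos],)]
            total += px * log(px / prod)
    return total


def conditional_product(p, A, B, C):
    """Q on X_ABC with Q(x) = P_AC(x) * P_BC(x) / P_C(x), zero if P_C(x) = 0.
    Keys of the result follow the sorted order of the coordinates in ABC."""
    A, B, C = sorted(A), sorted(B), sorted(C)
    ABC = sorted(A + B + C)
    f_ac, f_bc, f_c = marginal(p, A + C), marginal(p, B + C), marginal(p, C)
    values = {i: sorted({x[i] for x in p}) for i in ABC}
    q = {}
    for combo in product(*(values[i] for i in ABC)):
        x = dict(zip(ABC, combo))
        pick = lambda S: tuple(x[i] for i in sorted(S))
        c = f_c.get(pick(C), 0.0)
        if c > 0:
            q[combo] = f_ac.get(pick(A + C), 0.0) * f_bc.get(pick(B + C), 0.0) / c
    return q


def relative_entropy(p, q):
    """H(P | Q) for discrete P << Q given as dictionaries on the same keys."""
    return sum(px * log(px / q[x]) for x, px in p.items() if px > 0)


# A strictly positive example on three binary variables (weights 1..8 over 36).
states = list(product((0, 1), repeat=3))
p = {x: (k + 1) / 36 for k, x in enumerate(states)}
q = conditional_product(p, [0], [1], [2])
lhs = multiinformation(p, [0, 1, 2]) + multiinformation(p, [2])
rhs = relative_entropy(p, q) + multiinformation(p, [0, 2]) + multiinformation(p, [1, 2])
```

Here `lhs` and `rhs` are the two sides of the equality of the lemma for A = {0}, B = {1}, C = {2}; they agree up to rounding, and Q reproduces the marginals of P on AC and BC.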

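Proof. Again, omitted technical details can be verified by means of basic measure-theoretical facts from the Appendix. First, let us verify the uniqueness of Q. Supposing both $Q^1$ and $Q^2$ satisfy the three conditions, one can observe that $(Q^1)^C = (Q^2)^C$ and then $Q^1_{A|C} \approx Q^2_{A|C}$, $Q^1_{B|C} \approx Q^2_{B|C}$, where $\approx$ indicates the respective equivalence of conditional probabilities (on $X_A$, respectively $X_B$, given C) mentioned in the section on conditional independence. Because of $A \perp\!\!\!\perp B \mid C\ [Q^i]$, $i = 1, 2$, one can derive, using the definition of conditional independence, that $Q^1_{AB|C} \approx Q^2_{AB|C}$ for measurable rectangles, which together with $(Q^1)^C = (Q^2)^C$ implies $Q^1 = Q^2$.

For the existence proof assume, without loss of generality, $ABC = N$ and put $\mu = \prod_{i \in N} P^{\{i\}}$. As in the preparatory step of the proof of the density Lemma, choose a density $f = \frac{dP}{d\mu}$ and the respective collection of marginal 'projection' densities $f^A$, $A \subseteq N$, satisfying the two preparatory conditions. For brevity, we write $f^A(x)$ instead of $f^A(x_A)$ in the rest of this proof, so that the second preparatory condition has the form

$\forall\, x \in X_N\ \ \forall\, A, C \subseteq N$ such that $A \cap C = \emptyset$: $\quad f^C(x) = 0 \ \Rightarrow\ f^{AC}(x) = 0$.

Let us define a function $g: X_N \to [0, \infty)$ by

$g(x) = \frac{f^{AC}(x) \cdot f^{BC}(x)}{f^C(x)}$ if $f^C(x) > 0$, and $g(x) = 0$ if $f^C(x) = 0$, for $x \in X_N = X_{ABC}$,

and introduce a measure Q on $(X_N, \mathcal{X}_N)$ as follows:

$Q(\mathsf{D}) = \int_{\mathsf{D}} g(x)\, d\mu(x)$ for $\mathsf{D} \in \mathcal{X}_N = \mathcal{X}_{ABC}$.

Now, under the convention $\frac{f^{AC}(x)}{f^C(x)} \equiv 0$ in case $f^C(x) = 0$, one can write for every $\mathsf{E} \in \mathcal{X}_{AC}$, using the Fubini theorem and the Radon-Nikodym theorem,

$Q^{AC}(\mathsf{E}) = \int_{\mathsf{E} \times X_B} g(x)\, d\mu(x) = \int_{\mathsf{E}} \frac{f^{AC}(x)}{f^C(x)} \cdot \int_{X_B} f^{BC}(x_B x_C)\, d\mu_B(x_B)\, d\mu_{AC}(x_{AC}) = \int_{\mathsf{E}} \frac{f^{AC}(x)}{f^C(x)} \cdot f^C(x)\, d\mu_{AC}(x_{AC}) = \int_{\mathsf{E}} f^{AC}(x)\, d\mu_{AC}(x_{AC}) = P^{AC}(\mathsf{E})$.

Hence, $Q^{AC} = P^{AC}$ and Q is a probability measure. Replace $(X_A, \mathcal{X}_A)$ by $(X_B, \mathcal{X}_B)$ in the preceding consideration to obtain $Q^{BC} = P^{BC}$. The way Q is defined implies $Q \ll \mu$ and $g = \frac{dQ}{d\mu}$. The form of g implies that Q is factorizable after $\{AC, BC\}$, so that $A \perp\!\!\!\perp B \mid C\ [Q]$ by the factorization Lemma.

To see $P^{ABC} \ll Q$, observe that the condition above implies $g(x) = 0 \Rightarrow f(x) = 0$ for every $x \in X_N$, accept the convention $\frac{f(x)}{g(x)} \equiv 0$ in case $g(x) = 0$, and write for every $\mathsf{D} \in \mathcal{X}_N$, using the Radon-Nikodym theorem,

$\int_{\mathsf{D}} \frac{f(x)}{g(x)}\, dQ(x) = \int_{\mathsf{D}} \frac{f(x)}{g(x)} \cdot g(x)\, d\mu(x) = \int_{\mathsf{D}} f(x)\, d\mu(x) = P(\mathsf{D})$.

Thus, $P \ll Q$ and $\frac{f}{g} = \frac{dP}{dQ}$. To derive the entropy equality, realize that it follows from the definition of g that (under the convention above)

$f(x) \cdot f^C(x) = \frac{f(x)}{g(x)} \cdot f^{AC}(x) \cdot f^{BC}(x)$ for every $x \in X_N$.

Hence, of course,

$\ln f(x) + \ln f^C(x) = \ln \frac{f(x)}{g(x)} + \ln f^{AC}(x) + \ln f^{BC}(x)$ for P-a.e. $x \in X_N$.

According to the facts on relative entropy from the Appendix, each of the five logarithmic terms above is P-quasi-integrable and the integral is a value in $[0, \infty]$ (use $\int_{X_N} h(x_D)\, dP(x) = \int_{X_D} h(x_D)\, dP^D(x_D)$ for $D \subseteq N$). Hence, the entropy equality was derived.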

Remark. The measure Q from the preceding Lemma can be interpreted as the conditional product of $P^{AC}$ and $P^{BC}$. Indeed, one can define the conditional product for every pair of consonant probability measures (that is, measures sharing marginals) in this way. However, in general, some obscurities can occur. First, there exists a pair of consonant measures such that no joint measure having them as marginals exists. Second, even in case joint measures of this type exist, it may happen that none of them complies with the required conditional independence statement. Examples of both phenomena are known.

Thus, the assumption of marginal continuity implies the existence of the conditional product. Note that regularity of conditional probabilities $P_{A|C}$ or $P_{B|C}$ is a more general sufficient condition for its existence. The value of $H(P^{ABC} \mid Q)$ in the entropy equality is known in information theory as the conditional mutual information of A and B given C (with respect to P). In case $C = \emptyset$, just the mutual information $H(P^{AB} \mid P^A \times P^B)$ is obtained, so that it can be viewed as a generalization of mutual information, but from a different perspective than multiinformation. Conditional mutual information is known as a good measure of stochastic dependence between A and B conditional on knowledge of C.

Properties of multiinformation function

Supposing P is a probability measure over N, the induced multiinformation function $m_P: \mathcal{P}(N) \to [0, \infty]$ ascribes the multiinformation of the respective marginal $P^S$ to every non-empty set $S \subseteq N$, that is,

$m_P(S) = H(P^S \mid \prod_{i \in S} P^{\{i\}})$ for every $\emptyset \neq S \subseteq N$.

Moreover, a natural convention $m_P(\emptyset) = 0$ is accepted. The significance of this concept is evident from the following consequence of the Lemma on the conditional product.

Consequence. Suppose that P is a probability measure over N whose multiinformation is finite. Then the induced multiinformation function $m_P$ is a non-negative real function which satisfies

$m_P(S) = 0$ whenever $S \subseteq N$, $|S| \leq 1$,

and is supermodular, that is,

$m_P(ABC) + m_P(C) \geq m_P(AC) + m_P(BC)$ whenever $\langle A, B \mid C\rangle \in \mathcal{T}(N)$.

These two conditions imply $m_P(S) \leq m_P(T)$ whenever $S \subseteq T \subseteq N$. Moreover, for every $\langle A, B \mid C\rangle \in \mathcal{T}(N)$ one has

$m_P(ABC) + m_P(C) - m_P(AC) - m_P(BC) = 0$ iff $A \perp\!\!\!\perp B \mid C\ [P]$.
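The properties listed in the Consequence can be observed on a small discrete example. The sketch below tabulates $m_P$ on all subsets and uses the supermodular difference as a conditional independence test; the dictionary encoding, the tolerance and the function names are assumptions of this illustration.

```python
from itertools import combinations
from math import log


def marginal(p, S):
    S = sorted(S)
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in S)
        out[key] = out.get(key, 0.0) + px
    return out


def multiinformation(p, S):
    S = sorted(S)
    joint = marginal(p, S)
    singles = {i: marginal(p, [i]) for i in S}
    total = 0.0
    for x, px in joint.items():
        if px > 0:
            prod = 1.0
            for pos, i in enumerate(S):
                prod *= singles[i][(x[pos],)]
            total += px * log(px / prod)
    return total


def multiinformation_function(p):
    """Table of m_P(S) for every subset S of the index set."""
    n = len(next(iter(p)))
    return {frozenset(S): multiinformation(p, S)
            for r in range(n + 1) for S in combinations(range(n), r)}


def difference(m, A, B, C):
    """m(ABC) + m(C) - m(AC) - m(BC), the conditional mutual information."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    return m[A | B | C] + m[C] - m[A | C] - m[B | C]


def ci_via_multiinformation(m, A, B, C, tol=1e-12):
    return abs(difference(m, A, B, C)) <= tol


# Two independent fair bits and their XOR as the third variable.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
m = multiinformation_function(xor)
```

For `xor`, the table `m` is zero on all sets of cardinality at most two and equals $\ln 2$ on the whole index set, so the difference vanishes for the triplet ({0}, {1}, {}) and is strictly positive for ({0}, {1}, {2}).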

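Proof. The first relation is evident. Given $\emptyset \neq S \subseteq N$, put $\langle A, B \mid C\rangle = \langle S, N \setminus S \mid \emptyset\rangle$ in the Lemma on the conditional product; its entropy equality gives

$m_P(N) = m_P(N) + m_P(\emptyset) = H(P \mid Q) + m_P(S) + m_P(N \setminus S)$.

Since all terms here are in $[0, \infty]$ and $m_P(N) < \infty$, it implies $m_P(S) < \infty$. Therefore, the entropy equality for general $\langle A, B \mid C\rangle$ can always be written in the form

$m_P(ABC) + m_P(C) - m_P(AC) - m_P(BC) = H(P^{ABC} \mid Q)$,

where Q is the conditional product of $P^{AC}$ and $P^{BC}$. By non-negativity of relative entropy, derive supermodularity. It suffices to see $m_P(S) \leq m_P(T)$ when $|T \setminus S| = 1$, which follows directly from supermodularity with $\langle A, B \mid C\rangle = \langle S, T \setminus S \mid \emptyset\rangle$ and from the first relation. The uniqueness of the conditional product Q mentioned in the Lemma implies that $A \perp\!\!\!\perp B \mid C\ [P]$ iff $P^{ABC} = Q$, that is, $H(P^{ABC} \mid Q) = 0$ by the respective Lemma on relative entropy. Hence, the last claim follows.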

Thus, the class of probability measures over N having finite multiinformation is, by definition, a subclass of the class of marginally continuous measures. It will be shown later that it is quite a wide class of measures, involving several classes of measures used in practice. The last relation of the Consequence provides a very useful equivalent definition of conditional independence for measures with finite multiinformation, namely by means of an algebraic identity. Note that the relations of the Consequence establish a basic method for the study of conditional independence used in this work. Because these relations originate from information theory (the expression in the last relation is nothing but the conditional mutual information mentioned in the Remark above), I dare to call them information-theoretical tools. For example, the formal properties of conditional independence and the negative result on axiomatic characterization mentioned earlier were derived using this method. The Consequence also implies that the class of measures with finite multiinformation is closed under marginals. Note, without details, that it is closed under the operation of conditional product as well.

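The following observation appears to be useful later.

Lemma. Let P be a probability measure on $(X_N, \mathcal{X}_N)$ and $P \ll \mu \equiv \prod_{i \in N} \mu_i$, where $\mu_i$ is a $\sigma$-finite measure on $(X_i, \mathcal{X}_i)$ for every $i \in N$. Let $\emptyset \neq S \subseteq N$ be such that $-\infty < H(P^S \mid \prod_{i \in S} \mu_i) < \infty$ and $-\infty < H(P^{\{i\}} \mid \mu_i) < \infty$ for every $i \in S$. Then $0 \leq m_P(S) < \infty$ and

$m_P(S) = H(P^S \mid \prod_{i \in S} \mu_i) - \sum_{i \in S} H(P^{\{i\}} \mid \mu_i)$.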

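Proof. This is just a rough sketch (for technical details see the Appendix). Suppose, without loss of generality, $S = N$ and put $\nu = \prod_{i \in N} P^{\{i\}}$. By the Lemma on marginal continuity one knows $P \ll \nu$. Since $P^{\{i\}} \ll \mu_i$ for every $i \in N$, choose versions of $\frac{dP}{d\nu}$ and $\frac{dP^{\{i\}}}{d\mu_i}$ and observe that $\frac{dP}{d\nu} \cdot \prod_{i \in N} \frac{dP^{\{i\}}}{d\mu_i}$ is a version of $\frac{dP}{d\mu}$, defined uniquely P-a.e. Hence, derive

$\ln \frac{dP}{d\nu}(x) = \ln \frac{dP}{d\mu}(x) - \sum_{i \in N} \ln \frac{dP^{\{i\}}}{d\mu_i}(x_i)$ for P-a.e. $x \in X_N$.

The assumption of the lemma implies that all logarithmic terms on the right-hand side are P-integrable. Hence, by integration with respect to P, the claimed formula is obtained.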

Positive measures

A marginally continuous measure P over N is positive if there exists a dominating measure $\mu$ whose density $f = \frac{dP}{d\mu}$ is (strictly) positive, that is, $f(x) > 0$ for $\mu$-a.e. $x \in X_N$. Note that positivity of the density may depend on the choice of the dominating measure. However, whenever a measure $\mu$ of this kind exists, one has $\mu \ll P$. Since $P \ll \prod_{i \in N} P^{\{i\}}$ and $\prod_{i \in N} P^{\{i\}} \ll \prod_{i \in N} \mu_i \equiv \mu \ll P$, one can always take $\prod_{i \in N} P^{\{i\}}$ in place of $\mu$. In particular, one can equivalently introduce a positive measure P over N by the simple requirement that $P \ll \prod_{i \in N} P^{\{i\}} \ll P$. A typical example is a discrete positive measure P on $(X_N, \mathcal{X}_N)$ with $1 \leq |X_i| < \infty$, $i \in N$, such that $P(\{x\}) > 0$ for every $x \in X_N$ (or, alternatively, only for $x \in \prod_{i \in N} Y_i$ with $Y_i = \{y \in X_i;\ P^{\{i\}}(\{y\}) > 0\}$). These measures play an important role in the probabilistic approach to artificial intelligence. Pearl noticed that conditional independence models induced by these measures satisfy a further special formal property (besides the semigraphoid properties) and introduced the following terminology.

A disjoint semigraphoid $\mathcal{M}$ over N is called a disjoint graphoid over N if, for every collection of pairwise disjoint sets $A, B, C, D \subseteq N$, one has

intersection: $A \perp\!\!\!\perp B \mid DC\ [\mathcal{M}]$ and $A \perp\!\!\!\perp D \mid BC\ [\mathcal{M}]$ implies $A \perp\!\!\!\perp BD \mid C\ [\mathcal{M}]$.

It follows from the Lemma on induced semigraphoids and the observation below that every conditional independence model induced by a positive measure is a disjoint graphoid.

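Observation. Let P be a marginally continuous measure over N, let $A, B, C, D \subseteq N$ be pairwise disjoint, and let $P^{BCD}$ be a positive measure over BCD. Then

$A \perp\!\!\!\perp B \mid DC\ [P]$ and $A \perp\!\!\!\perp D \mid BC\ [P]$ implies $A \perp\!\!\!\perp BD \mid C\ [P]$.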

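Proof. (An alternative proof under an additional restrictive assumption is known.) This is a rough hint only. Let $\mu$ be a dominating measure and $f = \frac{dP}{d\mu}$ a density with $f_{BCD}(x) > 0$ for $\mu$-a.e. $x \in X_N$ (I again follow the notational convention from the proof of the formal property above, writing $f_S(x)$ for $f_S(x_S)$). The assumptions $A \perp\!\!\!\perp B \mid DC\ [P]$ and $A \perp\!\!\!\perp D \mid BC\ [P]$ imply by the density Lemma (one can assume $f_E(x) > 0$ for $\mu$-a.e. $x \in X_N$ whenever $E \subseteq BCD$)

$\frac{f_{ABC}(x) \cdot f_{BCD}(x)}{f_{BC}(x)} = f_{ABCD}(x) = \frac{f_{ACD}(x) \cdot f_{BCD}(x)}{f_{CD}(x)}$ for $\mu$-a.e. $x \in X_N$.

The terms $f_{BCD}(x)$ can be cancelled, so that one derives by dividing

$f_{ACD}(x) \cdot f_{BC}(x) = f_{ABC}(x) \cdot f_{CD}(x)$ for $\mu$-a.e. $x \in X_N$.

One can take the integral with respect to $\mu_B$ and get by the Fubini theorem

$f_{ACD}(x) \cdot f_C(x) = f_{AC}(x) \cdot f_{CD}(x)$ for $\mu$-a.e. $x \in X_N$,

that is, $A \perp\!\!\!\perp D \mid C\ [P]$ by the density Lemma. This, together with $A \perp\!\!\!\perp B \mid DC\ [P]$, implies the desired conclusion by the contraction property.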

Let us note that there are discrete probability measures whose induced conditional independence model is not a graphoid, i.e. it does not satisfy the intersection property (see the respective Example). On the other hand, the Observation holds also under weaker assumptions on $P^{BCD}$.

Gaussian measures

These measures are usually treated in multivariate statistics, often under the alternative name 'normal distributions'. In this work, Gaussian measures over N are measures on $(X_N, \mathcal{X}_N)$, where $(X_i, \mathcal{X}_i) = (\mathbb{R}, \mathcal{B})$ is the set of real numbers endowed with the $\sigma$-algebra of Borel sets for every $i \in N$. Every vector $e \in \mathbb{R}^N$ and every positive semidefinite $N \times N$ matrix $\Sigma \in \mathbb{R}^{N \times N}$ defines a certain measure on $(X_N, \mathcal{X}_N)$, denoted by $\mathcal{N}(e, \Sigma)$, whose expectation vector is e and whose covariance matrix is $\Sigma$. The components of e and $\Sigma$ are then regarded as parameters of the Gaussian measure.

Attention is almost exclusively paid to non-degenerate Gaussian measures, which are obtained in case that $\Sigma$ is positive definite (equivalently, regular). In that case $\mathcal{N}(e, \Sigma)$ can be introduced directly by its density with respect to the Lebesgue measure on $(X_N, \mathcal{X}_N)$:

$f_{e,\Sigma}(x) = \frac{1}{\sqrt{(2\pi)^{|N|} \cdot \det(\Sigma)}} \cdot \exp\left\{ -\frac{(x-e)^{\top} \cdot \Sigma^{-1} \cdot (x-e)}{2} \right\}$ for $x \in X_N$,

where $\pi$ is Ludolph's constant and $\Sigma^{-1}$ is the inverse of the covariance matrix $\Sigma$, called the concentration matrix. Its elements are sometimes considered as alternative parameters of a non-degenerate Gaussian measure. Since the density $f_{e,\Sigma}$ is positive, non-degenerate Gaussian measures are positive in the sense of the preceding section.

On the other hand, in case $\Sigma$ is not regular, the respective degenerate Gaussian measure $\mathcal{N}(e, \Sigma)$ (for a detailed definition see the Appendix) is concentrated on a linear subspace in $\mathbb{R}^N = X_N$ having Lebesgue measure 0. Thus, degenerate Gaussian measures are not marginally continuous, except for some rare cases when the subspace has the form $\{y\} \times X_{N \setminus A}$, $A \subseteq N$, for $y \in X_A$ (for illustration see the Example below).

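Given a Gaussian measure $P=\mathcal{N}(e,\Sigma)$ over $N$ and nonempty disjoint sets $A, C\subseteq N$, a usual implicit convention (used in multivariate statistics and applicable even in the degenerate case) identifies the conditional probability $P_{A|C}$ with its unique continuous version

$$P_{A|C}(\cdot\,|\,z) \;=\; \mathcal{N}\big(\, e_A + \Sigma_{A\cdot C}\cdot\Sigma_{C\cdot C}^{-}\cdot(z-e_C),\;\; \Sigma_{A\cdot A} - \Sigma_{A\cdot C}\cdot\Sigma_{C\cdot C}^{-}\cdot\Sigma_{C\cdot A} \,\big) \qquad \text{for every } z\in X_C ,$$

where $\Sigma_{C\cdot C}^{-}$ stands for a generalized inverse of $\Sigma_{C\cdot C}$, which is the ordinary inverse whenever $\Sigma_{C\cdot C}$ is regular. The point is that, for every $z\in X_C$, it is again a Gaussian measure whose covariance matrix $\Sigma_{A|C} = \Sigma_{A\cdot A} - \Sigma_{A\cdot C}\cdot\Sigma_{C\cdot C}^{-}\cdot\Sigma_{C\cdot A}$ actually does not depend on the choice of $z$. Therefore the matrix $\Sigma_{A|C}$ is called a conditional covariance matrix. Recall that in case $C=\emptyset$ one has by convention $\Sigma_{A|C} = \Sigma_{A\cdot A}$. Elements of miscellaneous conditional covariance matrices can serve as convenient parameters of Gaussian measures.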

An important related fact is that the expectation vector of a Gaussian measure is not significant from the point of view of conditional independence. It follows from the following lemma that the covariance matrix alone contains all information about the conditional independence structure. Therefore it is used in practice almost exclusively.

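Lemma. Let $P=\mathcal{N}(e,\Sigma)$ be a Gaussian measure over $N$ and $\langle A,B|C\rangle\in\mathcal{T}(N)$ a nontrivial triplet over $N$. Then

$$A \perp\!\!\!\perp B \,|\, C\ [P] \quad\text{iff}\quad (\Sigma_{AB|C})_{A\cdot B} = 0 .$$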

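Proof. The key idea is that topological assumptions (see Remark on p.) imply the existence of a regular version of conditional probability on $X_{AB}$ given $C$, that is, a version $\bar{P}_{AB|C}$ such that the mapping $\mathbb{D}\mapsto \bar{P}_{AB|C}(\mathbb{D}\,|\,z)$ is a probability measure on $(X_{AB},\mathcal{X}_{AB})$ for every $z\in X_C$. Clearly, for every $\mathbb{A}\in\mathcal{X}_A$ the mapping $z\mapsto \bar{P}_{AB|C}(\mathbb{A}\times X_B\,|\,z)$, $z\in X_C$, is a version of conditional probability on $X_A$ given $C$; analogously for $\mathbb{B}\in\mathcal{X}_B$. Thus, the statement $A \perp\!\!\!\perp B \,|\, C\ [P]$ can be rewritten in the form: for every $\mathbb{A}\in\mathcal{X}_A$ and $\mathbb{B}\in\mathcal{X}_B$

$$\bar{P}_{AB|C}(\mathbb{A}\times\mathbb{B}\,|\,z) \;=\; \bar{P}_{AB|C}(\mathbb{A}\times X_B\,|\,z)\cdot \bar{P}_{AB|C}(X_A\times\mathbb{B}\,|\,z) \qquad \text{for } P^{C}\text{-a.e. } z\in X_C . \tag{$*$}$$

Since all involved versions of conditional probability are probability measures for every $z\in X_C$, it is equivalent to the requirement that $(*)$ holds for every $\mathbb{A}\in\mathcal{Y}_A$ and $\mathbb{B}\in\mathcal{Y}_B$, where $\mathcal{Y}_A$ resp. $\mathcal{Y}_B$ are countable classes closed under finite intersection such that $\sigma(\mathcal{Y}_A)=\mathcal{X}_A$ resp. $\sigma(\mathcal{Y}_B)=\mathcal{X}_B$ (two finite measures which agree on a generating class closed under finite intersection coincide). These classes exist in case of Borel $\sigma$-algebras on $\mathbb{R}^A$ resp. $\mathbb{R}^B$. The set of $z\in X_C$ for which $(*)$ holds for every $\mathbb{A}\in\mathcal{Y}_A$ and $\mathbb{B}\in\mathcal{Y}_B$ has $P^{C}$-measure $1$, since $\mathcal{Y}_A$ and $\mathcal{Y}_B$ are countable. For these $z\in X_C$ then $(*)$ holds for every $\mathbb{A}\in\mathcal{X}_A$ and $\mathbb{B}\in\mathcal{X}_B$ by the above mentioned consideration. Hence

$$A \perp\!\!\!\perp B \,|\, C\ [P] \quad\Longleftrightarrow\quad A \perp\!\!\!\perp B \,|\, \emptyset\ [\bar{P}_{AB|C}(\cdot\,|\,z)] \quad \text{for } P^{C}\text{-a.e. } z\in X_C .$$

However, in this special case one can suppose that $\bar{P}_{AB|C}(\cdot\,|\,z)$ is a Gaussian measure (see Section) with the same covariance matrix $\Sigma_{AB|C}$ for every $z\in X_C$, while the expectation does depend on $z$. It is a well-known fact that, regardless of the expectation vector, one has $A \perp\!\!\!\perp B \,|\, \emptyset$ with respect to a Gaussian measure iff the $A\times B$ submatrix of its covariance matrix vanishes (see again Section).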

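The previous lemma implies the following well-known criteria for elementary conditional independence statements.

Consequence. Let $P$ be a Gaussian measure over $N$ with a covariance matrix $\Sigma=(\sigma_{ij})_{i,j\in N}$ and a correlation matrix $\Gamma=(\varrho_{ij})_{i,j\in N}$. Then for distinct $i,j\in N$

$$i \perp\!\!\!\perp j \,|\, \emptyset\ [P] \;\Longleftrightarrow\; \sigma_{ij}=0 \;\Longleftrightarrow\; \varrho_{ij}=0 ,$$

and for distinct $i,j,k\in N$

$$i \perp\!\!\!\perp j \,|\, \{k\}\ [P] \;\Longleftrightarrow\; \sigma_{kk}\cdot\sigma_{ij} = \sigma_{ik}\cdot\sigma_{kj} \;\Longleftrightarrow\; \varrho_{ij} = \varrho_{ik}\cdot\varrho_{kj} .$$

If $\Sigma$ is regular and $K=(\kappa_{ij})_{i,j\in N}$ is the concentration matrix, then for distinct $i,j\in N$

$$i \perp\!\!\!\perp j \,|\, N\setminus\{i,j\}\ [P] \;\Longleftrightarrow\; \kappa_{ij}=0 .$$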

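Proof. The first part is an immediate consequence of Lemma. For the last fact, first observe by elementary computation that a nondiagonal element of a regular $2\times 2$ matrix vanishes iff the same element vanishes in its inverse matrix. In particular,

$$i \perp\!\!\!\perp j \,|\, N\setminus\{i,j\}\ [P] \;\Longleftrightarrow\; (\Sigma_{\{i,j\}|N\setminus\{i,j\}})_{ij} = 0 \;\Longleftrightarrow\; \big((\Sigma_{\{i,j\}|N\setminus\{i,j\}})^{-1}\big)_{ij} = 0 .$$

The second observation is that

$$(\Sigma_{D|N\setminus D})^{-1} \;=\; (\Sigma^{-1})_{D\cdot D}$$

for every set $D\subseteq N$ containing $\{i,j\}$ (see Section). Hence, $\big((\Sigma_{D|N\setminus D})^{-1}\big)_{ij} = \kappa_{ij}$ for every such $D$, in particular for $D=\{i,j\}$.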
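The criteria above lend themselves to a direct numerical check. The following is only a minimal sketch in exact rational arithmetic: the function names (`cond_cov`, `gaussian_ci`) and the example matrices are illustrative assumptions, and $\Sigma_{C\cdot C}$ is assumed to be regular, so that the ordinary inverse can be used in place of a generalized one.

```python
from fractions import Fraction

def sub(m, rows, cols):
    """Submatrix of m given by row and column index lists."""
    return [[Fraction(m[r][c]) for c in cols] for r in rows]

def inverse(m):
    """Gauss-Jordan inverse of a regular square matrix (exact arithmetic)."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cond_cov(sigma, a, c):
    """Sigma_{A|C} = Sigma_AA - Sigma_AC Sigma_CC^{-1} Sigma_CA (Sigma_CC assumed regular)."""
    s_aa = sub(sigma, a, a)
    if not c:
        return s_aa
    corr = matmul(matmul(sub(sigma, a, c), inverse(sub(sigma, c, c))), sub(sigma, c, a))
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(s_aa, corr)]

def gaussian_ci(sigma, a, b, c):
    """Criterion of the lemma: the A x B block of Sigma_{AB|C} vanishes."""
    s = cond_cov(sigma, a + b, c)
    return all(s[i][len(a) + j] == 0 for i in range(len(a)) for j in range(len(b)))

# Hypothetical regular covariance matrix with sigma_11 * sigma_02 = sigma_01 * sigma_12.
sigma = [[4, 2, 1], [2, 4, 2], [1, 2, 4]]
print(gaussian_ci(sigma, [0], [2], [1]), gaussian_ci(sigma, [0], [2], []))
print(inverse(sigma)[0][2])  # entry kappa_02 of the concentration matrix
```

For this particular matrix the test with conditioning set $\{1\}$ and the vanishing of the corresponding element of the concentration matrix agree, as the last part of the Consequence predicts.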

Remark. The proof of Lemma reveals a notable difference between the Gaussian and the discrete case. While in the discrete case a conditional independence statement $A \perp\!\!\!\perp B \,|\, C\ [P]$ is equivalent to the collection of requirements

$$A \perp\!\!\!\perp B \,|\, \emptyset\ [P_{AB|C}(\cdot\,|\,z)] \qquad \text{for every } z\in X_C \text{ with } P^{C}(z)>0 ,$$

in the Gaussian case it is equivalent to a single requirement

$$A \perp\!\!\!\perp B \,|\, \emptyset\ [P_{AB|C}(\cdot\,|\,z)] \qquad \text{for at least one } z\in X_C ,$$

which already implies the same fact for all other $z\in X_C$ (one uses the conventional choice of continuous versions of $P_{AB|C}$ in this case). Informally said, the same conditional independence model is specified by a smaller number of requirements in the Gaussian case than in the discrete case. The reason behind this phenomenon is that the actual number of free parameters characterizing a Gaussian measure over $N$ is in fact smaller than the number of parameters characterizing a discrete measure (if $|X_i|\geq 2$ for $i\in N$). Therefore, discrete measures offer a wider variety of induced conditional independence models than Gaussian measures. This is maybe a surprising fact for those who anticipate that a continuous framework should be wider than a discrete framework. The point is that Gaussianity is quite a restrictive assumption.

Thus, one can expect many specific formal properties of conditional independence models arising in the Gaussian framework. For example, the following property of a disjoint semigraphoid $\mathcal{M}$ was recognized by Pearl as a typical property of graphical models (see Chapter):

composition: $A \perp\!\!\!\perp B \,|\, C\ [\mathcal{M}]$ and $A \perp\!\!\!\perp D \,|\, C\ [\mathcal{M}]$ implies $A \perp\!\!\!\perp BD \,|\, C\ [\mathcal{M}]$

for every collection of pairwise disjoint sets $A,B,C,D\subseteq N$. It follows easily from Lemma that it is also a typical property of Gaussian conditional independence models.

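Consequence. Let $P$ be a Gaussian measure over $N$ and let $A,B,C,D\subseteq N$ be pairwise disjoint. Then

$$A \perp\!\!\!\perp B \,|\, C\ [P] \ \text{ and }\ A \perp\!\!\!\perp D \,|\, C\ [P] \quad\Longrightarrow\quad A \perp\!\!\!\perp BD \,|\, C\ [P] .$$

Proof. Observe that $(\Sigma_{ABD|C})_{AB\cdot AB} = \Sigma_{AB|C}$ and $(\Sigma_{ABD|C})_{AD\cdot AD} = \Sigma_{AD|C}$ for a covariance matrix $\Sigma$ (see Section). Thus, the assumptions $(\Sigma_{ABD|C})_{A\cdot B}=0$ and $(\Sigma_{ABD|C})_{A\cdot D}=0$ imply together $(\Sigma_{ABD|C})_{A\cdot BD}=0$.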

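However, composition is not a universally valid property of conditional independence models, as the following example shows.

Example. There exists a discrete (binary) probability measure $P$ over $N$ with $|N|=3$ such that

$$i \perp\!\!\!\perp j \,|\, \emptyset\ [P] \quad\text{and}\quad \neg\,\big(\, i \perp\!\!\!\perp j \,|\, \{k\}\ [P] \,\big) \qquad \text{for any distinct } i,j,k\in N .$$

Indeed, put $X_i=\{0,1\}$ for $i\in N$ and ascribe the probability $1/4$ to every one of the following configurations of values: $(0,0,0)$, $(0,1,1)$, $(1,0,1)$, $(1,1,0)$.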
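A claim of this kind can be verified by brute force. The sketch below is an illustration only: it assumes the measure which puts probability $1/4$ on the four binary configurations with even sum (one standard measure with the stated property), represents a discrete measure as a dictionary from configurations to probabilities, and tests $A \perp\!\!\!\perp B \,|\, C$ through the product formula $P(a,b,c)\cdot P(c) = P(a,c)\cdot P(b,c)$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def marginal(p, idx):
    """Marginal of the measure p (dict: configuration -> probability) on coordinates idx."""
    out = {}
    for x, pr in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0) + pr
    return out

def is_ci(p, a, b, c):
    """A indep B given C: P(a,b,c) * P(c) == P(a,c) * P(b,c) for all configurations."""
    na, nb = len(a), len(b)
    pabc, pac, pbc, pc = (marginal(p, s) for s in (a + b + c, a + c, b + c, c))
    for xac, q_ac in pac.items():
        for xbc, q_bc in pbc.items():
            xa, xc = xac[:na], xac[na:]
            if xbc[nb:] != xc:
                continue  # the two partial configurations disagree on C
            if pabc.get(xa + xbc[:nb] + xc, 0) * pc[xc] != q_ac * q_bc:
                return False
    return True

# Assumed example measure: uniform on binary triples with even sum.
xor = {x: Fraction(1, 4) for x in product((0, 1), repeat=3) if sum(x) % 2 == 0}

print(is_ci(xor, (0,), (1,), ()), is_ci(xor, (0,), (2,), ()))   # pairwise statements
print(is_ci(xor, (0,), (1, 2), ()), is_ci(xor, (0,), (1,), (2,)))  # composed / conditional
```

Only configurations with positive $P(a,c)$ and $P(b,c)$ are enumerated; for the remaining ones both sides of the product formula are zero.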

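A further important fact is that every nondegenerate Gaussian measure has finite multiinformation. This follows from Lemma.

Consequence. Let $P$ be a nondegenerate Gaussian measure with a correlation matrix $\Gamma$. Then its multiinformation has the value

$$m_P(N) \;=\; -\frac{1}{2}\cdot\ln(\det(\Gamma)) .$$

Proof. Take the Lebesgue measure on $(X_N,\mathcal{X}_N)$ in place of $\mu$ in Lemma. Substitution of the density from Section gives

$$m_P(N) \;=\; -\frac{|N|}{2}\cdot\ln(2\pi) - \frac{|N|}{2} - \frac{1}{2}\cdot\ln(\det(\Sigma)) + \sum_{i\in N}\Big\{\frac{1}{2}\cdot\ln(2\pi) + \frac{1}{2} + \frac{1}{2}\cdot\ln\sigma_{ii}\Big\}$$

$$\;=\; -\frac{1}{2}\cdot\ln(\det(\Sigma)) + \frac{1}{2}\cdot\sum_{i\in N}\ln\sigma_{ii} \;=\; -\frac{1}{2}\cdot\ln\frac{\det(\Sigma)}{\prod_{i\in N}\sigma_{ii}} \;=\; -\frac{1}{2}\cdot\ln(\det(\Gamma)) .$$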
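The formula can be cross-checked numerically against the difference between the sum of marginal differential entropies and the joint differential entropy. The following sketch uses floating-point arithmetic and a hand-written determinant; the covariance matrix is a hypothetical example and the function names are not taken from the text.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in m]
    n, d = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return d

def correlation(sigma):
    n = len(sigma)
    return [[sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j]) for j in range(n)]
            for i in range(n)]

def multiinformation(sigma):
    """m_P(N) = -1/2 * ln det(Gamma) for a nondegenerate Gaussian measure."""
    return -0.5 * math.log(det(correlation(sigma)))

def multiinformation_via_entropies(sigma):
    """Sum of marginal differential entropies minus the joint differential entropy."""
    n = len(sigma)
    joint = 0.5 * n * math.log(2 * math.pi) + 0.5 * n + 0.5 * math.log(det(sigma))
    marg = sum(0.5 * math.log(2 * math.pi) + 0.5 + 0.5 * math.log(sigma[i][i])
               for i in range(n))
    return marg - joint

sigma = [[4, 2, 1], [2, 4, 2], [1, 2, 4]]  # hypothetical positive definite matrix
print(multiinformation(sigma), multiinformation_via_entropies(sigma))
```

Both routes should agree up to rounding, and the value is zero exactly when the correlation matrix is the identity.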

On the other hand, a degenerate Gaussian measure need not be marginally continuous, as the following example shows. It also demonstrates that the intersection property mentioned in Section is not universally valid.

Example. There exists a Gaussian measure $P$ over $N$ with $|N|=3$ such that

$$i \perp\!\!\!\perp j \,|\, \{k\}\ [P] \quad\text{and}\quad \neg\,\big(\, i \perp\!\!\!\perp j \,|\, \emptyset\ [P] \,\big) \qquad \text{for arbitrarily chosen distinct } i,j,k\in N .$$

Put $P=\mathcal{N}(0,\Sigma)$, where $\Sigma=(\sigma_{ij})_{i,j\in N}$ with $\sigma_{ij}=1$ for every $i,j\in N$, and apply Consequence. It makes no problem to verify (see Section) that $P$ is concentrated on the subspace $\{(x,x,x)\,;\ x\in\mathbb{R}\}$, while $P^{\{i\}}=\mathcal{N}(0,1)$ for every $i\in N$. Since $\prod_{i\in N}P^{\{i\}}$ is absolutely continuous with respect to the Lebesgue measure, $P$ is not marginally continuous.

Note that the same conditional independence model can be induced by a discrete (binary) measure: put $X_i=\{0,1\}$ for $i\in N$ and ascribe the probability $1/2$ to the configurations $(0,0,0)$ and $(1,1,1)$.

Basic construction

The following lemma provides a basic method of construction of probability measures with a prescribed CI structure.

Lemma. Let $P$, $Q$ be probability measures over $N$. Then there exists a probability measure $R$ over $N$ such that $\mathcal{M}_R = \mathcal{M}_P\cap\mathcal{M}_Q$. Moreover, if $P$ and $Q$ have finite multiinformation, then a probability measure $R$ over $N$ with finite multiinformation and $\mathcal{M}_R = \mathcal{M}_P\cap\mathcal{M}_Q$ exists. The same statement holds for the class of discrete measures over $N$, respectively for the class of positive discrete measures over $N$.

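Proof. Let $P$ be a measure on a space $(X_N,\mathcal{X}_N)=\prod_{i\in N}(X_i,\mathcal{X}_i)$ and $Q$ be a measure on $(Y_N,\mathcal{Y}_N)=\prod_{i\in N}(Y_i,\mathcal{Y}_i)$. Let us put $(Z_i,\mathcal{Z}_i)=(X_i\times Y_i,\ \mathcal{X}_i\times\mathcal{Y}_i)$ for $i\in N$, introduce $(Z_N,\mathcal{Z}_N)=\prod_{i\in N}(Z_i,\mathcal{Z}_i)$, which can be understood as $(X_N\times Y_N,\ \mathcal{X}_N\times\mathcal{Y}_N)$, and define a probability measure $R$ on $(Z_N,\mathcal{Z}_N)$ as the product of $P$ and $Q$. The goal is to show that for every $\langle A,B|C\rangle\in\mathcal{T}(N)$

$$A \perp\!\!\!\perp B \,|\, C\ [R] \quad\Longleftrightarrow\quad \{\ A \perp\!\!\!\perp B \,|\, C\ [P] \ \text{ and }\ A \perp\!\!\!\perp B \,|\, C\ [Q]\ \} . \tag{$*$}$$

Let us take the unifying perspective indicated in Remark: $(Z_N,\mathcal{Z}_N)$ and $R$ are fixed, and respective coordinate $\sigma$-algebras $\bar{\mathcal{X}}_A, \bar{\mathcal{Y}}_A, \bar{\mathcal{Z}}_A\subseteq\mathcal{Z}_N$ are ascribed to every $A\subseteq N$. Then $P$ corresponds to the restriction of $R$ to $\bar{\mathcal{X}}_N$, $Q$ to the restriction of $R$ to $\bar{\mathcal{Y}}_N$, and $(*)$ takes the form (see Section for related concepts)

$$\bar{\mathcal{Z}}_A \perp\!\!\!\perp \bar{\mathcal{Z}}_B \,|\, \bar{\mathcal{Z}}_C\ [R] \quad\Longleftrightarrow\quad \{\ \bar{\mathcal{X}}_A \perp\!\!\!\perp \bar{\mathcal{X}}_B \,|\, \bar{\mathcal{X}}_C\ [R] \ \text{ and }\ \bar{\mathcal{Y}}_A \perp\!\!\!\perp \bar{\mathcal{Y}}_B \,|\, \bar{\mathcal{Y}}_C\ [R]\ \} .$$

As $\bar{\mathcal{X}}_A\times\bar{\mathcal{Y}}_A$-measurable rectangles generate $\bar{\mathcal{Z}}_A$ for every $A\subseteq N$, by the weaker formulation of the definition of conditional independence for $\sigma$-algebras observe that the fact $\bar{\mathcal{Z}}_A \perp\!\!\!\perp \bar{\mathcal{Z}}_B \,|\, \bar{\mathcal{Z}}_C\ [R]$ is equivalent to the requirement: for every $\mathsf{A}^x\in\bar{\mathcal{X}}_A$, $\mathsf{A}^y\in\bar{\mathcal{Y}}_A$, $\mathsf{B}^x\in\bar{\mathcal{X}}_B$, $\mathsf{B}^y\in\bar{\mathcal{Y}}_B$

$$R(\mathsf{A}^x\cap\mathsf{A}^y\cap\mathsf{B}^x\cap\mathsf{B}^y\,|\,\bar{\mathcal{Z}}_C)(z) \;=\; R(\mathsf{A}^x\cap\mathsf{A}^y\,|\,\bar{\mathcal{Z}}_C)(z)\cdot R(\mathsf{B}^x\cap\mathsf{B}^y\,|\,\bar{\mathcal{Z}}_C)(z) \qquad \text{for } R\text{-a.e. } z\in Z_N . \tag{a}$$

On the other hand, $\bar{\mathcal{X}}_A \perp\!\!\!\perp \bar{\mathcal{X}}_B \,|\, \bar{\mathcal{X}}_C\ [R]$ is equivalent to the requirement (by the usual definition of conditional independence for $\sigma$-algebras): for every $\mathsf{A}^x\in\bar{\mathcal{X}}_A$, $\mathsf{B}^x\in\bar{\mathcal{X}}_B$

$$P(\mathsf{A}^x\cap\mathsf{B}^x\,|\,\bar{\mathcal{X}}_C)(x) \;=\; P(\mathsf{A}^x\,|\,\bar{\mathcal{X}}_C)(x)\cdot P(\mathsf{B}^x\,|\,\bar{\mathcal{X}}_C)(x) \qquad \text{for } R\text{-a.e. } z=(x,y)\in Z_N , \tag{b}$$

and $\bar{\mathcal{Y}}_A \perp\!\!\!\perp \bar{\mathcal{Y}}_B \,|\, \bar{\mathcal{Y}}_C\ [R]$ is equivalent to the requirement: for every $\mathsf{A}^y\in\bar{\mathcal{Y}}_A$, $\mathsf{B}^y\in\bar{\mathcal{Y}}_B$

$$Q(\mathsf{A}^y\cap\mathsf{B}^y\,|\,\bar{\mathcal{Y}}_C)(y) \;=\; Q(\mathsf{A}^y\,|\,\bar{\mathcal{Y}}_C)(y)\cdot Q(\mathsf{B}^y\,|\,\bar{\mathcal{Y}}_C)(y) \qquad \text{for } R\text{-a.e. } z=(x,y)\in Z_N . \tag{c}$$

Moreover, using the weaker definition of conditional probability (see Section, p.) and the definition of $R$, verify that

$$R(\mathsf{A}^x\cap\mathsf{A}^y\cap\mathsf{B}^x\cap\mathsf{B}^y\,|\,\bar{\mathcal{Z}}_C)(z) \;=\; P(\mathsf{A}^x\cap\mathsf{B}^x\,|\,\bar{\mathcal{X}}_C)(x)\cdot Q(\mathsf{A}^y\cap\mathsf{B}^y\,|\,\bar{\mathcal{Y}}_C)(y) \qquad \text{for } R\text{-a.e. } z=(x,y)\in Z_N . \tag{d}$$

Thus (a) implies (b): to evidence it, put $\mathsf{A}^y=\mathsf{B}^y=Z_N$. Similarly (a) implies (c): to evidence it, put $\mathsf{A}^x=\mathsf{B}^x=Z_N$. Conversely, (b) and (c) imply (a) by (d), which means that $(*)$ was verified.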

If both $P$ and $Q$ have finite multiinformation, then $R^{\{i\}}=P^{\{i\}}\times Q^{\{i\}}$ are the marginals of $R$ on $(Z_i,\mathcal{Z}_i)$ for $i\in N$, and $R \ll \prod_{i\in N}P^{\{i\}}\times\prod_{j\in N}Q^{\{j\}} = \prod_{k\in N}P^{\{k\}}\times Q^{\{k\}}$. Thus, $R$ is a marginally continuous measure over $N$ and one can apply Lemma to $R$ (with doubled $N$) to see that

$$H\Big(R\ \Big|\ \prod_{i\in N}P^{\{i\}}\times\prod_{j\in N}Q^{\{j\}}\Big) \;=\; H\Big(P\ \Big|\ \prod_{i\in N}P^{\{i\}}\Big) + H\Big(Q\ \Big|\ \prod_{j\in N}Q^{\{j\}}\Big) .$$

Note for explanation that in the considered case $R$ is the conditional product of $P$ and $Q$, and therefore the relative entropy term of the form $H(P^{ABC}\,|\,Q)$ in that lemma vanishes by Lemma from Section. In particular, the multiinformation of $R$ is the sum of the multiinformations of $P$ and $Q$, and therefore it is finite. The statement concerning discrete and positive discrete measures easily follows from the given construction.
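For discrete measures the construction can be tried out directly. The following sketch is an illustration under stated assumptions: measures are dictionaries from configurations to exact probabilities, the product measure $R$ lives on configurations whose $i$-th coordinate is the pair $(x_i,y_i)$, and the two input measures over three binary variables are hypothetical examples. All function names are made up for this sketch.

```python
from fractions import Fraction
from itertools import product

def marginal(p, idx):
    out = {}
    for x, pr in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0) + pr
    return out

def is_ci(p, a, b, c):
    """A indep B given C via P(a,b,c) * P(c) == P(a,c) * P(b,c)."""
    na, nb = len(a), len(b)
    pabc, pac, pbc, pc = (marginal(p, s) for s in (a + b + c, a + c, b + c, c))
    for xac, q_ac in pac.items():
        for xbc, q_bc in pbc.items():
            xa, xc = xac[:na], xac[na:]
            if xbc[nb:] == xc and pabc.get(xa + xbc[:nb] + xc, 0) * pc[xc] != q_ac * q_bc:
                return False
    return True

def product_measure(p, q):
    """R = P x Q, regarded as a measure over N with Z_i = X_i x Y_i."""
    return {tuple(zip(x, y)): px * qy for x, px in p.items() for y, qy in q.items()}

def disjoint_triplets(n):
    """All triplets (A, B, C) of pairwise disjoint index tuples with A, B nonempty."""
    for labels in product("ABC-", repeat=n):
        a, b, c = (tuple(i for i, l in enumerate(labels) if l == s) for s in "ABC")
        if a and b:
            yield a, b, c

def model(p, n):
    return {t for t in disjoint_triplets(n) if is_ci(p, *t)}

quarter = Fraction(1, 4)
# Hypothetical P: uniform on binary triples with even sum.
P = {x: quarter for x in product((0, 1), repeat=3) if sum(x) % 2 == 0}
# Hypothetical Q: x0, x1 independent and uniform, x2 a copy of x0.
Q = {(u, v, u): quarter for u in (0, 1) for v in (0, 1)}
R = product_measure(P, Q)
print(model(R, 3) == model(P, 3) & model(Q, 3))
```

The comparison at the end checks the claim $\mathcal{M}_R=\mathcal{M}_P\cap\mathcal{M}_Q$ on all disjoint triplets with nonempty $A$ and $B$ for this one pair of measures; it is a sanity check, not a proof.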

Elementary constructions of probability measures are needed to utilize the method from Lemma. One of them is the product of one-dimensional probability measures.

Observation. There exists a discrete (binary) probability measure $P$ over $N$ such that

$$A \perp\!\!\!\perp B \,|\, C\ [P] \qquad \text{for every } \langle A,B|C\rangle\in\mathcal{T}(N) .$$

Observation. Suppose that $|N|\geq 2$ and $A\subseteq N$ with $|A|\geq 2$. Then there exists a discrete (binary) probability measure $P$ over $N$ such that

$$m_P(S) \;=\; \begin{cases} \ln 2 & \text{if } A\subseteq S, \\ 0 & \text{otherwise.} \end{cases}$$

Proof. Put $X_i=\{0,1\}$ for $i\in N$ and ascribe the probability $2^{1-|N|}$ to every configuration of values $[x_i]_{i\in N}$ with even $\sum_{i\in A}x_i$ (remaining configurations have zero probability).
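The values of the multiinformation function for the parity construction can be confirmed by enumeration. In the sketch below, $m_P(S)$ is computed as the sum of the entropies of the one-dimensional marginals minus the entropy of the marginal on $S$, which is the discrete form of the relative entropy of $P^S$ with respect to the product of its one-dimensional marginals; the choice $|N|=4$, $A=\{0,1,2\}$ and all function names are assumptions made for illustration.

```python
import math
from itertools import combinations, product

def parity_measure(n, a):
    """Probability 2^(1-n) on binary configurations with an even sum over a."""
    pr = 2.0 ** (1 - n)
    return {x: pr for x in product((0, 1), repeat=n)
            if sum(x[i] for i in a) % 2 == 0}

def marginal(p, idx):
    out = {}
    for x, pr in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + pr
    return out

def entropy(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def multiinformation(p, s):
    """m_P(S): sum of one-dimensional marginal entropies minus entropy of P^S."""
    return sum(entropy(marginal(p, (i,))) for i in s) - entropy(marginal(p, s))

p = parity_measure(4, (0, 1, 2))
for r in range(5):
    for s in combinations(range(4), r):
        print(s, round(multiinformation(p, s), 6))
```

The printed values can be compared with $\ln 2$ for the sets $S$ containing $A$ and with $0$ for all other sets.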

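Lemma. Suppose that $|N|\geq 2$, $2\leq l\leq |N|$ and $\mathcal{L}\subseteq\{S\subseteq N\,;\ |S|=l\}$. Then there exists a discrete probability measure $P$ over $N$ such that

$$\forall\,\langle i,j|K\rangle\in\mathcal{T}(N) \text{ with } |ijK|=l: \qquad i \perp\!\!\!\perp j \,|\, K\ [P] \;\Longleftrightarrow\; ijK\notin\mathcal{L} . \tag{$*$}$$

Proof. If $\mathcal{L}=\emptyset$, then use the first Observation. If $\mathcal{L}\neq\emptyset$, then apply the second Observation to every $A\in\mathcal{L}$, together with Consequence, to get a binary probability measure $P_A$ such that

$$\forall \text{ elementary triplet } \langle i,j|K\rangle \text{ with } |ijK|=l: \qquad i \perp\!\!\!\perp j \,|\, K\ [P_A] \;\Longleftrightarrow\; ijK\neq A .$$

Then Lemma can be applied repeatedly to get a measure over $N$ satisfying $(*)$.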

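This gives a lower estimate of the number of discrete probabilistic CI structures.

Consequence. If $n=|N|\geq 3$, then the number of distinct CI structures induced by discrete probability measures over $N$ exceeds the number $2^{2^{\lfloor n/2\rfloor}}$, where $\lfloor n/2\rfloor$ denotes the lower integer part of $n/2$.

Proof. Let us put $l=\frac{n}{2}$ for even $n$, respectively $l=\frac{n+1}{2}$ for odd $n$. By Lemma, for every subclass $\mathcal{L}$ of $\{S\subseteq N\,;\ |S|=l\}$ a respective probability measure $P_{\mathcal{L}}$ exists. By the equivalence in that lemma, these measures induce distinct CI models over $N$. Therefore, the number of distinct induced CI models exceeds $2^{s}$, where $s$ is the number of elements of $\{S\subseteq N\,;\ |S|=l\}$. Find suitable lower estimates for $s$. If $l=\frac{n}{2}$, then write

$$s \;=\; \binom{2l}{l} \;=\; \frac{(2l)\cdot(2l-1)\cdots(l+1)}{l\cdot(l-1)\cdots 1} \;=\; \prod_{j=1}^{l}\frac{l+j}{j} \;\geq\; 2^{l} \;=\; 2^{\lfloor n/2\rfloor} .$$

Similarly, in case $l=\frac{n+1}{2}$ write

$$s \;=\; \binom{2l-1}{l} \;=\; \frac{(2l-1)\cdot(2l-2)\cdots(l+1)}{(l-1)\cdot(l-2)\cdots 1} \;=\; \prod_{j=1}^{l-1}\frac{l+j}{j} \;\geq\; 2^{l-1} \;=\; 2^{\lfloor n/2\rfloor} .$$

Thus $s\geq 2^{\lfloor n/2\rfloor}$, which implies the desired conclusion in both cases.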
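The combinatorial estimate used in the proof is easy to tabulate. The short sketch below (the function name is an assumption) computes $s$ for the choice of $l$ made in the proof and compares it with $2^{\lfloor n/2\rfloor}$ for a few small $n$.

```python
from math import comb

def number_of_l_subsets(n):
    """s = number of l-element subsets of N, l = n/2 (n even) or (n+1)/2 (n odd)."""
    l = n // 2 if n % 2 == 0 else (n + 1) // 2
    return comb(n, l)

for n in range(3, 11):
    s = number_of_l_subsets(n)
    # The lower bound on the number of CI structures is then 2**s >= 2**(2**(n // 2)).
    print(n, s, 2 ** (n // 2), s >= 2 ** (n // 2))
```

Since the bound on the number of models is $2^{s}$, even this crude estimate grows doubly exponentially in $n$.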

Imsets

By an imset over $N$ is understood an integer-valued function on the power set of $N$, that is, any function $u:\mathcal{P}(N)\to\mathbb{Z}$, or alternatively an element of $\mathbb{Z}^{\mathcal{P}(N)}$. Basic operations with imsets, namely summation, subtraction and multiplication by an integer, are defined coordinatewise. Analogously, we write $u\leq v$ for imsets $u,v$ over $N$ if $u(S)\leq v(S)$ for every $S\subseteq N$. A multiset is an imset with nonnegative values, that is, any function $m:\mathcal{P}(N)\to\mathbb{Z}^{+}$. Any imset $u$ over $N$ can be written as a difference $u=u^{+}-u^{-}$ of two multisets over $N$, where $u^{+}$ is the positive part of $u$ and $u^{-}$ is the negative part of $u$, defined as follows:

$$u^{+}(S)=\max\{u(S),0\}, \qquad u^{-}(S)=\max\{-u(S),0\} \qquad \text{for } S\subseteq N .$$

By the positive domain of $u$ will be understood the class of sets $\mathcal{D}_u^{+}=\{S\subseteq N\,;\ u(S)>0\}$, by the negative domain of $u$ the class $\mathcal{D}_u^{-}=\{S\subseteq N\,;\ u(S)<0\}$.

u

Remark. The word multiset is taken from combinatorial theory, while the word imset is an abbreviation for integer-valued multiset. Later in this work, certain special imsets will be used to describe probabilistic conditional independence structures (see Section).

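A trivial example of an imset is the zero imset, denoted by $0$, which ascribes zero value to every $S\subseteq N$. Another simple example is the identifier of a set $A\subseteq N$, denoted by $\delta_A$ and defined as follows:

$$\delta_A(S) \;=\; \begin{cases} 1 & \text{in case } S=A, \\ 0 & \text{in case } S\subseteq N,\ S\neq A. \end{cases}$$

Special notation $m^{A\downarrow}$, respectively $m^{A\uparrow}$, will be used for multisets which serve as identifiers of classes of subsets, respectively classes of supersets, of a set $A\subseteq N$:

$$m^{A\downarrow}(S) \;=\; \begin{cases} 1 & \text{if } A\supseteq S, \\ 0 & \text{otherwise,} \end{cases} \qquad\text{and}\qquad m^{A\uparrow}(S) \;=\; \begin{cases} 1 & \text{if } A\subseteq S, \\ 0 & \text{otherwise.} \end{cases}$$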

Figure: Hasse diagram of an imset over $N=\{a,b,c\}$.

It is clear how to represent an imset over $N$ in the memory of a computer, namely by a vector with $2^{|N|}$ integral components which correspond to subsets of $N$. However, for a small number of variables one can also visualize imsets in a more telling way, using special pictures. The power set $\mathcal{P}(N)$ is a distributive lattice and can be represented in the form of a Hasse diagram. Nodes of this diagram correspond to elements of $\mathcal{P}(N)$, that is, to subsets of $N$, and a link is made between two nodes if the symmetric difference of the represented sets is a singleton. A function on $\mathcal{P}(N)$ can be visualized by writing assigned values into respective nodes. For example, the imset $u$ over $N=\{a,b,c\}$ defined by the table

S       ∅    {a}    {b}    {c}    {a,b}    {a,c}    {b,c}    {a,b,c}
u(S)

can be visualized in the form of the diagram from Figure. The third possible way of description of an imset, used in this work, is to write it as a combination of more elementary imsets with integral coefficients. For example, the imset $u$ from Figure can be written as follows:

u

N fabg facg fag fbg

In this work, certain special imsets over $N$ will be used. The effective dimension of these imsets, that is, the actual number of free values, is not $2^{|N|}$ but $2^{|N|}-|N|-1$ only. There are several ways of standardization of imsets of this kind. I will distinguish three basic ways of standardization (for justification of terminology see Remark in Section). An imset $u$ over $N$ is o-standardized if

$$\sum_{S\subseteq N}u(S)=0 \qquad\text{and}\qquad \forall\, i\in N \quad \sum_{S\subseteq N,\ i\in S}u(S)=0 .$$

Alternatively, the second requirement can be formulated in the form $\sum_{S\subseteq N\setminus\{j\}}u(S)=0$ for every $j\in N$. An imset $u$ is $\ell$-standardized if

$$u(S)=0 \qquad \text{whenever } S\subseteq N,\ |S|\leq 1 ,$$

and u-standardized if

$$u(S)=0 \qquad \text{whenever } S\subseteq N,\ |S|\geq |N|-1 .$$

An imset $u$ over $N$ is called normalized if the collection of values $\{u(S)\,;\ S\subseteq N\}$ has no common prime divisor. Besides basic operations with imsets, the operation of scalar product of a real function $m:\mathcal{P}(N)\to\mathbb{R}$ and an imset $u$ over $N$, denoted by $\langle m,u\rangle$ and defined by

$$\langle m,u\rangle \;=\; \sum_{S\subseteq N} m(S)\cdot u(S) ,$$

will be used. Indeed, it is a scalar product on the Euclidean space $\mathbb{R}^{\mathcal{P}(N)}$. Note that the function $m$ can be an imset as well; it will often be a multiset.
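The concepts of this section translate almost literally into code. The sketch below stores an imset as a dictionary from subsets of $N$ (as frozensets) to integers, which is one possible representation of the vector with $2^{|N|}$ components; the function names and the example imset $\delta_{\{a,b,c\}}-\delta_{\{a,b\}}-\delta_{\{a,c\}}+\delta_{\{a\}}$ are assumptions chosen for illustration and are not the imset from the figure.

```python
from functools import reduce
from itertools import combinations
from math import gcd

def powerset(n):
    items = sorted(n)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def delta(a, n):
    """Identifier of the set a: value 1 at a and 0 elsewhere."""
    return {s: int(s == frozenset(a)) for s in powerset(n)}

def combine(n, *terms):
    """Integer combination sum_k coef_k * imset_k, computed coordinatewise."""
    return {s: sum(c * u[s] for c, u in terms) for s in powerset(n)}

def positive_part(u):
    return {s: max(v, 0) for s, v in u.items()}

def negative_part(u):
    return {s: max(-v, 0) for s, v in u.items()}

def scalar_product(m, u):
    return sum(m[s] * u[s] for s in u)

def is_o_standardized(u, n):
    return sum(u.values()) == 0 and all(
        sum(v for s, v in u.items() if i in s) == 0 for i in n)

def is_l_standardized(u):
    return all(v == 0 for s, v in u.items() if len(s) <= 1)

def is_u_standardized(u, n):
    return all(v == 0 for s, v in u.items() if len(s) >= len(n) - 1)

def is_normalized(u):
    """No common prime divisor of the values, i.e. their gcd equals 1."""
    return reduce(gcd, (abs(v) for v in u.values())) == 1

N = frozenset("abc")
u = combine(N, (1, delta("abc", N)), (-1, delta("ab", N)),
            (-1, delta("ac", N)), (1, delta("a", N)))
m_up = {s: int(frozenset("bc") <= s) for s in powerset(N)}  # supersets of {b, c}
print(is_o_standardized(u, N), is_l_standardized(u), is_u_standardized(u, N))
print(is_normalized(u), scalar_product(m_up, u))
```

A dictionary keyed by frozensets keeps the correspondence between components and subsets explicit; a flat integer vector indexed by bit masks would be the more compact alternative.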

Chapter

Graphical methods

Graphs whose nodes correspond to random variables are traditional tools for description of CI structures. One can distinguish three classic approaches: using undirected graphs, using acyclic directed graphs and using chain graphs. This chapter is an overview of graphical methods of description of CI structures, with the main emphasis put on theoretical questions mentioned in Section. Both classic and advanced approaches are included. Note that elementary graphical concepts are introduced in Section.

Undirected graphs

Graphical models based on undirected graphs are also known as Markov networks. Given an undirected graph $G$ over $N$, one says that a disjoint triplet $\langle A,B|C\rangle\in\mathcal{T}(N)$ is represented in $G$, and writes $A \perp\!\!\!\perp B \,|\, C\ [G]$, if every route (equivalently every path) in $G$ between a node in $A$ and a node in $B$ contains a node in $C$, that is, $C$ separates between $A$ and $B$ in $G$. For illustration see Figure. Thus, every undirected graph $G$ over $N$ induces a formal independence model over $N$ by means of the separation criterion (for undirected graphs):

$$\mathcal{M}_G \;=\; \{\, \langle A,B|C\rangle\in\mathcal{T}(N)\,;\ A \perp\!\!\!\perp B \,|\, C\ [G] \,\} .$$

Let us call every independence model obtained in this way a UG model. These models were characterized in terms of a finite number of formal properties:

triviality: $A \perp\!\!\!\perp \emptyset \,|\, C\ [G]$,
symmetry: $A \perp\!\!\!\perp B \,|\, C\ [G]$ implies $B \perp\!\!\!\perp A \,|\, C\ [G]$,
decomposition: $A \perp\!\!\!\perp BD \,|\, C\ [G]$ implies $A \perp\!\!\!\perp D \,|\, C\ [G]$,
strong union: $A \perp\!\!\!\perp B \,|\, C\ [G]$ implies $A \perp\!\!\!\perp B \,|\, DC\ [G]$,
intersection: $A \perp\!\!\!\perp B \,|\, DC\ [G]$ and $A \perp\!\!\!\perp D \,|\, BC\ [G]$ implies $A \perp\!\!\!\perp BD \,|\, C\ [G]$,
transitivity: $A \perp\!\!\!\perp B \,|\, C\ [G]$ implies $A \perp\!\!\!\perp \{d\} \,|\, C\ [G]$ or $\{d\} \perp\!\!\!\perp B \,|\, C\ [G]$.

This axiomatic characterization implies that every UG model is a graphoid satisfying the composition property.
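The separation criterion amounts to a reachability test. A minimal sketch follows, assuming the graph is given as a dictionary of adjacency sets and using a hypothetical four-cycle as the example; the function name is made up. A breadth-first search starts in $A$, never enters $C$, and reports failure as soon as it reaches $B$.

```python
from collections import deque

def separated(adj, a, b, c):
    """True iff every path between a node in a and a node in b meets c."""
    a, b, c = set(a), set(b), set(c)
    seen = set(a - c)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in b:
            return False
        for w in adj.get(v, ()):
            if w not in seen and w not in c:
                seen.add(w)
                queue.append(w)
    return True

# Hypothetical undirected graph: the cycle a - b - c - d - a.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(separated(cycle, {"a"}, {"c"}, {"b", "d"}))  # both connections are cut
print(separated(cycle, {"a"}, {"c"}, {"b"}))       # the path through d remains
```

Enlarging the separating set can never create a new connection in this search, which is the strong union property from the list above.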

Figure: The set $C=\{e,f\}$ separates between $A=\{a,d\}$ and $B=\{h\}$.

Remark. Let me note that the above mentioned separation criterion was a result of a certain development. The theory of Markov fields stems from statistical physics, where undirected graphs were used to model geometric arrangement in space. Several types of Markov conditions were later introduced in order to associate these graphs and probabilistic CI structures. The original pairwise Markov property was strengthened to the local Markov property, and this was strengthened to the global Markov property. The latter property corresponds to the separation criterion and appeared to be the strongest possible Markov condition in a certain sense (see Remark). The Markov conditions differ in general, but coincide in case of positive measures.

Note that a similar story was observed in case of acyclic directed graphs and chain graphs, and has been repeated recently in case of advanced graphical models (see Section). However, in this work attention is paid only to the result of this development, that is, to graphical criteria which correspond to respective global Markov conditions.

A probability measure $P$ over $N$ is Markovian with respect to an undirected graph $G$ over $N$ if

$$A \perp\!\!\!\perp B \,|\, C\ [G] \quad\text{implies}\quad A \perp\!\!\!\perp B \,|\, C\ [P] \qquad \text{for every } \langle A,B|C\rangle\in\mathcal{T}(N) ,$$

and perfectly Markovian if the converse implication holds as well. It was shown (Theorem) that a perfectly Markovian discrete probability measure exists for every undirected graph over $N$. In other words, every UG model is a probabilistic CI model, and faithfulness of UG models (in the sense of Section) is ensured.

Remark. This is to explain certain habitual terminology used sometimes in the literature. The remark holds also in case of acyclic directed graphs and chain graphs (see Sections). The existence of a perfectly Markovian measure which belongs to a class $\Psi$ of measures implies the following weaker result: whenever a disjoint triplet $t$ is not represented in a graph $G$, then there exists a measure $P\in\Psi$ which is Markovian with respect to $G$ and such that $t$ is not a valid conditional independence statement with respect to $P$. Some authors say then that the class of measures $\Psi$ is perfect with respect to $G$. Thus, Theorem says that the class of CG measures with prescribed layout of discrete and continuous variables is perfect with respect to every undirected graph. However, the claim about perfectness of a class $\Psi$ is also referred to in the literature as the completeness of the respective graphical criterion relative to $\Psi$, since it says that the criterion cannot be strengthened within $\Psi$ any more (unlike the pairwise and local Markov conditions in case of the class of positive measures, see Remark). By strong completeness is then meant the existence of a perfectly Markovian measure over $N$ with prescribed nontrivial sample space $X_N$.

Figure: Testing $\langle a,f\,|\,\{c,d\}\rangle$ according to the moralization criterion (panels: original graph, induced subgraph, moral graph).

One can say that two undirected graphs $G$ and $H$ over $N$ are Markov equivalent if the classes of Markovian measures with respect to $G$ and $H$ coincide. The result about the existence of perfectly Markovian measures implies that it occurs iff $\mathcal{M}_G=\mathcal{M}_H$. Moreover, the observation that $a - b$ in $G$ iff $\neg\,(\,a \perp\!\!\!\perp b \,|\, N\setminus\{a,b\}\ [G]\,)$ implies that $\mathcal{M}_G=\mathcal{M}_H$ iff $G=H$. Thus, the equivalence question (in the sense of Section) has a simple solution in case of undirected graphs.

Remark. A marginally continuous probability measure over $N$ is called factorizable with respect to an undirected graph $G$ over $N$ if it factorizes after the class (see p.) of its cliques. It is known that every factorizable measure is Markovian; the converse is true for positive measures, but not for all discrete measures.

One can say that two graphs are factorizably equivalent if the corresponding classes of factorizable measures coincide. However, this notion is not very sensible in the framework of undirected graphs, since it reduces to identity of graphs in this case (one can use the same reasoning as in case of Markov equivalence).

The restriction of a UG model to a set $\emptyset\neq T\subseteq N$ is a UG model. However, the corresponding marginal graph $G^{T}$ differs from the usual induced subgraph $G_T$. For $a,b\in T$ one has $a - b$ in $G^{T}$ iff there exists a path in $G$ between $a$ and $b$ consisting of nodes of $\{a,b\}\cup(N\setminus T)$.

Acyclic directed graphs

These graphical models are also known under the name Bayesian networks. Note that the majority of authors became accustomed to the phrase directed acyclic graphs, which is not accurate from a grammatical point of view, since the adjectives do not commute. The respective abbreviation DAG is therefore commonly used.

Two basic criteria to determine whether a triplet $\langle A,B|C\rangle\in\mathcal{T}(N)$ is represented in an acyclic directed graph $G$ were developed. Lauritzen et al. proposed the moralization criterion, while the group around Pearl used the d-separation criterion (d means directional).

The moralization criterion has three stages. First, one takes the set $T=\mathrm{an}_G(ABC)$ and considers the induced subgraph $G_T$. Second, $G_T$ is changed into its moral graph $H$, that is, the underlying graph of the graph $K$ (with mixed edges) over $T$ which is obtained from the graph $G_T$ by adding a line $a - b$ in $K$ whenever there exists $c\in T$ having both $a$ and $b$ as parents in $G_T$. The name moral graph was motivated by the fact that the nodes having a common child are married. The third step is to decide whether $C$ separates between $A$ and $B$ in $H$. If yes, one says that $\langle A,B|C\rangle$ is represented in $G$ according to the moralization criterion. For illustration see Figure, where the tested triplet is not represented in the original graph.
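The three stages can be sketched as follows. This is an illustration only: the acyclic directed graph is assumed to be given as a dictionary mapping each node to the set of its parents, the example graph $a\to c\leftarrow b$, $c\to d$ is hypothetical (it is not the graph from the figure), and all function names are made up.

```python
from collections import deque
from itertools import combinations

def ancestral_set(parents, nodes):
    """Stage 1: the given nodes together with all their ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(parents.get(v, ()))
    return result

def moral_graph(parents, t):
    """Stage 2: undirected adjacency of the moral graph of the induced subgraph on t."""
    adj = {v: set() for v in t}
    for child in t:
        pa = [p for p in parents.get(child, ()) if p in t]
        for p in pa:                        # forget the direction of arrows
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pa, 2):    # marry parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    return adj

def separated(adj, a, b, c):
    """Stage 3: separation in an undirected graph by breadth-first search."""
    a, b, c = set(a), set(b), set(c)
    seen = set(a - c)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in b:
            return False
        for w in adj[v]:
            if w not in seen and w not in c:
                seen.add(w)
                queue.append(w)
    return True

def represented_by_moralization(parents, a, b, c):
    t = ancestral_set(parents, set(a) | set(b) | set(c))
    return separated(moral_graph(parents, t), a, b, c)

dag = {"c": {"a", "b"}, "d": {"c"}}  # hypothetical graph: a -> c <- b, c -> d
print(represented_by_moralization(dag, {"a"}, {"b"}, set()))
print(represented_by_moralization(dag, {"a"}, {"b"}, {"d"}))
```

Restricting to the ancestral set first is essential: in the example, conditioning on $d$ brings the common child $c$ into $T$, and the added line between $a$ and $b$ destroys the separation.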

To formulate the d-separation criterion one needs some auxiliary concepts as well. Let $\omega: c_1,\ldots,c_n$, $n\geq 1$, be a route in a directed graph $G$. By a collider node with respect to $\omega$ is understood every node $c_i$, $1<i<n$, such that $c_{i-1}\to c_i\leftarrow c_{i+1}$ in $\omega$. One says that $\omega$ is active with respect to a set $C\subseteq N$ if

every collider node with respect to $\omega$ belongs to $\mathrm{an}_G(C)$,
every other node of $\omega$ is outside $C$.

A route which is not active with respect to $C$ is blocked by $C$. A triplet $\langle A,B|C\rangle$ is represented in $G$ according to the d-separation criterion if every route (equivalently every path) in $G$ from $A$ to $B$ is blocked by $C$. For illustration of the d-separation criterion see Figure. It was shown that the moralization and d-separation criteria for acyclic directed graphs are equivalent. Note that the moralization criterion is effective if $\langle A,B|C\rangle$ is represented in $G$, while d-separation is suitable for the opposite case. A third possible equivalent criterion (a compromise between those two criteria) has also appeared in the literature.
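For small graphs the d-separation criterion can be checked by enumerating paths, using the path version of the definition mentioned above. The sketch assumes the same hypothetical input format as before (a dictionary from nodes to sets of parents) and hypothetical function names; it is exponential in the worst case and meant only to make the definition concrete.

```python
def ancestral_set(parents, nodes):
    """The given nodes together with all their ancestors, i.e. an_G(nodes)."""
    result, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(parents.get(v, ()))
    return result

def simple_paths(parents, start, goal):
    """All paths without repeated nodes between start and goal in the underlying graph."""
    nbrs = {}
    for child, pas in parents.items():
        for p in pas:
            nbrs.setdefault(child, set()).add(p)
            nbrs.setdefault(p, set()).add(child)

    def extend(path):
        if path[-1] == goal:
            yield path
            return
        for w in nbrs.get(path[-1], ()):
            if w not in path:
                yield from extend(path + [w])

    yield from extend([start])

def is_active(parents, path, c):
    """Colliders must lie in an_G(c); all other nodes must lie outside c."""
    an_c = ancestral_set(parents, c)
    for i, v in enumerate(path):
        inner = 0 < i < len(path) - 1
        collider = (inner and path[i - 1] in parents.get(v, ())
                    and path[i + 1] in parents.get(v, ()))
        if collider:
            if v not in an_c:
                return False
        elif v in c:
            return False
    return True

def d_separated(parents, a, b, c):
    c = set(c)
    return not any(is_active(parents, p, c)
                   for x in a for y in b for p in simple_paths(parents, x, y))

dag = {"c": {"a", "b"}, "d": {"c"}}  # hypothetical graph: a -> c <- b, c -> d
print(d_separated(dag, {"a"}, {"b"}, set()), d_separated(dag, {"a"}, {"b"}, {"d"}))
```

On this example the answers coincide with those of the moralization criterion, in line with the stated equivalence of the two criteria.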

One writes $A \perp\!\!\!\perp B \,|\, C\ [G]$ whenever $\langle A,B|C\rangle\in\mathcal{T}(N)$ is represented in an acyclic directed graph $G$ according to one of the criteria. Thus, every acyclic directed graph $G$ induces a formal independence model

$$\mathcal{M}_G \;=\; \{\, \langle A,B|C\rangle\in\mathcal{T}(N)\,;\ A \perp\!\!\!\perp B \,|\, C\ [G] \,\} .$$

Following common practice, let me call every independence model obtained in this way a DAG model. These models were not characterized like UG models; just several formal properties of DAG models were given. They imply that every DAG model is a graphoid satisfying the composition property. The problem of axiomatic characterization of DAG models seems to be more complicated (see Remark).

The definition of a Markovian and a perfectly Markovian measure with respect to an acyclic directed graph is analogous to the case of undirected graphs. It was shown that a perfectly Markovian discrete probability measure exists for every acyclic directed graph. Hence, the existence of a perfectly Markovian measure with prescribed nontrivial discrete sample space was derived. Thus, DAG models are also probabilistic CI models.

Two acyclic directed graphs are Markov equivalent if their classes of Markovian measures coincide. The graphical characterization of this equivalence can be found in several publications and follows from an analogous result for chain graphs as well. Let us call an immorality in an acyclic directed graph $G$ every induced subgraph of $G$ for a set $T=\{a,b,c\}$ such that $a\to c$ in $G$, $b\to c$ in $G$ and $[a,b]$ is not an edge in $G$. Two acyclic directed graphs are Markov equivalent iff they have the same underlying graph and the same immoralities.
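The characterization gives an immediate equivalence test. A small sketch, again assuming that a graph is given as a dictionary from nodes to sets of parents and that isolated nodes, if any, appear as keys with empty parent sets; the function names and example graphs are hypothetical.

```python
from itertools import combinations

def skeleton(parents):
    """Edges of the underlying undirected graph."""
    return {frozenset((p, c)) for c, pas in parents.items() for p in pas}

def immoralities(parents):
    """Pairs ({a, b}, c) with a -> c <- b and a, b not adjacent."""
    skel = skeleton(parents)
    return {(frozenset((a, b)), c)
            for c, pas in parents.items()
            for a, b in combinations(sorted(pas), 2)
            if frozenset((a, b)) not in skel}

def nodes(parents):
    return set(parents) | {p for pas in parents.values() for p in pas}

def markov_equivalent(g, h):
    return (nodes(g) == nodes(h) and skeleton(g) == skeleton(h)
            and immoralities(g) == immoralities(h))

chain = {"b": {"a"}, "c": {"b"}}        # a -> b -> c
reversed_chain = {"b": {"c"}, "a": {"b"}}  # a <- b <- c
fork = {"a": {"b"}, "c": {"b"}}         # a <- b -> c
collider = {"b": {"a", "c"}}            # a -> b <- c
print(markov_equivalent(chain, fork), markov_equivalent(chain, collider))
```

The three graphs without an immorality share the underlying graph and are mutually equivalent, while the collider differs from them by its single immorality.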

Figure: The path $a, b, e, f$ is active with respect to $C=\{c,d\}$.

Note that the word immorality has the same justification as the moralization criterion; other authors used various alternative names like unshielded colliders, v-structures and uncoupled head-to-head nodes.

However, the question of choice of a suitable representative of an equivalence class has no natural solution in the framework of acyclic directed graphs. There is no distinguished representative in every class of equivalent graphs. Thus, hybrid graphs like essential graphs or completed patterns were used in the literature to represent uniquely equivalence classes of acyclic directed graphs. The problem of estimation of DAG models from data (more exactly, estimation of an essential graph on the basis of the induced independence model, which could be obtained as a result of statistical tests based on data) has also been treated in the literature.

Remark. It is a speciality of the case of acyclic directed graphs that, for marginally continuous probability measures, the respective concept of a recursively factorizable measure coincides with the concept of a Markovian measure. Another specific feature of this case is that an analogue of the local Markov property is equivalent to the global Markov property. This fact can be also derived from the result saying that the least semigraphoid containing the following collection of independence statements

$$a_i \perp\!\!\!\perp \{a_1,\ldots,a_{i-1}\}\setminus \mathrm{pa}_G(a_i) \,|\, \mathrm{pa}_G(a_i) \qquad \text{for } i=1,\ldots,n ,$$

where $a_1,\ldots,a_n$, $n\geq 1$, is an ordering of nodes of $G$ consonant with the direction of arrows, is nothing but the induced model $\mathcal{M}_G$. The above collection of independence statements is often called a causal input list or a stratified protocol.

Unlike the case of UG models, the restriction of a DAG model need not be a DAG model, as the following example shows.

Example. There exists a DAG model over $N=\{a,b,c,d,e\}$ whose restriction to $T=\{a,b,c,d\}$ is not a DAG model over $T$. Consider the independence model induced by the graph in Figure. It was shown (Lemma) that its restriction to $T$ is not a DAG model. This unpleasant property of DAG models probably motivated attempts to extend study to DAG models with hidden variables, that is, restrictions of DAG models (see Section).

Figure: Acyclic directed graph with hidden variable $e$.

Remark. An indirect consequence of the preceding example is that DAG models cannot be characterized in terms of properties of semigraphoid type, unlike UG models. To evidence it, take the following perspective. Let us call a relevance statement over $N$ any independence or dependence statement which corresponds to a disjoint triplet over $N$. By a full-consistent set of relevance statements is understood a set of these statements over $N$ such that, for every $\langle A,B|C\rangle\in\mathcal{T}(N)$, exclusively either the corresponding independence statement or the corresponding dependence statement belongs to the set. Every independence model can be easily identified with a set of relevance statements of this kind: every missing independence statement is automatically regarded as a dependence statement.

Consider special formal properties of full-consistent sets of relevance statements, where a finite conjunction of relevance statements (which may be empty) implies another relevance statement. These formal properties are general enough, since every requirement that a finite conjunction implies a finite disjunction of relevance statements can be equivalently described in this way, because of full-consistency. To be more specific, I have in mind properties expressed in the form of syntactic inference rules, e.g. semigraphoid properties on p. or the properties characterizing UG models on p. The interpretation for a given set of variables $N$ is this: a rule of this sort is applicable only when the substitution of subsets of $N$ for capital letters $A,B,C,\ldots$, resp. elements of $N$ for lower-case letters like $d$ (the symbol $\emptyset$ has a specific meaning), leads to relevance statements over $N$ which correspond to disjoint triplets over $N$ for all involved statements.

The basic observation is that the restriction of any full-consistent set of relevance statements over $N$ satisfying a formal property of this kind to a set $\emptyset\neq T\subseteq N$ is a full-consistent set of relevance statements over $T$ satisfying the same formal property. This holds for any (even infinite) collection of formal properties of this type. Therefore, because of Example, DAG models can never be characterized by means of any collection of these properties.

However, perhaps DAG models can be characterized by means of more general formal properties, where elementary clauses are more complex and represent sets of relevance statements. For example, one symbol built from $A$, $B$ and $C$ could represent the class of all dependence statements $\neg\,(A \perp\!\!\!\perp B \,|\, D)$, where $C\subseteq D\subseteq N\setminus AB$. According to email communication by T. Verma, such a characterization is possible but very complex.

Classic chain graphs

A chain graph is a hybrid graph without directed cycles or, equivalently, a hybrid graph which admits a chain (see Section, p.). The class of chain graphs was introduced by Lauritzen and Wermuth in the middle eighties in a report which later became the basis of a journal paper.

Figure: Testing $\langle a,d\,|\,\{b,e,g\}\rangle$ according to the moralization criterion for chain graphs (panels: original and induced graph, moral graph).

The classic interpretation of chain graphs is based on the moralization criterion for chain graphs established by Lauritzen and Frydenberg. The main distinction between the moralization criterion for chain graphs and for acyclic directed graphs (see p.) is a more general definition of the moral graph in the case of chain graphs. Supposing $G_T$ is a hybrid graph over $T \subseteq N$, one defines a graph $K$ with mixed edges over $T$ by adding lines $a - b$ in $K$ whenever there exist $c, d \in T$ belonging to the same connectivity component of $G_T$ (possibly $c = d$) such that $a \to c$ in $G_T$ and $b \to d$ in $G_T$. The moral graph $H$ of $G_T$ is then the underlying graph of $K$. A triplet $\langle A, B \mid C \rangle \in \mathcal{T}(N)$ is represented in a chain graph $G$ over $N$ according to the moralization criterion if $C$ separates between $A$ and $B$ in the moral graph of $G_T$, where $T = \mathrm{an}_G(ABC)$. For illustration see Figure.
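The moralization test described above can be sketched in code. This is a minimal illustrative sketch and not part of the original text: the representation of a hybrid graph by a set of lines and a set of arrows, the function names `ancestral_set`, `components` and `is_represented`, and the small chain graph at the end are all assumptions made for the example.

```python
from itertools import combinations

def ancestral_set(nodes, lines, arrows):
    """Smallest superset of `nodes` closed under parents and line-neighbours."""
    result = set(nodes)
    changed = True
    while changed:
        changed = False
        for u, v in arrows:                      # arrow u -> v
            if v in result and u not in result:
                result.add(u)
                changed = True
        for u, v in lines:                       # line u - v (unordered)
            if (u in result) != (v in result):
                result.update((u, v))
                changed = True
    return result

def components(nodes, lines):
    """Map every node to its connectivity component (nodes joined by lines)."""
    comp = {v: {v} for v in nodes}
    for u, v in lines:
        if comp[u] is not comp[v]:
            merged = comp[u] | comp[v]
            for w in merged:
                comp[w] = merged
    return comp

def is_represented(A, B, C, lines, arrows):
    """Moralization criterion: does C separate A and B in the moral graph of G_T?"""
    A, B, C = set(A), set(B), set(C)
    T = ancestral_set(A | B | C, lines, arrows)
    lines_T = {(u, v) for u, v in lines if u in T and v in T}
    arrows_T = {(u, v) for u, v in arrows if u in T and v in T}
    comp = components(T, lines_T)
    # underlying graph of the induced subgraph G_T
    adj = {v: set() for v in T}
    for u, v in lines_T | arrows_T:
        adj[u].add(v)
        adj[v].add(u)
    # join parents of nodes lying in the same connectivity component
    for (a, c), (b, d) in combinations(arrows_T, 2):
        if a != b and comp[c] is comp[d]:
            adj[a].add(b)
            adj[b].add(a)
    # C separates A and B iff no path from A reaches B while avoiding C
    seen, stack = set(A), list(A)
    while stack:
        u = stack.pop()
        for w in adj[u] - C - seen:
            seen.add(w)
            stack.append(w)
    return not (seen & B)

# Hypothetical chain graph: a -> c, b -> d, c - d
example_lines = {("c", "d")}
example_arrows = {("a", "c"), ("b", "d")}
```

In this hypothetical graph, a and b are separated given the empty set (the ancestral set is just {a, b}), but not given {c}, because moralization joins the parents a and b of the component {c, d}.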

An equivalent c-separation criterion (c stands for 'chain'), which generalizes the d-separation criterion for acyclic directed graphs, was introduced in [ ]. This criterion was later simplified as follows. By a section of a route $\rho: c_1, \ldots, c_n$, $n \geq 1$, in a hybrid graph $G$ is understood a maximal undirected subroute $c_i - \ldots - c_j$ of $\rho$ (that is, either $i = 1$ or $c_{i-1} - c_i$ is not a line; analogously for $j$). By a collider section of $\rho$ is understood a section $c_i, \ldots, c_j$, $1 < i \leq j < n$, such that $c_{i-1} \to c_i - \ldots - c_j \leftarrow c_{j+1}$ in $\rho$. A route $\rho$ is superactive with respect to a set $C \subseteq N$ if

every collider section of $\rho$ contains a node of $C$,

every other section of $\rho$ is outside $C$.

A route which is not superactive with respect to $C$ is blocked by $C$. A triplet $\langle A, B \mid C \rangle \in \mathcal{T}(N)$ is represented in $G$ according to the c-separation criterion if every route in $G$ from $A$ to $B$ is blocked by $C$. The equivalence of the c-separation criterion and the moralization criterion was shown in Consequence. One writes $A \perp\!\!\!\perp B \mid C \; [G]$ if $\langle A, B \mid C \rangle$ is represented in a chain graph $G$ according to one of these criteria. The induced formal independence model is then

$$\mathcal{M}_G = \{\, \langle A, B \mid C \rangle \in \mathcal{T}(N) \,;\; A \perp\!\!\!\perp B \mid C \; [G] \,\}.$$

Thus the class of CG models was introduced. Since c-separation generalizes both the separation criterion for undirected graphs and the d-separation criterion for acyclic directed graphs, every UG model and every DAG model is a CG model (for illustration see Figure on p.). Every CG model is a graphoid satisfying the composition property.

Note that Example can also serve as an example showing that the restriction of a CG model need not be a CG model. Therefore one can repeat the arguments from Remark, showing that CG models cannot be characterized by means of formal properties of semigraphoid type.

Remark. Unlike the case of undirected and acyclic directed graphs, blocking of all routes required in the c-separation criterion is not equivalent to blocking of all paths. Consider the chain graph $G$ in the left-hand picture of Figure. The only path between $A = \{a\}$ and $B = \{d\}$ is $a \to b - c \to d$, which is blocked by $C = \{b, e, g\}$. However, the route $a \to b - c - e \leftarrow f \to g \leftarrow c \to d$ is active with respect to $C$. Thus one has $\neg\{\, a \perp\!\!\!\perp d \mid \{b, e, g\} \; [G] \,\}$. Despite the fact that the class of all routes between two sets could be infinite, c-separation is finitely implementable for another reason (see Section in [ ]).

Note that the above mentioned phenomenon was the main reason why the original version of c-separation looked awkward. It was formulated for a special finite class of routes called trails and complicated by subsequent inevitable intricacies.

A probability measure $P$ over $N$ is Markovian with respect to a chain graph $G$ over $N$ if

$$A \perp\!\!\!\perp B \mid C \; [G] \quad \text{implies} \quad A \perp\!\!\!\perp B \mid C \; [P] \qquad \text{for every } \langle A, B \mid C \rangle \in \mathcal{T}(N),$$

and perfectly Markovian if the converse implication holds as well. The main result of [ ] says that a perfectly Markovian positive discrete probability measure exists for every chain graph. In particular, faithfulness of CG models (in the sense of Section) is ensured as well.

Two chain graphs over $N$ are Markov equivalent if their classes of Markovian measures coincide. These graphs were characterized in graphical terms by Frydenberg. By a complex in a hybrid graph $G$ over $N$ is understood every induced subgraph of $G$ for a set $T = \{d_1, \ldots, d_k\}$, $k \geq 3$, such that $d_1 \to d_2$, $d_i - d_{i+1}$ for $i = 2, \ldots, k-2$, $d_{k-1} \leftarrow d_k$ in $G$, and no additional edge between distinct nodes of $\{d_1, \ldots, d_k\}$ exists in $G$. Two chain graphs over $N$ are Markov equivalent iff they have the same underlying graph and the same complexes.

However, unlike the case of acyclic directed graphs, the advanced question of representation of Markov equivalence classes has an elegant solution. Every class of Markov equivalent chain graphs contains a naturally distinguished member. Given two chain graphs $G$ and $H$ over $N$ having the same underlying graph, one says that $G$ is larger than $H$ if every arrow in $G$ is an arrow in $H$ with the same direction. Frydenberg showed that every class of Markov equivalent chain graphs contains a graph which is larger than every other chain graph within the class (that is, it has the greatest number of lines). This distinguished graph is named the largest chain graph of the equivalence class. An elegant graphical characterization of those graphs which are the largest chain graphs was presented in [ ]. The paper also describes an algorithm for the transformation of every chain graph into the respective largest chain graph. An alternative algorithm is presented in [ ], where the problem of finding the largest chain graph on the basis of the induced formal independence model is solved. This could be utilized for learning CG models.

Remark. Lauritzen (Section) defined the concept of a (marginally continuous) factorizable measure with respect to a chain graph. As in the case of undirected graphs, every factorizable measure is Markovian, and the converse is true for positive measures.

Having fixed the sample space $(X_N, \mathcal{X}_N)$, where $X_i$ is nontrivial for each $i \in N$, one can say that two chain graphs over $N$ are factorizably equivalent if the corresponding classes of factorizable measures on $X_N$ coincide. However, unlike the case of undirected and acyclic directed graphs, the hypothesis that this equivalence coincides with Markov equivalence has not been confirmed until now (see Question).

Within classic graphical models

This section deals with some methods of description of probabilistic structures which in fact fall within the scope of classic graphical models.

Decomposable models

A very important class of undirected graphs is the class of triangulated graphs. An undirected graph $G$ is called triangulated or chordal if every cycle $a_1, \ldots, a_n$ of length at least four in $G$ has a chord, that is, a line between nodes of $\{a_1, \ldots, a_n\}$ different from the lines of the cycle. There are several equivalent definitions of a chordal graph; one of them says that the graph can be decomposed in a certain way into its cliques (see Proposition), which motivated the alternative name decomposable graph (see p.). For this reason, UG models induced by triangulated graphs are named decomposable models. Another equivalent definition (see Proposition) is that all cliques of the graph can be ordered into a sequence $C_1, \ldots, C_m$, $m \geq 1$, satisfying the running intersection property:

$$\forall\, 2 \leq i \leq m \quad \exists\, 1 \leq k < i \qquad S_i \equiv C_i \cap \Bigl(\bigcup_{j<i} C_j\Bigr) \subseteq C_k .$$
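The running intersection property can be checked mechanically. The following is a minimal sketch under the assumption that cliques are given as Python sets in the proposed order; the function name `running_intersection` is hypothetical and chosen only for this illustration.

```python
def running_intersection(cliques):
    """Return the separators S_2, ..., S_m if the sequence of cliques
    satisfies the running intersection property, otherwise None."""
    separators = []
    history = set()                       # union of the cliques seen so far
    for i, clique in enumerate(cliques):
        clique = set(clique)
        if i > 0:
            s = clique & history          # S_i = C_i intersected with the history
            # S_i must be contained in a single earlier clique C_k, k < i
            if not any(s <= set(prev) for prev in cliques[:i]):
                return None
            separators.append(frozenset(s))
        history |= clique
    return separators

# Cliques of the path a - b - c - d (a triangulated graph)
path_cliques = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
# Cliques of the four-cycle a - b - c - d - a (not triangulated)
cycle_cliques = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "d"}]
```

For the path the ordering satisfies the property with separators {b} and {c}; for the four-cycle in the given ordering the last intersection {a, d} is not contained in any earlier clique, so the function returns None.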

Note that the phrase 'acyclic hypergraph' is sometimes used in the literature for a class of sets admitting an ordering of this type. The sets $S_i$ are then called separators, since $S_i$ separates the history $H_i \equiv \bigcup_{j<i} C_j \setminus S_i$ from the residuals $R_i \equiv C_i \setminus S_i$ in the graph for every $2 \leq i \leq m$ (see p.). Actually, separators and their multiplicity (i.e. the number of indices $i \in \{2, \ldots, m\}$ for which $S_i = S$) do not depend on the choice of the sequence satisfying the running intersection property (see Lemma in Section or [ ]). Note that the running intersection property has a close connection to the marginal problem within the framework of probabilistic expert systems.

One can show by repeated application of Proposition from [ ] that a marginally continuous probability measure $P$ is Markovian with respect to a triangulated undirected graph $G$ over $N$ iff its marginal densities $f_A$, $A \subseteq N$, satisfy the following product formula:

$$f_N(x) = \frac{\prod_{C \in \mathcal{C}} f_C(x_C)}{\prod_{S \in \mathcal{S}} f_S(x_S)^{w(S)}},$$

where $\mathcal{C}$ is the class of cliques, $\mathcal{S}$ the class of separators, and $w(S)$ denotes the multiplicity of a separator $S$. Thus, to store a discrete measure $P$ in the memory of a computer, one needs to store only its clique marginals.
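A small numerical illustration of the product formula may help. The sketch below is not from the original text: it assumes a hypothetical discrete measure over three binary variables that is Markovian with respect to the path a - b - c (cliques {a, b} and {b, c}, separator {b} with multiplicity one), and the helper names `marginal` and `reconstruct` are invented for the example.

```python
from itertools import product
from math import isclose

def marginal(joint, idx):
    """Marginal of a joint table (dict: tuple -> probability) on positions idx."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def reconstruct(joint, cliques, separators):
    """Rebuild the joint table from clique and separator marginals.
    `separators` lists every separator as often as its multiplicity."""
    clique_marg = [marginal(joint, c) for c in cliques]
    sep_marg = [marginal(joint, s) for s in separators]
    rebuilt = {}
    for x in joint:
        num = 1.0
        for c, m in zip(cliques, clique_marg):
            num *= m[tuple(x[i] for i in c)]
        den = 1.0
        for s, m in zip(separators, sep_marg):
            den *= m[tuple(x[i] for i in s)]
        rebuilt[x] = num / den if den > 0 else 0.0
    return rebuilt

# Hypothetical measure with a and c independent given b: P(a) P(b|a) P(c|b)
p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
p_c_given_b = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a, b, c in product((0, 1), repeat=3)}

rebuilt = reconstruct(joint, cliques=[(0, 1), (1, 2)], separators=[(1,)])
```

Here `rebuilt` agrees with `joint` up to rounding, so only the two clique marginals need to be stored; the separator marginal can be obtained from either of them.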

Figure. Relationships among classic graphical models (regions labelled: probabilistic CI models, classic CG models, DAG models, UG models, decomposable models).

A further related equivalent definition of a triangulated graph is the existence of a junction tree (Theorem) of its cliques and separators. Junction trees then form a mathematical basis for miscellaneous effective computational methods which originate from the local computation method. Thus decomposable models are very suitable from the point of view of implementation (see Section, p.).

Perhaps another characterization of decomposable models is worth mentioning. Decomposable models are just those formal independence models which are simultaneously UG models and DAG models. For illustration see Figure. A characterization of decomposable models in terms of a finite number of formal properties is given in [ ]. It implies that decomposable models are closed under restriction.

Recursive causal graphs

The concept of recursive causal graph seems to precede the concept of chain graph. It can be equivalently defined as a chain graph which admits a chain such that all its lines belong to the first block. Thus both undirected and acyclic directed graphs are special cases of recursive causal graphs. The way of ascribing an independence model to a recursive causal graph is consistent with the way used in the case of classic chain graphs.

Lattice conditional independence models

Andersson and Perlman came up with the idea of describing probabilistic CI structures by finite lattices of subsets of $N$. Given a ring $\mathcal{R}$ of subsets of $N$, one says that a probability measure $P$ over $N$ satisfies the lattice conditional independence model (LCI model) induced by $\mathcal{R}$ if

$$\forall\, E, F \in \mathcal{R} \qquad E \setminus F \perp\!\!\!\perp F \setminus E \mid E \cap F \; [P].$$

However, it was found later in [ ] that LCI models coincide with DAG models induced by transitive acyclic directed graphs (in which $a \to b$ and $b \to c$ implies $a \to c$). Thus LCI models also fall within the scope of classic graphical models. Note that these models are advantageous from the point of view of learning. It was shown in [ ] that an explicit formula for the maximum likelihood estimate exists even in the case of a non-monotone pattern of missing data.
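The defining condition of an LCI model can be enumerated directly from the ring. The following minimal sketch assumes the ring is given as a collection of Python sets; the names `is_ring` and `lci_statements` and the example ring are hypothetical. Statements with an empty left or right component are trivially valid and are skipped.

```python
from itertools import combinations

def is_ring(family):
    """Check closedness of a family of sets under union and intersection."""
    family = {frozenset(s) for s in family}
    return all((E | F) in family and (E & F) in family
               for E, F in combinations(family, 2))

def lci_statements(family):
    """Non-trivial statements E\\F _||_ F\\E | E&F required by the LCI model.
    Each statement is a pair ({left, right}, conditioning set)."""
    family = {frozenset(s) for s in family}
    statements = set()
    for E, F in combinations(family, 2):
        left, right, cond = E - F, F - E, E & F
        if left and right:
            statements.add((frozenset({left, right}), cond))
    return statements

# Hypothetical ring of subsets of N = {a, b, c}
example_ring = [set(), {"a"}, {"b"}, {"a", "b"}, {"a", "b", "c"}]
```

For this example ring the only non-trivial requirement is that a and b are independent given the empty set, which is the CI structure of the transitive directed graph with arrows a → c and b → c.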

Bubble graphs

Shafer (in Section of [ ]) defined bubble graphs, which are not graphs in the standard sense mentioned in Section. A bubble graph over $N$ is specified by an ordered decomposition $B_1, \ldots, B_n$, $n \geq 1$, of $N$ into nonempty subsets called bubbles and by a collection of directed links which point to bubbles, although they originate from single nodes taken from the preceding bubbles. Every graph of this type describes a class of probability measures over $N$ which satisfy a certain factorization formula.

One can associate a chain graph with every bubble graph as follows. Join the nodes in each bubble by lines and replace any directed link from a node $a \in N$ to a bubble $B \subseteq N$ by the collection of arrows from $a$ to every node of $B$. Then one can show easily that a probability measure over $N$ satisfies the factorization formula corresponding to the bubble graph iff it factorizes with respect to the ascribed chain graph in the sense of Remark. In particular, every bubble graph can be interpreted as a classic chain graph. On the other hand, every DAG model can be described by a bubble graph.
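The construction of the chain graph ascribed to a bubble graph is simple enough to sketch. The encoding below is an assumption made for illustration only: bubbles are an ordered list of sets, a directed link is a pair (node, index of the target bubble), and the function name `bubble_to_chain_graph` is hypothetical.

```python
from itertools import combinations

def bubble_to_chain_graph(bubbles, links):
    """Return (lines, arrows) of the chain graph ascribed to a bubble graph.
    bubbles: ordered list of disjoint nonempty sets of nodes
    links:   set of pairs (a, j), a directed link from node a to bubble j"""
    # join all nodes within each bubble by lines
    lines = {frozenset(pair)
             for bubble in bubbles
             for pair in combinations(sorted(bubble), 2)}
    # replace every link a -> B_j by arrows from a to each node of B_j
    arrows = {(a, b) for a, j in links for b in bubbles[j]}
    return lines, arrows

# Hypothetical bubble graph: bubbles {a} and {b, c}, one link from a to the second bubble
example_bubbles = [{"a"}, {"b", "c"}]
example_links = {("a", 1)}
lines, arrows = bubble_to_chain_graph(example_bubbles, example_links)
```

Since links only originate from preceding bubbles, the ordered decomposition into bubbles serves as a chain for the resulting graph.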

Advanced graphical models

Various types of graphs have recently been proposed in the literature in order to describe probabilistic structures (possibly expressed in terms of structural equations for random variables). Some of these graphs can be viewed as tools for the description of CI structures, although this may not have been the original aim of the respective authors. This section gives an overview of these graphical approaches. Note that the majority of formal independence models ascribed to these graphs are semigraphoids satisfying the composition property.

General directed graphs

A natural way of generalization is to allow directed cycles. Spirtes, Glymour and Scheines (see Chapter in [ ]) mentioned the possible use of general directed graphs for the description of models allowing feedback. They proposed to use the d-separation criterion (see p.) to ascribe a formal independence model to a directed graph (even allowing multiple edges). It was shown in [ ] that even in the case of general directed graphs the d-separation criterion is equivalent to the moralization criterion and that the criteria are complete (in the sense of Remark) relative to the class of nondegenerate Gaussian measures. Richardson published a graphical characterization of Markov equivalent directed graphs. It is rather complex in comparison with the case of acyclic directed graphs (six independent conditions are involved).

Reciprocal graphs

Koster introduced a very general class of reciprocal graphs. A reciprocal graph $G$ over $N$ is a graph with mixed edges over $N$ (multiple edges are allowed) such that there is no arrow in $G$ between nodes belonging to the same connectivity component of $G$. Thus every classic chain graph is a reciprocal graph, and every general directed graph is a reciprocal graph as well. The moralization criterion for chain graphs (see p.) can be used to ascribe a formal independence model to every reciprocal graph. Note that in the case of directed graphs it reduces to the moralization criterion treated by Spirtes.

Thus consistency of reciprocal graphs (see p.) is ensured. The question of their faithfulness remains open, but the related question of the existence of a perfect class of measures (see Remark) was answered positively. Koster's aim was to apply these graphs to simultaneous equation systems (LISREL models). A certain reciprocal graph can be ascribed to every LISREL model so that the class of nondegenerate Gaussian measures satisfying the LISREL model is perfect with respect to the assigned reciprocal graph in the sense of Remark.

Joint-response chain graphs

Cox and Wermuth generalized the concept of chain graph by introducing two additional types of edges. A joint-response chain graph $G$ is a chain graph (in the sense of Section) in which, however, every arrow is either a solid arrow or a dashed arrow and every line is either a solid line or a dashed line. Thus four types of edges are allowed in a graph of this type. Moreover, two technical conditions are required for every connectivity component $C$ of a joint-response chain graph, namely

all lines within $C$ are of the same type (i.e. either solid or dashed),

all arrows directed to nodes of $C$ are of the same type.

The interpretation of these graphs (see Section) is rather in terms of what is known as the pairwise Markov property (see Remark). Namely, the absence of an edge between nodes $a$ and $b$ is interpreted as a CI statement $a \perp\!\!\!\perp b \mid C$, where the set $C \subseteq N \setminus ab$ depends on the type of the absent edge. Note that the technical conditions above allow one to deduce implicitly what the type of the absent edge is.

The resulting interpretation of joint-response chain graphs with solid lines and arrows only is then in concordance with the original interpretation of chain graphs (see Section), so that they generalize classic chain graphs. An analogue of the global Markov property was established in two other special cases (see Sections and).

Remark. Following an analogy with the development of classic graphical models (see Remark), observe that in order to determine the strongest possible Markov condition on the basis of a pairwise Markov condition one needs to know the respective class of probability measures. This class of measures was traditionally closely connected with the considered class of graphs. It was the class of positive measures in the case of CG models and UG models (which are called concentration graphs by Cox and Wermuth), the class of Gaussian measures in the case of covariance graphs (see Section), and the class of all probability measures in the case of acyclic directed graphs (see Remark). Since Cox and Wermuth did not specify the class of measures which should correspond to general joint-response chain graphs, one cannot automatically derive the respective global Markov condition. I can only speculate that they probably have in mind the class of nondegenerate Gaussian measures. In particular, a global Markov condition for general joint-response chain graphs has not been established so far (see Section of [ ]), and the question of consistency (see Section) remains to be solved.

Thus the other theoretical questions mentioned in Section do not make sense for joint-response chain graphs until consistency is established for them.

Covariance graphs

However, consistency was ensured in the special case of undirected graphs made of dashed lines, named covariance graphs. Kauermann formulated a global Markov property for covariance graphs which is equivalent to the above mentioned condition of Cox and Wermuth for every probability measure whose induced independence model satisfies the composition property. A triplet $\langle A, B \mid C \rangle \in \mathcal{T}(N)$ is represented in a covariance graph $G$ if $N \setminus ABC$ separates between $A$ and $B$. Thus every covariance graph induces a graphoid satisfying the composition property. Kauermann also showed that the class of Gaussian measures is perfect (in the sense of Remark) with respect to every covariance graph. In particular, his criterion is the strongest possible one for the considered class of measures.
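The separation criterion for covariance graphs differs from the one for ordinary undirected graphs only in the choice of the separating set. A minimal sketch follows, assuming the graph is given by its node set and a set of unordered pairs; the function name `represented_in_covariance_graph` and the example graph are hypothetical.

```python
def represented_in_covariance_graph(A, B, C, nodes, lines):
    """True iff N \\ ABC separates A and B, i.e. no path between A and B
    runs entirely inside the union of A, B and C."""
    A, B, C = set(A), set(B), set(C)
    allowed = A | B | C                  # nodes outside the separating set
    adj = {v: set() for v in nodes}
    for u, v in lines:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(A), list(A)
    while stack:
        u = stack.pop()
        for w in (adj[u] & allowed) - seen:
            seen.add(w)
            stack.append(w)
    return not (seen & B)

# Hypothetical covariance graph: dashed path a - c - b
cov_nodes = {"a", "b", "c"}
cov_lines = {("a", "c"), ("c", "b")}
```

In this example a and b are marginally independent (the node c belongs to the separating set), while conditioning on c activates the path, which is the opposite of the behaviour of the same path read as an ordinary undirected graph.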

Alternative chain graphs

Another class of joint-response chain graphs for which the global Markov property was established are chain graphs with solid lines and dashed arrows only. Led by a specific way of parametrization of nondegenerate Gaussian measures, Andersson, Madigan and Perlman introduced the alternative Markov property (AMP) for chain graphs. Their alternative chain graphs are chain graphs in the sense of Section, but their interpretation is different from the interpretation of classic chain graphs (see Section), so that they correspond to the above mentioned joint-response chain graphs (see § for details).

The corresponding augmentation criterion is analogous to the moralization criterion for classic chain graphs, but it is more complex. Testing whether a triplet $\langle A, B \mid C \rangle \in \mathcal{T}(N)$ is represented in an alternative chain graph $G$ over $N$ consists of three steps. The first step is a specific restriction of $G$ to an extended graph over a set $T \subseteq N$ involving $ABC$ (an analogue of the induced graph $G_T$ in the moralization criterion). The second step is the transformation of the extended graph into an undirected augmented graph. This is done by adding some edges and taking the underlying graph (an analogue of the moralization procedure). The third step is testing whether $C$ separates between $A$ and $B$ in the augmented graph.

As in the case of classic chain graphs, an equivalent p-separation criterion (p stands for 'path') was introduced. The main result of [ ] is the existence of a perfectly Markovian nondegenerate Gaussian measure for every alternative chain graph. Thus the faithfulness of these models is ensured. Moreover, Markov equivalent alternative chain graphs were characterized in graphical terms as well. Every class of Markov equivalence can be represented by the respective essential graph (for details see §).

Annotated graphs

Paz proposed in [ ] a special fast implementation (modification) of the moralization criterion for acyclic directed graphs. In the preparatory stage of that procedure the original directed graph $G$ is changed into its moral graph, and every immorality $a \to c \leftarrow b$ in $G$ is recorded by annotating the edge $a - b$ of the moral graph by the set $C$ of all descendants of $c$ in $G$. Thus the original graph over $N$ is changed into an annotated graph over $N$, that is, an undirected graph supplemented by a collection of elements $\{a, b\} \mid C$, where $a, b \in N$, $a \neq b$ and $C \subseteq N \setminus ab$, which represent annotated edges. Testing whether a triplet $\langle A, B \mid C \rangle \in \mathcal{T}(N)$ is represented in $G$ then consists in the application of a special membership algorithm for annotated graphs. This algorithm consists in successive restriction of the graph, removal of the respective annotated edges, and a final check whether $C$ separates between $A$ and $B$ in the resulting graph. The whole procedure is equivalent to the moralization algorithm.

The point is that this approach has much wider applicability. In [ ] the class of regular annotated graphs was introduced together with the corresponding general membership algorithm. The formal independence model induced in this way by a regular annotated graph was shown to be a graphoid. A regular annotated graph can serve as a condensed record for the least graphoid containing the unions of UG models (their graphoid closure).

Given a sequence of undirected graphs G N L i k k such that

i i i

N N and L L for i k a sp ecic annotation algorithm describ ed

i i i i

in [ ] allows one to construct a regular annotated graph over $N \equiv N_k$ such that the independence model induced by it is just the graphoid closure of all UG models induced by $G_i$, $i = 1, \ldots, k$. Since every classic CG model can be obtained in this way, regular annotated graphs generalize classic graphical models.

Hidden variables

Example shows that the restriction of a DAG model need not be a DAG model. This perhaps led to the idea of describing restrictions of DAG models by means of graphical diagrams. These models are usually named models with hidden variables since, besides the observed variables in $N$, one anticipates other (unobserved) hidden variables $K$ and a DAG model over $NK$.

Geiger, Paz and Pearl introduced the concept of an embedded Bayesian network. It is a graph over the observed variables $N$ allowing both directed and bidirected edges (without multiple edges) such that purely directed cycles, that is, directed cycles made exclusively of arrows, are not present in the graph. A generalized d-separation criterion was used to ascribe a formal independence model over $N$ to a graph of this type. It is mentioned in [ ] that one can always find a DAG model over a set $M \supseteq N$ whose restriction to $N$ is the ascribed independence model. Moreover, according to Pearl's oral communication, Verma showed that the restriction of every DAG model can be described in this way. Note that faithfulness of embedded Bayesian networks is an easy consequence of the faithfulness of DAG models and the above mentioned claims.

However, there are other graphical methods for the description of models with hidden variables, for example summary graphs (from §) or the ancestral graphs mentioned below.

Ancestral graphs

Motivated by the need to describe classes of Markov equivalent general directed graphs, Richardson proposed to use special graphical objects called partial ancestral graphs (PAGs) for this purpose. PAGs are graphs whose edges have possible endings for both end-nodes and where the endings of different edges near a common end-node may be connected by two possible connections. Every mark of this type in a PAG expresses a certain graphical property shared by all graphs within the Markov equivalence class, for example that a node is not an ancestor of another node in all equivalent graphs.

The idea of graphical representation of common features of classes of Markov equivalent graphs was later substantially simplified. In a recent paper Richardson and Spirtes introduced ancestral graphs. These graphs admit three types of edges, namely lines, arrows and bidirected edges (e.g. $a \leftrightarrow b$), and satisfy some additional requirements. These requirements imply that multiple edges and loops are not present in ancestral graphs. A formal independence model over $N$ is ascribed to an ancestral graph over $N$ by means of the m-separation criterion, which generalizes the d-separation criterion for acyclic directed graphs.

An additional standardization of ancestral graphs is suitable. A maximal ancestral graph (MAG) is an ancestral graph $G$ such that $a$ and $b$ are joined by an edge in $G$ iff $\neg\{\, a \perp\!\!\!\perp b \mid C \; [G] \,\}$ for every $C \subseteq N \setminus ab$. MAGs exhibit some elegant mathematical properties. One can define graphical operations of marginalizing and conditioning of MAGs which correspond to the respective operations with induced formal independence models (cf. Section). Edges of a MAG $G$ correspond to single real parameters in a certain parametrization of the class of nondegenerate Gaussian measures which are Markovian with respect to $G$. Moreover, there exists a perfectly Markovian Gaussian measure with respect to $G$. Thus the question of faithfulness (see Section) has a positive solution in this framework. Note that MAG models involve both UG models and DAG models and coincide with the class of models induced by summary graphs (see § in [ ]).

MC graphs

Koster introduced a certain class of graphs which admit the same three types of edges as ancestral graphs. However, in these graphs, called MC graphs, multiple edges and some loops are allowed. The abbreviation MC means that graphical operations of marginalizing and conditioning can be applied to these graphs, as in the case of MAGs. However, unlike m-separation, the respective separation criterion for MC graphs requires blocking of all routes, as in the c-separation criterion for classic chain graphs (cf. Remark). As mentioned in § of [ ], the separation criterion for MC graphs generalizes the m-separation criterion. Thus the class of formal independence models induced by MC graphs involves MAG models. On the other hand, although MC graphs include chain graphs, the respective separation criterion in the case of chain graphs differs both from the c-separation criterion and from the p-separation criterion.

Incompleteness of graphical approaches

Let me raise the question of how many probabilistic CI models can be described by graphs (cf. the question of completeness in Section, p.). Expressiveness of graphical methods

varies For example in case jN j one has UG mo dels and DAG mo dels CG

mo dels But in case jN j one has UG mo dels DAG mo dels and CG mo dels

while in case jN j there exist UG mo dels DAG mo dels and CG

mo dels However this is not enough for description of CI structures induced by

discrete probability measures Well in case jN j one has discrete CI mo dels but in

case $|N|$ already CI models. So there is a tremendous gap between the number of classic graphical models and the number of discrete probabilistic CI structures in case $|N|$, and this gap increases with $|N|$. In particular, classic graphical models cannot describe all CI structures.

The reader may object that a sufficiently wide class of graphs could possibly cure the problem. Let me give an argument against it. Having fixed a class of graphs over $N$ in which only finitely many types of edges are allowed, the number of these graphs is bounded by the cardinality of the power set of the set of possible edges, and the set of possible edges grows only polynomially with $n = |N|$. On the contrary, as shown in Consequence on p., the number of discrete probabilistic CI structures grows with $n$ at least as rapidly as the power set of the power of $n$.

Thus, in my opinion, one can hardly achieve completeness of a graphical approach (see p.) relative to the class of discrete measures, and this may result in serious methodological errors (see Section, p.). Perhaps one can think about a class of advanced complex graphs which allow exponentially many hyperedges (e.g. annotated graphs) and which has a chance to achieve completeness. But complex graphs of this sort lose their easy interpretability for humans.

The conclusion above is the reason for an attempt to develop a non-graphical approach to the description of probabilistic CI structures. The approach described in subsequent chapters achieves completeness in the discrete framework and has a minor chance of being acceptable to humans. On the other hand, the mathematical objects which are used describe more than necessary, in the sense that some induced formal independence models are not probabilistic CI models. The loss of faithfulness is a natural price for the possibility of interpretation and a relatively good solution of the equivalence question. Nevertheless, I consider these two gains more valuable than faithfulness.

Chapter

Structural imsets fundamentals

The moral of the preceding chapter is that the main drawback of graphical models is their inability to describe all probabilistic conditional independence structures. This motivated an attempt to develop an alternative method of their description which overcomes this drawback and keeps some assets of graphical methods. The central notion of this method is the concept of structural imset introduced in this chapter. Note that the basic ideas of the theory were presented earlier, but details later recognized as superfluous made the message of the original series of papers harder to understand. This work brings, in the next four chapters, a much simpler presentation supplemented by facts and perspectives revealed later.

Basic class of distributions

The class of probability measures for which this approach is applicable, that is, whose induced conditional independence models can be described by structural imsets, is relatively wide. It is the class of measures over $N$ with finite multiinformation mentioned in Section. The aim of this section is to show that this class involves three basic classes of measures used in practice in artificial intelligence and multivariate statistics.

Discrete measures

These simple probability measures (see Remark, p.) are mainly used in probabilistic reasoning, which is an area of artificial intelligence. Positive discrete probability measures are behind the models used in the analysis of contingency tables (see Chapter), which is an area of statistics. The fact that every discrete probability measure over $N$ has finite multiinformation is trivial.

Nondegenerate Gaussian measures

These measures (see Section for their basic properties) are widely used in mathematical statistics, in particular in multivariate statistics. Consequence says that every nondegenerate Gaussian measure over $N$ has finite multiinformation.

Nondegenerate conditional Gaussian measures

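This class of measures was proposed by Lauritzen and Wermuth with the aim of unifying discrete and continuous graphical models. A nondegenerate conditional Gaussian measure $P$ over $N$ (in the sequel shortly called a CG-measure over $N$) will be specified as follows. The set $N$ is partitioned into the set $\Delta$ of discrete variables and the set $\Gamma$ of continuous variables. For every $i \in \Delta$, $X_i$ is a finite nonempty set and $\mathcal{X}_i = \mathcal{P}(X_i)$. For every $i \in \Gamma$, $X_i = \mathbb{R}$ and $\mathcal{X}_i$ is the class of Borel sets in $\mathbb{R}$. A discrete probability measure $P_\Delta$ on $(X_\Delta, \mathcal{X}_\Delta)$ is given, and a vector $e(x) \in \mathbb{R}^{\Gamma}$ and a positive definite matrix $\Sigma(x) \in \mathbb{R}^{\Gamma \times \Gamma}$ are ascribed to every $x \in X_\Delta$ with $P_\Delta(x) > 0$ (if $\Gamma \neq \emptyset$). Then $P$ is simply specified by its marginal for $\Delta$ and the conditional probability on $X_\Gamma$ given $\Delta$:

$$P^{\Delta} \equiv P_\Delta, \qquad P_{\Gamma \mid \Delta}(\cdot \mid x) = N(e(x), \Sigma(x)) \quad \text{for every } x \in X_\Delta \text{ with } P_\Delta(x) > 0.$$

Of course, these requirements determine a unique probability measure on $(X_N, \mathcal{X}_N)$. The above definition collapses in case $\Gamma = \emptyset$ to a discrete measure over $N$ and in case $\Delta = \emptyset$ to a nondegenerate Gaussian measure over $N$.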

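Remark. Note that positive CG-measures (when $P_\Delta(x) > 0$ for every $x \in X_\Delta$) are mainly used in practice. A CG-measure of this type can be defined directly (see §) by its density $f$ with respect to the product of the counting measure on $X_\Delta$ and the Lebesgue measure on $X_\Gamma$:

$$f(x, y) = \exp\Bigl\{ g(x) + h(x)^{\top} y - \tfrac{1}{2}\, y^{\top} K(x)\, y \Bigr\} \quad \text{for } x \in X_\Delta,\ y \in X_\Gamma,$$

where $g(x) \in \mathbb{R}$, $h(x) \in \mathbb{R}^{\Gamma}$ and the positive definite matrices $K(x) \in \mathbb{R}^{\Gamma \times \Gamma}$ are named canonical characteristics of $P$. One can compute them directly from the parameters $P_\Delta(x)$, $e(x)$, $\Sigma(x)$, which are named moment characteristics of the CG-distribution, as follows (see p.):

$$K(x) = \Sigma(x)^{-1}, \qquad h(x) = \Sigma(x)^{-1} e(x),$$

$$g(x) = \ln P_\Delta(x) - \frac{|\Gamma|}{2} \ln (2\pi) - \frac{1}{2} \ln \det \Sigma(x) - \frac{1}{2}\, e(x)^{\top} \Sigma(x)^{-1} e(x).$$

These measures are positive in the sense of Section, but they do not involve all discrete measures. Therefore the class of CG-measures was slightly enlarged in this work.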
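The passage from moment to canonical characteristics can be checked numerically. The sketch below is an illustration under simplifying assumptions that are not in the text: there is a single continuous variable, so that the covariance matrix reduces to a variance `var` and the determinant to that number, and the names `canonical`, `density_canonical` and `density_moment` are hypothetical.

```python
from math import exp, log, pi, sqrt, isclose

def canonical(p, e, var):
    """Canonical characteristics (g, h, K) from moment characteristics
    (p, e, var) for one discrete configuration and one continuous variable."""
    K = 1.0 / var                                   # inverse of the variance
    h = K * e
    g = log(p) - 0.5 * log(2 * pi) - 0.5 * log(var) - 0.5 * e * K * e
    return g, h, K

def density_canonical(g, h, K, y):
    """Density in canonical form: exp(g + h*y - y*K*y/2)."""
    return exp(g + h * y - 0.5 * K * y * y)

def density_moment(p, e, var, y):
    """Density in moment form: p times the N(e, var) density at y."""
    return p * exp(-(y - e) ** 2 / (2 * var)) / sqrt(2 * pi * var)

# Hypothetical moment characteristics for one value x of the discrete part
p_x, e_x, var_x = 0.25, 1.5, 2.0
g_x, h_x, K_x = canonical(p_x, e_x, var_x)
```

Both forms of the density agree at every point y, which confirms that the constant g(x) absorbs the weight of the discrete configuration together with the normalizing constant of the Gaussian density.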

To show that every CG-measure has finite multiinformation (and thus is marginally continuous), I use auxiliary estimates with relative entropies modified in a certain way.

Supposing $(X, \mathcal{X})$ is a measurable space, $P$ and $Q$ are probability measures and $\mu$ is a $\sigma$-finite measure on $(X, \mathcal{X})$ such that $P, Q \ll \mu$, by the $Q$-perturbed relative entropy of $P$ with respect to $\mu$ will be understood the integral

$$H(P \mid \mu : Q) = \int_X \ln \frac{dP}{d\mu}(x) \, dQ(x) = \int_X \frac{dQ}{d\mu}(x) \cdot \ln \frac{dP}{d\mu}(x) \, d\mu(x),$$

provided that the function $\ln \frac{dP}{d\mu}$ is $Q$-quasi-integrable. Of course, the value does not depend on the choice of versions of the Radon-Nikodym derivatives $\frac{dP}{d\mu}$ or $\frac{dQ}{d\mu}$. In case $Q = P$ it coincides with $H(P \mid \mu)$ mentioned in Section. Note that a discrete version of this concept is known in information theory as Kerridge's inaccuracy (p.).
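For finite sample spaces the perturbed relative entropy is a finite sum, which the following minimal sketch computes. The encoding of measures as dictionaries and the function name `perturbed_relative_entropy` are assumptions made for this illustration; the sketch also assumes that P is positive wherever Q is positive, so that the sum is finite.

```python
from math import log, isclose

def perturbed_relative_entropy(p, q, mu):
    """Discrete analogue of the Q-perturbed relative entropy of P with
    respect to mu: the sum over x of q(x) * ln(p(x) / mu(x))."""
    return sum(qx * log(p[x] / mu[x]) for x, qx in q.items() if qx > 0)

# Hypothetical example on a two-point space with the counting measure
counting = {0: 1.0, 1: 1.0}
p = {0: 0.5, 1: 0.5}
q = {0: 0.25, 1: 0.75}

h_pq = perturbed_relative_entropy(p, q, counting)   # perturbed by Q
h_pp = perturbed_relative_entropy(p, p, counting)   # ordinary relative entropy
```

With the counting measure as the dominating measure, the negative of `h_pq` is the discrete quantity known as Kerridge's inaccuracy, and for Q = P the value reduces to the relative entropy of P with respect to the counting measure (the negative Shannon entropy).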

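Lemma. Let $(\mathsf{X}, \mathcal{X})$ be a measurable space and $\mu$ a $\sigma$-finite measure on $(\mathsf{X}, \mathcal{X})$. Suppose that $P_1, \ldots, P_r$, $r \geq 1$, is a finite collection of probability measures on $(\mathsf{X}, \mathcal{X})$ such that $-\infty < H(P_k \mid \mu : P_l) < \infty$ for every $k, l \in \{1, \ldots, r\}$. Then every convex combination of $P_1, \ldots, P_r$ has finite relative entropy with respect to $\mu$, that is,

$$-\infty < H\Big(\sum_{k=1}^{r} \alpha_k \cdot P_k \,\Big|\, \mu\Big) < \infty \quad \text{whenever } \alpha_k \geq 0, \ \sum_{k=1}^{r} \alpha_k = 1.$$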

P

r

dP

k

for every k and x the version Pro of Put P P choose and x a version of

k k

k

d

P

r

dP

dP

l

of The assumption says

l

l

d d

Z

dP dP

k l

x j ln x j dx k l f r g

d d

X

One has to show that

Z Z Z

dP dP dP

j ln x j dP x ln x dP x ln x dP x

d d d

X X X

To estimate the rst term ab ove use RadonNikodym theorem the observation that the function

y y ln y is convex and the inequality y jy j

Z Z

dP dP dP

x dP x x ln x dx ln

d d d

X X

Z Z

r r

X X

dP dP dP dP

k k k k

x ln x dx x j ln xj dx

k k

d d d d

k k

X X

To estimate the second use the fact that the function y ln y is convex the inequality

dP

y jy j RadonNikodym theorem and the form of

d

Z Z Z

r r

X X

dP dP dP

k k

ln x dP x x dP x x j dP x ln j ln

k k

d d d

k k

X X X

Z Z

r r r

X X X

dP dP dP dP

k k l l

x j ln xj dx x j ln xj dx

k l k l

d d d d

k l k l

X X

Q

Lemma Let P b e a CGmeasure over N and where for

i i

iN

i and for i Then

i

fig

H P j and H P j for every i N

i

Pro of A direct formula for H P j is easy to derive Indeed write

dP

x y P x f y for x X with P x and y X

exx

d

apply logarithm integrate it with resp ect to P and obtain using standard prop erties of

integral

Z Z

H P j ln P x dP x y ln f y dP x y

exx

X X

N N

X

H P j P x H P jxj

xX P x

 

fig

The fact H P j for i is trivial For xed i rst realize that

i

fig

P is again a CGmeasure where

P jx N ex x for i X with P x

i ii

figj

fig

Therefore the marginal P is nothing but a convex combination of nondegenerate

fig

Gaussian measures To verify H P j one can use Lemma Indeed

supp ose P N e and P N f where e f R are the corresp onding

k l

parameters Because exp ectation and variance of P are known one can compute easily

l

Z Z

2

x e

2

ln dP x ln x e dP x H P j P

l l k l

R R

Z

2 2 2

ln x f f e x e f dP x

l

R

2

e f

2 2

ln f e f e f ln

The result is evidently a nite number

Consequence. Every CG-measure over $N$ has finite multiinformation.

Proof. Owing to Lemma, the assumptions of Lemma on p for $S = N$ are fulfilled.

The fact above was verified by finding finite lower and upper estimates for multiinformation. The question whether there exists a suitable exact formula for the values of the multiinformation function in terms of the parameters of a CG-measure remains open (see Theme in Chapter).

Remark. The class of CG-measures is not closed under marginalizing, which may lead to problems when one tries to study CI within this context. However, it was shown that this class can be embedded into a wider class of measures with finite multiinformation, which is already closed under marginalizing (see Consequence).

Classes of structural imsets

Denitions and elementary facts concerning structural imsets are gathered in this section

Figure: Elementary imsets over $N = \{a, b, c\}$.

Elementary imsets

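An elementary imset over a set of variables $N$ with $|N| \geq 2$ is an imset of a specific form. Given an elementary triplet $\langle i,j|K\rangle$, where $K \subseteq N$ and $i, j \in N \setminus K$ are distinct ($i \neq j$), the corresponding elementary imset $u_{\langle i,j|K\rangle}$ over $N$ is defined by the formula

$$u_{\langle i,j|K\rangle} = \delta_{\{i,j\}\cup K} + \delta_{K} - \delta_{\{i\}\cup K} - \delta_{\{j\}\cup K}.$$

The class of elementary imsets over $N$ will be denoted by $\mathcal{E}(N)$. In case $|N| \leq 1$ the class $\mathcal{E}(N)$ is empty by convention. By the level of an elementary imset $u_{\langle i,j|K\rangle}$ is understood the number $|K|$. For every $l = 0, \ldots, |N|-2$ the class of elementary imsets of level $l$ will be denoted by $\mathcal{E}_l(N)$. Supposing $|N| = n \geq 2$ it is easy to see that

$$|\mathcal{E}_l(N)| = \binom{n}{2} \cdot \binom{n-2}{l} \quad \text{and} \quad |\mathcal{E}(N)| = \binom{n}{2} \cdot 2^{\,n-2}.$$

Thus, in case $N = \{a, b, c\}$ one has 6 elementary imsets of 2 possible levels. They are shown in Figure.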
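The counting above can be checked by brute-force enumeration. The following is a minimal sketch, assuming an imset is represented as a dictionary from `frozenset` to its nonzero integer value; the function name `elementary_imsets` is chosen here for illustration only.

```python
from itertools import combinations
from math import comb

def elementary_imsets(variables):
    """Yield (i, j, K, imset) for every elementary triplet <i,j|K> over `variables`.
    The imset is a dict frozenset -> integer with the four nonzero values."""
    variables = sorted(variables)
    for i, j in combinations(variables, 2):
        rest = [x for x in variables if x not in (i, j)]
        for size in range(len(rest) + 1):
            for K in combinations(rest, size):
                K = frozenset(K)
                imset = {K | {i, j}: 1, K: 1, K | {i}: -1, K | {j}: -1}
                yield i, j, K, imset

N = "abc"
n = len(N)
by_level = {}
for i, j, K, u in elementary_imsets(N):
    by_level[len(K)] = by_level.get(len(K), 0) + 1

# Compare with the closed formulas: C(n,2)*C(n-2,l) per level, C(n,2)*2^(n-2) in total.
formula_by_level = {l: comb(n, 2) * comb(n - 2, l) for l in range(n - 1)}
total = sum(by_level.values())
```

Symmetric triplets $\langle i,j|K\rangle$ and $\langle j,i|K\rangle$ give the same imset, which is why unordered pairs are enumerated.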

The following observation is a basis of later results

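Observation. Supposing $n = |N| \geq 2$ and $l \in \{0, \ldots, n-2\}$, let us introduce a multiset $m_l$ over $N$ by means of the formula

$$m_l(S) = \max\{\, 0,\ |S| - l - 1 \,\} \quad \text{for every } S \subseteq N,$$

and a multiset $m_*$ over $N$ by means of the formula

$$m_*(S) = \tfrac{1}{2}\,|S| \cdot (|S| - 1) \quad \text{for every } S \subseteq N.$$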

Figure: Two combinatorial imsets over $N = \{a, b, c\}$.

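Then one can observe the following facts:

(a) $\forall\, u \in \mathcal{E}_l(N)$: $\langle m_l, u\rangle = 1$,

(b) $\forall\, u \in \mathcal{E}(N) \setminus \mathcal{E}_l(N)$: $\langle m_l, u\rangle = 0$,

(c) $\forall\, u \in \mathcal{E}(N)$: $\langle m_*, u\rangle = 1$.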

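Proof. The first two facts are easy to evidence; the third fact follows from the identity $m_* = \sum_{l=0}^{n-2} m_l$ and the previous facts. Indeed, this identity can be verified for $S \subseteq N$, $|S| \geq 2$, as follows:

$$\sum_{l=0}^{n-2} m_l(S) = \sum_{l=0}^{|S|-2} m_l(S) = \sum_{l=0}^{|S|-2} (|S| - l - 1) = \sum_{l=0}^{|S|-2} (|S| - 1) - \sum_{l=0}^{|S|-2} l = (|S|-1)^2 - \tfrac{1}{2}\,(|S|-1)(|S|-2) = \tfrac{1}{2}\,|S|\,(|S|-1) = m_*(S).$$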
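The three facts of the observation can be verified numerically for a small set of variables. The sketch below assumes the dictionary representation of imsets and the formulas $m_l(S) = \max\{0, |S|-l-1\}$ and $m_*(S) = \frac{1}{2}|S|(|S|-1)$; all function names are illustrative.

```python
from itertools import combinations

def elementary_imsets(variables):
    """Yield (K, imset) for every elementary imset u_<i,j|K> over `variables`."""
    variables = sorted(variables)
    for i, j in combinations(variables, 2):
        rest = [x for x in variables if x not in (i, j)]
        for size in range(len(rest) + 1):
            for K in combinations(rest, size):
                K = frozenset(K)
                yield K, {K | {i, j}: 1, K: 1, K | {i}: -1, K | {j}: -1}

def m_level(l):
    """The multiset m_l(S) = max{0, |S| - l - 1}."""
    return lambda S: max(0, len(S) - l - 1)

def m_star(S):
    """The multiset m_*(S) = |S| (|S| - 1) / 2."""
    return len(S) * (len(S) - 1) // 2

def inner(m, u):
    """Scalar product <m, u> = sum over S of m(S) * u(S)."""
    return sum(m(S) * c for S, c in u.items())

N = "abcd"
n = len(N)
checks = []
for K, u in elementary_imsets(N):
    for l in range(n - 1):                       # l = 0, ..., n-2
        expected = 1 if len(K) == l else 0       # facts (a) and (b)
        checks.append(inner(m_level(l), u) == expected)
    checks.append(inner(m_star, u) == 1)         # fact (c)
all_ok = all(checks)
```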

Semielementary and combinatorial imsets

Given $\langle A,B|C\rangle \in \mathcal{T}(N)$ the corresponding semi-elementary imset $u_{\langle A,B|C\rangle}$ is defined by the formula

$$u_{\langle A,B|C\rangle} = \delta_{ABC} + \delta_{C} - \delta_{AC} - \delta_{BC}.$$

Evidently, the zero imset is semi-elementary, as $u_{\langle A,\emptyset|C\rangle} = 0$ for any $\langle A,\emptyset|C\rangle \in \mathcal{T}(N)$. Every elementary imset is semi-elementary as well. An example of a nonzero semi-elementary imset which is not elementary is the imset $u_{\langle a,bc|\emptyset\rangle}$ shown in the left-hand picture of Figure. Provided one accepts the convention that the zero imset is a combination of the empty set of imsets, one can observe the following fact.

Observation. Every semi-elementary imset is a combination of elementary imsets with nonnegative integral coefficients.

Proof. A nonzero semi-elementary imset has the form $u_{\langle A,B|C\rangle}$ where $\langle A,B|C\rangle \in \mathcal{T}(N) \setminus \mathcal{T}_{\emptyset}(N)$. The formulas $u_{\langle A,BD|C\rangle} = u_{\langle A,B|DC\rangle} + u_{\langle A,D|C\rangle}$ and $u_{\langle AD,B|C\rangle} = u_{\langle A,B|DC\rangle} + u_{\langle D,B|C\rangle}$ can be applied repeatedly.
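The repeated application of the two formulas from the proof can be written as a short recursion. This is a sketch only: the splitting order (always removing the smallest element first) is an arbitrary choice made here, and the names `semi_elementary`, `decompose` and `add_up` are hypothetical.

```python
from collections import Counter

def semi_elementary(A, B, C):
    """u_<A,B|C> = delta_ABC + delta_C - delta_AC - delta_BC as a dict frozenset -> int."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    u = Counter()
    for S, c in ((A | B | C, 1), (C, 1), (A | C, -1), (B | C, -1)):
        u[S] += c
    return {S: c for S, c in u.items() if c != 0}

def decompose(A, B, C):
    """List of elementary triplets (i, j, K) whose imsets sum to u_<A,B|C>."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    if not A or not B:
        return []                                  # trivial triplet: zero imset
    if len(B) > 1:                                 # u_<A,BD|C> = u_<A,B|DC> + u_<A,D|C>
        d = min(B)
        return decompose(A, B - {d}, C | {d}) + decompose(A, {d}, C)
    if len(A) > 1:                                 # u_<AD,B|C> = u_<A,B|DC> + u_<D,B|C>
        d = min(A)
        return decompose(A - {d}, B, C | {d}) + decompose({d}, B, C)
    (i,), (j,) = A, B
    return [(i, j, C)]

def add_up(triplets):
    """Sum of the elementary imsets given by `triplets`."""
    total = Counter()
    for i, j, K in triplets:
        for S, c in semi_elementary({i}, {j}, K).items():
            total[S] += c
    return {S: c for S, c in total.items() if c != 0}

parts = decompose("a", "bc", "")          # the imset u_<a,bc|0>
bigger = decompose("ab", "cd", "e")
```

For $\langle a, bc | \emptyset\rangle$ this particular splitting order yields the triplets $\langle a,c|b\rangle$ and $\langle a,b|\emptyset\rangle$; a different order would give another valid decomposition.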

By a combinatorial imset over $N$ will be understood every imset $u$ which is a combination of elementary imsets with nonnegative integral coefficients, that is,

$$u = \sum_{v \in \mathcal{E}(N)} k_v \cdot v \quad \text{where } k_v \in \mathbb{Z}^{+}.$$

The class of combinatorial imsets over $N$ will be denoted by $\mathcal{C}(N)$. By Observation

every semielementary imset is a combinatorial imset The converse is not true the

imset u u in the righthand picture of Figure is not semielementary

habjci hacji

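Clearly, every combination of combinatorial imsets with coefficients from $\mathbb{Z}^{+}$ is again a combinatorial imset. In particular, combinatorial imsets can be equivalently introduced as combinations of semi-elementary imsets with nonnegative integral coefficients.

Of course, a particular combinatorial imset can sometimes be expressed in several different ways. For example, the imset $u_{\langle a,bc|\emptyset\rangle}$ from the left-hand picture of Figure can be written either as $u_{\langle a,b|c\rangle} + u_{\langle a,c|\emptyset\rangle}$ or as $u_{\langle a,c|b\rangle} + u_{\langle a,b|\emptyset\rangle}$. On the other hand, there are characteristics which do not depend on a particular way of combination. Supposing $u = \sum_{v \in \mathcal{E}(N)} k_v \cdot v$ with $k_v \in \mathbb{Z}^{+}$, one can introduce the degree of a combinatorial imset $u$, denoted by $\deg(u)$, as follows:

$$\deg(u) = \sum_{v \in \mathcal{E}(N)} k_v.$$

Similarly, if $|N| \geq 2$, then introduce the level-degree of $u$ for every $l = 0, \ldots, |N|-2$, denoted by $\deg(u, l)$, as the number

$$\deg(u, l) = \sum_{v \in \mathcal{E}_l(N)} k_v.$$

The following observation implies that these numbers do not depend on the choice of coefficients $k_v$ for $v \in \mathcal{E}(N)$.

Observation. Supposing $u \in \mathcal{C}(N)$ and $l \in \{0, \ldots, |N|-2\}$ with $|N| \geq 2$, one has

$$\deg(u, l) = \langle m_l, u\rangle, \qquad \deg(u) = \langle m_*, u\rangle,$$

where the multisets $m_l$, $m_*$ are introduced in Observation on p.

Proof. Substitute the combination $u = \sum_{v \in \mathcal{E}(N)} k_v \cdot v$ in $\langle m_l, u\rangle$ and $\langle m_*, u\rangle$ and use Observation.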

Structural imsets

An imset $u$ over $N$ will be called structural if there exists $n \in \mathbb{N}$ such that the multiple $n \cdot u$ is a combinatorial imset, that is,

$$n \cdot u = \sum_{v \in \mathcal{E}(N)} k_v \cdot v \quad \text{for some } n \in \mathbb{N},\ k_v \in \mathbb{Z}^{+}.$$

In other words, an imset is structural if it is a combination of elementary imsets (respectively semi-elementary imsets) with nonnegative rational coefficients. The class of structural imsets over $N$ will be denoted by $\mathcal{S}(N)$. By definition, every combinatorial

imset is structural In case jN j the converse is true However the question

whether this is true in general remains op en see Question on p

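Observation. Every structural imset $u$ over $N$ is o-standardized, $\langle m^{A\uparrow}, u\rangle \geq 0$ and $\langle m^{A\downarrow}, u\rangle \geq 0$ for every $A \subseteq N$ (see pp). The only imset $w \in \mathcal{S}(N)$ with $-w \in \mathcal{S}(N)$ is the zero imset $w = 0$.

Proof. All three properties hold for zero and elementary imsets and can be extended to combinatorial imsets and then to structural imsets. Given $w \in \mathbb{Z}^{\mathcal{P}(N)}$ with $\langle m^{A\downarrow}, w\rangle = 0$ for every $A \subseteq N$, the condition $w(S) = 0$ for $S \subseteq N$ can be verified by induction on $|S|$.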

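Given a structural imset $u$, let us introduce the lower class of $u$, denoted by $\mathcal{L}_u$, as the descending class induced by the negative domain of $u$, that is,

$$\mathcal{L}_u = \{\, T \subseteq N;\ \exists\, S \subseteq N \text{ such that } T \subseteq S \text{ and } u(S) < 0 \,\} = (\mathcal{D}^{-}_u)^{\downarrow}.$$

Similarly, one can introduce the upper class of $u$, denoted by $\mathcal{U}_u$, as the descending class induced by the positive domain of $u$:

$$\mathcal{U}_u = \{\, T \subseteq N;\ \exists\, S \subseteq N \text{ such that } T \subseteq S \text{ and } u(S) > 0 \,\} = (\mathcal{D}^{+}_u)^{\downarrow}.$$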

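The terminology is motivated by the next fact and later results (Consequence on p).

Observation. Whenever $u$ is a structural imset one has $\mathcal{L}_u \subseteq \mathcal{U}_u$. Moreover,

$$\bigcup_{S \in \mathcal{U}_u} S = \bigcup_{S \in \mathcal{L}_u} S.$$

Proof. Supposing $T \in \mathcal{L}_u$ find $T \subseteq S \subseteq N$ with $u(S) < 0$. The fact $\langle m^{S\uparrow}, u\rangle \geq 0$ from Observation implies the existence of $S \subseteq K \subseteq N$ with $u(K) > 0$. The fact that $u$ is o-standardized says $\langle m^{\{i\}\uparrow}, u\rangle = 0$ for every $i \in N$, which then implies the equality of the two unions.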
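Lower and upper classes are easy to compute for a concrete imset. The sketch below assumes the dictionary representation of imsets (only nonzero values stored) and uses the semi-elementary imset $u_{\langle a,bc|\emptyset\rangle}$ as an example; the function names are illustrative.

```python
from itertools import combinations

def subsets(S):
    """All subsets of S as frozensets."""
    S = sorted(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def lower_class(u):
    """Descending class generated by the sets where u is negative."""
    return {T for S, c in u.items() if c < 0 for T in subsets(S)}

def upper_class(u):
    """Descending class generated by the sets where u is positive."""
    return {T for S, c in u.items() if c > 0 for T in subsets(S)}

fs = frozenset
# u_<a,bc|0> = delta_abc + delta_0 - delta_a - delta_bc
u = {fs("abc"): 1, fs(""): 1, fs("a"): -1, fs("bc"): -1}

L, U = lower_class(u), upper_class(u)
range_from_lower = frozenset().union(*L)   # union of all sets in the lower class
range_from_upper = frozenset().union(*U)   # union of all sets in the upper class
```

Here the lower class consists of the subsets of $\{a\}$ and of $\{b,c\}$, while the upper class is the whole power set of $\{a,b,c\}$; both unions equal $\{a,b,c\}$.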

Given a structural imset $u$ over $N$, by the range of $u$, denoted by $R_u$, will be understood the set union from the observation above. The following lemma is a basis of a later result.

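Lemma. Supposing $u$ is a nonzero combinatorial imset over $N$, let us consider a fixed particular combination

$$u = \sum_{v \in \mathcal{E}(N)} k_v \cdot v \quad \text{where } k_v \in \mathbb{Z}^{+}.$$

Then there exists $v \in \mathcal{E}(N)$ such that $k_v > 0$ and $\mathcal{L}_v \subseteq \mathcal{L}_u$.

Proof. Since $u \neq 0$, necessarily $\mathcal{L}_u \cup \mathcal{U}_u \neq \emptyset$. Because $u$ is structural, by Observation $\mathcal{L}_u \subseteq \mathcal{U}_u$ and therefore $\mathcal{U}_u \neq \emptyset$. Take a maximal $K \in \mathcal{U}_u$ and, again using $\mathcal{L}_u \subseteq \mathcal{U}_u$, observe that $u(K) > 0$ and $u(L) = 0$ whenever $K \subset L \subseteq N$. Introduce

$$\mathcal{S}_u = \mathcal{P}(N) \setminus \mathcal{L}_u = \{\, T \subseteq N;\ \forall\, T \subseteq S \subseteq N:\ u(S) \geq 0 \,\}.$$

Clearly, $\mathcal{S}_u$ is an ascending class and $K \in \mathcal{S}_u$; let us consider the multiset $s = \sum_{S \in \mathcal{S}_u} \delta_S$. It follows from the definition of $\mathcal{S}_u$ that $\langle s, u\rangle \geq u(K) > 0$. Thus one can write

$$0 < \langle s, u\rangle = \sum_{v \in \mathcal{E}(N)} k_v \cdot \langle s, v\rangle \leq \sum_{v \in \mathcal{E}(N),\ \langle s, v\rangle > 0} k_v \cdot \langle s, v\rangle,$$

which implies the existence of $v \in \mathcal{E}(N)$ with $k_v > 0$ and $\langle s, v\rangle > 0$. Well, since $\mathcal{S}_u$ is ascending, an elementary imset $v = u_{\langle i,j|K\rangle}$ satisfies $\langle s, v\rangle > 0$ iff $\{i,j\} \cup K \in \mathcal{S}_u$ and $\{i\} \cup K,\ \{j\} \cup K \notin \mathcal{S}_u$ (see Section). However, this implies $\mathcal{L}_v \cap \mathcal{S}_u = \emptyset$, which means $\mathcal{L}_v \subseteq \mathcal{L}_u$.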

Pro duct formula induced by a structural imset

This formula provides a direct way of associating a structural imset with a probability measure. It can be viewed as a generalization of the concept of factorization into marginal densities. To give a sensible definition I need the following auxiliary concept, whose sense becomes evident later (see Section). Suppose that $P$ is a probability measure on $(\mathsf{X}_N, \mathcal{X}_N)$ which has finite multiinformation. By a reference system of measures for $P$ will be understood any collection $\{\mu_i;\ i \in N\}$ of $\sigma$-finite measures $\mu_i$ on $(\mathsf{X}_i, \mathcal{X}_i)$, $i \in N$, such that

$$P^{\{i\}} \ll \mu_i \quad \text{and} \quad -\infty < H(P^{\{i\}} \mid \mu_i) < \infty \quad \text{for every } i \in N.$$

Having fixed a reference system $\{\mu_i;\ i \in N\}$ one can put $\mu = \prod_{i \in N} \mu_i$ and observe $P \ll \mu$, that is, $\mu$ is a dominating measure for $P$. Thus, one can repeat what is done in Convention (p), that is, to choose a marginal density $f_S$, a version of $\frac{dP^S}{d\mu_S}$, for every $S \subseteq N$.

Given a structural imset $u$ over $N$, one says that $P$ satisfies the product formula induced by $u$ if

$$\prod_{S \subseteq N} f_S(x_S)^{\,u^{+}(S)} = \prod_{S \subseteq N} f_S(x_S)^{\,u^{-}(S)} \quad \text{for } \mu\text{-a.e. } x \in \mathsf{X}_N.$$

Of course, the validity of this formula does not depend on the choice of versions of marginal densities. The influence of the choice of a reference system of measures will appear to be only seeming as well (see Section). On the other hand, flexibility in its choice is advantageous, since miscellaneous special cases can be described in more detail.

Examples of reference systems of measures

Let me illustrate this concept by four basic examples. The first one shows that one can always find a reference system for a probability measure with finite multiinformation. The other examples correspond to important special cases mentioned already in Section.

Universal reference system

Given a probability measure $P$ over $N$ with $H(P \mid \prod_{i \in N} P^{\{i\}}) < \infty$, one can simply put $\mu_i = P^{\{i\}}$ for every $i \in N$. It is evidently a reference system of measures since $H(P^{\{i\}} \mid \mu_i) = 0$ for every $i \in N$. Let us call it the universal reference system because it can be established for any measure with finite multiinformation.

Discrete case

Supposing $P$ is a discrete measure on $(\mathsf{X}_N, \mathcal{X}_N)$ with $|\mathsf{X}_i| < \infty$, $i \in N$, one can consider the counting measure on $\mathsf{X}_i$ in place of $\mu_i$ for every $i \in N$. This is evidently a reference system for $P$, leading to the following system of marginal densities:

$$f_S(x_S) = P^{S}(\{x_S\}) \quad \text{for every } S \subseteq N,\ x \in \mathsf{X}_N.$$
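In the discrete case with counting measures the product formula can be tested directly, since "almost everywhere" then means at every configuration. The following sketch builds an illustrative joint distribution in which $a$ and $b$ are conditionally independent given $c$; the numbers, the variable order `VARS` and the function names are assumptions made for this example only.

```python
from itertools import product

VARS = ("a", "b", "c")

def make_joint():
    """Joint P(a,b,c) = P(c) P(a|c) P(b|c) on binary variables (illustrative numbers)."""
    p_c = {0: 0.4, 1: 0.6}
    p_a = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.7, 1: 0.3}}   # p_a[c][a]
    p_b = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}   # p_b[c][b]
    return {(a, b, c): p_c[c] * p_a[c][a] * p_b[c][b]
            for a, b, c in product((0, 1), repeat=3)}

def marginal_density(joint, S, x):
    """f_S(x_S) = P^S({x_S}), the density w.r.t. the counting measure."""
    idx = [VARS.index(v) for v in S]
    return sum(p for y, p in joint.items() if all(y[i] == x[i] for i in idx))

def satisfies_product_formula(joint, u, tol=1e-12):
    """Check prod f_S^{u+(S)} == prod f_S^{u-(S)} at every configuration x."""
    for x in joint:
        lhs = rhs = 1.0
        for S, c in u.items():
            f = marginal_density(joint, S, x)
            if c > 0:
                lhs *= f ** c
            else:
                rhs *= f ** (-c)
        if abs(lhs - rhs) > tol:
            return False
    return True

fs = frozenset
u = {fs("abc"): 1, fs("c"): 1, fs("ac"): -1, fs("bc"): -1}   # u_<a,b|c>
v = {fs("ab"): 1, fs(""): 1, fs("a"): -1, fs("b"): -1}       # u_<a,b|0>
ok_u = satisfies_product_formula(make_joint(), u)
ok_v = satisfies_product_formula(make_joint(), v)
```

For $u_{\langle a,b|c\rangle}$ the formula reads $f_{abc} \cdot f_c = f_{ac} \cdot f_{bc}$ and holds by construction; for $u_{\langle a,b|\emptyset\rangle}$ it would require marginal independence of $a$ and $b$, which fails for these numbers.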

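Remark. An alternative choice of a reference system in the discrete case is possible. One can take the uniformly distributed probability measure on $\mathsf{X}_i$ (that is, the counting measure divided by $|\mathsf{X}_i|$) for every $i \in N$. This leads to alternative marginal densities

$$f_S(x_S) = |\mathsf{X}_S| \cdot P^{S}(\{x_S\}) \quad \text{for every } S \subseteq N,\ x \in \mathsf{X}_N,$$

with the convention $|\mathsf{X}_{\emptyset}| = 1$.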

Gaussian case

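Supposing $P = \mathcal{N}(e, \Sigma)$ with $\Sigma = (\sigma_{ij})_{i,j \in N}$ is a nondegenerate Gaussian measure over $N$, one can consider the Lebesgue measure $\lambda$ on $\mathbb{R}$ in place of $\mu_i$ for every $i \in N$. It is a reference system for $P$ because $H(P^{\{i\}} \mid \lambda) = -\frac{1}{2} \ln (2\pi \mathrm{e}\, \sigma_{ii})$ for every $i \in N$, by the formula in Section. Owing to the fact that the marginal of a Gaussian measure is again Gaussian, one can choose marginal densities $f_S$ for $\emptyset \neq S \subseteq N$ in the form

$$f_S(y) = \frac{1}{\sqrt{(2\pi)^{|S|} \cdot \det(\Sigma_{S \cdot S})}} \cdot \exp\Big\{ -\tfrac{1}{2}\, (y - e_S)^{\top} \cdot (\Sigma_{S \cdot S})^{-1} \cdot (y - e_S) \Big\} \quad \text{for } y \in \mathbb{R}^{S}.$$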

CGmeasures

Let $P$ be a nondegenerate CG-measure over $N$ partitioned into the set $\Delta$ of discrete variables and the set $\Gamma$ of continuous variables. By a standard reference system for $P$ will be understood the system $\{\mu_i;\ i \in N\}$ where $\mu_i$ is the counting measure on finite $\mathsf{X}_i$ for $i \in \Delta$ and $\mu_i$ is the Lebesgue measure on $\mathsf{X}_i = \mathbb{R}$ for $i \in \Gamma$. By Lemma it is indeed a reference system of measures for $P$. In the purely discrete or Gaussian case it coincides with the two above-mentioned reference systems, which I recalled explicitly in order to emphasize the importance of these two classic cases.

One can choose the following versions of marginal densities f for S N of

S

course the formula is more complex than in purely discrete or purely Gaussian case

X

P x z f y for x X y X f x y

S S S

exz xz

S S S

z X

nS

P xz



where a detailed formula for f is in

e

Topological assumptions

The reader can object that the product formula is not elegant enough since it is dimmed by non-uniqueness of marginal densities and the equality is understood in the almost everywhere sense. However, under certain topological assumptions (usually valid in practice) and an additional natural convention it turns into a fair equality everywhere.

A reference system $\{\mu_i;\ i \in N\}$ for a probability measure $P$ on $(\mathsf{X}_N, \mathcal{X}_N)$ with finite multiinformation will be called continuous if the following three conditions are fulfilled.

(a) $\mathsf{X}_i$ is a separable metric space and $\mathcal{X}_i$ is the class of Borel sets in $\mathsf{X}_i$ for every $i \in N$.

(b) Every open ball in $\mathsf{X}_i$ has positive measure $\mu_i$ for every $i \in N$, that is,
$$\forall\, i \in N \quad \forall\, x \in \mathsf{X}_i \quad \forall\, \varepsilon > 0: \quad \mu_i(U(x, \varepsilon)) > 0.$$

(c) For every $\emptyset \neq S \subseteq N$ there exists a version $f_S$ of $\frac{dP^S}{d\mu_S}$, where $\mu_S = \prod_{i \in S} \mu_i$, which is continuous with respect to the product topology on $\mathsf{X}_S = \prod_{i \in S} \mathsf{X}_i$.

The following observation is easy to evidence (see the Appendix, Sections and, for relevant facts).

Observation. The standard reference system of measures for a nondegenerate CG-measure over $N$ is continuous.

In case of a continuous reference system, Convention can be explicated as follows.

Convention. Suppose that $P$ is a probability measure on $(\mathsf{X}_N, \mathcal{X}_N)$ with finite multiinformation and $\{\mu_i;\ i \in N\}$ is a continuous reference system for $P$. Then (a) implies that $\mathsf{X}_S$ is a separable metric space and $\mathcal{X}_S$ is the Borel $\sigma$-algebra on $\mathsf{X}_S$ for every $\emptyset \neq S \subseteq N$. Put $\mu_S = \prod_{i \in S} \mu_i$, choose a version $f_S$ of the Radon-Nikodym derivative $\frac{dP^S}{d\mu_S}$ which is continuous with respect to the respective topology on $\mathsf{X}_S$, and fix it. Note that this is possible owing to (c). Let us call it the continuous marginal density of $P$ for $S$. Note that it follows from (b) that it is determined uniquely (use arguments from the proof of the next lemma).

Other notational habits from Convention remain valid. In particular, every $f_S$ can be viewed as a continuous function on $\mathsf{X}_N$ endowed with the product topology.

Lemma. Let $P$ be a probability measure over $N$ with finite multiinformation and $\{\mu_i;\ i \in N\}$ a continuous reference system of measures for $P$. Let us accept Convention. Then, for a structural imset $u$ over $N$, the product formula induced by $u$ is equivalent to the requirement

$$\prod_{S \subseteq N} f_S(x_S)^{\,u^{+}(S)} = \prod_{S \subseteq N} f_S(x_S)^{\,u^{-}(S)} \quad \text{for every } x \in \mathsf{X}_N.$$

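Proof. By (a) assume that $(\mathsf{X}_i, \rho_i)$ is a separable metric space for every $i \in N$. Observe that $\mathsf{X}_N$ endowed with the distance

$$\rho(x, y) = \max_{i \in N} \rho_i(x_i, y_i) \quad \text{for } x, y \in \mathsf{X}_N$$

is a separable metric space inducing the product topology, which generates $\mathcal{X}_N$ (see e.g. Theorem I). This definition implies that open balls in $\mathsf{X}_N$ are Cartesian products of open balls in $\mathsf{X}_i$, and therefore one derives from (b)

$$\forall\, x \in \mathsf{X}_N \quad \forall\, \varepsilon > 0: \quad \mu(U(x, \varepsilon)) = \prod_{i \in N} \mu_i(U(x_i, \varepsilon)) > 0.$$

Now, both the left-hand side and the right-hand side of the product formula are continuous functions on $\mathsf{X}_N$ by (c) (see Convention), and the product formula says that their difference $g$, which is also a continuous function on $\mathsf{X}_N$, vanishes $\mu$-a.e. Hence $\int_{\mathsf{X}_N} |g(y)| \, d\mu(y) = 0$. Suppose for contradiction that $g(x) \neq 0$ for some $x \in \mathsf{X}_N$. Then there exists $\varepsilon > 0$ such that for every $y \in U(x, \varepsilon)$ one has $|g(y)| \geq \frac{|g(x)|}{2}$, and therefore

$$\int_{\mathsf{X}_N} |g(y)| \, d\mu(y) \geq \int_{U(x, \varepsilon)} |g(y)| \, d\mu(y) \geq \frac{|g(x)|}{2} \cdot \mu(U(x, \varepsilon)) > 0,$$

which contradicts the fact above. Therefore $g(x) = 0$ for every $x \in \mathsf{X}_N$.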

Thus, by Observation, one can interpret the product formula induced by a structural imset as a real identity of uniquely determined marginal densities in three basic cases used in practice: for discrete measures, for nondegenerate Gaussian measures and for nondegenerate CG-measures. Of course, this need not hold for an arbitrary measure with finite multiinformation and the respective universal reference system of measures.

Markov condition

The second basic way of associating a structural imset with a probability measure is an analogue of the Markov condition used in graphical models. That means, one requires that some conditional independence statements determined by an imset through a certain criterion are valid conditional independence statements with respect to the measure.

Semigraphoid induced by a structural imset

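One says that a disjoint triplet $\langle A,B|C\rangle \in \mathcal{T}(N)$ is represented in a structural imset $u$ over $N$, and writes $A \perp\!\!\!\perp B \mid C\ [u]$, if there exists $k \in \mathbb{N}$ such that $k \cdot u - u_{\langle A,B|C\rangle}$ is a structural imset over $N$ as well. An equivalent requirement is that there exists $l \in \mathbb{N}$ such that $l \cdot u - u_{\langle A,B|C\rangle}$ is a combinatorial imset over $N$. The class of represented triplets then defines the conditional independence model induced by $u$:

$$\mathcal{M}_u = \{\, \langle A,B|C\rangle \in \mathcal{T}(N);\ A \perp\!\!\!\perp B \mid C\ [u] \,\}.$$

A trivial example is the model induced by the zero imset.

Observation. $\mathcal{M}_u = \mathcal{T}_{\emptyset}(N)$ for $u = 0$.

Proof. The inclusion $\mathcal{T}_{\emptyset}(N) \subseteq \mathcal{M}_u$ is trivial. Suppose for contradiction that $\langle A,B|C\rangle \in \mathcal{M}_u \setminus \mathcal{T}_{\emptyset}(N)$, which means that $-u_{\langle A,B|C\rangle}$ is a structural imset. This contradicts Observation.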

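A further example is the model induced by an elementary imset.

Lemma. Supposing $v = u_{\langle i,j|K\rangle} \in \mathcal{E}(N)$ one has

$$\mathcal{M}_v = \{\, \langle i,j|K\rangle,\ \langle j,i|K\rangle \,\} \cup \mathcal{T}_{\emptyset}(N).$$

Proof. The facts $\langle i,j|K\rangle, \langle j,i|K\rangle \in \mathcal{M}_v$ and $\mathcal{T}_{\emptyset}(N) \subseteq \mathcal{M}_v$ are evident. Suppose that $\langle A,B|C\rangle \in \mathcal{M}_v \setminus \mathcal{T}_{\emptyset}(N)$ and $k \cdot v = u_{\langle A,B|C\rangle} + w$ for $k \in \mathbb{N}$ and a structural imset $w$. To evidence $ABC \subseteq ijK$ use Observation to derive

$$k \cdot \langle m^{ABC\uparrow}, v\rangle = \langle m^{ABC\uparrow}, k \cdot v\rangle = \langle m^{ABC\uparrow}, u_{\langle A,B|C\rangle}\rangle + \langle m^{ABC\uparrow}, w\rangle > 0.$$

The fact $\langle m^{ABC\uparrow}, u_{\langle i,j|K\rangle}\rangle > 0$ then implies $ABC \subseteq ijK$. Analogously, to evidence $C \supseteq K$ use Observation with $m^{C\downarrow}$ instead of $m^{ABC\uparrow}$. The fact that $\langle A,B|C\rangle$ is a nontrivial disjoint triplet and $K \subseteq C \subseteq ABC \subseteq ijK$ then implies that $\langle A,B|C\rangle$ coincides either with $\langle i,j|K\rangle$ or with $\langle j,i|K\rangle$.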

A basic fact is this.

Lemma. Every structural imset over $N$ induces a disjoint semi-graphoid over $N$.

Proof. The semi-graphoid properties (see Section, p) easily follow from the definition above and the fact that the sum of structural imsets is a structural imset. Indeed, for the triviality property realize $u_{\langle A,\emptyset|C\rangle} = 0$, for symmetry $u_{\langle A,B|C\rangle} = u_{\langle B,A|C\rangle}$, and for the remaining properties

$$u_{\langle A,BD|C\rangle} = u_{\langle A,B|DC\rangle} + u_{\langle A,D|C\rangle}.$$

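For the proof of the equivalence result in Section I need a technical lemma. In its proof the following simple observation concerning upper classes (see p) is used.

Observation. Supposing $u = w + v$ where $w, v$ are structural imsets, one has $\mathcal{U}_u = \mathcal{U}_w \cup \mathcal{U}_v$.

Proof. The inclusion $\mathcal{U}_u \subseteq \mathcal{U}_w \cup \mathcal{U}_v$ is trivial. To show $\mathcal{U}_w \subseteq \mathcal{U}_u$ take a maximal set $S \in \mathcal{U}_w$. By Observation ($\mathcal{L}_w \subseteq \mathcal{U}_w$) one has $w(S) > 0$ and $w(T) = 0$ whenever $S \subset T \subseteq N$. Hence, by Observation,

$$0 < \langle m^{S\uparrow}, w\rangle \leq \langle m^{S\uparrow}, w\rangle + \langle m^{S\uparrow}, v\rangle = \langle m^{S\uparrow}, u\rangle = \sum_{T,\ S \subseteq T} u(T),$$

which implies that $S \in \mathcal{U}_u$. The inclusion $\mathcal{U}_v \subseteq \mathcal{U}_u$ is analogous.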

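Lemma. Suppose that $u$ is a structural imset over $N$. Then there exists a sequence $\mathcal{U}_u = \mathcal{D}_1 \supseteq \ldots \supseteq \mathcal{D}_r = \mathcal{L}_u$, $r \geq 1$, of descending classes of subsets of $N$ and a sequence $\langle a_1,b_1|C_1\rangle, \ldots, \langle a_{r-1},b_{r-1}|C_{r-1}\rangle$ of elementary triplets over $N$ (which is empty in case $r = 1$) such that for every $i = 1, \ldots, r-1$:

(a) $a_i \perp\!\!\!\perp b_i \mid C_i\ [u]$,

(b) $a_iC_i,\ b_iC_i \in \mathcal{D}_{i+1}$ and $\mathcal{D}_i = \mathcal{D}_{i+1} \cup \{\, S;\ S \subseteq a_ib_iC_i \,\}$.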

Pro of Observe that for every combinatorial imset and every n N one has U U

nu u

L L and A B j C n u i A B j C u for each hA B jC i T N Therefore it

nu u

suces to assume that u is a combinatorial imset and prove the prop osition by induction

on deg u

In case deg u necessarily u and one can put r and D U L

u u

In case degu one has u by Observation and can apply Lemma to nd

v u E N with L L such that w u v is a combinatorial imset Of

v u

habjC i

course faC bC g L a b j C u and one can observe that L L fS S abC g

u w u

Moreover by Observations and

deg w hm w i hm ui hm v i degu

In particular one can apply the induction hypothesis to w an conclude that there exists

a sequence U F F L r of descending classes and a sequence

w r w

ha b jC i i r of elementary triplets with a b j C w and

i i i i i i

a C b C F F F fS S a b C g

i i i i i i i i i i

Let us put D F U L for i r and for i r dene D L and

i i v u r u

ha b jC i ha bjC i By Observations and D F U L U U L U

r r r v u w v u u

It makes no problem to evidence that D D satises the required conditions Indeed

r

a b j C w implies a b j C u for i r and since L L fS S abC g by

i i i i i i w u

u w v one has D L U L L fS S abC g D fS S a b C g

r w v u u r r r r

The significance of the preceding lemma, summarized in the consequence below, is that one can always reach the upper class of a structural imset from its lower class with the help of its induced conditional independence statements. Note that the reverse order in the formulation of Lemma (going from the upper class to the lower class) is used because it is more suitable from the point of view of the proofs.

Consequence. Let $u$ be a structural imset over $N$. Then every descending system $\mathcal{E} \subseteq \mathcal{P}(N)$ containing $\mathcal{L}_u$ and satisfying

$$\forall\, \langle A,B|C\rangle \in \mathcal{T}(N): \quad A \perp\!\!\!\perp B \mid C\ [u] \ \text{ and } \ AC, BC \in \mathcal{E} \ \text{ implies } \ ABC \in \mathcal{E}$$

necessarily contains $\mathcal{U}_u$.

Proof. Apply Lemma and prove $\mathcal{D}_i \subseteq \mathcal{E}$ by reverse induction on $i = r, \ldots, 1$.

Markovian measures

Suppose that $u$ is a structural imset over $N$ and $P$ is a probability measure over $N$. One says that $P$ is Markovian with respect to $u$ if

$$A \perp\!\!\!\perp B \mid C\ [u] \ \text{ implies } \ A \perp\!\!\!\perp B \mid C\ [P] \quad \text{whenever } \langle A,B|C\rangle \in \mathcal{T}(N).$$

Thus, the statistical meaning of an imsetal model is completely analogous to the statistical meaning of a graphical model. Every structural imset $u$ over $N$ represents a class of probability measures over $N$ within the respective framework of measures (e.g. discrete measures, Gaussian measures etc.), namely the class of measures which are Markovian with respect to $u$. In fact, imsetal models generalize graphical models: given a classic graph, there exists a structural imset having the same class of Markovian distributions (for DAG models see Lemma).

One says that $P$ is perfectly Markovian with respect to a structural imset $u$ over $N$ if $u$ induces exactly the conditional independence model induced by $P$, that is, for every $\langle A,B|C\rangle \in \mathcal{T}(N)$ one has

$$A \perp\!\!\!\perp B \mid C\ [u] \quad \text{if and only if} \quad A \perp\!\!\!\perp B \mid C\ [P].$$

One of the results of this work (Theorem) is that every probability measure with finite multiinformation is perfectly Markovian with respect to a structural imset. On the other hand, there are superfluous structural imsets whose induced semi-graphoid is not a model induced by any probability measure with finite multiinformation.

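Example. There exists a structural imset $u$ over $N = \{a, b, c, d\}$ such that no marginally continuous measure over $N$ is perfectly Markovian with respect to $u$. Put

$$u = u_{\langle c,d|\{a,b\}\rangle} + u_{\langle a,b|\emptyset\rangle} + u_{\langle a,b|\{c\}\rangle} + u_{\langle a,b|\{d\}\rangle}.$$

Evidently $c \perp\!\!\!\perp d \mid \{a,b\}\ [u]$, $a \perp\!\!\!\perp b \mid \emptyset\ [u]$, $a \perp\!\!\!\perp b \mid \{c\}\ [u]$ and $a \perp\!\!\!\perp b \mid \{d\}\ [u]$. To show that $a \perp\!\!\!\perp b \mid \{c,d\}\ [u]$ does not hold, consider the multiset $m_{\dagger}$ in Figure and observe that $\langle m_{\dagger}, v\rangle \geq 0$ for every $v \in \mathcal{E}(N)$. Hence $\langle m_{\dagger}, w\rangle \geq 0$ for every structural imset $w$ over $N$. Because $\langle m_{\dagger},\ k \cdot u - u_{\langle a,b|\{c,d\}\rangle}\rangle < 0$ for every $k \in \mathbb{N}$, the imset $k \cdot u - u_{\langle a,b|\{c,d\}\rangle}$ is not structural. However, by Consequence, there is no marginally continuous probability measure over $N$ which is perfectly Markovian with respect to $u$.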

Another important consequence of Lemma is that the marginals of a measure which is Markovian with respect to a structural imset $u$ for sets in $\mathcal{L}_u$ determine uniquely its marginals for sets in $\mathcal{U}_u$. This motivated the terminology lower and upper class of $u$ introduced in Section. Note that one often has $\mathcal{U}_u = \mathcal{P}(N)$, in which case the whole Markovian measure is determined by its marginals on the lower class.

Consequence. Suppose that both $P$ and $Q$ are probability measures on $(\mathsf{X}_N, \mathcal{X}_N)$ which are Markovian with respect to a structural imset $u$. Then

$$[\, P^{S} = Q^{S} \ \text{for every } S \in \mathcal{L}_u \,] \ \Rightarrow \ [\, P^{S} = Q^{S} \ \text{for every } S \in \mathcal{U}_u \,].$$

Proof. One can repeat the arguments used in the beginning of the proof of Lemma (p) to verify the following uniqueness principle. For every $\langle A,B|C\rangle \in \mathcal{T}(N)$:

$$A \perp\!\!\!\perp B \mid C\ [P], \quad A \perp\!\!\!\perp B \mid C\ [Q], \quad P^{AC} = Q^{AC}, \quad P^{BC} = Q^{BC} \ \Rightarrow \ P^{ABC} = Q^{ABC}.$$

Then, owing to the fact that $S \subseteq T$ and $P^{T} = Q^{T}$ implies $P^{S} = Q^{S}$, one can apply Lemma and show by reverse induction on $i = r, \ldots, 1$ that $P^{S} = Q^{S}$ for every $S \in \mathcal{D}_i$.

Figure: The multiset $m_{\dagger}$ from Example.

Equivalence result

The third way of associating a structural imset with a probability measure is an algebraic identity in which the measure is represented by its multiinformation function.

One says that a probability measure $P$ over $N$ with finite multiinformation complies with a structural imset $u$ over $N$ if $\langle m_P, u\rangle = 0$, where $m_P$ denotes the multiinformation function defined in Section.
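For a discrete measure the compliance condition $\langle m_P, u\rangle = 0$ can be evaluated numerically. The sketch below assumes that $m_P(S)$ is the relative entropy of the marginal $P^S$ with respect to the product of its one-dimensional marginals; the joint distribution, the variable order `VARS` and all function names are illustrative choices, not part of the text.

```python
import math
from itertools import product

VARS = ("a", "b", "c")

def make_joint():
    """Joint P(a,b,c) = P(c) P(a|c) P(b|c) on binary variables (illustrative numbers)."""
    p_c = {0: 0.4, 1: 0.6}
    p_a = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.7, 1: 0.3}}   # p_a[c][a]
    p_b = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}   # p_b[c][b]
    return {(a, b, c): p_c[c] * p_a[c][a] * p_b[c][b]
            for a, b, c in product((0, 1), repeat=3)}

def marginal(joint, S):
    """Return (coordinate indices of S, marginal table of P^S)."""
    idx = [VARS.index(v) for v in sorted(S)]
    table = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        table[key] = table.get(key, 0.0) + p
    return idx, table

def multiinformation(joint, S):
    """m_P(S): relative entropy of P^S w.r.t. the product of one-dimensional marginals."""
    idx, p_S = marginal(joint, S)
    singles = [marginal(joint, {VARS[i]})[1] for i in idx]
    total = 0.0
    for key, p in p_S.items():
        if p > 0.0:
            prod = math.prod(single[(val,)] for single, val in zip(singles, key))
            total += p * math.log(p / prod)
    return total

def inner(joint, u):
    """Scalar product <m_P, u>."""
    return sum(c * multiinformation(joint, S) for S, c in u.items())

fs = frozenset
joint = make_joint()
u = {fs("abc"): 1, fs("c"): 1, fs("ac"): -1, fs("bc"): -1}   # u_<a,b|c>
v = {fs("ab"): 1, fs(""): 1, fs("a"): -1, fs("b"): -1}       # u_<a,b|0>
value_u = inner(joint, u)
value_v = inner(joint, v)
```

For the semi-elementary imset $u_{\langle a,b|c\rangle}$ the scalar product is the conditional mutual information of $a$ and $b$ given $c$, which vanishes for this distribution, so $P$ complies with it; for $u_{\langle a,b|\emptyset\rangle}$ the value is the (positive) mutual information of $a$ and $b$.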

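Remark. The concept above can be introduced alternatively in the following way. Suppose that $P$ is a probability measure over $N$ which has a dominating measure (see p) $\mu = \prod_{i \in N} \mu_i$ such that $-\infty < H(P^{S} \mid \prod_{i \in S} \mu_i) < \infty$ for each $\emptyset \neq S \subseteq N$. Note that by Lemma $P$ has finite multiinformation then. Thus, one can introduce the entropy function of $P$ relative to $\mu$ as follows:

$$h_{P,\mu}(S) = -H\Big(P^{S} \,\Big|\, \prod_{i \in S} \mu_i\Big) \quad \text{for } \emptyset \neq S \subseteq N,$$

and $h_{P,\mu}(\emptyset) = 0$ by convention. Then $P$ complies with a structural imset $u$ over $N$ iff $\langle h_{P,\mu}, u\rangle = 0$. Indeed, the formula for multiinformation implies, together with the fact that $u$ is o-standardized,

$$\sum_{S \subseteq N} m_P(S) \cdot u(S) = -\sum_{S \subseteq N} h_{P,\mu}(S) \cdot u(S) + \sum_{j \in N} h_{P,\mu}(\{j\}) \cdot \underbrace{\sum_{S \subseteq N,\ j \in S} u(S)}_{0},$$

that is, $\langle m_P, u\rangle = -\langle h_{P,\mu}, u\rangle$. Note that in case of a discrete probability measure one can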

always take the counting measure in place of The corresp onding entropy function

is then nonnegative and has very pleasant prop erties which enable one to characterize

functional dep endence statements p with resp ect to P in addition to pure con

ditional indep endence statements Namely h A h AC for A C N while the

P P

equality o ccurs i A A j C P cf Remark However this pleasant phenomenon

seems to b e more or less limited to discrete case It is not clear which dominating mea

sures in general pro duce entropy functions with b ehaviour of this type towards functional

dep endence except measures concentrated on a countable set For example in Gaussian

case the entropy function relative to Leb esgue measure need not b e nonnegative or even

monotone This is the main reason why I prefer multiinformation function to entropy

function The second one is that entropy function do es dep end on the choice of a suitable

dominating measure unlike multiinformation function

The main result of this chapter says that all three ways of associating structural imsets with probability measures are equivalent. In words: a probability measure complies with a structural imset iff it is Markovian with respect to it, or iff the product formula induced by it holds. In the proof below the following simple observation is used.

Observation. Suppose the situation from Convention (p). Then

$$\forall\, S \subseteq T \subseteq N: \quad f_S(x_S) = 0 \ \Rightarrow \ f_T(x_T) = 0 \quad \text{for } \mu\text{-a.e. } x \in \mathsf{X}_N.$$

Proof. Combine the arguments used in Remark with the formula in the proof of Lemma.

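Theorem. Let $u$ be a structural imset over $N$ and $P$ a probability measure on $(\mathsf{X}_N, \mathcal{X}_N)$ with finite multiinformation. Suppose that $\{\mu_i;\ i \in N\}$ is a reference system of measures for $P$ (p); let us accept Convention on p. Then the following four conditions are equivalent.

(i) $\prod_{S \subseteq N} f_S(x_S)^{\,u^{+}(S)} = \prod_{S \subseteq N} f_S(x_S)^{\,u^{-}(S)}$ for $\mu$-a.e. $x \in \mathsf{X}_N$,

(ii) $\prod_{S \subseteq N} f_S(x_S)^{\,u(S)} = 1$ for $P$-a.e. $x \in \mathsf{X}_N$,

(iii) $\langle m_P, u\rangle = 0$,

(iv) $A \perp\!\!\!\perp B \mid C\ [u]$ implies $A \perp\!\!\!\perp B \mid C\ [P]$ for every $\langle A,B|C\rangle \in \mathcal{T}(N)$.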

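Proof. The implication (i) $\Rightarrow$ (ii) is trivial since $P \ll \mu$ and $f_S(x_S) > 0$ for $P$-a.e. $x \in \mathsf{X}_N$ and every $S \subseteq N$. To show (ii) $\Rightarrow$ (iii) apply the logarithm to the assumed equality first and get

$$\sum_{S \subseteq N} u(S) \cdot \ln f_S(x_S) = \ln \prod_{S \subseteq N} f_S(x_S)^{\,u(S)} = 0 \quad \text{for } P\text{-a.e. } x \in \mathsf{X}_N.$$

Then, by integrating with respect to $P$ (notation is from Convention),

$$\sum_{S \subseteq N} u(S) \cdot H(P^{S} \mid \mu_S) = \sum_{S \subseteq N} u(S) \cdot \int_{\mathsf{X}_N} \ln f_S(x_S) \, dP(x) = 0.$$

As explained in Remark, this is equivalent to $\langle m_P, u\rangle = 0$.

To see (iii) $\Rightarrow$ (iv) consider a structural imset $w = k \cdot u - u_{\langle A,B|C\rangle}$ with $k \in \mathbb{N}$ and write

$$0 = \langle m_P, k \cdot u\rangle = \langle m_P, u_{\langle A,B|C\rangle}\rangle + \langle m_P, w\rangle.$$

By Consequence (the inequality) both terms on the right-hand side are nonnegative and therefore they vanish. Thus, by the characterization of conditional independence in terms of multiinformation, one has $A \perp\!\!\!\perp B \mid C\ [P]$.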

Supp osing iv one already knows that f x for every x X S N Thus the

S S N

Q

u S

condition i can b e proved separately on the set Y fy X f y g

N S S

S N

Q

u S

and on the set Z fz X f z g Because of L U Observation

N S S u u

S N

it follows from in Observation that

Y

+

u S

f y for ae y Y

S S

S N

and b oth sides of the expression in i vanish ae on Y

Supp ose now z Z and put E fS N f z g Observe that L E for

z S S u z

every z Z and that E is a descending class for ae z Z by Having xed

z

hA B jC i T N the assumption A B j C u implies by iv A B j C P and hence

by Lemma derive that

f x f x f x for ae x X

AC AC BC BC AB C AB C N

In particular for ae z Z the fact AC BC E implies AB C E Altogether for

z z

ae z Z the assumptions of Consequence with E E are fullled and therefore

z

U E that is

u z

S N uS f z for ae z Z

S S

P

Since u is a structural imset one has n u k v for n N and k Z see

v v

v E N

Section For every v u E N with k one has i j j K u and

v

hij jK i

therefore by iv and on p derives

Y Y

+

v S v S

f x f x for ae x X

S S S S N

S N S N

These equalities can b e multiplied each other so that one gets

Y Y

P P

+

k v S k v S

v v

v E (N ) v E (N )

for ae z Z f z f z

S S S S

S N S N

P P

Let us introduce the multiset w k v n u k v n u For

v v

v E N v E N

every S N the fact w S implies v S for some v E N with k Hence

v

S U U U by Observation By application of to some T S and

v nu u

one derives f z for ae z Z This consideration implies

S S

Y

w S

f z for ae z Z

S S

S N

Thus one can divide by this nonzero expression for ae z Z and conclude that

Y Y

+

nu S nu S

f z f z for ae z Z

S S S S

S N S N

Take the nth ro ot of it and obtain what is desired

Let me note that one can always take the universal reference system (p) in Theorem, which implies that the conditions (iii) and (iv), which do not depend on the choice of a reference system, are always equivalent for a probability measure with finite multiinformation.

A further comment is that another equivalent definition of conditional independence can be derived from Theorem. Suppose that $P$ is a probability measure over $N$ with finite multiinformation, $\langle A,B|C\rangle \in \mathcal{T}(N)$, and accept Convention. It suffices to put $u = u_{\langle A,B|C\rangle}$ and use (ii) in Theorem to see that $A \perp\!\!\!\perp B \mid C\ [P]$ iff

$$f_{ABC}(x_{ABC}) = \frac{f_{AC}(x_{AC}) \cdot f_{BC}(x_{BC})}{f_C(x_C)} \quad \text{for } P\text{-a.e. } x \in \mathsf{X}_N.$$

Note that $f_S(x_S) > 0$ for $P$-a.e. $x \in \mathsf{X}_N$.

Chapter

Description of probabilistic models

Two basic approaches to the description of probabilistic CI structures are dealt with in this chapter. The first one, which uses structural imsets, was already mentioned in Section. The second one, which uses supermodular functions, is closely related to the first one. It can also use imsets over $N$ to describe CI models over $N$, but the respective class of imsets and their interpretation are completely different. However, despite the formal difference, the approaches are equivalent. In fact, there exists a certain duality relation between these two methods: one approach is complementary to the other (see Section). The main result of the chapter says that every CI model induced by a probability measure with finite multiinformation can be described both by a structural imset and by a supermodular function.

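Supermodular set functions

A real set function $m: \mathcal{P}(N) \to \mathbb{R}$ is called a supermodular function over $N$ if

$$ m(U \cup V) + m(U \cap V) \ge m(U) + m(V) \quad \text{for every } U, V \subseteq N . $$

The class of all supermodular functions on $\mathcal{P}(N)$ will be denoted by $\mathcal{K}(N)$. The definition can be formulated in several equivalent ways.

Observation. A set function $m: \mathcal{P}(N) \to \mathbb{R}$ is supermodular iff any of the following three conditions holds:

(i) $\langle m,u\rangle \ge 0$ for every structural imset $u$ over $N$,
(ii) $\langle m,u\rangle \ge 0$ for every semi-elementary imset $u$ over $N$,
(iii) $\langle m,u\rangle \ge 0$ for every elementary imset $u \in \mathcal{E}(N)$.

Proof. Evidently (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii). The implication (iii) $\Rightarrow$ (i) follows from the definition of a structural imset (see Section, p.) and linearity of the scalar product. The defining inequality of supermodularity is equivalent to the requirement $\langle m, u_{\langle A,B|C\rangle}\rangle \ge 0$ for every $\langle A,B|C\rangle \in \mathcal{T}(N)$, which is nothing but (ii).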
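The equivalence between the defining inequality and condition (iii) can be checked numerically on a small ground set. The following is only a minimal sketch: the helper names (`subsets`, `elementary_products`, and so on) and the two test functions are illustrative assumptions, not part of the thesis. A set function is stored as a dictionary indexed by frozensets.

```python
from itertools import combinations


def subsets(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def elementary_products(m, ground):
    """List <m, u_<i,j|K>> = m(ijK) + m(K) - m(iK) - m(jK) over all elementary imsets."""
    items = sorted(ground)
    values = []
    for i, j in combinations(items, 2):
        rest = [x for x in items if x not in (i, j)]
        for K in subsets(rest):
            values.append(m[K | {i, j}] + m[K] - m[K | {i}] - m[K | {j}])
    return values


def supermodular_by_definition(m, ground):
    """Check m(U | V) + m(U & V) >= m(U) + m(V) for all pairs U, V."""
    all_sets = subsets(ground)
    return all(m[U | V] + m[U & V] >= m[U] + m[V]
               for U in all_sets for V in all_sets)


def supermodular_by_elementary(m, ground):
    """Check condition (iii): nonnegative scalar product with every elementary imset."""
    return all(value >= 0 for value in elementary_products(m, ground))


# Hypothetical test functions on N = {a, b, c}.
N = {"a", "b", "c"}
square = {S: len(S) ** 2 for S in subsets(N)}         # convex in |S|
neg_square = {S: -(len(S) ** 2) for S in subsets(N)}  # concave in |S|

print(supermodular_by_definition(square, N), supermodular_by_elementary(square, N))
print(supermodular_by_definition(neg_square, N), supermodular_by_elementary(neg_square, N))
```

The elementary test needs only $\binom{|N|}{2} \cdot 2^{|N|-2}$ inequalities instead of one for every pair of subsets, which is why condition (iii) is the convenient one in computations.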

A further evident observation is as follows.

Observation. The class of supermodular functions $\mathcal{K}(N)$ is a cone:

$$ m_1, m_2 \in \mathcal{K}(N),\ \alpha, \beta \ge 0 \ \Longrightarrow\ \alpha \cdot m_1 + \beta \cdot m_2 \in \mathcal{K}(N) . $$

Remark. This is to warn the reader that a different terminology is used in game theory, where supermodular set functions are named either "convex set functions" or even "convex games". I followed that terminology in some of my former reports. However, another common term, "supermodular", is used here in order to avoid confusion with the usual meaning of the adjective "convex" in mathematics. Supermodular functions are also named "2-monotone Choquet capacities".

Semigraphoid produced by a supermodular function

One says that a disjoint triplet $\langle A,B|C\rangle \in \mathcal{T}(N)$ is represented in a supermodular function $m$ over $N$, and writes $A \perp\!\!\!\perp B \,|\, C\ [m]$, if $\langle m, u_{\langle A,B|C\rangle}\rangle = 0$. The class of represented triplets then defines the model produced by $m$:

$$ \mathcal{M}^m = \{\, \langle A,B|C\rangle \in \mathcal{T}(N) ;\ A \perp\!\!\!\perp B \,|\, C\ [m] \,\} . $$

Two supermodular functions over $N$ are model equivalent if they represent the same class of disjoint triplets over $N$.

Remark. This is to explain the terminology. I usually say that a model is induced by a mathematical object over $N$ (see Section), for example by a probability measure over $N$ or by a graph over $N$ (see Chapter). However, in this chapter and in subsequent chapters I need to distinguish two different ways of inducing formal independence models by imsets. Both ways appear to be equivalent as concerns the class of obtained models (see Consequence). The problem is that some imsets, e.g. the zero imset or $u_{\langle a,b|\emptyset\rangle}$ in case $N = \{a,b\}$, may induce different models depending on the way of inducing. To prevent misunderstanding I decided to emphasize the difference both in terminology ("induced" versus "produced") and in notation ($\mathcal{M}_u$ versus $\mathcal{M}^m$). Regrettably, I have to confess that the adjective "induced" was used in a former report also for supermodular functions over $N$.

The basic fact is this.

Lemma. A supermodular function over $N$ produces a disjoint semigraphoid over $N$.

Proof. This follows easily from the respective formulas for semi-elementary imsets and linearity of the scalar product. Let $m$ be a supermodular function over $N$. For the triviality property realize $\langle m, u_{\langle A,\emptyset|C\rangle}\rangle = \langle m, 0\rangle = 0$, for symmetry $\langle m, u_{\langle A,B|C\rangle}\rangle = \langle m, u_{\langle B,A|C\rangle}\rangle$. The formula

$$ \langle m, u_{\langle A,BD|C\rangle}\rangle = \langle m, u_{\langle A,B|DC\rangle}\rangle + \langle m, u_{\langle A,D|C\rangle}\rangle $$

implies contraction directly. To verify decomposition and weak union use Observation, which says that both terms on the right-hand side of this formula are nonnegative.
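The lemma can be illustrated by enumerating the model produced by a concrete supermodular function and testing the semigraphoid properties by brute force. This is a sketch under stated assumptions: the function names and the example $m = \delta_N + \delta_{\{b,c\}}$ over $N = \{a,b,c\}$ are chosen here for illustration only.

```python
from itertools import combinations, product


def subsets(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def disjoint_triplets(ground):
    """All triplets (A, B, C) of pairwise disjoint subsets of `ground`."""
    items = sorted(ground)
    result = []
    for labels in product("ABC-", repeat=len(items)):
        parts = [frozenset(x for x, lab in zip(items, labels) if lab == part)
                 for part in "ABC"]
        result.append(tuple(parts))
    return result


def degree(m, A, B, C):
    """<m, u_<A,B|C>> = m(ABC) + m(C) - m(AC) - m(BC)."""
    return m[A | B | C] + m[C] - m[A | C] - m[B | C]


def produced_model(m, ground):
    """The model produced by m: triplets with vanishing scalar product."""
    return {t for t in disjoint_triplets(ground) if degree(m, *t) == 0}


def is_semigraphoid(model, ground):
    """Brute-force check of triviality, symmetry, decomposition, weak union, contraction."""
    empty = frozenset()
    for A, B, C in disjoint_triplets(ground):
        if B == empty and (A, B, C) not in model:        # triviality
            return False
    for A, B, C in model:
        if (B, A, C) not in model:                        # symmetry
            return False
        for D in subsets(B):                              # A _||_ B | C with D inside B
            if (A, D, C) not in model:                    # decomposition
                return False
            if (A, B - D, C | D) not in model:            # weak union
                return False
    for A, B, DC in model:                                # contraction
        for D in subsets(DC):
            if (A, D, DC - D) in model and (A, B | D, DC - D) not in model:
                return False
    return True


N = {"a", "b", "c"}
# Hypothetical example: m = delta_N + delta_{bc}, a supermodular function.
m = {S: int(S == frozenset("abc")) + int(S == frozenset("bc")) for S in subsets(N)}
model = produced_model(m, N)
print(len(model), is_semigraphoid(model, N))
```

For this $m$ the triplet $\langle a, bc \,|\, \emptyset\rangle$ is represented while $\langle b, c \,|\, \emptyset\rangle$ is not, in agreement with the values of the scalar products.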

A typical example of a supermodular set function is the multiinformation function introduced in Section. In fact, Consequence on p. says the following.

Observation. Given a probability measure $P$ over $N$ with finite multiinformation, the multiinformation function $m_P$ is an $\ell$-standardized supermodular function.

One can conclude even more.

Consequence. Given a probability measure $P$ over $N$ with finite multiinformation, there exists an $\ell$-standardized supermodular function $m$ such that $\mathcal{M}_P = \mathcal{M}^m$.

Proof. Let us put $m = m_P$. The relation from Consequence says

$$ A \perp\!\!\!\perp B \,|\, C\ [P] \iff A \perp\!\!\!\perp B \,|\, C\ [m_P] \quad \text{for every } \langle A,B|C\rangle \in \mathcal{T}(N), $$

which implies the desired fact.
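For a discrete measure the multiinformation function can be computed from entropies, $m_P(S) = \sum_{i \in S} H(P^{\{i\}}) - H(P^{S})$, which is the relative entropy of $P^S$ with respect to the product of its one-dimensional marginals. The sketch below assumes this discrete formula and uses a hypothetical example, three binary variables with $c = a \oplus b$ for independent fair $a, b$; the function names are illustrative.

```python
from itertools import combinations
from math import log


def subsets(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def marginal(p, variables, S):
    """Marginal table of the joint table `p` (tuple of values -> probability) on S."""
    idx = [variables.index(v) for v in sorted(S)]
    table = {}
    for x, prob in p.items():
        key = tuple(x[i] for i in idx)
        table[key] = table.get(key, 0.0) + prob
    return table


def entropy(table):
    """Shannon entropy (natural logarithm) of a probability table."""
    return -sum(q * log(q) for q in table.values() if q > 0)


def multiinformation(p, variables):
    """m_P(S) = sum of one-dimensional entropies over S minus the entropy of P^S."""
    h = {S: entropy(marginal(p, variables, S)) for S in subsets(variables)}
    return {S: sum(h[frozenset([i])] for i in S) - h[S] for S in subsets(variables)}


variables = ["a", "b", "c"]
# Hypothetical distribution: a, b independent fair bits, c = a XOR b.
p = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
m_P = multiinformation(p, variables)

# Degree of dependence of a and b given c: <m_P, u_<a,b|c>>.
dep_ab_given_c = (m_P[frozenset("abc")] + m_P[frozenset("c")]
                  - m_P[frozenset("ac")] - m_P[frozenset("bc")])
print(m_P[frozenset("abc")], dep_ab_given_c)
```

In this example $m_P$ vanishes on all sets of cardinality at most two and equals $\log 2$ on $N$, so it is a positive multiple of $\delta_N$: the three variables are pairwise independent but every pair is dependent given the third.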

Note that the value $\langle m_P, u_{\langle A,B|C\rangle}\rangle$ for a probability measure $P$ and a disjoint triplet $\langle A,B|C\rangle$ is nothing but the relative entropy of $P^{ABC}$ with respect to the conditional product of $P^{AC}$ and $P^{BC}$ (see the proof of Consequence on p.). This number can be interpreted (in the discrete case) as a numerical evaluation of the degree of stochastic conditional dependence between $A$ and $B$ given $C$ with respect to $P$. Thus, given a supermodular function $m$ over $N$ and $\langle A,B|C\rangle \in \mathcal{T}(N)$, the nonnegative value $\langle m, u_{\langle A,B|C\rangle}\rangle$ could be interpreted as a generalized degree of dependence between $A$ and $B$ given $C$ with respect to $m$. Having this point of view in mind, there is no reason to distinguish between two supermodular functions for which the scalar products with semi-elementary imsets coincide. This motivated the next definition.

Strong equivalence of supermodular functions

One says that two supermodular functions $m_1$ and $m_2$ over $N$ are strongly equivalent if

$$ \langle m_1, u_{\langle A,B|C\rangle}\rangle = \langle m_2, u_{\langle A,B|C\rangle}\rangle \quad \text{for every } \langle A,B|C\rangle \in \mathcal{T}(N) . $$

Obviously, $m_1$ and $m_2$ are then model equivalent. Strong equivalence can be equivalently described with the help of a special class of functions, namely the class of functions producing the maximal independence model $\mathcal{T}(N)$. A function $l: \mathcal{P}(N) \to \mathbb{R}$ is called modular if

$$ l(U \cup V) + l(U \cap V) = l(U) + l(V) \quad \text{for every } U, V \subseteq N . $$

The class of modular functions over $N$ will be denoted by $\mathcal{L}(N)$. Evidently $\mathcal{L}(N) \subseteq \mathcal{K}(N)$.

Observation. The only $\ell$-standardized modular function is the zero function.

Proof. Indeed, supposing that $l: \mathcal{P}(N) \to \mathbb{R}$ is an $\ell$-standardized modular function, one can show by induction on $|S|$ that $l(S) = 0$ for every $S \subseteq N$. This is evident in case $|S| \le 1$. If $|S| \ge 2$ then take $u_{\langle i,j|K\rangle} \in \mathcal{E}(N)$ such that $S = ijK$. The fact $\langle l, u_{\langle i,j|K\rangle}\rangle = 0$ says

$$ l(S) = l(iK) + l(jK) - l(K) $$

and the right-hand side of this equality vanishes by the induction hypothesis.

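Lemma. A supermodular function $m$ produces $\mathcal{T}(N)$ iff $m \in \mathcal{L}(N)$. Two supermodular functions $m_1, m_2$ over $N$ are strongly equivalent iff $m_1 - m_2 \in \mathcal{L}(N)$. Every supermodular function is strongly equivalent to an $\ell$-standardized supermodular function. The class $\mathcal{L}(N)$ is a linear subspace of dimension $|N| + 1$. The functions $m^{\emptyset\uparrow}$ and $m^{\{i\}\uparrow}$ for $i \in N$ (see p.) form its linear base.

Proof. Clearly, $m: \mathcal{P}(N) \to \mathbb{R}$ is modular iff both $m$ and $-m$ are supermodular. Hence, by Observation (ii), one has $m \in \mathcal{L}(N)$ iff $\langle m,u\rangle = 0$ for every semi-elementary imset $u$, which means $\mathcal{M}^m = \mathcal{T}(N)$. On the other hand, because of linearity of the scalar product, two supermodular functions $m_1$ and $m_2$ are strongly equivalent iff $\langle m_1 - m_2, u\rangle = 0$ for every semi-elementary imset $u$ over $N$.

Let $m$ be a supermodular function over $N$. The function

$$ \tilde m = m - m(\emptyset) \cdot m^{\emptyset\uparrow} - \sum_{i \in N} \{ m(\{i\}) - m(\emptyset) \} \cdot m^{\{i\}\uparrow} $$

is evidently $\ell$-standardized and supermodular, as $m^{\emptyset\uparrow} \in \mathcal{L}(N)$ and $m^{\{i\}\uparrow} \in \mathcal{L}(N)$ for $i \in N$.

Of course, $\mathcal{L}(N)$ is a linear subspace. Observe that $m^{\emptyset\uparrow}$ and $m^{\{i\}\uparrow}$ for $i \in N$ are linearly independent. To show that they generate $\mathcal{L}(N)$ take $m \in \mathcal{L}(N)$ and introduce $\tilde m$ by means of the formula above. By Observation, $\tilde m = 0$.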
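The standardization formula from the proof is easy to apply pointwise: since $m^{\emptyset\uparrow}$ is constantly $1$ and $m^{\{i\}\uparrow}(S) = 1$ exactly when $i \in S$, one gets $\tilde m(S) = m(S) - m(\emptyset) - \sum_{i \in S} \{ m(\{i\}) - m(\emptyset) \}$. The following sketch (function names and the test function are assumptions made for illustration) computes $\tilde m$ and confirms that the scalar products with elementary imsets do not change, which is what strong equivalence requires.

```python
from itertools import combinations


def subsets(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def l_standardize(m, ground):
    """Subtract the modular part so that the result vanishes on sets of size <= 1."""
    empty = frozenset()
    return {S: m[S] - m[empty] - sum(m[frozenset([i])] - m[empty] for i in S)
            for S in subsets(ground)}


def elementary_products(m, ground):
    """List <m, u_<i,j|K>> over all elementary imsets, in a fixed order."""
    items = sorted(ground)
    values = []
    for i, j in combinations(items, 2):
        rest = [x for x in items if x not in (i, j)]
        for K in subsets(rest):
            values.append(m[K | {i, j}] + m[K] - m[K | {i}] - m[K | {j}])
    return values


N = {"a", "b", "c"}
# Hypothetical supermodular function: m(S) = |S|^2 + 3|S| + 7.
m = {S: len(S) ** 2 + 3 * len(S) + 7 for S in subsets(N)}
m_std = l_standardize(m, N)

print(sorted((len(S), v) for S, v in m_std.items()))
print(elementary_products(m, N) == elementary_products(m_std, N))
```

Here $\tilde m(S) = |S|^2 - |S|$, so the standardized function depends only on the non-modular part of $m$.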

Remark. Thus, to have a clear view on quantitative dependence structures produced by supermodular functions one should choose one representative from every class of strong equivalence in a systematic way. The choice should follow relevant mathematical principles: to have geometric insight one should do the choice "linearly". This can be made as follows. Take a linear subspace $S(N) \subseteq \mathbb{R}^{\mathcal{P}(N)}$ such that $S(N) \cap \mathcal{L}(N) = \{0\}$ and $S(N) \oplus \mathcal{L}(N) = \mathbb{R}^{\mathcal{P}(N)}$. Then every $m \in \mathcal{K}(N)$ can be uniquely decomposed as $m = s + l$ where $s \in S(N)$, $l \in \mathcal{L}(N)$. The fact $\mathcal{L}(N) \subseteq \mathcal{K}(N)$ and Observation implies $s \in \mathcal{K}(N)$. Moreover, $s$ is strongly equivalent to $m$ by Lemma and the function $s \in \mathcal{K}(N) \cap S(N)$ coincides for strongly equivalent functions $m \in \mathcal{K}(N)$.

However, there is flexibility in the choice of $S(N)$. Fixing on a space $S(N)$ satisfying the above requirements means that one restricts attention to this linear subspace and represents the class of supermodular functions $\mathcal{K}(N)$ by the respective class of standardized supermodular functions $\mathcal{K}(N) \cap S(N)$. In this work only three ways of standardization are mentioned; they are justified by some theoretical reasons. The preferred standardization, using the linear subspace

$$ S_\ell(N) = \{\, m \in \mathbb{R}^{\mathcal{P}(N)} ;\ m(S) = 0 \ \text{whenever}\ |S| \le 1 \,\}, $$

is in concordance with the property of multiinformation functions (see p.). Functions $m \in \mathcal{K}(N) \cap S_\ell(N)$ are nondecreasing: $m(S) \le m(T)$ whenever $S \subseteq T$ (see Consequence). In particular, they are nonnegative.

However, from a purely mathematical point of view another standardization, which uses the subspace

$$ S_u(N) = \{\, m \in \mathbb{R}^{\mathcal{P}(N)} ;\ m(S) = 0 \ \text{whenever}\ |S| \ge |N| - 1 \,\}, $$

is equally entitled. This standardization can be viewed as a reflection of the former one, since composition with the mapping $S \mapsto N \setminus S$, $S \subseteq N$, on $\mathcal{P}(N)$ transforms $\mathcal{K}(N) \cap S_\ell(N)$ onto $\mathcal{K}(N) \cap S_u(N)$. Thus, this standardization leads to nonincreasing standardized supermodular functions, which are nonnegative as well.

The third natural option is to take the orthogonal complement of $\mathcal{L}(N)$:

$$ S_o(N) = \Big\{\, m \in \mathbb{R}^{\mathcal{P}(N)} ;\ \sum_{S \subseteq N} m(S) = 0 \ \text{and}\ \sum_{S \subseteq N,\ i \in S} m(S) = 0 \ \text{for}\ i \in N \,\Big\} . $$

Note that every independence model produced by a supermodular function is even produced by a supermodular imset (see Consequence in Section). Thus, the standardizations of imsets mentioned in Section are just the standardizations of supermodular functions. The letters $\ell$, u, o distinguish the types of standardization: $\ell$-standardization means that the "lower" part of the respective diagram of an imset vanishes, u-standardization means that the "upper" part vanishes, and o-standardization means that the respective linear space is the orthogonal complement of $\mathcal{L}(N)$.

Skeletal supermodular functions

A supermodular function $m$ over $N$ will be called skeletal if $\mathcal{M}^m \neq \mathcal{T}(N)$ but there is no supermodular function $r$ over $N$ such that $\mathcal{M}^m \subsetneq \mathcal{M}^r \subsetneq \mathcal{T}(N)$. Thus, a supermodular function is skeletal iff it produces a submaximal independence model. The definition implies that a supermodular function which is model equivalent to a skeletal function is also skeletal. In particular, strong equivalence has the same property. Of course, model equivalence decomposes the collection of skeletal functions into finitely many equivalence classes. The aim of this section is to characterize these equivalence classes. To have a clear geometric view on the problem it is suitable to simplify the situation with the help of the $\ell$-standardization mentioned in Remark.

Introduce the class of $\ell$-standardized supermodular functions $\mathcal{K}_\ell(N) = \mathcal{K}(N) \cap S_\ell(N)$. The basic observation is that $\mathcal{K}(N)$ is a direct sum of $\mathcal{K}_\ell(N)$ and $\mathcal{L}(N)$, in notation $\mathcal{K}(N) = \mathcal{K}_\ell(N) \oplus \mathcal{L}(N)$.

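Observation. $\mathcal{K}_\ell(N) \cap \mathcal{L}(N) = \{0\}$ and every $m \in \mathcal{K}(N)$ has a unique decomposition $m = \tilde m + l$ where $\tilde m \in \mathcal{K}_\ell(N)$ and $l \in \mathcal{L}(N)$.

Proof. Put $l = m(\emptyset) \cdot m^{\emptyset\uparrow} + \sum_{i \in N} \{ m(\{i\}) - m(\emptyset) \} \cdot m^{\{i\}\uparrow}$. By Lemma, $l \in \mathcal{L}(N)$. As $-l \in \mathcal{L}(N) \subseteq \mathcal{K}(N)$, by Observation $\tilde m = m - l \in \mathcal{K}(N)$. The facts $l(\emptyset) = m(\emptyset)$ and $l(\{i\}) = m(\{i\})$ for $i \in N$ imply that $\tilde m$ is $\ell$-standardized. The uniqueness of the decomposition follows from Observation since $\mathcal{L}(N) \cap \mathcal{K}_\ell(N) \subseteq \mathcal{L}(N) \cap S_\ell(N) = \{0\}$.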

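The following lemma summarizes substantial facts concerning $\mathcal{K}_\ell(N)$ (for related concepts see Section).

Lemma. The set $\mathcal{K}_\ell(N)$ is a pointed rational polyhedral cone in $\mathbb{R}^{\mathcal{P}(N)}$. In particular, it has finitely many extreme rays and every extreme ray of $\mathcal{K}_\ell(N)$ contains exactly one nonzero normalized imset (see p.). The set $\mathcal{K}_\ell(N)$ is the conical closure of this collection of normalized imsets.

Proof. To evidence that $\mathcal{K}_\ell(N)$ is a rational polyhedral cone observe that it is the dual cone $F^* = \{\, m \in \mathbb{R}^{\mathcal{P}(N)} ;\ \langle m,u\rangle \ge 0 \ \text{for}\ u \in F \,\}$ to a finite set $F \subseteq \mathbb{Q}^{\mathcal{P}(N)}$, namely to

$$ F = \mathcal{E}(N) \cup \{ \delta_\emptyset, -\delta_\emptyset \} \cup \bigcup_{i \in N} \{ \delta_{\{i\}}, -\delta_{\{i\}} \} . $$

The fact that it is pointed, that is $\mathcal{K}_\ell(N) \cap (-\mathcal{K}_\ell(N)) \subseteq \mathcal{L}(N) \cap S_\ell(N) = \{0\}$, follows from Observation. All remaining statements of Lemma follow from well-known properties of pointed rational polyhedral cones gathered in Section. Observe that every extreme ray of $\mathcal{K}_\ell(N)$ which contains a nonzero element of $\mathbb{Q}^{\mathcal{P}(N)}$ must contain a nonzero element of $\mathbb{Z}^{\mathcal{P}(N)}$, that is a nonzero imset. But only one nonzero imset within the ray is normalized.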

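Skeleton

Let us denote by $\mathcal{K}_\ell^\diamond(N)$ the collection of nonzero normalized imsets belonging to extreme rays of $\mathcal{K}_\ell(N)$ and call this set the $\ell$-skeleton over $N$. It is empty in case $|N| = 1$. The first important observation concerning $\mathcal{K}_\ell^\diamond(N)$ is the following one.

Lemma. An imset $u$ over $N$ is structural iff it is o-standardized and $\langle m,u\rangle \ge 0$ for every $m \in \mathcal{K}_\ell^\diamond(N)$.

Proof. The necessity of the conditions follows from Observations and (i). For sufficiency suppose that $u \in \mathbb{Z}^{\mathcal{P}(N)}$ is o-standardized and $\langle m,u\rangle \ge 0$ for any $m \in \mathcal{K}_\ell^\diamond(N)$. The fact that $u$ is o-standardized means that $\langle m^{\emptyset\uparrow}, u\rangle = 0$ and $\langle m^{\{i\}\uparrow}, u\rangle = 0$ for $i \in N$. Hence, by Lemma, derive that $\langle l,u\rangle = 0$ for every $l \in \mathcal{L}(N)$. The fact $\mathcal{K}_\ell(N) = \mathrm{con}(\mathcal{K}_\ell^\diamond(N))$ (see Lemma) implies that $\langle m,u\rangle \ge 0$ for every $m \in \mathcal{K}_\ell(N)$. Hence, by Observation, $\mathcal{K}(N) = \mathcal{K}_\ell(N) \oplus \mathcal{L}(N)$ implies $\langle m,u\rangle \ge 0$ for every $m \in \mathcal{K}(N)$, i.e. $u$ belongs to the dual cone $\mathcal{K}(N)^*$. However, $\mathcal{K}(N)$ was introduced as the dual cone $\mathcal{E}(N)^*$ in $\mathbb{R}^{\mathcal{P}(N)}$ (see Observation (iii)). This says $u \in \mathcal{E}(N)^{**}$, but $\mathcal{E}(N)^{**}$ is nothing but the conical closure $\mathrm{con}(\mathcal{E}(N))$ (see Section). Hence $u \in \mathrm{con}(\mathcal{E}(N)) \cap \mathbb{Z}^{\mathcal{P}(N)}$ and, by Fact from Section, $u$ is a combination of elementary imsets with nonnegative rational coefficients. Therefore it is a structural imset (see Section).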

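The following consequence of Lemma will be utilized later.

Consequence. Let $u$ be a structural imset over $N$ and $\langle A,B|C\rangle \in \mathcal{T}(N)$. Then $A \perp\!\!\!\perp B \,|\, C\ [u]$ iff

$$ \forall\, r \in \mathcal{K}_\ell^\diamond(N): \quad \langle r, u_{\langle A,B|C\rangle}\rangle > 0 \ \text{implies}\ \langle r,u\rangle > 0 . $$

Proof. Since both $u$ and $v = u_{\langle A,B|C\rangle}$ are o-standardized, $w_k = k \cdot u - v$ is o-standardized for every $k \in \mathbb{N}$. By Lemma, $w_k$ is structural iff $\langle r, k \cdot u - v\rangle \ge 0$ for every $r \in \mathcal{K}_\ell^\diamond(N)$. Thus, by the definition of $\mathcal{M}_u$ (on p.), $A \perp\!\!\!\perp B \,|\, C\ [u]$ iff

$$ \exists\, k \in \mathbb{N}\ \ \forall\, r \in \mathcal{K}_\ell^\diamond(N): \quad k \cdot \langle r,u\rangle \ge \langle r,v\rangle . $$

This clearly implies that

$$ \forall\, r \in \mathcal{K}_\ell^\diamond(N): \quad \langle r,v\rangle > 0 \ \Rightarrow\ \langle r,u\rangle > 0 . $$

Conversely, supposing the latter condition, observe that for every $r \in \mathcal{K}_\ell^\diamond(N)$ there exists $k_r \in \mathbb{N}$ such that $k \cdot \langle r,u\rangle \ge \langle r,v\rangle$ for any $k \in \mathbb{N}$, $k \ge k_r$. Indeed, owing to Observation, $k_r = 1$ in case $\langle r,v\rangle = 0$, and $k_r$ is the least integer greater than $\langle r,v\rangle / \langle r,u\rangle$ in case $\langle r,v\rangle > 0$. As $\mathcal{K}_\ell^\diamond(N)$ is finite one can put $k = \max \{ k_r ;\ r \in \mathcal{K}_\ell^\diamond(N) \}$ to evidence the former condition.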
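The criterion of the Consequence turns the question whether $A \perp\!\!\!\perp B \,|\, C\ [u]$ into finitely many sign checks. The sketch below is restricted to $N = \{a,b,c\}$ and assumes that the $\ell$-skeleton consists of the five imsets $\delta_N$, $\delta_N + \delta_{ab}$, $\delta_N + \delta_{ac}$, $\delta_N + \delta_{bc}$ and $2\delta_N + \delta_{ab} + \delta_{ac} + \delta_{bc}$; this list was recomputed here from the six elementary inequalities and should be treated as an assumption. Sets are written as strings of letters and all helper names are hypothetical.

```python
from itertools import combinations

GROUND = "abc"


def subsets(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def make(coefficients):
    """Set function on subsets of GROUND from e.g. {'abc': 2, 'ab': 1}; other sets get 0."""
    f = {S: 0 for S in subsets(GROUND)}
    for name, value in coefficients.items():
        f[frozenset(name)] += value
    return f


def semi_elementary(A, B, C=""):
    """u_<A,B|C> = delta_ABC + delta_C - delta_AC - delta_BC."""
    u = make({})
    for name, sign in ((A + B + C, 1), (C, 1), (A + C, -1), (B + C, -1)):
        u[frozenset(name)] += sign
    return u


def scalar(m, u):
    """Scalar product <m, u> over all subsets of GROUND."""
    return sum(m[S] * u[S] for S in u)


# Assumed l-skeleton over N = {a, b, c}.
SKELETON = [
    make({"abc": 1}),
    make({"abc": 1, "ab": 1}),
    make({"abc": 1, "ac": 1}),
    make({"abc": 1, "bc": 1}),
    make({"abc": 2, "ab": 1, "ac": 1, "bc": 1}),
]


def independent(u, A, B, C=""):
    """A _||_ B | C [u]: every skeletal r with <r, u_<A,B|C>> > 0 has <r, u> > 0."""
    v = semi_elementary(A, B, C)
    return all(scalar(r, u) > 0 for r in SKELETON if scalar(r, v) > 0)


u = semi_elementary("a", "bc")           # u_<a,bc|0>
print(independent(u, "a", "b"), independent(u, "a", "c", "b"))
print(independent(u, "b", "c"), independent(u, "b", "c", "a"))
```

For $u = u_{\langle a,bc|\emptyset\rangle}$ the test confirms the statements obtained by decomposition and weak union and rejects any independence between $b$ and $c$.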

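An important auxiliary result is the following "separation" lemma.

Lemma. For every $m \in \mathcal{K}_\ell^\diamond(N)$ there exists a structural imset $u \in \mathcal{S}(N)$ such that $\langle m,u\rangle = 0$ and $\langle r,u\rangle > 0$ for any other $r \in \mathcal{K}_\ell^\diamond(N) \setminus \{m\}$. Moreover, for every pair $m, r \in \mathcal{K}_\ell^\diamond(N)$, $m \neq r$, there exists an elementary imset $v \in \mathcal{E}(N)$ such that $\langle m,v\rangle = 0$ and $\langle r,v\rangle > 0$. Consequently, $\mathcal{M}^m \setminus \mathcal{M}^r \neq \emptyset \neq \mathcal{M}^r \setminus \mathcal{M}^m$ for distinct $m, r \in \mathcal{K}_\ell^\diamond(N)$.

Proof. By Lemma, $\mathcal{K}_\ell(N)$ is a pointed rational polyhedral cone. It can be viewed as a cone in $\mathbb{R}^{\mathcal{P}_*(N)}$ where $\mathcal{P}_*(N) = \{\, T \subseteq N ;\ |T| \ge 2 \,\}$. Observe that this change of standpoint does not influence the concept of extreme ray and skeleton. One can apply Lemma from Section to the extreme ray generated by $m$. The respective $q \in \mathbb{Q}^{\mathcal{P}_*(N)}$ can be multiplied by a natural number to get $u \in \mathbb{Z}^{\mathcal{P}_*(N)}$. This integer-valued function on $\mathcal{P}_*(N)$ can be extended to an o-standardized imset over $N$ by means of the formulas

$$ u(\{i\}) = - \sum_{S,\ \{i\} \subset S} u(S) \quad \text{for } i \in N, \qquad u(\emptyset) = - \sum_{S,\ \emptyset \neq S} u(S) . $$

As every element of $\mathcal{K}_\ell^\diamond(N)$ is $\ell$-standardized, the obtained imset $u$ satisfies the required conditions: it is a structural imset by Lemma. The existence of $v \in \mathcal{E}(N)$ is a clear consequence of the existence of $u$, since $n \cdot u = \sum_{v \in \mathcal{E}(N)} k_v \cdot v$ for some $k_v \in \mathbb{Z}^+$, $n \in \mathbb{N}$. Indeed, linearity of the scalar product and the fact $\langle r,u\rangle > 0$ imply that $k_v > 0$ and $\langle r,v\rangle > 0$ for some $v \in \mathcal{E}(N)$. Moreover, $\langle m,v\rangle = 0$ by Observation.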

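However, the main lemma of this section is the following proposition.

Lemma. A function $m \in \mathcal{K}_\ell(N)$ is skeletal iff it is a nonzero function belonging to an extreme ray of $\mathcal{K}_\ell(N)$.

Proof. For necessity suppose that $m$ is skeletal. Then $m \neq 0$ because $\mathcal{M}^m \neq \mathcal{T}(N)$. By Lemma write

$$ m = \sum_{r \in \mathcal{K}_\ell^\diamond(N)} \alpha_r \cdot r \quad \text{for some } \alpha_r \ge 0 . $$

Since $m \neq 0$ there exists $r \in \mathcal{K}_\ell^\diamond(N)$ such that $\alpha_r > 0$. Linearity of the scalar product with the help of Observation says that $\langle m,u\rangle = 0$ implies $\langle r,u\rangle = 0$ for every semi-elementary imset $u$ over $N$. Thus $\mathcal{M}^m \subseteq \mathcal{M}^r$. The fact $r \in \mathcal{K}_\ell^\diamond(N)$ implies $\mathcal{M}^r \neq \mathcal{T}(N)$ by Lemma and Observation. The assumption that $m$ is skeletal forces $\mathcal{M}^m = \mathcal{M}^r$. By Lemma, at most one $r \in \mathcal{K}_\ell^\diamond(N)$ with $\mathcal{M}^r = \mathcal{M}^m$ exists. Thus, the decomposition above says $m = \alpha_r \cdot r$ for some $r \in \mathcal{K}_\ell^\diamond(N)$ and $\alpha_r > 0$.

For sufficiency suppose that $m \neq 0$ belongs to an extreme ray $R$ of $\mathcal{K}_\ell(N)$. The fact $m \neq 0$ implies, by Lemma with the help of Observation, $\mathcal{M}^m \neq \mathcal{T}(N)$. Suppose that $r$ is a supermodular function with $\mathcal{M}^m \subseteq \mathcal{M}^r$. The aim is to show that either $\mathcal{M}^r = \mathcal{M}^m$ or $\mathcal{M}^r = \mathcal{T}(N)$. By Lemma, $r$ is strongly equivalent to an $\ell$-standardized supermodular function. Therefore assume without loss of generality $r \in \mathcal{K}_\ell(N)$. The assumption $\mathcal{M}^m \subseteq \mathcal{M}^r$ says $\langle m,u\rangle = 0 \Rightarrow \langle r,u\rangle = 0$ for every semi-elementary imset $u$ over $N$. Thus, by Observation, $\langle r,u\rangle > 0$ implies $\langle m,u\rangle > 0$. This means that there exists $k_u \in \mathbb{N}$ with $k_u \cdot \langle m,u\rangle \ge \langle r,u\rangle$. Since the class of semi-elementary imsets is finite, there exists $k \in \mathbb{N}$ such that $k \cdot \langle m,u\rangle \ge \langle r,u\rangle$ for every semi-elementary imset $u$ over $N$. By linearity of the scalar product and Observation conclude that $k \cdot m - r$ is supermodular. Since both $m$ and $r$ are $\ell$-standardized, $k \cdot m - r \in \mathcal{K}_\ell(N)$. The assumption that $R$ is an extreme ray of $\mathcal{K}_\ell(N)$ and the decomposition $k \cdot m = (k \cdot m - r) + r$ imply that $r \in R$. Thus $r = \alpha \cdot m$ for some $\alpha \ge 0$. If $\alpha = 0$ then $r = 0$ says $\mathcal{M}^r = \mathcal{T}(N)$; if $\alpha > 0$ then necessarily $\mathcal{M}^r = \mathcal{M}^m$.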

Hence, the desired characterization of model equivalence classes of skeletal functions is obtained.

Consequence. Every class of model equivalence of skeletal supermodular functions over $N$ is characterized by the unique element of the $\ell$-skeleton $\mathcal{K}_\ell^\diamond(N)$ belonging to the class. Given $m \in \mathcal{K}_\ell^\diamond(N)$, the respective equivalence class consists of the functions

$$ \tilde m = \alpha \cdot m + l \quad \text{where } \alpha > 0,\ l \in \mathcal{L}(N) . $$

In particular, every skeletal function is model equivalent to a skeletal imset and $\mathcal{K}_\ell^\diamond(N)$ characterizes all skeletal functions.

Proof. Given a skeletal function $r \in \mathcal{K}(N)$, by Observation and Lemma find the unique strongly equivalent skeletal function $\tilde r \in \mathcal{K}_\ell(N)$ and apply Lemma to find $m \in \mathcal{K}_\ell^\diamond(N)$ and $\alpha > 0$ with $\tilde r = \alpha \cdot m$. The fact that $m$ is the unique model equivalent element of the $\ell$-skeleton follows from Lemma.

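Significance of skeletal imsets

One of the main results of this chapter is the following theorem, which explains the significance of the concept of the $\ell$-skeleton.

Theorem. There exists the least finite set of normalized $\ell$-standardized imsets $\mathcal{N}(N)$ such that for every imset $u$ over $N$

$$ u \ \text{is structural} \iff u \ \text{is o-standardized and}\ \langle m,u\rangle \ge 0 \ \text{for every}\ m \in \mathcal{N}(N) . $$

Moreover, $\mathcal{N}(N)$ is nothing but $\mathcal{K}_\ell^\diamond(N)$.

Proof. Lemma says that $\mathcal{K}_\ell^\diamond(N)$ is a finite set of normalized $\ell$-standardized imsets satisfying the equivalence above. Let $\mathcal{N}(N)$ be any finite class of this type; the aim is to show that $\mathcal{K}_\ell^\diamond(N) \subseteq \mathcal{N}(N)$. Suppose for contradiction that $m \in \mathcal{K}_\ell^\diamond(N) \setminus \mathcal{N}(N)$. By Lemma there exists a structural imset $u$ over $N$ such that $\langle m,u\rangle = 0$ and $\langle r,u\rangle > 0$ for any other $r \in \mathcal{K}_\ell^\diamond(N)$. The basic observation is that $\langle s,u\rangle > 0$ for every $s \in \mathcal{N}(N)$, $s \neq 0$. Indeed, the equivalence above implies, with the help of Observation, that $s$ is supermodular and therefore $s \in \mathcal{K}_\ell(N)$. By Lemma write $s = \sum_{r \in \mathcal{K}_\ell^\diamond(N)} \alpha_r \cdot r$ for $\alpha_r \ge 0$. Observe that $\alpha_r > 0$ for some $r \in \mathcal{K}_\ell^\diamond(N) \setminus \{m\}$, since otherwise $s = \alpha_m \cdot m$ for $\alpha_m > 0$ (as $s \neq 0$) and the fact that both $s$ and $m$ are normalized imsets implies $s = m$, which contradicts $m \notin \mathcal{N}(N)$. Hence, by Observation,

$$ \langle s,u\rangle = \sum_{r' \in \mathcal{K}_\ell^\diamond(N)} \alpha_{r'} \cdot \langle r',u\rangle \ \ge\ \alpha_r \cdot \langle r,u\rangle \ >\ 0 . $$

A further step is to take an o-standardized imset $w$ over $N$ such that $\langle m,w\rangle < 0$ and put $v_k = k \cdot u + w$ for every $k \in \mathbb{N}$. The inequality $\langle m, v_k\rangle = \langle m,w\rangle < 0$ implies by Lemma that $v_k$ is not a structural imset over $N$. On the other hand, for every $s \in \mathcal{N}(N)$ one has $\langle s,u\rangle > 0$ and therefore there exists $k_s \in \mathbb{N}$ with $\langle s, v_{k_s}\rangle = k_s \cdot \langle s,u\rangle + \langle s,w\rangle \ge 0$.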

Figure: $\ell$-skeleton over $N = \{a, b, c\}$ (Hasse diagrams of the skeletal imsets).

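Since $\mathcal{N}(N)$ is finite, there exists $k \in \mathbb{N}$ such that $\langle s, v_k\rangle \ge 0$ for every $s \in \mathcal{N}(N)$. By the fact that $v_k$ is o-standardized and by the equivalence in the theorem derive that $v_k$ is a structural imset, which contradicts the conclusion above. Thus, there is no $m \in \mathcal{K}_\ell^\diamond(N) \setminus \mathcal{N}(N)$ and the desired inclusion $\mathcal{K}_\ell^\diamond(N) \subseteq \mathcal{N}(N)$ was verified.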
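In computational terms the theorem gives a finite test of structurality: check o-standardization and then one inequality per skeletal imset. The sketch below does this for $N = \{a,b,c\}$ only. The list of five skeletal imsets used in `SKELETON` was recomputed here from the elementary inequalities and is an assumption of the example, as are all helper names.

```python
from itertools import combinations

GROUND = "abc"


def subsets(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def make(coefficients):
    """Imset over GROUND from e.g. {'abc': 2, 'ab': 1}; other sets get 0."""
    f = {S: 0 for S in subsets(GROUND)}
    for name, value in coefficients.items():
        f[frozenset(name)] += value
    return f


def semi_elementary(A, B, C=""):
    """u_<A,B|C> = delta_ABC + delta_C - delta_AC - delta_BC."""
    u = make({})
    for name, sign in ((A + B + C, 1), (C, 1), (A + C, -1), (B + C, -1)):
        u[frozenset(name)] += sign
    return u


def combine(u, v, c=1):
    """Return the imset u + c * v."""
    return {S: u[S] + c * v[S] for S in u}


def scalar(m, u):
    """Scalar product <m, u>."""
    return sum(m[S] * u[S] for S in u)


# Assumed l-skeleton over N = {a, b, c}.
SKELETON = [
    make({"abc": 1}),
    make({"abc": 1, "ab": 1}),
    make({"abc": 1, "ac": 1}),
    make({"abc": 1, "bc": 1}),
    make({"abc": 2, "ab": 1, "ac": 1, "bc": 1}),
]


def is_o_standardized(u):
    """Sum of all values is 0 and, for each i, the sum over sets containing i is 0."""
    return (sum(u.values()) == 0 and
            all(sum(v for S, v in u.items() if i in S) == 0 for i in GROUND))


def is_structural(u):
    """Finite test: o-standardized and nonnegative on every skeletal imset."""
    return is_o_standardized(u) and all(scalar(r, u) >= 0 for r in SKELETON)


u1 = semi_elementary("a", "b", "c")
u2 = combine(semi_elementary("a", "b"), semi_elementary("a", "b", "c"), -1)
print(is_structural(u1), is_structural(u2))
```

The second imset, $u_{\langle a,b|\emptyset\rangle} - u_{\langle a,b|c\rangle}$, is o-standardized but has a negative scalar product with $\delta_N$, so it is rejected.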

Remark. The number of elements of the $\ell$-skeleton $\mathcal{K}_\ell^\diamond(N)$ depends on $|N|$. In the trivial case $|N| = 1$ one has $|\mathcal{K}_\ell^\diamond(N)| = 0$. The simplest nontrivial case $|N| = 2$ is not interesting because $|\mathcal{K}_\ell^\diamond(N)| = 1$ then: the cone $\mathcal{K}_\ell(N)$ consists of a single ray generated by $\delta_N$ in that case. However, in case $N = \{a,b,c\}$ the $\ell$-skeleton already has 5 imsets (see Example in); Figure gives their list. Thus, in case $|N| = 3$ one needs to check only 5 inequalities to find out whether an o-standardized imset is structural. In case $|N| = 4$ the $\ell$-skeleton has 37 imsets. Hasse diagrams of ten basic types of skeletal imsets are in the Appendix of; for the proof see. However, the skeleton in case $|N| = 5$ was found by means of a computer; it has 117978 imsets (see and

http://www.utia.cas.cz/user_data/studeny/fivevar.htm

for a related virtual catalogue). Note that the problem of a suitable characterization of skeletal imsets remains open (see Theme in Section).

Remark. The name "skeleton" was inspired by the idea that the collection of extreme rays of $\mathcal{K}_\ell(N)$ can be viewed as an outer skeleton of the cone $\mathcal{K}_\ell(N)$. I used this word in to name the least finite set of normalized imsets defining a pointed rational polyhedral cone as its dual cone, that is the $\ell$-skeleton in case of the conical closure $\mathrm{con}(\mathcal{E}(N))$. Another possible justification is that every independence model produced by a supermodular function is the intersection of submaximal independence models of this type (see Theorem), so that skeletal CI models form a certain generator of the whole lattice of CI models produced by supermodular functions. Note for explanation that some authors interested in graphical models use the word "skeleton" to name the underlying graph of a chain graph (see Section, p.).

Figure: u-skeleton over $N = \{a, b, c\}$ (Hasse diagrams of the u-skeletal imsets).

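Remark. As explained in Remark, $\ell$-standardization is not the only possible way of standardization of supermodular functions. An interesting fact is that all results gathered in Section can be achieved also for u-standardization and o-standardization. Thus, one can introduce the u-skeleton $\mathcal{K}_u^\diamond(N)$ as the uniquely determined finite collection of nonzero normalized imsets belonging to extreme rays of the cone of u-standardized supermodular functions $\mathcal{K}_u(N)$. Analogously, the o-skeleton $\mathcal{K}_o^\diamond(N)$ consists of nonzero normalized imsets from extreme rays of the cone of o-standardized supermodular functions $\mathcal{K}_o(N)$. The point is that extreme rays of $\mathcal{K}_\ell(N)$, $\mathcal{K}_u(N)$ and $\mathcal{K}_o(N)$ correspond to each other: they describe the respective classes of model equivalent skeletal imsets.

Thus, given a skeletal imset $m$, the corresponding element $m_\ell$ of the $\ell$-skeleton can be obtained as follows. Put

$$ \tilde m_\ell = m - m(\emptyset) \cdot m^{\emptyset\uparrow} + \sum_{i \in N} \{ m(\emptyset) - m(\{i\}) \} \cdot m^{\{i\}\uparrow} $$

and then normalize $\tilde m_\ell$, i.e. put $m_\ell = k^{-1} \cdot \tilde m_\ell$ where $k$ is the greatest common divisor of $\{ \tilde m_\ell(S) ;\ S \subseteq N \}$ (see Figure for an example of $m$ and the respective element of the $\ell$-skeleton). Given a skeletal imset $m$ over $N$, put

$$ \nu(i) = m(N) - m(N \setminus \{i\}) \quad \text{for } i \in N, \qquad x = -m(N) + \sum_{i \in N} \nu(i), $$

introduce

$$ \tilde m_u = m + x \cdot m^{\emptyset\uparrow} - \sum_{i \in N} \nu(i) \cdot m^{\{i\}\uparrow} $$

and normalize $\tilde m_u$ to get the respective element of $\mathcal{K}_u^\diamond(N)$. Figure shows the u-skeleton for $N = \{a,b,c\}$. Finally, given a skeletal imset $m$ over $N$, put

$$ \nu(i) = 2 \cdot \sum_{S \subseteq N} m(S) - 4 \cdot \sum_{S \subseteq N,\ i \in S} m(S) \quad \text{for } i \in N $$

and

$$ y = 2 \cdot \sum_{S \subseteq N} |S| \cdot m(S) - (|N| + 1) \cdot \sum_{S \subseteq N} m(S) . $$

Then the formula

$$ \tilde m_o = 2^{|N|} \cdot m + y \cdot m^{\emptyset\uparrow} + \sum_{i \in N} \nu(i) \cdot m^{\{i\}\uparrow} $$

defines an o-standardized imset which, after normalization, yields the respective element of the o-skeleton $\mathcal{K}_o^\diamond(N)$. Figure consists of Hasse diagrams of o-skeletal imsets over $N = \{a,b,c\}$.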

Note that the proofs of the results of Section for the alternative standardizations are analogous. The only noteworthy modification is needed in the proof of Lemma in case of o-standardization. The cone $\mathcal{K}_o(N)$ is viewed as a cone in $\mathbb{R}^{\mathcal{P}_*(N)}$ and, after application of Lemma from Section, the respective $q \in \mathbb{Q}^{\mathcal{P}_*(N)}$ is multiplied to get $u \in \mathbb{Z}^{\mathcal{P}_*(N)}$. Then the o-standardization formula with $u$ in place of $m$ defines the desired o-standardized imset over $N$. The remaining arguments are analogous.
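The u- and o-standardization recipes can be applied pointwise, using $m^{\emptyset\uparrow}(S) = 1$ and $m^{\{i\}\uparrow}(S) = 1$ exactly when $i \in S$. The sketch below is an illustration under assumptions: the numeric constants ($2$, $4$, $|N|+1$, $2^{|N|}$) in the o-standardization were rederived here from the two orthogonality conditions, normalization divides by the greatest common divisor of the values, and the function names are hypothetical.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def subsets(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def u_standardize(m, ground):
    """m + x*1 - sum_i nu(i)*[i in S]; vanishes on N and on all N minus {i}."""
    N = frozenset(ground)
    nu = {i: m[N] - m[N - {i}] for i in N}
    x = -m[N] + sum(nu.values())
    return {S: m[S] + x - sum(nu[i] for i in S) for S in subsets(ground)}


def o_standardize(m, ground):
    """2^|N|*m + y*1 + sum_i nu(i)*[i in S]; orthogonal to all modular functions."""
    all_sets = subsets(ground)
    total = sum(m[S] for S in all_sets)
    nu = {i: 2 * total - 4 * sum(m[S] for S in all_sets if i in S) for i in ground}
    y = 2 * sum(len(S) * m[S] for S in all_sets) - (len(ground) + 1) * total
    return {S: 2 ** len(ground) * m[S] + y + sum(nu[i] for i in S) for S in all_sets}


def normalize(m):
    """Divide an integer-valued function by the gcd of its values (if nonzero)."""
    k = reduce(gcd, (abs(v) for v in m.values()))
    return {S: v // k for S, v in m.items()} if k else dict(m)


N = {"a", "b", "c"}
delta_N = {S: int(len(S) == 3) for S in subsets(N)}   # skeletal imset delta_N

m_u = normalize(u_standardize(delta_N, N))
m_o = normalize(o_standardize(delta_N, N))
print(sorted((len(S), v) for S, v in m_u.items()))
print(sorted((len(S), v) for S, v in m_o.items()))
```

For $m = \delta_N$ this yields $2\delta_\emptyset + \delta_{\{a\}} + \delta_{\{b\}} + \delta_{\{c\}}$ as the u-standardized version and $\delta_\emptyset - \delta_{ab} - \delta_{ac} - \delta_{bc} + 2\delta_N$ as the normalized o-standardized version.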

Description of models by structural imsets

The semigraphoid induced by a structural imset was introduced already in Section on p. The aim of this section is to relate those semigraphoids to the semigraphoids produced by supermodular functions introduced in Section. The first observation is this.

Observation. Let $m$ be a supermodular function over $N$ and $u$ a structural imset over $N$. Then $\langle m,u\rangle = 0$ iff $\mathcal{M}_u \subseteq \mathcal{M}^m$.

Proof. Supposing $\langle m,u\rangle = 0$ and $\langle A,B|C\rangle \in \mathcal{M}_u$, there exists $k \in \mathbb{N}$ such that $k \cdot u - u_{\langle A,B|C\rangle} \in \mathcal{S}(N)$. Write

$$ 0 = k \cdot \langle m,u\rangle = \langle m, k \cdot u\rangle = \langle m, k \cdot u - u_{\langle A,B|C\rangle}\rangle + \langle m, u_{\langle A,B|C\rangle}\rangle . $$

By Observation, both terms on the right-hand side of this equality are nonnegative and must vanish. Thus $\langle m, u_{\langle A,B|C\rangle}\rangle = 0$, which means $\langle A,B|C\rangle \in \mathcal{M}^m$. Conversely, supposing $\mathcal{M}_u \subseteq \mathcal{M}^m$, write $n \cdot u = \sum_{v \in \mathcal{E}(N)} k_v \cdot v$ for $n \in \mathbb{N}$, $k_v \in \mathbb{Z}^+$. For every $v = u_{\langle i,j|K\rangle} \in \mathcal{E}(N)$ such that $k_v > 0$ observe $\langle i,j|K\rangle \in \mathcal{M}_u$ and deduce $\langle m,v\rangle = 0$ by $\mathcal{M}_u \subseteq \mathcal{M}^m$. In particular,

$$ 0 = \sum_{v \in \mathcal{E}(N)} k_v \cdot \langle m,v\rangle = n \cdot \langle m,u\rangle, $$

which implies $\langle m,u\rangle = 0$.

Figure: o-skeleton over $N = \{a, b, c\}$ (Hasse diagrams of the o-skeletal imsets).

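An important auxiliary result is the following.

Lemma. Given a structural imset $u$ over $N$ one has $\mathcal{M}_u = \bigcap_{r \in \mathcal{R}} \mathcal{M}^r = \mathcal{M}^m$ where

$$ \mathcal{R} = \{\, r \in \mathcal{K}_\ell^\diamond(N) ;\ \mathcal{M}_u \subseteq \mathcal{M}^r \,\} \quad \text{and} \quad m = \sum_{r \in \mathcal{R}} r . $$

Proof. The fact $\mathcal{M}_u \subseteq \bigcap_{r \in \mathcal{R}} \mathcal{M}^r$ is evident. For the converse inclusion use Consequence: if $\langle A,B|C\rangle \in \mathcal{T}(N) \setminus \mathcal{M}_u$ then there exists $r \in \mathcal{K}_\ell^\diamond(N)$ such that $\langle r, u_{\langle A,B|C\rangle}\rangle > 0$ and $\langle r,u\rangle = 0$. By Observation, $\mathcal{M}_u \subseteq \mathcal{M}^r$. Thus $r \in \mathcal{R}$ and $\langle A,B|C\rangle \notin \mathcal{M}^r$. The inclusion $\bigcap_{r \in \mathcal{R}} \mathcal{M}^r \subseteq \mathcal{M}^m$ follows from the fact $m = \sum_{r \in \mathcal{R}} r$ by linearity of the scalar product; the converse inclusion can be derived similarly with the help of Observation.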
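The lemma suggests a simple way to pass from a structural imset $u$ to a supermodular imset $m$ producing the same model: collect the skeletal imsets $r$ with $\langle r,u\rangle = 0$ (which, by the Observation above, are exactly those with $\mathcal{M}_u \subseteq \mathcal{M}^r$) and add them up. The sketch below carries this out for $N = \{a,b,c\}$. The skeleton list is an assumption recomputed for this example, the induced model is evaluated through the sign criterion of the earlier Consequence, and all names are hypothetical.

```python
from itertools import combinations, product

GROUND = "abc"


def subsets(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def make(coefficients):
    """Set function on subsets of GROUND from e.g. {'abc': 2, 'ab': 1}."""
    f = {S: 0 for S in subsets(GROUND)}
    for name, value in coefficients.items():
        f[frozenset(name)] += value
    return f


def semi_elementary(A, B, C=""):
    """u_<A,B|C> = delta_ABC + delta_C - delta_AC - delta_BC."""
    u = make({})
    for name, sign in ((A + B + C, 1), (C, 1), (A + C, -1), (B + C, -1)):
        u[frozenset(name)] += sign
    return u


def scalar(m, u):
    """Scalar product <m, u>."""
    return sum(m[S] * u[S] for S in u)


def triplets():
    """All disjoint triplets (A, B, C) over GROUND, written as strings."""
    for labels in product("ABC-", repeat=len(GROUND)):
        yield tuple("".join(x for x, lab in zip(GROUND, labels) if lab == part)
                    for part in "ABC")


# Assumed l-skeleton over N = {a, b, c}.
SKELETON = [
    make({"abc": 1}),
    make({"abc": 1, "ab": 1}),
    make({"abc": 1, "ac": 1}),
    make({"abc": 1, "bc": 1}),
    make({"abc": 2, "ab": 1, "ac": 1, "bc": 1}),
]


def induced_model(u):
    """M_u via the skeleton criterion: <r, u_T> > 0 implies <r, u> > 0."""
    return {t for t in triplets()
            if all(scalar(r, u) > 0 for r in SKELETON
                   if scalar(r, semi_elementary(*t)) > 0)}


def produced_model(m):
    """M^m: triplets T with <m, u_T> = 0."""
    return {t for t in triplets() if scalar(m, semi_elementary(*t)) == 0}


# Example: u = u_<a,b|0> + u_<a,b|c>.
u = {S: semi_elementary("a", "b")[S] + semi_elementary("a", "b", "c")[S]
     for S in subsets(GROUND)}
R = [r for r in SKELETON if scalar(r, u) == 0]
m = {S: sum(r[S] for r in R) for S in subsets(GROUND)}

print(len(R), induced_model(u) == produced_model(m))
```

In this example two skeletal imsets are selected and $m = 2\delta_N + \delta_{ac} + \delta_{bc}$ produces exactly the model induced by $u$.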

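A substantial fact is the following proposition.

Consequence. Given $\mathcal{M} \subseteq \mathcal{T}(N)$, the following four conditions are equivalent:

(i) $\mathcal{M} = \mathcal{M}^m$ for a supermodular function $m$ over $N$,
(ii) $\mathcal{M} = \mathcal{M}_u$ for a combinatorial imset $u$ over $N$,
(iii) $\mathcal{M} = \mathcal{M}_u$ for a structural imset $u$ over $N$,
(iv) $\mathcal{M} = \mathcal{M}^m$ for a supermodular $\ell$-standardized imset $m$ over $N$.

Proof. For (i) $\Rightarrow$ (ii) put $u = \sum_{\langle A,B|C\rangle \in \mathcal{M}} u_{\langle A,B|C\rangle}$. As a combination of semi-elementary imsets, $u$ is a combinatorial imset. For every $\langle A,B|C\rangle \in \mathcal{M}$ observe that $u - u_{\langle A,B|C\rangle}$ is a combinatorial imset and therefore $A \perp\!\!\!\perp B \,|\, C\ [u]$. Thus $\mathcal{M} \subseteq \mathcal{M}_u$. For the converse inclusion observe $\langle m,u\rangle = 0$ and use Observation. The implication (iii) $\Rightarrow$ (iv) is an easy consequence of Lemma; (ii) $\Rightarrow$ (iii) and (iv) $\Rightarrow$ (i) are evident.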

Now, the main result of this chapter can be easily derived.

Theorem. Let $P$ be a probability measure over $N$ with finite multiinformation. Then there exists a structural imset $u$ over $N$ such that $P$ is perfectly Markovian with respect to $u$, that is $\mathcal{M}_P = \mathcal{M}_u$.

Proof. By Consequence on p. there exists a supermodular function $m$ over $N$ such that $\mathcal{M}_P = \mathcal{M}^m$. By Consequence, $\mathcal{M}^m = \mathcal{M}_u$ for a structural imset $u$ over $N$.

Remark. Going back to the motivation account from Section, Theorem means that structural imsets solve satisfactorily the theoretical question of completeness. The answer is affirmative: every CI structure induced by a probability measure with finite multiinformation can be described by a structural imset. On the other hand, the natural price for this achievement is that structural imsets describe some "superfluous" semigraphoids. That means, there are semigraphoids induced by structural imsets which are not induced by discrete probability measures, as Example on p. shows (the left-hand picture of Figure depicts the respective structural imset). In particular, another theoretical question, that of faithfulness from Section, has a negative answer.

However, mathematical objects which answer affirmatively both the faithfulness and the completeness question are not advisable because they cannot solve satisfactorily the practical question of implementation (see Section, p.). These objects must be difficult to handle by a computer, as the lattice of probabilistic CI models is quite complicated. For example, in case of variables there exist meet-irreducible models which are not coatoms (= submaximal models), which makes implementation complicated. The asset of structural imsets is that the lattice of models induced by them is fairly elegant and gives a chance of efficient computer implementation.

Galois connection

The relation of both methods of description of CI models mentioned in this chapter can be lucidly explained from the point of view of the theory of formal concept analysis. This approach, developed in, is a specific application of the theory of complete lattices to (conceptual) data analysis and knowledge processing. Because of its philosophical roots, formal concept analysis is very near to human conceptual thinking. The most important mathematical notion behind this approach is the well-known notion of Galois connection. This view helps one to interpret the relation of structural imsets and supermodular imsets (functions) as a duality relation. I hope that the presentation with the help of Galois connection will make the theory of structural CI models easily understandable for readers.

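Formal concept analysis

Let me recall the basic ideas of Chapter of. A formal context consists of the following items:

the set of objects $\mathcal{O}$,
the set of attributes $\mathcal{A}$,
a binary incidence relation $I \subseteq \mathcal{O} \times \mathcal{A}$ between objects and attributes.

If $(x,y) \in I$ for $x \in \mathcal{O}$, $y \in \mathcal{A}$, then write $x \circ y$ and say that the object $x$ has the attribute $y$. In general, Galois connection is defined for a pair of posets (Section of Chapter IV). However, in the treated special case Galois connection can be introduced as a pair of mappings between the power sets of $\mathcal{O}$ and $\mathcal{A}$ (which are posets with respect to inclusion):

$$ X \subseteq \mathcal{O} \ \longmapsto\ X^{\triangleright} = \{\, y \in \mathcal{A} ;\ x \circ y \ \text{for every}\ x \in X \,\}, $$
$$ Y \subseteq \mathcal{A} \ \longmapsto\ Y^{\triangleleft} = \{\, x \in \mathcal{O} ;\ x \circ y \ \text{for every}\ y \in Y \,\} . $$

Thus, $X^{\triangleright}$ is the set of attributes common to the objects in $X$ while $Y^{\triangleleft}$ is the set of objects which have all the attributes in $Y$. Clearly, $X_1 \subseteq X_2$ implies $X_1^{\triangleright} \supseteq X_2^{\triangleright}$ and $Y_1 \subseteq Y_2$ implies $Y_1^{\triangleleft} \supseteq Y_2^{\triangleleft}$. The consequence is that the mapping $X \mapsto X^{\triangleright\triangleleft}$ is a closure operation on subsets of $\mathcal{O}$ and the mapping $Y \mapsto Y^{\triangleleft\triangleright}$ is a closure operation on subsets of $\mathcal{A}$.

By a formal concept of the context $(\mathcal{O}, \mathcal{A}, I)$ is understood a pair $(X, Y)$ with $X \subseteq \mathcal{O}$, $Y \subseteq \mathcal{A}$, $X^{\triangleright} = Y$ and $Y^{\triangleleft} = X$. The set $X$ is called the extent and the set $Y$ the intent of the concept. Observe that the concept is uniquely determined either by its extent, that is the list of objects forming the concept, or by its intent, which is the list of attributes (properties) which characterize the concept. It reflects two different philosophical-methodological ways of defining concepts: constructive and descriptive definitions.

One says that the concept $(X_1, Y_1)$ is a subconcept of the concept $(X_2, Y_2)$, and writes $(X_1, Y_1) \le (X_2, Y_2)$, if $X_1 \subseteq X_2$. The basic properties of Galois connection and the definition of the notion of formal concept imply that $X_1 \subseteq X_2$ iff $Y_1 \supseteq Y_2$. Thus, the class of all concepts of a given context $(\mathcal{O}, \mathcal{A}, I)$ is a poset ordered by the relation $\le$. In fact, it is a complete lattice (see Theorem in Chapter of) where the supremum and infimum of two concepts are defined as follows:

$$ (X_1, Y_1) \vee (X_2, Y_2) = \big( (X_1 \cup X_2)^{\triangleright\triangleleft},\ Y_1 \cap Y_2 \big), $$
$$ (X_1, Y_1) \wedge (X_2, Y_2) = \big( X_1 \cap X_2,\ (Y_1 \cup Y_2)^{\triangleleft\triangleright} \big) . $$

The lattice is called the concept lattice.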

Remark. Note that it follows from the properties of Galois connection that the above-mentioned concept lattice is order-isomorphic to the poset $\{\, X \subseteq \mathcal{O} ;\ X = X^{\triangleright\triangleleft} \,\}$ ordered by inclusion. Thus, the lattice can be described only in terms of objects with the help of the closure operation $X \mapsto X^{\triangleright\triangleleft}$ on subsets of $\mathcal{O}$. However, for an analogous reason, the same concept lattice is order-isomorphic to the poset $\{\, Y \subseteq \mathcal{A} ;\ Y = Y^{\triangleleft\triangleright} \,\}$ ordered by reversed inclusion. This means that the lattice can be described dually in terms of attributes and the respective closure operation $Y \mapsto Y^{\triangleleft\triangleright}$ as well (this closure operation induces ordinary inclusion ordering on $\{\, Y \subseteq \mathcal{A} ;\ Y = Y^{\triangleleft\triangleright} \,\}$, see Section). Thus, the same mathematical structure can be described from two different points of view: in terms of objects or in terms of attributes. This again corresponds to two different methodological ways of describing relations between concepts. The message of Section is that the relation between the description of CI models in terms of structural imsets and in terms of supermodular functions is just the relation of this kind. On the other hand, the roles of objects and attributes in a formal context are evidently exchangeable. See Figure for illustration.

Figure: Galois connection (informal illustration): a set of objects $X$ with its set of common attributes, and a set of attributes $Y$ with the set of objects having them, linked by the incidence relation.

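Lattice of structural models

Let us introduce the class $\mathcal{U}(N)$ of structural independence models:

$$\mathcal{U}(N) = \{\, \mathcal{M} \subseteq \mathcal{T}(N) \,;\ \mathcal{M} = \mathcal{M}_u \ \text{for a structural imset } u \text{ over } N \,\}.$$

Consequence implies that it coincides with the class of formal independence models produced by supermodular functions:

$$\mathcal{U}(N) = \{\, \mathcal{M} \subseteq \mathcal{T}(N) \,;\ \mathcal{M} = \mathcal{M}^m \ \text{for a supermodular function } m \text{ over } N \,\}.$$

The class $\mathcal{U}(N)$ is naturally ordered by inclusion $\subseteq$. The main result of this section says that $(\mathcal{U}(N), \subseteq)$ is a finite concept lattice. Indeed, the respective formal context can be constructed as follows:

$$\text{Œ} = \mathcal{E}(N), \quad \text{Æ} = \mathcal{K}_{\ell}^{\diamond}(N) \quad \text{and} \quad u \,\iota\, m \ \text{ iff } \ \langle m, u \rangle = 0 \quad \text{for } u \in \mathcal{E}(N),\ m \in \mathcal{K}_{\ell}^{\diamond}(N),$$

where $\mathcal{E}(N)$ is the class of elementary imsets and $\mathcal{K}_{\ell}^{\diamond}(N)$ is the $\ell$-skeleton. Figure gives an example of this formal context in case $N = \{a, b, c\}$. The following theorem summarizes the results.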

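Theorem. The poset $(\mathcal{U}(N), \subseteq)$ is a finite concept lattice which is, moreover, both atomistic and coatomistic. The null of $\mathcal{U}(N)$ is $\mathcal{T}_{\emptyset}(N)$, the model induced by the zero structural imset. The atoms of $\mathcal{U}(N)$ are just the models induced by elementary imsets $\mathcal{M}_v$, $v \in \mathcal{E}(N)$. The coatoms of $\mathcal{U}(N)$ are just the models produced by skeletal supermodular functions $\mathcal{M}^m$, $m \in \mathcal{K}_{\ell}^{\diamond}(N)$. The unit of $\mathcal{U}(N)$ is $\mathcal{T}(N)$, the model produced by any modular set function $l \in \mathcal{L}(N)$.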

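Figure: Formal context in case $N = \{a, b, c\}$. The rows (objects) are the elementary imsets $u_{\langle b,c|a\rangle}$, $u_{\langle a,c|b\rangle}$, $u_{\langle a,b|c\rangle}$, $u_{\langle a,b|\emptyset\rangle}$, $u_{\langle a,c|\emptyset\rangle}$, $u_{\langle b,c|\emptyset\rangle}$ from $\mathcal{E}(N)$, the columns (attributes) are the elements of $\mathcal{K}_{\ell}^{\diamond}(N)$, and every row is incident with three columns.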

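Proof. The first observation is that $(\mathcal{U}(N), \subseteq)$ is a complete lattice. Indeed, it suffices to show that every subset of $\mathcal{U}(N)$ has an infimum (see Section ). Let us use the description of $\mathcal{U}(N)$ by means of supermodular functions for this purpose. Given supermodular functions $m_1, \ldots, m_n$, $n \geq 1$, the function $m = \sum_{i=1}^{n} m_i$ defines a supermodular function such that $\mathcal{M}^m = \bigcap_{i=1}^{n} \mathcal{M}^{m_i}$. This follows from Observation . The infimum of the empty subset of $\mathcal{U}(N)$ is $\mathcal{T}(N)$ (see Lemma ).

The second observation is that $\{ \mathcal{M}_v \,;\ v \in \mathcal{E}(N) \}$ is supremum-dense in $(\mathcal{U}(N), \subseteq)$:

$$\forall\, \mathcal{M} \in \mathcal{U}(N) \qquad \mathcal{M} = \sup \{ \mathcal{M}_u \,;\ u \in Q \} \quad \text{where } Q = \{\, u_{\langle i,j|K \rangle} \in \mathcal{E}(N) \,;\ \langle i,j|K \rangle \in \mathcal{M} \,\}.$$

Indeed, evidently $\mathcal{M}_v \subseteq \mathcal{M}$ for every $v \in Q$ (cf. Lemma ). On the other hand, supposing $\mathcal{K} \in \mathcal{U}(N)$ satisfies $\mathcal{M}_v \subseteq \mathcal{K}$ for every $v \in Q$, every elementary independence statement from $\mathcal{M}$ belongs to $\mathcal{K}$. Then by Lemma conclude $\mathcal{M} \subseteq \mathcal{K}$, which implies the displayed formula.

The third observation is that $\{ \mathcal{M}^m \,;\ m \in \mathcal{K}_{\ell}^{\diamond}(N) \}$ is infimum-dense in $(\mathcal{U}(N), \subseteq)$:

$$\forall\, \mathcal{M} \in \mathcal{U}(N) \qquad \mathcal{M} = \inf \{ \mathcal{M}^m \,;\ m \in R \} \quad \text{where } R = \{\, r \in \mathcal{K}_{\ell}^{\diamond}(N) \,;\ \mathcal{M} \subseteq \mathcal{M}^r \,\}.$$

Indeed, by the description of $\mathcal{U}(N)$ through supermodular functions one can apply Lemma to $\mathcal{M}$ and observe that $\mathcal{M} = \bigcap_{m \in R} \mathcal{M}^m$. This clearly implies the displayed formula. To show that $(\mathcal{U}(N), \subseteq)$ is even a concept lattice one can use Theorem in Chapter of . The theorem says that to show that $(\mathcal{U}(N), \subseteq)$ is order-isomorphic to the concept lattice of a formal context (Œ, Æ, $\iota$) it suffices to show that there exist mappings $\gamma: \text{Œ} \to \mathcal{U}(N)$ and $\mu: \text{Æ} \to \mathcal{U}(N)$ such that

(a) $\gamma(\text{Œ})$ is supremum-dense in $(\mathcal{U}(N), \subseteq)$,

(b) $\mu(\text{Æ})$ is infimum-dense in $(\mathcal{U}(N), \subseteq)$,

(c) $u \,\iota\, m \Leftrightarrow \gamma(u) \subseteq \mu(m)$ for every $u \in \text{Œ}$, $m \in \text{Æ}$.

Let us introduce the formal context as described before the theorem and define $\gamma$ and $\mu$ as follows: $\gamma$ ascribes $\mathcal{M}_v$ to every $v \in \mathcal{E}(N)$ (see p. ) and $\mu$ ascribes $\mathcal{M}^m$ to every $m \in \mathcal{K}_{\ell}^{\diamond}(N)$ (see p. ). The condition (a) follows from the second observation, the condition (b) from the third observation, and the condition (c) follows directly from Observation . Thus, $(\mathcal{U}(N), \subseteq)$ is a concept lattice.

The next observation is that $\mathcal{T}_{\emptyset}(N)$ is the null of $\mathcal{U}(N)$. By Observation , $\mathcal{T}_{\emptyset}(N) \in \mathcal{U}(N)$. Every $\mathcal{M} \in \mathcal{U}(N)$ is by Lemma a semigraphoid and therefore $\mathcal{T}_{\emptyset}(N) \subseteq \mathcal{M}$.

To show that every $\mathcal{M}_v$, $v \in \mathcal{E}(N)$, is an atom of $\mathcal{U}(N)$ observe that $\mathcal{M}_v \in \mathcal{U}(N)$ by the definition of $\mathcal{U}(N)$ and assume $\mathcal{M} \in \mathcal{U}(N)$, $\mathcal{M} \subseteq \mathcal{M}_v$. If $v = u_{\langle i,j|K \rangle}$ then by Lemma obtain $\mathcal{M}_v = \{ \langle i,j|K \rangle, \langle j,i|K \rangle \} \cup \mathcal{T}_{\emptyset}(N)$. As $\mathcal{M}$ is a semigraphoid, $\mathcal{T}_{\emptyset}(N) \subseteq \mathcal{M}$. If $\mathcal{M} \neq \mathcal{T}_{\emptyset}(N)$ then either $\langle i,j|K \rangle$ or $\langle j,i|K \rangle$ belongs to $\mathcal{M}$, which implies by the symmetry property $\mathcal{M} = \mathcal{M}_v$. The above mentioned fact implies with help of the second observation that $\mathcal{U}(N)$ is an atomistic lattice.

The fact that $\mathcal{T}(N)$ is the unit of $\mathcal{U}(N)$ is evident: $\mathcal{T}(N) \in \mathcal{U}(N)$ by Lemma . To show that every $\mathcal{M}^s$, $s \in \mathcal{K}_{\ell}^{\diamond}(N)$, is a coatom of $\mathcal{U}(N)$ observe that $\mathcal{M}^s \in \mathcal{U}(N)$ by the description of $\mathcal{U}(N)$ through supermodular functions and assume $\mathcal{M} \in \mathcal{U}(N)$, $\mathcal{M}^s \subseteq \mathcal{M}$. By the same description and Lemma write $\mathcal{M} = \bigcap_{r \in R} \mathcal{M}^r$ where $R = \{ r \in \mathcal{K}_{\ell}^{\diamond}(N) \,;\ \mathcal{M} \subseteq \mathcal{M}^r \}$. If $R \setminus \{s\} \neq \emptyset$ then $\mathcal{M}^s \subseteq \mathcal{M} \subseteq \mathcal{M}^r$ for some $r \in \mathcal{K}_{\ell}^{\diamond}(N)$, $r \neq s$, which contradicts the fact $\mathcal{M}^s \setminus \mathcal{M}^r \neq \emptyset$ implied by Lemma . Therefore $R \subseteq \{s\}$: if $R = \emptyset$ then $\mathcal{M} = \mathcal{T}(N)$, if $R = \{s\}$ then $\mathcal{M} = \mathcal{M}^s$. The above fact implies together with the third observation that $\mathcal{U}(N)$ is a coatomistic lattice.

To show that $\{ \mathcal{M}_v \,;\ v \in \mathcal{E}(N) \}$ are all the atoms of $\mathcal{U}(N)$, realize that every atom is join-irreducible and use the well-known fact that every supremum-dense set must contain all join-irreducible elements (see Section in Chapter of ). Indeed, $\{ \mathcal{M}_v \,;\ v \in \mathcal{E}(N) \}$ is supremum-dense in $\mathcal{U}(N)$ by the second observation. Analogously, the fact that $\{ \mathcal{M}^m \,;\ m \in \mathcal{K}_{\ell}^{\diamond}(N) \}$ are all the coatoms of $\mathcal{U}(N)$ follows from the third observation and the fact that every infimum-dense subset must contain all meet-irreducible elements, in particular coatoms.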
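The construction can be checked numerically in the case $N = \{a, b, c\}$. The sketch below rests on an assumption that is not taken verbatim from the text: the five skeletal imsets are reconstructed by hand as $\delta_N$, $\delta_N + \delta_{ab}$, $\delta_N + \delta_{ac}$, $\delta_N + \delta_{bc}$ and $2\delta_N + \delta_{ab} + \delta_{ac} + \delta_{bc}$ (the extreme rays of the cone of supermodular functions vanishing on sets of cardinality at most one). Sets are encoded as sorted strings and all names are illustrative.

```python
from itertools import combinations


def key(*parts):
    """Encode a subset of N as a sorted string, e.g. key('b', 'a') == 'ab'."""
    return "".join(sorted(set("".join(parts))))


def elementary(i, j, K=""):
    """Elementary imset u_<i,j|K> as a dict: set -> coefficient."""
    return {key(i, j, K): 1, key(K): 1, key(i, K): -1, key(j, K): -1}


def scalar(m, u):
    """Scalar product <m, u> of a set function m and an imset u."""
    return sum(m.get(S, 0) * c for S, c in u.items())


# Objects: the six elementary imsets over {a, b, c}.
E = {
    "ab|": elementary("a", "b"), "ac|": elementary("a", "c"),
    "bc|": elementary("b", "c"), "ab|c": elementary("a", "b", "c"),
    "ac|b": elementary("a", "c", "b"), "bc|a": elementary("b", "c", "a"),
}
# Attributes: assumed skeleton (hand-derived, see the lead-in).
K = {
    "N": {"abc": 1},
    "N+ab": {"abc": 1, "ab": 1},
    "N+ac": {"abc": 1, "ac": 1},
    "N+bc": {"abc": 1, "bc": 1},
    "2N+ab+ac+bc": {"abc": 2, "ab": 1, "ac": 1, "bc": 1},
}
# Incidence: u is related to m iff <m, u> = 0.
INCIDENCE = {(u, m) for u in E for m in K if scalar(K[m], E[u]) == 0}


def extents():
    """Extents of all concepts: objects common to each subset of attributes."""
    found = set()
    names = sorted(K)
    for size in range(len(names) + 1):
        for Y in combinations(names, size):
            found.add(frozenset(u for u in E if all((u, m) in INCIDENCE for m in Y)))
    return found


NUMBER_OF_CONCEPTS = len(extents())
```

Under these assumptions every elementary imset is incident with exactly three skeletal imsets and the context has 22 concepts, each extent being the set of elementary statements of one structural model over three variables.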

Remark. However, the formal context introduced above is not the only option. For example, one can alternatively take combinatorial imsets in place of Œ and combinations of skeletal imsets with non-negative integer coefficients in place of Æ, with the incidence relation $\iota$ defined in the same way as above. The second option is to put $\text{Œ} = \mathcal{S}(N)$ and $\text{Æ} = \mathcal{K}(N) \cap \mathbb{Z}^{\mathcal{P}(N)}$ (see Figure for illustration). The third option is $\text{Œ} = \mathrm{con}(\mathcal{E}(N))$ and $\text{Æ} = \mathcal{K}(N)$. Moreover, one can consider an alternative standardization instead of $\ell$-standardization (see Remark ). A specific combined option is $\text{Œ} = \mathcal{E}(N)$ and $\text{Æ} = \mathcal{K}(N)$.

On the other hand, the formal context introduced above is distinguished in a certain sense. Theorem implies that every object of this context defines a join-irreducible concept and every attribute of this context defines a meet-irreducible concept. Thus, the context is reduced in the sense of Definition in Chapter of . A formal context of this type is unique for a given finite concept lattice up to the respective isomorphism of formal contexts (see Proposition in Chapter of ).

Another point of view on the lattice $(\mathcal{U}(N), \subseteq)$ is the following. It is order-isomorphic to the class of structural imsets $\mathcal{S}(N)$ factorized with respect to the corresponding facial equivalence (see Chapter ). This understanding is suitable from the computational point of view since the operation of supremum in the lattice corresponds to summing of structural imsets (see Consequence in Section ). A dual point of view is also possible: elements of $\mathcal{K}(N) \cap \mathbb{Z}^{\mathcal{P}(N)}$ factorized with respect to the corresponding equivalence can be taken into consideration. The following observation says that the infimum is realizable by means of summing supermodular functions (imsets).

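Observation. Let $R$ be a finite set of supermodular functions over $N$. Then

$$\inf \{ \mathcal{M}^m \,;\ m \in R \} = \bigcap_{m \in R} \mathcal{M}^m = \mathcal{M}^r \qquad \text{where } r = \sum_{m \in R} m.$$

Proof. To show $\mathcal{M}^r \subseteq \mathcal{M}^m$ for $m \in R$ take $\langle A,B|C \rangle \in \mathcal{M}^r$, write $0 = \langle r, u_{\langle A,B|C \rangle} \rangle = \sum_{m \in R} \langle m, u_{\langle A,B|C \rangle} \rangle$ and use Observation (ii), by which every summand is non-negative. The same equality can be used to show that for every supermodular function $s$ over $N$ the requirement $\mathcal{M}^s \subseteq \mathcal{M}^m$ for $m \in R$ implies $\mathcal{M}^s \subseteq \mathcal{M}^r$.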
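The statement can be illustrated numerically over $N = \{a, b, c\}$. The following sketch assumes that the model produced by a supermodular function $m$ consists of the disjoint triplets $\langle A,B|C \rangle$ with $m(ABC) + m(C) - m(AC) - m(BC) = 0$; the two supermodular functions (indicators of the supersets of $\{a,b\}$ and of $\{b,c\}$) and all helper names are chosen only for illustration.

```python
from itertools import product

N = "abc"


def disjoint_triplets(variables):
    """All triplets (A, B, C) of pairwise disjoint subsets of the variables."""
    for labels in product("ABC-", repeat=len(variables)):
        A = frozenset(x for x, l in zip(variables, labels) if l == "A")
        B = frozenset(x for x, l in zip(variables, labels) if l == "B")
        C = frozenset(x for x, l in zip(variables, labels) if l == "C")
        yield A, B, C


def product_with_semi_elementary(m, A, B, C):
    """<m, u_<A,B|C>> = m(ABC) + m(C) - m(AC) - m(BC)."""
    return m(A | B | C) + m(C) - m(A | C) - m(B | C)


def produced_model(m):
    """Disjoint triplets on which the scalar product vanishes."""
    return {t for t in disjoint_triplets(N) if product_with_semi_elementary(m, *t) == 0}


def m1(S):
    """Indicator of the supersets of {a, b}; a supermodular function."""
    return 1 if frozenset("ab") <= S else 0


def m2(S):
    """Indicator of the supersets of {b, c}; a supermodular function."""
    return 1 if frozenset("bc") <= S else 0


def r(S):
    """The sum of the two functions."""
    return m1(S) + m2(S)


MODEL_OF_SUM = produced_model(r)
INTERSECTION = produced_model(m1) & produced_model(m2)
```

Here the model produced by the sum coincides with the intersection of the two produced models; for instance it contains $\langle a,c|\emptyset \rangle$ but neither $\langle a,b|\emptyset \rangle$ nor $\langle b,c|\emptyset \rangle$.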

Remark. The lattice $\mathcal{U}(N)$ is also order-isomorphic to a face lattice (see Section in ), namely the lattice of faces of a certain polyhedral cone. For example, one can take the cone $\mathrm{con}(\mathcal{E}(N)) \subseteq \mathbb{R}^{\mathcal{P}(N)}$. Another option is the cone $\mathcal{K}_{\ell}(N)$ endowed with reverse inclusion, respectively the cones $\mathcal{K}_{u}(N)$, $\mathcal{K}_{o}(N)$. In fact, the original terminology from was motivated by this point of view (see Remark on p. ).

Example. The lattice $\mathcal{U}(N)$ has only one element in case $|N| = 1$, namely $\mathcal{T}_{\emptyset}(N) = \mathcal{T}(N)$, and only two elements in case $|N| = 2$, namely $\mathcal{T}_{\emptyset}(N)$ and $\mathcal{T}(N)$. However, it has 22 elements in case $N = \{a, b, c\}$; the respective Hasse diagram is shown in Figure . Every node of the diagram contains a schematic description of the respective independence model with help of elementary independence statements. Note that Figure also shows the lattice of semigraphoids over $\{a, b, c\}$ as they coincide with structural independence models in this case. In fact, every structural model over $\{a, b, c\}$ is even a CI model. The

number of structural mo dels over variables is

Observation. Let $u$ be a structural imset over $N$ with $|N| \leq 3$. Then there exists a discrete probability measure $P$ over $N$ such that $\mathcal{M}_u = \mathcal{M}_P$.

Proof. This is an easy consequence of the fact that $\mathcal{U}(N)$ is coatomistic (see Theorem ) and Lemma from Section . Six respective constructions of perfectly Markovian measures for the unit and the coatoms of $\mathcal{U}(N)$ (see Figure ) were already given (see Observation , Observation , Example and Example ).

Remark. This is to explain the relation of this theory to the polymatroidal description of CI models used in . A polymatroid is defined as a non-decreasing submodular function $h: \mathcal{P}(N) \to \mathbb{R}$ such that $h(\emptyset) = 0$. Some polymatroids can be obtained as multiples of entropy functions of discrete measures relative to the counting measure mentioned in Remark (see ). The formal independence model induced by a polymatroid $h$ consists of $\langle A,B|C \rangle \in \mathcal{T}(N)$ such that $\langle h, u_{\langle A,B|C \rangle} \rangle = 0$. Since $-h$ is a supermodular function, there is no difference between the model induced by a polymatroid $h$ and the model produced by the supermodular function $-h$. Thus, models induced by polymatroids are just the structural independence models. There is a one-to-one correspondence between certain polymatroids and $u$-standardized supermodular functions (see § in ).

Figure: Concept lattice of CI models over $N = \{a, b, c\}$ (rotated).

Chapter

Markov equivalence

This chapter deals with the implication and equivalence problems for structural imsets. First, the question of how to understand the concept of Markov equivalence and implication is discussed and two types of equivalence are compared. The rest of the chapter is devoted to the stronger type of equivalence, called facial equivalence, and to the respective implication between structural imsets. Two characterizations of facial implication, which are analogous to the graphical characterizations of Markov equivalence mentioned in Chapter , are given and related implementation tasks are discussed.

Two concepts of equivalence

Basically, there are two different ways of defining the concept of Markov equivalence for graphs, which appear to be equivalent in case of classic graphical models (e.g. UG models, DAG models and CG models, see Sections ). The first option is distribution equivalence, which is the requirement that the classes of Markovian measures over $N$ within a certain distribution framework coincide. By a distribution framework is understood a class $\Psi$ of probability measures over $N$. Thus, distribution equivalence is always understood relative to a distribution framework $\Psi$.

The second option is model equivalence, which is the requirement that the induced formal independence models coincide. This type of equivalence is not related to a distribution framework. Clearly, because of the definition of Markovian measure, model equivalence implies distribution equivalence. The converse is true in case of faithfulness (see Section , p. ). That is, if a perfectly Markovian measure within the considered distribution framework $\Psi$ exists for every graph from the respective class of graphs, then distribution equivalence relative to $\Psi$ implies model equivalence. This is the case of classic chain graphs relative to the class of discrete measures (see Section ) and the case of alternative chain graphs relative to the class of nondegenerate Gaussian measures (see Section ). Nevertheless, distribution and model equivalence coincide even under the weaker assumption that the considered class of measures $\Psi$ is perfect for every graph (see Remark on p. ). On the other hand, if the distribution framework is somehow limited then it may happen that model and distribution equivalence differ. For example, it happens in case that the class $\Psi$ of measures with prescribed one-dimensional marginals $P_i$ on fixed measurable spaces $(X_i, \mathcal{X}_i)$, $i \in N$, is considered and $P_k$ is a degenerate measure for every $k \in M$, where $\emptyset \neq M \subset N$, which means that $P_k(A) \in \{0, 1\}$ for every $A \in \mathcal{X}_k$. Then $i \perp\!\!\!\perp j \,|\, K \ [P]$ for every $P \in \Psi$ and every elementary triplet $\langle i,j|K \rangle$ over $N$ with $i \in M$ (use Lemma ), and one can show that all undirected graphs over $N$ which have the same induced subgraph for $N \setminus M$ are distribution equivalent.

Remark. Note that one may consider even a third type of equivalence of graphs, namely parametrization equivalence. This approach is based on the following interpretation of some types of graphs, e.g. ancestral graphs and joint-response chain graphs. Every edge of a graph of this type represents a real parameter, a specific distribution framework $\Psi$ (usually the class of nondegenerate Gaussian measures over $N$) is considered, and every collection of edge-parameters uniquely determines a probability measure from $\Psi$ factorized in a particular way. Every graph of this type is then identified with the class of parametrized distributions, which often coincides with the class of Markovian distributions within $\Psi$, e.g. in case of maximal ancestral graphs . Two graphs can be called parametrization equivalent if their classes of parametrized distributions coincide. Of course, parametrization equivalence substantially depends on the considered distribution framework and may not coincide with distribution (Markov) equivalence, for example in case of general ancestral graphs.

The mentioned point of view motivates a general question whether some structural imsets may lead to a specific way of parametrization of the corresponding class of Markovian distributions (see Direction in Chapter ).

However, in usual situations distribution and model equivalence coincide, which means that the concept of Markov equivalence of graphs is unambiguously defined. Then the task to characterize Markov equivalence in graphical terms is correctly set. Several solutions of this general equivalence question (see Section , p. ) were exemplified in Chapter .

The aim of this chapter is to examine the same equivalence question for structural imsets. The problem is that in case of structural imsets one has to distinguish the two above mentioned types of Markov equivalence and choose one of them as a basis of further study.

Facial and Markov equivalence

Two structural imsets $u$, $v$ over $N$ are facially equivalent if they induce the same CI model, i.e. $\mathcal{M}_u = \mathcal{M}_v$. Then one writes $u \rightleftharpoons v$. Let $\Psi$ be a class of probability measures over $N$ and let $\Psi(u)$ denote the class of Markovian measures with respect to $u$ relative to $\Psi$:

$$\Psi(u) = \{\, P \in \Psi \,;\ A \perp\!\!\!\perp B \,|\, C \ [P] \ \text{ whenever } \langle A,B|C \rangle \in \mathcal{M}_u \,\}.$$

Two structural imsets $u$ and $v$ over $N$ are Markov equivalent relative to $\Psi$ if $\Psi(u) = \Psi(v)$. The following observation is evident.

Observation. Facially equivalent structural imsets are Markov equivalent relative to any class $\Psi$ of probability measures over $N$.

Clearly, Markov equivalence relative to $\Psi$ implies Markov equivalence relative to any subclass $\tilde{\Psi} \subseteq \Psi$. A natural question is whether the converse of Observation holds for a reasonable class $\Psi$. The answer is negative even for the class of marginally continuous measures, which involves the widest class of measures for which the method of structural imsets is safely applicable, namely the class of measures with finite multiinformation (see Section ). This is illustrated by the following example.

Figure: Two Markov equivalent structural imsets which are not facially equivalent.

Example. There exist two structural imsets over $N = \{a, b, c, d\}$ which are Markov equivalent relative to the class of marginally continuous probability measures over $N$ but which are not facially equivalent. Consider the imsets (see Figure )

$$u = u_{\langle c,d|\{a,b\} \rangle} + u_{\langle a,b|\emptyset \rangle} + u_{\langle a,b|\{c\} \rangle} + u_{\langle a,b|\{d\} \rangle} \qquad \text{and} \qquad v = u + u_{\langle a,b|\{c,d\} \rangle}.$$

Clearly, $\mathcal{M}_u \subseteq \mathcal{M}_v$ but $\langle a,b|\{c,d\} \rangle \in \mathcal{M}_v \setminus \mathcal{M}_u$, as shown in Example . On the other hand, by Consequence , every marginally continuous measure $P$ over $N$ which is Markovian with respect to $u$ satisfies $a \perp\!\!\!\perp b \,|\, \{c,d\} \ [P]$. Hence, one can show that $P$ is Markovian with respect to $u$ iff it is Markovian with respect to $v$.

In fact, the above mentioned phenomenon is a consequence of the fact that structural imsets do not satisfy the faithfulness requirement from Section (see Remark on p. ). However, in case $|N| \leq 3$ every structural imset has a discrete perfectly Markovian measure over $N$ (see Observation ). In particular, if $|N| \leq 3$ then facial equivalence coincides both with Markov equivalence relative to the class of discrete measures and with Markov equivalence relative to the class of measures with finite multiinformation (use Observation ).

In the rest of this chapter attention is restricted to facial equivalence and the related facial implication. One reason is that facial implication is not adulterated by considering a specific class of distributions. Therefore, one has a better chance that the respective deductive mechanism can be implemented on a computer. Moreover, in my opinion, facial equivalence represents the pure theoretical basis of Markov equivalence. Indeed, it will be shown later (see Lemma ) that for a reasonable distribution framework $\Psi$ every Markov equivalence class relative to $\Psi$ decomposes into facial equivalence classes and just one of these classes consists of $\Psi$-representable structural imsets, that is, imsets having perfectly Markovian measures in $\Psi$. Thus, to describe CI structures arising within $\Psi$ one can limit oneself to structural imsets of this type, and facial equivalence on the considered subclass of structural imsets coincides with Markov equivalence relative to $\Psi$.

Facial implication

Let $u$, $v$ be structural imsets over $N$. One says that $u$ facially implies $v$ and writes $u \rightharpoonup v$ if $\mathcal{M}_v \subseteq \mathcal{M}_u$. Observe that $u$ is facially equivalent to $v$ iff $u \rightharpoonup v$ and $v \rightharpoonup u$.

Remark. This is to explain the motivation of the above terminology. The adjective facial was already used in to name the respective deductive mechanism for structural imsets. This was motivated by an analogy with the theory of convex polytopes, where the concept of a face has a central role. Indeed, one can consider the collection of all faces of the cone $\mathrm{con}(\mathcal{E}(N))$ and introduce the following implication of structural imsets: $u$ implies $v$ if every face of $\mathrm{con}(\mathcal{E}(N))$ which contains $u$ also contains $v$. The original definition of facial implication of structural imsets used in was nothing but a modification of this requirement. It appeared to be equivalent to the condition $\mathcal{M}_v \subseteq \mathcal{M}_u$ (one can show this using the results from , although it is not explicitly stated there).

Direct characterization of facial implication

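Lemma. Let $u$, $v$ be structural imsets over $N$. Then $u \rightharpoonup v$ iff

$$\exists\, l \in \mathbb{N} \qquad l \cdot u - v \ \text{ is a structural imset},$$

which is, under the assumption that $v$ is a combinatorial imset, equivalent to the requirement

$$\exists\, k \in \mathbb{N} \qquad k \cdot u - v \ \text{ is a combinatorial imset}.$$

Proof. Suppose $u \rightharpoonup v$ and write $n \cdot v = \sum_{w \in \mathcal{E}(N)} k_w \cdot w$ where $n \in \mathbb{N}$, $k_w \in \mathbb{Z}^{+}$. If $k_w > 0$ and $w = u_{\langle i,j|K \rangle}$ then $\langle i,j|K \rangle \in \mathcal{M}_v \subseteq \mathcal{M}_u$. Thus, there exists $l_w \in \mathbb{N}$ such that $l_w \cdot u - w \in \mathcal{S}(N)$. Put $l = \sum_{w \in \mathcal{E}(N),\, k_w > 0} k_w \cdot l_w$ and observe that

$$l \cdot u - v = (l \cdot u - n \cdot v) + (n-1) \cdot v = \sum_{w \in \mathcal{E}(N),\, k_w > 0} k_w \cdot (l_w \cdot u - w) + (n-1) \cdot v \in \mathcal{S}(N),$$

since $\mathcal{S}(N)$ is closed under summing. Thus, the first condition was verified. Conversely, suppose the first condition and consider $\langle A,B|C \rangle \in \mathcal{M}_v$. Find $k \in \mathbb{N}$ such that $k \cdot v - u_{\langle A,B|C \rangle} \in \mathcal{S}(N)$. As $\mathcal{S}(N)$ is closed under summing, conclude

$$(k \cdot l) \cdot u - u_{\langle A,B|C \rangle} = k \cdot (l \cdot u - v) + (k \cdot v - u_{\langle A,B|C \rangle}) \in \mathcal{S}(N),$$

which implies $\langle A,B|C \rangle \in \mathcal{M}_u$.

Evidently, the second condition implies the first one. Conversely, suppose the first condition and that $v$ is a combinatorial imset. Take $n \in \mathbb{N}$ such that $n \cdot (l \cdot u - v)$ is combinatorial and put $k = n \cdot l$. As $v \in \mathcal{C}(N)$ and $\mathcal{C}(N)$ is closed under summing, $k \cdot u - v = n \cdot (l \cdot u - v) + (n-1) \cdot v$ is a combinatorial imset.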

Remark. The basic difference between the two conditions of Lemma is that testing whether an $o$-standardized imset is combinatorial is decidable in finitely many steps and the number of these steps is known. Indeed, if an $o$-standardized imset $w = k \cdot u - v$ is combinatorial, then the degree $\deg(w)$ can be directly computed by Observation . It is the number of elementary imsets which have to be summed to obtain $w$. The only combinatorial imset of degree 0 is the zero imset, and an imset $w$ with $\deg(w) = n \in \mathbb{N}$ is combinatorial iff there exists an elementary imset $u_{\langle i,j|K \rangle}$ such that $w - u_{\langle i,j|K \rangle}$ is a combinatorial imset of degree $n - 1$. Since the class of elementary imsets is known, testing can be done recursively.

Figure: Structural imsets $u$ and $2 \cdot u - v$ from Example .
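The recursive test described in this remark can be sketched as follows. This is only an illustration under stated assumptions: the degree is computed as the scalar product with the set function $S \mapsto |S|(|S|-1)/2$, which takes the value 1 on every elementary imset (whether this is exactly the formula of the cited observation is an assumption), the search is a plain memoized recursion suitable only for small $N$, and all names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations


def normalise(u):
    """Drop zero coefficients so that equal imsets have equal representations."""
    return {S: c for S, c in u.items() if c != 0}


def add(u, v, sign=1):
    """Return u + sign * v for imsets given as dicts frozenset -> int."""
    w = dict(u)
    for S, c in v.items():
        w[S] = w.get(S, 0) + sign * c
    return normalise(w)


def elementary(i, j, K=()):
    """Elementary imset u_<i,j|K>."""
    K = frozenset(K)
    return {K | {i, j}: 1, K: 1, K | {i}: -1, K | {j}: -1}


def all_elementary(N):
    """All elementary imsets over the variable set N."""
    result = []
    for i, j in combinations(sorted(N), 2):
        rest = sorted(set(N) - {i, j})
        for size in range(len(rest) + 1):
            for K in combinations(rest, size):
                result.append(elementary(i, j, K))
    return result


def degree(w):
    """Scalar product with S -> |S|(|S|-1)/2; equals 1 on elementary imsets."""
    return sum(c * (len(S) * (len(S) - 1) // 2) for S, c in w.items())


def is_combinatorial(w, N):
    """Recursive test: subtract elementary imsets until the zero imset is reached."""
    elems = all_elementary(N)

    @lru_cache(maxsize=None)
    def rec(frozen):
        current = dict(frozen)
        if not current:
            return True          # the zero imset has degree 0
        if degree(current) <= 0:
            return False         # non-zero imset of non-positive degree
        return any(rec(frozenset(add(current, e, -1).items())) for e in elems)

    return rec(frozenset(normalise(w).items()))
```

For instance, over $N = \{a,b,c\}$ the sum $u_{\langle a,b|\emptyset \rangle} + u_{\langle a,c|b \rangle}$ passes the test, while the difference $u_{\langle a,b|\emptyset \rangle} - u_{\langle a,b|c \rangle}$ has degree 0 without being the zero imset and is rejected. The recursion terminates because every subtraction lowers the degree by one.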

Note that one can mo dify the pro of of Lemma to show that is equivalent to

u v even in case that v is a structural imset such that

k n N n v and n k v are combinatorial imsets

The condition is formally weaker than the requirement that $v$ is a combinatorial imset. However, the difference may appear to be illusory. So far, I do not know an example of a structural imset which is not a combinatorial imset (see Question ).

A natural question is how big the number $l \in \mathbb{N}$ from the first condition of Lemma could be. The following example shows that it may happen that $l > 1$.

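Example. There exists a combinatorial imset $u$ over $N = \{a, b, c, d\}$ and a semi-elementary imset $v$ such that $2 \cdot u - v$ is a structural imset (and therefore $u \rightharpoonup v$) but $u - v$ is not a structural imset. Put

$$u = u_{\langle a,b|\emptyset \rangle} + u_{\langle a,c|\emptyset \rangle} + u_{\langle c,d|b \rangle} + u_{\langle b,d|c \rangle} + u_{\langle a,d|bc \rangle} + u_{\langle b,c|ad \rangle}, \qquad v = u_{\langle a,bcd|\emptyset \rangle},$$

and observe that

$$2 \cdot u - v = 2 \cdot u - u_{\langle a,bcd|\emptyset \rangle} = u_{\langle a,b|\emptyset \rangle} + u_{\langle a,c|\emptyset \rangle} + u_{\langle a,d|\emptyset \rangle} + u_{\langle c,d|b \rangle} + u_{\langle b,d|c \rangle} + u_{\langle b,c|d \rangle} + u_{\langle b,c|ad \rangle} + u_{\langle b,d|ac \rangle} + u_{\langle c,d|ab \rangle}$$

is a combinatorial imset (see Figure for illustration) and therefore $u \rightharpoonup v$. To see that $u - v$ is not a structural imset (see the left-hand picture of Figure for illustration) consider the multiset $m$ shown in the right-hand picture of Figure . It is a supermodular multiset by Observation (iii). As $\langle m, u - v \rangle < 0$, the imset $u - v$ is not a structural imset by Observation (i).

On the other hand, one has $u \rightharpoonup w$ for the elementary imset $w = u_{\langle a,d|\emptyset \rangle}$. Indeed, one has

$$u - w = u_{\langle a,bc|\emptyset \rangle} + u_{\langle b,c|d \rangle} + u_{\langle b,d|ac \rangle} + u_{\langle c,d|ab \rangle},$$

which means that the constant $l$ from the first condition of Lemma can be lower in this case.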
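The arithmetic of this example can be checked mechanically. The sketch below encodes sets as sorted strings; the supermodular function `M` is a hand-constructed witness chosen for this check and is an assumption of the sketch, not necessarily the multiset shown in the figure. Supermodularity is verified through non-negativity of the scalar product with every elementary imset.

```python
from itertools import combinations


def key(*parts):
    """Encode a subset of N as a sorted string."""
    return "".join(sorted(set("".join(parts))))


def semi(A, B, C=""):
    """Semi-elementary imset u_<A,B|C> as a dict: set -> coefficient."""
    return {key(A, B, C): 1, key(C): 1, key(A, C): -1, key(B, C): -1}


def combine(*terms):
    """Linear combination of imsets given as (coefficient, imset) pairs."""
    total = {}
    for coef, u in terms:
        for S, c in u.items():
            total[S] = total.get(S, 0) + coef * c
    return {S: c for S, c in total.items() if c != 0}


def scalar(m, u):
    return sum(m.get(S, 0) * c for S, c in u.items())


def all_elementary(N):
    out = []
    for i, j in combinations(N, 2):
        rest = [x for x in N if x not in (i, j)]
        for size in range(len(rest) + 1):
            for K in combinations(rest, size):
                out.append(semi(i, j, "".join(K)))
    return out


U = combine((1, semi("a", "b")), (1, semi("a", "c")), (1, semi("c", "d", "b")),
            (1, semi("b", "d", "c")), (1, semi("a", "d", "bc")), (1, semi("b", "c", "ad")))
V = semi("a", "bcd")
NINE = combine(*[(1, t) for t in (
    semi("a", "b"), semi("a", "c"), semi("a", "d"),
    semi("c", "d", "b"), semi("b", "d", "c"), semi("b", "c", "d"),
    semi("b", "c", "ad"), semi("b", "d", "ac"), semi("c", "d", "ab"))])

# Hand-constructed supermodular witness (an assumption of this sketch).
M = {"bc": 1, "bd": 1, "cd": 1, "abc": 2, "abd": 2, "acd": 2, "bcd": 2, "abcd": 4}

TWO_U_MINUS_V = combine((2, U), (-1, V))
U_MINUS_V = combine((1, U), (-1, V))
W = semi("a", "d")
U_MINUS_W = combine((1, U), (-1, W))
```

With these definitions `TWO_U_MINUS_V` equals the nine-term sum `NINE`, the function `M` is supermodular with a negative scalar product with `U_MINUS_V`, and `U_MINUS_W` equals the four-term sum stated at the end of the example.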

Figure: The imset $u - v$ and the supermodular multiset $m$ from Example .

Remark. Example shows that verification of whether a semi-elementary imset is facially implied by a structural imset requires multiplication of the structural imset by at least 2. Note that later Consequence in Section can be modified by replacing the class of elementary imsets by the class of semi-elementary imsets to get an upper estimate of the constant in the first condition of Lemma in this case. One can show using the results of that

in case jN j

max f hr w i r K N w is a semielementary imset over N g

which implies then that is the minimal integer l satisfying

u S N v semielementary imset over N u v i l u v S N

Note that one has l in case jN j for the same reason

The following consequence of Lemma was already announced in Section

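Consequence. Let $Q$ be a finite set of structural imsets over $N$. Then

$$\sup \{ \mathcal{M}_u \,;\ u \in Q \} = \mathcal{M}_v \qquad \text{for } v = \sum_{u \in Q} u,$$

where the supremum is understood in the lattice $(\mathcal{U}(N), \subseteq)$.

Proof. To show $\mathcal{M}_u \subseteq \mathcal{M}_v$ for $u \in Q$ take $\langle A,B|C \rangle \in \mathcal{M}_u$, find $k \in \mathbb{N}$ such that $k \cdot u - u_{\langle A,B|C \rangle} \in \mathcal{S}(N)$ and write

$$k \cdot v - u_{\langle A,B|C \rangle} = \sum_{w \in Q \setminus \{u\}} k \cdot w + (k \cdot u - u_{\langle A,B|C \rangle}) \in \mathcal{S}(N).$$

To show for every structural imset $w$ over $N$ that the assumption $\mathcal{M}_u \subseteq \mathcal{M}_w$ for $u \in Q$ implies $\mathcal{M}_v \subseteq \mathcal{M}_w$, use Lemma . Indeed, the assumption means that $l_u \in \mathbb{N}$ with $l_u \cdot w - u \in \mathcal{S}(N)$ exists for every $u \in Q$. Put $l = \sum_{u \in Q} l_u$ and observe $l \cdot w - v = \sum_{u \in Q} (l_u \cdot w - u) \in \mathcal{S}(N)$.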

Remark. The definition of facial implication can be extended as follows. A finite set of structural imsets $Q$ facially implies a structural imset $w$ (write $Q \rightharpoonup w$) if $\mathcal{M}_w \subseteq \mathcal{M}$ for every structural independence model $\mathcal{M}$ such that $\bigcup_{u \in Q} \mathcal{M}_u \subseteq \mathcal{M}$. However, by Consequence , this condition is equivalent to the requirement $\mathcal{M}_w \subseteq \sup_{u \in Q} \mathcal{M}_u = \mathcal{M}_v$ where $v = \sum_{u \in Q} u$. Thus, an extension of facial implication of this type is not needed because it is covered by the current definition of facial implication.

Skeletal characterization of facial implication

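Lemma. Let $u$, $v$ be structural imsets over $N$. Then $u \rightharpoonup v$ iff

$$\forall\, m \in \mathcal{K}_{\ell}^{\diamond}(N) \qquad \langle m, v \rangle > 0 \ \Rightarrow\ \langle m, u \rangle > 0,$$

which is equivalent to the condition

$$\langle m, v \rangle > 0 \ \Rightarrow\ \langle m, u \rangle > 0 \qquad \text{for every supermodular function } m \text{ over } N.$$

Moreover, the first condition is also equivalent to the requirement

$$l \cdot u - v \in \mathcal{S}(N) \quad \text{whenever } l \in \mathbb{N} \text{ is such that } l \geq \langle r, v \rangle \text{ for every } r \in \mathcal{K}_{\ell}^{\diamond}(N).$$

Proof. Evidently, the second condition implies the first one. Conversely, if the first condition holds, then observe by Lemma and Observation (i) that $\langle m, v \rangle > 0$ implies $\langle m, u \rangle > 0$ for every $\ell$-standardized supermodular function $m \in \mathcal{K}_{\ell}(N)$. However, every supermodular function is strongly equivalent to a function of this type by Lemma , which means that the second condition holds.

By Lemma , $u \rightharpoonup v$ iff $l \cdot u - v$ is a structural imset for some $l \in \mathbb{N}$. However, by Lemma this is equivalent to the condition

$$\exists\, l \in \mathbb{N} \quad \forall\, m \in \mathcal{K}_{\ell}^{\diamond}(N) \qquad l \cdot \langle m, u \rangle - \langle m, v \rangle \geq 0,$$

which implies the first condition. The next step is to show that the first condition implies

$$\forall\, m \in \mathcal{K}_{\ell}^{\diamond}(N) \qquad l \cdot \langle m, u \rangle - \langle m, v \rangle \geq 0 \quad \text{for } l \in \mathbb{N} \text{ from the third condition}.$$

Indeed, if $m \in \mathcal{K}_{\ell}^{\diamond}(N)$ is such that $\langle m, v \rangle = 0$ then $l \cdot \langle m, u \rangle \geq 0 = \langle m, v \rangle$ by Observation (iii). If $m \in \mathcal{K}_{\ell}^{\diamond}(N)$ is such that $\langle m, v \rangle > 0$ then the first condition implies $\langle m, u \rangle > 0$. However, as both $m$ and $u$ are imsets, $\langle m, u \rangle \in \mathbb{Z}$ and therefore $\langle m, u \rangle \geq 1$, and the assumption about $l$ implies $l \cdot \langle m, u \rangle \geq l \geq \langle m, v \rangle$. The displayed condition then implies $l \cdot u - v \in \mathcal{S}(N)$ by Lemma . Thus, the first condition implies the third one. Since $\mathcal{K}_{\ell}^{\diamond}(N)$ is finite, $l \in \mathbb{N}$ satisfying the requirement from the third condition exists, which means that the third condition implies $u \rightharpoonup v$ by Lemma .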

The way of standardization of supermodular functions is not substantial in the above result. One can easily derive an analogous result with the $u$-skeleton, respectively with the $o$-skeleton, in place of the $\ell$-skeleton by a similar procedure (see Remark ).

It follows from Lemma that one has $u \rightharpoonup u_{\langle A,B|C \rangle}$ for a structural imset $u$ over $N$ and $\langle A,B|C \rangle \in \mathcal{T}(N)$ iff $\langle A,B|C \rangle \in \mathcal{M}_u$. Therefore, Lemma can be viewed as an alternative criterion for testing whether a disjoint triplet over $N$ is represented in a structural imset over $N$. Note that the direct characterization from Lemma is suitable in the situation when one wants to confirm the hypothesis that $u \rightharpoonup v$, while the skeletal characterization from Lemma , namely its first two conditions, is suitable in the situation when one wants to disprove $u \rightharpoonup v$. This is illustrated by Example below. The relation of these two criteria of facial implication is analogous to the relation of the moralization and d-separation criteria in case of DAG models (see Section ).
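The skeletal criterion is easy to run in the case $N = \{a, b, c\}$. The sketch below assumes that the $\ell$-skeleton over three variables consists of the five hand-derived imsets listed in `SKELETON` (an assumption of this illustration, not quoted from the text); the function names are hypothetical.

```python
def key(*parts):
    """Encode a subset of N as a sorted string."""
    return "".join(sorted(set("".join(parts))))


def semi(A, B, C=""):
    """Semi-elementary imset u_<A,B|C> as a dict: set -> coefficient."""
    return {key(A, B, C): 1, key(C): 1, key(A, C): -1, key(B, C): -1}


def scalar(m, u):
    return sum(m.get(S, 0) * c for S, c in u.items())


# Assumed l-skeleton over {a, b, c}.
SKELETON = [
    {"abc": 1},
    {"abc": 1, "ab": 1},
    {"abc": 1, "ac": 1},
    {"abc": 1, "bc": 1},
    {"abc": 2, "ab": 1, "ac": 1, "bc": 1},
]


def facially_implies(u, v):
    """u implies v iff <m,v> > 0 forces <m,u> > 0 for every skeletal m."""
    return all(scalar(m, u) > 0 for m in SKELETON if scalar(m, v) > 0)


def refuting_functions(u, v):
    """Skeletal imsets witnessing that u does not facially imply v."""
    return [m for m in SKELETON if scalar(m, v) > 0 and scalar(m, u) == 0]


U = semi("a", "bc")   # represents a independent of {b, c}
V = semi("a", "c")    # represents a independent of c
```

Here `facially_implies(U, V)` holds, in agreement with the decomposition property of semigraphoids, while the converse implication is refuted by the skeletal imset $\delta_N + \delta_{ab}$: a single witness suffices to disprove an implication, whereas confirmation requires going through the whole skeleton.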

Figure: Strongly equivalent elements of the $u$-skeleton and the $\ell$-skeleton.

Example. Suppose $N = \{a, b, c, d\}$, consider the combinatorial imsets $u$ and $v$ from Example and the semi-elementary imset $w = u_{\langle b,acd|\emptyset \rangle}$. The fact $u \rightharpoonup v$ was verified in Example using the direct characterization of facial implication, namely by means of the second condition of Lemma with $k = 2$. To disprove $u \rightharpoonup w$, consider the supermodular imset $m^{a\downarrow}$ (see p. ), which is shown in the left-hand picture of Figure . Observe $\langle m^{a\downarrow}, w \rangle > 0 = \langle m^{a\downarrow}, u \rangle$ and apply Lemma (the second condition) to see that $u$ does not facially imply $w$. Note that $m^{a\downarrow}$ belongs to the $u$-skeleton $\mathcal{K}_{u}^{\diamond}(N)$ and the corresponding element of the $\ell$-skeleton (see Remark ) is in the right-hand picture of Figure .

An easy consequence of Lemma is the following criterion of facial equivalence of structural imsets.

Consequence. Let $u$, $v$ be structural imsets over $N$. Then $u \rightleftharpoons v$ iff

$$\forall\, m \in \mathcal{K}_{\ell}^{\diamond}(N) \qquad \langle m, u \rangle > 0 \ \text{ iff } \ \langle m, v \rangle > 0,$$

which is equivalent to the condition that $\langle m, u \rangle > 0 \Leftrightarrow \langle m, v \rangle > 0$ for every supermodular function $m$ over $N$.

Note that the skeletal criteria for testing facial implication and equivalence are effective in particular in case $|N| \leq 4$, since $|\mathcal{K}_{\ell}^{\diamond}(N)|$ is small in this case (see Remark ). They are still implementable in case $|N| = 5$: a computer program which realizes facial implication of elementary imsets over a five-element set can be found at

http://www.utia.cas.cz/user_data/studeny/fivevar.htm

As the skeleton is not at disposal in case $|N| \geq 6$, the only available criterion in that case is the criterion from Lemma .

Adaptation to a distribution framework

Let us consider a class $\Psi$ of probability measures over $N$ (a distribution framework) which satisfies the following two conditions:

$$\text{for every } P \in \Psi \text{ there exists a structural imset } u \text{ over } N \text{ such that } \mathcal{M}_u = \mathcal{M}_P,$$

$$\text{for every pair } P, Q \in \Psi \text{ there exists } R \in \Psi \text{ such that } \mathcal{M}_R = \mathcal{M}_P \cap \mathcal{M}_Q.$$

There are at least three examples of distribution frameworks satisfying these two natural conditions: the class of measures with finite multiinformation, the class of discrete measures and the class of positive discrete measures (see Theorem and Lemma ). The goal of this section is to show that after a suitable restriction of the class of structural imsets, facial equivalence and Markov equivalence relative to $\Psi$ coincide.

A structural imset $u$ over $N$ is representable in $\Psi$, shortly $\Psi$-representable, if there exists $P \in \Psi$ which is perfectly Markovian with respect to $u$, i.e. $\mathcal{M}_u = \mathcal{M}_P$. Evidently, every structural imset which is facially equivalent to a $\Psi$-representable structural imset is $\Psi$-representable as well. The class of $\Psi$-representable structural imsets over $N$ will be denoted by $\mathcal{S}_{\Psi}(N)$.

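Lemma. Let $\Psi$ be a class of probability measures over $N$ satisfying the two conditions above and $u \in \mathcal{S}(N)$. Then the class of structural imsets Markov equivalent to $u$ relative to $\Psi$ is the union of a finite collection $\wp$ of facial equivalence classes ordered by the relation $\rightharpoonup$. Moreover, the poset $(\wp, \rightharpoonup)$ has a greatest element, which is the only class of facial equivalence in $\wp$ consisting of $\Psi$-representable imsets.

Proof. The first claim of the lemma follows easily from Observation . Let us put $\mathcal{M} = \bigcap_{P \in \Psi(u)} \mathcal{M}_P$, where $\Psi(u)$ is the class of Markovian measures with respect to $u$ relative to $\Psi$, and

$$\Psi(\mathcal{M}) = \{\, P \in \Psi \,;\ A \perp\!\!\!\perp B \,|\, C \ [P] \ \text{ whenever } \langle A,B|C \rangle \in \mathcal{M} \,\}.$$

The inclusion $\Psi(u) \subseteq \Psi(\mathcal{M})$ follows directly from the definition of $\mathcal{M}$. The fact $\mathcal{M}_u \subseteq \mathcal{M}$ implies $\Psi(\mathcal{M}) \subseteq \Psi(u)$ and therefore $\Psi(\mathcal{M}) = \Psi(u)$. As $\mathcal{T}(N)$ is finite, the set $\{ \mathcal{M}_P \,;\ P \in \Psi(u) \}$ is also finite and one can show by repetitive application of the second assumption that $R \in \Psi(u)$ such that $\mathcal{M}_R = \mathcal{M}$ exists. By the first assumption, a structural imset $v$ with $\mathcal{M}_v = \mathcal{M}_R = \mathcal{M}$ exists, which means $\Psi(v) = \Psi(\mathcal{M}) = \Psi(u)$. Thus, $u$ and $v$ are Markov equivalent relative to $\Psi$. As $R$ is perfectly Markovian with respect to $v$, the imset $v$ is $\Psi$-representable.

Suppose that $w \in \mathcal{S}(N)$ is such that $\Psi(w) = \Psi(u)$ and observe that

$$\mathcal{M}_w \subseteq \bigcap_{P \in \Psi(w)} \mathcal{M}_P = \bigcap_{P \in \Psi(u)} \mathcal{M}_P = \mathcal{M} = \mathcal{M}_v.$$

Thus, $v \rightharpoonup w$ and the class of imsets facially equivalent to $v$ is the greatest element of $(\wp, \rightharpoonup)$. If $w$ is $\Psi$-representable then $Q \in \Psi$ with $\mathcal{M}_w = \mathcal{M}_Q$ exists and $Q \in \Psi(w) = \Psi(u)$, which implies $\mathcal{M} = \bigcap_{P \in \Psi(u)} \mathcal{M}_P \subseteq \mathcal{M}_Q = \mathcal{M}_w$. Hence $\mathcal{M}_w = \mathcal{M} = \mathcal{M}_v$, which says $w \rightleftharpoons v$.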

Remark Note that f is even a join semilattice Indeed v w w and

w v implies w v for v w w S N Hence u w v for

structural imsets u w with u w v where v S N b elongs to the greatest

element of f mentioned in Lemma By Consequence M M M which

uw u w

means that u w represents the join of u and w in f On the other hand f

need not b e closed under the op eration of meet in the lattice of structural imsets

It may happen that $\wp$ consists of one class of facial equivalence only. This means that the class of imsets Markov equivalent to $u$ coincides with the class of imsets facially equivalent to $u$. For example, this phenomenon is quite common in case jN j for the class of discrete measures over N one has Markov equivalence classes and facial equivalence classes then

The following fact follows immediately from the preceding lemma.

Consequence Let $\Psi$ be a class of probability measures over $N$ satisfying the same two conditions. Consider the collection $\mathcal{S}_{\Psi}(N)$ of representable structural imsets over $N$. Facial equivalence and Markov equivalence relative to $\Psi$ coincide for imsets from $\mathcal{S}_{\Psi}(N)$.

In the considered case the class $\mathcal{S}_{\Psi}(N)$ satisfies both the requirement of faithfulness and the requirement of completeness relative to the class of CI structures arising within $\Psi$, which were mentioned earlier. Thus, a theoretical solution of those problems is available, but the practical question of how to recognize imsets from $\mathcal{S}_{\Psi}(N)$ remains to be solved.

Remark The idea of implementing the respective deductive mechanism on a computer is as follows. Besides the usual algebraic operations with structural imsets, one needs to implement an additional operation which ascribes the respective representable structural imset $v \in \mathcal{S}_{\Psi}(N)$ to every structural imset $u \in \mathcal{S}(N)$. Suppose that $u_1, \ldots, u_n$, $n \geq 1$, are structural imsets which represent input pieces of information about the CI structure induced by an unknown distribution $P$, which is known to belong to a given distribution framework $\Psi$ (a subclass of the class of measures with finite multiinformation which satisfies the conditions above). The sum $u = \sum_{i=1}^{n} u_i$ then represents aggregated information about the CI structure of $P$. But within the considered distribution framework more can be deduced: one should find the respective $v \in \mathcal{S}_{\Psi}(N)$, which represents the necessary conclusions of the input pieces of information about the CI structure of any $P \in \Psi$.

Nevertheless, the possible inherent complexity of the problem of describing the lattice of CI structures arising within $\Psi$ cannot be avoided. Indeed, implementing the operation which ascribes the respective $v \in \mathcal{S}_{\Psi}(N)$ to every $u \in \mathcal{S}(N)$ may turn out to be complicated (an analogous consideration is made in another remark). Hopefully, the presented approach helps to decompose the original problem properly.

Testing facial implication

This section deals with implementation tasks connected with the direct characterization of facial implication.

Testing structural imsets

The first natural question is how to recognize a structural imset. One possible method is given by the characterization theorem based on the skeleton but, as explained in the respective remark, that method is not feasible once $|N|$ is large. Thus, only the direct definition of a structural imset is available in general. Therefore, one needs to know whether the corresponding procedure is decidable. As explained elsewhere, testing of combinatorial imsets is quite clear. One needs to know whether the natural number by which a structural imset must be multiplied to get a combinatorial imset is somehow limited.

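Lemma There exists $n \in \mathbb{N}$ such that

$$\forall \text{ imset } u \text{ over } N: \quad u \in \mathcal{S}(N) \iff n \cdot u \in \mathcal{C}(N) .$$

Proof One can apply a known theorem on Hilbert bases which says that every pointed rational polyhedral cone $C \subseteq \mathbb{R}^n$, $n \geq 1$, has a unique minimal integral Hilbert basis generating $C$, that is, a minimal finite set $B \subseteq \mathbb{Z}^n$ such that

$$\forall\, x \in C \cap \mathbb{Z}^n: \quad x = \sum_{y \in B} k_y \cdot y \quad \text{for some } k_y \in \mathbb{Z}^{+} ,$$

and $\mathrm{con}(B) = C$ (which implies $B \subseteq C$). One can apply this result to the rational polyhedral cone $\mathrm{con}(\mathcal{E}(N)) \subseteq \mathbb{R}^{\mathcal{P}(N)}$, which is pointed as $\langle m_*, t \rangle > 0$ for every nonzero $t \in \mathrm{con}(\mathcal{E}(N))$. Moreover, an imset $u$ over $N$ belongs to $\mathrm{con}(\mathcal{E}(N))$ iff it is structural. Thus, a finite set of structural imsets $\mathcal{H}(N)$ exists such that

$$\forall\, u \in \mathcal{S}(N): \quad u = \sum_{v \in \mathcal{H}(N)} k_v \cdot v \quad \text{for some } k_v \in \mathbb{Z}^{+} .$$

One can find $n(v) \in \mathbb{N}$ for every $v \in \mathcal{H}(N)$ such that $n(v) \cdot v$ is a combinatorial imset, and put $n = \prod_{v \in \mathcal{H}(N)} n(v)$. Clearly, $n \cdot v \in \mathcal{C}(N)$ for every $v \in \mathcal{H}(N)$, and the preceding decomposition implies that $n \cdot u \in \mathcal{C}(N)$ for every $u \in \mathcal{S}(N)$.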

A natural question is what the minimal $n \in \mathbb{N}$ satisfying the condition of the lemma is. I do not know the answer in case $|N| \geq 5$ (see the respective Theme). But if $|N| \leq 4$ then $n = 1$; let me formulate this main result of the cited work as a separate observation.

Observation If $|N| \leq 4$ then the class of structural imsets over $N$ coincides with the class of combinatorial imsets over $N$.
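Testing whether an imset is combinatorial can be illustrated by a brute-force search. The following is only a minimal sketch, not the procedure of the thesis: imsets are stored as dictionaries from subsets to integers, the degree is computed as the scalar product with $m_*(S) = |S| \cdot (|S|-1)/2$ (which equals 1 on every elementary imset), and the function names are illustrative assumptions. The search is exhaustive and therefore only practical for very small $N$.

```python
from itertools import combinations

def elementary(i, j, K=()):
    """Elementary imset u_<i,j|K> = d_{ijK} + d_K - d_{iK} - d_{jK}, stored as {frozenset: int}."""
    K = frozenset(K)
    return {K | {i, j}: 1, K: 1, K | {i}: -1, K | {j}: -1}

def all_elementary(N):
    """One elementary imset per unordered pair {i, j} and per set K inside N minus {i, j}."""
    out = []
    for i, j in combinations(sorted(N), 2):
        rest = sorted(set(N) - {i, j})
        for r in range(len(rest) + 1):
            out.extend(elementary(i, j, K) for K in combinations(rest, r))
    return out

def combine(u, v, c=1):
    """Return u + c*v with zero entries dropped."""
    w = dict(u)
    for S, x in v.items():
        w[S] = w.get(S, 0) + c * x
    return {S: x for S, x in w.items() if x != 0}

def degree(u):
    """<m_*, u> for m_*(S) = |S|(|S|-1)/2; this equals 1 on every elementary imset."""
    return sum(x * (len(S) * (len(S) - 1) // 2) for S, x in u.items())

def is_combinatorial(u, elems, start=0):
    """Exhaustive search for a decomposition of u into deg(u) elementary imsets."""
    u = {S: x for S, x in u.items() if x != 0}
    if not u:
        return True          # the zero imset is the empty combination
    if degree(u) <= 0:
        return False         # a nonzero combinatorial imset has positive degree
    # indices are tried in non-decreasing order to avoid revisiting permutations
    return any(is_combinatorial(combine(u, elems[k], -1), elems, k)
               for k in range(start, len(elems)))

E = all_elementary("abc")
u = combine(elementary("a", "b"), elementary("a", "c", "b"))  # sums to u_<a,bc|empty>
not_structural = {frozenset("abc"): 1, frozenset(): -1}
print(len(E), degree(u), is_combinatorial(u, E), is_combinatorial(not_structural, E))
```

By the observation above, for at most four variables the same test also decides whether an imset is structural.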

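Remark The least $n \in \mathbb{N}$ satisfying the condition of the lemma may turn out to be too high. An alternative approach to direct testing of structural imsets could be based on the concept of the minimal integral Hilbert basis $\mathcal{H}(N)$ mentioned in the proof of the lemma (see the respective Theme). It follows from the proof of the cited theorem that $\mathcal{H}(N)$ has the form

$$\mathcal{H}(N) = \{ v \in \mathcal{S}(N) ;\ v \neq 0 \ \text{ and } \ \neg [\, v = v_1 + v_2 \ \text{ where } v_1, v_2 \in \mathcal{S}(N),\ v_1 \neq 0 \neq v_2 \,] \} .$$

The fact that every elementary imset generates an extreme ray of $\mathrm{con}(\mathcal{E}(N))$ allows one to derive $\mathcal{E}(N) \subseteq \mathcal{H}(N)$. Of course, $\mathcal{H}(N) = \mathcal{E}(N)$ if $|N| \leq 4$ by the observation above. The idea is to characterize $\mathcal{H}(N)$ in general. Then every imset $u$ over $N$ can be effectively tested for whether it can be written as a combination of imsets from $\mathcal{H}(N)$ with non-negative integral coefficients. Indeed, one can modify the procedure for testing combinatorial imsets.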

Grade

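Another natural question arising in connection with the preceding lemma is whether there exists $l \in \mathbb{N}$ such that

$$\forall\, u \in \mathcal{S}(N) \ \ \forall\, v \in \mathcal{E}(N): \quad u \rightharpoonup v \iff l \cdot u - v \in \mathcal{S}(N) .$$

The answer is yes. Evidently, if $l \in \mathbb{N}$ satisfies this condition then every $l' \in \mathbb{N}$, $l' \geq l$, satisfies it as well. Therefore, one is interested in the minimal $l \in \mathbb{N}$ satisfying the condition, which appears to depend on $N$. Actually, it depends on $|N|$ only, because of the inherent one-to-one correspondence between $\mathcal{E}(N)$ and $\mathcal{E}(M)$, respectively between $\mathcal{S}(N)$ and $\mathcal{S}(M)$, for sets of variables $N$ and $M$ of the same cardinality. The following number is a good candidate for the minimal $l \in \mathbb{N}$ satisfying the condition.

Supposing $|N| \geq 2$, let us call the grade, denoted by $\mathrm{gra}(N)$, the natural number

$$\mathrm{gra}(N) = \max \{ \langle r, w \rangle ;\ r \in \mathcal{K}^{\diamond}_{\ell}(N),\ w \in \mathcal{E}(N) \} .$$

Evidently, $\mathrm{gra}(N)$ depends on $|N|$ only. The following consequence can be derived.

Consequence If $|N| \geq 2$ then $l = \mathrm{gra}(N)$ satisfies the condition above.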

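The preceding consequence leads to an effective criterion for testing facial implication of elementary imsets in case $|N| \leq 4$, which utilizes the fact that structural and combinatorial imsets coincide in this case.

Consequence Suppose that $|N| \leq 4$, $u$ is a structural imset over $N$ and $v$ is an elementary imset over $N$. Then $u \rightharpoonup v$ iff $u - v$ is a combinatorial imset.

Proof The first observation is that if $|N| \leq 4$ then $\langle m, v \rangle \in \{0, 1\}$ for every $m \in \mathcal{K}^{\diamond}_{\ell}(N)$ and $v \in \mathcal{E}(N)$. Thus $\mathrm{gra}(N) = 1$ and, by the preceding consequence, one has $u \rightharpoonup v$ iff $u - v \in \mathcal{S}(N)$, which is equivalent to $u - v \in \mathcal{C}(N)$ by the observation that structural and combinatorial imsets coincide.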

However as shown in graN in case jN j In fact an example from

Section of shows that the minimal natural number l for which holds is

just in case jN j The question what is the minimal l N satisfying cf p

is partially answered by the following lemma

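Lemma Suppose that $|N| \geq 2$. Then the minimal $l \in \mathbb{N}$ satisfying

$$\forall\, u \in \mathcal{C}(N) \ \ \forall\, v \in \mathcal{E}(N): \quad u \rightharpoonup v \iff l \cdot u - v \in \mathcal{S}(N)$$

is the upper integer part of

$$\mathrm{gra}^{*}(N) = \max_{m \in \mathcal{K}^{\diamond}_{\ell}(N)} \ \frac{\max \{ \langle m, w \rangle ;\ w \in \mathcal{E}(N) \}}{\min \{ \langle m, w \rangle ;\ w \in \mathcal{E}(N),\ \langle m, w \rangle \neq 0 \}} .$$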

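Proof To show that every $l \in \mathbb{N}$ with $l \geq \mathrm{gra}^{*}(N)$ satisfies the condition, the procedure from the proof of the respective lemma can be used. The only modification is that, in case $m \in \mathcal{K}^{\diamond}_{\ell}(N)$ with $\langle m, u \rangle > 0$, the fact $u \in \mathcal{C}(N)$ implies $\langle m, u \rangle \geq \min \{ \langle m, w \rangle ;\ w \in \mathcal{E}(N),\ \langle m, w \rangle \neq 0 \}$, which allows one to write

$$l \cdot \langle m, u \rangle \ \geq\ \mathrm{gra}^{*}(N) \cdot \min_{w \in \mathcal{E}(N),\ \langle m, w \rangle \neq 0} \langle m, w \rangle \ \geq\ \max_{w \in \mathcal{E}(N)} \langle m, w \rangle \ \geq\ \langle m, v \rangle .$$

To show that for every $l \in \mathbb{N}$ with $l < \mathrm{gra}^{*}(N)$ there exist a combinatorial imset $u$ and an elementary imset $v$ such that $u \rightharpoonup v$ and $l \cdot u - v \notin \mathcal{S}(N)$, choose and fix $m \in \mathcal{K}^{\diamond}_{\ell}(N)$ for which the maximum in the definition of $\mathrm{gra}^{*}(N)$ is achieved. Then choose $w = u_{\langle i,j|K \rangle} \in \mathcal{E}(N)$ minimizing the nonzero value $\langle m, w \rangle$, $w \in \mathcal{E}(N)$, and $v \in \mathcal{E}(N)$ maximizing $\langle m, w \rangle$ for $w \in \mathcal{E}(N)$. By the respective consequence, a $u_m \in \mathcal{C}(N)$ with $\mathcal{M}_{u_m} = \mathcal{M}^{m}$ exists. Put $u = u_m + w$. By the respective lemma, $\mathcal{M}^{m} = \mathcal{M}_{u_m} \subseteq \mathcal{M}_u$. As $\langle i, j | K \rangle \in \mathcal{M}_u \setminus \mathcal{M}^{m}$, the fact that $m$ is a skeletal imset implies $\mathcal{M}_u = \mathcal{T}(N) \supseteq \mathcal{M}_v$, which means $u \rightharpoonup v$. On the other hand, $\langle m, u_m \rangle = 0$ and therefore

$$l < \mathrm{gra}^{*}(N) = \frac{\langle m, v \rangle}{\langle m, w \rangle} = \frac{\langle m, v \rangle}{\langle m, u \rangle} \quad \text{implies} \quad \langle m, l \cdot u - v \rangle < 0 ,$$

which means $l \cdot u - v \notin \mathcal{S}(N)$ by the characterization of structural imsets by means of the skeleton.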

Remark Note that the type of the skeleton is not material in the ab ove result In

fact the skeleton can b e replaced either by the uskeleton or by the oskeleton and the

resp ective constant gra N has the same value Indeed it follows from Consequence

that for every skeletal sup ermo dular function me there exists such that hme ui

hm ui for every u E N where m K N is the unique mo del equivalent element of

the skeleton Thus the ratios maximalized in are invariants of classes of mo del

equivalence of skeletal imsets Note that if jN j then

m K N min fhm ui u E N hm ui g

which implies that graN gra N in this case Thus if the hypothesis holds in

general see Question then graN is the least l N satisfying by Consequence

and Lemma Note that an analogue of holds for jN j and the uskeleton

b ecause of the op eration of reection mentioned on p or in Section of

but not for the oskeleton This is maybe the main dierence b etween ostandardization

and standardization

Invariants of facial equivalence

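This section deals with some of those attributes of structural imsets which are either invariant with respect to facial equivalence or characterize classes of facial equivalence.

Let $u$ be a structural imset over $N$. By the effective domain of $u$, denoted by $\mathcal{D}^{*}_{u}$, is understood the class of sets $T \subseteq N$ such that $S_1 \subseteq T \subseteq S_2$ for some $S_1, S_2 \subseteq N$ with $u(S_1) \neq 0 \neq u(S_2)$, that is, $\mathcal{D}^{*}_{u} = \mathcal{D}^{\uparrow}_{u} \cap \mathcal{D}^{\downarrow}_{u}$. Recall that $\mathcal{U}_u = \mathcal{D}^{\downarrow}_{u}$ is nothing but the upper class of $u$. The region of $u$, denoted by $\mathcal{R}_u$, is the class of subsets of $N$ obtained as follows:

$$\mathcal{R}_u \ = \bigcup_{\langle i,j|K \rangle \in \mathcal{M}_u \cap \mathcal{T}_{\epsilon}(N)} \{ K, iK, jK, ijK \} \ = \bigcup_{\langle A,B|C \rangle \in \mathcal{M}_u \setminus \mathcal{T}_{\emptyset}(N)} \{ C, AC, BC, ABC \} .$$

Note that the equality of the unions over the class of elementary triplets and over the class of all non-trivial triplets can be easily derived from the fact that $\mathcal{M}_u$ is a semigraphoid.

Lemma Given a structural imset $u$ over $N$, one has $\mathcal{R}_u \subseteq \mathcal{D}^{*}_{u}$. If $u, v \in \mathcal{S}(N)$ are such that $u \approx v$ then $\mathcal{U}_u = \mathcal{U}_v$, $\mathcal{D}^{*}_{u} = \mathcal{D}^{*}_{v}$ and $\mathcal{R}_u = \mathcal{R}_v$. Moreover, $S \notin \mathcal{R}_u$ for $S \subseteq N$ iff $w(S) = 0$ for every $w \in \mathcal{S}(N)$ which is facially equivalent to $u$.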

Figure Two structural imsets with the same effective domain but different regions

Pro of To show R D consider elementary triplet hi j jK i M and nd k N with

u u

u

ij K ij K

k u u S N As hm u i by Observation k hm ui Hence

hij jK i hij jK i

uT for some ij K T N which means K iK j K ij K D Analogously

u

K K

hm u i implies hm ui and uS for some S K which means

hij jK i

max S

K iK j K ij K D The next observation is that S D i hm ui

u u

T max

and hm ui for every T S Indeed S D means uS and uT

u

for T S by Observation L U and one can show by reverse induction on jT j

u u

T

that uT for every S T N i hm ui for every S T N Analogous

min S T

arguments allow to show that S D i hm ui and hm ui for every

u

T S replace by However by Consequence and Observation the conditions

A A A A

hm ui hm ui hm ui hm ui for A N are invariable with

are invariable as well Analogous resp ect to facial equivalence Therefore U and D

u

u

claim ab out R is evident b ecause of its denition in terms of M

u u

P

To show that S R implies uS write n u k v where n N k Z

u v v

v E N

and observe hi j jK i M whenever k for v u As S R one has v S

u v u

hij jK i

for every v E N of this kind which implies n uS The consideration holds for

any w S N with u w in place of u Conversely supp ose S R take elementary

u

triplet hi j jK i M with S fK iK j K ij K g and observe that w u k u is

u

hij jK i

facially equivalent to u for every k N by Lemma One can nd k N such that

w S

However, the effective domain and the region of a structural imset may differ, as the following example shows.

Example There exist two structural imsets $u, v$ over $N = \{a, b, c, d\}$ with the same effective domain but different regions. Consider the imset $u = u_{\langle b,c|a \rangle} + u_{\langle a,d|c \rangle}$ shown in the left-hand picture of the figure and the imset $v = u_{\langle c,d|a \rangle} + u_{\langle a,b|c \rangle}$ shown in the right-hand picture of the figure. The set $\{a, d\}$ belongs to the effective domain $\mathcal{D}^{*}_{u} = \mathcal{D}^{*}_{v}$ and to the region $\mathcal{R}_v$, but it does not belong to the region $\mathcal{R}_u$.
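The example can be checked mechanically. The sketch below is an illustration under stated assumptions, not the author's program: the effective domain is taken to be the class of sets lying between two sets on which the imset is nonzero, and the region is computed from those elementary triplets $\langle i,j|K \rangle$ for which $u - u_{\langle i,j|K \rangle}$ is combinatorial, which is assumed to characterize facial implication for at most four variables. All function names are illustrative.

```python
from itertools import combinations

def elementary(i, j, K=()):
    """Elementary imset u_<i,j|K> as a dict {frozenset: int}."""
    K = frozenset(K)
    return {K | {i, j}: 1, K: 1, K | {i}: -1, K | {j}: -1}

def powerset(N):
    items = sorted(N)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def combine(u, v, c=1):
    """Return u + c*v with zero entries dropped."""
    w = dict(u)
    for S, x in v.items():
        w[S] = w.get(S, 0) + c * x
    return {S: x for S, x in w.items() if x != 0}

def degree(u):
    """Scalar product with m_*(S) = |S|(|S|-1)/2 (equals 1 on elementary imsets)."""
    return sum(x * (len(S) * (len(S) - 1) // 2) for S, x in u.items())

def is_combinatorial(u, elems, start=0):
    """Exhaustive search for a decomposition into elementary imsets (small N only)."""
    if not u:
        return True
    if degree(u) <= 0:
        return False
    return any(is_combinatorial(combine(u, elems[k], -1), elems, k)
               for k in range(start, len(elems)))

def effective_domain(u, N):
    """Sets T with S1 <= T <= S2 for some sets S1, S2 on which u is nonzero."""
    support = [S for S, x in u.items() if x != 0]
    return {T for T in powerset(N)
            if any(S <= T for S in support) and any(T <= S for S in support)}

def region(u, N):
    """Union of {K, iK, jK, ijK} over elementary triplets with u - u_<i,j|K> combinatorial."""
    triplets = [(i, j, K) for i, j in combinations(sorted(N), 2)
                for K in powerset(set(N) - {i, j})]
    elems = [elementary(i, j, K) for i, j, K in triplets]
    R = set()
    for (i, j, K), w in zip(triplets, elems):
        if is_combinatorial(combine(u, w, -1), elems):
            R |= {K, K | {i}, K | {j}, K | {i, j}}
    return R

N = "abcd"
u = combine(elementary("b", "c", "a"), elementary("a", "d", "c"))
v = combine(elementary("c", "d", "a"), elementary("a", "b", "c"))
ad = frozenset("ad")
print(effective_domain(u, N) == effective_domain(v, N), ad in region(v, N), ad in region(u, N))
```

Note that the effective domain is read off the imset directly, while the region requires testing facial implication for every elementary triplet.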

v u

On the other hand regions and eective domains of structural imsets over N coincide

in case jN j cf Section and Figure

Remark The significance of the concept of effective domain is that it allows one to restrict the considered class of elementary imsets when one tests whether an o-standardized imset is combinatorial. Indeed, if $u = \sum_{v \in \mathcal{E}(N)} k_v \cdot v$ with $k_v \in \mathbb{Z}^{+}$ then for every $v = u_{\langle i,j|K \rangle}$ with $k_v > 0$ one has $\langle i, j | K \rangle \in \mathcal{M}_u$ and therefore, by the lemma above, $K, iK, jK, ijK \in \mathcal{R}_u \subseteq \mathcal{D}^{*}_{u}$. Thus, a reduced class of elementary imsets $v = u_{\langle i,j|K \rangle}$ satisfying $K, iK, jK, ijK \in \mathcal{D}^{*}_{u}$ can be considered. Observe that the effective domain $\mathcal{D}^{*}_{u}$ can be identified directly on the basis of $u$. This is the main difference from the region $\mathcal{R}_u$, which gives an even stronger restriction of the class of considered elementary imsets, but the region cannot be immediately recognized on the basis of $u$ alone. It is a characteristic of the respective class of facially equivalent structural imsets and can be identified on the basis of the imset only partially, as mentioned in the lemma above.

However, one can compute the corresponding level-degrees of $u$ for $l = 0, \ldots, |N| - 2$, which may result in a further restriction of the class of considered imsets, in particular if some of the level-degrees vanish.

Effective domains are attributes of structural imsets which allow one to distinguish immediately some imsets which are not facially equivalent. A natural question is whether there exists a complete collection of invariant properties of a similar type, in the sense that for every pair of structural imsets $u$ and $v$ over $N$ which are not facially equivalent at least one property of this type exists in which they differ. The respective consequence gives a positive answer to this question. Indeed, every skeletal imset $m \in \mathcal{K}^{\diamond}_{\ell}(N)$ is associated with an invariant attribute of a structural imset $u$ over $N$, namely the fact whether the scalar product $\langle m, u \rangle$ vanishes or not. The collection of these attributes is complete in the above-mentioned sense.

However as explained in Remark this criterion do es not seem feasible in case

jN j Therefore one is interested in invariants analogous to the eective domain or

in relatively simple characteristics of facial equivalence like the region see Direction in

Chapter For example if jN j then a completely distinguishing class of attributes

is the eective domain together with the minimal lower classes see Section or the

pattern see Section

Chapter

The problem of representative choice

This chapter deals with the problem of choosing a suitable representative within a class of facially equivalent structural imsets. It is an advanced subtask of the general equivalence question mentioned earlier, studied here in the framework of structural imsets; an analogous question has already been treated in graphical frameworks (see the concept of essential graph and the concept of the largest chain graph). A few principles of representative choice are introduced and discussed in this chapter. Special attention is devoted to the representation of graphical models by structural imsets. The last section describes some other ideas whose aim is a unique description of structural models.

Baricentral imsets

An imset $u$ over $N$ is called baricentral if it has the form

$$u = \sum_{w \in \mathcal{E}(N),\ u \rightharpoonup w} w \qquad \text{or, equivalently,} \qquad u = \sum_{\langle a,b|C \rangle \in \mathcal{M}_u \cap \mathcal{T}_{\epsilon}(N)} u_{\langle a,b|C \rangle} .$$

Evidently, every elementary imset is baricentral and every baricentral imset $u$ is a combinatorial imset of degree $|\{ w \in \mathcal{E}(N) ;\ u \rightharpoonup w \}|$. Moreover, the definition implies that every class of facial equivalence of structural imsets contains exactly one baricentral imset. Nevertheless, a semi-elementary imset need not be baricentral. Given a semi-elementary imset $u_{\langle A,B|C \rangle}$ for $\langle A, B | C \rangle \in \mathcal{T}(N)$, the respective facially equivalent baricentral imset need not even be a multiple of it, despite the fact that the formulas

$$\deg(u_{\langle A,B|C \rangle}) = |A| \cdot |B| , \qquad |\{ w \in \mathcal{E}(N) ;\ u_{\langle A,B|C \rangle} \rightharpoonup w \}| = |A| \cdot |B| \cdot 2^{|A| + |B| - 2}$$

suggest that it may be the case.

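Example There exists a semi-elementary imset $v$ over $N = \{a, b, c, d\}$ such that no multiple $k \cdot v$, $k \in \mathbb{N}$, is a baricentral imset. Put $v = u_{\langle a,bcd|\emptyset \rangle}$ (see the left-hand picture of the figure). Then $v \rightharpoonup w \in \mathcal{E}(N)$ iff $w = u_{\langle a,e|C \rangle}$ where $e \in \{b, c, d\}$ and $C \subseteq \{b, c, d\} \setminus \{e\}$. The respective baricentral imset $u$ is shown in the right-hand picture of the figure. Observe that $\deg(u) = 12 = 4 \cdot \deg(v)$ but $u \neq 4 \cdot v$, since the level-degrees of $u$ and $v$ are not proportional: $\deg(v, l) = 1$ for $l = 0, 1, 2$, while $\deg(u, 0) = 3 = \deg(u, 2)$ and $\deg(u, 1) = 6$. On the other hand, $u = 3 \cdot v + u_{\langle a,b|c \rangle} + u_{\langle a,c|d \rangle} + u_{\langle a,d|b \rangle}$.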
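The arithmetic of this example is easy to reproduce. The following sketch builds the baricentral imset as the sum of the twelve elementary imsets $u_{\langle a,e|C \rangle}$ named in the example and compares it with multiples of $v$. The dictionary representation, the helper names and the computation of the degree as the scalar product with $m_*(S) = |S| \cdot (|S|-1)/2$ are assumptions made for illustration; the level counts below are those of this particular decomposition.

```python
from itertools import combinations

def elementary(i, j, K=()):
    """Elementary imset u_<i,j|K> as a dict {frozenset: int}."""
    K = frozenset(K)
    return {K | {i, j}: 1, K: 1, K | {i}: -1, K | {j}: -1}

def combine(u, v, c=1):
    """Return u + c*v with zero entries dropped."""
    w = dict(u)
    for S, x in v.items():
        w[S] = w.get(S, 0) + c * x
    return {S: x for S, x in w.items() if x != 0}

def degree(u):
    """Scalar product with m_*(S) = |S|(|S|-1)/2 (equals 1 on elementary imsets)."""
    return sum(x * (len(S) * (len(S) - 1) // 2) for S, x in u.items())

# v = u_<a,bcd|empty> = d_{abcd} + d_{empty} - d_{a} - d_{bcd}
v = {frozenset("abcd"): 1, frozenset(): 1, frozenset("a"): -1, frozenset("bcd"): -1}

# baricentral imset: sum of all u_<a,e|C> with e in {b,c,d} and C inside {b,c,d} minus {e}
u, level_counts = {}, {}
for e in "bcd":
    rest = sorted(set("bcd") - {e})
    for r in range(len(rest) + 1):
        for C in combinations(rest, r):
            u = combine(u, elementary("a", e, C))
            level_counts[r] = level_counts.get(r, 0) + 1

# right-hand side of the identity u = 3v + u_<a,b|c> + u_<a,c|d> + u_<a,d|b>
rhs = combine({}, v, 3)
for i, j, K in (("a", "b", "c"), ("a", "c", "d"), ("a", "d", "b")):
    rhs = combine(rhs, elementary(i, j, K))

print(degree(u), degree(v), level_counts, u == rhs, u == combine({}, v, 4))
```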

Figure Respective non-proportional semi-elementary and baricentral imsets

The significance of baricentral imsets consists in the fact that testing facial implication between them is very simple.

Observation Let $u, v$ be baricentral imsets over $N$. Then $u \rightharpoonup v$ iff $u - v$ is a combinatorial imset.

Proof If $u \rightharpoonup v$ then $v \rightharpoonup w$ implies $u \rightharpoonup w$ for every $w \in \mathcal{E}(N)$, and $u - v \in \mathcal{C}(N)$ follows from the definition of a baricentral imset. The converse follows from the respective lemma.

Note that testing combinatorial imsets is clear. An analogous result holds if $v$ is replaced by a semi-elementary imset (see the respective consequence below). In particular, the whole induced model $\mathcal{M}_u$ can be easily identified on the basis of a baricentral imset $u$ over $N$, i.e. without multiplication.

Remark The term baricentral imset was inspired by the geometric idea that the class of structural imsets which are facially implied by $v \in \mathcal{S}(N)$ is nothing but the class of imsets belonging to the cone $\mathrm{con}(\{ w \in \mathcal{E}(N) ;\ v \rightharpoonup w \})$. Thus, a minimal balanced combination of all extreme imsets of this cone forms the baricentre of the cone.

Natural question is what is the number of baricentral imsets over N A rough upp er

estimate can b e obtained as follows Supp ose n jN j and S N jS j k Then

a limited number of w E N takes a nonzero value w S f g In particular

by every baricentral imset u over N takes the value uS in a nite set LS Z

k nk

which dep ends on k jS j only Concretely LS fk n k g for

nn

n

g for k f ng and LS fn k n LS f g

n

for k f n g Thus j LS j for every S N Since baricentral imsets are

n

ostandardized it suces to know their values for n sets only Therefore every

baricentral imset can b e represented as a function on fS N jS j g taking values in

LS for every S The number of these functions is

n

n

n

f g

n

which serves as an upp er estimate of the number of baricentral imsets over N and therefore

of the number of structural mo dels over N

The following gives an indirect comparison of memory demands when a structural model over $N$ is represented either in the form of a baricentral imset or directly. By the respective lemma, every semigraphoid $\mathcal{M}$ over $N$ is determined by $\mathcal{M} \cap \mathcal{T}_{\epsilon}(N)$. Thus, owing to the symmetry property, it can be represented as a function on $\mathcal{E}(N)$ taking values in a two-element set. As $|\mathcal{E}(N)| = \binom{n}{2} \cdot 2^{n-2}$, the number of these functions is

$$2^{\, n \cdot (n-1) \cdot 2^{n-3}} .$$

One has and On

n

n2

n

n n

2

for n so that asymptotically the the other hand

n n

number of considered integral functions on fS N jS j g is lower than the number

of binary functions on E N Only n bits suces to represent elements of LS for

S N in case n which means that memory demands are slightly lower in case of

representation by baricentral imsets
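The number of elementary imsets entering this comparison can be confirmed by direct enumeration. The sketch below assumes that elementary imsets over $n$ variables correspond to unordered pairs $\{i, j\}$ together with a set $K$ of remaining variables, so that their number is $\binom{n}{2} \cdot 2^{n-2}$; the function name is illustrative.

```python
from itertools import combinations
from math import comb

def count_elementary(n):
    """Count pairs {i, j} together with sets K inside the remaining n - 2 variables."""
    nodes = list(range(n))
    total = 0
    for i, j in combinations(nodes, 2):
        rest = [x for x in nodes if x not in (i, j)]
        for r in range(len(rest) + 1):
            total += sum(1 for _ in combinations(rest, r))
    return total

for n in range(2, 7):
    # enumeration versus the closed formula; a direct binary encoding needs this many bits
    print(n, count_elementary(n), comb(n, 2) * 2 ** (n - 2))
```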

On the other hand the actual number of baricentral imsets ie structural mo dels for

n are much lower than the estimates from Remark see Example The

lattice of baricentral imsets over fa b cg ordered by is shown in Figure

Baricentral imsets provide quite a good solution to the problem of representative choice from the computational point of view. However, the question of how to get the respective baricentral imset from any given structural imset remains to be solved satisfactorily. For example, formulas ascribing the respective baricentral imsets to graphical models are needed (see the respective Theme). A relative disadvantage of baricentral imsets is that they do not seem to offer an easy interpretation in comparison with the standard imsets for DAG models mentioned below.

Standard imsets

Some classic graphical models can be represented by certain standard structural imsets which seem to exhibit important characteristics of the models. These standard representatives of graphical models may differ from the baricentral representatives and seem to be more suitable from the point of view of interpretation. They are introduced in this section together with relevant basic facts. Note that the motive of the later sections is to find out whether these exceptional representatives reflect some deeper principles, so that the concept of standard imset could be extended even beyond the framework of graphical models.

Translation of DAG models

Let $G$ be an acyclic directed graph over $N$. By the standard imset for $G$ will be understood the imset $u_G$ over $N$ given by

$$u_G = \delta_N - \delta_{\emptyset} + \sum_{c \in N} \{ \delta_{pa_G(c)} - \delta_{\{c\} \cup pa_G(c)} \} .$$

Lemma Let $G$ be an acyclic directed graph over $N$. Then the imset $u = u_G$ is a combinatorial imset and $\mathcal{M}_u = \mathcal{M}_G$. Moreover, $\deg(u_G) = \frac{1}{2} \cdot |N| \cdot (|N| - 1) - |\mathcal{A}(G)|$, where $|\mathcal{A}(G)|$ is the number of arrows in $G$.

Figure Baricentral imsets over $N = \{a, b, c\}$ (rotated)

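Proof Consider a fixed ordering $a_1, \ldots, a_n$, $n \geq 1$, of nodes of $G$ consonant with the direction of arrows and the corresponding causal input list

$$\langle a_j ,\ \{a_1, \ldots, a_{j-1}\} \setminus pa_G(a_j) \ |\ pa_G(a_j) \rangle \qquad \text{for } j = 1, \ldots, n .$$

Introduce $u_j$ as the semi-elementary imset corresponding to the $j$-th triplet from the list for $j = 1, \ldots, n$ and write

$$\sum_{j=1}^{n} u_j = \sum_{j=1}^{n} \{ \delta_{\{a_1, \ldots, a_j\}} - \delta_{\{a_1, \ldots, a_{j-1}\}} - \delta_{\{a_j\} \cup pa_G(a_j)} + \delta_{pa_G(a_j)} \} = u_G ,$$

since almost all terms $\delta_{\{a_1, \ldots, a_j\}}$ are cancelled. Thus, $u_G$ is a combinatorial imset, and the substitution of $\deg(u_j) = (j - 1) - |pa_G(a_j)|$ into $\deg(u_G) = \sum_{j=1}^{n} \deg(u_j)$ gives the desired formula for $\deg(u_G)$. Note that the formula above implies that $\sum_{j=1}^{n} u_j$ actually does not depend on the choice of the causal input list.

Since $\mathcal{M}_u$ is a semigraphoid containing the causal input list, the known result saying that $\mathcal{M}_G$ is the least semigraphoid containing this list implies $\mathcal{M}_G \subseteq \mathcal{M}_u$. For the converse inclusion, use the known result implying that a discrete probability measure $P$ over $N$ with $\mathcal{M}_P = \mathcal{M}_G$ exists and the theorem saying that $v \in \mathcal{S}(N)$ with $\mathcal{M}_v = \mathcal{M}_P$ exists. Since the list belongs to $\mathcal{M}_G = \mathcal{M}_v$, one has $v \rightharpoonup u_j$ for $j = 1, \ldots, n$ and therefore $v \rightharpoonup \sum_{j=1}^{n} u_j = u$, which means $\mathcal{M}_u \subseteq \mathcal{M}_v = \mathcal{M}_G$.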

Remark In fact, it was shown in the proof of the preceding lemma that $u_G \in \mathcal{S}_{\Psi}(N)$, where $\Psi$ is the class of discrete measures over $N$.

Standard imsets appear to be a suitable tool for testing Markov equivalence of acyclic directed graphs.

Consequence Let $G, H$ be acyclic directed graphs over $N$. Then $\mathcal{M}_G = \mathcal{M}_H$ if and only if $u_G = u_H$.

Proof By the preceding lemma, $u_G = u_H$ implies $\mathcal{M}_G = \mathcal{M}_H$. The converse implication is shown in the cited work.
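The standard imset of an acyclic directed graph and the resulting test of Markov equivalence can be sketched as follows. The code assumes the formula $u_G = \delta_N - \delta_{\emptyset} + \sum_{c \in N} \{ \delta_{pa_G(c)} - \delta_{\{c\} \cup pa_G(c)} \}$ and the degree formula $\frac{1}{2} |N| (|N|-1) - |\mathcal{A}(G)|$; the graph is encoded as a dictionary of parent sets, and all names are illustrative.

```python
def standard_imset(parents):
    """u_G = d_N - d_empty + sum over nodes c of (d_pa(c) - d_{c and pa(c)})."""
    u = {}

    def bump(S, c):
        S = frozenset(S)
        u[S] = u.get(S, 0) + c

    bump(parents.keys(), 1)      # + delta_N
    bump((), -1)                 # - delta_empty
    for c, pa in parents.items():
        bump(pa, 1)
        bump(set(pa) | {c}, -1)
    return {S: x for S, x in u.items() if x != 0}

def degree(u):
    """Scalar product with m_*(S) = |S|(|S|-1)/2 (equals 1 on elementary imsets)."""
    return sum(x * (len(S) * (len(S) - 1) // 2) for S, x in u.items())

# three Markov equivalent graphs over {a, b, c} and one collider
chain    = {"a": set(), "b": {"a"}, "c": {"b"}}       # a -> b -> c
reverse  = {"a": {"b"}, "b": {"c"}, "c": set()}       # a <- b <- c
fork     = {"a": {"b"}, "b": set(), "c": {"b"}}       # a <- b -> c
collider = {"a": set(), "b": {"a", "c"}, "c": set()}  # a -> b <- c

print(standard_imset(chain) == standard_imset(reverse) == standard_imset(fork))
print(standard_imset(chain) == standard_imset(collider))
```

In this encoding Markov equivalence of two graphs is tested by comparing two dictionaries, without any search over independence statements.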

Remark Every semi-elementary imset over $N$ is a standard imset for an acyclic directed graph over $N$. Indeed, given $\langle A, B | C \rangle \in \mathcal{T}(N)$, consider a total ordering of nodes of $N$ in which the nodes of $C$ precede the nodes of $A$, which precede the nodes of $B$, and these precede the nodes of $N \setminus ABC$. Take an undirected graph over $N$ in which every pair of distinct nodes is a line except for pairs $(a, b)$ where $a \in A$, $b \in B$. Consider a directed graph $G$ which has this undirected graph as its underlying graph and has the direction of arrows consonant with the total ordering above. Then it is not difficult to see, by the procedure in the proof of the preceding lemma, that $u_G = u_{\langle A,B|C \rangle}$.

Translation of decomposable models

Decomposable models, that is, independence models induced by triangulated undirected graphs, form an important class of graphical models. Let $H$ be a triangulated undirected graph over $N$ and $\mathcal{C}$ the class of all its cliques. By the standard imset for $H$ will be understood the imset $u_H$ over $N$ given by

$$u_H = \delta_N + \sum_{\emptyset \neq \mathcal{B} \subseteq \mathcal{C}} (-1)^{|\mathcal{B}|} \cdot \delta_{\bigcap \mathcal{B}} .$$

It is shown below that $u_H$ is a combinatorial imset (see the consequence below); the next lemma helps to compute $u_H$ efficiently.

Lemma Let $H$ be a triangulated undirected graph over $N$ and $C_1, \ldots, C_m$, $m \geq 1$, a sequence of all its cliques satisfying the running intersection property. Then

$$u_H = \delta_N - \sum_{i=1}^{m} \delta_{C_i} + \sum_{i=2}^{m} \delta_{S_i} ,$$

where $S_i = C_i \cap (\bigcup_{j<i} C_j)$ for $i = 2, \ldots, m$ are the respective separators. In particular, the right-hand side does not depend on the choice of the sequence and can also be written as follows:

$$u_H = \delta_N - \sum_{C \in \mathcal{C}} \delta_C + \sum_{S \in \mathcal{S}} w(S) \cdot \delta_S ,$$

where $\mathcal{C}$ is the class of cliques of $H$, $\mathcal{S}$ is the class of separators and $w(S)$ denotes the multiplicity of a separator $S \in \mathcal{S}$.
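The agreement of the two expressions for the standard imset of a triangulated graph can be checked on small examples. The sketch assumes the inclusion-exclusion form $u_H = \delta_N + \sum_{\emptyset \neq \mathcal{B} \subseteq \mathcal{C}} (-1)^{|\mathcal{B}|} \delta_{\bigcap \mathcal{B}}$ as the definition and compares it with the clique and separator form; the cliques passed to the second function must already be listed in an order satisfying the running intersection property, which the code does not verify. Function names are illustrative.

```python
from itertools import combinations

def imset_incl_excl(N, cliques):
    """d_N plus the alternating sum of d_{intersection of B} over nonempty classes B of cliques."""
    cl = [frozenset(C) for C in cliques]
    u = {frozenset(N): 1}
    for r in range(1, len(cl) + 1):
        for B in combinations(cl, r):
            S = frozenset.intersection(*B)
            u[S] = u.get(S, 0) + (-1) ** r
    return {S: x for S, x in u.items() if x != 0}

def imset_cliques_separators(N, ordered_cliques):
    """d_N - sum_i d_{C_i} + sum_{i>=2} d_{S_i}, with S_i = C_i meet (C_1 u ... u C_{i-1})."""
    u = {frozenset(N): 1}
    seen = frozenset()
    for k, C in enumerate(map(frozenset, ordered_cliques)):
        u[C] = u.get(C, 0) - 1
        if k > 0:
            S = C & seen                 # separator of the k-th clique
            u[S] = u.get(S, 0) + 1
        seen |= C
    return {S: x for S, x in u.items() if x != 0}

path = ["ab", "bc", "cd"]                # a - b - c - d
tree = ["abc", "bcd", "ce"]              # separators {b, c} and {c}
print(imset_incl_excl("abcd", path) == imset_cliques_separators("abcd", path))
print(imset_incl_excl("abcde", tree) == imset_cliques_separators("abcde", tree))
```

The second form needs one pass over the cliques, whereas the inclusion-exclusion form visits every nonempty class of cliques.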

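Proof The idea is to verify the first formula by induction on $m = |\mathcal{C}|$. It is evident in case $m = 1$. If $m \geq 2$ then put $\mathcal{C}' = \mathcal{C} \setminus \{C_m\}$, $T = \bigcup \mathcal{C}'$ and $H' = H_T$. Observe that $C_1, \ldots, C_{m-1}$ is a sequence of all cliques of $H'$ satisfying the running intersection property. Write, by the definition of the standard imset,

$$u_H = \delta_N - \delta_T + u_{H'} + \sum_{C_m \in \mathcal{B} \subseteq \mathcal{C}} (-1)^{|\mathcal{B}|} \cdot \delta_{\bigcap \mathcal{B}} .$$

The running intersection property says $S_m = C_m \cap (\bigcup_{j<m} C_j) \subseteq C_k$ for some $k < m$. This allows one to write

$$\sum_{C_m \in \mathcal{B} \subseteq \mathcal{C}} (-1)^{|\mathcal{B}|} \cdot \delta_{\bigcap \mathcal{B}} \ = \sum_{C_m \in \mathcal{A} \subseteq \mathcal{C} \setminus \{C_k\}} \{ (-1)^{|\mathcal{A}|} \cdot \delta_{\bigcap \mathcal{A}} + (-1)^{|\mathcal{A}|+1} \cdot \delta_{C_k \cap \bigcap \mathcal{A}} \} \ = \ - \delta_{C_m} + \delta_{C_m \cap C_k} ,$$

where the last equality holds because every term in braces vanishes whenever $|\mathcal{A}| \geq 2$: the inclusion $\bigcap \mathcal{A} \subseteq C_m \cap (\bigcup_{j<m} C_j) \subseteq C_k$ says $C_k \cap \bigcap \mathcal{A} = \bigcap \mathcal{A}$. Moreover, $C_m \cap C_k = S_m$. Hence, by these two equalities and the induction hypothesis applied to $H'$ over $T$, one gets

$$u_H = \delta_N - \delta_T - \delta_{C_m} + \delta_{S_m} + \delta_T - \sum_{i=1}^{m-1} \delta_{C_i} + \sum_{i=2}^{m-1} \delta_{S_i} ,$$

which gives the desired formula.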

Remark Note that the preceding lemma implies that the product formula induced by $u_H$ is nothing but the well-known formula characterizing Markovian measures with respect to triangulated graphs. Thus, this classic result can be viewed as a special case of the respective theorem on structural imsets.

Decomposable models can be viewed as DAG models. The reader may ask whether the standard translation of DAG models and the standard translation of decomposable models lead to the same imset. A positive answer is given by the following lemma.

Lemma Let $H$ be a triangulated graph over $N$ and $C_1, \ldots, C_m$, $m \geq 1$, a sequence of its cliques satisfying the running intersection property. Put $R_i = C_i \setminus \bigcup_{j<i} C_j$ for $i = 1, \ldots, m$ and consider a total ordering of nodes of $N$ in which nodes of $R_i$ precede nodes of $R_{i+1}$ for $i = 1, \ldots, m-1$. Let $G$ be an acyclic directed graph over $N$ having $H$ as its underlying graph such that the direction of arrows in $G$ is consonant with the constructed total ordering of nodes. Then $\mathcal{M}_G = \mathcal{M}_H$ and $u_H = u_G$.

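Proof The first observation is this:

$$\forall\, c \in N \ \ \forall\, a, b \in pa_G(c),\ a \neq b: \quad [a, b] \text{ is an edge in } G .$$

Indeed, $c \in R_l$ for a uniquely determined $l \leq m$. If $a \in pa_G(c)$ then $a \in \bigcup_{j \leq l} R_j = \bigcup_{j \leq l} C_j$ and $\{a, c\}$ belongs to a clique of $H$. Let $C_i$ be the first clique in the sequence $C_1, \ldots, C_m$ containing $\{a, c\}$. Necessarily $i \leq l$, as otherwise $a, c \in C_i \cap (\bigcup_{j \leq l} C_j) \subseteq C_i \cap (\bigcup_{j<i} C_j) = S_i$ and, by the running intersection property, $a, c \in S_i \subseteq C_k$ for some $k < i$, which contradicts the definition of $C_i$. However, as $c \notin C_j$ for $j < l$, necessarily $i = l$. Hence $pa_G(c) \subseteq C_l$, which implies the observation.

Now, both $G$ and $H$ can be viewed as classic chain graphs over $N$ with the same underlying graph. To show $\mathcal{M}_G = \mathcal{M}_H$ by the well-known graphical characterization, it suffices to show that they have the same complexes. But $H$ has no complexes, and neither has $G$ because of the first observation.

The equality $u_G = u_H$ can be derived using the first formula of the preceding lemma. Indeed, if $d^{i}_{1}, \ldots, d^{i}_{y_i}$ is the chosen ordering within $R_i$, $i = 1, \ldots, m$, then the definition of $u_G$ gives

$$u_G = \delta_N - \delta_{\emptyset} + \sum_{i=1}^{m} \sum_{d \in R_i} \{ \delta_{pa_G(d)} - \delta_{\{d\} \cup pa_G(d)} \} = \delta_N - \delta_{\emptyset} + \sum_{i=1}^{m} \{ \delta_{S_i} - \delta_{C_i} \} ,$$

where $S_1 = \emptyset$, as $pa_G(d^{i}_{1}) = S_i$ and $\{d^{i}_{y_i}\} \cup pa_G(d^{i}_{y_i}) = C_i$ for $i = 1, \ldots, m$ and all remaining terms within the inside sum are cancelled.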

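Consequence Let $H$ be a triangulated undirected graph over $N$. Then $u = u_H$ is a combinatorial imset, $\mathcal{M}_u = \mathcal{M}_H$, and $u_H$ coincides with the standard imset for any acyclic directed graph $G$ for which $\mathcal{M}_G = \mathcal{M}_H$.

Proof This follows directly from the preceding lemma, the lemma on standard imsets for acyclic directed graphs and the consequence on their Markov equivalence.

Remark Because of the remark following the lemma on standard imsets for acyclic directed graphs, the preceding consequence implies that $u_H \in \mathcal{S}_{\Psi}(N)$, where $\Psi$ is the class of discrete measures over $N$.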

Figure Two distinct equivalent imsets of the least degree

Imsets of the least degree

One of the possible approaches to the choice of a representative from a class $\wp$ of facially equivalent structural imsets is to choose a combinatorial imset of the least degree. Note that $\wp$ contains combinatorial imsets. The definition of degree implies that only finitely many combinatorial imsets over a fixed set $N$ with a prescribed degree exist. In particular, the set of combinatorial imsets of the least degree in $\wp$ is finite. By an imset of the least degree will be understood a combinatorial imset $u$ which has the least degree within the class of combinatorial imsets $v$ with $\mathcal{M}_u = \mathcal{M}_v$. Nevertheless, the class $\wp$ may contain more than one imset of the least degree.

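Example There exists a class $\wp$ of facially equivalent structural imsets over $N = \{a, b, c\}$ which has two different imsets of the least degree. Consider the class $\wp$ of $w \in \mathcal{S}(N)$ with $\mathcal{M}_w = \mathcal{T}(N)$. Then both $u = u_{\langle b,c|a \rangle} + u_{\langle a,b|\emptyset \rangle} + u_{\langle a,c|\emptyset \rangle}$ and $v = u_{\langle a,b|c \rangle} + u_{\langle a,c|b \rangle} + u_{\langle b,c|\emptyset \rangle}$ (they are shown in the figure) have the least degree within the class of combinatorial imsets from $\wp$. Observe that the lower classes differ: that of $u$ is generated by $\{a\}, \{b\}, \{c\}$, while that of $v$ is generated by $\{a,b\}, \{a,c\}, \{b,c\}$. Note that the fact that $u$ and $v$ are all the imsets of this kind can be verified using the procedure described later in this section.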
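For three variables the claim of this example can be reproduced by an exhaustive search. This is a sketch only and not the program mentioned in the text: it assumes that, for at most four variables, a combinatorial imset $x$ facially implies an elementary imset $w$ iff $x - w$ is combinatorial, and it enumerates all sums of $d$ elementary imsets for $d = 1, 2, 3$. Helper names are illustrative.

```python
from itertools import combinations, combinations_with_replacement

def elementary(i, j, K=()):
    """Elementary imset u_<i,j|K> as a dict {frozenset: int}."""
    K = frozenset(K)
    return {K | {i, j}: 1, K: 1, K | {i}: -1, K | {j}: -1}

def all_elementary(N):
    out = []
    for i, j in combinations(sorted(N), 2):
        rest = sorted(set(N) - {i, j})
        for r in range(len(rest) + 1):
            out.extend(elementary(i, j, K) for K in combinations(rest, r))
    return out

def combine(u, v, c=1):
    """Return u + c*v with zero entries dropped."""
    w = dict(u)
    for S, x in v.items():
        w[S] = w.get(S, 0) + c * x
    return {S: x for S, x in w.items() if x != 0}

def degree(u):
    """Scalar product with m_*(S) = |S|(|S|-1)/2 (equals 1 on elementary imsets)."""
    return sum(x * (len(S) * (len(S) - 1) // 2) for S, x in u.items())

def is_combinatorial(u, elems, start=0):
    if not u:
        return True
    if degree(u) <= 0:
        return False
    return any(is_combinatorial(combine(u, elems[k], -1), elems, k)
               for k in range(start, len(elems)))

def full_model_imsets(elems, d):
    """Distinct sums of d elementary imsets which imply every elementary imset."""
    found = set()
    for ws in combinations_with_replacement(elems, d):
        x = {}
        for w in ws:
            x = combine(x, w)
        if all(is_combinatorial(combine(x, w, -1), elems) for w in elems):
            found.add(frozenset(x.items()))
    return found

E = all_elementary("abc")
u = combine(combine(elementary("b", "c", "a"), elementary("a", "b")), elementary("a", "c"))
v = combine(combine(elementary("a", "b", "c"), elementary("a", "c", "b")), elementary("b", "c"))
print([len(full_model_imsets(E, d)) for d in (1, 2, 3)])
```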

Lemma The standard imset for an acyclic directed graph $G$ over $N$ is an imset of the least degree.

Pro of Let v C N with v u where u u To verify

G

deg v degu j fa b N N a b a b is not an edge in G g j

P

see Lemma write v k w for k Z and show that for every a b N

w w

w E N

a b such that a b is not an edge in G there exists w u E N with k for

habjK i w

ab

some K N n ab Indeed otherwise hm v i for m m as hm w i for w E N

m

i w u for some K N n ab Hence by Observation M M But the

v

habjK i

m m

moralization criterion see Section says ha b j pa apa bi M n M M n M

G u

G G

which implies a contradictory conclusion M M

v u

By the remark saying that every semi-elementary imset is a standard imset for an acyclic directed graph, the previous lemma implies the following fact.

Consequence Every semi-elementary imset is an imset of the least degree.

The method of finding all imsets of the least degree within a given equivalence class, mentioned in the example above, is based on the fact that every imset of this type determines a certain minimal generator of the respective induced independence model. The method uses a computer program, and its theoretical justification is given in the rest of this section.

Strong facial implication

Let $u, v$ be combinatorial imsets over $N$. Let us say that $u$ strongly facially implies $v$, and write $u \rightsquigarrow v$, if $u - v$ is a combinatorial imset. Clearly, $u \rightsquigarrow v$ implies $u \rightharpoonup v$ by Lemma. The relation $\rightsquigarrow$ is a partial ordering on $\mathcal{C}(N)$ (for antisymmetry use Observation). Its advantage in comparison with $\rightharpoonup$ is that it can be easily tested (see Remark).

Observation Every imset $u$ of the least degree is minimal with respect to $\rightsquigarrow$ within the class $\{v \in \mathcal{C}(N);\ v \approx u\}$.

Proof If $u, v$ are combinatorial imsets and $u \rightsquigarrow v$, then $\deg(u) - \deg(v) = \deg(u - v)$, and this difference is positive unless $u = v$, as the only combinatorial imset of degree $0$ is the zero imset.

However, the question whether the converse implication holds remains open (see Question on p).
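For small $N$ the relation of strong facial implication can be tested by exhaustive search. The sketch below is an illustration only and is not the test referred to in the Remark: it decides whether $u - v$ is combinatorial by repeatedly subtracting elementary imsets, using the fact that each subtraction lowers the degree by one and that a nonzero imset of non-positive degree cannot be combinatorial. All function names are hypothetical, and the search is exponential in the degree.

```python
from itertools import combinations

N = "abc"


def elementary(i, j, K=""):
    """Elementary imset u_<i,j|K> as a dict {frozenset: int}."""
    K = frozenset(K)
    return {K | {i, j}: 1, K: 1, K | {i}: -1, K | {j}: -1}


def combine(u, v, sign=1):
    """Return u + sign * v with zero entries dropped."""
    total = dict(u)
    for S, val in v.items():
        total[S] = total.get(S, 0) + sign * val
    return {S: val for S, val in total.items() if val != 0}


def all_elementary(N):
    """List of all elementary imsets over N."""
    out = []
    for i, j in combinations(N, 2):
        rest = [x for x in N if x not in (i, j)]
        for r in range(len(rest) + 1):
            out += [elementary(i, j, K) for K in combinations(rest, r)]
    return out


def degree(u):
    """Scalar product with S -> |S|(|S|-1)/2; each elementary imset scores 1."""
    return sum(val * (len(S) * (len(S) - 1) // 2) for S, val in u.items())


def is_combinatorial(u, N):
    """Brute force: is u a sum of elementary imsets over N?"""
    if not u:                 # the zero imset is the empty sum
        return True
    if degree(u) <= 0:        # nonzero combinatorial imsets have positive degree
        return False
    return any(is_combinatorial(combine(u, w, -1), N) for w in all_elementary(N))


def strongly_implies(u, v, N):
    """u strongly facially implies v iff u - v is combinatorial."""
    return is_combinatorial(combine(u, v, -1), N)


u = combine(combine(elementary("b", "c", "a"), elementary("a", "b")), elementary("a", "c"))
v = combine(combine(elementary("a", "b", "c"), elementary("a", "c", "b")), elementary("b", "c"))
bigger = combine(u, elementary("a", "b", "c"))

print(strongly_implies(u, v, N), strongly_implies(v, u, N))
print(strongly_implies(bigger, u, N))
```

Here `u` and `v` are the two imsets of the least degree from the earlier example over $\{a, b, c\}$: their difference is a nonzero imset of degree zero, so neither strongly implies the other.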

Minimal generators

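The point is that imsets satisfying the condition from Observation correspond to specific minimal generators with respect to a closure operation on subsets of $X = \mathcal{T}(N)$ (see Section p for related definitions). Indeed, the class $\mathcal{U}(N)$ is a closure system of subsets of $X = \mathcal{T}(N)$ by Observation, and one can introduce the respective closure operation $cl_{\mathcal{U}(N)}$ on subsets of $\mathcal{T}(N)$. Thus, by the structural closure of $\mathcal{G} \subseteq \mathcal{T}(N)$ is understood the least structural model containing $\mathcal{G}$, defined by

$$cl_{\mathcal{U}(N)}(\mathcal{G}) = \bigcap_{\mathcal{G} \subseteq \mathcal{M} \in \mathcal{U}(N)} \mathcal{M} \qquad \text{for } \mathcal{G} \subseteq \mathcal{T}(N).$$

A set $\mathcal{G} \subseteq \mathcal{T}(N)$ is called a structural generator of $\mathcal{M} \in \mathcal{U}(N)$ if $\mathcal{M} = cl_{\mathcal{U}(N)}(\mathcal{G})$; if, moreover, $\mathcal{G}$ consists of elementary triplets ($\mathcal{G} \subseteq \mathcal{T}_\epsilon(N)$), then it is called an elementary generator of $\mathcal{M}$. A structural elementary generator of $\mathcal{M}$ is called minimal if no proper subset of it is a structural generator of $\mathcal{M}$. As every structural model $\mathcal{M}$ over $N$ is a semigraphoid (by Lemma), $\mathcal{M} \cap \mathcal{T}_\epsilon(N)$ is an elementary generator of $\mathcal{M}$. This implies the following observation.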

Observation Every structural model $\mathcal{M}$ over $N$ has a minimal elementary generator.

Remark Note that the concept of minimal generator can be understood with respect to an arbitrary closure operation on subsets of $\mathcal{T}(N)$, for example with respect to the semigraphoid closure operation or any closure operation introduced by means of syntactic inference rules of semigraphoid type. The concept of complexity of a model with respect to a closure operation, which can be introduced as the least cardinality of a generator, appears to be an interesting characteristic of the model.
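To make the notions of minimal generator and complexity concrete, here is a small brute-force sketch for the semigraphoid closure mentioned in the remark. It is an illustration under stated assumptions, not the program used in this work: statements are restricted to elementary ones (with symmetry built in), and the closure is assumed to be generated by the elementary form of the semigraphoid rule, namely that $\langle i,j|K\rangle$ and $\langle i,l|jK\rangle$ together yield $\langle i,l|K\rangle$ and $\langle i,j|lK\rangle$. The names `stmt`, `closure` and `minimal_generators` are hypothetical.

```python
from itertools import combinations, permutations

N = "abc"


def stmt(i, j, K=""):
    """Elementary statement <i,j|K>; symmetric in i and j."""
    return (frozenset((i, j)), frozenset(K))


def all_statements(N):
    """All elementary statements over N."""
    out = []
    for i, j in combinations(N, 2):
        rest = [x for x in N if x not in (i, j)]
        for r in range(len(rest) + 1):
            out += [stmt(i, j, K) for K in combinations(rest, r)]
    return out


def closure(G, N):
    """Close a set of elementary statements under the assumed semigraphoid rule."""
    M = set(G)
    changed = True
    while changed:
        changed = False
        for i, j, l in permutations(N, 3):
            rest = [x for x in N if x not in (i, j, l)]
            for r in range(len(rest) + 1):
                for K in combinations(rest, r):
                    K = frozenset(K)
                    if stmt(i, j, K) in M and stmt(i, l, K | {j}) in M:
                        new = {stmt(i, l, K), stmt(i, j, K | {l})}
                        if not new <= M:
                            M |= new
                            changed = True
    return M


def minimal_generators(model, N):
    """Generators of `model` none of whose proper subsets is a generator."""
    model = set(model)
    candidates = list(model)
    found = []
    # Sizes are scanned in increasing order, so any non-minimal generator
    # contains a previously found (smaller) minimal one and is skipped.
    for r in range(len(candidates) + 1):
        for G in combinations(candidates, r):
            if any(H <= set(G) for H in found):
                continue
            if closure(G, N) == model:
                found.append(set(G))
    return found


full_model = set(all_statements(N))
gens = minimal_generators(full_model, N)
complexity = min(len(G) for G in gens)
print(complexity, len(gens))
```

For the full model over three variables the least cardinality of a generator found this way is 3, and one of the minimal generators is $\{\langle b,c|a\rangle, \langle a,b|\emptyset\rangle, \langle a,c|\emptyset\rangle\}$.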

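The following lemma provides a method of finding all imsets of the least degree.

Lemma Let $\mathcal{M}$ be a structural model over a set $N$ endowed with a total ordering $\preceq$, and $\wp = \{v \in \mathcal{C}(N);\ \mathcal{M}_v = \mathcal{M}\}$. Then every minimal element $u$ of $\wp$ with respect to $\rightsquigarrow$ has the form

$$u = \sum_{w \in \mathcal{E}(N)} k_w \cdot w \qquad \text{where } k_w \in \{0, 1\},$$

and $\mathcal{G} = \{\langle i,j|K\rangle \in \mathcal{T}_\epsilon(N);\ k_{u_{\langle i,j|K\rangle}} = 1,\ i \prec j\}$ is a minimal elementary generator of $\mathcal{M}$.

Proof Write $u = \sum_{w \in \mathcal{E}(N)} k_w \cdot w$ where $k_w \in \mathbb{Z}^+$. If $k_w \ge 2$ for some $w \in \mathcal{E}(N)$, then $u - w \in \mathcal{C}(N)$ is facially equivalent to $u$ and $u \rightsquigarrow u - w$. Therefore, the displayed form necessarily holds, and $\mathcal{G}$ is in one-to-one correspondence with the elements of $\mathcal{E}(N)$ having non-zero coefficients there. To show that $\mathcal{G}$ is a structural generator of $\mathcal{M}$, consider $\mathcal{M}' \in \mathcal{U}(N)$ with $\mathcal{G} \subseteq \mathcal{M}'$. As it is a semigraphoid, $\mathcal{M}_w \subseteq \mathcal{M}'$ for $w = u_{\langle i,j|K\rangle}$, $\langle i,j|K\rangle \in \mathcal{G}$, and by Consequence $\mathcal{M}_u \subseteq \mathcal{M}'$. Thus $\mathcal{M}_u \subseteq cl_{\mathcal{U}(N)}(\mathcal{G})$, and the converse inclusion follows from $\mathcal{M}_u \in \mathcal{U}(N)$ and $\mathcal{G} \subseteq \mathcal{M}_u$. To show that no proper subset $\mathcal{F} \subset \mathcal{G}$ is a generator, introduce $v = \sum_{\langle i,j|K\rangle \in \mathcal{F}} u_{\langle i,j|K\rangle} \in \mathcal{C}(N)$. Observe that $cl_{\mathcal{U}(N)}(\mathcal{F}) = \mathcal{M}_v$ by an analogous procedure. If $cl_{\mathcal{U}(N)}(\mathcal{F}) = \mathcal{M}_u$, then $u \approx v$ and $u \rightsquigarrow v \neq u$, which contradicts the assumption.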

In case |N| all minimal elementary generators of a structural model $\mathcal{M}$ over $N$ can be found by a computer program written by my colleague P. Bocek. Thus, owing to Observation, given $\mathcal{M} \in \mathcal{U}(N)$, the list of imsets of the least degree inducing $\mathcal{M}$ can be obtained by reducing the list of imsets $u$ satisfying the condition of the preceding lemma. Reduction is sometimes necessary, as the following example shows.

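Example There exists a structural model $\mathcal{M}$ over $N = \{a, b, c, d\}$ and an elementary generator $\mathcal{G} \subseteq \mathcal{M} \cap \mathcal{T}_\epsilon(N)$ such that $v = \sum_{\langle i,j|K\rangle \in \mathcal{G}} u_{\langle i,j|K\rangle}$ is not an imset of the least degree. Let us consider the independence model from Example on p (restriction of a DAG model). Then both imsets

$$u = u_{\langle a,d|\emptyset\rangle} + u_{\langle a,c|d\rangle} + u_{\langle b,d|a\rangle}, \qquad v = u_{\langle a,c|\emptyset\rangle} + u_{\langle b,d|\emptyset\rangle} + u_{\langle a,d|b\rangle} + u_{\langle a,d|c\rangle}$$

are defined by means of minimal elementary generators of $\mathcal{M}$, but $\deg(v) > \deg(u)$ (the imsets are shown in Figure). Note that $v = u + u_{\langle a,d|\emptyset\rangle}$ and $u + v$ is a baricentral imset. Moreover, $u$ is the unique imset of the least degree among facially equivalent imsets, while $v$ is an imset with the least lower class (see Section) among facially equivalent imsets.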
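The arithmetic behind this example is easy to check mechanically. The following sketch (hypothetical helper names, imsets stored as dictionaries from subsets to integers, lower class assumed to be the collection of subsets of sets with a negative value) recomputes both imsets, their degrees and their lower classes, and confirms that the two imsets differ exactly by $u_{\langle a,d|\emptyset\rangle}$.

```python
from itertools import combinations


def elementary(i, j, K=""):
    """Elementary imset u_<i,j|K> as a dict {frozenset: int}."""
    K = frozenset(K)
    return {K | {i, j}: 1, K: 1, K | {i}: -1, K | {j}: -1}


def add(*imsets):
    """Pointwise sum of imsets; zero entries are dropped."""
    total = {}
    for u in imsets:
        for S, val in u.items():
            total[S] = total.get(S, 0) + val
    return {S: val for S, val in total.items() if val != 0}


def degree(u):
    """Scalar product with S -> |S|(|S|-1)/2; each elementary imset scores 1."""
    return sum(val * (len(S) * (len(S) - 1) // 2) for S, val in u.items())


def lower_class(u):
    """All subsets of sets on which u is negative (assumed definition)."""
    out = set()
    for S, val in u.items():
        if val < 0:
            for r in range(len(S) + 1):
                out |= {frozenset(c) for c in combinations(sorted(S), r)}
    return out


u = add(elementary("a", "d"), elementary("a", "c", "d"), elementary("b", "d", "a"))
v = add(elementary("a", "c"), elementary("b", "d"),
        elementary("a", "d", "b"), elementary("a", "d", "c"))

print(degree(u), degree(v))
print(v == add(u, elementary("a", "d")))
print(lower_class(v) < lower_class(u))  # proper inclusion of lower classes
```

The last line illustrates why the imset of higher degree is still of interest: its lower class is strictly smaller.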

The following consequence easily follows from Lemma, the definition of baricentral imset and Consequence.

Consequence If $u$ is a baricentral imset over $N$ and $v$ is an imset of the least degree with $u \rightharpoonup v$, then $u - v$ is a combinatorial imset. In particular, for every $\langle A,B|C\rangle \in \mathcal{T}(N)$, one has $\langle A,B|C\rangle \in \mathcal{M}_u$ iff $u - u_{\langle A,B|C\rangle}$ is a combinatorial imset.

[Figure: two diagrams on the lattice of subsets of {a, b, c, d}. Caption: Imset of the least degree, respectively with the least lower class.]

Width

Recall that the lower class $\mathcal{L}_u$ of a structural imset $u$ is contained in the upper class $\mathcal{U}_u$ (see Section p), but they differ for non-zero $u$. Moreover, by Consequence, marginals of Markovian measures for sets in $\mathcal{L}_u$ determine the marginals for sets in $\mathcal{U}_u$. The upper class is an invariant of facial equivalence (see Section), but the lower class is not, as demonstrated by Example. In the considered example, the imset shown in the left-hand picture of Figure tells more about which marginals determine the whole Markovian measure in comparison with the imset shown in the right-hand picture of Figure. Thus, facially equivalent imsets need not be equiinformative from this point of view. This consideration motivates an informal concept of width of a structural imset $u$, which is the class $\mathcal{U}_u \setminus \mathcal{L}_u$.

Determining and unimarginal classes

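Let us take a more general view on some results of Section. Suppose that $\mathcal{M}$ is a structural model over $N$. The upper class $\mathcal{U} = \mathcal{U}_u$ of subsets of $N$ and the class of probability measures over $N$ which are Markovian with respect to $u$ do not depend on the choice of $u \in \mathcal{S}(N)$ with $\mathcal{M}_u = \mathcal{M}$; they are determined by $\mathcal{M}$ only.

A descending class $\mathcal{D} \subseteq \mathcal{U}$ of subsets of $N$ will be called determining for $\mathcal{M}$ if the only descending class $\mathcal{E}$ with $\mathcal{D} \subseteq \mathcal{E} \subseteq \mathcal{U}$ such that $AC, BC \in \mathcal{E} \Rightarrow ABC \in \mathcal{E}$ for every $\langle A,B|C\rangle \in \mathcal{M}$ is the class $\mathcal{E} = \mathcal{U}$. A descending class $\mathcal{D} \subseteq \mathcal{U}$ will be called unimarginal for $\mathcal{M}$ if every pair of Markovian measures over $N$ (with respect to $u \in \mathcal{S}(N)$ satisfying $\mathcal{M}_u = \mathcal{M}$) whose marginals for sets from $\mathcal{D}$ coincide has the same marginals for sets from $\mathcal{U}$. Evidently, whenever $\mathcal{D} \subseteq \mathcal{U}$ is determining, respectively unimarginal, then every descending class $\mathcal{D}'$ with $\mathcal{D} \subseteq \mathcal{D}' \subseteq \mathcal{U}$ is determining, respectively unimarginal, as well. Therefore, one is interested in minimal determining classes for $\mathcal{M}$, that is, determining classes $\mathcal{D} \subseteq \mathcal{U}$ for $\mathcal{M}$ such that no proper descending subclass $\mathcal{D}' \subset \mathcal{D}$ is a determining class for $\mathcal{M}$. In particular, I am interested in the question for which $\mathcal{M}$ the least determining class, respectively the least unimarginal class, for $\mathcal{M}$ exists, which is the unique determining, respectively unimarginal, class $\mathcal{D} \subseteq \mathcal{U}$ such that $\mathcal{D} \subseteq \mathcal{D}'$ for every descending determining, respectively unimarginal, class $\mathcal{D}' \subseteq \mathcal{U}$. DAG models appear to be examples of structural models of this type (see Section).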
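The definition of a determining class suggests a direct computational check: close the class under the rule "$AC, BC \in \mathcal{E}$ implies $ABC \in \mathcal{E}$" and compare the result with the upper class. The sketch below does this by brute force. It is only an illustration: the model used (all statements over $\{a, b, c, d\}$ with a non-empty conditioning set) and the choice of the full power set as the upper class are assumptions made for the demonstration, and the function names are hypothetical.

```python
from itertools import combinations, product

N = "abcd"


def subsets(S):
    """All subsets of S as frozensets."""
    S = sorted(S)
    return {frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)}


def descending(sets):
    """Descending class generated by the given sets."""
    out = set()
    for S in sets:
        out |= subsets(S)
    return out


def triplets(N):
    """All <A,B|C> over N with A, B non-empty and A, B, C pairwise disjoint."""
    elems = sorted(N)
    # role of each element: 0 = unused, 1 = in A, 2 = in B, 3 = in C
    for roles in product(range(4), repeat=len(elems)):
        A = frozenset(e for e, r in zip(elems, roles) if r == 1)
        B = frozenset(e for e, r in zip(elems, roles) if r == 2)
        C = frozenset(e for e, r in zip(elems, roles) if r == 3)
        if A and B:
            yield A, B, C


def closure(D, model):
    """Least descending class containing D and closed under AC, BC => ABC."""
    E = descending(D)
    changed = True
    while changed:
        changed = False
        for A, B, C in model:
            if (A | C) in E and (B | C) in E and (A | B | C) not in E:
                E |= subsets(A | B | C)
                changed = True
    return E


def is_determining(D, model, upper):
    """D is determining iff its closure is the whole upper class."""
    return closure(D, model) == upper


# Assumed illustration: statements with a non-empty conditioning set.
model = [(A, B, C) for A, B, C in triplets(N) if len(C) >= 1]
upper = subsets(N)

print(is_determining(["ab", "bc", "cd"], model, upper))
print(is_determining(["ab", "ac", "ad"], model, upper))
print(is_determining(["ab", "cd"], model, upper))
```

The equivalence used here (determining iff the closure equals the upper class) holds provided the closure stays inside the upper class, which is automatic when the upper class is the whole power set.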

Observation Every determining class is unimarginal.

Proof If $\mathcal{D} \subseteq \mathcal{U}$ is determining and $P, Q$ are Markovian measures whose marginals for sets from $\mathcal{D}$ coincide, then put $\mathcal{E} = \{S \in \mathcal{U};\ P^S = Q^S\}$ and observe that $\mathcal{D} \subseteq \mathcal{E}$ and $AC, BC \in \mathcal{E} \Rightarrow ABC \in \mathcal{E}$ for every $\langle A,B|C\rangle \in \mathcal{M}$ (use the uniqueness principle mentioned in the proof of Consequence).

Recall that Consequence says that the lower class $\mathcal{L}_u$ is a determining class for $\mathcal{M}_u$ whenever $u \in \mathcal{S}(N)$. Thus, one can summarize the implications as follows:

lower class $\Rightarrow$ determining class $\Rightarrow$ unimarginal class.

Note that a determining class need not be a lower class (see Example), and the question whether every unimarginal class is determining remains open (see Question on p). However, it is known that these concepts essentially coincide for DAG models (see Consequence in Section).

Remark The concept of unimarginal class can be alternatively introduced as a concept relative to a distribution framework $\Psi$. Then every unimarginal class relative to $\Psi$ is unimarginal relative to any subframework $\Psi' \subseteq \Psi$. But unimarginal classes may differ for different frameworks. Given a distribution framework $\Psi$ and a structural model $\mathcal{M}$, one can ask what the minimal unimarginal classes for $\mathcal{M}$ relative to $\Psi$ are (see Theme in Chapter).

Imsets with the least lower class

A structural imset $u \in \mathcal{S}(N)$ is called an imset with the least lower class if $\mathcal{L}_u \subseteq \mathcal{L}_v$ for every $v \in \mathcal{S}(N)$ with $u \approx v$. Some classes of facial equivalence contain imsets of this type, for example the imset $u$ from Example on p. On the other hand, as the subsequent example shows, there are classes of facial equivalence which do not have these imsets but several imsets with minimal lower class, i.e. imsets $u \in \mathcal{S}(N)$ such that no facially equivalent $v \in \mathcal{S}(N)$ with $\mathcal{L}_v \subset \mathcal{L}_u$ exists.

v u

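Example There exists a structural model $\mathcal{M}$ over $N = \{a, b, c, d\}$ such that

- the collection $\{u \in \mathcal{S}(N);\ \mathcal{M}_u = \mathcal{M}\}$ has three imsets with distinct minimal lower classes,
- distinct minimal determining classes for $\mathcal{M}$ exist and none of them is a lower class for any $u$.

Introduce $\mathcal{M} \subseteq \mathcal{T}(N)$ and a class of elementary imsets $\mathcal{K}$ as follows:

$$\mathcal{M} = \{\langle A,B|C\rangle \in \mathcal{T}(N);\ |C| \ge 1\}, \qquad \mathcal{K} = \{u_{\langle i,j|K\rangle} \in \mathcal{E}(N);\ |K| \ge 1\}.$$

Observe that $\mathcal{M} = \mathcal{M}^m$ for the supermodular imset $m$ shown in the left-hand picture of Figure, and $\mathcal{K} = \{w \in \mathcal{E}(N);\ \mathcal{M}_w \subseteq \mathcal{M}\}$. Introduce a combinatorial imset $u = \sum_{w \in \mathcal{K}} k_w \cdot w$, where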

i w u K or w u K

habjK i hcdjK i

k

w

for remaining w K

[Figure: two diagrams on the lattice of subsets of {a, b, c, d}. Caption: Multiset producing M and respective imset with minimal lower class.]

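The imset $u$ is shown in the right-hand picture of Figure. Since $\mathcal{M}_u \cap \mathcal{T}_\epsilon(N) = \mathcal{M}^m \cap \mathcal{T}_\epsilon(N)$, by Lemma $\mathcal{M}_u = \mathcal{M}$. The first step to show that $u$ is an imset with minimal lower class is the observation that, for every $v \in \mathcal{S}(N)$ with $\mathcal{M}_v = \mathcal{M}$, one has $v(S) < 0$ for some $S \in \mathcal{S} = \{abc, ab, ac\}$. Indeed, put $\mathcal{K}_{ad} = \{u_{\langle a,d|K\rangle} \in \mathcal{E}(N);\ |K| \ge 1\}$, write $n \cdot v = \sum_{w \in \mathcal{K}} l_w \cdot w$ where $n \in \mathbb{N}$, $l_w \in \mathbb{Z}^+$, and observe that $l_w > 0$ for some $w \in \mathcal{K}_{ad}$. This is because $\langle m^{ad\uparrow}, w\rangle = 0$ for $w \in \mathcal{K} \setminus \mathcal{K}_{ad}$ while $\langle m^{ad\uparrow}, w\rangle > 0$ for $w \in \mathcal{K}_{ad}$, where $m^{ad\uparrow}$ denotes the indicator of supersets of $ad$ (use Observation to derive a contradiction with the assumption $\mathcal{M}_v = \mathcal{M}$ in case $l_w = 0$ for all $w \in \mathcal{K}_{ad}$). Put $s = \sum_{S \in \mathcal{S}} \delta_S$ and observe $\langle s, w\rangle \le 0$ for $w \in \mathcal{K}$ and $\langle s, w\rangle < 0$ for $w \in \mathcal{K}_{ad}$. This implies $n \cdot \langle s, v\rangle < 0$ and the desired conclusion.

An analogous observation can be made for any class $\mathcal{S}$ consisting of a three-element subset of $N$ and a pair of its two-element subsets. If $v \in \mathcal{S}(N)$ satisfies $\mathcal{M}_v = \mathcal{M}$ and $\mathcal{L}_v \subseteq \mathcal{L}_u$, then the observation above necessitates $v(ac) < 0$, and one can derive analogously $v(ad) < 0$, $v(bc) < 0$, $v(bd) < 0$, which implies $\mathcal{L}_u \subseteq \mathcal{L}_v$. Therefore, $u$ is an imset with minimal lower class, and permutation of variables gives two other examples of facially equivalent imsets with distinct minimal lower classes.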

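On the other hand, the class $\mathcal{D} = \{ab, bc, cd\}^{\downarrow}$ is a determining class for $\mathcal{M}$, as $\langle a,c|b\rangle, \langle b,d|c\rangle, \langle a,d|bc\rangle \in \mathcal{M}$. One can show that $\mathcal{D}$ is minimal, and an analogous conclusion can be made for any class obtained by permutation of variables either from $\mathcal{D}$ or from $\mathcal{D}' = \{ab, ac, ad\}^{\downarrow}$. This list of minimal determining classes for $\mathcal{M}$ can be shown to be complete.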

Exclusivity of standard imsets

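The standard imset $u_G$ for an acyclic directed graph $G$ appears to be an exclusive imset within the class of structural imsets $u$ with $\mathcal{M}_u = \mathcal{M}_G$. The first step to show that it is an imset with the least lower class is the following lemma.

Lemma Let $\mathcal{M}$ be a structural model over $N$, $\wp = \{u \in \mathcal{S}(N);\ \mathcal{M}_u = \mathcal{M}\}$ and $\mathcal{U} = \mathcal{U}_u$ for any $u \in \wp$. If $S \in \mathcal{U}$ has the form $S = cT$ where $c \in N$ and

$$\mathcal{M} \cap \{\langle a,c|K\rangle \in \mathcal{T}_\epsilon(N);\ a \in T\} = \emptyset = \mathcal{M} \cap \{\langle a,b|K\rangle \in \mathcal{T}_\epsilon(N);\ a, b \in T,\ c \in K\},$$

then every descending unimarginal class for $\mathcal{M}$ contains $S$.

Proof The first observation is that any probability measure $P = Q \times \prod_{i \in N \setminus S} P_i$, where $Q$ is a probability measure over $S$ with $Q^T = \prod_{i \in T} Q_i$ and $P_i$, $Q_i$ are arbitrary one-dimensional probability measures, is Markovian with respect to $u \in \wp$. Indeed, by Lemma it suffices to verify $\mathcal{M} \cap \mathcal{T}_\epsilon(N) \subseteq \mathcal{M}_P$. Suppose $\langle a,b|K\rangle \in \mathcal{M} \cap \mathcal{T}_\epsilon(N)$. If $a \notin S$, then $a \perp\!\!\!\perp N \setminus a \,|\, \emptyset\ [P]$ implies $a \perp\!\!\!\perp b \,|\, K\ [P]$; analogously in case $b \notin S$. If $a, b \in S$, then the assumption implies $a, b \in T$ and $c \notin K$, so that $a \perp\!\!\!\perp N \setminus ac \,|\, \emptyset\ [P]$ implies $a \perp\!\!\!\perp b \,|\, K\ [P]$ as well. The second step is a construction: put $X_i = \{0, 1\}$ and define a pair of probability measures $Q_1, Q_2$ on $X_S = \prod_{i \in S} X_i$ by

$$Q_1([x_i]_{i \in S}) = 2^{-|S|}, \qquad Q_2([x_i]_{i \in S}) = \begin{cases} 2^{-|S|} + \varepsilon & \text{if } \sum_{i \in S} x_i \text{ is even},\\ 2^{-|S|} - \varepsilon & \text{if } \sum_{i \in S} x_i \text{ is odd}, \end{cases}$$

where $0 < \varepsilon < 2^{-|S|}$. Then put $P_j = Q_j \times \prod_{i \in N \setminus S} P_i$ for $j = 1, 2$, where $P_i$ are some fixed probability measures on $X_i$, $i \in N \setminus S$. Observe that both $P_1$ and $P_2$ are Markovian with respect to $u \in \wp$, that $P_1^S \neq P_2^S$, and that $P_1^L = P_2^L$ whenever $L \subseteq N$, $S \setminus L \neq \emptyset$. Finally, suppose for contradiction that $S \notin \mathcal{D}$, where $\mathcal{D} \subseteq \mathcal{U}$ is a unimarginal class for $\mathcal{M}$. This implies $P_1^L = P_2^L$ for $L \in \mathcal{D}$ and therefore $P_1^L = P_2^L$ for $L \in \mathcal{U}$, which contradicts the fact $S \in \mathcal{U}$.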
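The key property of the pair of measures used in the proof (identical marginals on every proper subset of $S$, different joint distributions on $S$) can be checked numerically. The sketch below assumes the parity-based form of the construction, with one measure uniform on $\{0,1\}^S$ and the other shifted by $\pm\varepsilon$ according to the parity of the coordinate sum; exact rational arithmetic avoids rounding issues. The names `parity_pair` and `marginal` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def parity_pair(size, eps):
    """Q1 uniform on {0,1}^size; Q2 shifted by +eps / -eps according to parity."""
    base = Fraction(1, 2 ** size)
    Q1, Q2 = {}, {}
    for x in product((0, 1), repeat=size):
        Q1[x] = base
        Q2[x] = base + eps if sum(x) % 2 == 0 else base - eps
    return Q1, Q2


def marginal(Q, coords):
    """Marginal of Q on the given coordinate positions."""
    out = {}
    for x, p in Q.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0) + p
    return out


size = 3
Q1, Q2 = parity_pair(size, Fraction(1, 16))

# Marginals coincide on every proper subset of the coordinates ...
proper = [c for r in range(size) for c in combinations(range(size), r)]
same_on_proper = all(marginal(Q1, c) == marginal(Q2, c) for c in proper)

# ... although the joint distributions differ.
print(same_on_proper, Q1 != Q2)
```

Summing out any single coordinate pairs each even configuration with an odd one, so the shifts cancel; this is why only the full set $S$ distinguishes the two measures.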

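Consequence Given an acyclic directed graph $G$ over $N$, the lower class $\mathcal{L}_u$ for $u = u_G$ is the least unimarginal and the least determining class for $\mathcal{M}_G$. In particular, $u_G$ is an imset with the least lower class.

Proof By the chain of implications stated above and by Lemma, $\mathcal{L}_u$ is a determining, respectively unimarginal, class for $\mathcal{M}_G$. If $\mathcal{D}$ is a lower class for $v \in \mathcal{S}(N)$ with $\mathcal{M}_v = \mathcal{M}_G$, respectively a determining class for $\mathcal{M}_G$, then it is a unimarginal class for $\mathcal{M}_G$ by the same implications. Lemma can then be used to verify $\mathcal{L}_u \subseteq \mathcal{D}$. Indeed, if $S \in \mathcal{L}_u^{\max}$, then $S \in \mathcal{U}$ by Observation and $S = c \cup pa_G(c)$ for some $c \in N$ by the definition of the standard imset. The moralization criterion (see Section) allows one to verify that the condition for $T = pa_G(c)$ is fulfilled for $\mathcal{M} = \mathcal{M}_G$.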

Remark Thus, the standard imset $u_G$ for an acyclic directed graph $G$ over $N$ is both an imset of the least degree (see Lemma) and an imset with the least lower class. Note that a computer program helped to show that in case |N| the converse holds, i.e. the only imset satisfying these two conditions is the standard imset $u_G$ for a given graph $G$. The question whether these two requirements determine standard imsets for acyclic directed graphs in general remains open (see Question on p).

An interesting feature of the standard imset $u$ for a DAG model is that it vanishes outside $\mathcal{L}_u \cup \mathcal{U}_u^{\max}$. On the other hand, a lot of other equivalent structural imsets with the same lower class exist, sometimes even imsets which take only strictly positive values in $\mathcal{R}_u \setminus \mathcal{L}_u$. The following example shows that imsets of this kind need not exist.

u u

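Example There exists a DAG model $\mathcal{M}$ over $N = \{a, b, c, d\}$ such that no $u \in \mathcal{S}(N)$ with the least lower class $\mathcal{L}_u$ among imsets with $\mathcal{M}_u = \mathcal{M}$ is strictly positive on $\mathcal{R}_u \setminus \mathcal{L}_u$ (for the notion of range $\mathcal{R}_u$ see Section). Consider the directed graph $G$ shown in the left-hand picture of Figure; the corresponding standard imset is in the right-hand picture. Put $s = \delta_{ac} + \delta_{acd}$ and observe that one has $\langle s, u_{\langle i,j|K\rangle}\rangle \le 0$ for every $\langle i,j|K\rangle \in \mathcal{M}_G \cap \mathcal{T}_\epsilon(N)$. This implies $\langle s, v\rangle \le 0$ for every $v \in \mathcal{S}(N)$ with $\mathcal{M}_v = \mathcal{M}_G$. Since $ac, acd \in \mathcal{R}_u \setminus \mathcal{L}_u$, it implies $v(ac) = v(acd) = 0$ for every $v$ of this kind which moreover satisfies $\mathcal{L}_v = \mathcal{L}_u$. Note that an analogous consideration can be made for $s = \delta_{bd} + \delta_{abd}$.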

[Figure: a directed graph on nodes a, b, c, d (left) and a diagram on the lattice of subsets of {a, b, c, d} (right). Caption: An acyclic directed graph and the respective standard imset.]

Other ways of representation

This section describes further ideas on how structural models can possibly be represented. It is only a rough outline of those approaches which look promising and which have to be examined in more detail.

Pattern

Recall that one of the feasible methods of representing a class of Markov equivalent acyclic directed graphs is to use a graph which is not an acyclic directed graph but which somehow exhibits common features of the graphs within the equivalence class (see Section p). This motivated an analogous idea in the framework of structural imsets.

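Observation Let $\mathcal{M}$ be a structural model over $N$, $\wp = \{u \in \mathcal{S}(N);\ \mathcal{M}_u = \mathcal{M}\}$. Then the set $\{u(S);\ u \in \wp\}$ for $S \subseteq N$ has one of the following four forms: $\{0\}$, $\mathbb{Z}$, $\{m, m+1, \ldots\}$ for some $m \in \mathbb{Z}$, and $\{\ldots, l-1, l\}$ for some $l \in \mathbb{Z}$.

Proof If $S \notin \mathcal{R}_u$ for $u \in \wp$, then $\{v(S);\ v \in \wp\} = \{0\}$ by Lemma. If $S \in \mathcal{R}_u$, then $w = u_{\langle i,j|K\rangle}$ with $\langle i,j|K\rangle \in \mathcal{M} \cap \mathcal{T}_\epsilon(N)$ and $w(S) \neq 0$ exists. Observe that for every $u \in \wp$ and $k \in \mathbb{N}$ one has $u + k \cdot w \in \wp$, which leads to the remaining cases.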

The pattern of a structural imset $u$ over $N$ can be introduced as an undirected graph which has the region $\mathcal{R}_u$ as the set of nodes and whose set of lines is the collection of pairs of the form $\{K, iK\}$, $\{iK, ijK\}$ for some $\langle i,j|K\rangle \in \mathcal{M}_u \cap \mathcal{T}_\epsilon(N)$. Moreover, the nodes of the pattern have assigned symbolic values depending on the class of $v \in \mathcal{S}(N)$ with $u \approx v$:

rS if fv S v g Z

rS if fv S v g Z fl l Z g

rS if fv S v g Z

The symbolic function $r$ can be formally extended to $\mathcal{P}(N)$ by putting

$r(S) = 0$ if $\{v(S);\ v \in \mathcal{S}(N),\ v \approx u\} = \{0\}$.

[Figure: two diagrams on subsets of {a, b, c, d}. Caption: Patterns of imsets from Example.]

An evaluated pattern is obtained by mo dication in values of r one writes m where

m min fv S v g instead of and l where l min fv S v g instead

of Note that the signs and are kept to distinguish the cases rS ie

m and rS ie l from the case rS ie fv S v g fg

Observe that evaluated patterns characterize classes of facial equivalence Note that

symbolic functions r distinguish all classes of facial equivalence if jN j see Figure

for illustration On the other hand this is not true in general as the following example

shows

Example There exist structural imsets over $N = \{a, b, c, d\}$ which are not facially equivalent but which have the same symbolic function. Let us put

$$u = u_{\langle a,c|\emptyset\rangle} + u_{\langle b,c|\emptyset\rangle} + u_{\langle c,d|ab\rangle}, \qquad v = u + u_{\langle a,b|\emptyset\rangle}.$$

The pattern of $u$ is in the left-hand picture and the pattern of $v$ in the right-hand picture of Figure.

Remark The question whether patterns distinguish all classes of facial equivalence remains open (see Direction). The following modification of the concept of evaluated pattern can possibly cure the problem if the answer is negative. More general values of the symbolic function can be considered. One can distinguish between the upper plus and the lower plus: $r(S)$ is the upper plus if there exists $\langle i,j|K\rangle \in \mathcal{M}_u \cap \mathcal{T}_\epsilon(N)$ with $ijK = S$, and $r(S)$ is the lower plus if there exists $\langle i,j|K\rangle \in \mathcal{M}_u \cap \mathcal{T}_\epsilon(N)$ with $K = S$.

Alternatively, one can turn the pattern into a graph with directed and bidirected edges. Indeed, every $\langle i,j|K\rangle \in \mathcal{M}_u \cap \mathcal{T}_\epsilon(N)$ generates four arrows $iK \to ijK$, $jK \to ijK$, $iK \to K$, $jK \to K$, and every pair of arrows $S \to T$ and $T \to S$ can be replaced by a bidirected edge $S \leftrightarrow T$.

[Figure: Symbolic functions over N = {a, b, c} (rotated).]

Dual description

Two approaches to the description of independence models by imsets were distinguished in Chapter. Every structural model is induced by a structural imset and produced by a supermodular imset (see Consequence), and both methods can be viewed as mutually dual approaches. As mentioned before Observation (p), one can take a dual point of view and describe structural models as independence models produced by $\ell$-standardized supermodular imsets.

Dual baricentral imsets

One can introduce an analogue of facial equivalence and implication for $\ell$-standardized supermodular imsets: $m \in \mathcal{K}_\ell(N) \cap \mathbb{Z}^{\mathcal{P}(N)}$ implies $r \in \mathcal{K}_\ell(N) \cap \mathbb{Z}^{\mathcal{P}(N)}$ if $\mathcal{M}^m \subseteq \mathcal{M}^r$, and they are equivalent if they produce the same model. Moreover, every imset of this kind is a non-negative rational combination of $\ell$-skeletal imsets (see Lemma), so that these play the role which is analogous to the role of elementary imsets within the class of structural imsets (cf. Theorem). Following this analogy, an $\ell$-standardized supermodular imset $m$ over $N$ will be called a dual baricentral imset if it has the form

$$m = \sum_{r \in \mathcal{K}^\diamond_\ell(N),\ \mathcal{M}^m \subseteq \mathcal{M}^r} r.$$

The corresponding poset of dual baricentral imsets is shown in Figure.

Coportraits

Let me explain the dual perspective in more detail with the help of the concept of Galois connection from Section. It was explained there (p) that the poset of structural models $(\mathcal{U}(N), \subseteq)$ can be viewed as a concept lattice given by the formal context. More specifically, it follows from Lemma that every structural model $\mathcal{M}$ over $N$ is in one-to-one correspondence with a set of elementary imsets over $N$, namely with

$$\{v \in \mathcal{E}(N);\ v = u_{\langle i,j|K\rangle} \text{ where } \langle i,j|K\rangle \in \mathcal{M} \cap \mathcal{T}_\epsilon(N)\}.$$

In particular, every $u \in \mathcal{S}(N)$, respectively $m \in \mathcal{K}_\ell(N) \cap \mathbb{Z}^{\mathcal{P}(N)}$, corresponds (through $\mathcal{M}_u$, respectively through $\mathcal{M}^m$) to a subset of $\mathcal{E}(N)$:

$$\mathcal{E}_u = \{v \in \mathcal{E}(N);\ v = u_{\langle i,j|K\rangle},\ \langle i,j|K\rangle \in \mathcal{M}_u\} = \{v \in \mathcal{E}(N);\ u \rightharpoonup v\},$$

$$\mathcal{E}^m = \{v \in \mathcal{E}(N);\ v = u_{\langle i,j|K\rangle},\ \langle i,j|K\rangle \in \mathcal{M}^m\} = \{v \in \mathcal{E}(N);\ \langle m, v\rangle = 0\}.$$

Thus, every structural model can be identified with a set of objects of the formal context. In fact, it is an extent of a formal concept, so that structural models correspond more or less to the description in terms of objects. However, as explained in Remark, every formal concept can also be described by means of its intent, i.e. in terms of attributes. In this case the set of attributes is the $\ell$-skeleton $\mathcal{K}^\diamond_\ell(N)$, which motivates the following definition.

By the coportrait of a structural imset $u$ over $N$ will be understood the set of skeletal imsets $\mathcal{H}_u$ given by

$$\mathcal{H}_u = \{r \in \mathcal{K}^\diamond_\ell(N);\ \langle r, u\rangle = 0\}.$$

[Figure: Dual baricentral imsets over N = {a, b, c} (rotated).]

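Indeed, $\mathcal{H}_u = \{r \in \mathcal{K}^\diamond_\ell(N);\ \langle r, v\rangle = 0 \text{ for every } v \in \mathcal{E}_u\}$, which means that $\mathcal{H}_u$ is nothing but the image of $\mathcal{E}_u$ under the Galois connection. As $\mathcal{E}_u$ is closed with respect to the Galois connection, the pair $(\mathcal{E}_u, \mathcal{H}_u)$ is a formal concept in the sense of Section. By Consequence, two structural imsets are facially equivalent iff they have the same coportrait. Thus, every class of facial equivalence is uniquely represented by the respective coportrait. The lattice of all coportraits over variables is shown in Figure.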
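For three variables the coportrait can be computed directly from its definition. The sketch below is an illustration under an explicit assumption: the $\ell$-skeleton over $\{a, b, c\}$ is taken to consist of the five imsets $\delta_N$, $\delta_N + \delta_{ab}$, $\delta_N + \delta_{ac}$, $\delta_N + \delta_{bc}$ and $2\delta_N + \delta_{ab} + \delta_{ac} + \delta_{bc}$, which is not stated in this section and should be checked against the definition of the skeleton. The labels in `SKELETON` and the function names are hypothetical.

```python
def fs(s):
    """Shorthand: string of variable names -> frozenset."""
    return frozenset(s)


def elementary(i, j, K=""):
    """Elementary imset u_<i,j|K> as a dict {frozenset: int}."""
    K = frozenset(K)
    return {K | {i, j}: 1, K: 1, K | {i}: -1, K | {j}: -1}


def add(*imsets):
    """Pointwise sum of imsets; zero entries are dropped."""
    total = {}
    for u in imsets:
        for S, val in u.items():
            total[S] = total.get(S, 0) + val
    return {S: val for S, val in total.items() if val != 0}


# ASSUMPTION: the l-skeleton for N = {a, b, c}, with hypothetical labels.
SKELETON = {
    "N": {fs("abc"): 1},
    "N+ab": {fs("abc"): 1, fs("ab"): 1},
    "N+ac": {fs("abc"): 1, fs("ac"): 1},
    "N+bc": {fs("abc"): 1, fs("bc"): 1},
    "2N+pairs": {fs("abc"): 2, fs("ab"): 1, fs("ac"): 1, fs("bc"): 1},
}


def scalar(m, u):
    """Scalar product <m, u> of a set function m with an imset u."""
    return sum(m.get(S, 0) * val for S, val in u.items())


def coportrait(u):
    """Labels of skeletal imsets r with <r, u> = 0."""
    return {name for name, r in SKELETON.items() if scalar(r, u) == 0}


def portrait(u):
    """Labels of skeletal imsets r with <r, u> > 0."""
    return {name for name, r in SKELETON.items() if scalar(r, u) > 0}


w = elementary("a", "b")
u = add(elementary("b", "c", "a"), elementary("a", "b"), elementary("a", "c"))
v = add(elementary("a", "b", "c"), elementary("a", "c", "b"), elementary("b", "c"))

print(sorted(coportrait(w)), sorted(portrait(w)))
print(coportrait(u) == coportrait(v))
```

The imsets `u` and `v` are the two facially equivalent imsets from the earlier example over three variables; under the assumed skeleton they have the same (empty) coportrait, in line with the statement that facially equivalent imsets share their coportrait.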

Remark This is to explain the terminology. The idea of a dual description of a structural model was already presented in earlier work, where the concept of portrait of $u \in \mathcal{S}(N)$ was introduced as the set of skeletal imsets

$$\{r \in \mathcal{K}^\diamond_\ell(N);\ \langle r, u\rangle > 0\}.$$

Thus, the coportrait $\mathcal{H}_u$ is nothing but the relative complement of the portrait in $\mathcal{K}^\diamond_\ell(N)$, which motivated my terminology here. Provided the skeleton is known, the portrait and the coportrait are equiinformative, but the concept of coportrait seems more natural from the theoretical point of view (in light of the Galois connection). Despite this fact, I decided to keep the former terminology and not to rename things. The reason why I originally preferred the portrait to the coportrait was my anticipation that for |N| the relative occurrence of zeros in $\{\langle m, u\rangle;\ m \in \mathcal{K}^\diamond_\ell(N),\ u \in \mathcal{E}(N)\}$ exceeds the relative occurrence of non-zero values, which seems to be true in the explored cases. The practical consequence should be that portraits have lower cardinality than coportraits for structural imsets inducing a lot of independence statements.

Nevertheless, the method of dual description of structural models is limited to the situation when the skeleton is known. Of course, as explained in Remark, the type of the skeleton is not substantial, since the use of the u-skeleton, respectively the o-skeleton, instead of the $\ell$-skeleton leads to an isomorphic concept of portrait and coportrait.

Global view

Of course, owing to Consequence and, every coportrait can also be written in the form $\mathcal{H}^m$, the image of $\mathcal{E}^m$ under the Galois connection, where $m \in \mathcal{K}_\ell(N) \cap \mathbb{Z}^{\mathcal{P}(N)}$. Note that one can show, analogously to the proof of Lemma, that

$$\mathcal{H}^m = \{r \in \mathcal{K}^\diamond_\ell(N);\ k \cdot m - r \in \mathcal{K}_\ell(N) \text{ for some } k \in \mathbb{N}\}.$$

Therefore, the mutual relation of a structural imset $u$ and the corresponding set of elementary imsets $\mathcal{E}_u$ is completely analogous to the mutual relation of an $\ell$-standardized supermodular imset $m$ and the corresponding set of skeletal imsets $\mathcal{H}^m$. The global view on all four above-mentioned approaches to the description of a structural model is indicated by Figure. One can use

a set of elementary imsets

a structural imset

a set of skeletal imsets

an $\ell$-standardized supermodular imset

[Figure: Coportraits of structural imsets over N = {a, b, c} (rotated).]

[Figure: Extension of Galois connection for structural models (illustration). The diagram relates elementary imsets, structural imsets, skeletal imsets and supermodular imsets via non-negative rational combination and the incidence relation given by the scalar product.]

Recall that the set of elementary imsets can be viewed as a direct translation of the considered structural model, the structural imset is obtained by a non-negative rational combination of elementary imsets, the set of skeletal imsets is obtained by the Galois connection, and the supermodular imset by a non-negative rational combination of skeletal imsets. Let me emphasize that, unlike the case of the general Galois connection described in Section, an additional superstructure of summing elementary, respectively skeletal, imsets is at our disposal. This fact allows one to describe, and later implement, the respective relation among formal concepts (namely the relation "be a subconcept") with the help of algebraic operations, more precisely by means of the arithmetic of integers. This is the main asset of the described approach.

Dual minimal generators

However, the dual approach exhibits some different mathematical properties. One can introduce an analogue of the concept of combinatorial imset, i.e. an imset which is a non-negative integral combination of $\ell$-skeletal imsets. But there is no analogue of the concept of degree for imsets of this type: the sum of two skeletal imsets from the first line of Figure on p equals the sum of three skeletal imsets from the second line of Figure.

Nevertheless, one can introduce an analogue of the concept of strong facial implication (see Section) and prove an analogue of Lemma. Indeed, following the idea indicated in Remark, one can introduce the lattice of coportraits

$$\mathcal{W}(N) = \{\mathcal{H} \subseteq \mathcal{K}^\diamond_\ell(N);\ \mathcal{H} = \mathcal{H}_u \text{ for } u \in \mathcal{S}(N)\}$$

and observe that it coincides with the collection of closed sets with respect to a closure operation on subsets of $\mathcal{K}^\diamond_\ell(N)$:

$$\mathcal{H} \mapsto \{m \in \mathcal{K}^\diamond_\ell(N);\ \langle m, v\rangle = 0 \text{ for } v \in \mathcal{E}(N) \text{ with } \langle r, v\rangle = 0 \text{ for every } r \in \mathcal{H}\}.$$

One can then show that imsets minimal with respect to the dual strong facial implication correspond to minimal generators with respect to this closure operation on subsets of $\mathcal{K}^\diamond_\ell(N)$. An interesting fact is that in case |N| every class of the respective equivalence of imsets of the above type has a unique minimal imset in the described sense. In other words, no analogue of Example is valid for the dual representation if |N|. Moreover, the corresponding dual baricentral imset is a multiple of the minimal imset in that case. Perhaps the dual approach indeed exhibits better mathematical properties in this sense than the approach based on structural imsets. Nevertheless, I tend to believe that the phenomenon mentioned above is a coincidence and not a general feature of the dual approach (see Theme).

Chapter

Open problems

The goal of this chapter is to gather open problems and present a few topics omitted in the previous chapters. Open problems are classified according to the degree of vagueness into three categories. Questions are clear inquiries formulated as mathematical problems. Formal definitions of the related concepts were given and the expected answer is yes or no. Themes of research are wider areas of mutually related problems. Their formulation is slightly less specific (but still in mathematical terms) and they may deserve some clarification of the involved concepts. Directions of research are very wide groups of problems with a recognized common source of motivation. They are formulated quite vaguely and may become a topic of research in forthcoming years. The secondary criterion of classification of open problems is their topic: the division of this chapter into sections was inspired by the motivation account from Section.

Unsolved theoretical problems

In this section open problems concerning theoretical grounding are gathered. Some of them were already mentioned earlier. They are classified by their topics.

Miscellaneous topics

Distributions

There are several open problems related to Sections and.

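Question Let $P$ and $Q$ be probability measures over $N$ defined on a product of measurable spaces $(\mathsf{X}_N, \mathcal{X}_N) = \prod_{i \in N} (\mathsf{X}_i, \mathcal{X}_i)$ which have finite multiinformation (p.). Does their convex combination $\alpha \cdot P + (1-\alpha) \cdot Q$, $\alpha \in [0,1]$, have finite multiinformation as well?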

Theme Is there any direct formula for the multiinformation function of a nondegenerate CG measure (see p.) in terms of its canonical or moment characteristics? Alternatively, is there any iterative method of computing it?

Note that, owing to Lemma, an equivalent formulation of Theme is to find a formula for the entropy of a CG measure $P$ with respect to $\prod_{i \in N} \mu_i$, where $\{\mu_i \,;\, i \in N\}$ is the standard reference system for $P$ (see p.). I am rather sceptical about the existence of a direct formula of this kind. The following natural question, motivated by the results of Section, concerns the Gaussian distribution framework.

Question Let $P$, $Q$ be nondegenerate Gaussian measures on $\mathbb{R}^N$ (see p.). Is there a nondegenerate Gaussian measure $R$ on $\mathbb{R}^N$ such that $\mathcal{M}_R = \mathcal{M}_P \cap \mathcal{M}_Q$?

Graphs

Further open problems are related to Chapter. The following problem, named the inclusion problem in, can be viewed as an advanced subtask of the equivalence task (see Section).

Theme Let $G$, $H$ be acyclic directed graphs over $N$ (see p.). Is there any graphical characterization of the inclusion $\mathcal{M}_G \subseteq \mathcal{M}_H$ (see Section)? Is it possible to characterize $\mathcal{M}_G \subseteq \mathcal{M}_H$ in terms of a simple algebraic relation between the standard imsets $u_G$ and $u_H$ (p.)?

Note that a suitable graphical characterization of Markov equivalence, i.e. of $\mathcal{M}_G = \mathcal{M}_H$, was found (p.), and I would appreciate an analogous solution of Theme, one which is in terms of invariants of Markov equivalence. The following question concerning factorizably equivalent chain graphs was already mentioned in Section.

Question Let $(\mathsf{X}_N, \mathcal{X}_N) = \prod_{i \in N} (\mathsf{X}_i, \mathcal{X}_i)$ be a fixed sample space with nontrivial $\mathcal{X}_i$ for every $i \in N$. Let $G$, $H$ be classic chain graphs over $N$ (p.) such that $\mathcal{M}_G = \mathcal{M}_H$ (see p.). Does the class of factorizable measures on $(\mathsf{X}_N, \mathcal{X}_N)$ with respect to $G$ (see Remark on p.) coincide with the class of factorizable measures with respect to $H$?

Structural imsets

There are some unsolved problems related to Chapter.

Theme Let $G$ be a chain graph over $N$ (see p.). Is there any direct formula for the baricentral imset $u$ over $N$ (see p.) with $\mathcal{M}_u = \mathcal{M}_G$? Can every supermodular function $m$ over $N$ (see p.) be effectively translated into a baricentral imset $u$ over $N$ with $\mathcal{M}_u = \mathcal{M}^m$?

Direction Develop an effective criterion which decides whether a given structural imset is a baricentral imset.

Question Consider a class of facially equivalent structural imsets over $N$ (see p.) and let $u$ be a combinatorial imset in this class which is minimal with respect to strong facial implication (see p.). Is $u$ an imset of minimal degree in this class?

Classification of skeletal imsets

A basic problem related to skeletal imsets (see Section) is the following one.

Theme Is there any suitable characterization of skeletal imsets which allows one to find the skeleton $\mathcal{K}(N)$ for any finite nonempty set of variables $N$? How does $|\mathcal{K}(N)|$ depend on $|N|$?

Note that offers a characterization of extreme supermodular functions, but the result is rather a criterion for whether a given standardized supermodular function is skeletal (more precisely, it can be used for this purpose). However, the criterion does not seem suitable for the purpose of computer implementation. Therefore, the result of does not solve the problem of finding the skeleton for every $N$. A promising idea of how to tackle the problem is indicated in the rest of Section.

A related task is to classify submaximal structural models. One can fix a way of standardization of skeletal imsets (see Remark), as submaximal structural models are in one-to-one correspondence with the elements of the respective skeleton. Every permutation $\pi : N \to N$ of variables can be extended to a permutation of the power set, $\mathcal{P}(N) \to \mathcal{P}(N)$, and this step allows one to introduce permutable equivalence on the class of skeletal imsets: any skeletal imset $m$ is equivalent in this sense to its composition with the extended permutation (see also). Of course, every permutation of skeletal imsets defines a permutation of the respective produced independence models. The basic way of classification is the division of the class of standardized skeletal imsets into classes of permutable equivalence. Every equivalence class then represents a type of skeletal imset. For example, standardized skeletal imsets decompose into types in case $|N|$, imsets decompose into types in case $|N|$, and imsets decompose into types in case $|N|$.

Level equivalence

Nevertheless, perhaps an even more precise way of classifying skeletal imsets exists. Suppose that $m \in \mathcal{K}(N)$ is a skeletal imset over $N$, and let the symbols $m_\ell$, $m_u$ and $m_o$ denote the respective model equivalent elements of the $\ell$-skeleton, the u-skeleton and the o-skeleton obtained by the formulas from Remark (p.). Thus $m$ (more precisely, the respective class of model equivalent skeletal imsets) defines a certain equivalence on the class of subsets of $N$:

$$\forall\, S, T \subseteq N: \quad S \sim_m T \iff \big[\, m_\ell(S) = m_\ell(T),\ \ m_o(S) = m_o(T) \ \text{ and } \ m_u(S) = m_u(T) \,\big].$$

The equivalence classes of $\sim_m$ could be interpreted as areas in which these standardized skeletal imsets have the same values; in other words, they correspond to levels of values. In fact, I conjecture that the following hypothesis is true.

Question Let m K N m is mo del equivalent skeletal imset over N see p

and S T N such that S T Is then necessarily m S m T

m

Remark As recognized in case $|N|$, the o-standardized representative $m_o$ cannot be omitted in the definition of $\sim_m$, and $m_o(S) = m_o(T)$ is often equivalent to $S \sim_m T$ for $S, T \subseteq N$. It seems that $m_\ell$ resp. $m_u$ can be omitted in the definition of $\sim_m$, but not both. Therefore, I think that o-standardization is the best standardization for the purpose of level equivalence.

Two skeletal imsets over $N$ will be called level equivalent if they induce the same equivalence on $\mathcal{P}(N)$.

Observation Let $m^1$, $m^2$ be level equivalent skeletal imsets over $N$ and let $\pi$ be a permutation of $N$ extended to $\mathcal{P}(N)$. Then $m^1 \circ \pi$ and $m^2 \circ \pi$ are level equivalent.

Proof This is a hint only. Given a skeletal imset $m$ over $N$, put $r = m \circ \pi$ and observe, with the help of the formulas from Remark, that $r_\ell = m_\ell \circ \pi$, $r_u = m_u \circ \pi$ and $r_o = m_o \circ \pi$. Hence, for every $S, T \subseteq N$ one has $S \sim_r T$ iff $\pi(S) \sim_m \pi(T)$, which implies the desired fact immediately.

Remark A further interesting operation with supermodular functions can be introduced with the help of a specific self-transformation $\iota$ of $\mathcal{P}(N)$:

$$\iota(S) = N \setminus S \quad \text{for every } S \subseteq N.$$

Given a supermodular function $m$ over $N$, one can introduce $z = m \circ \iota$ and observe (see Section in) that $z$ is also a supermodular function over $N$, called the reflection of $m$. Indeed, the reflection of $z$ is again $m$. Moreover, one can show, using the formulas from Remark, that the standardized versions of $z$ are obtained by composing standardized versions of $m$ with $\iota$. Consequently, for every $S, T \subseteq N$ one has $S \sim_z T$ iff $\iota(S) \sim_m \iota(T)$. An interesting fact is that in case $|N|$ one has $S \sim_m T$ iff $N \setminus S \sim_m N \setminus T$ for every $m \in \mathcal{K}(N)$ (see Example below). In particular, $m$ and $z$ are level equivalent in this case. Nevertheless, the question whether the above hypothesis holds in general is open.
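The reflection described in the remark above is easy to experiment with numerically. The following is a minimal Python sketch, not taken from the text: it assumes that a set function is stored as a dictionary keyed by frozensets, it uses the hypothetical example $m(S) = |S|^2$ (which is supermodular because it is a convex function of the cardinality), and the helper names `powerset`, `is_supermodular` and `reflect` are illustrative only.

```python
from itertools import combinations


def powerset(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_supermodular(m, ground):
    """Check m(A | B) + m(A & B) >= m(A) + m(B) for all subsets A, B."""
    subsets = powerset(ground)
    return all(m[a | b] + m[a & b] >= m[a] + m[b]
               for a in subsets for b in subsets)


def reflect(m, ground):
    """Reflection z of m: z(S) = m(N minus S) for every subset S of N."""
    ground = frozenset(ground)
    return {s: m[ground - s] for s in powerset(ground)}


N = frozenset("abcd")
# Hypothetical supermodular function chosen only for illustration.
m = {s: len(s) ** 2 for s in powerset(N)}
z = reflect(m, N)

print(is_supermodular(m, N), is_supermodular(z, N))
print(reflect(z, N) == m)
```

Under these assumptions the sketch checks two claims of the remark on one example: the reflection of a supermodular function is again supermodular, and reflecting twice returns the original function.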

Question Let $m \in \mathcal{K}(N)$ and $S, T \subseteq N$ be such that $S \sim_m T$. Is then necessarily $N \setminus S \sim_m N \setminus T$?

Supertypes

A natural consequence of Observation is that the concept of permutable equivalence can be extended to classes of level equivalence. Then every class of this extended permutable equivalence decomposes into several classes of level equivalence, which decompose into individual standardized skeletal imsets. Thus, every class of permutable equivalence of this kind represents a supertype. For example, two supertypes exist in case $|N|$ and five supertypes in case $|N|$. An interesting fact is that every equivalence on $\mathcal{P}(N)$, $|N|$, defined by a skeletal imset $m$ can be described by means of at most two cardinal criteria, which distribute sets $S \subseteq N$ to their equivalence classes (levels) on the basis of the cardinality of the intersection of $S$ with one or two given disjoint subsets of $N$. Every equivalence on $\mathcal{P}(N)$ of this kind is therefore determined by a certain system of disjoint subsets of $N$ having at most two components. This is illustrated by the following example.

Example One can distinguish five types of cardinal criteria distributing subsets $S$ of $N = \{a, b, c, d\}$ to levels, which correspond to five supertypes of skeletal imsets.

The criterion $|S \cap \{a, b\}|$ divides $\mathcal{P}(N)$ into levels (see the upper picture of Figure). The corresponding class of level equivalence has standardized imset, but the class of permutable equivalence has classes of level equivalence. Therefore, the respective supertype involves standardized skeletal imsets.

The criterion $|S \cap \{a, b, c\}|$ divides $\mathcal{P}(N)$ into levels (see the lower picture of Figure). The corresponding class of level equivalence has imsets, the class of permutable equivalence has classes of level equivalence. Hence, the supertype involves imsets. An example of a skeletal imset of this type is in Figure, where both $\ell$-standardized and u-standardized versions are given.

The criterion $|S \cap \{a, b, c, d\}|$ divides $\mathcal{P}(N)$ into levels (see the upper picture of Figure). The corresponding class of level equivalence has imsets, while the corresponding class of permutable equivalence has just one level equivalence class. Thus, the supertype involves imsets.

The composed criterion $|S \cap \{a, b, c\}|$, $|S \cap \{d\}|$ divides $\mathcal{P}(N)$ into levels (see the lower picture of Figure). The corresponding class of level equivalence has imsets, the class of permutable equivalence has classes of level equivalence, and the supertype involves imsets. An example of an imset of this kind is $m$ in the right-hand picture of Figure.

The composed criterion $|S \cap \{a, b\}|$, $|S \cap \{c, d\}|$ divides $\mathcal{P}(N)$ into levels (see Figure). The corresponding class of level equivalence has imsets, while the corresponding class of permutable equivalence has classes of level equivalence. The supertype involves imsets; an example is the imset $m$ from Figure.
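The way a cardinal criterion sorts subsets into levels can be reproduced by a short enumeration. The sketch below is an illustration only and rests on one assumption: a composed criterion is read as the pair of the two intersection cardinalities. The function names `powerset` and `levels` are hypothetical, and the level counts are computed by the code rather than quoted from the text.

```python
from collections import defaultdict
from itertools import combinations


def powerset(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def levels(ground, *blocks):
    """Group subsets S of `ground` by the tuple (|S & B| for each block B)."""
    classes = defaultdict(list)
    for s in powerset(ground):
        key = tuple(len(s & frozenset(b)) for b in blocks)
        classes[key].append(s)
    return dict(classes)


N = "abcd"
# The five criteria of the example; each entry lists the disjoint blocks used.
criteria = {
    "|S & {a,b}|": ["ab"],
    "|S & {a,b,c}|": ["abc"],
    "|S & {a,b,c,d}|": ["abcd"],
    "|S & {a,b,c}|, |S & {d}|": ["abc", "d"],
    "|S & {a,b}|, |S & {c,d}|": ["ab", "cd"],
}
level_counts = {name: len(levels(N, *blocks))
                for name, blocks in criteria.items()}

for name, count in level_counts.items():
    print(f"{name}: {count} levels")
```

Each value returned by `levels` is one level, that is, one class of subsets on which the criterion takes the same value; the same helper can be applied to other ground sets to explore whether analogous criteria remain plausible there.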


The endeavour described in Section can be summarized as follows.

Theme Can the classification of supertypes of skeletal imsets by cardinal criteria be extended to a general case?

Moreover, in case Themes and are solved successfully, the following open problem may turn out to be interesting.

Theme Find out whether an $\ell$-standardized supermodular imset producing a structural model $\mathcal{M}$ over $N$ which is minimal with respect to dual strong facial implication (see p.) is uniquely determined. If yes, is the respective dual baricentral imset (p.) its multiple?

Operations with structural models

This section is an overview of basic operations with structural models. It is shown how they can be realized with the help of operations with imsets, either with supermodular or with structural ones.

Reductive operations

These operations assign a model over $T$, $T \subseteq N$, to a structural model over $N$.

Contraction

Suppose $\mathcal{M} \subseteq \mathcal{T}(N)$, $T \subseteq N$ and $X \subseteq N \setminus T$. The model

$$\mathcal{M}_{T|X} = \{\, \langle A, B | C \rangle \in \mathcal{T}(T) \;;\; \langle A, B | CX \rangle \in \mathcal{M} \,\}$$

will be called the contraction of $\mathcal{M}$ to $T$ conditioned by $X$.

Observation If $\mathcal{M} \in \mathcal{U}(N)$ then $\mathcal{M}_{T|X} \in \mathcal{U}(T)$.

Figure Cardinality criteria and respective levels for $N = \{a, b, c, d\}$ (upper picture: $|S \cap \{a, b\}|$; lower picture: $|S \cap \{a, b, c\}|$).

Figure Further cardinality criteria and respective levels for $N = \{a, b, c, d\}$ (upper picture: $|S \cap \{a, b, c, d\}|$; lower picture: composed criterion $|S \cap \{a, b, c\}|$, $|S \cap \{d\}|$).

Figure The last cardinality criterion and respective levels for $N = \{a, b, c, d\}$ (composed criterion $|S \cap \{a, b\}|$, $|S \cap \{c, d\}|$).

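Proof Given $m \in \mathbb{R}^{\mathcal{P}(N)}$ define $m_{T|X} \in \mathbb{R}^{\mathcal{P}(T)}$ by the formula

$$m_{T|X}(S) = m(S \cup X) \quad \text{for } S \subseteq T,$$

and observe that $\langle m_{T|X}, u_{\langle A,B|C \rangle} \rangle = \langle m, u_{\langle A,B|CX \rangle} \rangle$ for every $\langle A, B | C \rangle \in \mathcal{T}(T)$. By Observation derive that $m \in \mathcal{K}(N)$ implies $m_{T|X} \in \mathcal{K}(T)$. The equality above also implies that $\langle A, B | C \rangle \in \mathcal{M}^{m_{T|X}}$ iff $\langle A, B | CX \rangle \in \mathcal{M}^{m}$.

Thus, conditioned contraction corresponds to the linear operation $m \mapsto m_{T|X}$ with producing supermodular functions (imsets).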
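The key identity of the proof, $\langle m_{T|X}, u_{\langle A,B|C\rangle}\rangle = \langle m, u_{\langle A,B|CX\rangle}\rangle$, can be checked on a small instance. The sketch below is illustrative only: it assumes the usual form $u_{\langle A,B|C\rangle} = \delta_{ABC} + \delta_{C} - \delta_{AC} - \delta_{BC}$ of a semi-elementary imset, stores set functions as dictionaries keyed by frozensets, and uses the hypothetical supermodular function $m(S) = |S|^2$; all function names are assumptions made for this example.

```python
from itertools import combinations


def powerset(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def semi_elementary(A, B, C=""):
    """Sparse dict for u_<A,B|C> = d_ABC + d_C - d_AC - d_BC."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    u = {}
    for s, sign in ((A | B | C, 1), (C, 1), (A | C, -1), (B | C, -1)):
        u[s] = u.get(s, 0) + sign
    return u


def inner(m, u):
    """Scalar product <m, u> = sum over S of m(S) * u(S)."""
    return sum(m[s] * val for s, val in u.items())


def contract(m, T, X):
    """Conditioned contraction: m_{T|X}(S) = m(S | X) for every S within T."""
    X = frozenset(X)
    return {s: m[s | X] for s in powerset(T)}


N, T, X = "abc", "ab", "c"
m = {s: len(s) ** 2 for s in powerset(N)}  # hypothetical supermodular function
m_TX = contract(m, T, X)

lhs = inner(m_TX, semi_elementary("a", "b"))      # <m_{T|X}, u_<a,b|0>>
rhs = inner(m, semi_elementary("a", "b", "c"))    # <m, u_<a,b|c>>
print(lhs, rhs)
```

Both scalar products evaluate to the same number here, which is the mechanism by which the contracted function produces exactly the contracted model.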

Remark Note that the model $\mathcal{M}_{T|X}$ defined above was named a minor of a semigraphoid $\mathcal{M}$ over $N$ in, while the term contraction was confined to the case $X = N \setminus T$ there. Moreover, the operation applied to various graphical models was systematically treated in and under the name marginalizing and conditioning. My terminology is a compromise which reflects the idea that the operation is simultaneously contraction and conditioning, and which fits best the other names of operations with structural imsets mentioned below.

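Restriction

Recall that the restriction $\mathcal{M}_T$ of $\mathcal{M} \subseteq \mathcal{T}(N)$ to $T \subseteq N$ was already introduced on p. as $\mathcal{M} \cap \mathcal{T}(T)$. Of course, $\mathcal{M}_T$ is nothing but the contraction $\mathcal{M}_{T|\emptyset}$ conditioned by the empty set. Hence, Observation implies this.

Consequence If $\mathcal{M} \in \mathcal{U}(N)$ and $T \subseteq N$ then $\mathcal{M}_T \in \mathcal{U}(T)$.

Note that it was shown in the proof of Observation that the restriction of $\mathcal{M}^m$ corresponds to the restriction of the producing supermodular function, i.e. $(\mathcal{M}^m)_T = \mathcal{M}^{m_T}$, where

$$m_T(S) = m(S) \quad \text{for every } S \subseteq T \text{ and } m \in \mathcal{K}(N).$$

On the other hand, an analogous statement for inducing structural imsets does not hold, as the following example shows.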

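Example There is no linear operation with structural imsets which corresponds to the restriction of induced structural models. Put $N = \{a, b, c\}$ and $T = \{a, b\}$. Take $v = u_{\langle a,c|\emptyset \rangle}$ and $w = u_{\langle a,b|c \rangle}$. By Lemma, $(\mathcal{M}_v)_T = \mathcal{T}_\emptyset(T) = (\mathcal{M}_w)_T$. On the other hand, $v + w = u_{\langle a,bc|\emptyset \rangle}$, which means $(\mathcal{M}_{v+w})_T \neq \mathcal{T}_\emptyset(T)$. Supposing there exists a linear mapping

$$u \in \mathcal{S}(N) \;\mapsto\; u_T \in \mathcal{S}(T)$$

such that $\mathcal{M}_{u_T} = (\mathcal{M}_u)_T$, observe that $v_T = 0 = w_T$ (use Lemma and Observation to conclude that the only structural imset over $T$ inducing $\mathcal{T}_\emptyset(T)$ is the zero imset). By linearity of the mapping derive $(v + w)_T = 0$, which contradicts $(\mathcal{M}_{v+w})_T \neq \mathcal{T}_\emptyset(T)$.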

The preceding example motivates the next open problem.

Theme Let $u$ be a structural imset over $N$ and $T \subseteq N$. Is there any direct formula for a structural imset inducing $(\mathcal{M}_u)_T$ in terms of $u$?

One can distinguish two versions of the problem. First, one can be interested in an algebraic formula which provides at least one structural imset over $T$ inducing $(\mathcal{M}_u)_T$ for any $u \in \mathcal{S}(N)$. Second, one may wish to have an expression for the baricentral imset of $\mathcal{M}_T$ on the basis of the baricentral imset of $\mathcal{M} \in \mathcal{U}(N)$. Nevertheless, both desired formulas must be nonlinear, as demonstrated by Example.

Remark The restriction of every CI model induced by a probability measure $P$ over $N$ to $T \subseteq N$ is a CI model over $T$ induced by the respective marginal $P^T$. In fact, the conditioned contraction $\mathcal{M}_{T|X}$ of a CI model $\mathcal{M} \subseteq \mathcal{T}(N)$ induced by a discrete measure over $N$, where $X \subseteq N \setminus T$, is a CI model induced by a discrete measure over $T$.

Other sp ecial op erations

Given M T N and T N by the ful lconditioned contraction is understo o d the

mo del

M f hA B jC i T T X N n T hA B jCX i M g

T j

T

Clearly M M and Observation implies with help of Observation

T j T jX

X N nT

the following fact

Consequence If M U N and T N then M U T

T j

The consideration ab ove also implies that fullconditioned contraction corresp onds to

the following linear op eration with pro ducing sup ermo dular functions

X

P N

m R m S mK for every S T

T j

K K T S

Remark The restriction of a structural imset $u$ over $N$ to $\mathcal{P}(T)$, where $T \subseteq N$, need not be a structural imset. For example, consider $N = \{a, b, c\}$, $T = \{a, b\}$ and $u = u_{\langle b,c|a \rangle}$. Nevertheless, it is a structural imset if $u$ vanishes outside $\mathcal{P}(T)$. This fact can be verified using Lemma and Observation, with the help of the observation that every supermodular function over $T$ can be extended to a supermodular function over $N$ (see the mapping defined below).

Nevertheless, the same linear mapping can be interpreted as a mapping assigning a structural imset over $T$ to a structural imset over $N$, named contraction:

$$u \in \mathbb{R}^{\mathcal{P}(N)} \;\mapsto\; u_{[T]}(S) = \sum_{K;\ K \cap T = S} u(K) \quad \text{for } S \subseteq T.$$
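The contraction of an imset sums the values of the original imset over all sets $K$ with the same trace $K \cap T$. A minimal sketch of this summation follows; it is an illustration under stated assumptions, not the author's implementation. Imsets are stored as sparse dictionaries keyed by frozensets, `semi_elementary` assumes the usual form $\delta_{ABC} + \delta_C - \delta_{AC} - \delta_{BC}$, and the input imset is a hypothetical one chosen only so that the result is easy to recognize.

```python
def semi_elementary(A, B, C=""):
    """Sparse dict for u_<A,B|C> = d_ABC + d_C - d_AC - d_BC."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    u = {}
    for s, sign in ((A | B | C, 1), (C, 1), (A | C, -1), (B | C, -1)):
        u[s] = u.get(s, 0) + sign
    return u


def add(u, w):
    """Sum of two sparse imsets, dropping zero entries."""
    out = dict(u)
    for s, val in w.items():
        out[s] = out.get(s, 0) + val
    return {s: val for s, val in out.items() if val != 0}


def contract_imset(u, T):
    """Contraction to T: the value at S is the sum of u(K) over K with K & T == S."""
    T = frozenset(T)
    out = {}
    for K, val in u.items():
        S = K & T
        out[S] = out.get(S, 0) + val
    return {S: val for S, val in out.items() if val != 0}


# Hypothetical input over N = {a,b,c,d}: u = u_<a,b|0> + u_<a,c|bd>.
u = add(semi_elementary("a", "b"), semi_elementary("a", "c", "bd"))
u_T = contract_imset(u, "abc")

print(u_T == semi_elementary("a", "bc"))
```

In this instance the contraction of $u_{\langle a,c|bd\rangle}$ to $T = \{a,b,c\}$ is $u_{\langle a,c|b\rangle}$, so the contracted sum collapses to the single semi-elementary imset $u_{\langle a,bc|\emptyset\rangle}$.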

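Observation If $u \in \mathcal{S}(N)$, $T \subseteq N$ then $u_{[T]} \in \mathcal{S}(T)$. Moreover,

$$\mathcal{M}_{[T]} \equiv \{\, \langle A \cap T, B \cap T \,|\, C \cap T \rangle \;;\; \langle A, B | C \rangle \in \mathcal{M}_u \,\} \subseteq \mathcal{M}_{u_{[T]}}.$$

Proof The first fact follows from linearity of the mapping and the formula $\{u_{\langle A,B|C \rangle}\}_{[T]} = u_{\langle A \cap T, B \cap T | C \cap T \rangle}$ for any $\langle A, B | C \rangle \in \mathcal{T}(N)$. If $\langle A, B | C \rangle \in \mathcal{M}_u$ then $k \cdot u - u_{\langle A,B|C \rangle} \in \mathcal{S}(N)$ for some $k \in \mathbb{N}$, and $k \cdot u_{[T]} - \{u_{\langle A,B|C \rangle}\}_{[T]} \in \mathcal{S}(T)$ says $\langle A \cap T, B \cap T \,|\, C \cap T \rangle \in \mathcal{M}_{u_{[T]}}$.

On the other hand, the inclusion can be strict, as the following example shows.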

Example There exists a structural imset u over N and T N with M

T

M Put N fa b c dg T fa b cg and u u u Then u u

T habji habjcdi T habcji

is not a semigraphoid as ha bji ha bjci M In fact M but ha bcji M

T T T

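This motivates the next hypothesis.

Theme Let $\mathcal{M} = \mathcal{M}_u$ for $u \in \mathcal{S}(N)$ and $T \subseteq N$. Is it true that $\mathcal{M}_{u_{[T]}}$, where $u_{[T]}$ is the contraction of $u$ defined above, coincides with the structural closure (see p.) of $\mathcal{M}_{[T]} \equiv \{\, \langle A \cap T, B \cap T \,|\, C \cap T \rangle \;;\; \langle A, B | C \rangle \in \mathcal{M} \,\}$? Find out whether $\mathcal{M}_{[T]}$ is a CI model induced by a discrete probability measure over $T$ provided that $\mathcal{M}$ is a CI model induced by a discrete probability measure over $N$.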

Expansive operations

These operations assign a model over $N$ to a structural model over $T$, $T \subseteq N$. Main attention is devoted to extensions, that is, operations which assign to $\mathcal{M} \in \mathcal{U}(T)$ a model over $N$ whose restriction is again $\mathcal{M}$. These expansive operations are pinpointed. Another type of expansive operation is a lift, which can be viewed as a counterpart of conditioned contraction.

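Solid extension

Given $T \subseteq N$ and $\mathcal{M} \subseteq \mathcal{T}(T)$, the model $\mathrm{so}(\mathcal{M}, N)$ over $N$ given by

$$\mathrm{so}(\mathcal{M}, N) = \{\, \langle A, B | C \rangle \in \mathcal{T}(N) \;;\; \langle A \cap T, B \cap T \,|\, C \cap T \rangle \in \mathcal{M} \,\}$$

will be called the solid extension of $\mathcal{M}$ to $N$.

Observation If $\mathcal{M} \in \mathcal{U}(T)$ then $\mathrm{so}(\mathcal{M}, N) \in \mathcal{U}(N)$ and $\mathrm{so}(\mathcal{M}, N)_T = \mathcal{M}$.

The proof is based on a special linear extensive operation which assigns an extension $m$ to every supermodular $r$ over $T$:

$$r \in \mathcal{K}(T) \;\mapsto\; m(S) = r(S \cap T) \quad \text{for } S \subseteq N.$$

Proof Observe that the mapping above is linear and $\langle m, u_{\langle A,B|C \rangle} \rangle = \langle r, u_{\langle A \cap T, B \cap T | C \cap T \rangle} \rangle$ for every $\langle A, B | C \rangle \in \mathcal{T}(N)$. Hence, $m \in \mathcal{K}(N)$ by Observation. Supposing $\mathcal{M} \in \mathcal{U}(T)$, there exists $r \in \mathcal{K}(T)$ with $\mathcal{M} = \mathcal{M}^r$. Observe that $\mathrm{so}(\mathcal{M}, N) = \mathcal{M}^m$.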
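The extension $m(S) = r(S \cap T)$ used in the proof can be tried out directly. The following sketch is illustrative only: set functions are dictionaries keyed by frozensets, the function $r(S) = |S|^2$ over $T$ is a hypothetical supermodular example, and `semi_elementary` assumes the usual form $\delta_{ABC} + \delta_C - \delta_{AC} - \delta_{BC}$. It checks supermodularity of the extension and the scalar product identity on two triplets.

```python
from itertools import combinations


def powerset(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def semi_elementary(A, B, C=""):
    """Sparse dict for u_<A,B|C> = d_ABC + d_C - d_AC - d_BC."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    u = {}
    for s, sign in ((A | B | C, 1), (C, 1), (A | C, -1), (B | C, -1)):
        u[s] = u.get(s, 0) + sign
    return u


def inner(m, u):
    """Scalar product <m, u>."""
    return sum(m[s] * val for s, val in u.items())


def is_supermodular(m, ground):
    """Check m(A | B) + m(A & B) >= m(A) + m(B) for all subsets A, B."""
    subsets = powerset(ground)
    return all(m[a | b] + m[a & b] >= m[a] + m[b]
               for a in subsets for b in subsets)


def extend(r, T, N):
    """Extension of r from T to N: m(S) = r(S & T) for every S within N."""
    T = frozenset(T)
    return {s: r[s & T] for s in powerset(N)}


N, T = "abc", "ab"
r = {s: len(s) ** 2 for s in powerset(T)}  # hypothetical supermodular function
m = extend(r, T, N)

print(is_supermodular(m, N))
# <a,b|c> over N is mapped to <a,b|0> over T: the products agree.
print(inner(m, semi_elementary("a", "b", "c")), inner(r, semi_elementary("a", "b")))
# <a,c|b> over N is mapped to the trivial triplet <a,0|b>: the product vanishes.
print(inner(m, semi_elementary("a", "c", "b")))
```

A zero scalar product for every triplet whose trace on $T$ is trivial is what makes the model produced by the extension contain all such triplets, in line with the definition of the solid extension.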

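Remark Note that the solid extension of a CI model $\mathcal{M}$ induced by a probability measure over $T$ is a CI model again: the respective probability measure $P$ over $N$ has the form $P = Q \times \prod_{i \in N \setminus T} P_i$, where $Q$ induces $\mathcal{M}$ and $P_i$ are probability measures on arbitrary measurable spaces $(\mathsf{X}_i, \mathcal{X}_i)$, $i \in N \setminus T$. Moreover, the solid extension is a maximal extension in the sense that $\mathrm{so}(\mathcal{M}, N) \subseteq \mathcal{M}' \in \mathcal{U}(N)$, $\mathcal{M}'_T = \mathcal{M}$ implies $\mathcal{M}' = \mathrm{so}(\mathcal{M}, N)$. Indeed, it suffices to verify that $\langle A, B | C \rangle \in \mathcal{M}'$ implies $\langle A \cap T, B \cap T \,|\, C \cap T \rangle \in \mathcal{M}'$. Since $\langle A, C \setminus T \,|\, C \cap T \rangle \in \mathrm{so}(\mathcal{M}, N) \subseteq \mathcal{M}'$ and $\mathcal{M}'$ is a semigraphoid, by the contraction property derive $\langle A, B (C \setminus T) \,|\, C \cap T \rangle \in \mathcal{M}'$ and hence, by weak union and symmetry, $\langle A \cap T, B \cap T \,|\, C \cap T \rangle \in \mathcal{M}'$. On the other hand, the solid extension need not be the unique maximal extension. For example, consider $N = \{a, b, c\}$, $T = \{a, b\}$ and $\mathcal{M} = \mathcal{T}_\emptyset(T)$. Then $\mathcal{M}' = \mathcal{M}_u$ with $u = u_{\langle a,b|c \rangle} + u_{\langle a,c|b \rangle} + u_{\langle b,c|a \rangle}$ is another maximal $\mathcal{M}' \in \mathcal{U}(N)$ with $\mathcal{M}'_T = \mathcal{M}$.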

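Lift

Given $T \subseteq N$, $X \subseteq N \setminus T$ and $\mathcal{M} \subseteq \mathcal{T}(T)$, the model

$$\mathrm{li}(\mathcal{M}, N, X) = \mathcal{T}_\emptyset(N) \cup \{\, \langle A, B | CX \rangle \;;\; \langle A, B | C \rangle \in \mathcal{M} \,\}$$

will be called the lift of $\mathcal{M}$ to $N$ conditioned by $X$. The basic observation is that the operation of lift corresponds to the following linear mapping from $\mathbb{R}^{\mathcal{P}(T)}$ to $\mathbb{R}^{\mathcal{P}(N)}$, which assigns a structural imset $v(N, X)$ over $N$ to a structural imset $v$ over $T$:

$$v \in \mathbb{R}^{\mathcal{P}(T)} \;\mapsto\; v(N, X)(S) = \begin{cases} v(S \cap T) & \text{if } S \setminus T = X, \\ 0 & \text{if } S \setminus T \neq X, \end{cases} \quad \text{for any } S \subseteq N.$$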
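The lift mapping only moves the values of $v$ from sets $S \subseteq T$ to the sets $S \cup X$ and puts zero elsewhere. The sketch below illustrates this on sparse imsets stored as dictionaries keyed by frozensets; the names `lift` and `semi_elementary` are hypothetical, and `semi_elementary` assumes the usual form $\delta_{ABC} + \delta_C - \delta_{AC} - \delta_{BC}$.

```python
def semi_elementary(A, B, C=""):
    """Sparse dict for u_<A,B|C> = d_ABC + d_C - d_AC - d_BC."""
    A, B, C = frozenset(A), frozenset(B), frozenset(C)
    u = {}
    for s, sign in ((A | B | C, 1), (C, 1), (A | C, -1), (B | C, -1)):
        u[s] = u.get(s, 0) + sign
    return u


def lift(v, T, X):
    """Lift of v from T conditioned by X.

    The lifted imset takes the value v(S & T) on sets S with S - T == X and
    zero elsewhere, so its nonzero entries sit exactly at the sets S' | X
    for S' in the support of v.
    """
    T, X = frozenset(T), frozenset(X)
    if T & X:
        raise ValueError("X must be disjoint from T")
    return {S | X: val for S, val in v.items() if val != 0}


T = "ab"
v = semi_elementary("a", "b")        # u_<a,b|0> over T

lifted = lift(v, T, "c")             # expected to equal u_<a,b|c> over {a,b,c}
ascetic = lift(v, T, "")             # X empty: the same imset viewed over N

print(lifted == semi_elementary("a", "b", "c"))
print(ascetic == v)
```

The first check mirrors the step of the lemma's proof in which the lift of $u_{\langle A,B|C\rangle}$ is identified with $u_{\langle A,B|CX\rangle}$; the second shows that the case $X = \emptyset$ leaves the imset unchanged up to the enlarged ground set.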

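Lemma Suppose that $T \subseteq N$, $X \subseteq N \setminus T$. The mapping defined above is a linear mapping such that $v(N, X) \in \mathcal{S}(N)$ whenever $v \in \mathcal{S}(T)$. Moreover, it holds that $\mathrm{li}(\mathcal{M}_v, N, X) = \mathcal{M}_{v(N, X)}$. In particular, $\mathrm{li}(\mathcal{M}, N, X) \in \mathcal{U}(N)$ whenever $\mathcal{M} \in \mathcal{U}(T)$, and one has $\{\mathrm{li}(\mathcal{M}, N, X)\}_{T|X} = \mathcal{M}$.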

Pro of Observe that N X for D T which implies by linearity of

D DX

u N X u for every hA B jC i T T This gives the rst two statements

hAB jC i hAB jCX i

of the lemma As M is a semigraphoid it contains T N If hA B jC i M then

v

v N X

k v u S T for some k N and by linearity k v N X u S N

hAB jC i hAB jCX i

which means hA B jCX i M The converse inclusion M li M N X

v

v N X v N X

can b e shown in three steps

hA B jD i T N n T N fX D g hA B jD i M

v N X

D D

Indeed use Lemma hm u i while hm v N X i as X n S for

hAB jD i

every S D

hA B jD i T N n T N fAB D TX g hA B jD i M

v N X

AB D AB D

Indeed again use Lemma hm u i while hm v N X i as

hAB jD i

S n T X for every S AB D

hA B jC i T T n M hA B jCX i M

v

v N X

By Lemma a sup ermo dular function r over T with hr u i and hr v i

hAB jC i

exists Let m b e its extension given by It is a sup ermo dular function over N

hm u i hr u i and hm v N X i hr v i

hAB jCX i hAB jC i

Thus if hA B jD i M n T N then X D and AB D TX by and Therefore

v N X

hA B jD i hA B jCX i where hA B jC i T T and hA B jC i M by The next

v

statement then follows from the equality fli M N X g M is trivial

T jX

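Ascetic extension

Given $T \subseteq N$ and $\mathcal{M} \subseteq \mathcal{T}(T)$, the model $\mathrm{as}(\mathcal{M}, N)$ over $N$ given by

$$\mathrm{as}(\mathcal{M}, N) = \mathcal{T}_\emptyset(N) \cup \mathcal{M}$$

is called the ascetic extension of $\mathcal{M}$ to $N$. It is nothing but the lift $\mathrm{li}(\mathcal{M}, N, X)$ with $X = \emptyset$. Lemma therefore implies this.

Consequence If $\mathcal{M} \in \mathcal{U}(T)$ then $\mathrm{as}(\mathcal{M}, N) \in \mathcal{U}(N)$ and $\mathrm{as}(\mathcal{M}, N)_T = \mathcal{M}$.

It follows directly from the definition that the ascetic extension is the least extension of $\mathcal{M} \in \mathcal{U}(T)$, in the sense that $\mathcal{M}' \in \mathcal{U}(N)$, $\mathcal{M}'_T = \mathcal{M}$ implies $\mathrm{as}(\mathcal{M}, N) \subseteq \mathcal{M}'$. Let me recall that the proof of Lemma implies that the ascetic extension is realized by means of a linear operation with inducing structural imsets, namely by means of the mapping

$$v \in \mathbb{R}^{\mathcal{P}(T)} \;\mapsto\; u(S) = \begin{cases} v(S) & \text{if } S \subseteq T, \\ 0 & \text{otherwise,} \end{cases} \quad \text{for every } S \subseteq N.$$

Note that one can show, with the help of Lemma, that the ascetic extension of a CI model induced by a nondegenerate discrete measure over $T$ is a CI model over $N$.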

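Cellular extension

Given $T \subseteq N$ and $\mathcal{M} \subseteq \mathcal{T}(T)$, the model $\mathrm{ce}(\mathcal{M}, N)$ over $N$ given by

$$\mathrm{ce}(\mathcal{M}, N) = \mathcal{T}_\emptyset(N) \cup \{\, \langle A, B | CX \rangle \;;\; \langle A, B | C \rangle \in \mathcal{M},\ X \subseteq N \setminus T \,\}$$

is called the cellular extension of $\mathcal{M}$ to $N$. In fact, $\mathrm{ce}(\mathcal{M}, N) = \bigcup_{X \subseteq N \setminus T} \mathrm{li}(\mathcal{M}, N, X)$. The basic observation is that the cellular extension corresponds to a linear mapping from $\mathbb{R}^{\mathcal{P}(T)}$ to $\mathbb{R}^{\mathcal{P}(N)}$ which assigns a structural imset $u$ over $N$ to a structural imset $v$ over $T$:

$$v \in \mathbb{R}^{\mathcal{P}(T)} \;\mapsto\; u(S) = v(S \cap T) \quad \text{for } S \subseteq N.$$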
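The mapping $u(S) = v(S \cap T)$ can be compared with the sum of the lifts of $v$ over all $X \subseteq N \setminus T$, which is the decomposition used when the cellular extension is described as a union of lifts. The sketch below is an illustration under stated assumptions: imsets are sparse dictionaries keyed by frozensets, and the function names are hypothetical.

```python
from itertools import combinations


def powerset(ground):
    """Return all subsets of `ground` as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def add(u, w):
    """Sum of two sparse imsets, dropping zero entries."""
    out = dict(u)
    for s, val in w.items():
        out[s] = out.get(s, 0) + val
    return {s: val for s, val in out.items() if val != 0}


def lift(v, X):
    """Lift of v conditioned by X: the value v(S') is moved to the set S' | X."""
    X = frozenset(X)
    return {s | X: val for s, val in v.items() if val != 0}


def cellular(v, T, N):
    """Cellular mapping: u(S) = v(S & T) for every S within N (sparse result)."""
    T = frozenset(T)
    full = {s: v.get(s & T, 0) for s in powerset(N)}
    return {s: val for s, val in full.items() if val != 0}


N, T = "abc", "ab"
# v = u_<a,b|0> over T, written out explicitly.
v = {frozenset("ab"): 1, frozenset(): 1, frozenset("a"): -1, frozenset("b"): -1}

u = cellular(v, T, N)
total = {}
for X in powerset(frozenset(N) - frozenset(T)):
    total = add(total, lift(v, X))

print(u == total)
```

For this input the result equals $u_{\langle a,b|\emptyset\rangle} + u_{\langle a,b|c\rangle}$, one summand for each cell $X \subseteq N \setminus T$.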

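Lemma Suppose $\mathcal{M} \subseteq \mathcal{T}(T)$ and $T \subseteq N$. The mapping defined above is a linear mapping which assigns $u \in \mathcal{S}(N)$ to $v \in \mathcal{S}(T)$. Moreover, $\mathcal{M}_u = \mathrm{ce}(\mathcal{M}_v, N)$. In particular, $\mathrm{ce}(\mathcal{M}, N) \in \mathcal{U}(N)$ whenever $\mathcal{M} \in \mathcal{U}(T)$, and $\mathrm{ce}(\mathcal{M}, N)_T = \mathcal{M}$.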

P

Pro of It follows directly from and that u v N X Thus u

X N nT

S N by Lemma and M M by Lemma for every X N n T The

v N X u

converse inclusion M ce M N can b e shown in two steps

u v

hU V jW i T N U n T V hU V jW i M

u

Indeed choose i U n T j V and use Lemma hm u i and hm ui

hUV jW i

P

ij

hm v N X i for m m To verify the last equality use Observation

X N nT

m

and show M M one has hm u i for every hA B jC i M and

v

v N X hAB jCX i

X N n T according to Lemma This is clear whenever i CX in case i CX the

assumption i T AB implies i AB C X

hU V jW i T N U V T hU V jW T i M hU V jW i M

v u

Indeed by Lemma nd a sup ermo dular function r over T with hr u i and

hUV jW T i

hr v i Dene a sup ermo dular function m over N by and observe hm u i

hUV jW i

P P P

hr u i with hm ui mKX uKX hr v i

hUV jW T i

X N nT K T X N nT

This implies hU V jW i M by Lemma

u

Thus if hU V jW i M n T N then U V T by and hU V jW T i M by To

u v

derive further statement use the last formula is trivial

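Remark This is to explain my reasons for the chosen terminology. The reader can observe, on the basis of the respective definitions, that

$$\mathrm{as}(\mathcal{M}, N) \subseteq \mathrm{ce}(\mathcal{M}, N) \subseteq \mathrm{so}(\mathcal{M}, N) \quad \text{for every } \mathcal{M} \subseteq \mathcal{T}(T).$$

The inclusions may be strict, as Example below shows. The fact that $\mathrm{as}(\mathcal{M}, N)$ is the least extension of $\mathcal{M}$ motivated the adjective ascetic, and the fact that $\mathrm{so}(\mathcal{M}, N)$ is one of the maximal extensions of $\mathcal{M}$ (see Remark) motivated the adjective solid. The adjective cellular was motivated by the fact that $\mathrm{ce}(\mathcal{M}, N)$ is composed of several different cells, namely $\mathrm{li}(\mathcal{M}, N, X)$ for $X \subseteq N \setminus T$.

Example There exists $\mathcal{M} \in \mathcal{U}(T)$, $T \subseteq N$, whose ascetic, cellular and solid extensions differ. Put $N = \{a, b, c\}$, $T = \{a, b\}$ and $\mathcal{M} = \mathcal{M}_u$, where $u = u_{\langle a,b|\emptyset \rangle}$. Then $\langle a, b | c \rangle \in \mathrm{ce}(\mathcal{M}, N) \setminus \mathrm{as}(\mathcal{M}, N)$ and $\langle c, ab | \emptyset \rangle \in \mathrm{so}(\mathcal{M}, N) \setminus \mathrm{ce}(\mathcal{M}, N)$.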

Accumulative operations

The aim of representing big structural models effectively in the memory of a computer motivates the need for a suitable definition of decomposition of a structural model into less dimensional models.

Localization of UG models

The following intuitive consideration does not pretend to be precise; it serves as a motivation account only (some results cited below may even be misinterpreted). The final goal is to achieve an analogue of the result from saying that every UG model has a canonical decomposition into prime UG submodels. In fact, the well-known concept of decomposition of undirected graphs from is behind this approach. Recall that if $G$ is an undirected graph over $N$ and $\langle A, B | C \rangle \in \mathcal{T}(N) \setminus \mathcal{T}_\emptyset(N)$ is such that $C$ is a complete set in $G$ and $A \perp\!\!\!\perp B \mid C\ [G]$ (p.), then the pair of undirected graphs $(G_{AC}, G_{BC})$ is called a proper decomposition of $G$. The graphs $G_{AC}$ resp. $G_{BC}$ can possibly have further proper decompositions, which means that every UG model $\mathcal{M}_G$ can be gradually decomposed into UG submodels with no proper decomposition, which can be named prime UG submodels of $\mathcal{M}_G$. In my view, the result of can be paraphrased as follows: for every undirected graph $G$ over $N$ a unique triangulated supergraph $H$ over $N$ (see p.) exists whose cliques correspond to maximal subsets of $N$ defining prime UG submodels of $\mathcal{M}_G$. Therefore, various computational operations with Markovian measures with respect to $G$ can be done locally within prime UG submodels. I believe that the well-known method of local computation, applied mainly to DAG models, has an analogous source of justification. Therefore, I hope that these ideas can be extended to more general structural models. A suitable concept of decomposition of a structural model, based on an accumulative operation with structural models, is needed. One possible proposal is mentioned below.

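Composition

Given $\langle A, B | C \rangle \in \mathcal{T}(N)$ and $\mathcal{M}^1 \in \mathcal{U}(AC)$, $\mathcal{M}^2 \in \mathcal{U}(BC)$ such that $\mathcal{M}^1_C = \mathcal{M}^2_C$, by the composition of $\mathcal{M}^1$ and $\mathcal{M}^2$ will be understood the structural model $\mathcal{M}^1 \otimes \mathcal{M}^2$ over $ABC$ given by

$$\mathcal{M}^1 \otimes \mathcal{M}^2 = \mathrm{cl}_{\mathcal{U}(ABC)} \big(\, \mathrm{as}(\mathcal{M}^1, ABC) \cup \mathrm{as}(\mathcal{M}^2, ABC) \cup \{ \langle A, B | C \rangle \} \,\big).$$

In words, both $\mathcal{M}^1$ and $\mathcal{M}^2$ are embedded into $\mathcal{U}(ABC)$ by the ascetic extension, then $\langle A, B | C \rangle$ is added and the structural closure operation (see p.) is applied. It follows from the definition that $\mathcal{M}^1 \otimes \mathcal{M}^2 \in \mathcal{U}(ABC)$. A natural question related to the concept of conditional product from is the following one.

Theme Suppose $\langle A, B | C \rangle \in \mathcal{T}(N)$, $\mathcal{M}^1 \in \mathcal{U}(AC)$, $\mathcal{M}^2 \in \mathcal{U}(BC)$ with $\mathcal{M}^1_C = \mathcal{M}^2_C$. Is it true that $(\mathcal{M}^1 \otimes \mathcal{M}^2)_{AC} = \mathcal{M}^1$ and $(\mathcal{M}^1 \otimes \mathcal{M}^2)_{BC} = \mathcal{M}^2$? Can the domain of the operation $\otimes$ be restricted suitably so that the axioms of conditional product from are then fulfilled for it?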

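Structural decomposition

Let $\mathcal{M}$ be a structural model over $N$ and $U, V \subseteq N$ such that $UV = N$. One says that $\mathcal{M}$ decomposes into $\mathcal{M}_U$ and $\mathcal{M}_V$, or that $(\mathcal{M}_U, \mathcal{M}_V)$ forms a structural decomposition of $\mathcal{M}$, if $\mathcal{M} = \mathcal{M}_U \otimes \mathcal{M}_V$. The decomposition is proper if $U \setminus V \neq \emptyset \neq V \setminus U$. A structural model will be called indecomposable if it has no proper structural decomposition. Note that $(\mathcal{M}_U)_{U \cap V} = (\mathcal{M}_V)_{U \cap V}$, which means that the composition $\mathcal{M}_U \otimes \mathcal{M}_V$ is always defined. Clearly, a necessary condition for the existence of a structural decomposition $(\mathcal{M}_U, \mathcal{M}_V)$ is $U \setminus V \perp\!\!\!\perp V \setminus U \mid U \cap V\ [\mathcal{M}]$ (see p.). In particular, $\mathcal{M} = \mathcal{T}_\emptyset(N)$ is an indecomposable model. As $\mathcal{M}_U$ resp. $\mathcal{M}_V$ can be decomposed again, one can gradually obtain a full decomposition of $\mathcal{M}$ into indecomposable models. Unfortunately, the hypothesis that every structural model has a unique full decomposition of this type into prime components is false. Indeed, consider the model from Example: one has $\mathcal{M} = \mathcal{M}_{abc} \otimes \mathcal{M}_{abd} = \mathcal{M}_{acd} \otimes \mathcal{M}_{bcd}$, and every $\mathcal{M}_S$ with $S \subseteq N$, $|S| = 3$ is indecomposable.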

Theme Let $G$ be an undirected graph over $N$ and let $\langle A, B | C \rangle \in \mathcal{T}(N) \setminus \mathcal{T}_\emptyset(N)$ define a proper decomposition of $G$. Is $((\mathcal{M}_G)_{AC}, (\mathcal{M}_G)_{BC})$ a structural decomposition of $\mathcal{M}_G$? Does every CG model (see p.) have a unique minimal canonical decomposition into maximal indecomposable submodels?

It may be the case that uniqueness of the canonical decomposition cannot be achieved even under possible additional standardization requirements. This would confirm that the concept of structural decomposition is not suitable for the purpose mentioned earlier (p.). Then one should look for another type of decomposition, based on another accumulative operation with structural imsets, which generalizes decomposition of UG models.

Direction Develop an analogue of the method of local computation for structural models based on a conveniently defined concept of decomposition of structural models. Find sufficient conditions for decomposition of this type which can be verified by statistical tests or on the basis of expert knowledge. Develop an analogy of Shenoy's pictorial method of valuation networks for local representation of structural imsets and Markovian measures in the memory of a computer.

Well-known results about factorization of maximum-likelihood estimators should then be generalized as well.

Implementation tasks

These open problems are motivated by the task of implementing facial implication on a computer. The most important question is probably the next one.

Question Is every structural imset $u$ over $N$ already a combinatorial imset over $N$ (see p.)?

If the answer to Question is negative, then the following two problems become topics of immediate interest.

Theme Given a finite nonempty set of variables $N$, find the least finite class $\mathcal{H}(N)$ of structural imsets such that

$$\forall\, u \in \mathcal{S}(N): \quad u = \sum_{v \in \mathcal{H}(N)} k_v \cdot v \quad \text{for some } k_v \in \mathbb{Z}^+.$$

Recall that the existence of the class $\mathcal{H}(N)$, named the minimal integral Hilbert basis of $\mathrm{con}(\mathcal{E}(N))$, follows from Theorem in. One has $\mathcal{E}(N) = \mathcal{H}(N)$ iff $\mathcal{S}(N) = \mathcal{C}(N)$.

Theme Given a nite nonempty set of variables N determine the least n N such

that an imset over N is structural i its multiple n u is a combinatorial imset ie

P N

u Z u S N n u C N

Determine the least n N satisfying

P N

u Z u S N n N n n n u C N

Find out how the values n and n dep end on jN j

Note that n n and I am not able to decide whether the inequality is strict Indeed

n n S N C N Further imp ortant question concerns the skeleton

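Question Let $\mathcal{K}(N)$ be the skeleton over $N$ (see p.) and $\mathcal{E}(N)$ the class of elementary imsets over $N$ (p.). Is the equality

$$\min \{\, \langle m, u \rangle \;;\; u \in \mathcal{E}(N),\ \langle m, u \rangle \neq 0 \,\} = 1$$

fulfilled for every $m \in \mathcal{K}(N)$?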

Note that the condition from Question implies that graN gra N see p

The following problem b ecomes relevant in case b oth Question and Question have

negative answers

Theme How do es dep end the value of the least l N satisfying the condition

u S N v E N u v l u v S N

dep end on jN j Recall that facial implication is dened on p Can one determine

graN directly without nding the skeleton ie without solving Theme

Recall that if either Question or Question has p ositive answer then the least l N

satisfying is graN

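Theme Is there a least $l \in \mathbb{N}$ such that

$$\forall\, u \in \mathcal{S}(N)\ \ \forall\, v \in \mathcal{E}(N) \qquad u \rightharpoonup v \iff l \cdot u - v \in \mathcal{C}(N)\,?$$

How does $l$ depend on $|N|$ then? If there is no $l \in \mathbb{N}$ of this kind, find out for a given $u \in \mathcal{S}(N)$ how the class of $k \in \mathbb{N}$ satisfying

$$\forall\, v \in \mathcal{E}(N) \qquad u \rightharpoonup v \iff k \cdot u - v \in \mathcal{C}(N)$$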

lo oks Is there a structural imset u S N such that the condition from Remark

is not fullled that is n u C N or k n u C N for every k n N

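Theme Given a finite non-empty set of variables $N$, is there $l_{\dagger} \in \mathbb{N}$ such that

$$\forall\, u \in \mathcal{S}(N)\ \ \forall\, v \text{ semi-elementary imset over } N \qquad u \rightharpoonup v \iff l_{\dagger} \cdot u - v \in \mathcal{S}(N),$$

respectively $l_{\dagger\dagger} \in \mathbb{N}$ such that

$$\forall\, u \in \mathcal{S}(N)\ \ \forall\, v \text{ semi-elementary imset over } N \qquad u \rightharpoonup v \iff l_{\dagger\dagger} \cdot u - v \in \mathcal{C}(N)\,?$$

How does $l_{\dagger}$, respectively $l_{\dagger\dagger}$, depend on $|N|$ then?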

The following open problem also concerns facial implication.

Theme Is there any method of testing facial implication which combines direct and skeletal criteria (see Lemma on p and Lemma on p ) and which is more suitable for efficient implementation on a computer?

The above formulation is partially vague; let me specify what I have in mind in more detail. The direct criterion of facial implication $u \rightharpoonup v$ consists in testing whether $k \cdot u - v \in \mathcal{C}(N)$ for some $k \in \mathbb{N}$. This can be tested recursively as mentioned in Remark . However, plenty of transient imsets obtained during the decomposition procedure are not combinatorial imsets. This can often be recognized immediately by means of Theorem , which is behind the skeletal criterion of facial implication, and this saves superfluous steps of the recursive decomposition procedure. The observation that a transient imset $v$ is not combinatorial can be made on the basis of the fact that $\langle m, v \rangle < 0$ for a standard supermodular imset $m$ over $N$, for example the imset $m^{A\uparrow}$ resp. $m^{A\downarrow}$ for $A \subseteq N$ (p ) or $m_{l}$ for $l$ in the range determined by $|N|$ (p ). The point is that one need not have the whole skeleton at one's disposal. In fact, Remark is based just on observations of this type.
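The recursive decomposition with early rejection described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the Remark: imsets over a three-element set are stored as integer tuples, an elementary imset is taken in its usual form $\delta_{abK} + \delta_{K} - \delta_{aK} - \delta_{bK}$, the recursion depth is controlled by the scalar product with $S \mapsto \binom{|S|}{2}$ (which equals 1 on every elementary imset), and the only supermodular test functions used for pruning are the indicators of supersets of a set $A$. All function names and the bound `max_k` are hypothetical.

```python
from functools import lru_cache
from itertools import combinations

N = ("a", "b", "c")  # assumed small variable set, for illustration only
SUBSETS = [frozenset(c) for r in range(len(N) + 1) for c in combinations(N, r)]


def imset(values):
    """Build an imset as a tuple of integers indexed like SUBSETS."""
    d = {frozenset(k): v for k, v in values.items()}
    return tuple(d.get(s, 0) for s in SUBSETS)


def elementary(a, b, K=()):
    """Elementary imset delta_{abK} + delta_K - delta_{aK} - delta_{bK}."""
    K = frozenset(K)
    return imset({K | {a, b}: 1, K: 1, K | {a}: -1, K | {b}: -1})


ELEMENTARY = [
    elementary(a, b, K)
    for a, b in combinations(N, 2)
    for r in range(len(N) - 1)
    for K in combinations([x for x in N if x not in (a, b)], r)
]


def degree(u):
    """Scalar product with S -> |S|(|S|-1)/2; it is 1 on elementary imsets."""
    return sum(v * len(s) * (len(s) - 1) // 2 for s, v in zip(SUBSETS, u))


def passes_upper_tests(u):
    """Necessary condition: the sum of u over supersets of A is non-negative."""
    return all(
        sum(v for s, v in zip(SUBSETS, u) if A <= s) >= 0 for A in SUBSETS
    )


@lru_cache(maxsize=None)
def is_combinatorial(u):
    """Recursive test whether u is a sum of elementary imsets."""
    d = degree(u)
    if d < 0 or not passes_upper_tests(u):
        return False  # rejected without any further decomposition
    if d == 0:
        return all(v == 0 for v in u)
    return any(
        is_combinatorial(tuple(x - y for x, y in zip(u, v))) for v in ELEMENTARY
    )


def implies(u, v, max_k=3):
    """Direct criterion: k*u - v is combinatorial for some k <= max_k."""
    return any(
        is_combinatorial(tuple(k * x - y for x, y in zip(u, v)))
        for k in range(1, max_k + 1)
    )
```

For instance, with `u = imset({"abc": 1, "": 1, "a": -1, "bc": -1})` the call `implies(u, elementary("a", "b"))` succeeds, while `implies(u, elementary("b", "c"))` is rejected by the pruning test for every `k` without any recursive step. The cut-off `max_k` is an arbitrary choice here; which bound suffices in general is exactly the subject of the themes above.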

Memory demands

Another important problem is what the memory demands are for representing a structural imset in the memory of a computer. Informally, by the actual dimension of the class of structural models $\mathcal{U}(N)$ is understood the minimal number of binary attributes of elements of $\mathcal{U}(N)$ which can distinguish between every pair of distinct structural models.

Observation The following inequality holds:

$$\lceil \log_2 |\mathcal{U}(N)| \rceil \ \le\ \text{actual dimension of } \mathcal{U}(N) \ \le\ \min \{\, |\mathcal{E}(N)|,\ |\mathcal{K}^{\diamond}_{\ell}(N)| \,\}.$$

Proof. If $s$ binary attributes distinguish between elements of $\mathcal{U}(N)$ then $s$ bits are enough to represent all elements of $\mathcal{U}(N)$. Hence $|\mathcal{U}(N)| \le 2^{s}$ gives the lower estimate. The fact that elementary, respectively skeletal, imsets differentiate between structural models follows from Lemma resp. Consequence .
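The two estimates from the Observation are easy to evaluate once the respective cardinalities are known. The sketch below is only illustrative: the function names are hypothetical, the count of elementary imsets uses the fact that an elementary imset is given by an unordered pair of distinct variables and a subset of the remaining ones, and the numbers of models and skeletal imsets passed in the usage example are made-up inputs, not values taken from this work.

```python
from math import comb


def count_elementary(n):
    """Number of elementary imsets over n >= 2 variables: C(n,2) * 2^(n-2)."""
    return comb(n, 2) * 2 ** (n - 2)


def dimension_bounds(num_models, num_elementary, num_skeletal):
    """Lower and upper estimate of the actual dimension from the Observation."""
    # (m - 1).bit_length() equals ceil(log2(m)) exactly for every integer m >= 1
    lower = (num_models - 1).bit_length()
    upper = min(num_elementary, num_skeletal)
    return lower, upper


# hypothetical inputs: 1000 models, 30 skeletal imsets
print(dimension_bounds(1000, count_elementary(4), 30))
```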

For example the actual dimension of U N is in case jN j as jU N j and

jE N j jK N j while in case jN j one has jU N j and E N K N

If jN j then jU N j gives lower estimate d l n jU N j e which is precise since

jK N j jE N j In case jN j one has jU N j

jE N j and jK N j Thus Observation implies

Consequence If jN j then the actual dimension of U N is b etween and

Note that the inequality jU N j in case jN j means that one can p erhaps

construct awkward articial attributes which dierentiate b etween the elements of

$\mathcal{U}(N)$, and perhaps they even have the form of functions of elementary characteristics. However, I am interested in those characteristics or attributes which have a reasonable interpretation and can be generalized, in the sense that their generalization achieves the actual dimension of $\mathcal{U}(N)$ for $|N|$. In fact, I am interested in a solution of the following vaguely defined problem.

Theme Given a non-empty finite set of variables $N$, what is the least cardinality of a set of interpretable binary attributes which differentiate between structural models over $N$? How does it depend on $|N|$?

Interpretation and learning tasks

Open problems loosely motivated by practical questions of interpretation and learning from Section are gathered below.

Meaningful description of structural models

The following two open problems are motivated by the concept of standard imset for an acyclic directed graph from Section .

Question Let $G$ be an acyclic directed graph over $N$ (p ). Is it true that the standard imset for $G$ (see p ) is the only imset from the class of combinatorial imsets inducing $\mathcal{M}_{G}$ which is simultaneously an imset of the least degree (p ) and an imset with the least lower class (p )?

A natural question is whether the concept of standard imset for a DAG model can be generalized.

Theme Is there any consistent principle of unique choice of representatives of classes of facial equivalence (see p ) such that for every acyclic directed graph $G$ over $N$ the standard imset $u_{G}$ is chosen from the class $\{\, u \in \mathcal{S}(N) ;\ \mathcal{M}_{u} = \mathcal{M}_{G} \,\}$?

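The above description is somewhat vague; let me specify more detailed hypotheses. Suppose $\mathcal{M} \in \mathcal{U}(N)$, $\wp = \{\, v \in \mathcal{S}(N) ;\ \mathcal{M}_{v} = \mathcal{M} \,\}$ and $\mathcal{U} = \mathcal{U}_{v}$ for some $v \in \wp$ (it does not depend on $v$, see Lemma on p ). Let us put

$$\mathbb{L} = \{\, \mathcal{L} \subseteq \mathcal{U} ;\ \mathcal{L} \text{ is a minimal lower class } \mathcal{L}_{v} \text{ for } v \in \wp \,\}$$

where minimality is understood with respect to inclusion ($\mathcal{L}_{v}$ is defined on p ). The respective hypotheses are that $u \in \wp$ with $\mathbb{L} = \{ \mathcal{L}_{u} \}$ exists for every $\mathcal{M} \in \mathcal{U}(N)$ and that a combinatorial imset with the least degree among $u \in \wp \cap \mathcal{C}(N)$ satisfying $\mathbb{L} = \{ \mathcal{L}_{u} \}$ is determined uniquely. If they were true then $u_{G}$ is an imset of this kind for $\mathcal{M} = \mathcal{M}_{G}$, where $G$ is an acyclic directed graph over $N$, by Lemma on p and Consequence on p . The hypotheses can be modified by considering, in place of $\mathbb{L}$, the class of those $\mathcal{L} \subseteq \mathcal{U}$ which are minimal determining classes for $\mathcal{M}$, or the class of those $\mathcal{L} \subseteq \mathcal{U}$ which are minimal unimarginal classes for $\mathcal{M}$ (see p ). A further open problem is motivated by Section .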

Direction Look for necessary conditions for facial equivalence of structural imsets, formulated in terms of invariants of facial equivalence, which are easy to verify and offer a clear interpretation. The aim is to find a set of these conditions which is able to distinguish every pair of structural imsets which are not equivalent.

The desired complete set of interpretable invariants could then become a basis of an alternative way of describing structural models which is suitable from the point of view of interpretation.

Distribution frameworks and learning

The problems mentioned below more or less concern the distribution framework. In my view, they are also related to the general task of learning structural models (see Section , p ).

Theme Let $\Psi$ be a class of probability measures over $N$ satisfying the conditions and from Section . Let $\Psi(u)$ denote the class of Markovian measures with respect to $u \in \mathcal{S}(N)$ given by on p and $\mathcal{S}_{\Psi}(N)$ the class of $\Psi$-representable structural imsets over $N$ (p ). Is the condition

$$\forall\, u, v \in \mathcal{S}_{\Psi}(N) \qquad u \rightharpoonup v \iff \Psi(u) \subseteq \Psi(v)$$

fulfilled then? Does it hold under additional assumptions on $\Psi$?

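Question Let $\mathcal{M}$ be a structural model over $N$, $\mathcal{U} = \mathcal{U}_{u}$ the upper class of $u \in \mathcal{S}(N)$ with $\mathcal{M}_{u} = \mathcal{M}$, and $\mathcal{D} \subseteq \mathcal{U}$ a unimarginal class for $\mathcal{M}$ (see Section , p ). Is then $\mathcal{D}$ necessarily a determining class for $\mathcal{M}$?

The above question can also be formulated relative to a distribution framework (see Remark on p ).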

Theme Let $\Psi_{1}, \Psi_{2}$ be classes of probability measures satisfying and , and let $\mathcal{M}$ be a structural model over $N$. May it happen that minimal unimarginal classes for $\mathcal{M}$ relative to $\Psi_{1}$ and $\Psi_{2}$ differ? More specifically, I am interested in the class of discrete measures (p ) in place of $\Psi_{1}$ and the class of non-degenerate Gaussian measures in place of $\Psi_{2}$ (p ).

The last two open problems are closely related to mathematical statistics. The first one is the parametrization problem.

Direction Find out for which structural imsets $u$ over $N$ and for which classes $\Psi$ of probability measures with prescribed sample space $(\mathsf{X}_{N}, \mathcal{X}_{N}) = \prod_{i \in N} (\mathsf{X}_{i}, \mathcal{X}_{i})$ a suitable parametrization of the class $\Psi(u)$ of Markovian measures with respect to $u$ exists.

Note that I am interested in parametrization by means of independent parameters, i.e. situations in which elements of $\Psi(u)$ are in one-to-one correspondence with parameters belonging to an $n$-dimensional interval for some $n \in \mathbb{N}$. Preferable parametrizations are those in which parameters correspond to less-dimensional marginal measures. A typical example is the parametrization of non-degenerate Gaussian measures which are Markovian with respect to an acyclic directed graph.

Direction Propose methods of learning structural models on the basis of data (both statistical testing and estimation). Develop methods for statistical estimation of Markovian measures with respect to a given structural imset within a given distribution framework, i.e. fitting procedures.

I think that the most suitable methodological approach to statistical learning of structural models is to introduce a suitable distance on the set of probability measures belonging to the considered distribution framework with a fixed sample space. Then one can compute the distance between the empirical measure computed from data and the set of Markovian measures with respect to a prospective structural imset. I hope that the equivalence result from Section (see p ), namely the product formula (p ), may serve as a basis for some iterative fitting procedures.

Chapter

Conclusions

The aim of this chapter is to summarize the methods of description of probabilistic conditional independence (CI) structures mentioned in this work.

Chapter is an overview of graphical methods of description of CI structures. These methods are suitable from the point of view of interpretation and some of them are good from the point of view of implementation on a computer. However, they are not complete in the sense that they are not able to describe all discrete probabilistic CI structures (see Section ). Omission of this theoretical requirement may result in serious methodological errors in practical learning procedures (see Section ). This fact motivated an effort to develop a non-graphical method of description of probabilistic CI structures by objects of discrete mathematics which complies with the requirement of completeness.

The method of description of probabilistic CI structures by means of structural imsets, described in Chapter and in subsequent chapters, meets the above requirement of completeness for a quite wide class of distributions, namely for probability measures with finite multiinformation (Theorem ). The class of measures with finite multiinformation involves three basic classes of distributions used in practice in graphical modelling of CI structures, namely the class of discrete measures, the class of non-degenerate Gaussian measures and the class of non-degenerate CG measures (see Section ).

Theorem gives three equivalent definitions of a Markovian probability measure $P$ with respect to a structural imset $u$. The standard definition requires that every CI statement represented in $u$ is indeed a valid CI statement with respect to $P$ (see Section ). The second equivalent definition is the requirement that $P$ satisfies the product formula induced by $u$ (see Section ). The product formula, which needs an auxiliary concept of a reference system of dominating measures, is perhaps important from the point of view of interpretation of CI structures induced by structural imsets. First, it generalizes the well-known product formula for decomposable models (see on p ); this happens when one takes in place of $u$ the standard imset for the respective decomposable graph (see Section , Remark ). Second, it can perhaps be viewed as a loose analogue of formulas defining log-linear models. Third, the product formula also illustrates how the uniqueness principle for Markovian measures with respect to a structural imset works. The principle, formulated in Consequence , says that the marginals of a Markovian measure $P$ with respect to a structural imset $u$ for the lower class $\mathcal{L}_{u}$ determine uniquely the marginals for the upper class $\mathcal{U}_{u}$. Indeed, in case of the standard imset induced by an acyclic directed graph (see Section ) the upper class corresponds to the collection of all marginals, including the measure $P$ itself, and the lower class describes the least possible collection of marginals determining uniquely the measure $P$ (see Consequence in Section ). The third equivalent definition of a Markovian measure is the requirement that the scalar product of the multiinformation function $m_{P}$ with $u$ vanishes (see Section ). This equivalent definition seems to be important from the point of view of computer implementation and maybe from the point of view of learning. Indeed, perhaps it can serve as a basis of a future learning method which can determine the most suitable structural imset on the basis of a statistical estimate of the multiinformation function. However, the main significance is in bringing the point of view of algebra. This may facilitate computer implementation of the method on the basis of arithmetic of integers.
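The third definition can be illustrated numerically for a discrete measure. The sketch below is only an illustration under stated assumptions: variables are indexed by integers, the joint distribution is a dictionary from value tuples to probabilities, the multiinformation of a set $S$ is computed by the discrete identity $m_{P}(S) = \sum_{i \in S} H(X_i) - H(X_S)$, and $u$ is an elementary imset $\delta_{abK} + \delta_{K} - \delta_{aK} - \delta_{bK}$, so that the scalar product reduces to four values of $m_{P}$. The function names and the example distribution are hypothetical.

```python
from math import log


def marginal(p, idx):
    """Marginal of the joint distribution p for the coordinates in idx."""
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + px
    return out


def entropy(p):
    """Shannon entropy (natural logarithm) of a discrete distribution."""
    return -sum(px * log(px) for px in p.values() if px > 0)


def multiinformation(p, idx):
    """m_P(S) = sum of one-dimensional entropies minus the joint entropy."""
    idx = tuple(sorted(idx))
    singles = sum(entropy(marginal(p, (i,))) for i in idx)
    return singles - entropy(marginal(p, idx))


def scalar_product_with_elementary(p, a, b, K=()):
    """<m_P, u> for u = delta_{abK} + delta_K - delta_{aK} - delta_{bK}."""
    K = tuple(K)
    m = lambda S: multiinformation(p, S)
    return m((a, b) + K) + m(K) - m((a,) + K) - m((b,) + K)


def example_distribution():
    """Three binary variables; 0 and 1 are independent given 2 by construction."""
    p_c = {0: 0.5, 1: 0.5}
    p_a1 = {0: 0.9, 1: 0.2}  # P(X0 = 1 | X2 = c)
    p_b1 = {0: 0.8, 1: 0.3}  # P(X1 = 1 | X2 = c)
    p = {}
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                pa = p_a1[c] if a else 1 - p_a1[c]
                pb = p_b1[c] if b else 1 - p_b1[c]
                p[(a, b, c)] = p_c[c] * pa * pb
    return p
```

With `p = example_distribution()`, the value `scalar_product_with_elementary(p, 0, 1, (2,))` vanishes up to rounding error, whereas `scalar_product_with_elementary(p, 0, 1)` is strictly positive, in agreement with the fact that variables 0 and 1 are conditionally independent given 2 but not marginally independent.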

The algebraic point of view is emphasized in Chapter . The multiinformation function is known to be an $\ell$-standardized supermodular function (Consequence ) and the cone $\mathcal{K}_{\ell}(N)$ of $\ell$-standardized supermodular functions plays an important role in the presented approach. More precisely, $\mathcal{K}_{\ell}(N)$ is a pointed rational polyhedral cone, which implies that it has finitely many extreme rays and every extreme ray contains just one non-zero normalized $\ell$-standardized supermodular imset (Lemma ), named an $\ell$-skeletal imset. The finite collection $\mathcal{K}^{\diamond}_{\ell}(N)$ of $\ell$-skeletal imsets allows one to characterize dually structural imsets as o-standardized imsets with non-negative scalar products with $\ell$-skeletal imsets (Theorem ); in fact, $\mathcal{K}^{\diamond}_{\ell}(N)$ is the least collection of normalized $\ell$-standardized imsets of this sort.

Every supermodular function defines a certain formal CI structure (see Section ) and an important fact is that the class of CI structures induced by structural imsets coincides with the class of CI structures which can be described by supermodular functions. The relation of these two different but equivalent methods of description of CI structures is characterized in Section as a relation of duality. This is done with the help of the algebraic concept of Galois connection, interpreted in the light of the theory of formal concept analysis. The lattice of CI structures induced by structural imsets (or equivalently those which can be described by supermodular functions) is shown to be a finite concept lattice which is both atomistic and coatomistic (Theorem ). Its atoms correspond to known elementary structural imsets while its coatoms correspond to skeletal imsets.

Chapter deals mainly with implementation of facial implication, which corresponds to the inclusion of induced CI structures. Two characterizations of facial implication $u \rightharpoonup v$ between structural imsets $u, v$ are given. The first one (Lemma ) characterizes it as a direct arithmetical relation of $u$ and $v$, namely that there exists $l \in \mathbb{N}$ such that $l \cdot u - v$ is a structural imset (resp. a combinatorial imset). The second one (Lemma ) characterizes it with the help of skeletal imsets (respectively supermodular functions), namely by the requirement that $u$ has a non-zero scalar product with an $\ell$-skeletal imset (a supermodular function) whenever $v$ does so. The skeletal characterization then leads to a characterization of facially equivalent structural imsets (Consequence ).

To transform the task of computer implementation of facial implication into a standard task of integer programming, two observations are needed. The first observation (Consequence ) is that in testing $u \rightharpoonup v$ for a structural imset $u$ and an elementary imset $v$ the number $l \in \mathbb{N}$ such that $l \cdot u - v$ is a structural imset is limited by a constant. The constant depends only on the cardinality of the set of variables $N$ and it has the value in case $|N|$ and in case $|N|$. A good candidate for the least constant of this type is the grade (see Section ); the least constant of this kind for a combinatorial imset $u$ and an elementary imset $v$ is found in Lemma . The second observation is that in testing whether a given imset $u$ (e.g. $l \cdot u - v$ above) is a structural imset, the coefficient $n \in \mathbb{N}$ such that $n \cdot u$ is a combinatorial imset (i.e. a sum of known elementary imsets) is also limited by a constant depending on the cardinality of $N$ (Lemma ). The constant is in case $|N|$; there is a hope that it is in general (see Question ).

Further results of Chapter allow one to adapt the described method of description of CI structures to a particular distribution framework, that is a class of probability measures over $N$ satisfying certain basic conditions (see Section ).

Chapter concerns the problem of choice of a representative of a class of facially equivalent structural imsets. A good solution from the point of view of computer implementation seems to be the baricentral imset (Section ). Indeed, facial implication between baricentral imsets is very simple: one has $u \rightharpoonup v$ for these imsets iff $u - v$ is a combinatorial imset (Observation ). From the point of view of interpretation, standard imsets for acyclic directed graphs (Section ) and for triangulated undirected graphs (Section ) seem to be suitable. First, they provide a simple translation of classic graphical models into the framework of structural imsets. Second, they offer an alternative non-graphical method of testing Markov equivalence for acyclic directed graphs (Consequence ). Finally, they are exclusive for two theoretical reasons: standard imsets for acyclic directed graphs are combinatorial imsets of the least degree (Lemma ) and imsets with the least lower class (Consequence ).

Chapter is an overview of open problems. The most important problems from the point of view of computer implementation of the method seem to be the question whether structural imsets indeed coincide with combinatorial imsets (Question ) and the task to characterize skeletal imsets (Theme ). Note that skeletal imsets are known in case $|N|$ but not in general.

The overall goal of the work was to present the method of description of probabilistic CI structures by means of structural imsets. This involves motivation, the description of the present state and an outlook represented by the list of open problems. This non-graphical method removes the inevitable limitation of graphical approaches and promises a chance of computer implementation by transforming the problem into standard tasks of integer programming. The work gives a theoretical solution; practical implementation requires further research. However, if the task of computer implementation is solved successfully, the method can find wide application both in the area of artificial intelligence and in the area of multivariate statistics.

Chapter

App endix

University graduates in mathematics should be familiar with the majority of the concepts and facts gathered in this chapter. However, certain misunderstandings can occur concerning their exact meaning and, moreover, graduates in other fields (e.g. computer science, statistics) may not be familiar with all basic facts. Thus, to avoid misunderstanding and to facilitate reading, I decided to recall them here, just to provide the reader with a reference source for well-known facts which can be easily utilized with the help of the Index.

Classes of sets

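By a singleton is understood a set containing one element only; the symbol $\emptyset$ is reserved for the empty set. The symbol $S \subseteq T$ (also $T \supseteq S$) denotes that $S$ is a subset of $T$ (alternatively, $T$ is a superset of $S$), which involves the situation $S = T$. However, strict inclusion is denoted as follows: $S \subset T$ or $T \supset S$ means that $S \subseteq T$ but $S \neq T$. The power set of a non-empty set $X$ is the class of all its subsets $\{ T ;\ T \subseteq X \}$, denoted by $\mathcal{P}(X)$. The symbol $\bigcup \mathcal{D}$ denotes the union of a class $\mathcal{D} \subseteq \mathcal{P}(X)$, the symbol $\bigcap \mathcal{D}$ the intersection of a class $\mathcal{D} \subseteq \mathcal{P}(X)$. Supposing $N$ is a non-empty finite set of variables, a class $\mathcal{D} \subseteq \mathcal{P}(N)$ is called ascending if it is closed under supersets, i.e.

$$\forall\, S, T \subseteq N \qquad S \in \mathcal{D},\ S \subseteq T \ \Rightarrow\ T \in \mathcal{D}.$$

Given $\mathcal{D} \subseteq \mathcal{P}(N)$, the induced ascending class, denoted by $\mathcal{D}^{\uparrow}$, is the least ascending class containing $\mathcal{D}$, i.e.

$$\mathcal{D}^{\uparrow} = \{\, T \subseteq N ;\ \exists\, S \in \mathcal{D} \ \ S \subseteq T \,\}.$$

Analogously, a class $\mathcal{D} \subseteq \mathcal{P}(N)$ is called descending if it is closed under subsets, i.e.

$$\forall\, S, T \subseteq N \qquad S \in \mathcal{D},\ T \subseteq S \ \Rightarrow\ T \in \mathcal{D},$$

and given $\mathcal{D} \subseteq \mathcal{P}(N)$, the induced descending class $\mathcal{D}^{\downarrow}$ consists of subsets of sets in $\mathcal{D}$, i.e.

$$\mathcal{D}^{\downarrow} = \{\, T \subseteq N ;\ \exists\, S \in \mathcal{D} \ \ T \subseteq S \,\}.$$

A set $S \in \mathcal{D}$, where $\mathcal{D} \subseteq \mathcal{P}(N)$, is called a maximal set of $\mathcal{D}$ if $\forall\, T \in \mathcal{D}$: $S \subseteq T \Rightarrow S = T$; the class of maximal sets of $\mathcal{D}$ is denoted by $\mathcal{D}^{\max}$. Clearly, $\mathcal{D}^{\downarrow} = (\mathcal{D}^{\max})^{\downarrow}$ and $(\mathcal{D}^{\downarrow})^{\max} = \mathcal{D}^{\max}$. Dually, a set $S \in \mathcal{D}$ is called a minimal set of $\mathcal{D}$ if $\forall\, T \in \mathcal{D}$: $T \subseteq S \Rightarrow S = T$, and $\mathcal{D}^{\min}$ denotes the class of minimal sets of $\mathcal{D}$.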
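The operations on classes of sets recalled above translate directly into code. The following sketch is only an illustration: classes are represented as Python sets of `frozenset` objects and all function names are hypothetical.

```python
from itertools import combinations


def power_set(X):
    """All subsets of X as frozensets."""
    X = sorted(X)
    return [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]


def ascending(D, N):
    """Induced ascending class: supersets (within N) of sets in D."""
    return {T for T in power_set(N) if any(S <= T for S in D)}


def descending(D, N):
    """Induced descending class: subsets of sets in D."""
    return {T for T in power_set(N) if any(T <= S for S in D)}


def maximal_sets(D):
    """Sets of D which are not strictly contained in another set of D."""
    return {S for S in D if not any(S < T for T in D)}


def minimal_sets(D):
    """Sets of D which do not strictly contain another set of D."""
    return {S for S in D if not any(T < S for T in D)}
```

For `N = {"a", "b", "c"}` and the class consisting of `{"a"}`, `{"a", "b"}` and `{"c"}`, the maximal sets are `{"a", "b"}` and `{"c"}`, and taking the induced descending class of the maximal sets gives the same result as taking it for the whole class.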

By a permutation of a finite non-empty set $N$ will be understood a one-to-one mapping $\pi : N \to N$. It can also be viewed as a mapping on the power set $\mathcal{P}(N)$ which assigns $\pi(S) = \{ \pi(x) ;\ x \in S \}$ to every $S \subseteq N$. Then, given a real function $m : \mathcal{P}(N) \to \mathbb{R}$, the juxtaposition $m\pi$ will denote the composition of $m$ and $\pi$ defined by $S \mapsto m(\pi(S))$ for $S \subseteq N$.

Posets and lattices

A partially ordered set $(L, \preceq)$, shortly a poset, is a non-empty set $L$ endowed with a partial ordering $\preceq$, that is a binary relation on $L$ which is

(i) reflexive: $\forall\, x \in L$ \ $x \preceq x$,

(ii) antisymmetric: $\forall\, x, y \in L$ \ $x \preceq y,\ y \preceq x \Rightarrow x = y$,

(iii) transitive: $\forall\, x, y, z \in L$ \ $x \preceq y,\ y \preceq z \Rightarrow x \preceq z$.

The phrase total ordering is used if, moreover, $\forall\, x, y \in L$ either $x \preceq y$ or $y \preceq x$. Given $x, y \in L$, one writes $x \prec y$ for $x \preceq y$ and $x \neq y$. If $x \prec y$ and there is no $z \in L$ such that $x \prec z$ and $z \prec y$ then $x$ is called a lower neighbour of $y$ and $y$ is an upper neighbour of $x$. Given $M \subseteq L$, an element $x \in M$ is a minimal element of $M$ with respect to $\preceq$ if there is no $z \in M$ with $z \prec x$; $y \in M$ is a maximal element of $M$ with respect to $\preceq$ if there is no $z \in M$ with $y \prec z$.

Given $M \subseteq L$, the supremum of $M$ in $L$, denoted by $\sup M$ and alternatively called the least upper bound of $M$, is an element $y \in L$ such that $z \preceq y$ for every $z \in M$ but $y \preceq y'$ for each $y' \in L$ with $z \preceq y'$ for every $z \in M$. Owing to antisymmetry of $\preceq$, the supremum of $M$ is determined uniquely if it exists. Given $x, y \in L$, their join, denoted by $x \vee y$, is the supremum of the set $\{x\} \cup \{y\}$. A poset in which every pair of elements has a join is called a join semi-lattice. Analogously, the infimum of $M \subseteq L$, denoted by $\inf M$ and also called the greatest lower bound of $M$, is an element $x \in L$ such that $x \preceq z$ for every $z \in M$ but $x' \preceq x$ for each $x' \in L$ with $x' \preceq z$ for every $z \in M$. The meet of elements $x, y \in L$, denoted by $x \wedge y$, is the infimum of the set $\{x\} \cup \{y\}$. A lattice is a poset $(L, \preceq)$ such that for every $x, y \in L$ there exist both the supremum $x \vee y$ and the infimum $x \wedge y$ in $L$. A lattice is distributive if for every $x, y, z \in L$

$$x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z) \quad \text{and} \quad x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z).$$

A typical example of a distributive lattice is a ring of subsets of a finite non-empty set $N$, that is a collection $\mathcal{R} \subseteq \mathcal{P}(N)$ which is closed under finite intersection and union. In particular, $\mathcal{P}(N)$ ordered by inclusion is a distributive lattice.

A complete lattice is a poset $(L, \preceq)$ such that every subset $M \subseteq L$ has the supremum and the infimum in $L$. Note that it suffices to show that every $M \subseteq L$ has the infimum. Any finite lattice is an example of a complete lattice. By the null element of a complete lattice $L$ is understood the least element of $L$, that is $x \in L$ such that $x \preceq z$ for every $z \in L$; it is nothing but the supremum of the empty set in $L$. By the unit element is understood the greatest element of $L$, that is $y \in L$ such that $z \preceq y$ for every $z \in L$. An element $x$ of a complete lattice is join-irreducible if $x \neq \sup \{ z \in L ;\ z \prec x \}$ and meet-irreducible if $x \neq \inf \{ z \in L ;\ x \prec z \}$. An element of a finite lattice is join-irreducible iff it has exactly one lower neighbour and it is meet-irreducible iff it has exactly one upper neighbour. The set of join-irreducible elements in a finite lattice $L$ is the least set $\mathcal{M} \subseteq L$ which is supremum-dense, which means that for every $x \in L$ there exists $M \subseteq \mathcal{M}$ such that $x = \sup M$. Analogously, the set of meet-irreducible elements in $L$ is the least set $\mathcal{M} \subseteq L$ which is infimum-dense, i.e. for every $y \in L$ there exists $M \subseteq \mathcal{M}$ with $y = \inf M$. A standard example of a join-irreducible element in a complete lattice is an atom of $L$, which is an upper neighbour of the null element of $L$. By a coatom of $L$ is understood a lower neighbour of the unit element of $L$. A complete lattice $L$ is atomistic if the set of its atoms is supremum-dense in $L$, equivalently if the only join-irreducible elements are atoms. It is coatomistic if the set of its coatoms is infimum-dense in $L$, i.e. the only meet-irreducible elements are coatoms.
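For a finite poset the definitions of supremum, infimum and join-irreducibility can be evaluated by brute force. The sketch below is only an illustration with hypothetical function names; the ordering is passed as a function `leq`, and the divisors of 12 ordered by divisibility serve as an assumed example of a finite lattice.

```python
def supremum(M, L, leq):
    """Least upper bound of M in the finite poset (L, leq), or None."""
    uppers = [y for y in L if all(leq(z, y) for z in M)]
    least = [y for y in uppers if all(leq(y, w) for w in uppers)]
    return least[0] if least else None


def infimum(M, L, leq):
    """Greatest lower bound of M in the finite poset (L, leq), or None."""
    lowers = [x for x in L if all(leq(x, z) for z in M)]
    greatest = [x for x in lowers if all(leq(w, x) for w in lowers)]
    return greatest[0] if greatest else None


def join_irreducibles(L, leq):
    """Elements x which differ from the supremum of the elements strictly below x."""
    result = []
    for x in L:
        below = [z for z in L if leq(z, x) and z != x]
        if supremum(below, L, leq) != x:
            result.append(x)
    return result


divides = lambda x, y: y % x == 0
DIVISORS_OF_12 = [1, 2, 3, 4, 6, 12]
```

In this example the null element is `supremum([], DIVISORS_OF_12, divides)`, the join of 4 and 6 is 12 and their meet is 2. The join-irreducible elements are 2, 3 and 4; since 4 is not an atom, this lattice is not atomistic.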

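Two posets $(L_{1}, \preceq_{1})$ and $(L_{2}, \preceq_{2})$ are order-isomorphic if there exists a mapping $\phi : L_{1} \to L_{2}$ onto $L_{2}$ such that

$$x \preceq_{1} y \iff \phi(x) \preceq_{2} \phi(y) \qquad \text{for every } x, y \in L_{1}.$$

The mapping $\phi$ is then a one-to-one mapping between $L_{1}$ and $L_{2}$ and it is called an order-isomorphism. If the poset $(L_{1}, \preceq_{1})$ is a complete lattice then $(L_{2}, \preceq_{2})$ is also a complete lattice and $\phi$ is even a complete lattice homomorphism, which means that

$$\phi(\sup M) = \sup \{ \phi(z) ;\ z \in M \}, \qquad \phi(\inf M) = \inf \{ \phi(z) ;\ z \in M \} \qquad \text{for every } M \subseteq L_{1}.$$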

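A general example of a complete lattice can be obtained by means of a closure operation on subsets of a set $X$, that is a mapping $\mathrm{cl} : \mathcal{P}(X) \to \mathcal{P}(X)$ which is

(i) isotone: $\forall\, S, T \subseteq X$ \ $S \subseteq T \Rightarrow \mathrm{cl}(S) \subseteq \mathrm{cl}(T)$,

(ii) extensive: $\forall\, S \subseteq X$ \ $S \subseteq \mathrm{cl}(S)$,

(iii) idempotent: $\forall\, S \subseteq X$ \ $\mathrm{cl}(\mathrm{cl}(S)) = \mathrm{cl}(S)$.

A set $S \subseteq X$ is called closed with respect to $\mathrm{cl}$ if $S = \mathrm{cl}(S)$. Given a closure operation $\mathrm{cl}$ on subsets of $X$, the collection $\mathcal{K} \subseteq \mathcal{P}(X)$ of closed sets with respect to $\mathrm{cl}$ is closed under set intersection:

$$\mathcal{D} \subseteq \mathcal{K} \ \Rightarrow\ \bigcap \mathcal{D} \in \mathcal{K} \qquad (\text{by convention } \bigcap \mathcal{D} = X \text{ for } \mathcal{D} = \emptyset).$$

Every collection $\mathcal{K} \subseteq \mathcal{P}(X)$ satisfying this requirement is called a closure system of subsets of $X$. The correspondence between $\mathrm{cl}$ and $\mathcal{K}$ is one-to-one since the formula

$$\mathrm{cl}_{\mathcal{K}}(S) = \bigcap \{\, T \subseteq X ;\ S \subseteq T \in \mathcal{K} \,\} \qquad \text{for } S \subseteq X$$

defines a closure operation on subsets of $X$ having $\mathcal{K}$ as the collection of closed sets with respect to $\mathrm{cl}_{\mathcal{K}}$ (see Theorem in ). The poset $(\mathcal{K}, \subseteq)$ is then a complete lattice in which

$$\sup \mathcal{D} = \mathrm{cl}_{\mathcal{K}} \Big( \bigcup \mathcal{D} \Big), \qquad \inf \mathcal{D} = \bigcap \mathcal{D} \qquad \text{for every } \mathcal{D} \subseteq \mathcal{K}.$$

Every complete lattice is order-isomorphic to a lattice of this type (see Proposition in Chapter of ).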
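The passage from a closure system to the corresponding closure operation can be sketched for a finite base set as follows. This is only an illustration with hypothetical names: closed sets are stored as `frozenset` objects, and for a finite collection closedness under arbitrary intersection is checked by requiring that the base set belongs to the collection (the intersection of the empty subcollection) and that pairwise intersections stay inside it.

```python
def is_closure_system(K, X):
    """Check that the finite collection K of subsets of X is a closure system."""
    K = set(K)
    return frozenset(X) in K and all((S & T) in K for S in K for T in K)


def closure_from_system(K, X):
    """Return the closure operation S -> intersection of closed supersets of S."""
    X = frozenset(X)

    def cl(S):
        S = frozenset(S)
        result = X  # the intersection of the empty collection is X by convention
        for T in K:
            if S <= T:
                result &= T
        return result

    return cl
```

For `X = {1, 2, 3}` and the collection consisting of the empty set, `{1}`, `{1, 2}` and `X`, the resulting operation maps `{2}` to `{1, 2}` and `{3}` to `X`, and the sets fixed by the operation are exactly the members of the collection.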

Graphs

A classic graph is specified by a non-empty finite set of nodes $N$ and by a set of edges consisting of pairs of elements taken from $N$. Several types of edges are mentioned in this work, but classic graphs admit only two basic types of edges. An undirected edge or a line over $N$ is an unordered pair $\{a, b\}$ where $a, b \in N$, $a \neq b$, that is a two-element subset of $N$. A directed edge or an arrow over $N$ is an ordered pair $(a, b)$ where $a, b \in N$, $a \neq b$. The pictorial representation is clear: nodes are represented by small circles and edges by corresponding links between them. Note that the explicit requirement $a \neq b$ excludes any loop, that is an edge connecting a node with itself (loops are possible in some non-classic graphs).

A classic graph with mixed edges over a set of nodes $N$ is given by a set of lines $\mathcal{L}$ over $N$ and by a set of arrows $\mathcal{A}$ over $N$. Supposing $G = (N, \mathcal{L}, \mathcal{A})$ is a graph of this kind, one writes $a - b$ in $G$ in case $\{a, b\} \in \mathcal{L}$ and says that there exists a line between $a$ and $b$ in $G$. Similarly, in case $(a, b) \in \mathcal{A}$ one says that there exists an arrow from $a$ to $b$ in $G$ and writes $a \to b$ in $G$ or $b \leftarrow a$ in $G$. The pictorial representation naturally reflects the notation in both cases. If either $a - b$ in $G$, $a \to b$ in $G$ or $a \leftarrow b$ in $G$ then one simply says that $[a, b]$ is an edge in $G$. Note explicitly that this definition allows, for a pair of distinct nodes $a, b \in N$, that each of $a - b$, $a \to b$ and $a \leftarrow b$ are simultaneously edges in $G$. If $\emptyset \neq T \subseteq N$ then the induced subgraph of $G$ for $T$ is the graph $G_{T} = (T, \mathcal{L}_{T}, \mathcal{A}_{T})$ over $T$ where $\mathcal{L}_{T}$ ($\mathcal{A}_{T}$) is the set of those lines (arrows) over $T$ which are also in $\mathcal{L}$ (in $\mathcal{A}$). A hybrid graph over $N$ is a graph with mixed edges $G$ without multiple edges. That means, for an ordered pair of distinct nodes $(a, b)$, $a, b \in N$, at most one of the three above-mentioned options can occur.

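A route from a node $a$ to a node $b$ (or between nodes $a$ and $b$) in a graph $G$ with mixed edges is a sequence of nodes $c_{1}, \ldots, c_{n} \in N$, $n \ge 1$, together with a sequence of edges $\epsilon_{1}, \ldots, \epsilon_{n-1} \in \mathcal{L} \cup \mathcal{A}$ (possibly empty in case $n = 1$) such that $a = c_{1}$, $b = c_{n}$ and $\epsilon_{i}$ is either $c_{i} - c_{i+1}$, $c_{i} \to c_{i+1}$ or $c_{i} \leftarrow c_{i+1}$ for $i = 1, \ldots, n-1$. A route is called undirected if $\epsilon_{i}$ is $c_{i} - c_{i+1}$ for $i = 1, \ldots, n-1$, descending if $\epsilon_{i}$ is either $c_{i} - c_{i+1}$ or $c_{i} \to c_{i+1}$ for $i = 1, \ldots, n-1$, and strictly descending if $n \ge 2$ and $\epsilon_{i}$ is $c_{i} \to c_{i+1}$ for $i = 1, \ldots, n-1$. In particular, every undirected route is a descending route. A path is a route in which all nodes $c_{1}, \ldots, c_{n}$ are distinct; a cycle is a route where $n \ge 3$, $c_{1} = c_{n}$ and $c_{1}, \ldots, c_{n-1}$ are distinct, and $\epsilon_{2}$ is not a reverse copy of $\epsilon_{1}$ in case $n = 3$. A directed cycle is a cycle which is a descending route and where $\epsilon_{i}$ is $c_{i} \to c_{i+1}$ at least once. The adjectives undirected and strictly directed are used for paths as well.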

A node $a$ is a parent of a node $b$ in $G$ (or $b$ is a child of $a$) if $a \to b$ in $G$; $a$ is an ancestor of $b$ in $G$ (dually, $b$ is a descendant of $a$) if there exists a descending route (equivalently, a descending path) from $a$ to $b$ in $G$. The set of parents of a node $b$ in $G$ will be denoted by $\mathrm{pa}_{G}(b)$. Supposing $A \subseteq N$, the symbol $\mathrm{an}_{G}(A)$ will denote the set of ancestors of the nodes of $A$ in $G$. Analogously, $a$ is a strict ancestor of $b$ ($b$ is a strict descendant of $a$) if there exists a strictly descending route from $a$ to $b$. Similarly, $a$ is connected to $b$ in $G$ if there exists an undirected route (equivalently, an undirected path) between $a$ and $b$. Clearly, the relation "be connected" is an equivalence relation which decomposes $N$ into equivalence classes, named connectivity components.
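The sets of ancestors and the connectivity components can be computed by a simple search. The sketch below is only an illustration with a hypothetical representation: lines are two-element `frozenset` objects and arrows are ordered pairs. Since a route may consist of a single node, every node counts as its own ancestor in this sketch.

```python
def ancestors(lines, arrows, A):
    """Nodes from which a descending route leads to a node of A (A included)."""
    result = set(A)
    stack = list(A)
    while stack:
        b = stack.pop()
        # predecessors of b on a descending route: a -> b or a - b
        preds = {a for (a, c) in arrows if c == b}
        preds |= {a for e in lines if b in e for a in e if a != b}
        for a in preds - result:
            result.add(a)
            stack.append(a)
    return result


def connectivity_components(nodes, lines):
    """Equivalence classes of the relation 'connected by an undirected route'."""
    remaining = set(nodes)
    components = set()
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            a = stack.pop()
            for e in lines:
                if a in e:
                    for b in e:
                        if b not in comp:
                            comp.add(b)
                            stack.append(b)
        remaining -= comp
        components.add(frozenset(comp))
    return components
```

For the graph with the line between `a` and `b` and the arrows from `b` to `c` and from `d` to `c`, the ancestors of `c` are all four nodes, while the connectivity components are `{a, b}`, `{c}` and `{d}`.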

An undirected graph is a graph containing only lines (that is, $\mathcal{A} = \emptyset$); a directed graph is a graph containing only arrows (that is, $\mathcal{L} = \emptyset$). The underlying graph $H$ of a graph with mixed edges $G = (N, \mathcal{L}, \mathcal{A})$ is an undirected graph $H$ over $N$ such that $a - b$ in $H$ iff $[a, b]$ is an edge in $G$. A set $A \subseteq N$ in an undirected graph $H$ over $N$ is complete if $a - b$ for every $a, b \in A$, $a \neq b$; a clique of $H$ is a maximal complete subset of $N$.

An acyclic directed graph over $N$ is a directed graph over $N$ without directed cycles. It can be equivalently introduced as a directed graph $G$ whose nodes can be ordered in a sequence $a_{1}, \ldots, a_{k}$, $k \ge 1$, such that if $[a_{i}, a_{j}]$ is an edge in $G$ for $i < j$ then $a_{i} \to a_{j}$ in $G$. A chain for a hybrid graph $G$ over $N$ is a partition of $N$ into ordered disjoint non-empty subsets $B_{1}, \ldots, B_{n}$, $n \ge 1$, called blocks, such that if $[a, b]$ is an edge in $G$ with $a, b \in B_{i}$ then $a - b$, and if $[a, b]$ is an edge in $G$ with $a \in B_{i}$, $b \in B_{j}$, $i < j$, then $a \to b$. A chain graph is a hybrid graph which admits a chain. It can be equivalently introduced as a hybrid graph without directed cycles (see Lemma ). Evidently, every undirected or acyclic directed graph is a chain graph.
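The equivalent description of acyclic directed graphs by an ordering of nodes suggests a simple acyclicity test: try to construct such an ordering. The sketch below uses the standard method of repeatedly removing nodes without incoming arrows (Kahn's algorithm), which is an illustration chosen here and not a procedure taken from this work; the names are hypothetical and arrows are assumed to be a set of ordered pairs of distinct nodes.

```python
def topological_order(nodes, arrows):
    """Return an ordering in which every arrow points forward, or None."""
    indegree = {v: 0 for v in nodes}
    for _, b in arrows:
        indegree[b] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    order = []
    while ready:
        a = ready.pop()
        order.append(a)
        for x, b in arrows:
            if x == a:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    # some node is never released exactly when a directed cycle exists
    return order if len(order) == len(nodes) else None


def is_acyclic(nodes, arrows):
    """A directed graph is acyclic iff a suitable ordering of nodes exists."""
    return topological_order(nodes, arrows) is not None
```

For the arrows from `a` to `b`, from `b` to `c` and from `a` to `c` an ordering exists, while replacing the last arrow by one from `c` to `a` creates a directed cycle and the function returns `None`.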

Note that various other types of edges are used in advanced graphical approaches (see Section ), e.g. bidirected edges, dashed lines, dashed arrows or even loops. From a purely mathematical point of view these edges can also be introduced as either ordered or unordered pairs of nodes, but their meaning is different. Thus, because of their different interpretation, they have to be carefully distinguished from the above-mentioned classic edges. However, all the concepts introduced in Section can be naturally extended to graphs allowing edges of additional types.

Topological concepts

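A metric space $(X, \rho)$ is a non-empty set $X$ endowed with a distance $\rho$, which is a non-negative real function $\rho : X \times X \to [0, \infty)$ such that $\forall\, x, y, z \in X$ one has

(i) $\rho(x, y) = 0$ iff $x = y$,

(ii) $\rho(x, y) = \rho(y, x)$,

(iii) $\rho(x, z) \le \rho(x, y) + \rho(y, z)$.

A set $G \subseteq X$ is called open in $(X, \rho)$ if for every $x \in G$ there exists $\varepsilon > 0$ such that the open ball $U(x, \varepsilon) = \{ y \in X ;\ \rho(x, y) < \varepsilon \}$ with center $x$ and radius $\varepsilon$ is contained in $G$. A set $F \subseteq X$ is closed if its complement $X \setminus F$ is open. A metric space is separable if it has a countable dense set, that is such a set $S \subseteq X$ that $\forall\, x \in X$ $\forall\, \varepsilon > 0$ there exists $y \in S \cap U(x, \varepsilon)$. A metric space is complete if every Cauchy sequence $x_{1}, x_{2}, \ldots$ of elements of $X$, i.e. a sequence satisfying $\forall\, \varepsilon > 0$ $\exists\, n \in \mathbb{N}$ such that $\forall\, k, l \ge n$ \ $\rho(x_{k}, x_{l}) < \varepsilon$, converges to an element $x \in X$, i.e. $\forall\, \varepsilon > 0$ $\exists\, n \in \mathbb{N}$ such that $\forall\, k \ge n$ \ $\rho(x_{k}, x) < \varepsilon$.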

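A classic example of a separable complete metric space is an arbitrary non-empty finite set $X$ endowed with the discrete distance $\rho$ defined as follows:

$$\rho(x, y) = \begin{cases} 0 & \text{if } x = y, \\ 1 & \text{otherwise.} \end{cases}$$

Another common example is the set of $n$-dimensional real vectors $\mathbb{R}^{n}$, $n \ge 1$, endowed with the Euclidean distance

$$\rho(x, y) = \sqrt{ \sum_{i=1}^{n} (x_{i} - y_{i})^{2} } \qquad \text{for } x = [x_{1}, \ldots, x_{n}],\ y = [y_{1}, \ldots, y_{n}].$$

The set of real numbers $\mathbb{R}$ with $\rho(x, y) = |x - y|$ is a special case.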
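The two example distances, together with a brute-force check of the three axioms of a distance on a finite sample of points, can be sketched as follows. This is only an illustration with hypothetical names; a check on finitely many points can refute the axioms but of course cannot prove them for the whole space, and a small tolerance is used because of floating-point rounding.

```python
from itertools import product
from math import isclose, sqrt


def discrete(x, y):
    """Discrete distance: 0 for equal points, 1 otherwise."""
    return 0.0 if x == y else 1.0


def euclidean(x, y):
    """Euclidean distance of two real vectors of the same length."""
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def satisfies_axioms_on(points, rho, tol=1e-12):
    """Check conditions (i)-(iii) of a distance for all triples of sample points."""
    for x, y, z in product(points, repeat=3):
        if (rho(x, y) == 0) != (x == y):
            return False
        if not isclose(rho(x, y), rho(y, x), abs_tol=tol):
            return False
        if rho(x, z) > rho(x, y) + rho(y, z) + tol:
            return False
    return True
```

For instance, both distances pass the check on the sample points `(0, 0)`, `(3, 4)`, `(1, 1)` and `(-2, 5)`, whereas the squared difference on the real numbers `0, 1, 2` fails the triangle inequality.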

A topological space $(X, \tau)$ is a non-empty set $X$ endowed with a topology $\tau$, which is a class of subsets of $X$ closed under finite intersection and arbitrary union, and involving both the empty set $\emptyset$ and $X$ itself. Every metric space $(X, \rho)$ is an example of a topological space because the class of open sets in $(X, \rho)$ is a topology. A topological space of this kind is called metrizable and its topology is induced by the distance $\rho$. For instance, the set of real numbers $\mathbb{R}$ is often automatically understood as a topological space endowed with the Euclidean topology induced by the Euclidean distance. The product of topological spaces $(X_{1}, \tau_{1})$ and $(X_{2}, \tau_{2})$ is the Cartesian product $X_{1} \times X_{2}$ endowed with the product topology, that is the class of sets $G \subseteq X_{1} \times X_{2}$ such that $\forall\, (x_{1}, x_{2}) \in G$ there exist $G_{1} \in \tau_{1}$, $G_{2} \in \tau_{2}$ with $(x_{1}, x_{2}) \in G_{1} \times G_{2} \subseteq G$. The product $\prod_{i \in N} (X_{i}, \tau_{i})$ of any finite collection $(X_{i}, \tau_{i})$, $i \in N$, $|N| \ge 2$, of topological spaces is defined analogously. For example, $\mathbb{R}^{n}$ ($n \ge 2$) endowed with the topology induced by the Euclidean distance can be viewed as the product of topological spaces $X_{i} = \mathbb{R}$, $i \in \{1, \ldots, n\}$.

A real function $f : X \to \mathbb{R}$ on a topological space $(X, \tau)$ is continuous if both $\{ x \in X ;\ f(x) < r \}$ and $\{ x \in X ;\ f(x) > r \}$ belong to $\tau$ for every $r \in \mathbb{R}$.

Measuretheoretical concepts

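A measurable space $(X, \mathcal{X})$ is a non-empty set $X$ endowed with a $\sigma$-algebra $\mathcal{X}$ over $X$, which is a class of subsets of $X$ involving $X$ itself and closed under countable union and complement. Given a class $\mathcal{A}$ of subsets of $X$, the least $\sigma$-algebra over $X$ containing $\mathcal{A}$, i.e. the intersection of all $\sigma$-algebras containing $\mathcal{A}$, is called the $\sigma$-algebra generated by $\mathcal{A}$ and denoted by $\sigma(\mathcal{A})$. In particular, if $(X, \tau)$ is a topological space then the $\sigma$-algebra generated by its topology $\tau$ is the Borel $\sigma$-algebra or the $\sigma$-algebra of Borel sets. The trivial $\sigma$-algebra over $X$ is the class $\{ \emptyset, X \}$, that is the $\sigma$-algebra generated by an empty class. Given a measurable space $(X, \mathcal{X})$, the class of all $\sigma$-algebras $\mathcal{S} \subseteq \mathcal{X}$ ordered by inclusion is a lattice. Indeed, for $\sigma$-algebras $\mathcal{S}, \mathcal{T} \subseteq \mathcal{X}$ their supremum $\mathcal{S} \vee \mathcal{T}$ is the $\sigma$-algebra generated by $\mathcal{S} \cup \mathcal{T}$ while their infimum $\mathcal{S} \wedge \mathcal{T}$ is simply the intersection $\mathcal{S} \cap \mathcal{T}$. The product of measurable spaces $(X_{1}, \mathcal{X}_{1})$ and $(X_{2}, \mathcal{X}_{2})$ is the Cartesian product $X_{1} \times X_{2}$ endowed with the product $\sigma$-algebra $\mathcal{X}_{1} \times \mathcal{X}_{2}$, which is generated by measurable rectangles, that is the sets of the form $A \times B$ where $A \in \mathcal{X}_{1}$ and $B \in \mathcal{X}_{2}$. The product $\prod_{i \in N} (X_{i}, \mathcal{X}_{i})$ of an arbitrary finite collection of measurable spaces $(X_{i}, \mathcal{X}_{i})$, $i \in N$, where $|N| \ge 2$, is defined analogously.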

i i

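A real function $f : X \to \mathbb{R}$ on a measurable space $(X,\mathcal{X})$ is measurable (sometimes one writes $\mathcal{X}$-measurable) if $\{ x \in X ;\ f(x) < r \}$ belongs to $\mathcal{X}$ for every $r \in \mathbb{R}$. Typical example is the indicator $\chi_A$ of a set $A \in \mathcal{X}$ defined as follows:

$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \in X \setminus A. \end{cases}$$

Given a real measurable function $f : X \to \mathbb{R}$, its positive part $f^+$ and negative part $f^-$ are nonnegative measurable functions defined by

$$f^+(x) = \max\{ f(x), 0 \}, \qquad f^-(x) = \max\{ -f(x), 0 \} \qquad \text{for } x \in X,$$

and one has $f = f^+ - f^-$ and $|f| = f^+ + f^-$.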
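The decomposition of a function into its positive and negative parts can be checked pointwise by a small Python sketch. The helper names `indicator`, `positive_part` and `negative_part` are hypothetical and serve only as an illustration on ordinary Python callables.

```python
def indicator(A):
    """Indicator of a set A: maps x to 1 if x lies in A and to 0 otherwise."""
    return lambda x: 1 if x in A else 0


def positive_part(f):
    """Positive part of f, that is, x -> max{f(x), 0}."""
    return lambda x: max(f(x), 0)


def negative_part(f):
    """Negative part of f, that is, x -> max{-f(x), 0}."""
    return lambda x: max(-f(x), 0)
```

With `f = lambda x: x * x - 2`, the values `positive_part(f)(x) - negative_part(f)(x)` and `positive_part(f)(x) + negative_part(f)(x)` agree with `f(x)` and `abs(f(x))` at every tested point.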

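Nonnegative measure on a measurable space $(X,\mathcal{X})$ is a function $\mu$ defined on $\mathcal{X}$, taking values in the interval $[0,\infty]$ (infinite values are allowed), which satisfies $\mu(\emptyset) = 0$ and is countably additive, that is, the equality

$$\mu\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} \mu(A_i)$$

holds for every countable collection of pairwise disjoint sets $A_1, A_2, \ldots$ in $\mathcal{X}$. It is a finite measure if $\mu(X) < \infty$ and a $\sigma$-finite measure if there exists a sequence $B_1, B_2, \ldots$ of sets in $\mathcal{X}$ such that $X = \bigcup_{i=1}^{\infty} B_i$ and $\mu(B_i) < \infty$ for every $i \in \mathbb{N}$. A trivial example of a finite measure is the counting measure $\upsilon$ on $(X,\mathcal{P}(X))$, where $X$ is a nonempty finite set, defined by $\upsilon(A) = |A|$ for every $A \subseteq X$. Classic example of a $\sigma$-finite measure is Lebesgue measure $\lambda$ on $\mathbb{R}^n$, $n \geq 1$, endowed with the Borel $\sigma$-algebra $\mathcal{B}^n$. This measure can be introduced as the only nonnegative measure on $(\mathbb{R}^n,\mathcal{B}^n)$ ascribing to every $n$-dimensional interval its volume, that is,

$$\lambda\Big( \prod_{i=1}^{n} [r_i,s_i) \Big) = \prod_{i=1}^{n} (s_i - r_i) \qquad \text{whenever } r_i, s_i \in \mathbb{R},\ r_i < s_i,\ i = 1,\ldots,n.$$

Probability measure is a measure $\mu$ satisfying $\mu(X) = 1$. It is concentrated on a set $B \in \mathcal{X}$ if $\mu(B) = 1$ or, equivalently, $\mu(X \setminus B) = 0$. Two real measurable functions $f$ and $g$ on $(X,\mathcal{X})$ equal $\mu$-almost everywhere if $\mu(\{ x \in X ;\ f(x) \neq g(x) \}) = 0$. Then one writes $f = g$ $\mu$-a.e. Clearly, it is an equivalence relation.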

The concept of integral is understood in the sense of Lebesgue. Given a nonnegative measure $\mu$ on $(X,\mathcal{X})$, this construction (described for example in Chapter) assigns a value $\int_A f(x)\, d\mu(x)$ from $[0,\infty]$, called the integral of $f$ through $A$ with respect to $\mu$, to every nonnegative measurable function $f$ and arbitrary $A \in \mathcal{X}$ ($f$ can be defined on $A$ only). A real measurable function $f$ on $(X,\mathcal{X})$ is called $\mu$-integrable if the integral of its absolute value $\int_X |f(x)|\, d\mu(x)$ is finite. Finite integral $\int_A f(x)\, d\mu(x)$, i.e. a real number, is then defined for every $\mu$-integrable function $f$ and $A \in \mathcal{X}$. Note that supposing $f$ is $\mu$-integrable and $g$ is an $\mathcal{X}$-measurable function on $X$ one has $f = g$ $\mu$-a.e. iff $\int_A f(x)\, d\mu(x) = \int_A g(x)\, d\mu(x)$ for every $A \in \mathcal{X}$ (and $g$ is $\mu$-integrable in both cases). Note that in case $(X,\mathcal{X}) = (\prod_{i \in N} X_i, \prod_{i \in N} \mathcal{X}_i)$ it is equivalent to the apparently weaker requirement that the equality of integrals holds for every measurable rectangle $A$ only. This follows from the fact that every two finite measures on $(\prod_{i \in N} X_i, \prod_{i \in N} \mathcal{X}_i)$ which equal on measurable rectangles must coincide.

Sometimes one needs to introduce a (possibly infinite) integral even for a non-integrable real measurable function $f : X \to \mathbb{R}$ by the formula

$$\int_X f(x)\, d\mu(x) = \int_X f^+(x)\, d\mu(x) - \int_X f^-(x)\, d\mu(x)$$

provided that at least one of the integrals on the right-hand side is finite. Then one says that $f$ is $\mu$-quasi-integrable and the integral $\int_X f(x)\, d\mu(x)$ is defined as a value in the interval $[-\infty,+\infty]$. Let us refer for elementary properties of Lebesgue integral to Chapter.

Supposing $\mu$ and $\nu$ are measures on $(X,\mathcal{X})$, one says that $\nu$ is absolutely continuous with respect to $\mu$ and writes $\nu \ll \mu$ if $\mu(A) = 0$ implies $\nu(A) = 0$ for every $A \in \mathcal{X}$. Basic measure-theoretical result is the Radon-Nikodym theorem (see Sections and).

Theorem. Supposing $\nu$ is a finite measure and $\mu$ a $\sigma$-finite measure on $(X,\mathcal{X})$ such that $\nu \ll \mu$, there exists a nonnegative $\mu$-integrable function $f$, called Radon-Nikodym derivative of $\nu$ with respect to $\mu$, such that

$$\nu(A) = \int_A f(x)\, d\mu(x) \qquad \text{for every } A \in \mathcal{X}.$$

Moreover, one can show (using Theorem in) that for every $\mathcal{X}$-measurable function $g$ on $X$, $g$ is $\nu$-integrable iff $g \cdot f$ is $\mu$-integrable, and

$$\int_A g(x)\, d\nu(x) = \int_A g(x) \cdot f(x)\, d\mu(x) \qquad \text{for every } A \in \mathcal{X}.$$

According to the remark above, Radon-Nikodym derivative is determined uniquely only within equivalence $\mu$-a.e. One writes $f = \frac{d\nu}{d\mu}$ to denote that a nonnegative $\mathcal{X}$-measurable function $f$ is a version of Radon-Nikodym derivative of $\nu$ with respect to $\mu$.

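Product of $\sigma$-finite measures $\mu_1$ on $(X_1,\mathcal{X}_1)$ and $\mu_2$ on $(X_2,\mathcal{X}_2)$ is the unique measure $\mu_1 \times \mu_2$ on $(X_1 \times X_2, \mathcal{X}_1 \times \mathcal{X}_2)$ defined on measurable rectangles as follows:

$$(\mu_1 \times \mu_2)(A \times B) = \mu_1(A) \cdot \mu_2(B) \qquad \text{whenever } A \in \mathcal{X}_1,\ B \in \mathcal{X}_2.$$

Let us refer to Chapter Example and Sections for the proof of existence and uniqueness of the (necessarily $\sigma$-finite) product measure $\mu_1 \times \mu_2$. Product of finitely many $\sigma$-finite measures $\prod_{i \in N} \mu_i$, $|N| \geq 2$, can be introduced analogously. Another basic measure-theoretical result is Fubini theorem (see Section).

Theorem. Let $\mu_1$ be a $\sigma$-finite measure on $(X_1,\mathcal{X}_1)$ and $\mu_2$ a $\sigma$-finite measure on $(X_2,\mathcal{X}_2)$. Suppose that $f$ is a nonnegative $\mathcal{X}_1 \times \mathcal{X}_2$-measurable function on $X_1 \times X_2$. Then the function $x_1 \mapsto \int_{X_2} f(x_1,x_2)\, d\mu_2(x_2)$ is $\mathcal{X}_1$-measurable, the function $x_2 \mapsto \int_{X_1} f(x_1,x_2)\, d\mu_1(x_1)$ is $\mathcal{X}_2$-measurable, and one has

$$\int_{X_1 \times X_2} f(x_1,x_2)\, d(\mu_1 \times \mu_2)(x_1,x_2) = \int_{X_1} \int_{X_2} f(x_1,x_2)\, d\mu_2(x_2)\, d\mu_1(x_1) = \int_{X_2} \int_{X_1} f(x_1,x_2)\, d\mu_1(x_1)\, d\mu_2(x_2).$$

Whenever $f$ is a $\mu_1 \times \mu_2$-integrable real function on $X_1 \times X_2$, the same conclusion holds with the proviso that the respective functions on $X_i$ are defined $\mu_i$-almost everywhere ($i = 1,2$).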

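By a measurable mapping of a measurable space $(X,\mathcal{X})$ into a measurable space $(Y,\mathcal{Y})$ is understood a mapping $t : X \to Y$ such that for every $B \in \mathcal{Y}$ the set $t^{-1}(B) = \{ x \in X ;\ t(x) \in B \}$ belongs to $\mathcal{X}$. Note that a measurable function is a special case when $Y$ is $\mathbb{R}$ endowed with the Borel $\sigma$-algebra. Every probability measure $P$ on $(X,\mathcal{X})$ then induces through $t$ a probability measure $Q$ on $(Y,\mathcal{Y})$ defined by $Q(B) = P(t^{-1}(B))$ for every $B \in \mathcal{Y}$.

Two measurable spaces $(X,\mathcal{X})$ and $(Y,\mathcal{Y})$ are isomorphic if there exists a one-to-one mapping $\varsigma : \mathcal{X} \to \mathcal{Y}$ which is onto $\mathcal{Y}$ and preserves countable union and complement. Then $\varsigma(X) = Y$, and countable intersection and inclusion are also preserved. The inverse mapping preserves these operations as well, and every measure $\mu$ on $(X,\mathcal{X})$ corresponds to a measure $\nu$ on $(Y,\mathcal{Y})$ defined by

$$\nu(B) = \mu(\varsigma^{-1}(B)) \qquad \text{for } B \in \mathcal{Y},$$

and conversely. For example, given measurable spaces $(X_1,\mathcal{X}_1)$ and $(X_2,\mathcal{X}_2)$, the space $(X_1,\mathcal{X}_1)$ is isomorphic to $X_1 \times X_2$ endowed with the $\sigma$-algebra

$$\bar{\mathcal{X}}_1 = \{ A \times X_2 ;\ A \in \mathcal{X}_1 \} \subseteq \mathcal{X}_1 \times \mathcal{X}_2.$$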

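Supposing $P$ is a probability measure on a measurable space $(X,\mathcal{X})$ and $\mathcal{A} \subseteq \mathcal{X}$ is a $\sigma$-algebra over $X$, the restriction of $P$ to $\mathcal{A}$ will be denoted by $P^{\mathcal{A}}$. Given $B \in \mathcal{X}$, the conditional probability of $B$ given $\mathcal{A}$ with respect to $P$ is an $\mathcal{A}$-measurable function $h : X \to [0,1]$ such that

$$P(A \cap B) = \int_A h(x)\, dP(x) \qquad \text{for every } A \in \mathcal{A}.$$

One can use the Radon-Nikodym theorem with $(X,\mathcal{A})$, $\mu = P^{\mathcal{A}}$ and $\nu(A) = P(A \cap B)$ for $A \in \mathcal{A}$ to show that a function $h$ satisfying the equality above exists and is determined uniquely within equivalence $P^{\mathcal{A}}$-a.e. Let us write $h = P(B|\mathcal{A})$ to denote that an $\mathcal{A}$-measurable function $h : X \to [0,1]$ is a version of conditional probability of $B$ given $\mathcal{A}$. Let us mention (without proof) two equivalent definitions of conditional probability. The first one (apparently weaker) says that $h = P(B|\mathcal{A})$ iff the equality above holds for every $A \in \mathcal{G}$ where $\mathcal{G} \subseteq \mathcal{A}$ is a class closed under finite intersection such that $\sigma(\mathcal{G}) = \mathcal{A}$. The second one (apparently stronger) says that $h = P(B|\mathcal{A})$ iff for every nonnegative $\mathcal{A}$-measurable function $g : X \to \mathbb{R}$ and $A \in \mathcal{A}$ one has

$$\int_{A \cap B} g(x)\, dP(x) = \int_A g(x) \cdot h(x)\, dP(x) = \int_A g(x) \cdot h(x)\, dP^{\mathcal{A}}(x).$$

It follows from the definition of conditional probability that whenever $\mathcal{S} \subseteq \mathcal{T} \subseteq \mathcal{X}$ are $\sigma$-algebras and $B \in \mathcal{X}$, then every $\mathcal{S}$-measurable version of $P(B|\mathcal{T})$ is a version of $P(B|\mathcal{S})$. Sometimes it happens that a certain fact or the value of an expression does not depend on the choice of a version of conditional probability. In this case the symbol $P(B|\mathcal{A})$ is used in the corresponding formula to substitute an arbitrary version of conditional probability of $B$ given $\mathcal{A}$ (w.r.t. $P$).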

Remark. Having fixed just $P$ on $(X,\mathcal{X})$ and a $\sigma$-algebra $\mathcal{A} \subseteq \mathcal{X}$, by a regular version of conditional probability given $\mathcal{A}$ is understood a function which ascribes to every $B \in \mathcal{X}$ a version of $P(B|\mathcal{A})$ such that for every $x \in X$ the mapping $B \mapsto P(B|\mathcal{A})(x)$ is a probability measure on $(X,\mathcal{X})$. Note that this concept is taken from § and that a regular version of conditional probability may not exist in general (e.g. Example VI in). However, under certain topological assumptions, namely that $X$ is a separable complete metric space and $\mathcal{X}$ is the class of Borel sets in $X$, its existence is guaranteed (see either Theorem VI or Consequence of Theorem V).

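Supposing $\mu$ is a measure on the product of measurable spaces $(X_1 \times X_2, \mathcal{X}_1 \times \mathcal{X}_2)$, the marginal measure $\mu_1$ on $(X_1,\mathcal{X}_1)$ is defined as follows:

$$\mu_1(A) = \mu(A \times X_2) \qquad \text{for every } A \in \mathcal{X}_1.$$

Every $\mathcal{X}_1$-measurable function $h$ on $X_1$ can be viewed as an $\mathcal{X}_1 \times \mathcal{X}_2$-measurable function on $X_1 \times X_2$. Then $h$ is $\mu_1$-integrable iff it is $\mu$-integrable, and

$$\int_{X_1} h(x_1)\, d\mu_1(x_1) = \int_{X_1 \times X_2} h(x_1)\, d\mu(x_1,x_2).$$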

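A real function $\varphi : [0,\infty) \to \mathbb{R}$ is called convex if for all $r,s \in [0,\infty)$ and $\alpha \in [0,1]$

$$\varphi(\alpha \cdot r + (1-\alpha) \cdot s) \leq \alpha \cdot \varphi(r) + (1-\alpha) \cdot \varphi(s).$$

It is called strictly convex if this inequality is strict whenever $r \neq s$ and $\alpha \in (0,1)$. Further basic result is Jensen's inequality (one can modify the proof from Section).

Theorem. Let $\mu$ be a probability measure on $(X,\mathcal{X})$, $f : X \to [0,\infty)$ a $\mu$-integrable function and $\varphi : [0,\infty) \to \mathbb{R}$ a convex function. Then

$$\varphi\Big( \int_X f(x)\, d\mu(x) \Big) \leq \int_X \varphi(f(x))\, d\mu(x).$$

In case $\varphi$ is strictly convex, the equality occurs if and only if $f$ is constant $\mu$-a.e., more exactly $f(x) = k$ for $\mu$-a.e. $x \in X$ where $k = \int_X f(x)\, d\mu(x)$.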

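Conditional independence of $\sigma$-algebras

Let $\mathcal{A},\mathcal{B},\mathcal{C} \subseteq \mathcal{X}$ be $\sigma$-algebras in a measurable space $(X,\mathcal{X})$ and $P$ a probability measure on it. One can say that $\mathcal{A}$ is conditionally independent of $\mathcal{B}$ given $\mathcal{C}$ with respect to $P$ and write $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \,|\, \mathcal{C}\ [P]$ if for every $A \in \mathcal{A}$ and $B \in \mathcal{B}$ one has

$$P(A \cap B|\mathcal{C})(x) = P(A|\mathcal{C})(x) \cdot P(B|\mathcal{C})(x) \qquad \text{for } P^{\mathcal{C}}\text{-a.e. } x \in X.$$

Note (without proof) that an apparently weaker equivalent formulation is as follows. It suffices to verify this equality only for $A \in \tilde{\mathcal{A}}$ and $B \in \tilde{\mathcal{B}}$ where $\tilde{\mathcal{A}} \subseteq \mathcal{A}$, respectively $\tilde{\mathcal{B}} \subseteq \mathcal{B}$, are classes closed under finite intersection such that $\sigma(\tilde{\mathcal{A}}) = \mathcal{A}$, respectively $\sigma(\tilde{\mathcal{B}}) = \mathcal{B}$.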

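Lemma. Under the assumptions above, $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \,|\, \mathcal{C}\ [P]$ occurs iff for every $A \in \mathcal{A}$ there exists a $\mathcal{C}$-measurable version of $P(A|\mathcal{B} \vee \mathcal{C})$.

Proof. To show the necessity of the condition, fix $A \in \mathcal{A}$ and choose a version $f$ of $P(A|\mathcal{C})$. Write for every $B \in \mathcal{B}$, $C \in \mathcal{C}$, by the definition of $P(A \cap B|\mathcal{C})$ and the conditional independence equality,

$$P(A \cap B \cap C) = \int_C P(A \cap B|\mathcal{C})(x)\, dP(x) = \int_C P(A|\mathcal{C})(x) \cdot P(B|\mathcal{C})(x)\, dP(x),$$

and continue, using the stronger definition of $P(B|\mathcal{C})$ and the fact $f = P(A|\mathcal{C})$,

$$\int_C P(A|\mathcal{C})(x) \cdot P(B|\mathcal{C})(x)\, dP(x) = \int_{B \cap C} P(A|\mathcal{C})(x)\, dP(x) = \int_{B \cap C} f(x)\, dP(x).$$

Since the class $\mathcal{G} = \{ B \cap C ;\ B \in \mathcal{B},\ C \in \mathcal{C} \}$ is closed under finite intersection and $\sigma(\mathcal{G}) = \mathcal{B} \vee \mathcal{C}$, by the weaker definition of $P(A|\mathcal{B} \vee \mathcal{C})$ conclude that $f = P(A|\mathcal{B} \vee \mathcal{C})$.

To show the sufficiency, fix $A \in \mathcal{A}$ and $B \in \mathcal{B}$. Take a $\mathcal{C}$-measurable version $f$ of $P(A|\mathcal{B} \vee \mathcal{C})$ and observe $f = P(A|\mathcal{C})$. Then write, by the definition of $P(A \cap B|\mathcal{C})$ and the fact $f = P(A|\mathcal{B} \vee \mathcal{C})$, for every $C \in \mathcal{C}$

$$\int_C P(A \cap B|\mathcal{C})(x)\, dP^{\mathcal{C}}(x) = P(A \cap B \cap C) = \int_{B \cap C} f(x)\, dP(x),$$

and continue, using the stronger definition of $P(B|\mathcal{C})$ and the fact $f = P(A|\mathcal{C})$,

$$\int_{B \cap C} f(x)\, dP(x) = \int_C f(x) \cdot P(B|\mathcal{C})(x)\, dP^{\mathcal{C}}(x) = \int_C P(A|\mathcal{C})(x) \cdot P(B|\mathcal{C})(x)\, dP^{\mathcal{C}}(x).$$

Thus, the equality

$$\int_C P(A \cap B|\mathcal{C})(x)\, dP^{\mathcal{C}}(x) = \int_C P(A|\mathcal{C})(x) \cdot P(B|\mathcal{C})(x)\, dP^{\mathcal{C}}(x)$$

was verified for every $C \in \mathcal{C}$, which implies the equality defining conditional independence.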

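The next lemma describes basic properties of conditional independence for $\sigma$-algebras.

Lemma. Supposing $P$ is a probability measure on $(X,\mathcal{X})$ and $\mathcal{A},\mathcal{B},\mathcal{C},\mathcal{D},\mathcal{E},\mathcal{F},\mathcal{G} \subseteq \mathcal{X}$ are $\sigma$-algebras, it holds:

(i) $\mathcal{B} \subseteq \mathcal{C} \ \Rightarrow\ \mathcal{A} \perp\!\!\!\perp \mathcal{B} \,|\, \mathcal{C}\ [P]$,

(ii) $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \,|\, \mathcal{C}\ [P] \ \Rightarrow\ \mathcal{B} \perp\!\!\!\perp \mathcal{A} \,|\, \mathcal{C}\ [P]$,

(iii) $\mathcal{A} \perp\!\!\!\perp \mathcal{E} \,|\, \mathcal{C}\ [P]$, $\mathcal{F} \subseteq \mathcal{E}$, $\mathcal{C} \subseteq \mathcal{G} \subseteq \mathcal{E} \vee \mathcal{C} \ \Rightarrow\ \mathcal{A} \perp\!\!\!\perp \mathcal{F} \,|\, \mathcal{G}\ [P]$,

(iv) $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \,|\, \mathcal{D} \vee \mathcal{C}\ [P]$, $\mathcal{A} \perp\!\!\!\perp \mathcal{D} \,|\, \mathcal{C}\ [P] \ \Rightarrow\ \mathcal{A} \perp\!\!\!\perp \mathcal{B} \vee \mathcal{D} \,|\, \mathcal{C}\ [P]$.

Proof. The condition (ii) follows immediately from the symmetry in the definition of conditional independence. For the other properties use the equivalent definition from Lemma. For (i) realize that $\mathcal{B} \vee \mathcal{C} = \mathcal{C}$ and therefore every version of $P(A|\mathcal{B} \vee \mathcal{C})$ is $\mathcal{C}$-measurable. In case (iii) observe $\mathcal{S} \equiv \mathcal{F} \vee \mathcal{G} \subseteq \mathcal{E} \vee \mathcal{C} \equiv \mathcal{T}$. The assumption $\mathcal{A} \perp\!\!\!\perp \mathcal{E} \,|\, \mathcal{C}\ [P]$ implies for every $A \in \mathcal{A}$ the existence of a $\mathcal{C}$-measurable version of $P(A|\mathcal{T})$. Since $\mathcal{C} \subseteq \mathcal{G} \subseteq \mathcal{S}$, it is both $\mathcal{G}$-measurable and $\mathcal{S}$-measurable. Hence, it is a version of $P(A|\mathcal{S})$. The existence of a $\mathcal{G}$-measurable version of $P(A|\mathcal{S})$ means $\mathcal{A} \perp\!\!\!\perp \mathcal{F} \,|\, \mathcal{G}\ [P]$. To show (iv), fix $A \in \mathcal{A}$ and by $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \,|\, \mathcal{D} \vee \mathcal{C}\ [P]$ derive the existence of a $(\mathcal{D} \vee \mathcal{C})$-measurable version $f$ of $P(A|\mathcal{B} \vee \mathcal{D} \vee \mathcal{C})$. Similarly, by $\mathcal{A} \perp\!\!\!\perp \mathcal{D} \,|\, \mathcal{C}\ [P]$ derive the existence of a $\mathcal{C}$-measurable version $g$ of $P(A|\mathcal{D} \vee \mathcal{C})$. Observe that $f$ is a version of $P(A|\mathcal{D} \vee \mathcal{C})$ and by the uniqueness of $P(A|\mathcal{D} \vee \mathcal{C})$ derive that $f = g$ $P^{\mathcal{D} \vee \mathcal{C}}$-a.e. Hence, $f$ and $g$ are equal $P^{\mathcal{B} \vee \mathcal{D} \vee \mathcal{C}}$-a.e. and by the uniqueness of $P(A|\mathcal{B} \vee \mathcal{D} \vee \mathcal{C})$ conclude that $g$ is its version. This implies $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \vee \mathcal{D} \,|\, \mathcal{C}\ [P]$.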

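Consequence. Supposing $P$ is a probability measure on $(X,\mathcal{X})$, semi-graphoid properties for $\sigma$-algebras hold, that is, one has for $\sigma$-algebras $\mathcal{A},\mathcal{B},\mathcal{C},\mathcal{D} \subseteq \mathcal{X}$ (the symbol of $P$ is omitted):

triviality: $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \,|\, \mathcal{C}$ whenever $\mathcal{B} \vee \mathcal{C} = \mathcal{C}$,

symmetry: $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \,|\, \mathcal{C} \ \Rightarrow\ \mathcal{B} \perp\!\!\!\perp \mathcal{A} \,|\, \mathcal{C}$,

decomposition: $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \vee \mathcal{D} \,|\, \mathcal{C} \ \Rightarrow\ \mathcal{A} \perp\!\!\!\perp \mathcal{D} \,|\, \mathcal{C}$,

weak union: $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \vee \mathcal{D} \,|\, \mathcal{C} \ \Rightarrow\ \mathcal{A} \perp\!\!\!\perp \mathcal{B} \,|\, \mathcal{D} \vee \mathcal{C}$,

contraction: $\mathcal{A} \perp\!\!\!\perp \mathcal{B} \,|\, \mathcal{D} \vee \mathcal{C}$ and $\mathcal{A} \perp\!\!\!\perp \mathcal{D} \,|\, \mathcal{C} \ \Rightarrow\ \mathcal{A} \perp\!\!\!\perp \mathcal{B} \vee \mathcal{D} \,|\, \mathcal{C}$.

Proof. Use Lemma; for decomposition use (iii) with $\mathcal{E} = \mathcal{B} \vee \mathcal{D}$, $\mathcal{F} = \mathcal{D}$, $\mathcal{G} = \mathcal{C}$; for weak union put $\mathcal{E} = \mathcal{B} \vee \mathcal{D}$, $\mathcal{F} = \mathcal{B}$, $\mathcal{G} = \mathcal{D} \vee \mathcal{C}$ instead.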
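As a hedged illustration, the following Python sketch checks conditional independence in the simplest special case, which is an assumption made here and not the general setting above: a finite sample space of tuples, with the $\sigma$-algebras generated by groups of coordinates. In that case the defining equality reduces to $P(a,b,c) \cdot P(c) = P(a,c) \cdot P(b,c)$ for all configurations. The names `marginal`, `is_conditionally_independent` and `example_joint` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def marginal(joint, keep):
    """Marginal of a finite joint distribution on the coordinates listed in keep."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def is_conditionally_independent(joint, A, B, C):
    """Check P(a,b,c) * P(c) == P(a,c) * P(b,c) for all configurations.

    A, B, C are tuples of coordinate indices; C may be empty.
    """
    p_abc = marginal(joint, A + B + C)
    p_ac = marginal(joint, A + C)
    p_bc = marginal(joint, B + C)
    p_c = marginal(joint, C)
    n_a, n_b = len(A), len(B)
    for ac, prob_ac in p_ac.items():
        a, c = ac[:n_a], ac[n_a:]
        for bc, prob_bc in p_bc.items():
            b, c2 = bc[:n_b], bc[n_b:]
            if c2 != c:
                continue
            if p_abc.get(a + b + c, Fraction(0)) * p_c[c] != prob_ac * prob_bc:
                return False
    return True


def example_joint():
    """Joint law of binary (a, b, c): c is a fair coin, a and b are
    drawn independently given c with P(1 | c) equal to 1/4 or 3/4."""
    p_one_given_c = {0: Fraction(1, 4), 1: Fraction(3, 4)}
    joint = {}
    for a, b, c in product((0, 1), repeat=3):
        pa = p_one_given_c[c] if a == 1 else 1 - p_one_given_c[c]
        pb = p_one_given_c[c] if b == 1 else 1 - p_one_given_c[c]
        joint[(a, b, c)] = Fraction(1, 2) * pa * pb
    return joint
```

In this example the first two coordinates are conditionally independent given the third, but not marginally independent; the triviality and symmetry properties can be observed with the same function. Exact rational arithmetic is used so that the equality test is not affected by rounding.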

Relative entropy

Suppose that $P$ is a finite measure and $\mu$ a $\sigma$-finite measure on a measurable space $(X,\mathcal{X})$. In case $P \ll \mu$ choose a version of Radon-Nikodym derivative $\frac{dP}{d\mu}$, accept the convention $0 \cdot \ln 0 = 0$ and introduce

$$H(P|\mu) = \int_X \frac{dP}{d\mu}(x) \cdot \ln \frac{dP}{d\mu}(x)\, d\mu(x).$$

Provided that the function $\frac{dP}{d\mu} \cdot \ln \frac{dP}{d\mu}$ is $\mu$-quasi-integrable, we call the integral the relative entropy of $P$ with respect to $\mu$. Of course, quasi-integrability and the value of $H(P|\mu)$ do not depend on the choice of a version of $\frac{dP}{d\mu}$. It follows from the definition of Radon-Nikodym derivative that $H(P|\mu)$ can be equivalently introduced as the integral

$$H(P|\mu) = \int_X \ln \frac{dP}{d\mu}(x)\, dP(x)$$

provided that $\ln \frac{dP}{d\mu}$ is $P$-quasi-integrable. Hence, the relative entropy of $P$ with respect to $\mu$ is finite iff $\ln \frac{dP}{d\mu}$ is $P$-integrable. Let us note that, in general, $P \ll \mu$ does not imply the existence of the integral in the definition, and in case $H(P|\mu)$ is defined it can take any value in the interval $[-\infty,+\infty]$. However, when both $P$ and $\mu$ are probability measures and $P \ll \mu$, the existence of $H(P|\mu)$ is guaranteed and it can serve as a measure of similarity of $P$ to $\mu$.

Lemma. Supposing $P$ and $\mu$ are probability measures on $(X,\mathcal{X})$ such that $P \ll \mu$, the relative entropy of $P$ with respect to $\mu$ is defined and $H(P|\mu) \geq 0$. Moreover, $H(P|\mu) = 0$ iff $P = \mu$.

Proof. Apply Jensen's inequality to the case $f = \frac{dP}{d\mu}$ and $\varphi(r) = r \cdot \ln r$ for $r > 0$, $\varphi(0) = 0$. Since $P$ is a probability measure, $\int_X f(x)\, d\mu(x) = 1$ and Jensen's inequality gives the lower estimate $H(P|\mu) \geq \varphi(1) = 0$. Moreover, since $\varphi$ is strictly convex, $H(P|\mu) = 0$ iff $f(x) = 1$ for $\mu$-a.e. $x \in X$, which is equivalent to the requirement $P = \mu$.

Let us emphasize that the assumption that $H(P|\mu)$ is finite involves the requirement $P \ll \mu$.
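For a finite sample space the relative entropy becomes a finite sum, which the following Python sketch computes. It assumes, for illustration only, that both measures are given as lists of point masses over the same finite set, so that the Radon-Nikodym derivative is the ratio of point masses; the function name `relative_entropy` is hypothetical.

```python
import math


def relative_entropy(p, q):
    """Relative entropy of P with respect to mu on a finite set.

    p and q are lists of point masses of P and mu. The density dP/dmu at a
    point is p_i / q_i, so the defining integral is the sum of p_i * ln(p_i / q_i).
    """
    if len(p) != len(q):
        raise ValueError("both measures must live on the same finite set")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # convention 0 * ln 0 = 0
        if q_i == 0:
            raise ValueError("P is not absolutely continuous with respect to mu")
        total += p_i * math.log(p_i / q_i)
    return total
```

For two probability vectors the result is nonnegative and vanishes when they coincide, in agreement with the lemma above. The explicit error for a zero mass of `q` under a positive mass of `p` mirrors the requirement of absolute continuity.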

Finite-dimensional subspaces and cones

Throughout Section the set of $n$-dimensional real vectors $\mathbb{R}^n$, $n \geq 1$, is fixed. It is a topological space endowed with Euclidean topology. Given $x,y \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$, the sum of vectors $x + y \in \mathbb{R}^n$ and the scalar multiple $\alpha \cdot x \in \mathbb{R}^n$ are defined componentwise. The symbol $0$ denotes the zero vector which has $0$ as all its components. Given $A \subseteq \mathbb{R}^n$, the symbol $-A$ denotes the set $\{ -x ;\ x \in A \}$ where $-x$ denotes the scalar multiple $(-1) \cdot x$. The scalar product of two vectors $x = [x_i]_{i=1}^{n}$ and $y = [y_i]_{i=1}^{n}$ is the number $\langle x,y \rangle = \sum_{i=1}^{n} x_i \cdot y_i$.

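Linear subspaces

A set $L \subseteq \mathbb{R}^n$ is a linear subspace if $0 \in L$ and $L$ is closed under linear combinations, i.e.

$$\forall\, x,y \in L \quad \forall\, \alpha,\beta \in \mathbb{R} \qquad \alpha \cdot x + \beta \cdot y \in L.$$

Every linear subspace of $\mathbb{R}^n$ is a closed set with respect to Euclidean topology. A set $A \subseteq \mathbb{R}^n$ linearly generates a subspace $L \subseteq \mathbb{R}^n$ if every element of $L$ is a linear combination of elements of $A$, i.e.

$$\forall\, x \in L \quad \exists\, B \subseteq A \text{ finite such that } x = \sum_{y \in B} \alpha_y \cdot y \text{ for some } \alpha_y \in \mathbb{R},\ y \in B.$$

By convention, $0$ is the empty linear combination, which means that $\emptyset$ linearly generates the subspace $\{0\}$. A finite set $A \subseteq \mathbb{R}^n$ is linearly independent if

$$\forall\, \alpha_y \in \mathbb{R},\ y \in A: \qquad \sum_{y \in A} \alpha_y \cdot y = 0 \ \Rightarrow\ \alpha_y = 0 \text{ for every } y \in A.$$

In particular, a set containing $0$ is never linearly independent. Linear basis of a subspace $L \subseteq \mathbb{R}^n$ is any finite linearly independent set $A \subseteq L$ which linearly generates $L$. Every linear subspace $L \subseteq \mathbb{R}^n$ has a basis, which is possibly empty in case $L = \{0\}$. Different bases of $L$ have the same number of elements, called the dimension of $L$. The dimension is a number between $0$ (for $L = \{0\}$) and $n$ (for $L = \mathbb{R}^n$).

One says that a subspace $L \subseteq \mathbb{R}^n$ is a direct sum of subspaces $L_1, L_2 \subseteq \mathbb{R}^n$ and writes $L = L_1 \oplus L_2$ if $L_1 \cap L_2 = \{0\}$, $L_1 \subseteq L$, $L_2 \subseteq L$ and $L_1 \cup L_2$ generates $L$. Then every $x \in L$ can be written in the form $x = y + z$ where $y \in L_1$, $z \in L_2$, and this decomposition of $x$ is unique. Moreover, the dimension of $L$ is the sum of the dimensions of $L_1$ and $L_2$. The orthogonal complement of a set $A \subseteq \mathbb{R}^n$ is the set

$$A^{\perp} = \{ x \in \mathbb{R}^n ;\ \langle x,y \rangle = 0 \text{ for every } y \in A \}.$$

It is always a linear subspace. Moreover, for every linear subspace $L \subseteq \mathbb{R}^n$ one has $\mathbb{R}^n = L \oplus L^{\perp}$ and $L = (L^{\perp})^{\perp}$.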

Convex cones

A set $K \subseteq \mathbb{R}^n$ is a convex cone if $0 \in K$ and $K$ is closed under conical combinations, i.e.

$$\forall\, x,y \in K \quad \forall\, \alpha,\beta \geq 0 \qquad \alpha \cdot x + \beta \cdot y \in K.$$

By closed convex cone is understood a convex cone which is a closed set with respect to Euclidean topology on $\mathbb{R}^n$. An example of a closed convex cone is a linear subspace. Another example is the dual cone $A^{*}$ to a set $A \subseteq \mathbb{R}^n$ defined by

$$A^{*} = \{ y \in \mathbb{R}^n ;\ \langle x,y \rangle \geq 0 \text{ for every } x \in A \}.$$

This is a general example as $K \subseteq \mathbb{R}^n$ is a closed convex cone iff $K = A^{*}$ for some $A \subseteq \mathbb{R}^n$ (see Consequence in). Further example of a closed cone is the conical closure $\mathrm{con}(B)$ of a nonempty set $B \subseteq \mathbb{R}^n$ ($\mathrm{con}(\emptyset) = \{0\}$ by convention):

$$\mathrm{con}(B) = \Big\{ x \in \mathbb{R}^n ;\ x = \sum_{z \in C} \alpha_z \cdot z \text{ for some } \alpha_z \geq 0 \text{ and finite } \emptyset \neq C \subseteq B \Big\}.$$

Note that $\mathrm{con}(B) = B^{**}$ for every finite $B \subseteq \mathbb{R}^n$ (see Fact and Proposition in). A cone $\mathrm{con}(B)$ with finite $B$ is called a polyhedral cone; by a rational polyhedral cone is understood the conical closure of a finite set of rational vectors $B \subseteq \mathbb{Q}^n$. Basic fact is that a set $K \subseteq \mathbb{R}^n$ is a polyhedral cone iff $K = A^{*}$ for a finite $A \subseteq \mathbb{R}^n$. Analogously, $K$ is a rational polyhedral cone iff $K = A^{*}$ for a finite set of rational vectors $A \subseteq \mathbb{Q}^n$ (see Proposition in). Note that these facts can be viewed as consequences (or analogues) of a well-known result from convex analysis saying that polytopes coincide with bounded polyhedrons. Let us call by a face of a polyhedral cone $K$ a convex cone $F \subseteq K$ such that for all $x,y \in K$, $x + y \in F$ implies $x,y \in F$. Note that this is a modification of the usual definition of a face of a closed convex set from, and the definitions coincide for nonempty subsets $F$ of a polyhedral cone $K$. One can show that a face of a polyhedral cone is a polyhedral cone (cf. Consequence in).

A closed cone $K \subseteq \mathbb{R}^n$ is pointed if $K \cap (-K) = \{0\}$. Apparently stronger equivalent definition says that a closed cone $K$ is pointed iff there exists $y \in \mathbb{R}^n$ such that $\langle x,y \rangle > 0$ for every $x \in K \setminus \{0\}$ (see Proposition in). By a ray generated by a nonzero vector $0 \neq x \in \mathbb{R}^n$ is understood the set $R_x = \{ \alpha \cdot x ;\ \alpha \geq 0 \}$. Clearly, every cone contains the whole ray $R$ with any of its nonzero vectors $0 \neq x \in R$, which then necessarily generates $R = R_x$. Given a closed convex cone $K \subseteq \mathbb{R}^n$, a ray $R \subseteq K$ is called extreme (in $K$) if

$$\forall\, x,y \in K \qquad x + y \in R \text{ implies } x,y \in R.$$

A closed cone $K$ has extreme rays iff it is pointed and contains a nonzero vector $0 \neq x \in K$. Moreover, every pointed closed convex cone $K \subseteq \mathbb{R}^n$ is a conical combination of its extreme rays, more exactly $K = \mathrm{con}(B)$ for every $B \subseteq K$ such that $B \cap (R \setminus \{0\}) \neq \emptyset$ for each extreme ray $R \subseteq K$. Note that this fact can be viewed as a consequence of the well-known Krein-Milman theorem for bounded closed convex sets (see Proposition in). A pointed closed cone is a polyhedral cone iff it has finitely many extreme rays. Moreover, it is a rational polyhedral cone iff it has finitely many extreme rays and each of its extreme rays is generated by a rational vector (see Consequence in). Basic results of Section are based on the following specific property of pointed rational polyhedral cones.

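Lemma. Let $K \subseteq \mathbb{R}^n$ be a pointed rational polyhedral cone and $R$ an extreme ray of $K$. Then there exists $q \in \mathbb{Q}^n$ such that $\langle q,x \rangle = 0$ for any $x \in R$ and $\langle q,y \rangle > 0$ whenever $0 \neq y$ belongs to another extreme ray of $K$.

Another useful fact is that every conical combination of integral vectors is necessarily a rational conical combination (see Lemma in).

Fact. Supposing $B \subseteq \mathbb{Z}^n$, every $x \in \mathrm{con}(B) \cap \mathbb{Z}^n$ has the form $x = \sum_{y \in B} \alpha_y \cdot y$ where $\alpha_y \in \mathbb{Q}$, $\alpha_y \geq 0$ for every $y \in B$.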

Concepts from multivariate analysis

The concepts and facts mentioned in this section are commonly used in mathematical statistics, in particular in its special area known as multivariate analysis. The proofs of the facts from Section can be found in textbooks of matrix calculus, for example Chapters and. The proofs of basic facts from Section are in any reasonable textbook of statistics, see e.g. section V.

Matrices

Given nonempty finite sets $N, M$, by a real $N \times M$-matrix will be understood a real function on $N \times M$, that is, an element of $\mathbb{R}^{N \times M}$. The corresponding values are indicated by subscripts so that one writes $\Sigma = (\sigma_{ij})_{i \in N, j \in M}$ to explicate the components of a matrix $\Sigma$ of this type. Note that this approach slightly differs from the classic understanding of the concept of matrix where the index sets have settled pre-orderings, e.g. $N = \{1,\ldots,n\}$ and $M = \{1,\ldots,m\}$. This enables one to write certain formulas involving matrices in a much more elegant way.

The result of matrix multiplication of an $N \times M$-matrix $\Sigma$ and an $M \times K$-matrix $\Sigma'$, where $N, M, K$ are nonempty finite sets, is an $N \times K$-matrix denoted by $\Sigma \cdot \Sigma'$. A real vector $v$ over $N$, that is, an element of $\mathbb{R}^N$, will be here understood as a column vector so that it is multiplied by an $N \times N$-matrix $\Sigma$ from the left: $\Sigma \cdot v$. Null matrix or vector having all components zero is denoted by $0$; unit matrix by $I$. An $N \times N$-matrix $\Sigma = (\sigma_{ij})_{i,j \in N}$ is symmetric if $\sigma_{ij} = \sigma_{ji}$ for every $i,j \in N$, and regular if there exists a uniquely determined inverse $N \times N$-matrix $\Sigma^{-1}$ such that $\Sigma \cdot \Sigma^{-1} = I$. The transpose of $\Sigma$ will be denoted by $\Sigma^{\top}$, the determinant by $\det(\Sigma)$.

By a generalized inverse of a real $N \times N$-matrix $\Sigma$ will be understood any $N \times N$-matrix $\Sigma^{-}$ such that $\Sigma \cdot \Sigma^{-} \cdot \Sigma = \Sigma$. A matrix of this sort always exists, but it is not determined uniquely unless $\Sigma$ is regular, when it coincides with $\Sigma^{-1}$ (see Section b). However, the expressions in which generalized inverses are commonly used usually do not depend on their choice.

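A real symmetric $N \times N$-matrix $\Sigma$ is called positive semidefinite if $v^{\top} \cdot \Sigma \cdot v \geq 0$ for every $v \in \mathbb{R}^N$, and positive definite if $v^{\top} \cdot \Sigma \cdot v > 0$ for every $v \in \mathbb{R}^N$, $v \neq 0$. Note that $\Sigma$ is positive definite iff it is regular and positive semidefinite. In that case $\Sigma^{-1}$ is positive definite as well.

Given an $N \times N$-matrix $\Sigma = (\sigma_{ij})_{i,j \in N}$ and nonempty $A,B \subseteq N$, the symbol $\Sigma_{A \cdot B}$ will be used to denote the $A \times B$-submatrix, that is, $\Sigma_{A \cdot B} = (\sigma_{ij})_{i \in A, j \in B}$. Supposing $\Sigma$ is positive definite (semidefinite) and $\emptyset \neq A \subseteq N$, its main submatrix $\Sigma_{A \cdot A}$ is positive definite (semidefinite) as well. Note that the operation $\Sigma \mapsto \Sigma_{A \cdot A}$ sometimes plays the role of marginalizing (but only for positive semidefinite matrices). On the other hand, supposing $\Sigma$ is only regular, $\Sigma_{A \cdot A}$ need not be regular.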

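Suppose that $\Sigma$ is a real $N \times N$-matrix, nonempty sets $A, C \subseteq N$ are disjoint and $\Sigma_{C \cdot C}$ is regular. Then one can introduce the Schur complement $\Sigma_{A|C}$ as an $A \times A$-matrix as follows:

$$\Sigma_{A|C} = \Sigma_{A \cdot A} - \Sigma_{A \cdot C} \cdot (\Sigma_{C \cdot C})^{-1} \cdot \Sigma_{C \cdot A},$$

with the convention $\Sigma_{A|\emptyset} = \Sigma_{A \cdot A}$. Note that $\Sigma_{AC \cdot AC}$ is regular iff $\Sigma_{A|C}$ (and $\Sigma_{C \cdot C}$) is regular, and then $(\Sigma_{A|C})^{-1} = ((\Sigma_{AC \cdot AC})^{-1})_{A \cdot A}$. Moreover, the following transitivity principle holds: supposing $A, B, C \subseteq N$ are pairwise disjoint and $\Sigma$ is an $N \times N$-matrix such that both $\Sigma_{C \cdot C}$ and $\Sigma_{BC \cdot BC}$ are regular, one has $\Sigma_{A|BC} = (\Sigma_{AB|C})_{A|B}$. An important fact is that whenever $\Sigma_{AC \cdot AC}$ is positive definite, then $\Sigma_{A|C}$ is positive definite as well. Thus, the operation $\Sigma_{AC \cdot AC} \mapsto \Sigma_{A|C}$ often plays the role of conditioning (for positive definite matrices only).

However, one sometimes needs to define the conditional matrix $\Sigma_{A|C}$ even in case that $\Sigma_{C \cdot C}$ is not regular. Thus, supposing $\Sigma$ is a positive semidefinite matrix, one can introduce $\Sigma_{A|C}$ by means of a generalized inverse $(\Sigma_{C \cdot C})^{-}$ as follows:

$$\Sigma_{A|C} = \Sigma_{A \cdot A} - \Sigma_{A \cdot C} \cdot (\Sigma_{C \cdot C})^{-} \cdot \Sigma_{C \cdot A}.$$

Note that this matrix does not depend on the choice of generalized inverse (use Section a.v) and that in case of a positive definite matrix $\Sigma$ it coincides with the above mentioned Schur complement. Therefore, the concept of conditioning is extended to positive semidefinite matrices.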
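The Schur complement for a regular conditioning block can be sketched in Python with exact rational arithmetic. This is only an illustration under simplifying assumptions: matrices are nested lists indexed by integers rather than by general finite sets, and the helper names `submatrix`, `matmul`, `inverse` and `schur_complement` are hypothetical.

```python
from fractions import Fraction


def submatrix(S, rows, cols):
    """Submatrix of S with the given row and column index lists."""
    return [[S[i][j] for j in cols] for i in rows]


def matmul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse(M):
    """Inverse of a regular square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not regular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def schur_complement(S, A, C):
    """Sigma_{A|C} = S_AA - S_AC (S_CC)^{-1} S_CA; equals S_AA when C is empty."""
    S_AA = submatrix(S, A, A)
    if not C:
        return S_AA
    correction = matmul(matmul(submatrix(S, A, C), inverse(submatrix(S, C, C))),
                        submatrix(S, C, A))
    return [[Fraction(S_AA[i][j]) - correction[i][j] for j in range(len(A))]
            for i in range(len(A))]
```

On a small positive definite matrix one can observe both identities stated above: the inverse of the Schur complement agrees with the corresponding block of the inverse matrix, and conditioning in two steps agrees with conditioning at once. The sketch does not cover the generalized inverse needed for a singular conditioning block.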

Statistical characteristics of probability measures

Remark. Elementary concept of mathematical statistics is a random variable, which is a real measurable function $\xi$ on a certain (intentionally unspecified) measurable space $(\Omega,\mathcal{A})$ where $\Omega$ is interpreted as the universum of elementary events and $\mathcal{A}$ as the collection of observable random events. Moreover, it is assumed that $(\Omega,\mathcal{A})$ admits a probability measure $P$. Then every random vector, that is, a finite collection of random variables $\xi = [\xi_i]_{i \in N}$ where $|N| \geq 2$, induces on $\mathbb{R}^N$ ($\equiv \prod_{i \in N} X_i$ with $X_i = \mathbb{R}$), endowed with the Borel $\sigma$-algebra $\mathcal{B}^N$ ($\equiv$ the product of Borel $\sigma$-algebras on $\mathbb{R}$ in this case), a probability measure $P^{\xi}$ called the distribution of $\xi$:

$$P^{\xi}(A) = P(\{ \omega \in \Omega ;\ \xi(\omega) \in A \}) \qquad \text{for every Borel set } A \subseteq \mathbb{R}^N.$$

The measurable space $(\mathbb{R}^N,\mathcal{B}^N)$ is then called the (joint) sample space. Note that generalized random variables taking values in alternative sample spaces (e.g. finite sets instead of $\mathbb{R}$) are sometimes considered as well.

The area of interest of mathematical statistics is not the underlying theoretical probability $P$ but the induced probability measure $P^{\xi}$ on the sample space. Indeed, despite the fact that textbooks of statistics introduce various numerical characteristics of random vectors, these numbers actually do not characterize random vectors themselves but their distributions, that is, induced Borel probability measures on $\mathbb{R}^N$. The purpose of many statistical methods is then simply to estimate these numerical characteristics from data. Definitions of the basic ones are recalled in this section.

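Let $P$ be a probability measure on $(X,\mathcal{X}) = (\prod_{i \in N} X_i, \prod_{i \in N} \mathcal{X}_i) = (\mathbb{R}^N,\mathcal{B}^N)$ where $|N| \geq 2$. Let $x_i$ denote the $i$-th component ($i \in N$) of a vector $x \in \mathbb{R}^N$. In case that for every $i \in N$ the function $x \mapsto x_i$, $x \in \mathbb{R}^N$, which is $\mathcal{B}^N$-measurable, is $P$-integrable, one can define the expectation as a real vector $e = [e_i]_{i \in N} \in \mathbb{R}^N$ having components

$$e_i = \int_{\mathbb{R}^N} x_i\, dP(x) = \int_{X_i} y\, dP^{\{i\}}(y) \qquad \text{for } i \in N.$$

If moreover the function $x \mapsto (x_i - e_i) \cdot (x_j - e_j)$ is $P$-integrable for every $i,j \in N$, one defines the covariance matrix of $P$ as an $N \times N$-matrix $\Sigma = (\sigma_{ij})_{i,j \in N}$ with elements

$$\sigma_{ij} = \int_{\mathbb{R}^N} (x_i - e_i) \cdot (x_j - e_j)\, dP(x) = \int_{X_i \times X_j} (y - e_i) \cdot (z - e_j)\, dP^{\{i,j\}}(y,z) \qquad \text{for } i,j \in N.$$

Alternative names are variance matrix, dispersion matrix, or even variance-covariance matrix. Elementary fact is that a covariance matrix is always positive semidefinite; the converse is also valid (see the next section).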

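Supposing $P$ has a covariance matrix $\Sigma = (\sigma_{ij})_{i,j \in N}$ such that $\sigma_{ii} > 0$ for every $i \in N$, one can introduce the correlation matrix $\Gamma = (\rho_{ij})_{i,j \in N}$ by the formula

$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii} \cdot \sigma_{jj}}} \qquad \text{for } i,j \in N.$$

Note that the situation above occurs whenever $\Sigma$ is regular (= positive definite), and $\Gamma$ is then a positive definite matrix with $\rho_{ii} = 1$ for every $i \in N$.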
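The passage from a covariance matrix to the correlation matrix can be sketched in Python. In the spirit of the convention that a matrix is a function on $N \times N$ for an arbitrary finite index set $N$, the illustration stores a matrix as a dictionary of dictionaries keyed by elements of $N$; this representation and the name `correlation_matrix` are assumptions of the sketch.

```python
import math


def correlation_matrix(cov):
    """Correlation matrix of a covariance matrix with positive diagonal.

    cov is a dict of dicts indexed by a finite set N: cov[i][j] = sigma_ij.
    """
    for i in cov:
        if cov[i][i] <= 0:
            raise ValueError("every variance sigma_ii must be positive")
    return {i: {j: cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in cov}
            for i in cov}
```

For example, the covariance matrix with variances 4 and 9 and covariance 2 over the index set {"a", "b"} yields correlation 1/3 between the two components and ones on the diagonal.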

Multivariate Gaussian distributions

Definition of a general Gaussian measure on $\mathbb{R}^N$ is not straightforward. First, one has to introduce a one-dimensional Gaussian measure $\mathcal{N}(r,s)$ on $\mathbb{R}$ with parameters $r,s \in \mathbb{R}$, $s \geq 0$. In case $s > 0$ one can do so by defining its Radon-Nikodym derivative with respect to Lebesgue measure on $\mathbb{R}$:

$$f(x) = \frac{1}{\sqrt{2\pi s}} \cdot \exp\Big( -\frac{(x-r)^2}{2s} \Big) \qquad \text{for } x \in \mathbb{R}.$$

In case $s = 0$, $\mathcal{N}(r,0)$ is defined as the Borel measure on $\mathbb{R}$ concentrated on $\{r\}$.

Then, supposing $e \in \mathbb{R}^N$ and $\Sigma$ is a positive semidefinite $N \times N$-matrix, $|N| \geq 1$, one can introduce the Gaussian measure $\mathcal{N}(e,\Sigma)$ as a Borel measure $P$ on $\mathbb{R}^N$ such that for every $v \in \mathbb{R}^N$, $P$ induces through the measurable mapping $x \mapsto v^{\top} \cdot x$, $x \in \mathbb{R}^N$, the one-dimensional Gaussian measure $\mathcal{N}(v^{\top} \cdot e,\ v^{\top} \cdot \Sigma \cdot v)$ on $\mathbb{R}$. Let us note that a measure of this kind always exists and is determined uniquely by the requirement above. Moreover, $P$ then has the expectation $e$ and the covariance matrix $\Sigma$. This indicates why the parameters were designed in this way and shows that every positive semidefinite matrix is the covariance matrix of a Gaussian measure.

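Linear transformation of a Gaussian measure $\mathcal{N}(e,\Sigma)$ by a mapping $x \mapsto y + \Lambda \cdot x$, $x \in \mathbb{R}^N$, where $y \in \mathbb{R}^M$, $\Lambda \in \mathbb{R}^{M \times N}$, $|M| \geq 1$, is again a Gaussian measure $\mathcal{N}(y + \Lambda \cdot e,\ \Lambda \cdot \Sigma \cdot \Lambda^{\top})$. In particular, the marginal of a Gaussian measure is again a Gaussian measure:

$$P = \mathcal{N}(e,\Sigma),\ \emptyset \neq A \subseteq N \quad \Rightarrow \quad P^{A} = \mathcal{N}(e_A, \Sigma_{A \cdot A}).$$

Note that this explains the interpretation of $\Sigma_{A \cdot A}$ as a marginal matrix. Very important fact is that independence is characterized by means of the covariance matrix:

$$P = \mathcal{N}(e,\Sigma),\ A,B \subseteq N,\ A \cap B = \emptyset \quad \Rightarrow \quad \big[\, P^{AB} = P^{A} \times P^{B} \ \text{ iff } \ \Sigma_{A \cdot B} = 0 \,\big].$$

In general, the Gaussian measure $\mathcal{N}(e,\Sigma)$ is concentrated on a certain shifted linear subspace, which can be described as follows:

$$\{ x \in \mathbb{R}^N ;\ \forall\, v \in \mathbb{R}^N \quad \Sigma \cdot v = 0 \ \Rightarrow\ v^{\top} \cdot (x - e) = 0 \}.$$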

In case $\Sigma$ is regular, the subspace is the whole $\mathbb{R}^N$ and $P = \mathcal{N}(e,\Sigma)$ can be introduced directly by its Radon-Nikodym derivative with respect to Lebesgue measure on $\mathbb{R}^N$:

$$f(x) = \frac{1}{\sqrt{(2\pi)^{|N|} \cdot \det(\Sigma)}} \cdot \exp\Big( -\frac{(x-e)^{\top} \cdot \Sigma^{-1} \cdot (x-e)}{2} \Big) \qquad \text{for } x \in \mathbb{R}^N.$$

This version of Radon-Nikodym derivative is strictly positive and continuous with respect to Euclidean topology on $\mathbb{R}^N$. Moreover, it is the unique continuous version within the class of all possible versions of Radon-Nikodym derivative of $P$ with respect to Lebesgue measure. This simple fact motivates an implicit convention used commonly in statistical literature: only continuous versions, called densities, are taken into consideration. The convention is in concordance with the usual way of marginalizing since, for $\emptyset \neq A \subset N$, by integrating a continuous density

$$f_A(x) = \int_{X_{N \setminus A}} f(x,y)\, dy \qquad \text{for } x \in \mathbb{R}^A$$

one gets a continuous density again. This also motivates a natural way of defining the (continuous) conditional density for disjoint $A, C \subseteq N$ by the formula

$$f_{A|C}(x|z) = \frac{f_{AC}(xz)}{f_C(z)} \qquad \text{for } x \in \mathbb{R}^A,\ z \in \mathbb{R}^C,$$

where the ratio is zero whenever $f_C(z) = 0$ by convention, and the definition of the conditional measure for every $z \in \mathbb{R}^C$:

$$P_{A|C}(\mathsf{A}|z) = \int_{\mathsf{A}} f_{A|C}(x|z)\, d\lambda_A(x) \qquad \text{for every Borel set } \mathsf{A} \subseteq \mathbb{R}^A,$$

which appears to be a regular version of conditional probability (on $\mathbb{R}^A$) given $C$. Let us emphasize that just the acceptance of the convention above leads to its uniqueness for every $z \in \mathbb{R}^C$. It is again a Gaussian measure:

$$P = \mathcal{N}(e,\Sigma),\ A,C \subseteq N,\ A \cap C = \emptyset,\ A \neq \emptyset \quad \Rightarrow \quad P_{A|C}(\cdot|z) = \mathcal{N}\big( e_A + \Sigma_{A \cdot C} \cdot (\Sigma_{C \cdot C})^{-1} \cdot (z - e_C),\ \Sigma_{A|C} \big)$$

(see Section a.v), sometimes called the conditional Gaussian measure. Important feature is that its covariance matrix $\Sigma_{A|C}$ does not depend on $z$. This maybe explains the meaning of the Schur complement $\Sigma_{A|C}$, sometimes called the conditional covariance matrix.

However, conditioning can be introduced uniquely even in case of a degenerate Gaussian measure for $z \in \mathbb{R}^C$ belonging to the respective shifted linear subspace mentioned above. It is again a Gaussian measure given by the formula above, but $(\Sigma_{C \cdot C})^{-1}$ is replaced by a generalized inverse $(\Sigma_{C \cdot C})^{-}$. As shown in section a.v, this conditional measure does not depend on the choice of generalized inverse.

The last important fact is that in case $\Sigma$ is positive definite the measure $P = \mathcal{N}(e,\Sigma)$ has finite relative entropy with respect to Lebesgue measure $\lambda$ on $\mathbb{R}^N$, namely

$$H(P|\lambda) = -\frac{|N|}{2} \cdot \ln(2\pi) - \frac{|N|}{2} - \frac{1}{2} \cdot \ln(\det(\Sigma))$$

(see section a; note that Rao's entropy is nothing but minus relative entropy).
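The closed form for the relative entropy of a Gaussian measure with respect to Lebesgue measure can be compared with a direct numerical integration in the one-dimensional case. The Python sketch below is only a plausibility check under stated assumptions: dimension one, variance `s > 0`, a trapezoidal rule on a window of twelve standard deviations, and hypothetical function names.

```python
import math


def gaussian_density(x, r, s):
    """Density of the one-dimensional Gaussian measure N(r, s), with s > 0."""
    return math.exp(-(x - r) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)


def entropy_closed_form(det_sigma, dim):
    """-(dim/2) ln(2 pi) - dim/2 - (1/2) ln(det Sigma)."""
    return -dim / 2 * math.log(2 * math.pi) - dim / 2 - 0.5 * math.log(det_sigma)


def entropy_numeric(r, s, steps=20000, width=12.0):
    """Trapezoidal approximation of the integral of f * ln f over the real line."""
    sd = math.sqrt(s)
    lo, hi = r - width * sd, r + width * sd
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        f = gaussian_density(lo + k * h, r, s)
        term = f * math.log(f) if f > 0 else 0.0  # convention 0 * ln 0 = 0
        total += term * (0.5 if k in (0, steps) else 1.0)
    return total * h
```

For instance, `entropy_numeric(1.0, 2.0)` and `entropy_closed_form(2.0, 1)` should agree up to the numerical error of the quadrature. The value is negative here, which is possible because Lebesgue measure is not a probability measure.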

Index

The first part of the Index is the list of notions in alphabetic order. The reference usually indicates the page containing the definition. Several entries are in italics. These entries indicate either concepts from literature not studied in details in this work (they are mentioned without definition only) or vaguely defined concepts. The second part of the Index is the list of elementary items like Figures, Lemmas etc. The third part of the Index is the list of special symbols.

SUBJECT INDEX

A

absolute value |x|
absolutely continuous measures
acyclic
    directed graph
    hypergraph
active route
    in an acyclic directed graph
    in a chain graph
actual dimension
algebra (σ-algebra)
almost everywhere (a.e.)
alternative chain graph
ancestor an_G(A)
ancestral graph
annotated graph
annotation algorithm
membership algorithm
arrow a → b
ascending class D
ascetic extension as(M, N)
atom of a lattice
atomistic lattice
attribute (of a formal context)
augmentation criterion

B

ball U(x, ε)
basis
    basis of a linear subspace
    Hilbert basis H(N)
baricentral imset
dual baricentral imset
block (in a chain graph)
blocked route (path)
    in an acyclic directed graph
    in a chain graph
Borel σ-algebra (sets)
bubble graph

C

canonical
    characteristics of CG measure
    decomposition of an UG model
cardinality |A|
causal input list
causal interpretation
cellular extension ce(M, N)
CG (= conditional Gaussian) measure
    non-degenerate CG measure
CG model (= chain graph model)
chain (in a hybrid graph)
chain graph
child
chordal undirected graph
classic graphs
clique (in an undirected graph)
closed set
    in topological sense
    with respect to a closure operation
closure
    closure operation (system)
    conical closure con(A)
    structural closure cl_{U(N)}
coatom of a lattice
coatomistic lattice
collider node

combinatorial imsets C(N)
complete
  lattice
  metric space
  set of nodes
completeness
  of a graphical criterion
  task (motivation)
complex in a chain graph
complexity of a structural model
comply
  probability measure complies with a structural imset
composition
  of functions denoted by juxtaposition of symbols
  of structural models
  property (axiom) of independence models
concentration
  concentration graph
  concentration matrix
concept lattice
conditional
  conditional contraction of a model M_{T|X}
  conditional covariance matrix Σ_{A|C}
    in Gaussian case
  conditional density
  dependence statement A ⊥̸⊥ B | C [o]
  conditional Gaussian measure (CG measure)
  conditional independence
    for σ-algebras A ⊥⊥ B | C [P]
    for random variables A ⊥⊥ B | C [P]
  independence model M_o
  independence statement A ⊥⊥ B | C [o]
  conditional mutual information
  conditional probability
    general definition
    on X_A given C: P_{A|C}
    regular version
  conditional product
conical closure con(B)
connected
  connected nodes
  connectivity component
consistency task (motivation)
consonant probability measures
continuous
  function
  marginal density
  reference system
  variables
contraction
  of an independence model M_{T|X}
  of a structural imset u
  property (axiom) of independence models
convex
  convex function
  convex cone
coportrait H_u
correlation matrix
countably additive measure
counting measure
covariance
  covariance graph
  covariance matrix
    in Gaussian case
c-separation for chain graphs
cycle

D

DAG model
dashed arrow (line)
decomposable
  model
  undirected graph
decomposition
  of an undirected graph
  property (axiom) of independence models
degenerated
  degenerated measure
  degenerated Gaussian measure
degree of a combinatorial imset deg(u)
dense set
  in a topological sense
  infimum-dense set
  supremum-dense set
density
  terminological remark
  definition
  density of P with respect to μ: dP/dμ
  continuous marginal density
dependence statement A ⊥̸⊥ B | C [o]
descendant

descending
  descending class D↓
  descending route (path)
determinant of a matrix det
determining class
dimension
direct sum of linear subspaces L1 ⊕ L2
directed
  directed cycle
  directed edge (arrow) a → b
  directed graph
discrete
  discrete distance
  discrete measure
  discrete positive measure
  discrete variables
disjoint
  disjoint triplet over N ⟨A, B|C⟩
  disjoint semigraphoid
distance
distribution
  distribution equivalence
  distribution of a random variable
distributive lattice
dominated experiment
dominating measure
d-separation criterion
dual
  dual cone A*
  dual baricentral imset

E

edge
  in a mixed graph
effective domain D*_u
elementary
  imset E(N)
  mode of semigraphoid representation
  triplet (statement)
embedded Bayesian network
empirical multiinformation
empty set ∅
entropy
  entropy function h_P
  perturbated relative entropy H(P | μ, Q)
  relative entropy H(P | μ)
equivalence
  equivalence task (motivation)
  distribution equivalence
  facial equivalence
  factorizable equivalence
  level equivalence
  Markov equivalence
  model equivalence
  parametrization equivalence
  permutable equivalence
  strong equivalence of supermodular functions
essential graph
  for acyclic directed graphs
  for alternative chain graphs
Euclidean
  distance
  topology
evaluated pattern
expectation vector
  in Gaussian case
expansive operations for structural models
  ascetic extension as(M, N)
  cellular extension ce(M, N)
  solid extension so(M, N)
extent of a formal concept
extreme ray R_x

F

face lattice
facial
  facial equivalence u ≈ v
  facial implication u ⇀ v
  strong facial implication ⇝
factorizable equivalence
factorizable measure
  after a class
  w.r.t. an acyclic directed graph
  w.r.t. a chain graph
  w.r.t. an undirected graph
faithfulness task (motivation)
finite measure
fixed-context independence statement
formal concept
formal context
Fubini theorem
functional dependence model
full-conditioned contraction
full-consistent set of relevance statements

G

Galois connection

Gaussian measure
  definition N(e, Σ)
generalized inverse matrix
generated σ-algebra σ(A)
generator, minimal structural
global Markov property
grade gra(N)
graph
  chain graph
  classic graph
  directed graph
  undirected graph
graphoid
greatest
  greatest element of a lattice
  greatest lower bound inf M

H

Hasse diagram
hidden variable
Hilbert basis
homomorphism of lattices
hybrid graph

I

identificator of a set m^{A↑}, m^{A↓}
identifier of level-degree m_l
immorality
implementation task (motivation)
imset
  of the least degree
  represented in (representable)
  visualization
  with the least lower class
  with minimal lower class
incidence relation of a formal context
indecomposable structural model
indicator function χ_A
induced subgraph G_A
infimum inf M
infimum of σ-algebras S ∧ T
infimum-dense set
information-theoretical tools
input list
integers Z (nonnegative Z+)
integrable function
integral (adjective, related to integers)
integral (noun)
intent of a formal concept
interpretability task (motivation)
interpretation, causal
intersection
  of a class D: ⋂D
  property (axiom) of independence models
inverse matrix
  generalized
irreducible
  join-irreducible, meet-irreducible
isomorphism
  isomorphic measurable spaces
  order-isomorphism

J

Jensen's inequality
join x ∨ y
join of σ-algebras S ∨ T
join semilattice
join-irreducible
joint-response chain graphs
junction tree
juxtaposition
  convention
  notation for composition

L

largest chain graph
lattice
  atomistic (coatomistic) lattice
  concept lattice
  face lattice
  lattice homomorphism
lattice conditional independence models
learning task (motivation)
least
  least determining (unimarginal) class
  least element of a lattice
  least upper bound sup M
Lebesgue measure
level
  level equivalence
  level identifier m_l
  level of elementary imset E_l(N)
lift li(M, N, X)
line (undirected edge) a -- b
linear
  linear generator

  linear subspace (combination)
linearly independent
LISREL model
local computation method
local Markov property
lower
  lower class L_u
  lower neighbour
ℓ-skeleton
ℓ-standardization
ℓ-standardized supermodular functions K_ℓ(N)

M

main submatrix Σ_{A·A}
MAG
marginal density of P for A
  continuous
marginal measure
marginal undirected graph G^A
marginally continuous probability measure
Markov equivalence of
  acyclic directed graphs
  chain graphs
  structural imsets
  undirected graphs
Markov network
Markov property
Markovian measure with respect to
  an acyclic directed graph
  a chain graph
  a structural imset
  an undirected graph
matrix multiplication
maximal sets of a class D: D^max
MC graph
measurable
  function
  mapping
  rectangle
  space
measure
  concentrated on a set
  complying with a structural imset
  countably additive (σ-additive)
  degenerate Gaussian
  discrete
  discrete positive
  finite
  Gaussian
  induced by a measurable mapping
  nondegenerate Gaussian
  nonnegative
  positive
  positive CG measure
  σ-finite
meet x ∧ y
  of σ-algebras S ∧ T
meet-irreducible
membership algorithm for annotated graphs
method of local computation
metric space
metrizable topology
minimal determining class
minimal sets of a class D: D^min
minimal structural generator
minor
model equivalence of
  graphs
  skeletal imsets
model
  induced by a structural imset M_u
  produced by a supermodular function M^m
  structural model
  with hidden variables
modular functions L(N)
moment characteristics of a CG measure
moral graph of
  an acyclic directed graph
  a chain graph
moralization criterion for
  acyclic directed graphs
  chain graphs
m-separation criterion
multiinformation
multiinformation function
multiple of a vector x
multiset
multivariate analysis
mutual information

N

natural numbers N

negative domain of an imset D_u^-
negative part
  of a function f^-
  of an imset u^-
neighbour in a finite lattice
  lower, upper
node of a graph
nondecreasing function
nondegenerate Gaussian measure
nonnegative integers Z+
normal distribution
normalization of an imset
null element of a lattice
null matrix (vector)
number of elementary imsets

O

object of discrete mathematics
object of a formal context
open set
order-isomorphic
orthogonal complement L^⊥
o-skeleton
o-standardization

P

pairwise Markov property
PAG
path
parametrization equivalence
parent
partial ancestral graph (PAG)
partial ordering
pattern of
  an acyclic directed graph
  a class of facial equivalence
perfect class of measures
perfectly Markovian measure
permutable equivalence
perturbated relative entropy H(P | μ, Q)
pointed cone
polyhedral cone
portrait
poset
positive definite matrix
positive domain of an imset D_u^+
positive measure
positive CG measure
positive semidefinite matrix
positive part
  of a function f^+
  of an imset u^+
potential
power set P(N)
prime models
prime UG submodels
probability distribution
  terminology
probability measure
  over N
problem of axiomatic characterization
product
  σ-algebra
  formula induced by an imset
  topology
product of
  measurable spaces
  measures
  topological spaces
projection of x onto A: x_A
proper decomposition of an undirected graph
p-separation criterion

Q

quasi-integrable function

R

Radon-Nikodym
  derivative
  theorem
rational numbers Q
rational polyhedral cone
random variable (vector)
range R_u
ray R_x
real numbers R
reciprocal graph
recursive causal graph
recursively factorizable measures
region of a structural imset R_u
regular
  annotated graph
  matrix
  version of conditional probability given A

reference system
  continuous
  standard
  universal
reflection of a skeletal imset
relative entropy H(P | μ)
relevance statement over N
restriction of
  an imset
  an independence model M_T
  a probability measure
ring of subsets
route
  active route in a directed graph
running intersection property

S

sample space
scalar product
  of imsets ⟨m, u⟩
  of vectors ⟨x, y⟩
Schur complement Σ_{A|C}
section of a route
semidefinite matrix
semielementary imset
semigraphoid properties (axioms)
semigraphoid properties for σ-algebras
separable metric space
separation criterion for acyclic directed graphs
separator for decomposable models
separoid
set of variables N
set identifier m^{A↑}, m^{A↓}
simultaneous equation model
singleton
skeletal supermodular function
solid arrow (line)
solid extension so(M, N)
standard imset
  for an acyclic directed graph
  for a triangulated undirected graph
standard reference system for a CG measure
standardization of a supermodular function
strict
  strict ancestor
  strict inclusion
strictly
  strictly convex function
  strictly descending route (path)
strong
  strong completeness of a graphical criterion
  strong equivalence of supermodular functions
  strong facial implication ⇝
  dual strong facial implication
structural independence models U(N)
structural imsets S(N)
structural closure
sub-concept of a formal concept
subgraph, induced
submatrix Σ_{A·B}
  main submatrix
submaximal element of a lattice
subminimal element of a lattice
subset
sum of vectors x + y
summary graph
superactive route in a chain graph
supermodular function
  class of supermodular functions K(N)
superset
supertype of a skeletal imset
supremum sup M
  of σ-algebras S ∨ T
supremum-dense set
symbolic function r
symmetric matrix
symmetry property (axiom) of independence models

T

topology
  product topology
  topology induced by a distance
topological space
transitive acyclic directed graph
transpose of a matrix
triangulated undirected graph
triplet over N (disjoint)

  represented in a structural imset A ⊥⊥ B | C [u]
  represented in a supermodular function A ⊥⊥ B | C [m]
  represented in an undirected graph A ⊥⊥ B | C [G]
trivial independence statements
trivial σ-algebra
triviality property (axiom) of independence models
type of a skeletal imset

U

UG model
unconditioned independence statement
underlying graph
undirected
  edge (line) a -- b
  graph
  route (path)
unimarginal class for a structural model
union of a class D: ⋃D
unit element of a lattice
unit matrix I
universal reference system
upper class for a structural imset U_u
upper neighbour in a lattice
u-skeleton
u-standardization of supermodular functions

V

variables N
  continuous
  discrete
  random variables
vector
  random vector

W

weak union property (axiom) of independence models
width

Z

zero vector

LIST OF ITEMS

Consequences, Conventions, Directions, Examples, Equations, Figures, Lemmas, Observations, Questions, Remarks, Themes, Theorems

THE LIST OF SYMBOLS

Simple conventional symbols

≪    symbol for absolute continuity of measures
⊥⊥, ⊥̸⊥    symbols for conditional independence and dependence
⊗    symbol for conditional product
⊕    symbol for direct sum of linear spaces
∅    symbol for the empty set
≈    symbol for facial equivalence
⇀    symbol for facial implication
∞    symbol for infinity
∫    symbol for Lebesgue integral
∼_m    symbol for level equivalence induced by a skeletal imset m
∧, ∨    symbols for meet (infimum) and join (supremum)
·    symbol for multiplication of matrices, scalar multiple
¬    symbol for negation
symbols for partial ordering
∏, ×    symbols for product in general
\    symbol for set difference, e.g. A \ B
⊆, ⊂    symbols for set inclusion
∪, ⋃, ∩, ⋂    symbols for set union and intersection
⇝    symbol for strong facial implication
0    zero, zero imset
0    zero vector, null matrix

Composite conventional symbols

| |    absolute value, cardinality
⌊ ⌋    lower integer part: ⌊a⌋ = max {z ∈ Z; z ≤ a} for a ∈ R
⌈ ⌉    upper integer part: ⌈a⌉ = min {z ∈ Z; a ≤ z} for a ∈ R
(n over k)    combination number: n! / (k! (n − k)!) for n, k ∈ N, k ≤ n
--    line (undirected edge): a -- b is a line between a and b
→    arrow (directed edge): a → b is an arrow from a to b
( , )    open interval, ordered pair
[ , ]    closed interval, edge in a graph
⟨ , ⟩    scalar product
{ }    set containing elements of the list

Symbols from other languages

the set of attributes of a formal context
χ_A    indicator of a set A
δ_A    identificator of a set A
Δ    generic symbol for a set of discrete variables
Γ    generic symbol for a set of continuous variables
incidence relation of a formal context
λ    occasionally Lebesgue measure
r    generic symbol for a symbolic function
the set of objects of a formal context
π    Ludolf's constant, generic symbol for a permutation
generic symbol for a single class of facial equivalence
the class of measures over N (distribution framework)
the class of Markovian measures with respect to u in the distribution framework
σ- (sigma)    a generic symbol for a countable infinite operation
σ(A)    σ-algebra generated by a class of sets A
Σ^{-1}, Σ^-    inverse and generalized inverse of a matrix Σ
Σ_{A·B}, Σ_{A|B}    submatrix of a matrix Σ, Schur complement
Σ^⊤, v^⊤    transpose of a matrix Σ and of a vector v
generic symbol for a set of classes of facial equivalence
counting measure

Symbols in alphabetic order

−A    the set {−x; x ∈ A}
A^⊥, A*    orthogonal complement of A, dual cone to A
⟨A, B|C⟩    disjoint triplet over N
A ⊥⊥ B | C [o]    conditional independence statement
A ⊥̸⊥ B | C [o]    conditional dependence statement
A ⊥⊥ B | C [P]    conditional independence for σ-algebras
an_G(A)    the set of ancestors of a set A in a graph G
as(M, N)    ascetic extension of a model M to N
C    occasionally the set of cliques of an undirected graph
C(N)    the class of combinatorial imsets over N
ce(M, N)    cellular extension of a model M to N
cl    generic symbol for a closure operation
cl_U(N)    structural closure
D↓, D↑    induced descending and ascending class
D_u^+, D_u^-    positive and negative domain of an imset u
D*_u    the effective domain of a structural imset u
dν/dμ    Radon-Nikodym derivative (density) of ν with respect to μ
deg(u, l), deg(u)    (level-)degree of a combinatorial imset u
det(Σ)    determinant of a matrix Σ
E(N), E_l(N)    the class of elementary imsets over N (of degree l)
exp    the symbol for exponential function
f^+, f^-    positive and negative part of a function f
f_A    generic symbol for marginal density
f^{↓A}    projection of a density f to A
f_{A|C}    generic symbol for conditional density
f_{e,Σ}    density of nondegenerate Gaussian measure N(e, Σ)
G_T    induced subgraph of G for T
G^T    marginal undirected graph
gra(N), gra*(N)    (generalized) grade of the set of variables N
H(N)    Hilbert basis of the cone con(E(N))
H_u    coportrait of a structural imset u
H(P | μ)    relative entropy of P with respect to μ
H(P | μ, Q)    Q-perturbated relative entropy of P with respect to μ
h_P    entropy function
I    unit matrix
⟨i, j|K⟩    elementary triplet over N
i ⊥⊥ j | K    elementary independence statement over N
inf    infimum, the greatest lower bound
K(N)    the class of supermodular functions over N
K_ℓ(N)    the class of ℓ-standardized supermodular functions over N
the ℓ-skeleton
the o-skeleton, the u-skeleton
ℓ    generic symbol for lower standardization
L(N)    the class of modular functions over N
L_u    the lower class of a structural imset u
li(M, N, X)    lift of a model M to N conditioned by X
ln    symbol for natural logarithm
m^{A↑}, m^{A↓}    identificators of supersets and subsets of a set A
m_l    level-degree identifiers
m_ℓ, m_u, m_o    elements of the ℓ-, u- and o-skeleton corresponding to m
m_P    multiinformation function induced by P
m_T    restriction of a supermodular function m to T
special supermodular functions
m_{T|X}    contraction of a function m to T conditioned by X
M^m    independence model produced by a supermodular function m
M_o, M_P, M_G, M_u    independence model induced by o, P, G, u
M_T    restriction of a model M to T
model induced by the contraction of u ∈ S(N) to T
M_{T|X}    contraction of a model M to T conditioned by X
full-conditioned contraction of a model M to T
max, D^max    maximum, the class of maximal sets in D
min, D^min    minimum, the class of minimal sets in D
N    generic symbol for a nonempty finite set of variables
N    the set of natural numbers
N(r, s)    one-dimensional Gaussian measure
N(e, Σ)    Gaussian measure with expectation e and covariance matrix Σ
o    generic symbol for orthogonal standardization
P    generic symbol for a probability measure over N
P^A    marginal of a measure P for a set A
P^A    restriction of P to a σ-algebra A
P(N), P(X)    the power set
P_{A|C}    conditional probability on X_A given C
P_{A|C}    its regular version in Gaussian case
P(B|A)    conditional probability of B given A with respect to P
P-a.e. (μ-a.e.)    almost everywhere with respect to P (μ)
pa_G(b)    the set of parents of a node b in a graph G
Q, Q^n    the set of rational numbers, rational vectors
R, R^n    the set of real numbers, real vectors
R^{P(N)}    the class of real functions on the power set P(N)
R_u    region of a structural imset u
R_u    range of a structural imset u
R_x    ray generated by a vector x
(R, B), (R^n, B^n)    the space of real numbers (vectors) with Borel σ-algebra
S    occasionally the class of separators of a triangulated graph
S(N)    the class of structural imsets over N
the class of representable structural imsets over N
so(M, N)    solid extension of a model M to N
sup    supremum, the least upper bound
T(N)    the class of disjoint triplets over N
the classes of trivial and elementary triplets
u    generic symbol for upper standardization
u^+, u^-    positive and negative part of an imset u
u_G, u_H    standard imsets for graphs G and H
u_T    restriction of an imset u to T
contraction of a structural imset u to T
u_⟨A,B|C⟩, u_⟨i,j|K⟩    (semi-)elementary imset
U_u    the upper class of a structural imset u
U(N)    the class of structural models over N
U(x, ε)    ball with center x and radius ε
W(N)    the class of coportraits over N
x_A    projection of a configuration x onto a set A
X_A    generic symbol for a sample space for A
coordinate σ-algebra for A, its isomorphic representation
(X, X), (X_A, X_A)    generic symbol for a measurable space (conventional notation)
trivial σ-algebra
Galois connection
Z, Z+    the set of integers, the set of nonnegative integers
Z^{P(N)}    the class of imsets over N

Bibliography

M. Aigner: Combinatorial Theory, Springer-Verlag.

Z. An, D. A. Bell, J. G. Hughes: On the axiomatization of conditional independence, Kybernetes.

J. Anděl: Mathematical Statistics (in Czech), SNTL, Prague.

S. A. Andersson, M. D. Perlman: Lattice models for conditional independence in multivariate normal distributions, Annals of Statistics.

S. A. Andersson, D. Madigan, M. D. Perlman: A characterization of Markov equivalence classes for acyclic digraphs, Annals of Statistics.

S. A. Andersson, D. Madigan, M. D. Perlman, C. M. Triggs: A graphical characterization of lattice conditional independence models, Annals of Mathematics and Artificial Intelligence.

S. A. Andersson, D. Madigan, M. D. Perlman: Alternative Markov properties for chain graphs, Scandinavian Journal of Statistics.

G. Birkhoff: Lattice Theory, AMS Colloquium Publications.

P. Boček: GENERATOR, a computer program, Institute of Information Theory and Automation, March.

R. R. Bouckaert: IDAGs: a perfect map for any distribution, in Symbolic and Quantitative Approaches to Reasoning and Uncertainty (M. Clarke, R. Kruse, S. Moral, eds.), Lecture Notes in Computer Science, Springer-Verlag.

R. R. Bouckaert, M. Studený: Chain graphs: semantics and expressiveness, in Symbolic and Quantitative Approaches to Reasoning and Uncertainty (Ch. Froidevaux, J. Kohlas, eds.), Lecture Notes in Artificial Intelligence, Springer-Verlag.

A. Brøndsted: An Introduction to Convex Polytopes, Springer-Verlag (Russian translation: Mir).

L. M. de Campos, J. F. Huete, S. Moral: Probability intervals, a tool for uncertain reasoning, a technical report DECSAI, July, University of Granada.

L. M. de Campos: Independency relationships in possibility theory and their application to learning belief networks, in Mathematical and Statistical Methods in Artificial Intelligence (G. Della Riccia, R. Kruse, R. Viertl, eds.), Springer-Verlag.

L. M. de Campos: Characterization of decomposable dependency models, Journal of Artificial Intelligence Research.

D. M. Chickering: A transformational characterization of equivalent Bayesian network structures, in Uncertainty in Artificial Intelligence (P. Besnard, S. Hanks, eds.), Morgan Kaufmann.

R. G. Cowell, A. P. Dawid, S. L. Lauritzen, D. J. Spiegelhalter: Probabilistic Networks and Expert Systems, Springer-Verlag.

D. R. Cox, N. Wermuth: Multivariate Dependencies: Models, Analysis and Interpretation, Chapman and Hall.

A. P. Dawid: Conditional independence in statistical theory, Journal of the Royal Statistical Society, series B.

A. P. Dawid: Conditional independence, in Encyclopedia of Statistical Science Update (S. Kotz, C. B. Read, D. L. Banks, eds.), John Wiley.

A. P. Dawid, M. Studený: Conditional products: an alternative approach to conditional independence, in Artificial Intelligence and Statistics, Proceedings of the Workshop (D. Heckerman, J. Whittaker, eds.), Morgan Kaufmann.

A. P. Dawid: Separoids: a general framework for conditional independence and irrelevance, submitted to Annals of Mathematics and Artificial Intelligence.

M. Fiedler: Special Matrices and Their Use in Numerical Mathematics (in Czech), SNTL, Prague.

J.-P. Florens, M. Mouchart, J.-M. Rolin: Elements of Bayesian Statistics, Marcel Dekker.

M. Frydenberg: The chain graph Markov property, Scandinavian Journal of Statistics.

M. Frydenberg: Marginalization and collapsibility in graphical interaction models, Annals of Statistics.

L. C. van der Gaag, J.-J. Ch. Meyer: Informational independence: models and normal forms, International Journal of Intelligent Systems.

B. Ganter, R. Wille: Formal Concept Analysis: Mathematical Foundations, Springer-Verlag.

D. Geiger: Towards the formalization of informational dependencies, technical report CSD R, UCLA, Los Angeles, December.

D. Geiger, T. Verma, J. Pearl: Identifying independence in Bayesian networks, Networks.

D. Geiger, J. Pearl: On the logic of causal models, in Uncertainty in Artificial Intelligence (R. D. Shachter, T. S. Levitt, L. N. Kanal, J. F. Lemmer, eds.), North-Holland.

D. Geiger, A. Paz, J. Pearl: Axioms and algorithms for inferences involving probabilistic independence, Information and Computation.

D. Geiger, J. Pearl: Logical and algorithmic properties of conditional independence and graphical models, Annals of Statistics.

D. Geiger, A. Paz, J. Pearl: On testing whether an embedded Bayesian network represents a probability model, in Uncertainty in Artificial Intelligence (R. L. de Mantaras, D. Poole, eds.), Morgan Kaufmann.

P. Hájek, T. Havránek, R. Jiroušek: Uncertain Information Processing in Expert Systems, CRC Press.

R. A. Horn, Ch. R. Johnson: Matrix Analysis, Cambridge University Press (Russian translation: Mir).

R. Jiroušek: Solution of the marginal problem and decomposable distributions, Kybernetika.

K. G. Jöreskog, D. Sörbom: LISREL: A Guide to the Program and Application, SPSS Inc.

G. Kauermann: On a dualization of graphical Gaussian models, Scandinavian Journal of Statistics.

H. G. Kellerer: Verteilungsfunktionen mit gegebenen Marginalverteilungen (in German), Z. Wahrscheinlichkeitstheorie.

H. Kiiveri, T. P. Speed, J. B. Carlin: Recursive causal models, Journal of Australian Mathematical Society, series A.

T. Kočka, R. R. Bouckaert, M. Studený: On the inclusion problem, research report, Institute of Information Theory and Automation, Prague, February.

J. T. A. Koster: Gibbs factorization and the Markov property, unpublished manuscript.

J. T. A. Koster: Markov properties of nonrecursive causal models, Annals of Statistics.

J. T. A. Koster: Marginalizing and conditioning in graphical models, submitted to Bernoulli.

I. Kramosil: A note on nonaxiomatizability of independence relations generated by certain probabilistic structures, Kybernetika.

S. L. Lauritzen, N. Wermuth: Mixed interaction models, research report R, Inst. Elec. Sys., University of Aalborg.

Note that this report was later modified and became a basis of the paper

S. L. Lauritzen, T. P. Speed, K. Vijayan: Decomposable graphs and hypergraphs, Journal of Australian Mathematical Society A.

S. L. Lauritzen, D. J. Spiegelhalter: Local computation with probabilities on graphical structures and their application to expert systems, Journal of the Royal Statistical Society, series B.

S. L. Lauritzen, N. Wermuth: Graphical models for associations between variables, some of which are qualitative and some quantitative, Annals of Statistics.

S. L. Lauritzen: Mixed graphical association models, Scandinavian Journal of Statistics.

S. L. Lauritzen, A. P. Dawid, B. N. Larsen, H.-G. Leimer: Independence properties of directed Markov fields, Networks.

S. L. Lauritzen: Graphical Models, Clarendon Press.

M. Levitz, M. D. Perlman, D. Madigan: Separation and completeness properties for AMP chain graph Markov models, submitted to Annals of Statistics.

M. Loève: Probability Theory, Foundations, Random Processes, D. van Nostrand.

F. M. Malvestuto: A unique formal system for binary decomposition of database relations, probability distributions and graphs, Information Sciences.

F. M. Malvestuto, M. Studený: Comment on "A unique formal ... graphs", Information Sciences.

F. M. Malvestuto: Theory of random observables in relational data bases, Information Systems.

J. L. Massey: Causal interpretation of random variables (in Russian), Problemy Peredachi Informatsii.

F. Matúš: Ascending and descending conditional independence relations, in Information Theory, Statistical Decision Functions and Random Processes, Transactions of the Prague Conference, vol. B (S. Kubík, J. Á. Víšek, eds.), Kluwer.

F. Matúš: On equivalence of Markov properties over undirected graphs, Journal of Applied Probability.

F. Matúš: Probabilistic conditional independence structures and matroid theory: backgrounds, International Journal of General Systems.

F. Matúš: Stochastic independence, algebraic independence and abstract connectedness, Theoretical Computer Science A.

F. Matúš: On the maximum-entropy extensions of probability measures over undirected graphs, in Proceedings of WUPES, September, Třešť, Czech Republic.

F. Matúš, M. Studený: Conditional independences among four random variables I, Combinatorics, Probability and Computing.

F. Matúš: Conditional independences among four random variables II, Combinatorics, Probability and Computing.

F. Matúš: Conditional independence structures examined via minors, Annals of Mathematics and Artificial Intelligence.

F. Matúš: Conditional independences among four random variables III: final conclusion, Combinatorics, Probability and Computing.

Ch Meek Causal inference and causal explanation with background knowlewdge in

Uncertainty in Articial Intelligence P Besnard S Hanks eds Morgan Kauf

mann pp

Ch Meek Strong completeness and faithfulness in Bayesian networks in Uncertainty in Artificial Intelligence P Besnard S Hanks eds Morgan Kaufmann pp

E Mendelson Introduction to Mathematical Logic nd edition D van Nostrand

M Mouchart J-M Rolin A note on conditional independence with statistical applications Statistica n pp

M Mouchart J-M Rolin Letter to the editor Statistica n pp

J Moussouris Gibbs and Markov properties over undirected graphs Journal of Statistical Physics n pp

J Neveu Bases Mathématiques du Calcul des Probabilités (in French) Masson et Cie

A Paz A fast version of Lauritzen's algorithm for checking representation of independencies in Proceedings of th Workshop on Uncertainty Processing WUPES June J Hradec Czech Republic p

A Paz R Y Geva M Studený Representation of irrelevance relations by annotated graphs Fundamenta Informaticae pp

J Pearl A Paz Graphoids graph-based logic for reasoning about relevance relations in Advances in Artificial Intelligence II B Du Boulay D Hogg L Steels eds North-Holland pp

J Pearl Probabilistic Reasoning in Intelligent Systems Networks of Plausible Inference Morgan Kaufmann

A Perez ε-admissible simplifications of the dependence structure of a set of random variables Kybernetika pp

M D Perlman L Wu Lattice conditional independence models for contingency tables with nonmonotone missing data pattern Journal of Statistical Planning pp

C van Putten J H van Schuppen Invariance properties of conditional independence relation Annals of Probability n pp

Note that by the majority of the results of this paper is almost identical with the results of

C R Rao Linear Statistical Inference and Its Applications John Wiley

T S Richardson A polynomial-time algorithm for deciding Markov equivalence of directed cyclic graphical models in Uncertainty in Artificial Intelligence E Horvitz F Jensen eds Morgan Kaufmann pp

T S Richardson A discovery algorithm for directed cyclic graphs in Uncertainty in Artificial Intelligence E Horvitz F Jensen eds Morgan Kaufmann pp

T S Richardson P Spirtes Ancestral graph Markov models report n Department of Statistics University of Washington Seattle submitted to Annals of Statistics

J Rosenmüller H G Weidner Extreme convex set functions with finite carrier general theory Discrete Mathematics n pp

W Rudin Real and Complex Analysis McGraw-Hill

Y Sagiv S F Walecka Subset dependencies and completeness result for a subclass of embedded multivalued dependencies Journal of Association for Computing Machinery n pp

G Shafer Probabilistic Expert Systems CBMS-NSF Regional Conference Series in Applied Mathematics SIAM

A Schrijver Theory of Linear and Integer Programming John Wiley

L S Shapley Cores of convex games International Journal of Game Theory pp

P P Shenoy Conditional independence in valuation-based systems International Journal of Approximate Reasoning n pp

P P Shenoy Representing conditional independence relations by valuated networks International Journal of Uncertainty Fuzziness and Knowledge-Based Systems pp

P Spirtes C Glymour R Scheines Causation Prediction and Search Lecture Notes in Statistics Springer-Verlag

P Spirtes Directed cyclic graphical representations of feedback models in Uncertainty in Artificial Intelligence P Besnard S Hanks eds Morgan Kaufmann pp

W Spohn Stochastic independence causal independence and shieldability Journal of Philosophical Logic n pp

W Spohn On the properties of conditional independence in Patrick Suppes Scientific Philosopher vol Probability and Probabilistic Causality P Humphreys ed Kluwer pp

J Štěpán Probability Theory Mathematical Foundations (in Czech) Academia Prague

M Studený Asymptotic behaviour of empirical multiinformation Kybernetika n pp

M Studený Multiinformation and the problem of characterization of conditional independence relations Problems of Control and Information Theory n pp

M Studený Convex set functions I and II research reports n and n Institute of Information Theory and Automation Prague November

M Studený Conditional independence relations have no finite complete characterization in Information Theory Statistical Decision Functions and Random Processes Transactions of th Prague Conference vol B S Kubík J Á Víšek eds Kluwer pp

M Studený Multiinformation and conditional independence II research report n Institute of Information Theory and Automation Prague September

M Studený Formal properties of conditional independence in different calculi of AI in Symbolic and Quantitative Approaches to Reasoning and Uncertainty M Clarke R Kruse S Moral eds Lecture Notes in Computer Science Springer-Verlag pp

M Studený Convex cones in finite-dimensional real vector spaces Kybernetika n pp

M Studený Structural semigraphoids International Journal of General Systems n pp

M Studený P Boček CI-models arising among random variables in Proceedings of WUPES September Třešť Czech Republic pp

M Studený Description of structures of conditional stochastic independence by means of faces and imsets a series of papers International Journal of General Systems n pp

M Studený Semigraphoids and structures of probabilistic conditional independence Annals of Mathematics and Artificial Intelligence n pp

M Studený A recovery algorithm for chain graphs International Journal of Approximate Reasoning n pp

M Studený On marginalization collapsibility and precollapsibility in Distributions with Given Marginals and Moment Problems V Beneš J Štěpán eds Kluwer pp

M Studený R R Bouckaert On chain graph models for description of conditional independence structures Annals of Statistics n pp

M Studený Bayesian networks from the point of view of chain graphs in Uncertainty in Artificial Intelligence G F Cooper S Moral eds Morgan Kaufmann pp

M Studený Complexity of structural models in Prague Stochastics August Prague pp

M Studený J Vejnarová The multiinformation function as a tool for measuring stochastic dependence in Learning in Graphical Models M I Jordan ed Kluwer pp

M Studený R R Bouckaert T Kočka Extreme supermodular set functions over five variables research report n Institute of Information Theory and Automation Prague January

J Vejnarová Conditional independence in possibility theory in Proceedings of ISIPTA st International Symposium on Imprecise Probabilities and their Applications G de Cooman F G Cozman S Moral P Walley eds pp

I Vajda Theory of Statistical Inference and Information Kluwer

T Verma J Pearl Causal networks semantics and expressiveness in Uncertainty in Artificial Intelligence R D Shachter T S Levitt L N Kanal and J F Lemmer eds North-Holland pp

T Verma J Pearl Equivalence and synthesis of causal models in Uncertainty in Artificial Intelligence P P Bonissone M Henrion L N Kanal J F Lemmer eds Elsevier pp

T Verma J Pearl An algorithm for deciding if a set of observed independencies has a causal explanation in Uncertainty in Artificial Intelligence D Dubois M P Wellman B D'Ambrosio P Smets eds Morgan Kaufmann pp

M Volf M Studený A graphical characterization of the largest chain graphs International Journal of Approximate Reasoning n pp

S Watanabe Information theoretical analysis of multivariate correlation IBM Journal of Research and Development pp

J Whittaker Graphical Models in Applied Multivariate Statistics John Wiley

Y Xiang S K M Wong N Cercone Critical remarks on single link search in learning belief networks in Uncertainty in Artificial Intelligence E Horvitz F Jensen eds Morgan Kaufmann pp

Acknowledgements

I thank my parents for their patience and for their all-round help and support. My niece Petra will certainly be pleased to read that I appreciate her moral support.

Many people deserve my thanks for their help with this work. In particular, I would like to thank Marie Kolarova for typing the text of this work in LaTeX. As concerns expert help, I am indebted to my colleagues and former coauthors František Matúš and Phil Dawid for their remarks (even for some critical ones made by Fero), various advice, pointers to literature and discussion, which helped me clarify my view on the topic. I have also profited from cooperation with other colleagues: some results presented in this work were achieved with the help of computer programs written by Pavel Boček, Remco Bouckaert, Tomáš Kočka, Martin Volf and Jiří Vomlel. Moreover, I am indebted to my colleagues Radim Jiroušek, Otakar Kříž and Jiřina Vejnarová for encouragement to write this work, which was quite important for me. The cooperation with my colleagues mentioned above involved joint theoretical research as well. As concerns technical help, I would like to thank Václav Kelar for making special LaTeX fonts for me and Jarmila Pankova for helping me to print several pages with coloured pictures.

Finally, I am also indebted to other colleagues all over the world whose papers or books inspired me somehow in connection with this work. In particular, I would like to mention my PhD supervisor Albert Perez. However, besides those who were mentioned above, many others influenced me. Let me name some of them: Steen Andersson, Luis de Campos, Robert Cowell, David Cox, Morten Frydenberg, Dan Geiger, Tomáš Havránek, Jan Koster, Ivan Kramosil, Steffen Lauritzen, Franco Malvestuto, Michel Mouchart, Chris Meek, Azaria Paz, Judea Pearl, Michael Perlman, Jean-Marie Rolin, Thomas Richardson, Jim Smith, Glenn Shafer, Prakash Shenoy, David Spiegelhalter, Peter Spirtes, Wolfgang Spohn, Nanny Wermuth, Joe Whittaker, Raymond Yeung and Zhen Zhang. The above list is not exhaustive; I apologize to those who were possibly omitted.

Let me emphasize that I profited from meeting a lot of colleagues who gave me inspiration during the seminar "Conditional Independence Structures" held from September to October in the Fields Institute for Research in Mathematical Sciences, Toronto, Canada, and during several events organized within the framework of the ESF program "Highly Structured Stochastic Systems". In particular, let me thank Hélène Massam and Steffen Lauritzen, who gave me a chance to participate actively in these wonderful events. For example, I remember the stimulating atmosphere of the HSSS research kitchen "Learning conditional independence models" held in Třešť, Czech Republic, in October.