ABOUT ISING AND POTTS MODELS ON CAYLEY TREES AND BAYESIAN NETWORKS

GERARD COHEN TERVAERT, SUPERVISED BY A. VAN ENTER, AND D. VALESIN

Master Project

March 2019

Student: G.D. Cohen Tervaert

First supervisor: Prof. Dr. A. C. D. van Enter

Second supervisor: Dr. D. Valesin


Abstract. We perform numerical estimates for the q-state Potts model on a Cayley tree of order k. Külske, Rozikov and Khakimov explicitly calculated up to 2^q − 1 TISGMs (Translation Invariant Splitting Gibbs Measures) for the binary tree (k = 2) without an external field (α = 0). We extend these results numerically to k > 2 and α ≠ 0. We conjecture that for α ≥ k − 1 the model has uniqueness. Additionally, decay of memory is proved for a Potts-type model on a Bayesian network in which every variable has at most two parents, and a counterexample is given for a more general case.

Contents
1. Introduction
2. Cayley tree
2.1. Definitions
3. Ising model
3.1. Definitions
3.2. Phase transitions
3.3. Compatible measures
3.4. Numerical example
4. Potts model
4.1. Definition
4.2. Compatible measures
4.3. Uncountable set of Gibbs measures
4.4. Translation invariant Gibbs measures
4.5. Binary tree explicit solutions
4.6. Higher-order trees
4.7. External fields
4.8. Dobrushin's condition
5. Bayesian Networks
5.1. Definition
5.2. Excluding remote generations
6. Decay of memory on Bayesian networks
6.1. Ising-type model
6.2. Potts-type model
6.3. Counterexample for a general discrete model
7. Discussion and Conclusion
References
Internet
Applications

1. Introduction

In this thesis, several systems of random variables will be discussed. The theory of "interacting random variables" is a domain of probability theory that is gaining popularity. The goal of this theory is to understand the behaviour of large random systems. Applications are mainly in statistical physics, but also in other fields such as computer science or biology. In statistical mechanics, the concept of a 'phase transition' is often relevant. Phase transitions are sudden shifts of the state which can occur at certain temperatures, for example matter changing from the liquid to the solid state. These phase transitions have been extensively studied for the Ising model on a Cayley tree [3]; for the Potts model on a Cayley tree there is still some ground to explore. The Ising model is used to describe ferromagnetism in statistical mechanics. The Potts model is a generalisation of the Ising model in which the spins can take up to q different states, whereas in the Ising model they can only take the values −1 and +1. These models consist of networks of interacting random variables. They will be described first as Gibbs measures on Cayley trees and later as Bayesian networks to simulate inter-dependencies.

In this thesis two topics will be investigated. First, we will describe the Ising and the Potts model on a Cayley tree. Results from the literature will be presented and extended with our own numerical results. Then, we will discuss decay of memory on Bayesian networks. We will present results from our efforts to set conditions for decay when all variables have two parents or fewer; that is, earlier results for the Ising model are generalised to the Potts model and a counterexample for decay is given for a more general case. Throughout the thesis, computer simulations will be added for both the Ising and the Potts model to show the dynamics of these models. For readability's sake, we try not to complicate formulas unnecessarily and give some explanation where appropriate (e.g. we omit variables that are not relevant for the calculations, and we expand on the transition from h̃ ∈ ℝ^q to h ∈ ℝ^{q−1} in the Potts model). This work extends the main theorem of my Bachelor thesis to a Potts-type model.

2. Cayley tree

A tree in which each non-leaf vertex has a constant number n of branches is called an (n − 1)-Cayley tree; 1-Cayley trees are path graphs. (The picture on the right shows a 2-Cayley tree.) Both the Ising model and the Potts model will be defined as a collection of random variables on a Cayley tree.

2.1. Definitions. Let Γ^k = (V, L) be the Cayley tree of order k, which is the (k + 1)-regular infinite tree. V is the set of vertices and L the set of edges. Two vertices x, y ∈ V are called nearest neighbours if there exists an edge l = ⟨x, y⟩ ∈ L. A path from x_0 ∈ V to x_n ∈ V is a collection of nearest-neighbour pairs ⟨x_0, x_1⟩, ⟨x_1, x_2⟩, ..., ⟨x_{n−1}, x_n⟩. The distance d(x_0, x_n) = n is the number of edges of the shortest path from x_0 to x_n. For a fixed arbitrary vertex x^0 ∈ V, called the root, we set

W_n = {x ∈ V : d(x, x^0) = n},  V_n = ∪_{i=0}^{n} W_i

The set of direct successors of x ∈ Wn will be defined as:

S(x) = {y ∈ W_{n+1} : d(x, y) = 1}

Ln will be the subset of edges restricted to Vn:

L_n = {⟨x, y⟩ ∈ L : x ∈ V_n and y ∈ V_n}
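The objects defined above are easy to generate explicitly. A minimal Python sketch (our own code and naming, not part of the thesis's applications) represents each vertex by its sequence of edge choices from the root and builds the generations W_n and the volume V_n:

```python
def generations(k, n):
    """Generations W_0, ..., W_n of the Cayley tree of order k.

    The root x^0 is the empty tuple; it has k + 1 neighbours, and every
    other vertex has k further neighbours away from the root, so a vertex
    is identified with the sequence of edge choices leading to it.
    """
    W = [[()], [(i,) for i in range(k + 1)]]
    for depth in range(2, n + 1):
        W.append([x + (i,) for x in W[depth - 1] for i in range(k)])
    return W[: n + 1]

def successors(x, k):
    """Direct successors S(x): the vertices one generation further from the root."""
    branches = k + 1 if x == () else k
    return [x + (i,) for i in range(branches)]

W = generations(k=2, n=3)
V2 = [x for Wn in W[:3] for x in Wn]   # V_2 = W_0 ∪ W_1 ∪ W_2
print([len(Wn) for Wn in W], len(V2))  # [1, 3, 6, 12] 10
```

For k = 2 this reproduces the generation sizes |W_0| = 1, |W_1| = k + 1 = 3 and |W_n| = (k + 1)k^{n−1} thereafter.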

3. Ising model

The Ising model is named after the German physicist Ernst Ising. It describes ferromagnetism and consists of a collection of interdependent random variables, or spins, that can be in one of two states (+1 or −1). The model allows the identification of phase transitions. The one-dimensional Ising model has no phase transition and was solved by Ising himself in his 1924 thesis. The two-dimensional square-lattice Ising model is a simple statistical model that does show a phase transition.

3.1. Definitions. Let Γ^k = (V, L) be the infinite Cayley tree of order k, where V is the set of vertices and L is the set of edges. The state space is X = {−1, 1} and the set of configurations is defined as Ω = X^V. For Λ ⊂ V and σ ∈ Ω we write σ_Λ = {σ_λ : λ ∈ Λ}. Let J > 0. The Hamiltonian of the ferromagnetic Ising model on the volume V_n is defined as:

H_n^0(σ) = −J Σ_{⟨x,y⟩∈L_n} σ_x σ_y

where ⟨x, y⟩ is summed over all pairs of nearest-neighbour vertices. Introducing the boundary condition η = {η_i ∈ ℝ : i ∈ V} on the volume V_n yields:

H_n^η(σ) = −J Σ_{⟨x,y⟩∈L_n} σ_x σ_y − J Σ_{⟨x,y⟩∈L_{n+1}, x∈W_n, y∈W_{n+1}} σ_x η_y

We say that η_x = 1 for all x ∈ V is the plus boundary condition, η_x = −1 for all x ∈ V is the minus boundary condition and η_x = 0 for all x ∈ V is the free boundary condition. Next, we add possibly spatially dependent external fields h̄ = {h_i ∈ ℝ : i ∈ ℕ}:

H_{n,h̄}^η(σ) = H_n^η(σ) − Σ_{k=0}^{n} h_k Σ_{x∈W_k} σ_x

If we add the inverse temperature β > 0, we can define the Gibbs measure for the Ising model on a finite Cayley tree with boundary condition η and spatially dependent external fields h̄ by

(3.1) μ_{n,β,h̄}^η(σ) = exp{−β H_{n,h̄}^η(σ)} / Z_{n,β,h̄}^η

where Z_{n,β,h̄}^η is the partition function:

Z_{n,β,h̄}^η = Σ_{σ∈Ω_{V_n}} exp{−β H_{n,h̄}^η(σ)}

The set of Gibbs measures G_{β,h̄} is defined as the convex hull of the set of all weak limits of the Gibbs measures on the volumes V_n. All measures that correspond to some sequence of boundary conditions belong to the set:

G_{β,h̄} = conv{ μ : lim_{n→∞} μ_{m_n,β,h̄}^{η_{m_n}}(σ) = μ, for all increasing sequences (m_n)_{n≥1} and boundary conditions (η_n)_{n≥1} }

The model is said to undergo a phase transition if there exists a β for which |G_{β,h̄}| > 1. The existence of a phase transition is equivalent to the statement that μ_{β,h̄}^+ ≠ μ_{β,h̄}^−, where μ_{β,h̄}^+ = lim_{n→∞} μ_{n,β,h̄}^{η^+} is the limit measure of the plus boundary condition (η_{n,x}^+ = 1 for all n ∈ ℕ, x ∈ V) and μ_{β,h̄}^− = lim_{n→∞} μ_{n,β,h̄}^{η^−} is the limit measure of the minus boundary condition (η_{n,x}^− = −1 for all n ∈ ℕ, x ∈ V). The model is said to have uniqueness at inverse temperature β if |G_{β,h̄}| = 1.

3.2. Phase transitions. On ℤ^k, a homogeneous external field h_n = h ∈ ℝ\{0} for all n ≥ 0 implies uniqueness, while the absence of an external field implies the appearance of a phase transition for k > 1 at sufficiently low temperature. On a Cayley tree, however, phase transitions can also appear if the non-zero external fields are weak enough.

3.3. Compatible measures. Probability measures as defined in (3.1) are called compatible if for all n ≥ 1 and σ ∈ Ω_{V_{n−1}}:

(3.2) Σ_{ω∈Ω_{W_n}} μ_n(σ ∨ ω) = μ_{n−1}(σ)

Here σ ∨ ω represents the concatenation of the configurations σ and ω. For each sequence of compatible measures, Kolmogorov's theorem states that there exists a unique measure μ such that for all n and σ_{V_n} ∈ Ω_{V_n}:

μ(σ|_{V_n} = σ_{V_n}) = μ_n(σ_{V_n})

Such a measure μ is called a splitting Gibbs measure. If the boundary conditions only depend on the distance from the root, η_x = η_n with n = d(x, x^0) for all x ∈ V, a recurrence relation between the boundary conditions and the external fields can be deduced from equations (3.1) and (3.2):

(3.3) η_n = h_{n−1} + k F(η_{n+1}, θ), where θ = tanh(βJ), F(x, θ) = arctanh(θ tanh x) and n ≥ 2.

For the homogeneous field case, h_n = h ∈ ℝ, the recurrence relation simplifies to a fixed-point equation for η_n = η* ∈ ℝ:

η* = h + k F(η*, θ) =: ψ_h(η*)

where ψ_h(x) = h + k F(x, θ). We know that, assuming translation invariance, there exists a β_c(k) = arctanh(1/k)/J (so that θ = 1/k at β = β_c) and, for β > β_c, an h_c(β, k) > 0 such that:

• If β ≤ β_c(k) or |h| > h_c(β, k), there is uniqueness: the function ψ_h(x) has exactly one fixed point.
• If β > β_c(k) and |h| < h_c(β, k), the model undergoes a phase transition: the function ψ_h(x) has exactly three fixed points.
• If β > β_c(k) and |h| = h_c(β, k), the model undergoes a phase transition: the function ψ_h(x) has exactly two fixed points.

For the heterogeneous case asymptotically approaching a homogeneous critical external field, h_n = −h_c − ε_n with ε_n > 0, the following condition is equivalent to the existence of a phase transition [2]:

lim_{n→∞} Σ_{j=1}^{n} ( Σ_{i=j}^{n} ε_i )² < ∞

Next, we will show how compatible measures are calculated for a concrete example.

3.4. Numerical example. Suppose we want to calculate the exact measure μ_2 with compatible μ_1 and μ_0 for k = 2, h = 0, J = 1 and β = 1 with the plus boundary condition.

We assign the variables to the vertices of a rooted Cayley tree. In a rooted Cayley tree the root vertex has k edges and all other vertices have k + 1 edges. This is preferred over the normal Cayley tree because the recurrence relation (3.3) then holds for all n ≥ 1. We set η_3 = 1 for the plus boundary condition. Application A calculates the following output:

Z_2 = Σ_{σ∈X^{V_2}} exp(−H_{2,0}^{η_3}(σ)) = 25145.12,  |X^{V_2}| = 128,  |V_2| = 7

μ_2(σ_0 | η_3) = (μ_2(σ_0 = 1 | η_3), μ_2(σ_0 = −1 | η_3)) = (0.96003, 0.03997)

To construct a compatible measure μ_1, we use equation (3.3) to set η_2 = 2F(1, tanh(1)) = 2 arctanh(tanh(1) tanh(1)) = 1.325:

Z_1 = Σ_{σ∈X^{V_1}} exp(−H_{1,0}^{η_2}(σ)) = 111.03,  |X^{V_1}| = 8,  |V_1| = 3

µ1(σ0|η2) = (0.96003, 0.03997)

Indeed these measures assign equal probabilities to all σ ∈ X^{V_1}. For μ_0, we set η_1 = 2F(1.325, tanh(1)) = 2 arctanh(tanh(1) tanh(1.325)) = 1.58946:

Z_0 = Σ_{σ∈X^{V_0}} exp(−H_{0,0}^{η_1}(σ)) = 5.10515,  |X^{V_0}| = 2,  |V_0| = 1

µ0(σ0|η1) = (0.96003, 0.03997)

Hence, with the recurrence relation (3.3) for the boundary conditions (η_n), it is possible to calculate probabilities for the root vertex for large values of n, whereas exact computer calculations are limited to small n because the number of configurations of the model grows double-exponentially.
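The numbers in this example can be verified by brute force. A minimal sketch (our own code, not the thesis's Application A; the vertex indexing is ours) that enumerates all 2^7 configurations of the rooted binary tree of depth 2, with β = J = 1 and the plus boundary condition on the leaves:

```python
from itertools import product
from math import exp

# Rooted binary tree of depth 2: vertex 0 is the root, 1-2 its children,
# 3-6 the leaves (generation W_2). The plus boundary condition puts a
# value +1 beyond every leaf.
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
leaves = [3, 4, 5, 6]
beta = J = 1.0

Z = 0.0
p_root_plus = 0.0
for sigma in product([-1, 1], repeat=7):
    # -H = J * (edge terms + boundary terms), so the weight is exp(-beta*H)
    energy = J * sum(sigma[x] * sigma[y] for x, y in edges)
    energy += J * sum(sigma[x] * 1 for x in leaves)   # eta = +1 on the boundary
    w = exp(beta * energy)
    Z += w
    if sigma[0] == 1:
        p_root_plus += w
p_root_plus /= Z
print(round(Z, 2), round(p_root_plus, 5))   # ≈ 25145.12 and 0.96003
```

The enumeration reproduces both the partition function Z_2 and the root marginal μ_2(σ_0 = 1) reported above, confirming the recurrence-based calculation.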

4. Potts model

The Potts model is named after the Australian mathematician Renfrey Burnard Potts, who introduced the model near the end of his 1951 thesis at the suggestion of his supervisor Cyril Domb. It is a generalisation of the Ising model in the sense that the spins can take q different values instead of 2; the 2-state Potts model is equivalent to the Ising model. Configurations where adjacent spins take the same value have an increased probability if J > 0, which is often the case. If a positive or negative external field α is added, configurations with more spins taking the value 1 have their probability increased or decreased, respectively.

4.1. Definition. As in the definition of the Ising model, let Γ^k = (V, L) be the Cayley tree of order k, where V is the set of vertices and L is the set of edges. The spins take values in the set Φ := {1, 2, ..., q} and are placed on the vertices of the tree. Ω = Φ^V is the set of all possible configurations; σ ∈ Ω is then a function x ∈ V → σ(x) ∈ Φ. For Λ ⊂ V we write ω_Λ = {ω_λ : λ ∈ Λ}. The formal Hamiltonian of the Potts model on the volume V_n is defined as:

(4.1) H_n(σ) = −J Σ_{⟨x,y⟩∈L_n} δ_{σ(x)σ(y)} − α Σ_{x∈V_n} δ_{1σ(x)}

where ⟨x, y⟩ is summed over all pairs of nearest-neighbour vertices in V_n, σ ∈ Ω is a configuration, J ∈ ℝ is a coupling constant, α ∈ ℝ is an external field, and δ_{ij} is the Kronecker symbol:

δ_{ij} = 1 if i = j, and 0 if i ≠ j.

Introducing the boundary condition, a collection of vectors {h̃_x = (h̃_{1,x}, ..., h̃_{q,x}) ∈ ℝ^q : x ∈ V}, on the volume V_n yields:

H_n^{h̃}(σ) = −β H_n(σ) + Σ_{x∈W_n} h̃_{σ(x),x}

Now we define a set of finite-dimensional probability measures {μ_n : n ≥ 0} in the volumes V_n as:

(4.2) μ_n(σ) = exp(H_n^{h̃}(σ)) / Z_n

where Z_n is the normalising factor:

Z_n = Σ_{σ∈Φ^{V_n}} exp(H_n^{h̃}(σ))
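Definition (4.2) can be evaluated directly on a small volume. A sketch (our own code; the parameters q = 3, k = 2, θ = 4.1, h̃ = (1, 0, 0), α = 0 are those used later in the numerical example of Section 4.6) enumerating Φ^{V_1} for a rooted tree with root 0 and two children:

```python
from itertools import product
from math import exp, log

q, k = 3, 2
theta = 4.1
beta_J = log(theta)          # theta = exp(beta*J), J = 1
h = [1.0, 0.0, 0.0]          # boundary field h~ applied on W_1 (the children);
                             # index 0 corresponds to the Potts state 1

Z = 0.0
root_marginal = [0.0] * q
for sigma in product(range(q), repeat=k + 1):   # (sigma_0, sigma_1, sigma_2)
    root, children = sigma[0], sigma[1:]
    # H^h(sigma) = beta*J * (# agreeing edges) + boundary field terms
    H = beta_J * sum(c == root for c in children) + sum(h[c] for c in children)
    w = exp(H)
    Z += w
    root_marginal[root] += w
root_marginal = [p / Z for p in root_marginal]
print(round(Z, 2), [round(p, 4) for p in root_marginal])
```

For these parameters the enumeration gives Z_1 ≈ 295.04 and root marginal ≈ (0.5856, 0.2072, 0.2072), matching the output of Application B quoted in Section 4.6.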

4.2. Compatible measures. We say that the probability distributions μ_n are compatible iff for all n ≥ 2 and σ_{n−1} ∈ Φ^{V_{n−1}}:

(4.3) Σ_{ω_n∈Φ^{W_n}} μ_n(σ_{n−1} ∨ ω_n) = μ_{n−1}(σ_{n−1})

Here σ_{n−1} ∨ ω_n is the concatenation of the configurations. If the measures are compatible then, according to Kolmogorov's theorem, there exists a unique measure μ on Φ^V such that, for all n ≥ 0 and σ_n ∈ Φ^{V_n}:

μ({σ|_{V_n} = σ_n}) = μ_n(σ_n)

The following theorem guarantees compatibility of the measures μ_n(σ_n).

Theorem 1. The probability distributions μ_n(σ_n), n ≥ 1, of (4.2) are compatible iff for all x ∈ V \{x^0}:

(4.4) h_x = Σ_{y∈S(x)} F(h_y, θ, α)

Here F : h = (h_1, ..., h_{q−1}) ∈ ℝ^{q−1} → F(h, θ, α) = (F_1, ..., F_{q−1}) is defined as:

F_i = αβ δ_{1i} + ln( ((θ − 1)e^{h_i} + Σ_{j=1}^{q−1} e^{h_j} + 1) / (θ + Σ_{j=1}^{q−1} e^{h_j}) )

For a proof please read [5]. The reason why there are just q − 1 equations to satisfy and not q is that h̃_{q,x} can be subtracted from all components of h̃_x without changing the probability measure:

h_{i,x} = h̃_{i,x} − h̃_{q,x},  i = 1, ..., q − 1.

It follows that for every h = {h_x ∈ ℝ^{q−1} : x ∈ V} satisfying equation (4.4) there exists a unique Gibbs measure μ, and vice versa.

4.3. Uncountable set of Gibbs measures. If two distinct stable fixed points h_*^1 and h_*^2 of kF(h, θ) exist, then the Bleher-Ganikhodjaev construction shows the existence of uncountably many Gibbs measures for the Potts model on a k-order Cayley tree [3]. In this construction, a real number 0 ≤ t ≤ 1 can unambiguously be identified with a Gibbs measure μ_t. We define y > x if there exists a path x = x_0, x_1, ..., x_n = y from x to y that "goes upward", i.e. d(x_m, x^0) = d(x_{m−1}, x^0) + 1 for m = 1, ..., n. We label the edges going upward from each vertex x by l_0(x), l_1(x), ..., l_{k−1}(x). Then we identify an infinite path π = {x^0 = x_0 < x_1 < ...}, represented by the sequence i_1 i_2 ... of edge labels with i_n ∈ {0, 1, ..., k − 1}, with the real number

t = t(π) = Σ_{n=1}^{∞} i_n / k^n,  0 ≤ t ≤ 1.

We define x ≺ y for x, y ∈ W_n if x is represented by the sequence i_1 i_2 ... i_n and y by j_1 j_2 ... j_n, where i_1 = j_1, i_2 = j_2, ..., i_m = j_m but i_{m+1} < j_{m+1} for some m. Next, the set h^π is defined by the conditions

h_x^π = h_*^1 if x ≺ x_n (x ∈ W_n),  and h_x^π = h_*^2 if x_n ≺ x (x ∈ W_n)

By iterating arbitrary starting vectors along the path π through equation (4.4), one can show that they converge to a unique set of vectors h^π = {h_x^π ∈ ℝ^{q−1} : x ∈ V} satisfying equation (4.4). By Theorem 1, every set h^{π(t)} can be associated with a Gibbs measure μ_t.
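The identification of an upward path with a number t ∈ [0, 1] is simply a base-k expansion; a short sketch (our own helper name):

```python
def t_of_path(edge_labels, k):
    """Map the edge-label sequence i_1 i_2 ... of an upward path to
    t = sum_n i_n / k^n, the base-k expansion of a number in [0, 1]."""
    t = 0.0
    for n, i in enumerate(edge_labels, start=1):
        assert 0 <= i < k, "edge labels must lie in {0, ..., k-1}"
        t += i / k ** n
    return t

print(t_of_path([1, 1, 1], 2))   # 1/2 + 1/4 + 1/8 = 0.875
```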

4.4. Translation invariant Gibbs measures. By restricting to the translation-invariant Gibbs measures, i.e. h_x = h ∈ ℝ^{q−1} for all x ∈ V, and assuming absence of an external field (α = 0), the number of such Gibbs measures decreases to at most 2^q − 1 [1]. Define θ_cr = 1 + q/(k − 1).

Theorem 2. For the q-state ferromagnetic (J > 0) Potts model on the Cayley tree of order k ≥ 2 there are critical values θ_m ≡ θ_m(k, q), m = 1, ..., [q/2], such that the following statements hold (C(q, s) denotes the binomial coefficient):

(1) θ_1 < θ_2 < ··· < θ_{[q/2]} ≤ θ_cr.
(2) If θ < θ_1 then there exists a unique TISGM.
(3) If θ_m < θ < θ_{m+1} for some m = 1, ..., [q/2] − 1 then there are 1 + 2 Σ_{s=1}^{m} C(q, s) TISGMs.
(4) If θ_{[q/2]} < θ ≠ θ_cr then there are 2^q − 1 TISGMs.
(5) If θ = θ_cr then the number of TISGMs is 2^{q−1} if q is odd, and 2^{q−1} − C(q−1, q/2) if q is even.
(6) If θ = θ_m then there are 1 + C(q, m) + 2 Σ_{s=1}^{m−1} C(q, s) TISGMs.

For a proof please see [1]. In this translation-invariant case without an external field, equation (4.4) transforms to h = kF(h, θ):

(4.5) h_i = k ln( ((θ − 1)e^{h_i} + Σ_{j=1}^{q−1} e^{h_j} + 1) / (θ + Σ_{j=1}^{q−1} e^{h_j}) ),  i = 1, ..., q − 1.

Substituting e^{h_i} = z_i gives:

(4.6) z_i = ( ((θ − 1)z_i + Σ_{j=1}^{q−1} z_j + 1) / (θ + Σ_{j=1}^{q−1} z_j) )^k,  i = 1, ..., q − 1.

All solutions to this system of equations (4.6) are of the form [1]

z_i = 1 if i ∉ M,  z_i = z* if i ∈ M

for some M ⊂ {1, ..., q − 1} and z* > 0. So in every solution a number of states are equally preferred over the remaining states. It follows that any TISGM of the Potts model corresponds to a solution of the following single equation:

(4.7) z = f_m(z) = ( ((θ + m − 1)z + q − m) / (mz + q − m − 1 + θ) )^k

Here m ∈ {1, ..., q − 1} and z ∈ ℝ_{>0}. In equation (4.7) we substitute x = z^{1/k} and get

(4.8) m x^{k+1} − (θ + m − 1)x^k + (θ + q − m − 1)x − q + m = 0

[Graph of the left-hand side of (4.8) for k = 3, q = 4, θ = 4.3, m = 1.]

x = 1 is always a solution, so we can write (4.8) as

(x − 1) φ_m(x, θ) = 0

φ_m(x, θ) = m x^k − (θ − 1)(x^{k−1} + x^{k−2} + ··· + x) + q − m

Because the number of sign changes in the coefficients of (4.8) is at most three, Descartes' rule of signs gives at most three positive solutions for each m. This set, containing one, two or three solutions to equation (4.7), will be denoted by z(m) = exp(h(m)).
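The positive solutions of (4.8) can be located numerically by scanning for sign changes and bisecting. A sketch (our own code; checked here against the k = 2, q = 3, θ = 4.1, m = 1 values that appear in the numerical example of Section 4.6):

```python
def f(x, k, q, theta, m):
    """Left-hand side of (4.8)."""
    return m * x**(k + 1) - (theta + m - 1) * x**k + (theta + q - m - 1) * x - q + m

def positive_roots(k, q, theta, m, x_max=50.0, step=1e-3):
    """All positive roots of (4.8), found by a sign-change scan plus bisection."""
    roots = []
    a = step
    fa = f(a, k, q, theta, m)
    while a < x_max:
        b = a + step
        fb = f(b, k, q, theta, m)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0:            # sign change: bisect on [a, b]
            lo, hi = a, b
            for _ in range(80):
                mid = (lo + hi) / 2
                if f(mid, k, q, theta, m) * f(lo, k, q, theta, m) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2)
        a, fa = b, fb
    return roots

roots = positive_roots(2, 3, 4.1, 1)
print([round(r, 4) for r in roots])   # ≈ [0.9156, 1.0, 2.1844]
```

As expected from Descartes' rule, at most three positive roots appear; x = 1 is always among them.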

• If |z(m)| = 1, then only z_0 = 1 is a solution; it generates one TISGM μ_0, described by any point on the line h̄_0 ⊂ ℝ^q:

h̄_0 = {h ∈ ℝ^q : h_1 = h_2 = ··· = h_q}

• If |z(m)| = 2, then again z_0 = 1 is a solution which generates μ_0. The other solution z_1 ≠ 1 in z(m) generates C(q, m) TISGMs, given by any point on the line h̄_1(M) ⊂ ℝ^q:

h̄_1(M)_i = ln(z_1) + c if i ∈ M,  c if i ∉ M

for each M ⊂ {1, ..., q} with |M| = m, where c ∈ ℝ is a constant. The number of TISGMs is 1 + C(q, m).

• If |z(m)| = 3, then all of the above TISGMs are generated, and another solution z_2 ≠ 1 in z(m) generates C(q, m) further TISGMs, given by any point on the line h̄_2(M) ⊂ ℝ^q:

h̄_2(M)_i = ln(z_2) + c if i ∈ M,  c if i ∉ M

for each M ⊂ {1, ..., q} with |M| = m, where c ∈ ℝ is a constant. The number of TISGMs is in this case 1 + 2 C(q, m).

The above TISGMs are represented by a vector h̃ ∈ ℝ^q which corresponds to the measure compatible with (4.2). The corresponding TISGM will be denoted by μ_{h(m)·1_M}, where

1_M = (e_1, ..., e_q), with e_i = 1 if i ∈ M and e_i = 0 if i ∉ M.

Each TISGM for the Potts model on a Cayley tree corresponds to a solution of (4.7) in combination with a set M ⊂ {1, ..., q}, represented by a line h̄ ⊂ ℝ^q or a vector h ∈ ℝ^{q−1}, with the implicit h̃ = (h_1, h_2, ..., h_{q−1}, 0) ∈ ℝ^q for the vector.

4.5. Binary tree explicit solutions. To calculate explicit solutions of (4.7) and to obtain formulas for critical temperatures, we assume k = 2. Substituting x = √z in (4.7) yields:

m x³ − (θ + m − 1)x² + (θ + q − m − 1)x − q + m = 0

This equation always has the solution x = 1. Furthermore, if

θ > θ_m = 1 + 2√(m(q − m)),  m = 1, ..., q − 1,

then two extra solutions exist:

x_1(m) = (θ − 1 − √((θ − 1)² − 4m(q − m))) / (2m),  x_2(m) = (θ − 1 + √((θ − 1)² − 4m(q − m))) / (2m)

If θ = θ_m, we have only one extra solution because x_1(m) = x_2(m). If θ = θ_cr = q + 1:

x_1(m) = 1 if q ≥ 2m, and x_1(m) = q/m − 1 if q < 2m;  x_2(m) = 1 if q ≤ 2m, and x_2(m) = q/m − 1 if q > 2m

Moreover, if q = 2m, then x_1(m) = x_2(m) = 1. So the explicit values θ_1 < θ_2 < ··· < θ_{[q/2]−1} < θ_{[q/2]} ≤ q + 1 defined above generate the regions for which the number of TISGMs is constant.

Proposition 1. Let k = 2, J > 0.

(1) If θ < θ_1 then the system of equations (4.5) has the unique solution h_0 = (0, 0, ..., 0);
(2) If θ_m < θ < θ_{m+1} for some m = 1, ..., [q/2] − 1 then the system of equations (4.5) has solutions

h_0 = (0, 0, ..., 0),  h_{1i}(s), h_{2i}(s), i = 1, ..., C(q−1, s),  h'_{1i}(q−s), h'_{2i}(q−s), i = 1, ..., C(q−1, q−s),  s = 1, 2, ..., m,

where h_{ji}(s) (resp. h'_{ji}(q−s)), j = 1, 2, is a vector with s (resp. q−s) coordinates equal to 2 ln x_j(s) (resp. 2 ln x_j(q−s)) and the remaining q−s−1 (resp. s−1) coordinates equal to 0. The number of such solutions is

1 + 2 Σ_{s=1}^{m} C(q, s);

(3) If θ_{[q/2]} < θ ≠ q + 1 then there are 2^q − 1 solutions to (4.5);
(4) If θ = q + 1 then the number of solutions to (4.5) is 2^{q−1} if q is odd and 2^{q−1} − C(q−1, q/2) if q is even;

(5) If θ = θ_m, m = 1, ..., [q/2] (with θ_{[q/2]} ≠ q + 1), then h_{1i}(m) = h_{2i}(m) and the number of solutions to (4.5) equals

1 + C(q, m) + 2 Σ_{s=1}^{m−1} C(q, s).

For a proof please read [1]. All solutions to (4.5) correspond one-to-one to all TISGMs, so this gives the full description of the TISGMs for the binary tree. All results so far have been described in [1]. In the next sections we extend these numerically to higher-order Cayley trees and to external fields.

4.6. Higher-order trees. For k ≥ 2, we recall equation (4.8):

(4.8) m x^{k+1} − (θ + m − 1)x^k + (θ + q − m − 1)x − q + m = 0

A solution ẋ of this equation produces either one TISGM h̄ = (0, 0, ..., 0) (for ẋ = 1) or C(q, m) TISGMs (for ẋ ∈ ℝ_{>0} \{1}) generated by h̄:

h̄(M)_i = k ln(ẋ) + c if i ∈ M,  c if i ∉ M

for each M ⊂ {1, ..., q} with |M| = m, where c ∈ ℝ is a constant. Application D can be used to calculate explicit solutions for different values of k, θ and m with q = 3.

4.6.1. Numerical model. To estimate critical values for θ and to construct TISGMs, we introduce a computer simulation (Application C). The table below gives all critical θ's for 2 ≤ k ≤ 7, 3 ≤ q ≤ 7 and m ≤ q/2. We recall that k is the order of the Cayley tree, i.e. all vertices have k + 1 branches; the spins of the Potts model can take q different values; θ = exp(Jβ) is a variable of the Potts model proportional to the exponential of the inverse temperature β; and m = |M| is the number of states that, in equilibrium, have a different probability compared to the others. The graph underneath shows critical thresholds for uniqueness. These thresholds are equal for m and q − m, hence only m ≤ q/2 is shown.

[Table/graph: critical values θ_m(k, q).]

We see that for lower values of q, lower values of m and higher-order Cayley trees, we have lower critical values θ_m. The exact formula for these critical values can be investigated in future work. Numerical solutions for different values of θ can be calculated with Application C; e.g. for q = 6, k = 5, m = 2, θ = 2.45 it finds 43 TISGMs:

This equals the number of TISGMs according to Theorem 2: 1 + 2 Σ_{s=1}^{m} C(q, s) = 43 for m = 2. The values in the right column should be read as h_x; e.g. one of the twelve TISGMs for m = 1 is h_x = (4.1566, 0, 0, 0, 0) ⟹ h̃ = (4.1566, 0, 0, 0, 0, 0), which produces the TISGM μ_{4.1566·1_{(1,0,0,0,0,0)}}.
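The counts of Theorem 2 are straightforward to tabulate. A sketch (our own helper with the hypothetical name `tisgm_count`; the region encoding is ours):

```python
from math import comb

def tisgm_count(q, region):
    """Number of TISGMs per Theorem 2.

    region: ('below', 1)   -> theta < theta_1
            ('between', m) -> theta_m < theta < theta_{m+1}
            ('above',)     -> theta_{[q/2]} < theta != theta_cr
            ('critical',)  -> theta = theta_cr
            ('at', m)      -> theta = theta_m
    """
    kind = region[0]
    if kind == 'below':
        return 1
    if kind == 'between':
        m = region[1]
        return 1 + 2 * sum(comb(q, s) for s in range(1, m + 1))
    if kind == 'above':
        return 2 ** q - 1
    if kind == 'critical':
        return 2 ** (q - 1) if q % 2 else 2 ** (q - 1) - comb(q - 1, q // 2)
    if kind == 'at':
        m = region[1]
        return 1 + comb(q, m) + 2 * sum(comb(q, s) for s in range(1, m))
    raise ValueError(kind)

print(tisgm_count(6, ('between', 2)))   # 1 + 2*(6 + 15) = 43
print(tisgm_count(3, ('above',)))       # 2^3 - 1 = 7
```

The first call reproduces the 43 TISGMs found by Application C above; the second matches the 7 TISGMs of the q = 3 example below.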

4.6.2. Numerical example. Suppose we want to find the TISGMs for k = 2, q = 3, θ = 4.1, J = 1 and α = 0. We assign the variables to the vertices of a rooted Cayley tree, such that the recurrence relation (4.4) holds for all vertices including the root. By assigning values to all static variables, all probabilities for μ_n can be calculated numerically. If we set h̃_x = (1, 0, 0) for all x ∈ V, then for n = 1 we have the following output (Application B):

Z_1 = Σ_{σ∈Φ^{V_1}} exp(H_1^{h̃}(σ)) = 295.04,  |Φ^{V_1}| = 27,  |V_1| = 3

μ_1(σ_0 | h̃) = (μ_1(σ_0 = 1 | h̃), μ_1(σ_0 = 2 | h̃), μ_1(σ_0 = 3 | h̃)) = (0.5856, 0.2072, 0.2072)

while for n = 0:

Z_0 = 4.718,  |Φ^{V_0}| = 3,  |V_0| = 1,  μ_0(σ_0 | h̃) = (0.57612, 0.21194, 0.21194)

This shows that these measures are not compatible: μ_0(σ_0 | h̃) ≠ μ_1(σ_0 | h̃). Indeed, equation (4.5) is not satisfied. We iterate equation (4.5), starting with h = (1, 0), until the vector is constant (to at least 14 decimals), using Application C. This yields h_x = (1.562709, 0) ⟹ h̃_x = (1.562709, 0, 0), which equals the exact calculation h̃_x = (2 ln x_2(1), 0, 0) = (2 ln((θ − 1 + √((θ − 1)² − 4·2))/2), 0, 0) = (1.5627, 0, 0). Updating this value for h̃_x, ceteris paribus, we have for n = 1:

Z_1 = Σ_{σ∈Φ^{V_1}} exp(H_1^{h̃}(σ)) = 659.91,  |Φ^{V_1}| = 27

μ_1(σ_0 | h̃) = (0.7047, 0.1477, 0.1477)

while for n = 0:

Z_0 = 6.772,  |Φ^{V_0}| = 3,  μ_0(σ_0 | h̃) = (0.7047, 0.1477, 0.1477)

Now the measures are compatible: μ_0(σ_0 | h̃) = μ_1(σ_0 | h̃). So this vector h̃_x = (1.562709, 0, 0) generates a TISGM μ_{1.5627·1_{(1,0,0)}} on Φ^V that favours the state 1 over the states 2 and 3. By symmetry, we have the TISGM that favours the state 2 generated by h̃_x = (0, 1.5627, 0), and the one that favours the state 3 by h̃_x = (0, 0, 1.5627). Then we have three vectors h̃_x = (2 ln x_1(1), 0, 0) = (2 ln((θ − 1 − √((θ − 1)² − 4·2))/2), 0, 0) = (−0.17641, 0, 0), favouring two states over the other state. These can also be found by iterating equation (4.5) starting from h = (−1, 0). With the disordered state h_0 = (0, 0, 0) this results in a solution with 7 TISGMs:

μ_0,  μ_{1.5627·1_{(1,0,0)}},  μ_{1.5627·1_{(0,1,0)}},  μ_{1.5627·1_{(0,0,1)}},  μ_{0.17641·1_{(0,1,1)}},  μ_{0.17641·1_{(1,0,1)}},  μ_{0.17641·1_{(1,1,0)}}.

By Proposition 1 we know that this solution is complete: 2^q − 1 = 7.
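The iteration of (4.5) used in this example takes only a few lines. A sketch (our own code with the hypothetical name `iterate_45`, not the thesis's Application C) for q = 3, k = 2, θ = 4.1, together with the closed-form check 2 ln x_2(1) from Section 4.5:

```python
from math import exp, log, sqrt

def iterate_45(h, k, q, theta, n_iter=200):
    """Iterate the fixed-point map (4.5) on h in R^{q-1}."""
    for _ in range(n_iter):
        S = sum(exp(hj) for hj in h)
        h = [k * log(((theta - 1) * exp(hi) + S + 1) / (theta + S)) for hi in h]
    return h

k, q, theta, m = 2, 3, 4.1, 1
h = iterate_45([1.0, 0.0], k, q, theta)

# Closed-form solution for k = 2 (Section 4.5): h_1 = 2 ln x_2(1)
x2 = (theta - 1 + sqrt((theta - 1) ** 2 - 4 * m * (q - m))) / (2 * m)
print(h[0], 2 * log(x2))   # both ≈ 1.562709
```

Starting from (1, 0), the iteration converges to the single-preferred-state equilibrium, in agreement with the exact binary-tree formula.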

Critical θ. To find the critical θ, we relax the condition θ = 4.1 and perform a binary search for β between 0 and 50 (i.e. θ between 1 and exp(50)), yielding β_c = 1.34245 ⟹ θ_1 = 3.8284 (= 1 + 2√2).

• If 1 + 2√2 = θ_1 < θ ≠ θ_cr = 4, the number of TISGMs is 7.
• If θ = 4 or θ = 1 + 2√2, the equilibria favouring two states over the third state disappear; the number of TISGMs is 4.
• If θ < 1 + 2√2, the disordered h̃ = (0, 0, 0) is the unique TISGM.

Moreover, if θ < 1 + 2√2, h_0 = (0, 0, 0) attracts all starting points under iteration of equation (4.5). For 1 + 2√2 ≤ θ < 4 the disordered attractor range decreases gradually, and for 4 ≤ θ only h_0 itself iterates to the disordered TISGM. The equilibria favouring more than one state can only be reached through iteration of equation (4.5) if, in the starting vector, these states share exactly the same highest value and the vector is not within the disordered attractor range. For 1 + 2√2 ≤ θ, the equilibria favouring a single state are attractors of all starting points that have this state as the single highest value and are not within the disordered attractor range. We see this behaviour also for larger values of k and q: iterating equation (4.5) usually leads to the single preferred states (for higher θ) or to the disordered state (for lower θ).

4.7. External fields. Introducing an external field (α ≠ 0) changes the TISGM equilibria. Equation (4.4) can be written as

h_1 = αβ + k ln( ((θ − 1)e^{h_1} + Σ_{j=1}^{q−1} e^{h_j} + 1) / (θ + Σ_{j=1}^{q−1} e^{h_j}) )

h_i = k ln( ((θ − 1)e^{h_i} + Σ_{j=1}^{q−1} e^{h_j} + 1) / (θ + Σ_{j=1}^{q−1} e^{h_j}) ),  i = 2, ..., q − 1

4.7.1. Negative external fields. For negative external fields, the single-preferred-state equilibria shift away from the state 1. This means that for the single preferred state 1 to reach equilibrium, higher values of θ are necessary. The graph below shows the critical β's above which the single preferred state 1 has a corresponding TISGM, for different negative values of α, several values of k, and q = 3. For the y-axis we choose β = ln(θ)/J with J = 1, which gives a clearer graph.

[Graph: β_1(k, 3, α), single preferred state 1.]

It seems that this TISGM disappears for α ≤ 1 − k. Higher values of q give similar graphs with higher critical β's. We see that the other states (> 1) have stronger attraction and reach single-preferred-state equilibrium for lower values of θ.

[Graph: β_1(k, 3, α), single preferred state > 1.]

So for all β greater than the corresponding β_1(k, 3, α) in the graph above, the model undergoes a phase transition, as it admits at least two equilibria: one where the single preferred state is 2 and one where the single preferred state is 3.

4.7.2. Positive external fields. Introducing a positive external field (α > 0) shifts the single-preferred-state equilibria towards the state 1. For all β > 0 we can find a TISGM with the single preferred state 1. Other states can reach single-preferred-state equilibrium if β is large enough. The graph below shows these critical values of β above which the model undergoes a phase transition.

[Graph: β_1(k, 3, α), single preferred state > 1.]

We see that for higher values of α, the other states can only attain single-preferred-state equilibrium for higher values of β. For α ≥ k − 1 we cannot find a β ≤ 50 that guarantees a phase transition. This most probably implies uniqueness, which can be investigated in future research.
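The fixed-point iteration extends directly to α ≠ 0 by adding the αβ term to the first coordinate, as in the displayed equations of Section 4.7. A sketch (our own code; the values θ = 2, α = 0.5 are illustrative choices, not taken from the thesis) showing that a positive field pulls the symmetric solution towards state 1:

```python
from math import exp, log

def iterate_with_field(h, k, q, theta, alpha, beta, n_iter=500):
    """Iterate the fixed-point equations of Section 4.7:
    the alpha*beta term is added to the first coordinate only."""
    for _ in range(n_iter):
        S = sum(exp(hj) for hj in h)
        new = [k * log(((theta - 1) * exp(hi) + S + 1) / (theta + S)) for hi in h]
        new[0] += alpha * beta
        h = new
    return h

k, q, theta, alpha = 2, 3, 2.0, 0.5
beta = log(theta)            # theta = exp(beta*J) with J = 1
h = iterate_with_field([0.0, 0.0], k, q, theta, alpha, beta)
print(h)   # h[0] > 0: state 1 is preferred; h[1] stays 0 by the 2<->3 symmetry
```

Starting from the disordered vector (0, 0), the iteration converges to a fixed point with h_1 > 0 and h_2 = 0: the field singles out state 1 while states 2 and 3 remain symmetric.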

4.8. Dobrushin's condition. In 1968, Roland Dobrushin, whose doctoral advisor was Andrey Kolmogorov, described a technique to show the absence of phase transitions for certain models of interacting variables. In this chapter this technique will be discussed and applied to the Potts model on a regular tree, and the calculations will be compared with results from earlier sections. Dobrushin defined a matrix that estimates the maximal interdependencies between the variables of a system. This matrix has the property that if every row sums to less than one, the set of Gibbs measures of the model contains at most one element. First, a distance is defined between probability measures α_1 and α_2 on some measurable state space (X, χ):

(4.9) ‖α_1 − α_2‖ = sup_{A∈χ} |α_1(A) − α_2(A)|

Then the S × S matrix C(γ) = (C_{i,j}(γ))_{i,j∈S} is defined such that each element C_{i,j}(γ) estimates the maximal dependence of the variable at vertex i on the variable at vertex j, for some specification¹ γ. A specification is a consistent set of probability kernels that describes a system of interacting random variables. The elements of the matrix C are defined as:

(4.10) C_{i,j}(γ) = sup_{ζ,η∈Ω : ζ_{S\{j}}=η_{S\{j}}} ‖γ_i^0(·|ζ) − γ_i^0(·|η)‖

Here i, j ∈ S and γ_i^0 is a probability kernel defined as

γ_i^0(A|ω) = γ_i({σ_i ∈ A}|ω)  (A ∈ χ, ω ∈ Ω, i ∈ S)

Dobrushin's uniqueness theorem now states:

Theorem 3. Let γ be a specification. If γ satisfies Dobrushin's condition

(4.11) c(γ) := sup_{i∈S} Σ_{j∈S} C_{i,j}(γ) < 1,

then the set of measures admitted by γ contains at most one element: |G(γ)| ≤ 1. If (X, χ) is a Borel space, then |G(γ)| = 1.

A measure is admitted by γ if it has the same conditional probabilities as γ. For a proof please read [7]. To calculate the Dobrushin matrix for the Potts model on a regular tree, the element C_{i,j} of equation (4.10) can be investigated for a measure μ, admitted by γ and compatible with (4.2):

(4.12) C_{i,j} = sup_{ζ,η∈Ω : ζ_{S\{j}}=η_{S\{j}}} ‖μ(σ_i|ζ) − μ(σ_i|η)‖

Because the state space Φ is a finite set, we can write

(4.13) ‖μ(σ_i|ζ) − μ(σ_i|η)‖ = (1/2) Σ_{a∈Φ} |μ(σ_i = a|ζ) − μ(σ_i = a|η)|

The first term inside the absolute value bars on the right-hand side of equation (4.13) can be written as

(4.14) μ(σ_i = a|ζ) = μ(σ_i = a, σ_{S\{i}} = ζ_{S\{i}}) / Σ_{b∈Φ} μ(σ_i = b, σ_{S\{i}} = ζ_{S\{i}})

Define Θ = {⟨x, y⟩ ∈ L : x = i or y = i} as the set of all edges incident to vertex i. Substituting equation (4.2) yields

(4.15) (4.14) = exp(β(J(Σ_{⟨x,i⟩∈Θ} δ_{aζ_x} + Σ_{⟨x,y⟩∈L\Θ} δ_{ζ_xζ_y}) + α(Σ_{x∈V\{i}} δ_{1ζ_x} + δ_{1a}))) / Σ_{b∈Φ} exp(β(J(Σ_{⟨x,i⟩∈Θ} δ_{bζ_x} + Σ_{⟨x,y⟩∈L\Θ} δ_{ζ_xζ_y}) + α(Σ_{x∈V\{i}} δ_{1ζ_x} + δ_{1b})))

¹ For a formal definition of a specification, see Georgii, Definition 1.23, page 16.

All factors over edges in L\Θ and the magnetic-field terms of vertices in V\{i} cancel out:

(4.16)  (4.15) = \frac{\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta} \delta_{a\zeta_x} + \alpha\delta_{1a}\big)\big)}{\sum_{b\in\Phi} \exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta} \delta_{b\zeta_x} + \alpha\delta_{1b}\big)\big)}

Analogously, the second term inside the absolute value bars on the right-hand side of equation (4.13) can be written as

(4.17)  \mu(\sigma_i = a \mid \eta) = \frac{\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta} \delta_{a\eta_x} + \alpha\delta_{1a}\big)\big)}{\sum_{b\in\Phi} \exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta} \delta_{b\eta_x} + \alpha\delta_{1b}\big)\big)}

Substituting equations (4.16) and (4.17) into equation (4.13) gives

(4.18)  (4.13) = \frac{1}{2}\sum_{a\in\Phi}\left| \frac{\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{a\zeta_x}+\alpha\delta_{1a}\big)\big)}{\sum_{b\in\Phi}\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{b\zeta_x}+\alpha\delta_{1b}\big)\big)} - \frac{\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{a\eta_x}+\alpha\delta_{1a}\big)\big)}{\sum_{b\in\Phi}\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{b\eta_x}+\alpha\delta_{1b}\big)\big)}\right|

Now we separate two cases: one where ⟨i, j⟩ ∈ L and one where ⟨i, j⟩ ∉ L. Suppose ⟨i, j⟩ ∉ L; then ⟨x, i⟩ ∈ Θ implies ζ_x = η_x, because ζ_{S\{j}} = η_{S\{j}}, so both terms in (4.18) are identical and we can write

(4.19)  (4.18) = 0

This shows that only elements C_{i,j} with ⟨i, j⟩ ∈ L can have a positive value; all other elements are 0. Hence every row i has at most k+1 positive elements, one for every vertex j adjacent to i. For ⟨i, j⟩ ∈ L we have

(4.20)  C_{i,j} = \sup_{\substack{\zeta,\eta\in\Omega\\ \zeta_{S\setminus\{j\}}=\eta_{S\setminus\{j\}}}} \frac{1}{2}\sum_{a\in\Phi}\left| \frac{\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{a\zeta_x}+\alpha\delta_{1a}\big)\big)}{\sum_{b\in\Phi}\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{b\zeta_x}+\alpha\delta_{1b}\big)\big)} - \frac{\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{a\eta_x}+\alpha\delta_{1a}\big)\big)}{\sum_{b\in\Phi}\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{b\eta_x}+\alpha\delta_{1b}\big)\big)}\right|

We recall Dobrushin's condition (4.11):

(4.11)  c(\gamma) := \sup_{i\in S}\sum_{j\in S} C_{i,j}(\gamma) < 1

Because a Cayley tree is symmetric, it follows that

(4.21)  c(\mu) = (k+1)\,\sup_{\substack{\zeta,\eta\in\Omega\\ \zeta_{S\setminus\{j\}}=\eta_{S\setminus\{j\}}}} \frac{1}{2}\sum_{a\in\Phi} |\mu(\sigma_i = a \mid \zeta) - \mu(\sigma_i = a \mid \eta)|

So the following inequality is equivalent to Dobrushin's condition for the Potts model on a Cayley tree:

(4.22)  \frac{k+1}{2}\,\sup_{\substack{\zeta,\eta\in\Omega\\ \zeta_{S\setminus\{j\}}=\eta_{S\setminus\{j\}}}} \sum_{a\in\Phi}\left| \frac{\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{a\zeta_x}+\alpha\delta_{1a}\big)\big)}{\sum_{b\in\Phi}\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{b\zeta_x}+\alpha\delta_{1b}\big)\big)} - \frac{\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{a\eta_x}+\alpha\delta_{1a}\big)\big)}{\sum_{b\in\Phi}\exp\big(\beta\big(J\sum_{\langle x,i\rangle\in\Theta}\delta_{b\eta_x}+\alpha\delta_{1b}\big)\big)}\right| < 1

We developed a simulation program (Application F) to calculate Dobrushin matrices for different values of the inverse temperature β, the external field α and the order k of the Cayley tree. The program draws blue dots where Dobrushin's condition is satisfied and yellow dots where it is not.
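The check itself is easy to reproduce. The sketch below (in Python rather than the Visual Studio code of Application F; the function name `dobrushin_c` is ours) evaluates the left-hand side of condition (4.22) by brute force over all boundary configurations of the k+1 neighbours of a vertex:

```python
import itertools
import math

def dobrushin_c(q, k, beta, J=1.0, alpha=0.0):
    """Left-hand side of (4.22): (k+1) times the maximal total-variation
    distance between conditional distributions at a vertex, over pairs of
    boundary conditions differing at a single neighbour."""
    def conditional(neigh):
        # p(a) proportional to exp(beta*(J*#{neighbours equal to a} + alpha*1[a=1]))
        w = [math.exp(beta * (J * neigh.count(a) + alpha * (a == 1)))
             for a in range(1, q + 1)]
        z = sum(w)
        return [x / z for x in w]

    worst = 0.0
    # zeta ranges over all configurations of the k+1 neighbours of i;
    # eta equals zeta except at one neighbour j.
    for zeta in itertools.product(range(1, q + 1), repeat=k + 1):
        p_zeta = conditional(zeta)
        for j in range(k + 1):
            for s in range(1, q + 1):
                eta = zeta[:j] + (s,) + zeta[j + 1:]
                p_eta = conditional(eta)
                tv = 0.5 * sum(abs(x - y) for x, y in zip(p_zeta, p_eta))
                worst = max(worst, tv)
    return (k + 1) * worst

# Example: q = 3 states on the binary tree (k = 2), no external field.
print(dobrushin_c(3, 2, beta=0.1))  # well below 1: uniqueness guaranteed
print(dobrushin_c(3, 2, beta=2.0))  # above 1: the condition is inconclusive
```

For small q and k the enumeration is instantaneous; the blue/yellow pictures of Application F correspond to sweeping β and α through this function.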

We see that for higher values of α and for lower values of β, Dobrushin's condition is always satisfied and we have uniqueness for the model. This is similar to the results presented in previous sections. However, the conjecture "α ≥ k − 1 implies uniqueness" of the previous section is not covered by Dobrushin's condition, as these graphs show. For q = 4, 5, ... we see similar patterns with slightly more blue dots. Thus, as expected, more neighbours in the tree means more cohesion and more phase transitions, while more states in the state space means slightly less cohesion and more uniqueness in the model. So for the Potts model on a Cayley tree, Dobrushin's condition provides bounds for uniqueness that can be improved by examining the exact model.

5. Bayesian Networks

The next part of the thesis deals with Bayesian Networks. First, a definition will be given.

5.1. Definition. A Bayesian Network is a probabilistic model that can be represented by a directed acyclic graph. Let G = (S, E) be a directed acyclic graph, where S is a finite vertex set (e.g. [0, N] ∩ ℤ) and E = {e_{ij} : i, j ∈ S} is a set of directed edges. Let X be a state space with a measurable structure (e.g. {−1, 1}). (σ_i)_{i∈S} is a family of random variables defined on some probability space (Ω, F, P) and taking values in X. Ω = X^S = {ω = (ω_i)_{i∈S} : ω_i ∈ X} is the set of all possible configurations, and for Λ ⊂ S we write ω_Λ = {ω_λ : λ ∈ Λ}. Let the sigma-algebra F on Ω be the power set P(X)^S and let P be a probability measure on Ω. The probability measure P on Ω is a Bayesian Network for the acyclic graph G if it can be written as

(5.1)  P(\sigma_i = \omega_i : i \in S) = \prod_{i\in S} Q_i(\omega_i \mid \omega_h : h \in p(i))

where p(i) := {j ∈ S\{i} : e_{ji} ∈ E} ⊂ S is the set of parents of vertex i. Every Q_i is a probability kernel from X^{|p(i)|} to X. If p(i) = ∅ then Q_i is to be understood as a specified unconditional probability (Q_i ∈ P(X)). That is, a probability distribution on random variables on the vertices of an acyclic graph is a Bayesian Network if all random variables depend on their more remote ancestors only through their parents. This is a kind of Markov property; the following lemma formulates it in a more general way.

NB. The Ising and Potts models on a Cayley tree described previously do not qualify as Bayesian Networks. The dependencies (edges) between variables in these models go both ways and are represented by undirected edges. It is not possible to describe these models by equation (5.1); instead, for undirected graphical models a potential function of the form P(x) = \frac{1}{Z}\prod_{i\in S}\varphi(x_i) over the set of variables is required.
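To make the factorisation (5.1) concrete, here is a minimal sketch of a Bayesian Network on the three-vertex chain 0 → 1 → 2; the DAG and the kernels are hypothetical, chosen only for illustration:

```python
import itertools
import random

# Toy DAG 0 -> 1 -> 2; parents[i] lists the parents p(i) of vertex i.
parents = {0: [], 1: [0], 2: [1]}

def Q(i, value, parent_values):
    """Kernel Q_i: probability that sigma_i equals `value` given its parents."""
    if not parent_values:            # root: specified unconditional distribution
        p_one = 0.5
    else:                            # children tend to copy their single parent
        p_one = 0.8 if parent_values[0] == 1 else 0.2
    return p_one if value == 1 else 1.0 - p_one

def prob(omega):
    """P(sigma = omega) as the product of kernels, equation (5.1)."""
    result = 1.0
    for i in parents:
        result *= Q(i, omega[i], [omega[h] for h in parents[i]])
    return result

def sample():
    """Draw one configuration by sampling the vertices in topological order."""
    omega = {}
    for i in sorted(parents):        # 0, 1, 2 is a topological order here
        p_one = Q(i, 1, [omega[h] for h in parents[i]])
        omega[i] = 1 if random.random() < p_one else -1
    return omega

# The factorised probabilities sum to one over all configurations:
total = sum(prob(dict(zip(parents, c)))
            for c in itertools.product([-1, 1], repeat=3))
print(total)
```

Sampling in topological order is possible precisely because the graph is acyclic; for the undirected models of the earlier chapters no such ordering exists.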

5.2. Excluding remote generations.

Lemma 1. In a Bayesian Network µ, for any vertex i, conditioning on the nth generation of grandparents and all their ancestors equals conditioning on the nth generation of grandparents only:

(5.2)  \mu(\sigma_i = a \mid \sigma_j = \omega_j : j \in \Delta_i^n) = \mu(\sigma_i = a \mid \sigma_k = \omega_k : k \in p^n(i)) = \sum_{\eta_{\Gamma_i^n}:\ \eta_i = a}\ \prod_{l\in\Gamma_i^n} Q_l(\eta_l \mid \bar\eta_h : h \in p(l))

for all i ∈ S, ω ∈ Ω, a ∈ X, n ∈ ℕ (here \bar\eta_h = \eta_h for h ∈ Γ_i^n and \bar\eta_h = \omega_h otherwise), where

p^n(i) := \{j \in S \setminus \Gamma_i^n : \exists k \in p^{n-1}(i) \text{ with } e_{jk} \in E\} \subset S is the nth generation of grandparents, with p^0(i) := \{i\};

\Delta_i^n := \bigcup_{n' \geq n} p^{n'}(i) is the nth generation of grandparents and all their ancestors;

\Psi_i^n := S \setminus \Delta_i^n is all vertices except the nth generation of grandparents and all their ancestors;

\Gamma_i^n := \bigcup_{0 \leq n' < n} p^{n'}(i) is vertex i and all its ancestors up to the (n−1)th generation.

(These figures are meant to clarify the notation; the equations hold for any graph.)

Proof. For a proof, see [10]. □
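The statement can also be verified by brute force on a small example. The sketch below uses the chain 0 → 1 → 2 → 3 with hypothetical kernels (not taken from [10]); for i = 3 and n = 2 we have p²(3) = {1}, while ∆²₃ = {0, 1}:

```python
import itertools

# Chain DAG 0 -> 1 -> 2 -> 3 on the state space {0, 1}.
root = [0.3, 0.7]                       # unconditional law of sigma_0

def kernel(child_value, parent_value):  # the same kernel Q for vertices 1, 2, 3
    p_one = 0.9 if parent_value == 1 else 0.2
    return p_one if child_value == 1 else 1.0 - p_one

def joint(w):
    """P(omega) via the factorisation (5.1)."""
    return (root[w[0]] * kernel(w[1], w[0])
            * kernel(w[2], w[1]) * kernel(w[3], w[2]))

def conditional(target, given):
    """mu(sigma_3 = target | sigma_j = given[j], j in given), by enumeration."""
    num = den = 0.0
    for w in itertools.product([0, 1], repeat=4):
        if all(w[j] == v for j, v in given.items()):
            den += joint(w)
            if w[3] == target:
                num += joint(w)
    return num / den

# Conditioning on Delta^2 = {0, 1} equals conditioning on p^2(3) = {1} alone:
lhs = conditional(1, {1: 1, 0: 0})
rhs = conditional(1, {1: 1})
print(abs(lhs - rhs))   # 0 up to rounding
```

The value of σ₀ drops out of the conditional law of σ₃ once σ₁ is fixed, exactly as Lemma 1 asserts.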

6. Decay of memory on Bayesian networks

From earlier results [10] we know that an Ising-type model on a Bayesian network exhibits decay of memory when every vertex has at most two 'parents'; this theorem is recalled in the next section. After that, we present results from our efforts to generalise this theorem: we demonstrate that it holds for a Potts-type model as well. Finally, we show by a counterexample that the contraction argument used in the proof does not apply to a more general model.

6.1. Ising-type model. Let the Bayesian Network µ for a directed acyclic graph G = (S, E) be defined as a model with kernels:

(6.1)  \mu(\sigma_i = \omega_i : i \in S) = \prod_{i\in S} Q_i(\omega_i \mid \omega_h : h \in p(i))

(6.2)  Q_i(\sigma_i \mid \sigma_h : h \in p(i)) = \frac{\exp\big(\beta\sigma_i \sum_{h\in p(i)} \sigma_h\big)}{2\cosh\big(\beta \sum_{h\in p(i)} \sigma_h\big)}

where p(i) = {p₁(i), p₂(i), ..., p_d(i)} is the set of parents of vertex i ∈ S, S is a finite vertex set and E = {e_{ij} : i, j ∈ S} is a set of directed edges. The random variables (σ_i)_{i∈S} take values in X = {−1, 1}, ω ∈ Ω = X^S is a configuration and β ∈ ℝ₊ is a constant. This is an Ising-type model with ferromagnetic interaction from the parent spins to the spin of the child.

Theorem 4. In the Ising-type model presented above, equations (6.1) and (6.2), if |p(i)| ≤ 2 for all i ∈ S, then for c = tanh(2β) < 1 with β ∈ ℝ₊ and a ∈ {−1, 1}:

(6.3)  \sup_{\omega,\omega'} \big|\mu(\sigma_i = a \mid \sigma_j = \omega_j : j \in \Delta_i^n) - \mu(\sigma_i = a \mid \sigma_j = \omega'_j : j \in \Delta_i^n)\big| \leq c^n

If all vertices have two parents or fewer, the combined influence of the nth generation of grandparents and their ancestors can be bounded by a number that goes to zero as n grows.

NB. We warn the reader that this model is not identical to the usual ferromagnetic Ising model on a tree. There the probability of a configuration would be given by:

(6.4)  \mu'(\sigma) = \frac{1}{Z_S} \exp\Big(\beta \sum_{\substack{i,j\in S\\ i\in p(j)\ \text{or}\ j\in p(i)}} \sigma_i\sigma_j\Big)

We recall that:

\sinh(x) = \frac{\exp(x)-\exp(-x)}{2},\qquad \cosh(x) = \frac{\exp(x)+\exp(-x)}{2},\qquad \tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{\exp(x)-\exp(-x)}{\exp(x)+\exp(-x)}
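The base case of Theorem 4 can be checked numerically. The sketch below (our own illustration, not part of [10]) computes the maximal one-step influence of the parents on a child for the kernel (6.2); for d parents it works out to tanh(dβ), which is why the constant c = tanh(2β) requires |p(i)| ≤ 2:

```python
import itertools
import math

def Q(value, parent_spins, beta):
    """Ising-type kernel (6.2): value in {-1, +1}, parent_spins a tuple of spins."""
    s = sum(parent_spins)
    return math.exp(beta * value * s) / (2.0 * math.cosh(beta * s))

def one_step_influence(d, beta):
    """sup over parent configurations omega, omega' of
    |Q(+1 | omega) - Q(+1 | omega')| for a vertex with d parents."""
    probs = [Q(1, w, beta) for w in itertools.product([-1, 1], repeat=d)]
    return max(probs) - min(probs)

beta = 0.7
c = math.tanh(2 * beta)
print(one_step_influence(1, beta) <= c)   # True: one parent stays within the bound
print(one_step_influence(2, beta) - c)    # ~0: the bound is attained for d = 2
print(one_step_influence(3, beta) <= c)   # False: three parents break the bound
```

The last line shows concretely where the restriction to two parents enters the proof.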

6.2. Potts-type model. Let the Bayesian Network µ for a directed acyclic graph G = (S, E) be defined as a model with kernels:

(6.5)  Q_i(\sigma_i \mid \sigma_h : h \in p(i)) = \frac{\exp\big(\beta \sum_{h\in p(i)} 1_{\sigma_i=\sigma_h}\big)}{\sum_{s=1}^{q} \exp\big(\beta \sum_{h\in p(i)} 1_{s=\sigma_h}\big)}

(6.6)  \mu(\sigma_i = \omega_i : i \in S) = \prod_{i\in S} Q_i(\omega_i \mid \omega_h : h \in p(i))

where p(i) = {p₁(i), p₂(i), ..., p_d(i)} is the set of parents of vertex i ∈ S, S is a finite vertex set and E = {e_{ij} : i, j ∈ S} is a set of directed edges. The random variables (σ_i)_{i∈S} take values in Φ = {1, 2, ..., q}, ω ∈ Ω = Φ^S is a configuration and β ∈ ℝ₊ is a constant.

Theorem 5. In the Potts-type model presented above (equations (6.5) and (6.6)), if |p(i)| ≤ 2 for all i ∈ S, then for c = \frac{\exp(2\beta)-1}{\exp(2\beta)+q-1} < 1 and a ∈ Φ:

(6.7)  \sup_{\omega,\omega'} \big|\mu(\sigma_i = a \mid \sigma_j = \omega_j : j \in \Delta_i^n) - \mu(\sigma_i = a \mid \sigma_j = \omega'_j : j \in \Delta_i^n)\big| \leq c^n

If all vertices have two parents or fewer, the combined influence of the nth generation of grandparents and their ancestors can be bounded by a number that goes to zero as n grows.

NB. This is not the usual Potts model on a tree. That model would not be a Bayesian network, and the probability of a configuration would be given by:

\mu'(\sigma) = \frac{1}{Z_S} \exp\Big(\beta \sum_{\substack{i,j\in S\\ i\in p(j)\ \text{or}\ j\in p(i)}} \delta(\sigma_i,\sigma_j) + \beta\alpha \sum_{i\in S} \delta(\sigma_i, 1)\Big)

where δ(x, y) is the Kronecker delta.
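As with the Ising-type model, the base case of Theorem 5 can be checked numerically. The sketch below (our own illustration) computes the maximal one-step influence for the kernel (6.5) and confirms that it equals c exactly when a vertex has two parents:

```python
import itertools
import math

def Q(value, parent_states, beta, q):
    """Potts-type kernel (6.5): value in {1,...,q}, parent_states a tuple."""
    def weight(s):
        return math.exp(beta * sum(1 for h in parent_states if h == s))
    return weight(value) / sum(weight(s) for s in range(1, q + 1))

def one_step_influence(d, beta, q):
    """sup over parent configurations of |Q(1 | omega) - Q(1 | omega')|:
    the base case n = 1 in the proof of Theorem 5."""
    probs = [Q(1, w, beta, q)
             for w in itertools.product(range(1, q + 1), repeat=d)]
    return max(probs) - min(probs)

q, beta = 3, 0.8
c = (math.exp(2 * beta) - 1) / (math.exp(2 * beta) + q - 1)
print(one_step_influence(1, beta, q) <= c)   # True: inequality (6.14)
print(one_step_influence(2, beta, q) - c)    # ~0: equality, as in (6.16)
```

The maximum is attained between both parents equal to 1 and both parents equal to a common value in 1^c, matching the computation in the proof below.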

Proof. We will assume that a = 1, ω = 1^S and ω' = 2^S, and that for all Λ ⊆ S, b ∈ 1^c and c ∈ 2^c:

\mu(\sigma_i = 1 \mid \sigma_j = \omega_j : j \in \Lambda) = \mu(\sigma_i = 2 \mid \sigma_j = \omega'_j : j \in \Lambda) \geq \mu(\sigma_i = b \mid \sigma_j = \omega_j : j \in \Lambda) = \mu(\sigma_i = c \mid \sigma_j = \omega'_j : j \in \Lambda)

To be proven is:

(6.8)  \sup_{\omega,\omega'} \big|\mu(\sigma_i = 1 \mid \sigma_j = \omega_j : j \in \Delta_i^n) - \mu(\sigma_i = 1 \mid \sigma_j = \omega'_j : j \in \Delta_i^n)\big| \leq c^n

The proof is by induction: first we demonstrate that inequality (6.8) is true for n = 1; then we deduce that if the inequality holds for n − 1, it holds for n as well, for n ≥ 2. Choose any i ∈ S. The situation where vertex i has no parents, |p(i)| = 0, is trivial: 0 ≤ c.

To show, for |p(i)| = 1 and |p(i)| = 2:

(6.9)  \sup_{\omega,\omega'} \big|\mu(\sigma_i = 1 \mid \sigma_j = \omega_j : j \in \Delta_i^1) - \mu(\sigma_i = 1 \mid \sigma_j = \omega'_j : j \in \Delta_i^1)\big| \leq c

Let us consider the first term inside the absolute value bars:

\mu(\sigma_i = 1 \mid \sigma_j = \omega_j : j \in \Delta_i^1) = \sum_{\eta_{\Gamma_i^1}:\ \eta_i = 1}\ \prod_{l\in\Gamma_i^1} Q_l(\eta_l \mid \bar\eta_h : h \in p(l)) \qquad \text{(lemma 1)}

(6.10)  = Q_i(1 \mid \omega_h : h \in p(i)) \qquad \Big(\text{because } \Gamma_i^1 = \bigcup_{0\leq n'<1} p^{n'}(i) = p^0(i) = \{i\}\Big)

For the same reason, the second term is:

(6.11)  \mu(\sigma_i = 1 \mid \sigma_j = \omega'_j : j \in \Delta_i^1) = Q_i(1 \mid \omega'_h : h \in p(i))

So the supremum of the difference over ω, ω' is:

\sup_{\omega,\omega'} \big|\mu(\sigma_i = 1 \mid \sigma_j = \omega_j : j \in \Delta_i^1) - \mu(\sigma_i = 1 \mid \sigma_j = \omega'_j : j \in \Delta_i^1)\big|

(6.12)  = \sup_{\omega,\omega'} |Q_i(1 \mid \omega_h : h \in p(i)) - Q_i(1 \mid \omega'_h : h \in p(i))|

(6.13)  = \sup_{\omega,\omega'} \left| \frac{\exp(\beta \sum_{h\in p(i)} 1_{\omega_h = 1})}{\sum_{s=1}^q \exp(\beta \sum_{h\in p(i)} 1_{\omega_h = s})} - \frac{\exp(\beta \sum_{h\in p(i)} 1_{\omega'_h = 1})}{\sum_{s=1}^q \exp(\beta \sum_{h\in p(i)} 1_{\omega'_h = s})} \right|

This should be smaller than or equal to c for |p(i)| = 1 and |p(i)| = 2. For |p(i)| = 1 (with h = p(i)) this becomes:

(6.14)  \sup_{\omega,\omega'} \left| \frac{\exp(\beta 1_{\omega_h = 1})}{\sum_{s=1}^q \exp(\beta 1_{\omega_h = s})} - \frac{\exp(\beta 1_{\omega'_h = 1})}{\sum_{s=1}^q \exp(\beta 1_{\omega'_h = s})} \right| = \frac{\exp(\beta) - 1}{\exp(\beta) + q - 1} \leq c \qquad \square

For |p(i)| = 2 (with h₁ = p₁(i), h₂ = p₂(i)), it is:

(6.15)  \sup_{\omega,\omega'} \left| \frac{\exp\big(\beta(1_{\omega_{h_1} = 1} + 1_{\omega_{h_2} = 1})\big)}{\sum_{s=1}^q \exp\big(\beta(1_{\omega_{h_1} = s} + 1_{\omega_{h_2} = s})\big)} - \frac{\exp\big(\beta(1_{\omega'_{h_1} = 1} + 1_{\omega'_{h_2} = 1})\big)}{\sum_{s=1}^q \exp\big(\beta(1_{\omega'_{h_1} = s} + 1_{\omega'_{h_2} = s})\big)} \right|

(6.16)  = \frac{\exp(2\beta)}{\exp(2\beta) + q - 1} - \frac{1}{\exp(2\beta) + q - 1} \leq c \qquad \square

So we have seen that

(6.17)  \sup_{\omega,\omega'} \big|\mu(\sigma_i = 1 \mid \sigma_j = \omega_j : j \in \Delta_i^n) - \mu(\sigma_i = 1 \mid \sigma_j = \omega'_j : j \in \Delta_i^n)\big| \leq c^n

is correct for n = 1, for all i ∈ S.

Now assume that inequality (6.8) holds for n − 1, for some n ≥ 2. We will show that if |p(i)| ≤ 2, inequality (6.8) holds for n as well. Choose any i ∈ S. In the trivial case where |p(i)| = 0, we have 0 ≤ c^n.

First let |p(i)| = 1. The nth generation of grandparents of vertex i is the (n−1)th generation of grandparents of the parent of vertex i. Thus, by conditioning on the parent h = p(i) of vertex i, the first term of inequality (6.8) can be written in a more practical way (abbreviating the conditioning σ_j = ω_j : j ∈ ∆_i^n to ω_{∆_i^n}):

(6.18)  \mu(\sigma_i = 1 \mid \omega_{\Delta_i^n}) = \mu(\sigma_i = 1 \mid \sigma_h = 1,\ \omega_{\Delta_i^n})\,\mu(\sigma_h = 1 \mid \omega_{\Delta_i^n}) + \mu(\sigma_i = 1 \mid \sigma_h \in 1^c,\ \omega_{\Delta_i^n})\,\mu(\sigma_h \in 1^c \mid \omega_{\Delta_i^n})

Using lemma 1:

(6.19)  \mu(\sigma_i = 1 \mid \sigma_h = 1,\ \omega_{\Delta_i^n}) = \frac{\exp(\beta)}{\exp(\beta)+q-1}

(6.20)  \mu(\sigma_i = 1 \mid \sigma_h \in 1^c,\ \omega_{\Delta_i^n}) = \frac{1}{\exp(\beta)+q-1}

So equation (6.18) becomes:

(6.21)  (6.18) = \frac{\exp(\beta)}{\exp(\beta)+q-1}\,\mu(\sigma_h = 1 \mid \omega_{\Delta_i^n}) + \frac{1}{\exp(\beta)+q-1}\,\mu(\sigma_h \in 1^c \mid \omega_{\Delta_i^n})

The second term of inequality (6.8) is, for the same reason:

(6.22)  \mu(\sigma_i = 1 \mid \omega'_{\Delta_i^n}) = \frac{\exp(\beta)}{\exp(\beta)+q-1}\,\mu(\sigma_h = 1 \mid \omega'_{\Delta_i^n}) + \frac{1}{\exp(\beta)+q-1}\,\mu(\sigma_h \in 1^c \mid \omega'_{\Delta_i^n})

The supremum of the difference between (6.21) and (6.22) over ω, ω',

(6.23)  \sup_{\omega,\omega'} \big|\mu(\sigma_i = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_i = 1 \mid \omega'_{\Delta_i^n})\big|,

is the supremum over ω, ω' of the absolute value of

(6.24)  \frac{\exp(\beta)}{\exp(\beta)+q-1}\,\mu(\sigma_h = 1 \mid \omega_{\Delta_i^n}) + \frac{1}{\exp(\beta)+q-1}\,\mu(\sigma_h \in 1^c \mid \omega_{\Delta_i^n}) - \frac{\exp(\beta)}{\exp(\beta)+q-1}\,\mu(\sigma_h = 1 \mid \omega'_{\Delta_i^n}) - \frac{1}{\exp(\beta)+q-1}\,\mu(\sigma_h \in 1^c \mid \omega'_{\Delta_i^n})

And because \mu(\sigma_h \in 1^c \mid \cdot\,) = 1 - \mu(\sigma_h = 1 \mid \cdot\,):

(6.25)  (6.24) = \Big(\frac{\exp(\beta)}{\exp(\beta)+q-1} - \frac{1}{\exp(\beta)+q-1}\Big)\big(\mu(\sigma_h = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_h = 1 \mid \omega'_{\Delta_i^n})\big)

Placing back the supremum yields:

(6.26)  \sup_{\omega,\omega'}\Big|\Big(\frac{\exp(\beta)}{\exp(\beta)+q-1} - \frac{1}{\exp(\beta)+q-1}\Big)\big(\mu(\sigma_h = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_h = 1 \mid \omega'_{\Delta_i^n})\big)\Big|

(6.27)  \leq \Big(\frac{\exp(\beta)}{\exp(\beta)+q-1} - \frac{1}{\exp(\beta)+q-1}\Big)\,\sup_{\omega,\omega'}\big|\mu(\sigma_h = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_h = 1 \mid \omega'_{\Delta_i^n})\big|

The nth generation of grandparents of i is the (n−1)th generation of grandparents of h, so we can use the n − 1 assumption:

(6.28)  \leq \frac{\exp(\beta) - 1}{\exp(\beta)+q-1}\, c^{n-1} \leq c\, c^{n-1} = c^n \qquad \square

Now consider the case where |p(i)| = 2; again we have to show that

(6.29)  \sup_{\omega,\omega'} \big|\mu(\sigma_i = 1 \mid \sigma_j = \omega_j : j \in \Delta_i^n) - \mu(\sigma_i = 1 \mid \sigma_j = \omega'_j : j \in \Delta_i^n)\big| \leq c^n

The first term of inequality (6.29) can be split into five terms that depend on the (n−1)th grandparents, by conditioning on the two parents h₁ = p₁(i), h₂ = p₂(i) of vertex i. If both parents of vertex i are not 1, they can be equal or different, which corresponds to different probabilities for the random variable at vertex i:

(6.30)  \mu(\sigma_i = 1 \mid \omega_{\Delta_i^n}) =
\mu(\sigma_i = 1 \mid \sigma_{h_1} = 1, \sigma_{h_2} = 1,\ \omega_{\Delta_i^n})\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \mu(\sigma_i = 1 \mid \sigma_{h_1} = 1, \sigma_{h_2} \in 1^c,\ \omega_{\Delta_i^n})\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})
+ \mu(\sigma_i = 1 \mid \sigma_{h_1} \in 1^c, \sigma_{h_2} = 1,\ \omega_{\Delta_i^n})\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \mu(\sigma_i = 1 \mid \sigma_{h_1}, \sigma_{h_2} \in 1^c,\ \sigma_{h_1} = \sigma_{h_2},\ \omega_{\Delta_i^n})\,\mu(\sigma_{h_1}, \sigma_{h_2} \in 1^c,\ \sigma_{h_1} = \sigma_{h_2} \mid \omega_{\Delta_i^n})
+ \mu(\sigma_i = 1 \mid \sigma_{h_1}, \sigma_{h_2} \in 1^c,\ \sigma_{h_1} \neq \sigma_{h_2},\ \omega_{\Delta_i^n})\,\mu(\sigma_{h_1}, \sigma_{h_2} \in 1^c,\ \sigma_{h_1} \neq \sigma_{h_2} \mid \omega_{\Delta_i^n})

Applying lemma 1 gives:

(6.31)  (6.30) = \frac{\exp(2\beta)}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \frac{\exp(\beta)}{2\exp(\beta)+q-2}\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})
+ \frac{\exp(\beta)}{2\exp(\beta)+q-2}\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \frac{1}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1}, \sigma_{h_2} \in 1^c,\ \sigma_{h_1} = \sigma_{h_2} \mid \omega_{\Delta_i^n})
+ \frac{1}{2\exp(\beta)+q-2}\,\mu(\sigma_{h_1}, \sigma_{h_2} \in 1^c,\ \sigma_{h_1} \neq \sigma_{h_2} \mid \omega_{\Delta_i^n})

Because \frac{1}{2\exp(\beta)+q-2} > \frac{1}{\exp(2\beta)+q-1}, we can write:

(6.32)  (6.31) = \frac{\exp(2\beta)}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \frac{\exp(\beta)}{2\exp(\beta)+q-2}\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})
+ \frac{\exp(\beta)}{2\exp(\beta)+q-2}\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \frac{1}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})
+ \Big(\frac{1}{2\exp(\beta)+q-2} - \frac{1}{\exp(2\beta)+q-1}\Big)\,\mu(\sigma_{h_1}, \sigma_{h_2} \in 1^c,\ \sigma_{h_1} \neq \sigma_{h_2} \mid \omega_{\Delta_i^n})

The second term of inequality (6.29) is identical, except with ω' in place of ω, so the difference in (6.29) equals:

(6.33)  \big[\text{the five terms of (6.32)}\big] - \big[\text{the same five terms with } \omega' \text{ in place of } \omega\big]

For the first three terms, where at least one parent takes the value 1, conditioning on ω gives a probability at least as high as conditioning on ω', because ω = 1^S and ω' = 2^S (e.g. \mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n}) \geq \mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} = 1 \mid \omega'_{\Delta_i^n})). For the same reason, the last two terms (fourth and fifth) have a lower probability when conditioning on ω than when conditioning on ω' (\mu(\sigma_{h_1}, \sigma_{h_2} \in 1^c,\ \sigma_{h_1} \neq \sigma_{h_2} \mid \omega_{\Delta_i^n}) \leq \mu(\sigma_{h_1}, \sigma_{h_2} \in 1^c,\ \sigma_{h_1} \neq \sigma_{h_2} \mid \omega'_{\Delta_i^n})). So we can eliminate the fifth terms to get an upper bound, as their combined contribution is negative:

(6.34)  (6.33) \leq \frac{\exp(2\beta)}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \frac{\exp(\beta)}{2\exp(\beta)+q-2}\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})
+ \frac{\exp(\beta)}{2\exp(\beta)+q-2}\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \frac{1}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})
- \big[\text{the same four terms with } \omega' \text{ in place of } \omega\big]

The second and third differences are positive, i.e.

\mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n}) - \mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega'_{\Delta_i^n}) > 0

and

\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} = 1 \mid \omega'_{\Delta_i^n}) > 0.

Both are multiplied by the constant \frac{\exp(\beta)}{2\exp(\beta)+q-2}. Since we are bounding from above, we may increase this constant:

\frac{\exp(\beta)}{2\exp(\beta)+q-2} = \frac{1}{2} - \frac{q-2}{4\exp(\beta)+2q-4} \leq \frac{1}{2} - \frac{q-2}{2\exp(2\beta)+2q-2},

so we can write:

(6.35)  (6.34) \leq \frac{\exp(2\beta)}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \Big(\frac{1}{2} - \frac{q-2}{2\exp(2\beta)+2q-2}\Big)\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})
+ \Big(\frac{1}{2} - \frac{q-2}{2\exp(2\beta)+2q-2}\Big)\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \frac{1}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})
- \big[\text{the same four terms with } \omega' \text{ in place of } \omega\big]

Because the four probabilities (the right-hand factors) in each group sum to one, subtracting the constant \frac{1}{2} - \frac{q-2}{2\exp(2\beta)+2q-2} from the expression is the same as subtracting it from each of the corresponding coefficients (the left-hand factors):

(6.36)  (6.35) = \frac{1}{2} - \frac{q-2}{2\exp(2\beta)+2q-2}
+ \Big(\frac{\exp(2\beta)}{\exp(2\beta)+q-1} - \frac{1}{2} + \frac{q-2}{2\exp(2\beta)+2q-2}\Big)\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \Big(\frac{1}{\exp(2\beta)+q-1} - \frac{1}{2} + \frac{q-2}{2\exp(2\beta)+2q-2}\Big)\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})
- \frac{1}{2} + \frac{q-2}{2\exp(2\beta)+2q-2}
- \Big(\frac{\exp(2\beta)}{\exp(2\beta)+q-1} - \frac{1}{2} + \frac{q-2}{2\exp(2\beta)+2q-2}\Big)\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega'_{\Delta_i^n})
- \Big(\frac{1}{\exp(2\beta)+q-1} - \frac{1}{2} + \frac{q-2}{2\exp(2\beta)+2q-2}\Big)\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} \in 1^c \mid \omega'_{\Delta_i^n})

The constant terms immediately cancel each other out; after simplifying we get:

(6.37)  (6.36) \leq \frac{\tfrac{1}{2}\exp(2\beta) - \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})
+ \frac{-\tfrac{1}{2}\exp(2\beta) + \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})
- \frac{\tfrac{1}{2}\exp(2\beta) - \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega'_{\Delta_i^n})
- \frac{-\tfrac{1}{2}\exp(2\beta) + \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} \in 1^c \mid \omega'_{\Delta_i^n})

Now we have written inequality (6.29) in a form that contains two terms where both parents take the value 1 and two terms where both parents take a value in 1^c. However, our estimate applies to a form containing a single parent. To get this form we add and subtract \mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n}) and \mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega'_{\Delta_i^n}), which yields:

(6.38)  (6.37) \leq \frac{\tfrac{1}{2}\exp(2\beta) - \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\Big(\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega_{\Delta_i^n}) + \mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})\Big)
+ \frac{-\tfrac{1}{2}\exp(2\beta) + \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\Big(\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n}) + \mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n})\Big)
- \frac{\tfrac{1}{2}\exp(2\beta) - \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\Big(\mu(\sigma_{h_1} = 1, \sigma_{h_2} = 1 \mid \omega'_{\Delta_i^n}) + \mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega'_{\Delta_i^n})\Big)
- \frac{-\tfrac{1}{2}\exp(2\beta) + \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\Big(\mu(\sigma_{h_1} \in 1^c, \sigma_{h_2} \in 1^c \mid \omega'_{\Delta_i^n}) + \mu(\sigma_{h_1} = 1, \sigma_{h_2} \in 1^c \mid \omega'_{\Delta_i^n})\Big)

We have marginalised out σ_{h_2} and σ_{h_1} respectively; after simplifying:

(6.39)  (6.38) = \frac{\tfrac{1}{2}\exp(2\beta) - \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\big(\mu(\sigma_{h_1} = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_{h_1} = 1 \mid \omega'_{\Delta_i^n})\big)
+ \frac{-\tfrac{1}{2}\exp(2\beta) + \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\big(\mu(\sigma_{h_2} \in 1^c \mid \omega_{\Delta_i^n}) - \mu(\sigma_{h_2} \in 1^c \mid \omega'_{\Delta_i^n})\big)

Because \mu(\sigma_{h_2} \in 1^c \mid \cdot\,) = 1 - \mu(\sigma_{h_2} = 1 \mid \cdot\,) we have:

(6.40)  (6.39) = \frac{\tfrac{1}{2}\exp(2\beta) - \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\big(\mu(\sigma_{h_1} = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_{h_1} = 1 \mid \omega'_{\Delta_i^n})\big)
+ \frac{-\tfrac{1}{2}\exp(2\beta) + \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\Big(\big(1 - \mu(\sigma_{h_2} = 1 \mid \omega_{\Delta_i^n})\big) - \big(1 - \mu(\sigma_{h_2} = 1 \mid \omega'_{\Delta_i^n})\big)\Big)

Simplifying and placing back the supremum gives:

(6.41)  (6.40) \leq \sup_{\omega,\omega'}\ \frac{\tfrac{1}{2}\exp(2\beta) - \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\Big(\big|\mu(\sigma_{h_1} = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_{h_1} = 1 \mid \omega'_{\Delta_i^n})\big| + \big|\mu(\sigma_{h_2} = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_{h_2} = 1 \mid \omega'_{\Delta_i^n})\big|\Big)
\leq \frac{\tfrac{1}{2}\exp(2\beta) - \tfrac{1}{2}}{\exp(2\beta)+q-1}\,\sup_{\omega,\omega'}\Big(\big|\mu(\sigma_{h_1} = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_{h_1} = 1 \mid \omega'_{\Delta_i^n})\big| + \big|\mu(\sigma_{h_2} = 1 \mid \omega_{\Delta_i^n}) - \mu(\sigma_{h_2} = 1 \mid \omega'_{\Delta_i^n})\big|\Big)

Now we can use the n − 1 assumption:

(6.42)  (6.41) \leq \frac{\tfrac{1}{2}\exp(2\beta) - \tfrac{1}{2}}{\exp(2\beta)+q-1}\cdot 2c^{n-1} = c\, c^{n-1} = c^n \qquad \square

6.3. Counterexample for a general discrete model. This counterexample shows that the contraction argument used in the proof of the previous section does not hold for every discrete stochastic model on a Bayesian network with at most two parents:

Suppose we have as a Bayesian network the rooted directed binary Cayley tree, with edges pointing towards the root; i.e. starting from the root, every vertex has two parents. The parents, grandparents and further ancestors are all distinct and each have one child, as in the picture. A family of variables (σ_i)_{i∈V} is placed on the vertices of the tree; the variables take values in the state space X = {0, 1}. The conditional probabilities are as follows:

• The probability of a child taking the value 1 if both parents are 1: P(σ_i = 1 | σ_{p₁(i)} = 1, σ_{p₂(i)} = 1) = 0.05

• The probability of a child taking the value 1 if the left parent has the value 1 and the right parent the value 0: P(σ_i = 1 | σ_{p₁(i)} = 1, σ_{p₂(i)} = 0) = 0.9

• The probability of a child taking the value 1 if the left parent has the value 0 and the right parent the value 1: P(σ_i = 1 | σ_{p₁(i)} = 0, σ_{p₂(i)} = 1) = 0.9

• The probability of a child taking the value 1 if both parents are 0: P(σ_i = 1 | σ_{p₁(i)} = 0, σ_{p₂(i)} = 0) = 0.95

The two parents of vertex i are denoted p₁(i) and p₂(i). If vertex i has no parents, p(i) = ∅, then the law of σ_i is a specified unconditional probability distribution on X.

The probability measures P_n on the volumes V_n are defined as:

(6.43)  P_n(\sigma_i = \omega_i : i \in V_n) = \prod_{i\in V_n} P(\sigma_i = \omega_i \mid \sigma_{p_1(i)} = \omega_{p_1(i)},\ \sigma_{p_2(i)} = \omega_{p_2(i)})

All probability measures compatible with equation (6.43) qualify as solutions to the model. We will show that there are at least three such probability measures.

6.3.1. Numerical solutions. To find solutions numerically, we set the first generation of ancestors deterministically to the value 1 and compare with the first generation of ancestors taking the value 0. Using application E, we see that, evolving through the generations, the difference converges to 0.625, which suggests that memory does not decay for this model.

6.3.2. Analytical solutions. Because in a Cayley tree all branches of ancestors are independent of each other, we have the following equation for the translation-invariant case (P(σ_i = 1) constant for all i ∈ V):

P(\sigma_i = 1) = x = 0.05\,x^2 + 2\cdot 0.9\,x(1-x) + 0.95\,(1-x)^2

This equation has the solution x = (5\sqrt{17}-11)/16 ≈ 0.601, with the corresponding stationary probability measures P_{s,n} that set the unconditional variables as:

(6.44)  P_{s,n}(\sigma_i = 1) = \frac{5\sqrt{17}-11}{16},\qquad P_{s,n}(\sigma_i = 0) = 1 - \frac{5\sqrt{17}-11}{16} \qquad (n \geq 0,\ i \in W_n,\ p(i) = \emptyset)

Through the conditional equations it follows that all variables (σ_i : i ∈ V_n) have the above probabilities (6.44), so the probability measures are compatible, and Kolmogorov's theorem states that there exists a unique compatible measure P_s on all variables (σ_i : i ∈ V).

The numerical solutions of the previous section suggest that periodic probability measures of period 2 also satisfy equation (6.43). To find these, we assume the probabilities for all even generations are equal, and likewise for all odd generations: x = P(σ_0 = 1) = P(σ_{p^{2l}(0)} = 1) and y = P(σ_{p(0)} = 1) = P(σ_{p^{1+2l}(0)} = 1), l ∈ ℕ:

P(\sigma_0 = 1) = x = 0.05\,y^2 + 2\cdot 0.9\,y(1-y) + 0.95\,(1-y)^2

P(\sigma_{p(0)} = 1) = y = 0.05\,x^2 + 2\cdot 0.9\,x(1-x) + 0.95\,(1-x)^2

Substituting the second equation into the first gives:

-0.512\,x^4 - 0.128\,x^3 + 1.288\,x^2 - 0.838\,x + 0.133 = 0

= -\frac{64}{125}\Big(x - \frac{5\sqrt{17}-11}{16}\Big)\Big(x + \frac{5\sqrt{17}+11}{16}\Big)\Big(x - \frac{1}{4}\Big)\Big(x - \frac{7}{8}\Big)

The roots of the polynomial give rise to two extra sets of compatible probability measures,

P_{1,n} = \{x = \tfrac{1}{4},\ y = \tfrac{7}{8}\},\qquad P_{2,n} = \{x = \tfrac{7}{8},\ y = \tfrac{1}{4}\}

That is,

(6.45)  P_{1,n}(\sigma_i = 1) = \tfrac{1}{4},\ P_{1,n}(\sigma_i = 0) = \tfrac{3}{4}\quad (i \in W_n,\ n\ \text{even});\qquad P_{1,n}(\sigma_i = 1) = \tfrac{7}{8},\ P_{1,n}(\sigma_i = 0) = \tfrac{1}{8}\quad (i \in W_n,\ n\ \text{odd})

(6.46)  P_{2,n}(\sigma_i = 1) = \tfrac{7}{8},\ P_{2,n}(\sigma_i = 0) = \tfrac{1}{8}\quad (i \in W_n,\ n\ \text{even});\qquad P_{2,n}(\sigma_i = 1) = \tfrac{1}{4},\ P_{2,n}(\sigma_i = 0) = \tfrac{3}{4}\quad (i \in W_n,\ n\ \text{odd})

These probability measures are compatible and again Kolmogorov's theorem guarantees that there exist compatible measures P_1 and P_2 on all variables (σ_i : i ∈ V).
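The recursion behind these solutions is easy to check in a few lines (a sketch; f is the one-generation map obtained from the conditional probabilities above and the independence of the ancestor branches):

```python
# One-generation map: if both parents independently take the value 1 with
# probability x, then P(child = 1) = f(x).
def f(x):
    return 0.05 * x**2 + 2 * 0.9 * x * (1 - x) + 0.95 * (1 - x)**2

# Evolve two boundary conditions: all-ones vs all-zeros ancestors.
a, b = 1.0, 0.0
for _ in range(500):
    a, b = f(a), f(b)
print(a, b)            # both orbits settle on the period-2 cycle {1/4, 7/8}
print(abs(a - b))      # close to 0.625 = 7/8 - 1/4: memory does not decay

# The translation-invariant fixed point exists but is unstable:
x = (5 * 17**0.5 - 11) / 16
print(abs(f(x) - x))      # ~0: x = (5*sqrt(17)-11)/16 solves x = f(x)
print(f(0.25), f(0.875))  # the period-2 pair of (6.45) and (6.46)
```

The 2-cycle {1/4, 7/8} is attracting (|f'(1/4) f'(7/8)| = 0.75 < 1) while the fixed point is repelling, which is why application E finds the periodic behaviour.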

This counterexample shows that not all Bayesian networks where the maximum number of parents is 2 have decay of memory.

6.3.3. Other counterexamples. Many counterexamples can be found with application E, all of which produce periodic measures with period 2. There are some conditions under which we could not find counterexamples; all starting instances converge to the same probabilities if one of the following conditions is satisfied:

• P(σ_i = 1 | σ_{p₁(i)} = 1, σ_{p₂(i)} = 1) > 0.25

• P(σ_i = 1 | σ_{p₁(i)} = 0, σ_{p₂(i)} = 0) < 0.75

• P(σ_i = 1 | σ_{p₁(i)} = 1, σ_{p₂(i)} = 1) > 0.134 and P(σ_i = 1 | σ_{p₁(i)} = 0, σ_{p₂(i)} = 0) < 0.866

So only if children strongly tend to take values opposite to their parents are these periodic measures found and counterexamples possible.

7. Discussion and Conclusion

For the Ising model on a Cayley tree with homogeneous fields, phase transitions have been described extensively. In this thesis we summarised these results. A critical value βc(k) > 0 exists, and for β > βc there is an hc(β, k) > 0 such that: if β ≤ βc(k) or |h| > hc(β, k), we have uniqueness; otherwise the model undergoes a phase transition. For inhomogeneous fields asymptotically approaching a homogeneous critical field, Bissacot et al. described strict conditions on the disturbance that guarantee a phase transition. We provided a numerical example to show explicit probabilities for the model.

For the Potts model, we found fewer rigorous results in the literature. For the translation-invariant case, explicit solutions have been described for the binary Cayley tree and the number of measures has been determined for higher-order Cayley trees. We have created computer simulations that show the dynamics of these models by calculating numerical critical values and numerical probabilities for specific measures. Results from our applications match the analytic results found by Külske, Rozikov and Khakimov and extend them to the Potts model on higher-order (k > 2) Cayley trees with an external field (α ≠ 0). We see that higher-order Cayley trees have lower critical values, and that the higher the cardinality q of the state space, the higher the critical values. We conjecture that for α ≥ k − 1 the model has uniqueness. We have found some numerical relations which remain to be investigated analytically in future work; the Potts model can also be extended by letting α ∈ ℝ^q be a vector instead of a constant.

For probabilistic models on Bayesian networks with at most two parents, we aimed to find exact conditions for decay of memory. We proved a generalisation from an Ising-type model to a Potts-type model and found a counterexample for a more general discrete model. This provides us with a lower and an upper bound, though exact conditions remain to be investigated.

References

[1] C. Külske, U. A. Rozikov and R. M. Khakimov: Description of the translation-invariant splitting Gibbs measures for the Potts model on a Cayley tree, Journal of Statistical Physics, Volume 156, Issue 1, pp. 189-200 (2014)

[2] R. Bissacot et al.: Stability of the phase transition of critical-field Ising model on Cayley trees under inhomogeneous external fields, Stochastic Processes and their Applications (2017), http://dx.doi.org/10.1016/j.spa.2017.03.023
[3] Utkir A. Rozikov: Gibbs measures on Cayley trees, World Scientific, 2014
[4] C. Külske, U. A. Rozikov: Fuzzy transformations and extremality of Gibbs measures for the Potts model on a Cayley tree, arXiv:1403.5775v1 [math-ph], 23 Mar 2014
[5] Utkir A. Rozikov: Gibbs measures on Cayley trees: Results and open problems, Reviews in Mathematical Physics, Vol. 25, No. 1 (2013) 1330001
[6] S. Friedli and Y. Velenik: Statistical Mechanics of Lattice Systems: a Concrete Mathematical Introduction, Cambridge: Cambridge University Press, 2017
[7] Hans-Otto Georgii: Studies in Mathematics 9: Gibbs Measures and Phase Transitions, Berlin; New York: de Gruyter, 1988
[8] Finn V. Jensen: Bayesian Networks and Decision Graphs, New York: Springer-Verlag Inc., 2001
[9] H.G. Dehling and J.N. Kalma: Kansrekening, Utrecht: Epsilon Uitgaven, 1995
[10] G. Cohen Tervaert: Decay of memory on Bayesian Networks, RUG: Bachelor thesis Mathematics, supervised by C. Külske, 2009

Internet.
• http://www.mathworld.com
• http://www.wikipedia.com

Applications.
A IsingOnCayleyTree
B PottsModelOnCayleyTree
C PottsRecursion
D MathematicaPotts
E DecayOnBayesianNetworks
F Dobrushin

Applications A-F can be downloaded from:
https://drive.google.com/drive/folders/1fZcIZyn-SNWRSvdo_4cGlHbWhb9TFaSR?usp=sharing

Applications A, B, C, E and F were created in Visual Studio 2017 (free edition); Application D was created in Mathematica (student license).