Verbitskiy Pacific Journal of Mathematics for Industry (2016) 8:2
DOI 10.1186/s40736-015-0021-5
ORIGINAL ARTICLE (Open Access)

Thermodynamics of the binary symmetric channel

Evgeny Verbitskiy^{1,2}

Correspondence: [email protected]
^1 Mathematical Institute, Leiden University, Postbus 9512, 2300 RA Leiden, The Netherlands
^2 Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, PO Box 407, 9700 AK Groningen, The Netherlands

Abstract: We study a hidden Markov process which is the result of a transmission of the binary symmetric Markov source over the memoryless binary symmetric channel. This process has been studied extensively in information theory and is often used as a benchmark case for the so-called denoising algorithms. Exploiting the link between this process and the 1D Random Field Ising Model (RFIM), we are able to identify the Gibbs potential of the resulting hidden Markov process. Moreover, we obtain a stronger bound on the memory decay rate. We conclude with a discussion of the implications of our results for the development of denoising algorithms.

Keywords: Hidden Markov models, Gibbs states, Thermodynamic formalism, Denoising

2000 Mathematics Subject Classification: 37D35, 82B20

1 Introduction

We study the binary symmetric Markov source over the memoryless binary symmetric channel. More specifically, let {X_n} be a stationary two-state Markov chain with values {±1}, and

\mathbb{P}(X_{n+1} = -X_n) = p,

where 0 < p < 1; denote by

P = (p_{x,x'}) = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}

the corresponding transition probability matrix. Note that \pi = (1/2, 1/2) is the stationary initial distribution for this chain.

The binary symmetric channel will be modelled as a sequence of Bernoulli random variables {Z_n} with

\mathbb{P}_Z(Z_n = -1) = \varepsilon, \quad \mathbb{P}_Z(Z_n = 1) = 1 - \varepsilon, \quad \varepsilon \in (0, 1).

Finally, put

Y_n = X_n \cdot Z_n   (1.1)

for all n. The process {Y_n} is a hidden Markov process, because Y_n \in \{-1, 1\} is chosen independently for every n from an emission distribution \pi_{X_n} on \{-1, 1\}: \pi_1 = (\varepsilon, 1-\varepsilon) and \pi_{-1} = (1-\varepsilon, \varepsilon). More generally, hidden Markov processes are random functions of discrete-time Markov chains, where the value Y_n is chosen according to a distribution which depends on the value X_n = x_n of the underlying Markov chain, independently for every n. The applications of hidden Markov processes include automatic character and speech recognition, information theory, statistics and bioinformatics, see [4, 13]. The particular example (1.1) considered in the present paper is probably one of the simplest, and is often used as a benchmark for testing algorithms. In particular, this example has been studied rather extensively in connection with the computation of the entropy of the output process {Y_n}, see e.g. [8–10, 12].

The law Q of the process {Y_n} is the push-forward of P \times P_Z under \psi: \{-1,1\}^{\mathbb{Z}} \times \{-1,1\}^{\mathbb{Z}} \to \{-1,1\}^{\mathbb{Z}}, with \psi((x_n), (z_n)) = (x_n \cdot z_n). We write Q = (P \times P_Z) \circ \psi^{-1}. For every m \le n and y_m^n := (y_m, \dots, y_n) \in \{-1,1\}^{n-m+1}, the measure of the corresponding cylindric set is given by

Q[y_m^n] := Q(Y_m = y_m, \dots, Y_n = y_n)
= \sum_{x_m^n,\, z_m^n \in \{-1,1\}^{n-m+1}} P[x_m^n] \, P_Z[z_m^n] \prod_{k=m}^{n} \mathbf{1}\{y_k = x_k \cdot z_k\}
= \frac{1}{2} \sum_{x_m^n \in \{-1,1\}^{n-m+1}} \prod_{i=m}^{n-1} p_{x_i, x_{i+1}} \cdot \varepsilon^{\#\{i \in [m,n] :\, x_i y_i = -1\}} (1-\varepsilon)^{\#\{i \in [m,n] :\, x_i y_i = 1\}}.   (1.2)
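As an illustration (not part of the paper), formula (1.2) can be checked numerically by brute force. The sketch below computes Q[y_m^n] both from the push-forward definition (summing over hidden and noise configurations) and from (1.2); the parameter values p, ε and the word length are arbitrary choices for the check.

```python
from itertools import product
from math import prod

# Illustrative flip probabilities of the chain and the channel (not from the paper).
p, eps = 0.3, 0.1
# Transition matrix p_{x,x'}: stay with probability 1-p, flip with probability p.
P = {(+1, +1): 1 - p, (+1, -1): p, (-1, +1): p, (-1, -1): 1 - p}

def Q_cylinder(y):
    """Q[y_m^n] via formula (1.2): (1/2) * sum over hidden configurations x."""
    n = len(y)
    total = 0.0
    for x in product((-1, +1), repeat=n):
        chain = prod(P[x[i], x[i + 1]] for i in range(n - 1))
        flips = sum(1 for i in range(n) if x[i] * y[i] == -1)
        total += chain * eps**flips * (1 - eps)**(n - flips)
    return total / 2

def Q_direct(y):
    """Q[y_m^n] from the definition: push-forward of P x P_Z under (x, z) -> x*z."""
    n = len(y)
    total = 0.0
    for x in product((-1, +1), repeat=n):
        px = 0.5 * prod(P[x[i], x[i + 1]] for i in range(n - 1))  # pi(x_m) = 1/2
        for z in product((-1, +1), repeat=n):
            pz = prod(eps if zi == -1 else 1 - eps for zi in z)
            if all(y[k] == x[k] * z[k] for k in range(n)):
                total += px * pz
    return total

# The two expressions agree, and cylinder probabilities of a fixed length sum to 1.
assert abs(Q_direct((1, -1, 1)) - Q_cylinder((1, -1, 1))) < 1e-12
assert abs(sum(Q_cylinder(y) for y in product((-1, +1), repeat=5)) - 1) < 1e-12
```

The inner sum over z collapses to the single configuration z_k = x_k y_k, which is exactly how the second line of (1.2) reduces to the third.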
Two particular cases are easy to analyze. If p = 1/2, then {X_n} is a sequence of independent identically distributed random variables with \mathbb{P}(X_n = -1) = \mathbb{P}(X_n = +1) = 1/2, and {Y_n} has the same distribution. If \varepsilon = 1/2, then the formula above implies that

Q[y_m^n] = \left(\frac{1}{2}\right)^{n-m+1} \cdot \frac{1}{2} \sum_{x_m^n \in \{-1,1\}^{n-m+1}} \prod_{i=m}^{n-1} p_{x_i, x_{i+1}} = \left(\frac{1}{2}\right)^{n-m+1},

and hence again {Y_n} is a sequence of independent random variables with Q(Y_n = -1) = Q(Y_n = +1) = 1/2.

The paper is organized as follows. In Section 2 we exploit methods of Statistical Mechanics to derive expressions for probabilities of cylindric events (1.2). In Section 3, analyzing the derived expressions, we show that the measure Q has nice thermodynamic properties; in particular, it falls into the class of g-measures with exponential decay of variation (memory decay). We also obtain a novel estimate of the decay rate, which is stronger than the estimates derived in previous works. In Section 4 we study two-sided conditional probabilities and show that Q is in fact a Gibbs state in the sense of Statistical Mechanics. We also discuss the well-known denoising algorithm DUDE, and suggest that the Gibbs property of Q explains why DUDE performs so well in this particular example. Furthermore, we argue that the development of denoising algorithms relying on thermodynamic Gibbs ideas can result in superior performance.

2 Random field Ising model

It was observed in [18] that the probability Q(y_m, \dots, y_n) of a cylindric event {Y_m = y_m, \dots, Y_n = y_n}, m \le n, can be expressed via a partition function of a random field Ising model. We exploit this observation further. Assume p, \varepsilon \in (0, 1), and put

J = \frac{1}{2} \log \frac{1-p}{p}, \quad K = \frac{1}{2} \log \frac{1-\varepsilon}{\varepsilon}.

Then for any (y_m, \dots, y_n) \in \{-1,1\}^{n-m+1}, the expression for the cylinder probability (1.2) can be rewritten as

Q(y_m, \dots, y_n) = \frac{c_J}{\lambda_{J,K}^{n-m+1}} \sum_{x_m^n \in \{-1,1\}^{n-m+1}} \exp\left( J \sum_{i=m}^{n-1} x_i x_{i+1} + K \sum_{i=m}^{n} x_i y_i \right),

where

c_J = \cosh(J), \quad \lambda_{J,K} = 2\left(\cosh(J+K) + \cosh(J-K)\right) = 4\cosh(J)\cosh(K).

The non-trivial part of the cylinder probability is the sum over all hidden configurations (x_m, \dots, x_n):

Z_{m,n}[y_m^n] := \sum_{x_m^n \in \{-1,1\}^{n-m+1}} \exp\left( J \sum_{i=m}^{n-1} x_i x_{i+1} + K \sum_{i=m}^{n} x_i y_i \right)

is in fact the partition function of the Ising model with the signs of the external random fields given by the y_i's. Applying the recursive method of [14], the partition function can be evaluated in the following fashion [1]. Consider the following functions:

A(w) = \frac{1}{2} \log \frac{\cosh(w+J)}{\cosh(w-J)},
B(w) = \frac{1}{2} \log\left[ 4 \cosh(w+J)\cosh(w-J) \right] = \frac{1}{2} \log\left( e^{2w} + e^{-2w} + e^{2J} + e^{-2J} \right).

One readily checks that if s = \pm 1, then for all w \in \mathbb{R}

\exp\left( sA(w) + B(w) \right) = 2\cosh(w + sJ).   (2.1)

Now the partition function can be evaluated by summing over the right-most spin. Namely, suppose m < n and y_m^n \in \{-1,1\}^{n-m+1}; then

Z_{m,n}[y_m^n] = \sum_{x_m^{n-1} \in \{-1,1\}^{n-m}} \exp\left( J \sum_{i=m}^{n-2} x_i x_{i+1} + K \sum_{i=m}^{n-1} x_i y_i \right) \sum_{x_n \in \{-1,1\}} e^{x_n (J x_{n-1} + K y_n)}
= \sum_{x_m^{n-1} \in \{-1,1\}^{n-m}} \exp\left( J \sum_{i=m}^{n-2} x_i x_{i+1} + K \sum_{i=m}^{n-1} x_i y_i \right) \cdot 2\cosh(J x_{n-1} + K y_n)
= \sum_{x_m^{n-1} \in \{-1,1\}^{n-m}} \exp\left( J \sum_{i=m}^{n-2} x_i x_{i+1} + K \sum_{i=m}^{n-1} x_i y_i \right) \exp\left( x_{n-1} A\big(w_n^{(n)}\big) + B\big(w_n^{(n)}\big) \right),

where w_n^{(n)} = K y_n. Hence,

Z_{m,n}[y_m^n] = \sum_{x_m^{n-1} \in \{-1,1\}^{n-m}} \exp\left( J \sum_{i=m}^{n-2} x_i x_{i+1} + K \sum_{i=m}^{n-2} x_i y_i + x_{n-1}\left( K y_{n-1} + A\big(w_n^{(n)}\big) \right) \right) \times \exp\left( B\big(w_n^{(n)}\big) \right),

and thus the new sum has exactly the same form, but instead of K y_{n-1} we now have w_{n-1}^{(n)} = K y_{n-1} + A\big(w_n^{(n)}\big). Continuing the summation over the remaining right-most x-spins, one gets

Z_{m,n}[y_m^n] = 2\cosh\big(w_m^{(n)}\big) \exp\left( \sum_{i=m+1}^{n} B\big(w_i^{(n)}\big) \right),

where

w_i^{(n)} = K y_i + A\big(w_{i+1}^{(n)}\big) for every i < n;

equivalently, since A(0) = 0, we can define w_i^{(n)} = 0 for all i > n, and w_i^{(n)} = K y_i + A\big(w_{i+1}^{(n)}\big) for all i \le n.

Definition 3.1. Suppose P is a fully supported translation invariant measure on \Omega = A^{\mathbb{Z}_+}, where A is a finite alphabet.

(i) The measure P is called a g-measure, if for some positive continuous function g: \Omega \to (0,1) satisfying the normalization condition

\sum_{\bar\omega_0 \in A} g(\bar\omega_0, \omega_1, \omega_2, \dots) = 1

for all \omega = (\omega_0, \omega_1, \dots) \in \Omega, one has

P(\omega_0 \,|\, \omega_1, \omega_2, \dots) = g(\omega_0, \omega_1, \omega_2, \dots)

for P-a.a. \omega \in \Omega.

(ii) The measure P is Bowen-Gibbs for a continuous potential \varphi: \Omega \to \mathbb{R}, if there exist constants P \in \mathbb{R} and C \ge 1 such that for all \omega \in \Omega and every n \in \mathbb{N}

\frac{1}{C} \le \frac{P\left(\{\tilde\omega \in \Omega : \tilde\omega_0 = \omega_0, \dots, \tilde\omega_{n-1} = \omega_{n-1}\}\right)}{\exp\left( (S_n\varphi)(\omega) - nP \right)} \le C,

where (S_n\varphi)(\omega) = \sum_{k=0}^{n-1} \varphi(S^k \omega).

(iii) The measure P is called an equilibrium state for a continuous potential \varphi: \Omega \to \mathbb{R}, if P attains the maximum of the following functional:

h(P) + \int \varphi \, dP = \sup_{\tilde P \in \mathcal{M}_1^{*}(\Omega)} \left( h(\tilde P) + \int \varphi \, d\tilde P \right),   (3.1)
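The backward recursion for the effective fields w_i^{(n)} turns the exponential-size sum Z_{m,n}[y_m^n] into a linear-time computation. The following sketch (an illustration with arbitrary test values of J, K and y, not taken from the paper) implements the closed form 2cosh(w_m^{(n)}) exp(Σ B(w_i^{(n)})) and checks it against the brute-force sum over hidden spin configurations, as well as identity (2.1).

```python
from itertools import product
from math import log, cosh, exp

# Illustrative coupling constants (not from the paper).
J, K = 0.4, 0.9

def A(w):
    # A(w) = (1/2) log( cosh(w+J) / cosh(w-J) ); note A(0) = 0.
    return 0.5 * log(cosh(w + J) / cosh(w - J))

def B(w):
    # B(w) = (1/2) log( 4 cosh(w+J) cosh(w-J) ).
    return 0.5 * log(4 * cosh(w + J) * cosh(w - J))

def Z_recursive(y):
    """Z_{m,n}[y] = 2 cosh(w_m) exp(sum_{i=m+1}^n B(w_i)),
    with w_n = K*y_n and w_i = K*y_i + A(w_{i+1}) for i < n."""
    w, Bsum = 0.0, 0.0                    # w = 0 beyond the right end, A(0) = 0
    for i in range(len(y) - 1, -1, -1):   # sweep from the right-most site down
        w = K * y[i] + A(w)
        if i > 0:                         # B(w_i) is collected for i = m+1, ..., n
            Bsum += B(w)
    return 2 * cosh(w) * exp(Bsum)

def Z_bruteforce(y):
    """Direct sum over all hidden configurations x in {-1,1}^{n-m+1}."""
    n = len(y)
    return sum(
        exp(J * sum(x[i] * x[i + 1] for i in range(n - 1))
            + K * sum(x[i] * y[i] for i in range(n)))
        for x in product((-1, +1), repeat=n)
    )

# Identity (2.1): exp(s*A(w) + B(w)) = 2 cosh(w + s*J).
assert abs(exp(A(0.7) + B(0.7)) - 2 * cosh(0.7 + J)) < 1e-12
# The recursion reproduces the partition function.
y = (1, -1, -1, 1, 1)
assert abs(Z_recursive(y) - Z_bruteforce(y)) < 1e-9 * Z_bruteforce(y)
```

For a single site the recursion degenerates to Z = 2cosh(K y_m), matching the direct sum; each additional site costs one evaluation of A and B, which is what makes the cylinder probabilities of Q cheap to compute.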