Discrete Math. Appl. 2016; 26 (1):13–29

Yuriy S. Kharin and Egor V. Vecherko

Detection of embeddings in binary Markov chains

DOI: 10.1515/dma-2016-0002
Received March 31, 2015

Abstract: The paper is concerned with problems in steganography on the detection of embeddings and statistical estimation of positions at which message bits are embedded. Binary stationary Markov chains with known or unknown matrices of transition probabilities are used as mathematical models of cover sequences (container files). Based on the runs statistics and the likelihood ratio statistic, statistical tests are constructed for detecting the presence of embeddings. For a family of contiguous alternatives, the asymptotic power of statistical tests based on the runs statistics is found. An algorithm of polynomial complexity is developed for the statistical estimation of positions with embedded bits. Results of computer experiments are presented.

Keywords: steganography, model of embeddings, statistical test, power, total number of runs

Note: Originally published in Diskretnaya Matematika (2015) 27, №3, 123–144 (in Russian).

1 Introduction

The paper is concerned with a topical problem in steganographic information security: the problem of embedding detection, that is, of the construction of statistical tests for the existence of embeddings and of statistical estimates of positions (points) of embeddings.

The problem of detection of embeddings was studied in [1, 2, 3, 4] under the assumption that the probabilistic model of a cover sequence is completely known. In [1], statistical tests for the embedding existence were constructed in the case when the initial (cover) sequence is modeled by a scheme of independent trials; it was also shown that embedding detection is impossible if the fraction of embeddings tends to 0 as the length of the initial sequence tends to ∞. A similar fact was proved in [3]. In [2], a most powerful statistical test for the embedding existence was constructed for the model based on a Bernoulli scheme of independent trials, and statistical estimates of the fraction of embeddings were put forward. Statistical estimates of the model parameters of the embedding in a binary Markov chain were constructed and examined in [5]; they allow one to make preliminary conclusions on the fraction of embeddings. It is worth mentioning that the majority of studies on the detection of embeddings are based on empirical characteristics of sequences, which involve methods of discriminant analysis for testing the embedding existence. We also note that the above problems of recognition of embeddings are close to those on the detection of deviations of output sequences of cryptographic generators from uniformly distributed random sequences [6].

Our purpose in this paper is to continue the studies initiated in [5]: we construct and analyse statistical tests for the embedding existence and develop algorithms for the statistical estimation of embedding points.

The paper is organized as follows. In § 2 we describe the mathematical (q, r)-block model of embedding in a binary Markov chain. In § 3 we construct statistical tests for the embedding existence based on the runs statistics and on the short runs statistics, and in § 4 we consider tests based on the likelihood ratio statistic. In § 5 we put forward an algorithm of polynomial complexity for the statistical estimation of embedding points. The results of numerical experiments are given in § 6.

Yuriy S. Kharin: Belarusian State University, e-mail: [email protected]
Egor V. Vecherko: Belarusian State University, e-mail: [email protected]

2 Mathematical model of embedding

We define the generalized (q, r)-block model of embedding, a particular case of which was proposed by the authors of the present paper in [5]. Throughout, $(\Omega, F, P)$ is the underlying probability space, $V = \{0, 1\}$ is the binary alphabet, $V_T$ is the space of binary $T$-dimensional vectors, $O(\cdot)$ is the 'big O' notation introduced by Landau, $\mathbb{N}$ is the set of natural numbers, $I\{A\}$ is the indicator of an event $A$, $u_{t_1}^{t_2} = (u_{t_1}, \ldots, u_{t_2}) \in V_{t_2 - t_1 + 1}$ ($t_1, t_2 \in \mathbb{N}$, $t_1 \le t_2$) is a binary string of $t_2 - t_1 + 1$ successive symbols of some sequence $\{u_t : t \in \mathbb{N}\}$, $w(\cdot)$ is the Hamming weight, $L\{\xi\}$ is the probability distribution of a random variable $\xi$, $B(\theta)$ denotes the Bernoulli distribution with parameter $\theta \in [0, 1]$: $P\{\xi = 1\} = 1 - P\{\xi = 0\} = \theta$, and $\Phi(\cdot)$ is the distribution function of the standard normal law $N(0, 1)$.

According to [5], an adequate model of the cover sequence for embedding a message is a binary sequence $x_1^T = (x_1, x_2, \ldots, x_T) \in V_T$, $x_t \in V$, $t = 1, \ldots, T$, of length $T$, which is a homogeneous first-order binary Markov chain with symmetric matrix of one-step transition probabilities $P$:
$$
P = P(\varepsilon) = \frac{1}{2}\begin{pmatrix} 1+\varepsilon & 1-\varepsilon \\ 1-\varepsilon & 1+\varepsilon \end{pmatrix}, \qquad P\{x_t \oplus x_{t+1} = 1\} = \frac{1}{2}(1-\varepsilon), \quad |\varepsilon| < 1. \qquad (1)
$$

Here $\varepsilon$ is the parameter of the model: the case $\varepsilon = 0$ corresponds to a scheme of independent trials, which was examined in [1]; the case $\varepsilon > 0$ takes into account an attraction-type dependence, and the case $\varepsilon < 0$ a repulsion-type dependence. We note that the Markov chain (1) satisfies the ergodicity conditions [7] and has the uniform stationary distribution $(1/2, 1/2)$. In what follows, we shall assume that the Markov chain (1) is stationary, so that its initial probability distribution agrees with the uniform distribution.

In practical applications [5] a message is subject to a cryptographic transformation before being embedded in the cover sequence, and hence we assume in what follows that a message $\xi_1^M = (\xi_1, \ldots, \xi_M) \in V_M$, $M \le T$, is a sequence of independent Bernoulli random variables:
$$
L\{\xi_t\} = B(\theta_1), \qquad P\{\xi_t = j\} = \theta_j, \quad j \in V, \quad \theta_0 = 1 - \theta_1, \quad t = 1, \ldots, M. \qquad (2)
$$

The stego-key $\gamma_1^T = (\gamma_1, \ldots, \gamma_T) \in V_T$ specifies the points (time instants) at which the message bits $\xi_1^M$ are embedded in the sequence $x_1^T$. We introduce a special (q, r)-block model of the stego-key $\gamma_1^T$ ($q, r \in \mathbb{N}$, $r \le q$), assuming that the length of the sequence $x_1^T$ is a multiple of $q$: $T = Kq$.

Let $\zeta_k \in V$, $L\{\zeta_k\} = B(\delta)$, $k = 1, \ldots, K$, be auxiliary independent random variables which govern the choice of the blocks $\{x_{(k)} = x_{(k-1)q+1}^{kq}\}$ for embedding the message $\xi_1^M$: if $\zeta_k = 1$, then $r$ successive bits of the message are embedded in $r$ randomly chosen bits of the block $x_{(k)}$; if $\zeta_k = 0$, then no embedding in the block $x_{(k)}$ is performed. Further, $G^{(q,r)} = \{g_1^{(q,r)}, \ldots, g_{C_q^r}^{(q,r)}\} = \{u_1^q \in V_q : w(u_1^q) = r\}$ is the set consisting of $C_q^r$ lexicographically ordered binary vectors of length $q$ with Hamming weight $r$; $g_1, g_2, \ldots$ are independent random variables, and $g_k$ has the uniform probability distribution on the set $\{1, \ldots, C_q^r\}$,
$$
P\{\gamma_{(k)} = g_i^{(q,r)} \mid \zeta_k = 1\} = P\{g_k = i\} = \frac{1}{C_q^r}.
$$

In the (q, r)-block model of embedding, the sequence $\gamma_1^T$ consists of blocks of length $q$: $\gamma_{(1)} = \gamma_1^q$, $\gamma_{(2)} = \gamma_{q+1}^{2q}$, ..., $\gamma_{(K)} = \gamma_{(K-1)q+1}^{Kq}$,
$$
\gamma_{(k)} = \begin{cases} (0, \ldots, 0), & \zeta_k = 0, \\ g_i^{(q,r)} \in G^{(q,r)}, & \zeta_k = 1,\ g_k = i, \end{cases} \qquad k = 1, \ldots, T/q; \qquad (3)
$$
the parameter $\delta$ characterizes the fraction of embeddings. We note that for the (q, r)-block model of embedding the maximum capacity of the stego system is $Tr/q$ bits, while the cardinality of the set of all possible stego-keys $\Gamma^{(q,r)} = \{G^{(q,r)} \cup \{(0, \ldots, 0)\}\}^{T/q}$ is $|\Gamma^{(q,r)}| = (1 + C_q^r)^{T/q}$. In the case $q = r = 1$ we have the classical model [5] of a bit-wise embedding, $|\Gamma^{(1,1)}| = 2^T$.

For the methods of embedding most commonly encountered in steganography (the 'LSB replacement' and the '$\pm$ embedding' [8]) the random stego-sequence $Y_1^T = (Y_1, \ldots, Y_T)$ is generated by the sequences $\{x_t\}$, $\{\xi_t\}$, $\{\gamma_t\}$ via the transformation
$$
Y_t = x_t \oplus \gamma_t x_t \oplus \gamma_t \xi_{\tau_t} = \begin{cases} x_t, & \gamma_t = 0, \\ \xi_{\tau_t}, & \gamma_t = 1, \end{cases} \qquad (4)
$$
where $\tau_t = \sum_{j=1}^{t} \gamma_j$. The sequences $\{x_t\}$, $\{\xi_t\}$, $\{\gamma_t\}$ are assumed to be jointly independent.

We note that for $r = 1$ the model presented here coincides with the $q$-block model considered in [5]. From the practical point of view, the case $\theta_0 = \theta_1 = 1/2$ in (2), which presents the greatest challenge for embedding detection, is the most noteworthy in the framework of the Markov model of embedding (1)–(4). In this case the one-dimensional probability distribution is not distorted by an embedding in (4):
$$
P\{Y_t = 1\} = P\{Y_t = 0\} = P\{x_t = 1\} = P\{x_t = 0\} = 1/2, \qquad t = 1, 2, \ldots, T. \qquad (5)
$$
Another justification of the relevance of the case considered in the present paper is the practical utilization of a preliminary cryptographic transformation of the message, which removes nonuniformity in the probability distribution of its symbols.
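To make the model (1)–(4) concrete, the following sketch simulates a cover Markov chain, a (q, r)-block stego-key, and the resulting stego-sequence. It is only an illustration of the model described above: the function name `simulate_stego`, the use of numpy, and the default $\theta_1 = 1/2$ are our own choices, not part of the paper.

```python
import numpy as np

def simulate_stego(T, eps, delta, q=1, r=1, theta1=0.5, rng=None):
    """Sketch of the (q, r)-block embedding model (1)-(4).

    T     -- length of the cover sequence (assumed to be a multiple of q)
    eps   -- dependence parameter of the Markov cover chain, |eps| < 1
    delta -- fraction of embedded blocks
    Returns the stego-sequence y, the cover sequence x and the key gamma.
    """
    rng = np.random.default_rng() if rng is None else rng
    # stationary symmetric Markov chain (1): P{x_{t+1} != x_t} = (1 - eps) / 2
    x = np.empty(T, dtype=np.int8)
    x[0] = rng.integers(0, 2)
    flips = rng.random(T - 1) < (1.0 - eps) / 2.0
    for t in range(1, T):
        x[t] = x[t - 1] ^ int(flips[t - 1])
    # stego-key (3): each block of length q carries r embedded bits with probability delta
    gamma = np.zeros(T, dtype=np.int8)
    for k in range(T // q):
        if rng.random() < delta:
            pos = rng.choice(q, size=r, replace=False)   # r uniformly chosen positions
            gamma[k * q + pos] = 1
    # message bits (2) and embedding (4): replacement at the key positions
    xi = (rng.random(int(gamma.sum())) < theta1).astype(np.int8)
    y = x.copy()
    y[gamma == 1] = xi
    return y, x, gamma
```

For $q = r = 1$ this reduces to the bit-wise model of [5]; the sequences $\{x_t\}$, $\{\xi_t\}$, $\{\gamma_t\}$ are generated independently of each other, as required by (4).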

3 Embedding detection based on the runs statistics

3.1 Using the total number of runs statistics

We introduce two hypotheses concerning the fraction $\delta \in [0, 1]$ of embeddings:
$$
H_0: \{\delta = 0\}, \qquad H_1: \{\delta > 0\}. \qquad (6)
$$
The hypothesis $H_0$ means that there are no embeddings and the stego-sequence $Y_1^T$ agrees with the cover sequence $x_1^T$. The composite alternative $H_1$ means that there exist embeddings with some unknown fraction $\delta > 0$. If the parameter $\varepsilon$ of the cover sequence is known, then the null hypothesis, which will be denoted by $H_{0,\varepsilon}$, is simple; otherwise $H_0$ is also a composite hypothesis. If the hypothesis $H_0$ holds, then the probability $P$ will be denoted by $P_0$, otherwise by $P_\delta$. One similarly denotes the moments of random variables. The distributions $P_0$ and $P_\delta$ were found in [5].

Lemma 1. Under the hypothesis $H_{0,\varepsilon}$, the probability distribution of the stego-sequence $Y_1^T$ is as follows:
$$
P_0\{Y_1^T = y_1^T\} = P_0\{x_1^T = y_1^T\} = 2^{-T}(1-\varepsilon)^{B_T - 1}(1+\varepsilon)^{T - B_T},
$$
where
$$
B_T = B_T(y_1^T) = 1 + \sum_{t=1}^{T-1} y_t \oplus y_{t+1}
$$
is the minimal sufficient statistic under $H_{0,\varepsilon}$.

The statistic $B_T$ is called the 'runs test' statistic in [9] (it is the total number of runs). By virtue of (1), under the hypothesis $H_0$ the sequence of indicators $I\{Y_t \oplus Y_{t+1} = 1\}$ consists of independent random variables with Bernoulli distribution $B(2^{-1}(1-\varepsilon))$. Using the exact binomial probability distribution of the statistic $B_T$ with the known value of $\varepsilon$, one may construct a randomized statistical test for the embedding existence with a given probability $\alpha$ of the first kind error. However, for practical purposes it is more convenient to use its asymptotic variant as $T \to \infty$, which is given by the critical region

$$
X_{1\alpha}^{B+} = \Big\{y_1^T : B_T \ge 1 + \tfrac{1}{2}T(1-\varepsilon) - \tfrac{1}{2}\,t_\alpha\sqrt{T(1-\varepsilon^2)}\Big\} \quad \text{for } \varepsilon > 0,
$$
$$
X_{1\alpha}^{B-} = \Big\{y_1^T : B_T \le 1 + \tfrac{1}{2}T(1-\varepsilon) + \tfrac{1}{2}\,t_\alpha\sqrt{T(1-\varepsilon^2)}\Big\} \quad \text{for } \varepsilon < 0, \qquad (7)
$$

where $t_\alpha$ is the $\alpha$-quantile of the standard normal distribution: $\Phi(t_\alpha) = \alpha$.

Theorem 1. Let the model of embedding (4) hold. Then as $T \to \infty$ the asymptotic size of test (7) for the hypotheses $H_{0,\varepsilon}$, $H_1$ based on the total number of runs statistic $B_T$ coincides with a preassigned significance level $\alpha \in (0, 1)$. The asymptotic power of this test in the case of the (1, 1)-model of embedding and of the family of simple contiguous alternatives $H_{1,\delta}: \{\delta = \rho T^{-\beta}\}$, $\beta > 0$, is as follows:

$$
W_1^{B+} = W_1^{B-} \to \begin{cases} 1, & 0 < \beta < 1/2, \\[2pt] \Phi\Big(t_\alpha + \dfrac{2\rho|\varepsilon|}{\sqrt{1-\varepsilon^2}}\Big), & \beta = 1/2, \\[6pt] \alpha, & \beta > 1/2. \end{cases} \qquad (8)
$$

Proof. Under the hypothesis $H_0$, the De Moivre–Laplace limit theorem implies that
$$
L_0\left\{\frac{B_T - 1 - \tfrac{1}{2}T(1-\varepsilon)}{\tfrac{1}{2}\sqrt{T(1-\varepsilon^2)}}\right\} \to N(0, 1) \quad \text{as } T \to \infty. \qquad (9)
$$
Hence, using (9) we have, as $T \to \infty$,

$$
P_0\{X_{1\alpha}^{B+}\} \to \alpha, \qquad P_0\{X_{1\alpha}^{B-}\} \to \alpha.
$$
In the case $q = r = 1$, under the alternative $H_1$ it follows from (1), (2), (4), (5) that the first moment of the random variable $B_T$ is equal to

$$
E_\delta\{B_T\} = 1 + \sum_{t=1}^{T-1} E_\delta\{Y_t \oplus Y_{t+1}\} = 1 + 2^{-1}(T-1)\big(1 - (1-\delta)^2\varepsilon\big).
$$
Using similar arguments, we calculate the second moment under the alternative $H_1$. We have

$$
\begin{aligned}
E_\delta\{B_T^2\} &= E_\delta\Big\{\Big(1 + \sum_{t_1=1}^{T-1} Y_{t_1}\oplus Y_{t_1+1}\Big)\Big(1 + \sum_{t_2=1}^{T-1} Y_{t_2}\oplus Y_{t_2+1}\Big)\Big\} \\
&= E_\delta\Big\{\sum_{t_1,t_2=1}^{T-1} (Y_{t_1}\oplus Y_{t_1+1})(Y_{t_2}\oplus Y_{t_2+1})\Big\} + 2E_\delta\{B_T\} - 1 \\
&= 3E_\delta\{B_T\} - 2 + 2\sum_{t=1}^{T-2}\sum_{h\in V} P_\delta\{Y_t = h,\ Y_{t+1} = 1-h,\ Y_{t+2} = h\} \\
&\quad + 2\sum_{\tau=2}^{T-2}\sum_{t=1}^{T-\tau-1}\sum_{h_1,h_2\in V} P_\delta\{Y_t = h_1,\ Y_{t+1} = 1-h_1,\ Y_{t+\tau} = h_2,\ Y_{t+\tau+1} = 1-h_2\} \\
&= 3E_\delta\{B_T\} - 2 + 2^{-1}(T-2)\big(1 + \varepsilon(\varepsilon-2)(1-\delta)^2\big) \\
&\quad + 4\sum_{\tau=2}^{T-2}(T-\tau-1)\big(P_\delta\{Y_t=0, Y_{t+1}=1, Y_{t+\tau}=0, Y_{t+\tau+1}=1\} + P_\delta\{Y_t=0, Y_{t+1}=1, Y_{t+\tau}=1, Y_{t+\tau+1}=0\}\big) \\
&= 1 + 2^{-1}\,3(T-1)\big(1 - \varepsilon(1-\delta)^2\big) + 2^{-1}(T-2)\big(1 + \varepsilon(\varepsilon-2)(1-\delta)^2\big) \\
&\quad + 2^{-2}(T-2)(T-3)\big(1 + \varepsilon(\varepsilon\delta^2 - 2\varepsilon\delta - 2 + \varepsilon)(1-\delta)^2\big).
\end{aligned}
$$

For the variance, we have

$$
\begin{aligned}
D_\delta\{B_T\} &= \tfrac{1}{4}T\big(1 - (1-\delta)^2\varepsilon^2(1 - 6\delta + 3\delta^2)\big) - \tfrac{1}{4}\big(1 - (1-\delta)^2\varepsilon^2(1 - 10\delta + 5\delta^2)\big) \\
&= T\,\tfrac{1}{4}\big(1 - (1-\delta)^2\varepsilon^2(1 - 6\delta + 3\delta^2)\big)(1 + o(1)), \qquad T \to \infty.
\end{aligned}
$$

By the construction (4) the random sequence $\{Y_t\}$ satisfies the strong mixing property [10, 11], and hence the central limit theorem for weakly dependent random variables holds:
$$
L_\delta\left\{\frac{B_T - 1 - \tfrac{1}{2}T\big(1 - (1-\delta)^2\varepsilon\big)}{\sqrt{D_\delta\{B_T\}}}\right\} \to N(0, 1). \qquad (10)
$$
In the case $\varepsilon > 0$, we take into account (10) and substitute $\delta = \rho T^{-\beta}$ as $T \to \infty$ into the expression for the power. As a result, we have
$$
\begin{aligned}
\lim W_1^{B+} &= \lim P_\delta\{X_{1\alpha}^{B+}\} = \lim P_\delta\Big\{B_T \ge 1 + \tfrac{1}{2}T(1-\varepsilon) - \tfrac{1}{2}t_\alpha\sqrt{T(1-\varepsilon^2)}\Big\} \\
&= \lim P_\delta\left\{\frac{B_T - E_\delta\{B_T\}}{\sqrt{D_\delta\{B_T\}}} \ge \frac{1 + \tfrac{1}{2}T(1-\varepsilon) - E_\delta\{B_T\} - \tfrac{1}{2}t_\alpha\sqrt{T(1-\varepsilon^2)}}{\sqrt{D_\delta\{B_T\}}}\right\} \\
&= \Phi\left(\lim \frac{T\varepsilon\delta(2-\delta) + t_\alpha\sqrt{T(1-\varepsilon^2)}}{\sqrt{T\big(1 - (1-\delta)^2\varepsilon^2(1 - 6\delta + 3\delta^2)\big)}}\right).
\end{aligned}
$$
Analyzing various values of $\beta$ in this expression, we arrive at (8). The case $\varepsilon < 0$ is dealt with similarly.
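A minimal sketch of the decision rule (7), assuming the parameter $\varepsilon$ of the cover chain is known and $\varepsilon \ne 0$; the normal quantile from scipy replaces the exact binomial calibration mentioned before (7), and the function name is an illustrative choice.

```python
import numpy as np
from scipy.stats import norm

def runs_test(y, eps, alpha=0.05):
    """One-sided test (7) for H_{0,eps} against H_1 based on B_T (eps known, eps != 0)."""
    y = np.asarray(y, dtype=np.int8)
    T = y.size
    B_T = 1 + int(np.sum(y[:-1] ^ y[1:]))          # total number of runs
    t_alpha = norm.ppf(alpha)                      # alpha-quantile of N(0, 1), negative
    mean0 = 1 + 0.5 * T * (1.0 - eps)              # E_0{B_T}
    sd0 = 0.5 * np.sqrt(T * (1.0 - eps ** 2))      # asymptotic standard deviation of B_T
    if eps > 0:
        reject = B_T >= mean0 - t_alpha * sd0      # critical region X^{B+}
    else:
        reject = B_T <= mean0 + t_alpha * sd0      # critical region X^{B-}
    return B_T, bool(reject)
```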

3.2 Using the short runs statistics

Let us construct the sequence of indicators of sign changes in the sequence $Y_1, \ldots, Y_T \in V$:
$$
z_t = Y_t \oplus Y_{t+1} \in V, \qquad t = 1, \ldots, T-1. \qquad (11)
$$
Next, we define the set of patterns in sequence (11): $\{b_0, b_1, \ldots\}$, $b_\tau = (1, \underbrace{0, \ldots, 0}_{\tau}, 1)$, $\tau \in \mathbb{N}\cup\{0\}$; here $b_\tau$ is the chain of $\tau$ successive 0's bounded on the left and on the right by 1's. Such patterns specify series of 0's and of 1's of length $\tau + 1$ in the stego-sequence $\{Y_t\}$. Further, we consider the disjoint random events $C_\tau$, $\tau \in \mathbb{N}\cup\{0\}$:
$$
C_\tau = \{(z_t, z_{t+1}, \ldots, z_{t+\tau+1}) = b_\tau\}.
$$

Lemma 2. Let the model of embedding (4) hold, $q = r = 1$. Then under the alternative $H_1$ the probability distribution of the random events $C_\tau$ is given by
$$
P_\delta\{C_\tau\} = P_0\{C_\tau\} + a_\tau(\delta, \varepsilon) = 2^{-(\tau+2)}(1+\varepsilon)^\tau(1-\varepsilon)^2 + a_\tau(\delta, \varepsilon), \qquad (12)
$$
where $a_\tau(\delta, \varepsilon) \to 0$ as $\delta \to 0$, $|\varepsilon| < 1$.

Proof. Using the law of total probability for the model of embedding under consideration, we find that
$$
\begin{aligned}
P_\delta\{C_\tau\} &= \sum_{u_1^{\tau+2}\in V_{\tau+2}} P_\delta\{\gamma_t^{t+\tau+1} = u_1^{\tau+2}\}\, P_\delta\{(z_t, z_{t+1}, \ldots, z_{t+\tau+1}) = b_\tau \mid \gamma_t^{t+\tau+1} = u_1^{\tau+2}\} \\
&= (1-\delta)^{\tau+2} P_0\{C_\tau\} + \delta\!\!\!\sum_{u_1^{\tau+2}\in V_{\tau+2}:\ w(u_1^{\tau+2}) > 0}\!\!\! \delta^{w(u_1^{\tau+2})-1}(1-\delta)^{\tau+2-w(u_1^{\tau+2})} \\
&\qquad\qquad \times P_\delta\{(z_t, z_{t+1}, \ldots, z_{t+\tau+1}) = b_\tau \mid \gamma_t^{t+\tau+1} = u_1^{\tau+2}\} \xrightarrow[\delta\to 0]{} 2^{-(\tau+2)}(1+\varepsilon)^\tau(1-\varepsilon)^2.
\end{aligned}
$$

Theorem 2. Under the hypotheses of Lemma 2, the function $a_\tau(\delta, \varepsilon)$ has the asymptotic expansion
$$
a_\tau = \delta a_\tau^{(1)}(\varepsilon) + O(\delta^2), \qquad
a_\tau^{(1)}(\varepsilon) = \begin{cases} 2^{-1}\varepsilon(2-\varepsilon), & \tau = 0, \\ 2^{-2}\varepsilon(1-\varepsilon)(1+\varepsilon), & \tau = 1, \\ 2^{-\tau-1}\varepsilon(1-\varepsilon)(1+\varepsilon)^{\tau-2}\big(\varepsilon^2 + (\tau+1)\varepsilon - \tau + 2\big), & \tau \ge 2. \end{cases} \qquad (13)
$$

Proof. We partition the set $U_{\tau+2,1} = \{u_1^{\tau+2} = (u_1, \ldots, u_{\tau+2}) \in V_{\tau+2} : w(u_1^{\tau+2}) = 1\}$, $|U_{\tau+2,1}| = \tau + 2$, of binary vectors of length $\tau+2$, $\tau \ge 3$, with unit Hamming weight into three disjoint subsets:
$$
\begin{aligned}
U_{\tau+2,1} &= U_{\tau+2,1}^{(0)} \cup U_{\tau+2,1}^{(1)} \cup U_{\tau+2,1}^{(2)}, \\
U_{\tau+2,1}^{(j)} &= \{u_1^{\tau+2} \in U_{\tau+2,1} : u_{j+1} + u_{\tau+2-j} = 1\}, \quad j \in \{0, 1\}, \qquad
U_{\tau+2,1}^{(2)} = \Big\{u_1^{\tau+2} \in U_{\tau+2,1} : \sum_{j=3}^{\tau} u_j = 1\Big\}.
\end{aligned}
$$
Arguing as in the proof of Lemma 2, we have
$$
P_\delta\{C_\tau\} = P_0\{C_\tau\} - \delta(\tau+2)P_0\{C_\tau\} + \delta\sum_{j\in\{0,1,2\}}\ \sum_{u_1^{\tau+2}\in U_{\tau+2,1}^{(j)}} P_\delta\{(z_t, \ldots, z_{t+\tau+1}) = b_\tau \mid \gamma_t^{t+\tau+1} = u_1^{\tau+2}\} + O(\delta^2). \qquad (14)
$$
Let us consider the case $\tau \ge 2$. The subset $U_{\tau+2,1}^{(j)}$, $j \in \{0, 1, 2\}$, contains sequences $u_1^{\tau+2} \in U_{\tau+2,1}$ such that the events $C_\tau \cap \{\gamma_t^{t+\tau+1} = u_1^{\tau+2}\}$ are equiprobable under the alternative $H_1$:
$$
P_\delta\{C_\tau \cap \{\gamma_t^{t+\tau+1} = u_1^{\tau+2}\}\} =
\begin{cases}
\delta(1-\delta)^{\tau+2}\,2^{-\tau-3}(1-\varepsilon)(1+\varepsilon)^\tau, & u_1^{\tau+2} \in U_{\tau+2,1}^{(0)}, \\
\delta(1-\delta)^{\tau+2}\,2^{-\tau-3}(1-\varepsilon)^2(1+\varepsilon)^\tau, & u_1^{\tau+2} \in U_{\tau+2,1}^{(1)}, \\
\delta(1-\delta)^{\tau+2}\,2^{-\tau-3}(1-\varepsilon)^2(1+\varepsilon^2)(1+\varepsilon)^{\tau-2}, & u_1^{\tau+2} \in U_{\tau+2,1}^{(2)}.
\end{cases} \qquad (15)
$$
Now (13) with $\tau \ge 2$ follows by substitution of (15) into (14). The case $\tau < 2$ is considered similarly.

Theorem 3. Under the hypotheses of Lemma 2, the function $a_\tau(\delta, \varepsilon)$ has the second-order asymptotic expansion
$$
a_\tau = \delta a_\tau^{(1)}(\varepsilon) + \delta^2 a_\tau^{(2)}(\varepsilon) + O(\delta^3),
$$
where
$$
a_0^{(2)}(\varepsilon) = 2^{-2}\varepsilon(-2+\varepsilon), \qquad a_1^{(2)}(\varepsilon) = 2^{-3}\varepsilon(-1+4\varepsilon+\varepsilon^2),
$$
$$
a_2^{(2)}(\varepsilon) = 2^{-4}\varepsilon^2(-7+10\varepsilon+\varepsilon^2), \qquad a_3^{(2)}(\varepsilon) = 2^{-5}\varepsilon(1-12\varepsilon+2\varepsilon^2+16\varepsilon^3+\varepsilon^4),
$$
$$
a_\tau^{(2)}(\varepsilon) = 2^{-\tau-2}\varepsilon(1+\varepsilon)^{\tau-4}\big(\tau-2 + \varepsilon(2\tau^2-14\tau+13) + 2\varepsilon^2(-2\tau^2+7\tau-8) + 2\varepsilon^3(\tau^2-3\tau+9) + \varepsilon^4(5\tau+2) + \varepsilon^5\big), \quad \tau \ge 4. \qquad (16)
$$
The proof is similar to that of Theorem 2, the set of stego-keys $U_{\tau+2,2}$ being split into classes of equiprobable events.

From Theorems 2 and 3 it follows that under the alternative $H_1$ (existence of embeddings) the probability distribution of the total number of runs of a given length differs from that distribution under the hypothesis $H_0$. In particular, for $\varepsilon > 0$ the probabilities of the events $C_0, C_1, C_2$ increase as $\delta$ increases from 0, whereas for $\tau > \tau_0 = 2 + \varepsilon(3+\varepsilon)(1-\varepsilon)^{-1}$ the probability $P_\delta\{C_\tau\}$ decreases as $\delta$ increases. This being so, we consider the statistics

$$
B_{T,1} = \sum_{t=1}^{T-2} z_t, \qquad B_{T,2} = \sum_{t=1}^{T-2} z_t z_{t+1}, \qquad (17)
$$
where the statistic $B_{T,2}$ is the total number of series of 0's and of 1's of length 1 in the sequence $\{y_t\}$, and the statistic $B_{T,1}$ is related to the total number of runs statistic $B_T$ by the relation $B_{T,1} = B_T - z_{T-1} - 1$.

Using Theorem 1 from [5] one may show that under the alternative $H_1$ the first-order moments of the bivariate statistic $(B_{T,1}, B_{T,2})$, as given by (17), read as
$$
\begin{aligned}
E_\delta\{B_{T,1}\} &= (T-2)\,\tfrac{1}{2}\big(1 - (1-\delta)^2\varepsilon\big) = E_0\{B_{T,1}\} + T\,\tfrac{1}{2}\delta(2-\delta)\varepsilon + o(T), \quad T \to \infty, \\
E_\delta\{B_{T,2}\} &= (T-2)\,\tfrac{1}{4}\big(1 - (1-\delta)^2\varepsilon(2-\varepsilon)\big) = E_0\{B_{T,2}\} + T\,\tfrac{1}{4}\delta(2-\delta)\varepsilon(2-\varepsilon) + o(T), \quad T \to \infty. \qquad (18)
\end{aligned}
$$
From (18) it is seen that for $\varepsilon > 0$ the mean number of sign changes, and of two neighbouring sign changes, is larger when the embeddings exist than in the opposite case.
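A short sketch computing the sign-change indicators (11) and the pair of statistics (17); the vectorized numpy implementation is an illustrative choice.

```python
import numpy as np

def short_runs_statistics(y):
    """Return (B_{T,1}, B_{T,2}) of (17) for a binary sequence y of length T."""
    y = np.asarray(y, dtype=np.int8)
    z = y[:-1] ^ y[1:]                  # sign-change indicators (11), length T - 1
    B_T1 = int(np.sum(z[:-1]))          # sum of z_t, t = 1, ..., T - 2
    B_T2 = int(np.sum(z[:-1] & z[1:]))  # sum of z_t z_{t+1}, t = 1, ..., T - 2
    return B_T1, B_T2
```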

Theorem 4. Let the model of embedding (4) hold. Then, as $T \to \infty$, the statistical test for the hypotheses $H_{0,\varepsilon}$, $H_1$ of asymptotic significance level $\alpha \in (0, 1)$ based on the bivariate statistic (17) is given by the critical region
$$
X_{1\alpha}^{B_{1,2}} = \{y_1^T : (B_{T,1}, B_{T,2}) \in D_{1,2}\}, \qquad (19)
$$
where the region $D_{1,2}$ is as follows:

$$
\begin{aligned}
D_{1,2} = \Big\{(B_{T,1}, B_{T,2}) :\ & (B_{T,1} - T\mu_{0,1})\varepsilon \ge 0, \quad (B_{T,2} - T\mu_{0,1}^2)\varepsilon \ge 0, \\
& \begin{pmatrix} B_{T,1} - T\mu_{0,1} \\ B_{T,2} - T\mu_{0,1}^2 \end{pmatrix}'
\begin{pmatrix} \frac{(5-3\varepsilon)(1-\varepsilon)}{16} & -\frac{1-\varepsilon}{4} \\[4pt] -\frac{1-\varepsilon}{4} & \frac{1}{4} \end{pmatrix}
\begin{pmatrix} B_{T,1} - T\mu_{0,1} \\ B_{T,2} - T\mu_{0,1}^2 \end{pmatrix} \ge T c_{1,2}\Big\}, \qquad (20)
\end{aligned}
$$
$$
\mu_{0,1} = \tfrac{1}{2}(1-\varepsilon), \qquad
c_{1,2} = 2^{-5}(1-\varepsilon^2)^2 \ln\frac{\pi - \arccos\Big(2\sqrt{\frac{1-\varepsilon}{5-3\varepsilon}}\Big)}{2\pi\alpha},
$$
that is,
$$
P_0\{X_{1\alpha}^{B_{1,2}}\} = P_0\{(B_{T,1}, B_{T,2}) \in D_{1,2}\} \to \alpha.
$$

Proof. Using the fact that under the hypothesis $H_{0,\varepsilon}$ the random variables $\{z_t\}$ are independent and have the Bernoulli distribution $B(2^{-1}(1-\varepsilon))$, and since the random variables $z_t z_{t+1}$ and $z_s z_{s+1}$ are independent if $|t-s| > 1$, we find that
$$
\begin{aligned}
E_0\{B_{T,1}\} &= T\,\tfrac{1}{2}(1-\varepsilon)(1+o(1)), \qquad E_0\{B_{T,2}\} = T\,\tfrac{1}{4}(1-\varepsilon)^2(1+o(1)), \\
D_0\{B_{T,1}\} &= (T-2)D_0\{z_t\} = T\,\tfrac{1}{4}(1-\varepsilon^2)(1+o(1)), \\
D_0\{B_{T,2}\} &= (T-2)D_0\{z_t z_{t+1}\} + 2\sum_{1\le t<s\le T-2}\mathrm{cov}_0\{z_t z_{t+1}, z_s z_{s+1}\} = T\,\tfrac{1}{16}(5-3\varepsilon)(1-\varepsilon)(1-\varepsilon^2)(1+o(1)), \\
\mathrm{cov}_0\{B_{T,1}, B_{T,2}\} &= T\,\tfrac{1}{4}(1-\varepsilon)(1-\varepsilon^2)(1+o(1)),
\end{aligned}
$$
so that, as $T \to \infty$, the covariance matrix of the vector $T^{-1/2}(B_{T,1}, B_{T,2})'$ converges to

$$
\Sigma_0 = (1-\varepsilon^2)\begin{pmatrix} \frac{1}{4} & \frac{1-\varepsilon}{4} \\[4pt] \frac{1-\varepsilon}{4} & \frac{(5-3\varepsilon)(1-\varepsilon)}{16} \end{pmatrix}. \qquad (21)
$$
In Fig. 1 the region $D_{1,2}$ for the case $\varepsilon > 0$ is marked by the '+' sign. Such a form of the domain follows from the asymptotic normality of the bivariate statistic $(B_{T,1}, B_{T,2})$ and from expressions (18). To calculate the probability of the first kind error, we use a linear transform of the region $D_{1,2}$. We apply the matrix $\Sigma_0^{-1/2}$ to the unit vectors $(1, 0), (0, 1) \in \mathbb{R}^2$ and construct the Gram matrix:

$$
u_1 = \Sigma_0^{-1/2}(1, 0)', \qquad u_2 = \Sigma_0^{-1/2}(0, 1)',
$$
$$
\begin{pmatrix} u_1' u_1 & u_1' u_2 \\ u_2' u_1 & u_2' u_2 \end{pmatrix} = \Sigma_0^{-1} = \frac{2^6}{(1-\varepsilon^2)^2}\begin{pmatrix} \frac{(5-3\varepsilon)(1-\varepsilon)}{16} & -\frac{1-\varepsilon}{4} \\[4pt] -\frac{1-\varepsilon}{4} & \frac{1}{4} \end{pmatrix}.
$$

Figure 1. The region $D_{1,2}$ for $\varepsilon > 0$ and the scattering ellipses for $\delta \in [0, 1]$ (axes: $E_0\{B_{T,1}\}$ and $E_0\{B_{T,2}\}$).

The angle $\varphi$ between the vectors $u_1$ and $u_2$ is expressed in terms of the correlation coefficient:
$$
\varphi = \arccos\left(\frac{u_1' u_2}{|u_1|\,|u_2|}\right) = \pi - \arccos\big(\mathrm{corr}_0\{B_{T,1}, B_{T,2}\}\big) = \pi - \arccos\left(2\sqrt{\frac{1-\varepsilon}{5-3\varepsilon}}\right).
$$
Because of the joint asymptotic normality of the statistics (17), the random variable

$$
Q = \frac{1}{T}\begin{pmatrix} B_{T,1} - T\mu_{0,1} \\ B_{T,2} - T\mu_{0,1}^2 \end{pmatrix}' \Sigma_0^{-1} \begin{pmatrix} B_{T,1} - T\mu_{0,1} \\ B_{T,2} - T\mu_{0,1}^2 \end{pmatrix}
$$
has an asymptotically exponential distribution with parameter $1/2$ as $T \to \infty$. Hence, from the equation

$$
P_0\{(B_{T,1}, B_{T,2}) \in D_{1,2}\} = \frac{\varphi}{2\pi}\,P_0\{Q \ge c\} = \frac{\pi - \arccos\Big(2\sqrt{\frac{1-\varepsilon}{5-3\varepsilon}}\Big)}{2\pi e^{c/2}} = \alpha
$$
we find $c = 2^6 c_{1,2}(1-\varepsilon^2)^{-2}$ (the ellipse in Fig. 1 is given by the equation $Q = c$). The case $\varepsilon < 0$ is considered similarly with $u_1 = \Sigma_0^{-1/2}(-1, 0)'$, $u_2 = \Sigma_0^{-1/2}(0, -1)'$.

Lemma 3. Under the (1, 1)-model of embedding and the alternative $H_1$ the random variables $z_t, z_s$ are independent if $|t-s| \ge 2$, the random variables $z_t,\ z_s z_{s+1}$ are independent if $|t-s| \ge 2$, and the random variables $z_t z_{t+1},\ z_s z_{s+1}$ are independent if $|t-s| \ge 3$.

Proof. Let us consider the random variables $z_t, z_{t+k}$, $k \ge 2$, and find the expectation $E_\delta\{z_t z_{t+k}\}$, $k \ge 2$:
$$
\begin{aligned}
E_\delta\{z_t z_{t+k}\} &= P_\delta\{z_t z_{t+k} = 1\} = 2P_\delta\{Y_t = 0, Y_{t+1} = 1, Y_{t+k} = 0, Y_{t+k+1} = 1\} + 2P_\delta\{Y_t = 0, Y_{t+1} = 1, Y_{t+k} = 1, Y_{t+k+1} = 0\} \\
&= 2\sum_{u\in V_4} P_\delta\{(Y_t, Y_{t+1}, Y_{t+k}, Y_{t+k+1}) = (0, 1, 0, 1),\ (\gamma_t, \gamma_{t+1}, \gamma_{t+k}, \gamma_{t+k+1}) = u\} \\
&\quad + 2\sum_{u\in V_4} P_\delta\{(Y_t, Y_{t+1}, Y_{t+k}, Y_{t+k+1}) = (0, 1, 1, 0),\ (\gamma_t, \gamma_{t+1}, \gamma_{t+k}, \gamma_{t+k+1}) = u\} \\
&= \frac{2}{16}\sum_{c\in\{1,-1\}}\Big((1-\delta)^4(1-\varepsilon)^2(1 - c\varepsilon^{k-1}) + \delta(1-\delta)^3\big(2(1-\varepsilon)(1 - c\varepsilon^{k-1}) + 2(1-\varepsilon)(1 + c\varepsilon^k)\big) \\
&\qquad\qquad + \delta^2(1-\delta)^2(6 - 2\varepsilon - c\varepsilon^{k-1} + 2c\varepsilon^k - c\varepsilon^{k+1}) + 4\delta^3(1-\delta) + \delta^4\Big) \\
&= \Big(\tfrac{1}{2}\big(1 - \varepsilon(1-\delta)^2\big)\Big)^2 = \big(E_\delta\{z_t\}\big)^2.
\end{aligned}
$$
Since the random variables $z_t, z_{t+k}$ are binary and since $\mathrm{cov}_\delta\{z_t, z_{t+k}\} = 0$ for $k \ge 2$, such variables are independent. A similar argument shows that the random variables $z_t,\ z_s z_{s+1}$ are independent if $|t-s| \ge 2$ and that the random variables $z_t z_{t+1},\ z_s z_{s+1}$ are independent if $|t-s| \ge 3$.
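For completeness, a sketch of the decision rule (19)–(20) as we read it, with $\Sigma_0$ taken from (21) and the threshold $c_{1,2}$ from (20); the helper name `bivariate_runs_test` and the restriction to $\varepsilon > 0$ are our own simplifications.

```python
import numpy as np

def bivariate_runs_test(y, eps, alpha=0.05):
    """Sketch of the test (19)-(20) based on (B_{T,1}, B_{T,2}), for eps > 0."""
    y = np.asarray(y, dtype=np.int8)
    T = y.size
    z = y[:-1] ^ y[1:]
    B1 = float(np.sum(z[:-1]))
    B2 = float(np.sum(z[:-1] & z[1:]))
    mu = 0.5 * (1.0 - eps)                          # mu_{0,1}
    d1, d2 = B1 - T * mu, B2 - T * mu ** 2          # centred statistics
    # matrix of the quadratic form in (20) (adjugate of Sigma_0 / (1 - eps^2))
    A = np.array([[(5 - 3 * eps) * (1 - eps) / 16, -(1 - eps) / 4],
                  [-(1 - eps) / 4, 1 / 4]])
    quad = np.array([d1, d2]) @ A @ np.array([d1, d2])
    phi = np.pi - np.arccos(2 * np.sqrt((1 - eps) / (5 - 3 * eps)))
    c12 = 2 ** -5 * (1 - eps ** 2) ** 2 * np.log(phi / (2 * np.pi * alpha))
    reject = (d1 * eps >= 0) and (d2 * eps >= 0) and (quad >= T * c12)
    return bool(reject)
```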

Now we employ Lemma 3 to find asymptotic expressions for the first and second moments of the bivariate statistic $(B_{T,1}, B_{T,2})$ under the alternative $H_{1,\delta}$. The first-order moments were found in (18). In the course of the proof of Theorem 1 it was shown that
$$
D_\delta\{B_{T,1}\} = T\,\tfrac{1}{4}\big(1 - (1-\delta)^2\varepsilon^2(1 - 6\delta + 3\delta^2)\big)(1+o(1)), \qquad T \to \infty.
$$
In view of Lemma 3 we have, as $T \to \infty$,

$$
\begin{aligned}
\mathrm{cov}_\delta\{B_{T,1}, B_{T,2}\} &= \sum_{t,s=1}^{T-2}\mathrm{cov}_\delta\{z_t, z_s z_{s+1}\} \\
&= (T-2)\,\mathrm{cov}_\delta\{z_t, z_t z_{t+1}\} + (T-3)\,\mathrm{cov}_\delta\{z_t, z_{t+1} z_{t+2}\} + (T-3)\,\mathrm{cov}_\delta\{z_t, z_{t-1} z_t\} + (T-4)\,\mathrm{cov}_\delta\{z_t, z_{t-2} z_{t-1}\} \\
&= 2T\big(\mathrm{cov}_\delta\{z_t, z_t z_{t+1}\} + \mathrm{cov}_\delta\{z_t, z_{t+1} z_{t+2}\}\big)(1+o(1)),
\end{aligned}
$$
$$
\begin{aligned}
\mathrm{cov}_\delta\{z_t, z_t z_{t+1}\} &= P_\delta\{z_t z_{t+1} = 1\}\big(1 - P_\delta\{z_t = 1\}\big) = \tfrac{1}{4}\big(1 - (1-\delta)^2\varepsilon(2-\varepsilon)\big)\big(1 - \tfrac{1}{2}(1 - (1-\delta)^2\varepsilon)\big) \\
&= \tfrac{1}{8}\big(1 - (1-\delta)^2\varepsilon(1-\varepsilon) - (1-\delta)^4\varepsilon^2(2-\varepsilon)\big), \\
\mathrm{cov}_\delta\{z_t, z_{t+1} z_{t+2}\} &= P_\delta\{z_t z_{t+1} z_{t+2} = 1\} - P_\delta\{z_t = 1\}\,P_\delta\{z_{t+1} z_{t+2} = 1\}.
\end{aligned}
$$
Using the law of total probability, we find, for the (1, 1)-model of embedding,
$$
\begin{aligned}
P_\delta\{z_t z_{t+1} z_{t+2} = 1\} &= 2P_\delta\{Y_t = 0, Y_{t+1} = 1, Y_{t+2} = 0, Y_{t+3} = 1\} = 2\sum_{u\in V_4} P_\delta\{Y_t = 0, Y_{t+1} = 1, Y_{t+2} = 0, Y_{t+3} = 1,\ \gamma_t^{t+3} = u\} \\
&= \tfrac{1}{8}\big(1 - (1-\delta)^2\varepsilon(3 - 2\varepsilon + \varepsilon^2) + (1-\delta)^4\varepsilon^2\big),
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{cov}_\delta\{z_t, z_{t+1} z_{t+2}\} &= P_\delta\{z_t z_{t+1} z_{t+2} = 1\} - P_\delta\{z_t = 1\}\,P_\delta\{z_{t+1} z_{t+2} = 1\} \\
&= P_\delta\{z_t z_{t+1} z_{t+2} = 1\} - \tfrac{1}{8}\big(1 - (1-\delta)^2\varepsilon(3-\varepsilon) + (1-\delta)^4\varepsilon^2(2-\varepsilon)\big) \\
&= \tfrac{1}{8}(1-\delta)^2\delta(2-\delta)\varepsilon^2(1-\varepsilon).
\end{aligned}
$$
We thus have
$$
\mathrm{cov}_\delta\{B_{T,1}, B_{T,2}\} = T\,\tfrac{1}{4}\big(1 - (1-\delta)^2\varepsilon(1-\varepsilon)^2 - (1-\delta)^4\varepsilon^2(3-2\varepsilon)\big)(1+o(1)).
$$
Using Lemma 3, as $T \to \infty$ we find the variance $D_\delta\{B_{T,2}\}$:
$$
D_\delta\{B_{T,2}\} = (T-2)D_\delta\{z_t z_{t+1}\} + 2(T-3)\,\mathrm{cov}_\delta\{z_t z_{t+1}, z_{t+1} z_{t+2}\} + 2(T-4)\,\mathrm{cov}_\delta\{z_t z_{t+1}, z_{t+2} z_{t+3}\},
$$
$$
\begin{aligned}
D_\delta\{z_t z_{t+1}\} &= P_\delta\{z_t z_{t+1} = 1\}\big(1 - P_\delta\{z_t z_{t+1} = 1\}\big) = \tfrac{1}{16}\big(1 - (1-\delta)^2\varepsilon(2-\varepsilon)\big)\big(3 + (1-\delta)^2\varepsilon(2-\varepsilon)\big), \\
\mathrm{cov}_\delta\{z_t z_{t+1}, z_{t+1} z_{t+2}\} &= P_\delta\{z_t z_{t+1} z_{t+2} = 1\} - \big(P_\delta\{z_t z_{t+1} = 1\}\big)^2 = \tfrac{1}{16}\big(1 - (1-\delta)^2\,2\varepsilon(1-\varepsilon+\varepsilon^2) - (1-\delta)^4\varepsilon^2(2-4\varepsilon+\varepsilon^2)\big), \\
\mathrm{cov}_\delta\{z_t z_{t+1}, z_{t+2} z_{t+3}\} &= P_\delta\{z_t z_{t+1} z_{t+2} z_{t+3} = 1\} - \big(P_\delta\{z_t z_{t+1} = 1\}\big)^2.
\end{aligned}
$$
A similar argument as for $P_\delta\{z_t z_{t+1} z_{t+2} = 1\}$ shows that
$$
\begin{aligned}
P_\delta\{z_t z_{t+1} z_{t+2} z_{t+3} = 1\} &= 2P_\delta\{Y_t = 0, Y_{t+1} = 1, Y_{t+2} = 0, Y_{t+3} = 1, Y_{t+4} = 0\} \\
&= 2\sum_{u\in V_5} P_\delta\{Y_t = 0, Y_{t+1} = 1, Y_{t+2} = 0, Y_{t+3} = 1, Y_{t+4} = 0,\ \gamma_t^{t+4} = u\} \\
&= \tfrac{1}{16}\Big(1 - (1-\delta)^2\big(\varepsilon(4+\delta^3) - 3\varepsilon^2(2-2\delta+\delta^2) + \varepsilon^3(4-4\delta+2\delta^2-\delta^3) - \varepsilon^4\big)\Big),
\end{aligned}
$$
$$
\mathrm{cov}_\delta\{z_t z_{t+1}, z_{t+2} z_{t+3}\} = \tfrac{1}{16}(1-\delta)^2\delta\varepsilon(1-\varepsilon)\big(-\delta^2 + \varepsilon(2-\delta-\delta^2) - \varepsilon^2(2-\delta)\big).
$$

As a result, we have

$$
D_\delta\{B_{T,2}\} = T\,\tfrac{1}{16}\Big(5 - (1-\delta)^2\big(2(4+\delta^3)\varepsilon + 2(1-10\delta+5\delta^2)\varepsilon^2 - 2(4-16\delta+8\delta^2+\delta^3)\varepsilon^3 + (3-10\delta+\delta^2)\varepsilon^4\big)\Big)(1+o(1)).
$$

Using the strong mixing property [10], one may show that under the alternative $H_{1,\delta}$ the distribution of the random vector
$$
\frac{1}{\sqrt{T}}\Big(B_{T,1} - \tfrac{1}{2}T\big(1 - (1-\delta)^2\varepsilon\big),\ B_{T,2} - \tfrac{1}{4}T\big(1 - (1-\delta)^2\varepsilon(2-\varepsilon)\big)\Big)'
$$
is, as $T \to \infty$, asymptotically normal $N_2\big((0, 0)', \Sigma_1\big)$ with zero mean and covariance matrix $\Sigma_1 = (\sigma_{1,ij})$, $i, j \in \{0, 1\}$, where

$$
\begin{aligned}
\sigma_{1,00} &= \tfrac{1}{4}\big(1 - (1-\delta)^2\varepsilon^2(1 - 6\delta + 3\delta^2)\big), \\
\sigma_{1,01} = \sigma_{1,10} &= \tfrac{1}{4}\big(1 - (1-\delta)^2\varepsilon(1-\varepsilon)^2 - (1-\delta)^4\varepsilon^2(3-2\varepsilon)\big), \\
\sigma_{1,11} &= \tfrac{1}{16}\Big(5 - (1-\delta)^2\big(2(4+\delta^3)\varepsilon + 2(1-10\delta+5\delta^2)\varepsilon^2 - 2(4-16\delta+8\delta^2+\delta^3)\varepsilon^3 + (3-10\delta+\delta^2)\varepsilon^4\big)\Big).
\end{aligned}
$$

Unfortunately, for the test (19) based on the short runs statistics we have not succeeded in obtaining an explicit expression for the power and examining it, because the covariance matrix depends on $\delta$. This dependence is illustrated in Fig. 1, which depicts the scattering ellipses (corresponding to the asymptotic covariance matrices) as the parameter $\delta$ increases from 0 to 1. The following important property of the asymptotically normal distribution of the random vector (17) under the alternative $H_{1,\delta}$ is worth pointing out: as $\delta$ changes from 0 to 1, the centre of the asymptotically normal distribution of the bivariate statistic $(B_{T,1}, B_{T,2})$ always lies on the line
$$
\begin{cases}
b_1 = \tfrac{1}{2}T\varepsilon\Delta + \tfrac{1}{2}T(1-\varepsilon), \\
b_2 = \tfrac{1}{4}T\varepsilon(2-\varepsilon)\Delta + \tfrac{1}{4}T(1-\varepsilon)^2,
\end{cases} \qquad \Delta = \delta(2-\delta). \qquad (22)
$$
Taking into account the property (22), we construct a statistical test for the hypotheses $H_{0,\varepsilon}$, $H_1$ based on the statistic obtained as the orthogonal projection of the statistic $(B_{T,1}, B_{T,2})$ onto the line (22). For $\varepsilon > 0$ such a test is given by the critical region

$$
X_{1\alpha}^{h+} = \Big\{y_1^T : B_{T,1} + \tfrac{1}{2}(2-\varepsilon)B_{T,2} \ge \tfrac{1}{2}T(1-\varepsilon) + \tfrac{1}{8}T(1-\varepsilon)^2(2-\varepsilon) - t_\alpha\sqrt{T d_h}\Big\}, \qquad (23)
$$

$$
d_h = 2^{-6}(1-\varepsilon^2)\big(68 - 100\varepsilon + 65\varepsilon^2 - 20\varepsilon^3 + 3\varepsilon^4\big).
$$

Theorem 5. Let the model of embedding (4) hold and let $\varepsilon > 0$. Then, as $T \to \infty$, the asymptotic size of test (23) for the hypotheses $H_{0,\varepsilon}$, $H_1$ based on the projection of the short runs statistics

$$
h = B_{T,1} - \tfrac{1}{2}T(1-\varepsilon) + \tfrac{1}{2}(2-\varepsilon)\Big(B_{T,2} - \tfrac{1}{4}T(1-\varepsilon)^2\Big) \qquad (24)
$$
coincides with the significance level $\alpha \in (0, 1)$. The asymptotic power of this test for the (1, 1)-model of embedding and for the family of contiguous alternatives $H_{1,\delta}: \{\delta = \rho/\sqrt{T}\}$ is as follows:

$$
W_1^{h+} \to \Phi\left(t_\alpha + \frac{\rho\varepsilon\big(1 + \tfrac{1}{4}(2-\varepsilon)^2\big)}{\sqrt{d_h}}\right), \qquad T \to \infty. \qquad (25)
$$

Proof. The angle between the line (22) and the $b_1$-axis is $\varphi = \arctan\big(\tfrac{1}{2}(2-\varepsilon)\big)$, and hence the orthogonal projection of the point $(B_{T,1}, B_{T,2})$ onto this line is given by
$$
\Big(B_{T,1} - \tfrac{1}{2}T(1-\varepsilon)\Big)\cos\varphi + \Big(B_{T,2} - \tfrac{1}{4}T(1-\varepsilon)^2\Big)\sin\varphi.
$$

Dividing this expression by $\cos\varphi$, we get the random variable $h$ of (24); according to (21), $h/\sqrt{T}$ has the asymptotically normal distribution $N_1(0, d_h)$ under the hypothesis $H_{0,\varepsilon}$. Hence $P_0\{X_{1\alpha}^{h+}\} \to \alpha$ as $T \to \infty$. Let us find the power of test (23) as $T \to \infty$ for contiguous alternatives of the form indicated in the theorem. We have

$$
\begin{aligned}
W_1^{h+} &= P_\delta\Big\{B_{T,1} + \tfrac{1}{2}(2-\varepsilon)B_{T,2} \ge \tfrac{1}{2}T(1-\varepsilon) + \tfrac{1}{8}T(1-\varepsilon)^2(2-\varepsilon) - t_\alpha\sqrt{T d_h}\Big\} \\
&= P_\delta\Big\{-\big(B_{T,1} + \tfrac{1}{2}(2-\varepsilon)B_{T,2} - E_\delta\{B_{T,1}\} - \tfrac{1}{2}(2-\varepsilon)E_\delta\{B_{T,2}\}\big) \le \tfrac{1}{2}T\delta(2-\delta)\varepsilon + \tfrac{1}{8}T\delta(2-\delta)\varepsilon(2-\varepsilon)^2 + t_\alpha\sqrt{T d_h}\Big\} \\
&\to \Phi\left(\lim \frac{\tfrac{1}{2}T\delta(2-\delta)\varepsilon + \tfrac{1}{8}T\delta(2-\delta)\varepsilon(2-\varepsilon)^2 + t_\alpha\sqrt{T d_h}}{\sqrt{T\big(\sigma_{1,00} + \tfrac{1}{4}(2-\varepsilon)^2\sigma_{1,11} + (2-\varepsilon)\sigma_{1,01}\big)}}\right).
\end{aligned}
$$

Substituting $\delta = \rho/\sqrt{T}$ in this expression as $T \to \infty$, we find that
$$
W_1^{h+} \to \Phi\left(\lim \frac{\sqrt{T}\rho\varepsilon + \tfrac{1}{4}\sqrt{T}\rho\varepsilon(2-\varepsilon)^2 + t_\alpha\sqrt{T d_h} + O(1)}{\sqrt{T}\Big(\sqrt{d_h} + O\big(\tfrac{1}{\sqrt{T}}\big)\Big)}\right) \to \Phi\left(t_\alpha + \frac{\rho\varepsilon\big(1 + \tfrac{1}{4}(2-\varepsilon)^2\big)}{\sqrt{d_h}}\right).
$$
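A minimal sketch of the projection test (23)–(24) for $\varepsilon > 0$; as in the earlier sketches, scipy's normal quantile and the function name are our own illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def projection_test(y, eps, alpha=0.05):
    """Test (23) based on the projected short runs statistic h (24), for eps > 0."""
    y = np.asarray(y, dtype=np.int8)
    T = y.size
    z = y[:-1] ^ y[1:]
    B1 = float(np.sum(z[:-1]))
    B2 = float(np.sum(z[:-1] & z[1:]))
    d_h = 2 ** -6 * (1 - eps ** 2) * (68 - 100 * eps + 65 * eps ** 2
                                      - 20 * eps ** 3 + 3 * eps ** 4)
    h = (B1 - 0.5 * T * (1 - eps)
         + 0.5 * (2 - eps) * (B2 - 0.25 * T * (1 - eps) ** 2))   # statistic (24)
    threshold = -norm.ppf(alpha) * np.sqrt(T * d_h)              # -t_alpha * sqrt(T d_h)
    return bool(h >= threshold)
```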

4 Embedding detection on the basis of the likelihood ratio statistics

Let us now consider the case when the parameter $\varepsilon$ in (1) is unknown and separated from zero: $\varepsilon_0 \le |\varepsilon| < 1$, where $\varepsilon_0 > 0$ is a known boundary value.

We construct the likelihood function for the observed stego-sequence $y_1^T \in V_T$. Following [5], we partition the set $V_t$ of binary $t$-dimensional vectors into $t+1$ disjoint subsets,
$$
V_t = \Gamma_0^{(t)} \cup \Gamma_1^{(t)} \cup \ldots \cup \Gamma_t^{(t)}, \qquad (26)
$$
where
$$
\begin{aligned}
\Gamma_0^{(t)} &= \{u_1^t \in V_t : u_t = 1\}, \qquad \Gamma_1^{(t)} = \{u_1^t \in V_t : u_{t-1} = u_t = 0\}, \\
\Gamma_j^{(t)} &= \{u_1^t \in V_t : u_{t-j} = 0,\ u_{t-j+1} = \ldots = u_{t-1} = 1,\ u_t = 0\}, \quad 1 < j < t, \\
\Gamma_t^{(t)} &= \{u_1^t \in V_t : u_1 = \ldots = u_{t-1} = 1,\ u_t = 0\}.
\end{aligned}
$$

Lemma 4. Under the (q, r)-block model of embedding (4), the likelihood function of the observed stego-sequence $y_1^T$ is
$$
L(\varepsilon, \delta) = P_\delta\{Y_1^T = y_1^T\} = 2^{-T}\sum_{u_1^T\in\Gamma^{(q,r)}} (1-\delta)^{b_0(u_1^T)}\big(\delta/C_q^r\big)^{b_1(u_1^T)} \prod_{t=1}^{T} \psi_t(u_1^t, y_1^t), \qquad (27)
$$
where $b_1(u_1^T)$ is the number of blocks $u_{(k)}$, $k = 1, \ldots, T/q$, of Hamming weight $r$, $b_0(u_1^T) = T/q - b_1(u_1^T)$, and
$$
\psi_t(u_1^t, y_1^t) = \begin{cases} 1, & u_1^t \in \Gamma_0^{(t)}, \\ 1 + (-1)^{y_{t-j}+y_t}\varepsilon^j, & u_1^t \in \Gamma_j^{(t)}, \quad 1 \le j < t, \\ 1, & u_1^t \in \Gamma_t^{(t)}. \end{cases}
$$

The proof is similar to that of Theorem 5 for the $q$-block model of embedding in [5].

To test the hypotheses $H_0$, $H_1$ on the existence of embeddings we now construct the statistical likelihood ratio test [12]. The statistic $\lambda_T$ of this test for the hypotheses $H_0$, $H_1$ takes the form
$$
\lambda_T = \lambda_T(y_1^T) = -2\ln\frac{L(\hat{\varepsilon}, 0)}{\max\{L(\hat{\varepsilon}_1, \hat{\delta}_1),\ L(\hat{\varepsilon}, 0)\}} \ge 0, \qquad (28)
$$
where $\hat{\varepsilon}$ and $(\hat{\varepsilon}_1, \hat{\delta}_1)$ are the maximum likelihood estimates constructed in [5] under the hypotheses $H_0$ and $H_1$, respectively. The statistic (28) introduced above is equivalent to the likelihood ratio statistic
$$
\frac{\sup_{|\varepsilon|<1,\ \delta>0} P_\delta\{y_1, \ldots, y_T\}}{\sup_{|\varepsilon|<1} P_0\{y_1, \ldots, y_T\}}.
$$
Besides, according to [5],
$$
\arg\max_{|\varepsilon|<1,\ \delta>0} P_\delta\{y_1, \ldots, y_T\} = (\hat{\varepsilon}_1, \hat{\delta}_1), \qquad \arg\max_{|\varepsilon|<1} P_0\{y_1, \ldots, y_T\} = \hat{\varepsilon}.
$$
The statistical test of size $\alpha \in (0, 1)$ based on the statistic $\lambda_T$ is defined by the critical region
$$
X_{1\alpha}^{\lambda} = \{y_1^T \in V_T : \lambda_T \ge \lambda_\alpha\}, \qquad (29)
$$
where $\lambda_\alpha > 0$ is the solution of the equation
$$
\sup_{\varepsilon_0\le|\varepsilon|<1} P_0\{\lambda_T \ge \lambda_\alpha\} = \sup_{\varepsilon_0\le|\varepsilon|<1}\big(1 - F_0(\varepsilon, T, \lambda_\alpha)\big) = \alpha. \qquad (30)
$$
Here $F_0(\varepsilon, T, \lambda_\alpha)$ is the distribution function of the statistic (28) under the null hypothesis $H_0$.

To estimate the value of $\lambda_\alpha$ satisfying (30), we use the Monte Carlo method: we simulate $M_0$ samples of a Markov chain of length $T$ with the parameter $\varepsilon_0$. For each sample we calculate the value of the statistic by (28); let $\lambda^{(1)}, \ldots, \lambda^{(M_0)}$ be the calculated values. Then $\lambda_\alpha$ can be estimated by the sample quantile of level $1-\alpha$:
$$
\hat{\lambda}_\alpha = \lambda_{([(1-\alpha)M_0])}; \qquad (31)
$$
the accuracy of this estimate increases as $M_0 \to \infty$. So the statistical test (29) for the embedding existence assumes the form: the hypothesis $H_0$ (respectively, $H_1$) is accepted if $p \ge \alpha$ (respectively, $p < \alpha$), where
$$
p = \frac{1}{M_0+1}\Big(1 + \sum_{i=1}^{M_0} I\{\lambda^{(i)} > \lambda_T\}\Big).
$$
The available asymptotic properties of the likelihood ratio test [12, 13] may be used under the regularity conditions [12] guaranteeing the existence, uniqueness, and asymptotic normality of the maximum likelihood estimates of the parameters $\varepsilon$ and $\delta$.

Theorem 6. Under the model of embedding (4), as $T \to \infty$ the test of asymptotic significance level $\alpha \in (0, 1)$ based on the likelihood ratio statistic for the composite null hypothesis is given by the critical region (29) with the threshold $\lambda_\alpha = \chi^2_{1-\alpha,1}$, the $(1-\alpha)$-quantile of the chi-squared distribution with one degree of freedom; that is,
$$
P_0\{X_{1\alpha}^{\lambda}\} = P_0\{\lambda_T \ge \chi^2_{1-\alpha,1}\} \to \alpha.
$$
This test is consistent under fixed alternatives $\delta = \delta_1 > 0$:
$$
W_1^{\lambda} = P_\delta\{X_{1\alpha}^{\lambda}\} \to 1.
$$
The proof follows the argument of [13] with the use of the central limit theorem for weakly dependent random variables [10].
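The Monte Carlo calibration (30)–(31) and the p-value rule above do not depend on the particular statistic, so the sketch below implements them for a generic statistic of a binary sequence. Simulating the cover chain at the boundary value $\varepsilon_0$ and passing the statistic as a function argument are our own illustrative choices; the statistic of interest in this section is $\lambda_T$ of (28), whose computation requires the estimates of [5] and is not reproduced here.

```python
import numpy as np

def mc_threshold_and_pvalue(stat, y_obs, T, eps0, M0=500, alpha=0.05, rng=None):
    """Monte Carlo estimate (31) of the critical value and the p-value decision rule.

    stat  -- function mapping a binary sequence to a scalar test statistic
    y_obs -- observed stego-sequence
    eps0  -- boundary value of the unknown Markov parameter used for simulation
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty(M0)
    for i in range(M0):
        # simulate a cover Markov chain (1) of length T with parameter eps0
        x = np.empty(T, dtype=np.int8)
        x[0] = rng.integers(0, 2)
        flips = rng.random(T - 1) < (1.0 - eps0) / 2.0
        for t in range(1, T):
            x[t] = x[t - 1] ^ int(flips[t - 1])
        samples[i] = stat(x)
    lam_hat = np.sort(samples)[int((1 - alpha) * M0) - 1]   # sample (1-alpha)-quantile, cf. (31)
    lam_obs = stat(y_obs)
    p = (1 + np.sum(samples > lam_obs)) / (M0 + 1)          # p-value of the rule above
    return lam_hat, p, bool(p < alpha)                      # True -> accept H_1
```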

5 Statistical estimation of embedding points

If the alternative $H_1$ is accepted, then there arises the problem of estimation of the points of embeddings, that is, the time instants $t \in \{1, \ldots, T\}$ at which, in accordance with (4), a bit of the sequence $\{x_t\}$ is replaced by a bit of the hidden message $\{\xi_t\}$.

Theorem 7. Let $\gamma_1^T = (\gamma_1, \ldots, \gamma_T) \in \Gamma^{(q,r)}$ be the key sequence corresponding to the (q, r)-model of embedding, let $y_1^T \in V_T$ be the observed stego-sequence, and let $\hat{\gamma}_1^T = f(y_1^T)$ be some statistical estimate of the key sequence $\gamma_1^T$ based on the observations $y_1^T$. The minimum of the error probability in estimating the stego-key,
$$
P_\delta\{\hat{\gamma}_1^T \ne \gamma_1^T\} \to \min,
$$
is attained for the statistic
$$
\hat{\gamma}_1^{T*} = \arg\max_{u_1^T\in\Gamma^{(q,r)}} P_\delta\{\gamma_1^T = u_1^T \mid Y_1^T = y_1^T\}, \qquad (32)
$$
which maximizes the a posteriori probability of the stego-key. The minimum of the error probability is as follows:

$$
r^*(\varepsilon, \delta, T) = \min_{f(\cdot)} P_\delta\{\hat{\gamma}_1^T \ne \gamma_1^T\} = 1 - \sum_{y_1^T\in V_T} P_\delta\{Y_1^T = y_1^T\}\, \max_{u_1^T\in\Gamma^{(q,r)}} P_\delta\{\gamma_1^T = u_1^T \mid Y_1^T = y_1^T\}. \qquad (33)
$$

Proof. We choose an arbitrary statistic

$$
\hat{\gamma}_1^T = f(Y_1^T): V_T \to \Gamma^{(q,r)}, \qquad (34)
$$
and calculate for it the corresponding error probability for the estimate of the true stego-key $\gamma_1^T \in \Gamma^{(q,r)}$:
$$
r(f; \varepsilon, \delta, T) = P_\delta\{\hat{\gamma}_1^T \ne \gamma_1^T\} = 1 - P_\delta\{\hat{\gamma}_1^T = \gamma_1^T\}.
$$
After equivalent transformations, using (34) and the law of total probability, we find that

$$
\begin{aligned}
r(f; \varepsilon, \delta, T) &= 1 - \sum_{u_1^T\in\Gamma^{(q,r)}} P_\delta\{\hat{\gamma}_1^T = \gamma_1^T,\ \gamma_1^T = u_1^T\} = 1 - \sum_{u_1^T\in\Gamma^{(q,r)}}\ \sum_{y_1^T\in V_T} P_\delta\{f(Y_1^T) = \gamma_1^T,\ \gamma_1^T = u_1^T,\ Y_1^T = y_1^T\} \\
&= 1 - \sum_{y_1^T\in V_T}\ \sum_{u_1^T\in\Gamma^{(q,r)}} P_\delta\{Y_1^T = y_1^T\}\,P_\delta\{\gamma_1^T = u_1^T \mid Y_1^T = y_1^T\}\,P_\delta\{f(Y_1^T) = \gamma_1^T \mid \gamma_1^T = u_1^T,\ Y_1^T = y_1^T\} \\
&= 1 - \sum_{y_1^T\in V_T} P_\delta\{Y_1^T = y_1^T\} \sum_{u_1^T\in\Gamma^{(q,r)}} I\{f(y_1^T) = u_1^T\}\,P_\delta\{\gamma_1^T = u_1^T \mid Y_1^T = y_1^T\}. \qquad (35)
\end{aligned}
$$

Minimizing this expression over $f(\cdot)$ and using (34), we obtain the optimal function $f(\cdot)$ in the form
$$
f^*(y_1^T) = \arg\max_{u_1^T\in\Gamma^{(q,r)}} P_\delta\{\gamma_1^T = u_1^T \mid Y_1^T = y_1^T\}, \qquad (36)
$$
which agrees with the statistic (32). Substituting (36) into (35), we get (33).

The estimate (32) by the maximum a posteriori probability criterion admits the following equivalent representation, which is convenient for its evaluation:

$$
\hat{\gamma}_1^{T*} = \arg\max_{u_1^T\in\Gamma^{(q,r)}} P_\delta\{\gamma_1^T = u_1^T \mid Y_1^T = y_1^T\} = \arg\max_{u_1^T\in\Gamma^{(q,r)}} P_\delta\{\gamma_1^T = u_1^T,\ Y_1^T = y_1^T\}. \qquad (37)
$$
The solution of problem (37) for the (q, r)-block model of embedding by exhaustive search has a computational complexity $O(T(1 + C_q^r)^{T/q})$. Let us construct a polynomial algorithm for solving this problem on the basis of the classical Viterbi algorithm [14].
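For very short sequences the MAP problem (37) can be solved by direct enumeration, which is useful as a reference against which a dynamic-programming solver of the recurrences below can be checked. The sketch does this for the bit-wise case $q = r = 1$ with $\theta_0 = \theta_1 = 1/2$, assembling $P_\delta\{\gamma_1^T = u,\ Y_1^T = y\}$ from the prior of the key and the factors $\tfrac{1}{2}(1 + (-1)^{y_{t-j}+y_t}\varepsilon^j)$ of Lemma 4; it is a hypothetical helper, not the algorithm of this section.

```python
import numpy as np
from itertools import product

def map_key_bruteforce(y, eps, delta):
    """MAP estimate (37) of the stego-key by exhaustive search; q = r = 1, small T only."""
    y = np.asarray(y, dtype=np.int8)
    T = y.size
    best_u, best_logp = None, -np.inf
    for u in product((0, 1), repeat=T):
        logp = T * np.log(0.5)                      # factor 2^{-T} of Lemma 4 (theta = 1/2)
        logp += sum(np.log(delta) if b else np.log(1 - delta) for b in u)  # key prior
        last = None                                 # last position with gamma_t = 0
        for t in range(T):
            if u[t] == 0:
                if last is not None:
                    j = t - last
                    sign = 1 if y[t] == y[last] else -1
                    logp += np.log(1 + sign * eps ** j)   # psi_t factor of Lemma 4
                last = t
        if logp > best_logp:
            best_u, best_logp = np.array(u, dtype=np.int8), logp
    return best_u
```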

We set

$$
s_t(u_{t-c}, \ldots, u_t) = \max_{u_1, \ldots, u_{t-c-1}\in V} \log P_\delta\{Y_1^t = y_1^t,\ \gamma_1 = u_1, \ldots, \gamma_t = u_t\}, \qquad c = \max\{2r+1,\ q-1\}.
$$

The initial values of $s_t(u_1, \ldots, u_t)$ with $t = 1, \ldots, c$ are as follows:
$$
\begin{aligned}
s_1(u_1) &= \log\psi_1(u_1, y_1) + \log P_\delta\{\gamma_1 = u_1\}, \\
s_t(u_1, \ldots, u_t) &= s_{t-1}(u_1, \ldots, u_{t-1}) + \log\psi_t(u_1^t, y_1^t) + \log P_\delta\{\gamma_t = u_t \mid \gamma_{t-1} = u_{t-1}, \ldots, \gamma_1 = u_1\}, \quad 2 \le t \le c; \qquad (38)
\end{aligned}
$$
here $\psi_t(\cdot)$ is the same as in Lemma 4.

Theorem 8. Under the (q, r)-block model of embedding (4), $q > r$, the recurrence relation
$$
\begin{aligned}
s_t(u_{t-c}, \ldots, u_t) = \max_{u_{t-c-1}\in V}\ & s_{t-1}(u_{t-c-1}, u_{t-c}, \ldots, u_{t-1}) + \log f_t(u_{t-2r-1}^t, y_{t-2r-1}^t) \\
& + \log P_\delta\{\gamma_t = u_t \mid \gamma_{t-1} = u_{t-1}, \ldots, \gamma_{t-c} = u_{t-c}\} \qquad (39)
\end{aligned}
$$
holds for $s_t(u_{t-c}, \ldots, u_t)$ with $t > c$, where
$$
f_t(u_{t-2r-1}^t, y_{t-2r-1}^t) = \begin{cases} \tfrac{1}{2}, & u_1^t \in \Gamma_0^{(t)}, \\ \tfrac{1}{2}\big(1 + (-1)^{y_{t-j}+y_t}\varepsilon^j\big), & u_1^t \in \Gamma_j^{(t)}, \quad 1 \le j \le 2r+1. \end{cases}
$$

Proof. In the case $q \le 2r+2$ we have

$$
\begin{aligned}
s_t(u_{t-2r-1}, \ldots, u_t) &= \max_{u_1, \ldots, u_{t-2r-2}\in V} \log P_\delta\{Y_1^t = y_1^t,\ \gamma_1 = u_1, \ldots, \gamma_t = u_t\} \\
&= \max_{u_1, \ldots, u_{t-2r-2}\in V} \log P_\delta\{Y_1^{t-1} = y_1^{t-1},\ Y_t = y_t,\ \gamma_1 = u_1, \ldots, \gamma_{t-1} = u_{t-1},\ \gamma_t = u_t\} \\
&= \max_{u_1, \ldots, u_{t-2r-2}\in V}\Big(\log P_\delta\{Y_1^{t-1} = y_1^{t-1},\ \gamma_1 = u_1, \ldots, \gamma_{t-1} = u_{t-1}\} \\
&\qquad + \log P_\delta\{\gamma_t = u_t \mid \gamma_1 = u_1, \ldots, \gamma_{t-1} = u_{t-1}\} \\
&\qquad + \log P_\delta\{Y_t = y_t \mid Y_1^{t-1} = y_1^{t-1},\ \gamma_1 = u_1, \ldots, \gamma_t = u_t\}\Big).
\end{aligned}
$$
The case $q > 2r+2$ is dealt with similarly. Combining these cases, we arrive at (39).

Corollary 1. Under the hypotheses of Theorem 8 the estimate $\hat{\gamma}_1^T = (\hat{\gamma}_1, \ldots, \hat{\gamma}_T)$ of the stego-key by the maximum a posteriori probability criterion is as follows:

$$
\begin{aligned}
(\hat{\gamma}_{T-c}, \ldots, \hat{\gamma}_T) &= \arg\max_{u_{T-c}, \ldots, u_T\in V} s_T(u_{T-c}, \ldots, u_T), \\
\hat{\gamma}_t &= \arg\max_{v\in V} s_{t+c}(v, \hat{\gamma}_{t+1}, \ldots, \hat{\gamma}_{t+c}), \qquad t = T-c-1, \ldots, 1. \qquad (40)
\end{aligned}
$$

Proof. The estimate $\hat{\gamma}_1^T = (\hat{\gamma}_1, \ldots, \hat{\gamma}_T)$ of the stego-key is obtained by the backward pass of the algorithm for finding $\max_{u_{T-c}, \ldots, u_T\in V} s_T$ by (38), (39).

The algorithm of estimation of the embedding points (the forward run (38), (39) and the backward run (40)) has a computational complexity $O(2^c + (T-c)2^{2c+2})$. Having estimated the embedding points $\gamma_1^T$ by (40), one can construct an estimate $\hat{\xi}$ of the message itself:

$$
\hat{\xi}_\tau = y_{t_\tau}, \qquad \text{where } t_\tau = \min\Big\{t \in \{1, \ldots, T\} : \sum_{k=1}^{t}\hat{\gamma}_k = \tau\Big\}, \qquad \tau = 1, \ldots, w(\hat{\gamma}_1^T).
$$
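Given an estimate of the key, the message estimate above is obtained by reading the observed bits at the estimated embedding points in increasing order of $t$; a one-line sketch (the function name is our own):

```python
import numpy as np

def extract_message(y, gamma_hat):
    """Estimate of the embedded message: bits of y at the estimated key positions,
    taken in increasing order of t (the formula for xi_hat above)."""
    y = np.asarray(y, dtype=np.int8)
    gamma_hat = np.asarray(gamma_hat, dtype=np.int8)
    return y[gamma_hat == 1]
```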

6 Results of computer experiments

We give the results of three series of computer experiments using simulated data.

Series 1. The initial Markov sequence (1) of length $T = 10^4$ with the parameter $\varepsilon = 0.13$ was simulated. For $q = r = 1$, the key Bernoulli sequence was simulated using (3) with various values of the parameter $\delta \in [0, 1]$, and the stego-sequence $y_1^T$ was constructed by (4). Figure 2 depicts the total number of runs statistic $B_T$ versus the fraction of embeddings $\delta$. Circles mark the values of the statistic $\frac{1}{T-1}B_T$ for the sequence $y_1^T$ constructed with the corresponding fraction of embeddings $\delta$; the solid line shows the graph of the mean value $\frac{1}{T-1}E_\delta\{B_T\}$.
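A sketch reproducing the Series 1 computation (the normalized statistic $B_T/(T-1)$ as a function of $\delta$), assuming the `simulate_stego` helper from the sketch in § 2; the asymptotic mean curve follows from the expression for $E_\delta\{B_T\}$ in the proof of Theorem 1.

```python
import numpy as np

# assumes simulate_stego from the sketch in Section 2
def runs_curve(T=10_000, eps=0.13, deltas=np.linspace(0.0, 0.9, 10), rng=None):
    rng = np.random.default_rng(1) if rng is None else rng
    values = []
    for delta in deltas:
        y, _, _ = simulate_stego(T, eps, delta, q=1, r=1, rng=rng)
        B_T = 1 + int(np.sum(y[:-1] ^ y[1:]))
        values.append(B_T / (T - 1))                       # circles in Figure 2
    mean_curve = 0.5 * (1 - (1 - deltas) ** 2 * eps)       # (1/(T-1)) E_delta{B_T}, asymptotically
    return np.asarray(values), mean_curve
```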


Figure 2. The total number of runs statistic $B_T$ versus the fraction of embeddings $\delta$.

Series 2. As in Series 1, the Monte Carlo method with the number of replications $M_1 = 2^8$ was used to construct estimates of the powers of the tests (7), (23) for the hypotheses $H_{0,\varepsilon}$, $H_1$ with known cover sequence parameter $\varepsilon = 0.48$; the length $T = 2^{13}$, the significance level $\alpha = 0.05$, and the fraction of embeddings $\delta \in \{0.005, 0.01, 0.015, 0.02, 0.025, 0.03, 0.04, 0.05, 0.06, 0.07\}$.


Figure 3. Powers of the tests $X_{1\alpha}^{B+}$, $X_{1\alpha}^{h+}$ versus the fraction of embeddings $\delta$.

Figure 3 shows graphically the powers of the statistical tests (7), (23) versus the fraction of embeddings $\delta$. The black solid line depicts the theoretical curve of the power of test (7) based on the total number of runs statistic, the grey solid line shows the theoretical curve of the power of test (23) based on the projection of the short runs statistics, the black circles correspond to estimates of the power of test (7), and the white circles show estimates of the power of test (23). The 95% confidence intervals for the powers of tests (7) and (23) are shown in grey and black, respectively. It is seen from the graph that test (23) based on the short runs statistics is more powerful than test (7) based on the total number of runs statistic. Numerical experiments show that for small values of $\varepsilon$ the powers of tests (7) and (23) are practically the same.
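The power estimates of Series 2 can be reproduced along the following lines; the sketch assumes the `simulate_stego` and `projection_test` helpers from the earlier sketches and uses a normal-approximation confidence interval, which is our own simplification.

```python
import numpy as np

# assumes simulate_stego (Section 2 sketch) and a test function such as projection_test
def estimate_power(test, T, eps, delta, M1=256, alpha=0.05, rng=None):
    """Monte Carlo estimate of the power of `test`, with an approximate 95% CI."""
    rng = np.random.default_rng(2) if rng is None else rng
    rejections = 0
    for _ in range(M1):
        y, _, _ = simulate_stego(T, eps, delta, q=1, r=1, rng=rng)
        if test(y, eps, alpha):
            rejections += 1
    w = rejections / M1
    half = 1.96 * np.sqrt(w * (1 - w) / M1)
    return w, (max(0.0, w - half), min(1.0, w + half))

# e.g. estimate_power(projection_test, T=2**13, eps=0.48, delta=0.02)
```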


Figure 4. Power of the test $X_{1\alpha}^{\lambda}$ versus the fraction of embeddings $\delta r/q$.

Series 3. For the block model of embedding with $q = 2$, $r = 1$, the Monte Carlo method was used to find the threshold estimates $\hat{\lambda}_\alpha$ by (31) and the power of the statistical test (29) based on the likelihood ratio, with the model parameters $\varepsilon = 0.12$, the length $T = 2^{18}$, and the significance level $\alpha = 0.05$. The threshold estimate was calculated with the number of replications $M_0 = 500$; the estimates of the powers were calculated with the numbers of replications $M_\lambda = 250, 100, 200, 150, 100$ and the fractions of the actual embedding $\delta r/q = \delta/2$ equal to 0.10, 0.15, 0.20, 0.25, 0.30, respectively. Figure 4 shows the graph of the power estimates for the test $X_{1\alpha}^{\lambda}$ versus the fraction of the actual embedding $\delta/2$.

Computer experiments demonstrate the efficiency of the statistical tests thus constructed for the embedding detection and the agreement between theoretical and experimental results.

In conclusion, we note that embeddings may also be detected using small-parametric models of high-order Markov chains [15].

Acknowledgment: The authors are grateful to A. M. Zubkov for suggesting to study the runs statistics for the embedding detection and to the referees for comments and advice.

References

[1] Ponomarev K. I., “A parametric model of embedding and its statistical analysis”, Discrete Math. Appl., 19:6 (2009), 587–596.
[2] Ponomarev K. I., “On one statistical model of steganography”, Discrete Math. Appl., 19:3 (2009), 329–336.
[3] Ker A., “A capacity result for batch steganography”, IEEE Signal Process. Lett., 14:8 (2007), 525–528.
[4] Shoytov A. M., “On the fact of detecting the noise in a finite Markov chain with an unknown transition probability matrix”, Prikl. Diskr. Mat., 2010, Supplement № 3, 44–45 (in Russian).
[5] Kharin Yu. S., Vecherko E. V., “Statistical estimation of parameters for binary Markov chain models with embeddings”, Discrete Math. Appl., 23:2 (2013), 153–169.
[6] Zubkov A. M., “Pseudorandom number generators and their applications”, Proc. II Int. Sci. Conf. “Mathematics and security of information technologies”, 2003, 200–206.
[7] Kemeny J. G., Snell J. L., Finite Markov chains, Van Nostrand, 1960.
[8] Ivanov V. A., “Models of inclusions into homogeneous random sequences”, Tr. Diskr. Mat., 10 (2008), 18–34 (in Russian).

[9] A statistical test suite for random and pseudorandom number generators for cryptographic applications: NIST Special Publication 800-22 Rev. 1a, Nat. Inst. Stand. Technol., 2010.
[10] Doukhan P., Mixing: properties and examples, Springer-Verlag, 1994.
[11] Kharin Yu. S., Voloshko V. A., “Robust estimation of AR coefficients under simultaneously influencing outliers and missing values”, J. Statist. Plan. Infer., 141:9 (2011), 3276–3288.
[12] Ivchenko G. I., Medvedev Yu. I., Mathematical statistics, Vysshaya shkola, Moscow, 1984 (in Russian).
[13] Wald A., “Tests of statistical hypotheses concerning several parameters when the number of observations is large”, Trans. Amer. Math. Soc., 54:3 (1943), 426–482.
[14] Rabiner L. R., “A tutorial on hidden Markov models and selected applications in speech recognition”, Proc. IEEE, 77:2 (1989), 257–286.
[15] Kharin Yu. S., Petlitskiy A. I., “A Markov chain of order s with r partial connections and statistical inference on its parameters”, Discrete Math. Appl., 17:3 (2007), 295–317.