An Efficient Bayes Coding Algorithm for the Non-Stationary Source in Which Context Tree Model Varies from Interval to Interval

Koshi Shimada
Department of Pure and Applied Mathematics
Waseda University
Tokyo, Japan
[email protected]

Shota Saito
Faculty of Informatics
Gunma University
Gunma, Japan
[email protected]

Toshiyasu Matsushima
Department of Pure and Applied Mathematics
Waseda University
Tokyo, Japan
[email protected]

Abstract—The context tree source is a source model in which the occurrence probability of a symbol is determined by a finite past sequence; it is a broad class of sources that includes i.i.d. and Markov sources. The source model proposed in this paper represents a sequence whose subsequence in each interval is generated from a different context tree model. The Bayes code for such a source requires weighting of the posterior probability distributions over the change patterns of the context tree source and over all possible context tree models. The challenge is therefore how to reduce this exponential-order computational complexity. In this paper, we assume a special class of prior probability distributions on change patterns and context tree models, and propose an efficient Bayes coding algorithm whose computational complexity is of polynomial order.

I. INTRODUCTION

Arithmetic codes asymptotically achieve the minimum expected codeword length for lossless source coding. The problem with this method is that it cannot be used unless the probabilistic structure of the source is known in advance. Therefore, universal codes, which can be used when the probability distribution of the source is unknown, have been studied.

The context tree source is one of the major source models for universal coding, and CTW (Context Tree Weighting) [1] is known as an efficient universal code for context tree sources. The CTW method can be interpreted as a special case of the Bayes code proposed by Matsushima and Hirasawa [2]. The CTW method encodes the entire source sequence at once, which is not an efficient use of memory and causes an underflow problem in the calculation, whereas the Bayes code of Matsushima and Hirasawa [2] can encode sequentially and is free from these problems. It is known that the Bayes code has the same codeword length whether it encodes sequentially or encodes the entire sequence at once [3].

However, in the Bayes code, the computational complexity of weighting by the posterior probability of the context tree models increases exponentially with the maximum depth of the context tree models. Matsushima and Hirasawa [2] developed an efficient Bayes coding algorithm by assuming an appropriate class of prior probability distributions on the context tree models, which reduces the computational complexity from exponential order to polynomial order.

Now, a source model for source coding should not only admit a concise mathematical description but should also reflect well the probabilistic structure of the real data sequences to be compressed. For example, the context tree source includes the i.i.d. source and the Markov source as special cases. It is a broad class of sources and has been applied to text data, for example.

On the other hand, there are cases where it is appropriate to regard the symbols as being generated according to a different context tree source in each interval, rather than to model the entire data sequence as being generated according to a single context tree source. For example, the human genome consists of about 3 billion base pairs, and the DNA sequence is described as a pair of sequences of about 3 billion symbols over a four-letter alphabet: A, G, T, and C. Although a Markov source is sometimes assumed in DNA sequence compression algorithms [6], it is known that there are genetic and non-genetic regions in the human genome, which have different structural characteristics. Therefore, in this paper, we present a non-stationary source in which the context tree source changes from interval to interval.

An example of a non-stationary source in which the source changes from interval to interval is the i.p.i.d. (independently piecewise identically distributed) source [4], [5]. An i.p.i.d. source consists of i.i.d. sequences whose parameters differ from interval to interval. It can be regarded as a special case of the source proposed in this paper. An efficient Bayes code for i.p.i.d. sources has already been proposed by Suko et al. [8].

Assuming a source model in which the symbols are generated by different context tree sources in each interval, we present an efficient Bayes coding algorithm for it. In this algorithm, we use the prior probability of context tree models by Matsushima and Hirasawa [2] and that of parameter change patterns by Suko et al. [8]. The proposed algorithm reduces the computational complexity from exponential order to polynomial order.

II. NON-STATIONARY SOURCE WHOSE CONTEXT TREE MODEL CHANGES FROM INTERVAL TO INTERVAL

In this section, we present a non-stationary source whose context tree model changes from interval to interval. The symbols are generated from different context tree models depending on the interval, as shown in Figure 1.

Fig. 1. Diagram of a non-stationary source with a context tree model that changes from interval to interval.

Now we define the change pattern of the context tree models as follows.

Definition 1. The change pattern $c$ is defined to indicate when the context tree model has changed. That is,
$$c \stackrel{\mathrm{def}}{=} \left(w_1^{(c)}, \ldots, w_t^{(c)}, \ldots, w_N^{(c)}\right) \in \mathcal{C}_N \stackrel{\mathrm{def}}{=} \{0,1\}^N, \quad (1)$$
$$w_t^{(c)} \stackrel{\mathrm{def}}{=} \begin{cases} 1 & \text{if the context tree model changes at time } t,\\ 0 & \text{otherwise.} \end{cases}$$

Now, for convenience, let $w_1^{(c)} = 1$. The length $N$ of a source sequence is fixed. The set of all change patterns $\mathcal{C}_N$ is abbreviated as $\mathcal{C}$ from now on. Next, we define the set of time points at which the context tree model changes.

Definition 2. Let $\mathcal{T}_c$ denote the set of time points at which the parameter changes in the change pattern $c$. That is,
$$\mathcal{T}_c \stackrel{\mathrm{def}}{=} \left\{\, t \,\middle|\, w_t^{(c)} = 1 \,\right\} = \left\{ t_0^{(c)}, t_1^{(c)}, \ldots, t_{|\mathcal{T}_c|-1}^{(c)} \right\}, \quad (2)$$
where $t_j^{(c)}$ is the $j$-th changing point in the change pattern $c$. In other words, there are $|\mathcal{T}_c| - 1$ parameter changes in $c$. For convenience, let $t_0^{(c)} = 1$ and $t_{|\mathcal{T}_c|}^{(c)} = N + 1$. If the change pattern $c$ is specified in advance, $t_j^{(c)}$ is abbreviated as $t_j$.

From the $j$-th changing point $t_j$ to the $(j+1)$-th changing point $t_{j+1}$, the symbols are generated according to a single context tree model $m_{t_j}^{(c)}$. The parameter $\theta^{m_{t_j}^{(c)}}$ for this $m_{t_j}^{(c)}$ is defined as in Definition 3, and an example is shown in Figure 2.

Definition 3. In the change pattern $c$, for the context tree model $m_{t_j}^{(c)}$ in the interval $[t_j, t_{j+1})$, we denote the set of its leaf nodes by $\mathcal{L}_{m_{t_j}^{(c)}}$. The parameter $\theta^{m_{t_j}^{(c)}}$ for $m_{t_j}^{(c)}$ is defined as follows:
$$\theta^{m_{t_j}^{(c)}} \stackrel{\mathrm{def}}{=} \left\{\, \theta_s \in (0,1)^{|\mathcal{X}|} \,\middle|\, s \in \mathcal{L}_{m_{t_j}^{(c)}} \,\right\}, \quad (3)$$
where
$$\theta_s \stackrel{\mathrm{def}}{=} \left( \theta_{0|s}, \theta_{1|s}, \ldots, \theta_{|\mathcal{X}|-1|s} \right)^{\mathrm{T}}, \quad (4)$$
$$\sum_{a \in \mathcal{X}} \theta_{a|s} = 1, \quad \theta_{a|s} \in (0,1) \ \text{for each symbol } a. \quad (5)$$
Note that $\theta_{a|s}$ is the occurrence probability of $a \in \mathcal{X}$ under the state corresponding to the node $s$, where $\mathcal{X}$ denotes the source alphabet.

Fig. 2. Example of occurrence probabilities based on a context tree model.

In the case where the change pattern $c$ is specified in advance, $m_{t_j}^{(c)}$ is abbreviated as $m_{t_j}$. Furthermore, the parameters for the change pattern $c$ are defined as follows.

Definition 4. The parameter $\Theta^c$ for the change pattern $c$ is defined as follows:
$$\Theta^c \stackrel{\mathrm{def}}{=} \left\{\, \theta^{m_t^{(c)}} \,\middle|\, t \in \mathcal{T}_c \,\right\} = \left\{ \theta^{m_{t_0}^{(c)}}, \theta^{m_{t_1}^{(c)}}, \ldots, \theta^{m_{t_{|\mathcal{T}_c|-1}}^{(c)}} \right\}. \quad (6)$$

From Definition 3, using the state $s_m(x^{t-1})$ corresponding to the source sequence (past context) $x^{t-1}$ in a certain context tree model $m$, the occurrence probability at time $t \in [t_j, t_{j+1})$ is expressed as follows:
$$p\left(x_t \,\middle|\, x^{t-1}, \theta^{m_{t_j}^{(c)}}, m_{t_j}^{(c)}, c\right) = p\left(x_t \,\middle|\, x^{t-1}, \theta^{m_{t_j}}, m_{t_j}\right) = \theta_{x_t \mid s_{m_{t_j}}(x^{t-1})}. \quad (7)$$

Therefore, the probability distribution of $X^N = X_1 \cdots X_N$ under the change pattern $c \in \mathcal{C}$ is expressed by the following equation:
$$\begin{aligned}
p\left(x^N \,\middle|\, \Theta^c, c\right) &= \prod_{j=0}^{|\mathcal{T}_c|-1} p\left(x_{t_j}^{t_{j+1}-1} \,\middle|\, \theta^{m_{t_j}^{(c)}}, m_{t_j}\right) = \prod_{j=0}^{|\mathcal{T}_c|-1} \prod_{t=t_j}^{t_{j+1}-1} p\left(x_t \,\middle|\, x^{t-1}, \theta^{m_{t_j}^{(c)}}, m_{t_j}\right) \\
&= \prod_{j=0}^{|\mathcal{T}_c|-1} \prod_{t=t_j}^{t_{j+1}-1} \theta_{x_t \mid s_{m_{t_j}}(x^{t-1})}. \quad (8)
\end{aligned}$$

Regarding a change pattern $c$, a context tree model $m_{t_j}^{(c)}$, and the parameter $\theta^{m_{t_j}^{(c)}}$ for $m_{t_j}^{(c)}$, we assume prior probability distributions $\pi(c)$, $P(m_{t_j}^{(c)} \mid c)$, and $w(\theta^{m_{t_j}^{(c)}} \mid m_{t_j}^{(c)}, c)$, respectively.
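To make the source model of this section concrete, the following is a minimal generation sketch, assuming a binary alphabet and two hypothetical context tree models represented as dictionaries from leaf contexts (read newest symbol first) to occurrence probabilities. The representation and the concrete models are illustrative assumptions, not taken from the paper.

```python
import random

# A context tree model is represented here as a dict that maps a leaf context
# (the most recent past symbols, newest first) to the probability that the
# next symbol is 1.  Both the representation and the two models below are
# illustrative assumptions.
MODEL_A = {"0": 0.1, "1": 0.8}               # depth-1 context tree
MODEL_B = {"00": 0.5, "01": 0.2, "1": 0.9}   # depth 2 on the 0-branch

def theta_one(model, past):
    """Descend the context tree along the past symbols until a leaf is hit."""
    context = ""
    for symbol in reversed(past):
        context += symbol
        if context in model:
            return model[context]
    return 0.5  # fallback while the past is still shorter than every leaf context

def generate(segments, n, seed=0):
    """segments: list of (change point t_j, model m_{t_j}); the first change point is 1."""
    rng = random.Random(seed)
    x = []
    for t in range(1, n + 1):
        # the model in force at time t is the one with the largest change point <= t
        _, model = max((s for s in segments if s[0] <= t), key=lambda s: s[0])
        x.append("1" if rng.random() < theta_one(model, x) else "0")
    return "".join(x)

# Two intervals, as in Definitions 1 and 2: MODEL_A on [1, 150), MODEL_B on [150, 301).
print(generate([(1, MODEL_A), (150, MODEL_B)], 300))
```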
III. BAYES CODE FOR THE PROPOSED SOURCE

In this section, we present the coding probability of the Bayes code for the proposed source.

Theorem 1. The coding probability of the sequential Bayes code for the proposed source is
$$\mathrm{AP}^{*}(x_t \mid x^{t-1}) = \sum_{c \in \mathcal{C}} \pi(c \mid x^{t-1}) \sum_{m^{(c)} \in \mathcal{M}} P(m^{(c)} \mid x^{t-1}, c) \int p\left(x_t \,\middle|\, x^{t-1}, \theta^{m^{(c)}}, m^{(c)}, c\right) w\left(\theta^{m^{(c)}} \,\middle|\, x^{t-1}, m^{(c)}, c\right) \mathrm{d}\theta^{m^{(c)}}, \quad (9)$$
where $\pi(c \mid x^{t-1})$ is the posterior probability distribution of the change pattern $c$, $P(m^{(c)} \mid x^{t-1}, c)$ is that of the context tree model $m^{(c)}$, $\mathcal{M}$ is the set of all context tree models (see Example 1; to be more precise, the size of $\mathcal{M}$, i.e., the total number of context tree models, depends on the maximum depth $d$ of the context tree, but in this paper $d$ is fixed), and $w(\theta^{m^{(c)}} \mid x^{t-1}, m^{(c)}, c)$ is the posterior probability distribution of the parameter $\theta^{m^{(c)}}$.

Proof. The proof outline of Theorem 1 is the same as that of Theorem 2 in [3]. Now, let $\mathrm{AP}_p(x_t \mid x^{t-1})$ be an arbitrary sequential coding probability. Taking the logarithmic loss of the coding probability, the loss function is as follows:
$$V\left(\mathrm{AP}_p, x^N, \theta^{m^{(c)}}, m^{(c)}, c\right) = \log p\left(x^N \,\middle|\, \theta^{m^{(c)}}, m^{(c)}, c\right) - \log \prod_{t=1}^{N} \mathrm{AP}_p(x_t \mid x^{t-1}). \quad (10)$$

We take the expectation with respect to the probability distribution of the source sequences and obtain the risk function as follows:
$$R\left(\mathrm{AP}_p, \theta^{m^{(c)}}, m^{(c)}, c\right) = \sum_{x^N} p\left(x^N \,\middle|\, \theta^{m^{(c)}}, m^{(c)}, c\right) \log \frac{p\left(x^N \,\middle|\, \theta^{m^{(c)}}, m^{(c)}, c\right)}{\prod_{t=1}^{N} \mathrm{AP}_p(x_t \mid x^{t-1})}. \quad (11)$$

The Bayes risk is then obtained by taking the expectation with respect to the probability distribution of each parameter:
$$\mathrm{BR}(\mathrm{AP}_p) = \sum_{c \in \mathcal{C}} \pi(c) \sum_{m^{(c)} \in \mathcal{M}} P(m^{(c)} \mid c) \int R\left(\mathrm{AP}_p, \theta^{m^{(c)}}, m^{(c)}, c\right) w(\theta^{m^{(c)}} \mid m^{(c)}, c)\, \mathrm{d}\theta^{m^{(c)}}. \quad (12)$$

On the other hand, we have
$$p(x^N) = \sum_{c \in \mathcal{C}} \sum_{m^{(c)} \in \mathcal{M}} \int p(\theta^{m^{(c)}}, m^{(c)}, c)\, p(x^N \mid \theta^{m^{(c)}}, m^{(c)}, c)\, \mathrm{d}\theta^{m^{(c)}} = \prod_{t=1}^{N} \frac{\sum_{c}\sum_{m^{(c)}} \int p(\theta^{m^{(c)}}, m^{(c)}, c)\, p(x^{t} \mid \theta^{m^{(c)}}, m^{(c)}, c)\, \mathrm{d}\theta^{m^{(c)}}}{\sum_{c}\sum_{m^{(c)}} \int p(\theta^{m^{(c)}}, m^{(c)}, c)\, p(x^{t-1} \mid \theta^{m^{(c)}}, m^{(c)}, c)\, \mathrm{d}\theta^{m^{(c)}}}. \quad (13)$$

Hence, $\mathrm{AP}^{*}(x_t \mid x^{t-1})$ given as follows minimizes the Bayes risk:
$$\begin{aligned}
\mathrm{AP}^{*}(x_t \mid x^{t-1}) &= \frac{\sum_{c}\sum_{m^{(c)}} \int p(\theta^{m^{(c)}}, m^{(c)}, c)\, p(x^{t} \mid \theta^{m^{(c)}}, m^{(c)}, c)\, \mathrm{d}\theta^{m^{(c)}}}{\sum_{c}\sum_{m^{(c)}} \int p(\theta^{m^{(c)}}, m^{(c)}, c)\, p(x^{t-1} \mid \theta^{m^{(c)}}, m^{(c)}, c)\, \mathrm{d}\theta^{m^{(c)}}} \\
&= \sum_{c \in \mathcal{C}} \pi(c \mid x^{t-1}) \sum_{m^{(c)} \in \mathcal{M}} P(m^{(c)} \mid x^{t-1}, c) \int p\left(x_t \,\middle|\, x^{t-1}, \theta^{m^{(c)}}, m^{(c)}, c\right) w\left(\theta^{m^{(c)}} \,\middle|\, x^{t-1}, m^{(c)}, c\right) \mathrm{d}\theta^{m^{(c)}}. \quad (14)
\end{aligned}$$

Example 1. Consider the case where $\mathcal{X} = \{0, 1\}$ and the depth of a context tree model is at most two. Then there are five context tree models, as shown in Figure 3, and $\mathcal{M} = \{m_1, m_2, m_3, m_4, m_5\}$.

Fig. 3. Overall view of the context tree models of a 0-1 sequence with maximum depth d = 2.
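The size of $\mathcal{M}$ grows rapidly with the maximum depth $d$, which is why the weighting over all models in (9) is expensive. The following small counting sketch (my own, not from the paper) uses the recursion that a model is either a root leaf or an internal root whose $|\mathcal{X}|$ subtrees are again context tree models of maximum depth $d-1$; for $d = 2$ it reproduces the five models of Example 1.

```python
def count_context_trees(max_depth, alphabet_size=2):
    """Number of context tree models with maximum depth `max_depth`: a model is
    either the root alone (a leaf) or an internal root whose `alphabet_size`
    subtrees are themselves context tree models of maximum depth `max_depth - 1`."""
    if max_depth == 0:
        return 1
    return 1 + count_context_trees(max_depth - 1, alphabet_size) ** alphabet_size

print(count_context_trees(2))  # 5  -> the models m_1, ..., m_5 of Example 1
print(count_context_trees(3))  # 26 -> |M| grows doubly exponentially in d
```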

IV. EFFICIENT ALGORITHM FOR CALCULATING THE CODING PROBABILITY OF THE BAYES CODE

The coding probability of the Bayes code shown in Theorem 1 is given by weighting the Bayes optimal coding probability for each change pattern $c$ by the posterior probability distribution of the change pattern $\pi(c \mid x^{t-1})$. That is, (9) is expressed as follows:
$$\mathrm{AP}^{*}(x_t \mid x^{t-1}) = \sum_{c \in \mathcal{C}} \pi(c \mid x^{t-1})\, \mathrm{AP}_c(x_t \mid x^{t-1}, c), \quad (15)$$
where
$$\mathrm{AP}_c(x_t \mid x^{t-1}, c) = \sum_{m^{(c)} \in \mathcal{M}} P\left(m^{(c)} \,\middle|\, x^{t-1}, c\right) \int p\left(x_t \,\middle|\, x^{t-1}, \theta^{m^{(c)}}, m^{(c)}, c\right) w\left(\theta^{m^{(c)}} \,\middle|\, x^{t-1}, m^{(c)}, c\right) \mathrm{d}\theta^{m^{(c)}}. \quad (16)$$

For this $\mathrm{AP}_c(x_t \mid x^{t-1}, c)$, Matsushima and Hirasawa [2] have already shown an algorithm that calculates it analytically while reducing the amount of computation. This algorithm is explained in the next subsection.

A. Efficient Bayes Coding Algorithm for a Fixed Change Pattern

First, the prior probability distribution for the context tree model is assumed to be as follows.

Assumption 1. For each context tree model $m \in \mathcal{M}$ with the set of leaf nodes $\mathcal{L}_m$, let $\mathcal{I}_m$ be the set of its internal nodes. Assume that each node $s$ has a hyper-parameter $g_s \in [0, 1]$ and that the prior distribution of each model $m$ is
$$P(m) = \prod_{\bar{s} \in \mathcal{I}_m} g_{\bar{s}} \prod_{s \in \mathcal{L}_m} (1 - g_s), \quad (17)$$
where $g_s = 0$ for a leaf node $s$ at the maximum depth of a context tree model. An example is shown in Figure 4.

Fig. 4. Example of a context tree model. The $\lambda$ represents the root node, and the internal nodes are $s_\lambda$ and $s_1$. The leaf nodes are $s_0$, $s_{01}$, and $s_{11}$, but the hyper-parameters of $s_{01}$ and $s_{11}$ are 0 because they exist at the maximum depth.

Remark 1. $P(m)$ is a probability distribution, i.e., (17) satisfies
$$\sum_{m \in \mathcal{M}} P(m) = 1. \quad (18)$$
The proof of this fact is given by Nakahara and Matsushima [7].

For example, the prior probability of the model in Figure 4 (corresponding to model $m_4$ in Figure 3) is
$$P(m_4) = g_{s_\lambda} (1 - g_{s_0})\, g_{s_1} (1 - \underbrace{g_{s_{01}}}_{0})(1 - \underbrace{g_{s_{11}}}_{0}) = g_{s_\lambda} (1 - g_{s_0})\, g_{s_1}, \quad (19)$$
where $s_\lambda$ represents the root node.
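As a quick numerical check of (17) and (19), the following minimal sketch evaluates $P(m)$ for a model given by its internal and leaf nodes. The node names follow Figure 4, and the value $g_s = 0.5$ for the nodes below the maximum depth is an illustrative assumption.

```python
# Hyper-parameters g_s; g = 0 at the maximum depth, as required by Assumption 1.
g = {"lambda": 0.5, "0": 0.5, "1": 0.5, "01": 0.0, "11": 0.0}

def model_prior(internal_nodes, leaf_nodes, g):
    """P(m) = product of g_s over internal nodes times product of (1 - g_s) over leaves, eq. (17)."""
    p = 1.0
    for s in internal_nodes:
        p *= g[s]
    for s in leaf_nodes:
        p *= 1.0 - g[s]
    return p

# Model m_4 of Figure 4: internal nodes s_lambda and s_1; leaves s_0, s_01, s_11.
print(model_prior(["lambda", "1"], ["0", "01", "11"], g))  # 0.5 * 0.5 * 0.5 = 0.125
```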

Second, we assume the prior probability distribution of the parameter $\theta^m$ for each context tree model $m \in \mathcal{M}$ as follows.

Assumption 2. For each leaf node $s \in \mathcal{L}_m$, we assume that the prior probability distribution $w(\theta_s)$ of its parameter is a Dirichlet distribution
$$w(\theta_s) = \frac{\Gamma\left(\sum_{i=0}^{|\mathcal{X}|-1} \beta(i \mid s)\right)}{\prod_{i=0}^{|\mathcal{X}|-1} \Gamma(\beta(i \mid s))} \prod_{i=0}^{|\mathcal{X}|-1} \theta_{i|s}^{\beta(i|s)-1}, \quad (20)$$
where $\Gamma(\cdot)$ is the Gamma function, $\mathcal{X}$ denotes the source alphabet, and $\beta(i \mid s)$ denotes a parameter of the Dirichlet distribution. In addition, the prior probability distribution of $\theta^m$ is assumed to be the product of all $w(\theta_s)$. That is,
$$w(\theta^m \mid m) = \prod_{s \in \mathcal{L}_m} w(\theta_s). \quad (21)$$

Then, Matsushima and Hirasawa [2] recursively compute the Bayes coding probability for context tree sources as follows.

First, let $\mathcal{L}$ denote the set of all leaf nodes in the superposed context tree. The term "superposed context tree" refers to the tree structure that represents the superposition of all possible context tree models. For example, a superposed context tree for the context tree models shown in Figure 3 is as shown in Figure 5.

Fig. 5. A superposed context tree for the entire set of context tree models in Figure 3.
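The following is a minimal sketch of one node of such a superposed context tree for a binary alphabet, holding the quantities used in the recursion below: the hyper-parameter $g_s$ of Assumption 1, the Dirichlet hyper-parameters $\beta(i \mid s)$ of Assumption 2, and the counts $N(i \mid x_{\tau_t} \cdots x_{t-1}, s)$. The concrete data layout is my assumption, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Node:
    """One node s of the superposed context tree (binary alphabet)."""
    depth: int
    max_depth: int
    g: float = 0.5                                                              # g_s of Assumption 1
    beta: Dict[int, float] = field(default_factory=lambda: {0: 0.5, 1: 0.5})    # beta(i|s) of Assumption 2
    counts: Dict[int, int] = field(default_factory=lambda: {0: 0, 1: 0})        # N(i|., s)
    children: Dict[int, "Node"] = field(default_factory=dict)

    def __post_init__(self):
        if self.depth == self.max_depth:
            self.g = 0.0                             # g_s = 0 at the maximum depth
        else:
            self.children = {a: Node(self.depth + 1, self.max_depth) for a in (0, 1)}

# The root s_lambda of a superposed context tree of maximum depth d = 2 (Figure 5).
root = Node(depth=0, max_depth=2)
```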

Now, we denote by $\tau_t$ the last change point of the context tree model at time $t$. If $\tau_t$ is the $j$-th changing point in the sequence $x^{t-1}$, then $x_t$ is generated according to the context tree model $m_{t_j}$. In other words, the context tree models before $\tau_t$ are irrelevant to $x_t$. Next, a recursive function that calculates the coding probability is defined as follows.

Definition 5.
$$\tilde{q}_s(x_t \mid x_{\tau_t} \cdots x_{t-1}) \stackrel{\mathrm{def}}{=} \begin{cases} q_s(x_t \mid x_{\tau_t} \cdots x_{t-1}) & \text{if } s \in \mathcal{L},\\ (1 - g_{s \mid x_{\tau_t} \cdots x_{t-1}})\, q_s(x_t \mid x_{\tau_t} \cdots x_{t-1}) + g_{s \mid x_{\tau_t} \cdots x_{t-1}}\, \tilde{q}_{s_{\mathrm{child}}}(x_t \mid x_{\tau_t} \cdots x_{t-1}) & \text{otherwise,} \end{cases} \quad (22)$$
where
$$q_s(x_t \mid x_{\tau_t} \cdots x_{t-1}) \stackrel{\mathrm{def}}{=} \frac{\beta(x_t \mid s) + N(x_t \mid x_{\tau_t} \cdots x_{t-1}, s)}{\sum_{i=0}^{|\mathcal{X}|-1} \left\{ \beta(i \mid s) + N(i \mid x_{\tau_t} \cdots x_{t-1}, s) \right\}}. \quad (23)$$
Note that $N(a \mid x_{\tau_t} \cdots x_{t-1}, s)$ denotes the number of occurrences of the symbol $a \in \mathcal{X}$ under the state $s$ in the subsequence $x_{\tau_t} \cdots x_{t-1}$, and $s_{\mathrm{child}}$ is the child node of $s$ in the superposed context tree along the context $x_{\tau_t} \cdots x_{t-1}$. The posterior hyper-parameter $g_{s \mid x_{\tau_t} \cdots x_{t-1}}$ is updated as follows:
$$g_{s \mid x_{\tau_t} \cdots x_{t}} \stackrel{\mathrm{def}}{=} \begin{cases} g_s & \text{if } t = 0,\\ g_{s \mid x_{\tau_t} \cdots x_{t-1}} \dfrac{\tilde{q}_{s_{\mathrm{child}}}(x_t \mid x_{\tau_t} \cdots x_{t-1})}{\tilde{q}_s(x_t \mid x_{\tau_t} \cdots x_{t-1})} & \text{otherwise.} \end{cases} \quad (24)$$

In this case, the Bayes coding probability $\mathrm{AP}_c(x_t \mid x^{t-1}, c)$ for the context tree source can be calculated as follows.

Theorem 2 (Matsushima and Hirasawa [2]).
$$\mathrm{AP}_c(x_t \mid x^{t-1}, c) = \tilde{q}_{s_\lambda}(x_t \mid x_{\tau_t} \cdots x_{t-1}). \quad (25)$$
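The following is a minimal sketch of how (22)-(25) can be evaluated along the context path of the superposed tree, reusing the Node structure sketched above. It assumes that each node's counts cover only the current segment $x_{\tau_t} \cdots x_{t-1}$; the handling of contexts shorter than the maximum depth and the moment at which the counts are refreshed are my assumptions, not details given in the paper.

```python
def q_s(node, symbol):
    """Sequential predictive probability of eq. (23) at node s."""
    num = node.beta[symbol] + node.counts[symbol]
    den = sum(node.beta[a] + node.counts[a] for a in (0, 1))
    return num / den

def q_tilde(node, path, symbol):
    """Weighted probability of eq. (22).  `path` holds the remaining past
    symbols (newest first) and selects the child node s_child."""
    if not node.children or not path:        # leaf of the superposed tree (or short context)
        return q_s(node, symbol)
    child = node.children[path[0]]
    return (1.0 - node.g) * q_s(node, symbol) + node.g * q_tilde(child, path[1:], symbol)

def update(node, path, symbol):
    """After coding x_t = symbol: update g_s by eq. (24) and the counts N(.|., s)
    along the context path.  All ratios use the values before x_t is counted."""
    if node.children and path:
        child = node.children[path[0]]
        node.g = node.g * q_tilde(child, path[1:], symbol) / q_tilde(node, path, symbol)
        update(child, path[1:], symbol)
    node.counts[symbol] += 1

# AP_c(x_t | x^{t-1}, c) of Theorem 2 is q_tilde evaluated at the root s_lambda with
# the context x_{t-1}, x_{t-2}, ..., x_{tau_t} (newest first), e.g.:
#   ap_c = q_tilde(root, [x_prev1, x_prev2], x_t); update(root, [x_prev1, x_prev2], x_t)
```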

B. Efficient Bayes Coding Algorithm for the Proposed Source

In the previous subsection, we described that the algorithm by Matsushima and Hirasawa [2] reduces the computational complexity of calculating $\mathrm{AP}_c(x_t \mid x^{t-1}, c)$ for each change pattern $c$ from $O(2^{|\mathcal{X}|^{d-1}})$ to $O(d)$. However, (15) for the proposed source requires a computational effort of $O(d \cdot 2^N)$ because the total number of change patterns is $|\mathcal{C}| = 2^N$ for the length $N$ of the source sequence.

In this subsection, we propose an algorithm that reduces the computational complexity to $O(d \cdot N^2)$ using a class of prior probability distributions of change patterns by Suko et al. [8]. In their proposal of efficient Bayes coding for i.p.i.d. sources, they assumed a class that follows a Bernoulli distribution for the pattern of parameter changes. In this paper, we assume a similar class for the change pattern $c$.

Definition 6. For the change pattern $c = (w_1^{(c)}, \ldots, w_t^{(c)}, \ldots, w_N^{(c)})$, we assume that each $w_t^{(c)}$ (except $t = 1$) independently follows the Bernoulli distribution $\mathrm{Ber}(\alpha)$. That is,
$$\pi(c) \stackrel{\mathrm{def}}{=} \alpha^{|\mathcal{T}_c|-1} (1-\alpha)^{N - |\mathcal{T}_c|}. \quad (26)$$
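As a small worked example of (26) (my own, not from the paper): for a change pattern $w$ with $w_1 = 1$, the prior is $\alpha$ raised to the number of changes after $t = 1$ times $(1-\alpha)$ raised to the number of non-changes.

```python
def change_pattern_prior(w, alpha):
    """pi(c) of eq. (26) for a change pattern w = (w_1, ..., w_N) with w_1 = 1."""
    changes = sum(w[1:])                                   # |T_c| - 1
    return alpha ** changes * (1 - alpha) ** (len(w) - 1 - changes)

# N = 6, changes at t = 1 (by convention) and t = 4: pi(c) = alpha * (1 - alpha)**4.
print(change_pattern_prior([1, 0, 0, 1, 0, 0], alpha=0.01))
```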

In addition, Suko et al. [8] introduced a prior probability distribution of $\tau_t$ and proposed an efficient Bayes code for i.p.i.d. sources. In the same way, we define the prior probability distribution $v(\tau_t)$ of the last change point as follows:
$$v(\tau_t) \stackrel{\mathrm{def}}{=} \sum_{c \,:\, \tau_t \in \mathcal{T}_c} \pi(c), \quad \text{where } \tau_t = 1, 2, \ldots, t. \quad (27)$$

Finally, the efficient Bayes coding algorithm for the model proposed in this paper is shown below.

Step i. Load $x_t$.
Step ii. The coding probability is calculated as follows:
$$\tilde{p}(x_t \mid x^{t-1}) = \sum_{\tau_t = 1}^{t} \tilde{q}_{s_\lambda}(x_t \mid x_{\tau_t} \cdots x_{t-1})\, v(\tau_t \mid x^{t-1}).$$
Step iii. $v(\tau_{t+1} \mid x^t)$ is calculated as follows:
• If $\tau_{t+1} = 1, 2, \ldots, t$,
$$v(\tau_{t+1} \mid x^t) = (1 - \alpha)\, \frac{\tilde{q}_{s_\lambda}(x_t \mid x_{\tau_t} \cdots x_{t-1})\, v(\tau_t \mid x^{t-1})}{\tilde{p}(x_t \mid x^{t-1})}.$$
• If $\tau_{t+1} = t + 1$, $v(\tau_{t+1} \mid x^t) = \alpha$.
Step iv. Go back to Step i.

We show that the above algorithm correctly computes the Bayes coding probability for the proposed source.

Proof. (Outline only.)
$$\begin{aligned}
\mathrm{AP}^{*}(x_t \mid x^{t-1}) &= \sum_{c \in \mathcal{C}} \pi(c \mid x^{t-1})\, \mathrm{AP}_c(x_t \mid x^{t-1}, c) = \sum_{c \in \mathcal{C}} \pi(c \mid x^{t-1})\, \tilde{q}_{s_\lambda}(x_t \mid \tau_t, x^{t-1}) \\
&= \sum_{\tau_t=1}^{t} \tilde{q}_{s_\lambda}(x_t \mid \tau_t, x^{t-1}) \left\{ \sum_{c \,:\, \tau_t \in \mathcal{T}_c} \pi(c \mid x^{t-1}) \right\} = \sum_{\tau_t=1}^{t} \tilde{q}_{s_\lambda}(x_t \mid \tau_t, x^{t-1})\, v(\tau_t \mid x^{t-1}) = \tilde{p}(x_t \mid x^{t-1}). \quad (28)
\end{aligned}$$
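The following is a minimal end-to-end sketch of Steps i-iv, reusing the Node, q_tilde, and update helpers sketched in Section IV-A. One superposed tree is kept per candidate last change point $\tau$, so the work at time $t$ is $O(t \cdot d)$ and the total is $O(d \cdot N^2)$, as stated above; the bookkeeping details (for example, how the context is truncated at the segment boundary) are my assumptions.

```python
def bayes_coding_probabilities(x, alpha, max_depth=2):
    """Return the coding probabilities p~(x_t | x^{t-1}) for a 0/1 list x (Steps i-iv)."""
    trees, weights, probs = {}, {}, []
    for t, symbol in enumerate(x, start=1):
        trees[t] = Node(depth=0, max_depth=max_depth)    # fresh segment starting at tau = t
        weights[t] = alpha if t > 1 else 1.0             # v(t | x^{t-1}); v(1 | x^0) = 1
        # Step ii: p~(x_t | x^{t-1}) = sum_tau q~_{s_lambda}(x_t | x_tau .. x_{t-1}) v(tau | x^{t-1})
        contexts = {tau: list(reversed(x[tau - 1:t - 1]))[:max_depth] for tau in trees}
        q_root = {tau: q_tilde(trees[tau], contexts[tau], symbol) for tau in trees}
        p = sum(q_root[tau] * weights[tau] for tau in trees)
        probs.append(p)
        # Step iii: v(tau | x^t) for tau = 1, ..., t; v(t+1 | x^t) = alpha is set next round
        for tau in trees:
            weights[tau] = (1 - alpha) * q_root[tau] * weights[tau] / p
        # Feed x_t into every tree so that its counts cover x_tau .. x_t, then go to Step i
        for tau in trees:
            update(trees[tau], contexts[tau], symbol)
    return probs

# Usage sketch: probs = bayes_coding_probabilities([1, 0, 0, 1, ...], alpha=0.01)
# gives the per-symbol coding probabilities that enter the redundancy (29) below.
```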

V. EXPERIMENT

We performed an experiment running the proposed algorithm on an artificially generated source sequence. The purpose of this experiment is to check the compression performance of the proposed algorithm for several settings of the hyper-parameter $\alpha$ (see Definition 6).

First, we describe the source sequence used in the experiment. The length of the sequence is 300 and, as shown in Figure 6, each run of 100 consecutive symbols is generated from a different context tree model.

Fig. 6. Context tree models that generate the source sequence used in the experiment. Note that $\theta_{x=1 \mid s_\lambda}$ is given by $1 - \theta_{x=0 \mid s_\lambda}$.

Since the context tree model of the source sequence changes every 100 symbols, the appropriate hyper-parameter $\alpha$ (the parameter of the Bernoulli distribution in Definition 6) is 0.01. Therefore, in order to confirm the compression performance for different values of $\alpha$, we conducted the following Experiment A.

Experiment A. We observe the redundancy
$$\log_2 \frac{1}{\tilde{p}(x_t \mid x^{t-1})} - H\left(X_t \,\middle|\, X_{\tau_t}^{t-1}\right) \quad (29)$$
of the proposed algorithm, where $H(\cdot)$ denotes the entropy rate. We set the hyper-parameter $\alpha$ to three values: $\alpha = 0.1$, $0.01$, and $0.001$. We ran the algorithm 10000 times with each value of $\alpha$. In the algorithm, we set $g_s = 0.5$ (see Assumption 1) and $\beta(0 \mid s) = \beta(1 \mid s) = 0.5$ (see Assumption 2). The average redundancy over the 10000 trials is shown in Figure 7.

Fig. 7. Average redundancy over the 10000 trials.

The redundancy jumps up instantly at each changing point ($t = 101, 201$) but decreases gradually for every value of $\alpha$.

($\alpha = 0.1$ vs. $\alpha = 0.01$) When $\alpha = 0.1$, the redundancy decreases more rapidly. However, it converges to a smaller value when $\alpha = 0.01$ at the end of each interval.

($\alpha = 0.01$ vs. $\alpha = 0.001$) When $\alpha = 0.01$, the redundancy decreases more rapidly. Moreover, there is little difference in the convergence values at the end of each interval.

Therefore, it can be concluded that $\alpha = 0.01$ seems the best for compression performance.

In the Bayes code, we can observe the posterior probabilities of the parameters. For example, the posterior probability of $\tau_t$ indicates the characteristics of the context tree model changes. Next, we conducted the following Experiment B.

Experiment B. The $\tau_t$ with the largest posterior probability $v(\tau_t \mid x^{t-1})$ is regarded as the estimate of the last change point at time $t$, and we observe the transition of this estimate. As in Experiment A, we set the hyper-parameter to the three values $\alpha = 0.1$, $0.01$, and $0.001$, and we ran the algorithm 10000 times with each value. The average of the estimates over the 10000 trials is shown in Figure 8.

Fig. 8. Average of the estimates of the last change point over the 10000 trials.

When $\alpha = 0.01$ and $0.001$, the transition of $\arg\max_{\tau_t} v(\tau_t \mid x^{t-1})$ appears to correspond to the changes of the context tree model. When $\alpha = 0.1$, however, $\arg\max_{\tau_t} v(\tau_t \mid x^{t-1}) \simeq t$, so it does not correspond to the changes of the context tree model.

ACKNOWLEDGEMENT

This work was supported in part by JSPS KAKENHI Grant Numbers JP17K06446, JP19K04914, and JP19K14989.

REFERENCES

[1] F. M. J. Willems, Y. M. Shtarkov and T. J. Tjalkens, "The Context-Tree Weighting Method: Basic Properties," IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 653-664, May 1995, doi: 10.1109/18.382012.
[2] T. Matsushima and S. Hirasawa, "A Class of Prior Distributions on Context Tree Models and an Efficient Algorithm of the Bayes Codes Assuming It," 2007 IEEE International Symposium on Signal Processing and Information Technology, Giza, 2007, pp. 938-941, doi: 10.1109/ISSPIT.2007.4458049.
[3] T. Matsushima, H. Inazumi and S. Hirasawa, "A Class of Distortionless Codes Designed by Bayes Decision Theory," IEEE Transactions on Information Theory, vol. 37, no. 5, pp. 1288-1293, Sept. 1991, doi: 10.1109/18.133247.
[4] N. Merhav, "On the Minimum Description Length Principle for Sources with Piecewise Constant Parameters," IEEE Transactions on Information Theory, vol. 39, no. 6, pp. 1962-1967, Nov. 1993, doi: 10.1109/18.265504.
[5] G. I. Shamir and N. Merhav, "Low-complexity Sequential Lossless Coding for Piecewise-stationary Memoryless Sources," IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1498-1519, July 1999, doi: 10.1109/18.771150.
[6] M. Duc Cao, T. I. Dix, L. Allison and C. Mears, "A Simple Statistical Algorithm for Biological Sequence Compression," DCC'07, Snowbird, UT, 2007, pp. 43-52, doi: 10.1109/DCC.2007.7.
[7] Y. Nakahara and T. Matsushima, "A Stochastic Model of Block Segmentation Based on the Quadtree and the Bayes Code for It," 2020 Data Compression Conference (DCC), pp. 293-302, 2020, doi: 10.1109/DCC47342.2020.00037.
[8] T. Suko, T. Matsushima and S. Hirasawa, "Bayes Coding for Sources with Piecewise Constant Parameters," Proceedings of the 26th Symposium on Information Theory and Its Applications, pp. 165-168, 2003. (in Japanese)