12 Linear Prediction

12.1 Pure Prediction and Signal Modeling

In Sec. 1.17, we discussed the connection between linear prediction and signal modeling. Here, we rederive the same results by considering the linear prediction problem as a special case of the Wiener filtering problem, given by Eq. (11.4.6). Our aim is to cast the results in a form that will suggest a practical way to solve the prediction problem and hence also the modeling problem.

Consider a stationary signal y_n having a signal model

$$S_{yy}(z) = \sigma_\epsilon^2\, B(z)\, B(z^{-1}) \tag{12.1.1}$$

as guaranteed by the spectral factorization theorem. Let R_yy(k) denote the autocorrelation of y_n:

$$R_{yy}(k) = E[y_{n+k}\, y_n]$$

The linear prediction problem is to predict the current value y_n on the basis of all the past values Y_{n−1} = {y_i, −∞ < i ≤ n−1}. If we define the delayed signal y_1(n) = y_{n−1}, then the prediction problem is equivalent to the optimal Wiener filtering problem

of estimating y_n from the related signal y_1(n). The optimal estimation filter H(z) is given by Eq. (11.4.6), where we must identify x_n and y_n with y_n and y_1(n) of the present notation. Using the filtering equation Y_1(z) = z^{−1} Y(z), we find that y_n and y_1(n) have the same spectral factor B(z):

$$S_{y_1 y_1}(z) = (z^{-1})(z)\, S_{yy}(z) = S_{yy}(z) = \sigma_\epsilon^2\, B(z)\, B(z^{-1})$$

and also that

$$S_{y y_1}(z) = S_{yy}(z)\, z = z\, \sigma_\epsilon^2\, B(z)\, B(z^{-1})$$

Inserting these into Eq. (11.4.6), we find for the optimal filter H(z)

$$H(z) = \frac{1}{\sigma_\epsilon^2 B(z)} \left[ \frac{S_{y y_1}(z)}{B(z^{-1})} \right]_+ = \frac{1}{\sigma_\epsilon^2 B(z)} \left[ \frac{z\, \sigma_\epsilon^2\, B(z)\, B(z^{-1})}{B(z^{-1})} \right]_+\,, \quad \text{or,}$$

$$H(z) = \frac{1}{B(z)} \big[\, z B(z)\, \big]_+ \tag{12.1.2}$$

The causal instruction can be removed as follows. Noting that B(z) is a causal and stable filter, we may expand it in the power series

$$B(z) = 1 + b_1 z^{-1} + b_2 z^{-2} + b_3 z^{-3} + \cdots$$

The causal part of zB(z) is then

$$\big[zB(z)\big]_+ = \big[z + b_1 + b_2 z^{-1} + b_3 z^{-2} + \cdots\big]_+ = b_1 + b_2 z^{-1} + b_3 z^{-2} + \cdots = z\big(b_1 z^{-1} + b_2 z^{-2} + b_3 z^{-3} + \cdots\big) = z\big(B(z) - 1\big)$$

The prediction filter H(z) then becomes

$$H(z) = \frac{1}{B(z)}\, z\big(B(z) - 1\big) = z\left[1 - \frac{1}{B(z)}\right] \tag{12.1.3}$$

The input to this filter is y_1(n) and the output is the prediction ŷ_{n/n−1}.

Example 12.1.1: Suppose that y_n is generated by driving the all-pole filter

$$y_n = 0.9\, y_{n-1} - 0.2\, y_{n-2} + \epsilon_n$$

by zero-mean white noise ε_n. Find the best predictor ŷ_{n/n−1}. The signal model in this case is B(z) = 1/(1 − 0.9z^{−1} + 0.2z^{−2}) and Eq. (12.1.3) gives

$$z^{-1} H(z) = 1 - \frac{1}{B(z)} = 1 - (1 - 0.9 z^{-1} + 0.2 z^{-2}) = 0.9 z^{-1} - 0.2 z^{-2}$$

The I/O equation for the prediction filter is obtained by

$$\hat{Y}(z) = H(z)\, Y_1(z) = z^{-1} H(z)\, Y(z) = \big(0.9 z^{-1} - 0.2 z^{-2}\big)\, Y(z)$$

and in the time domain

$$\hat{y}_{n/n-1} = 0.9\, y_{n-1} - 0.2\, y_{n-2}$$

Example 12.1.2: Suppose that

$$S_{yy}(z) = \frac{(1 - 0.25 z^{-2})(1 - 0.25 z^{2})}{(1 - 0.8 z^{-1})(1 - 0.8 z)}$$

Determine the best predictor ŷ_{n/n−1}. Here, the minimum-phase factor is

$$B(z) = \frac{1 - 0.25 z^{-2}}{1 - 0.8 z^{-1}}$$

and therefore the prediction filter is

$$z^{-1} H(z) = 1 - \frac{1}{B(z)} = 1 - \frac{1 - 0.8 z^{-1}}{1 - 0.25 z^{-2}} = \frac{0.8 z^{-1} - 0.25 z^{-2}}{1 - 0.25 z^{-2}}$$

The I/O equation of this filter is conveniently given recursively by the difference equation

$$\hat{y}_{n/n-1} = 0.25\, \hat{y}_{n-2/n-3} + 0.8\, y_{n-1} - 0.25\, y_{n-2}$$

The prediction error

$$e_{n/n-1} = y_n - \hat{y}_{n/n-1}$$

is identical to the whitening sequence ε_n driving the signal model (12.1.1) of y_n; indeed,

$$E(z) = Y(z) - \hat{Y}(z) = Y(z) - H(z) Y_1(z) = Y(z) - H(z) z^{-1} Y(z) = \big[1 - z^{-1} H(z)\big] Y(z) = \frac{Y(z)}{B(z)} = \epsilon(z)$$

Thus, in accordance with the results of Sec. 1.13 and Sec. 1.17,

$$e_{n/n-1} = y_n - \hat{y}_{n/n-1} = \epsilon_n \tag{12.1.4}$$

Fig. 12.1.1 Linear Predictor.

An overall realization of the linear predictor is shown in Fig. 12.1.1. The indicated dividing line separates the linear predictor into the Wiener filtering part and the input part which provides the proper input signals to the Wiener part. The transfer function from y_n to e_{n/n−1} is the whitening inverse filter

$$A(z) = \frac{1}{B(z)} = 1 - z^{-1} H(z)$$

which is stable and causal by the minimum-phase property of the spectral factorization (12.1.1). In the z-domain we have

$$E(z) = \epsilon(z) = A(z)\, Y(z)$$

and in the time domain

$$e_{n/n-1} = \epsilon_n = \sum_{m=0}^{\infty} a_m\, y_{n-m} = y_n + a_1 y_{n-1} + a_2 y_{n-2} + \cdots$$

The predicted estimate ŷ_{n/n−1} = y_n − e_{n/n−1} is

$$\hat{y}_{n/n-1} = -\big[a_1 y_{n-1} + a_2 y_{n-2} + \cdots\big]$$

These results are identical to Eqs. (1.17.2) and (1.17.3). The relationship noted above between linear prediction and signal modeling can also be understood in terms of the gapped-function approach of Sec. 11.7. Rewriting Eq. (12.1.1) in terms of the prediction-error filter A(z), we have

$$S_{yy}(z) = \frac{\sigma_\epsilon^2}{A(z)\, A(z^{-1})} \tag{12.1.5}$$

from which we obtain

$$A(z)\, S_{yy}(z) = \frac{\sigma_\epsilon^2}{A(z^{-1})} \tag{12.1.6}$$

Since we have the filtering equation ε(z) = A(z)Y(z), it follows that

$$S_{\epsilon y}(z) = A(z)\, S_{yy}(z)$$

and in the time domain

$$R_{\epsilon y}(k) = E[\epsilon_n\, y_{n-k}] = \sum_{i=0}^{\infty} a_i\, R_{yy}(k-i) \tag{12.1.7}$$

which is recognized as the gapped function (11.7.1). By construction, ε_n is the orthogonal complement of y_n with respect to the entire past subspace Y_{n−1} = {y_{n−k}, k = 1, 2, ...}; therefore, ε_n will be orthogonal to each y_{n−k} for k = 1, 2, .... These are precisely the gap conditions. Because the prediction is based on the entire past, the gapped function develops an infinite right-hand side gap. Thus, Eq. (12.1.7) implies

$$R_{\epsilon y}(k) = E[\epsilon_n\, y_{n-k}] = \sum_{i=0}^{\infty} a_i\, R_{yy}(k-i) = 0\,, \quad \text{for all}\;\; k = 1, 2, \ldots \tag{12.1.8}$$

The same result, of course, also follows from the z-domain equation (12.1.6). Both sides of the equation are stable, but since A(z) is minimum-phase, A(z^{−1}) will be maximum-phase, and therefore it will have a stable but anticausal inverse 1/A(z^{−1}). Thus, the right-hand side of Eq. (12.1.6) has no strictly causal part. Equating to zero all the coefficients of positive powers of z^{−1} results in Eq. (12.1.8).

The value of the gapped function at k = 0 is equal to σ_ε². Indeed, using the gap conditions (12.1.8) we find

$$\sigma_\epsilon^2 = E[\epsilon_n^2] = E\big[\epsilon_n (y_n + a_1 y_{n-1} + a_2 y_{n-2} + \cdots)\big] = R_{\epsilon y}(0) + a_1 R_{\epsilon y}(1) + a_2 R_{\epsilon y}(2) + \cdots = R_{\epsilon y}(0) = E[\epsilon_n y_n]$$

Using Eq. (12.1.7) with k = 0 and the symmetry property R_yy(i) = R_yy(−i), we find

$$\sigma_\epsilon^2 = E[\epsilon_n^2] = E[\epsilon_n y_n] = R_{yy}(0) + a_1 R_{yy}(1) + a_2 R_{yy}(2) + \cdots \tag{12.1.9}$$

Equations (12.1.8) and (12.1.9) may be combined into one:

$$\sum_{i=0}^{\infty} a_i\, R_{yy}(k-i) = \sigma_\epsilon^2\, \delta(k)\,, \quad \text{for all}\;\; k \ge 0 \tag{12.1.10}$$

which can be cast in the matrix form:

$$\begin{bmatrix} R_{yy}(0) & R_{yy}(1) & R_{yy}(2) & R_{yy}(3) & \cdots \\ R_{yy}(1) & R_{yy}(0) & R_{yy}(1) & R_{yy}(2) & \cdots \\ R_{yy}(2) & R_{yy}(1) & R_{yy}(0) & R_{yy}(1) & \cdots \\ R_{yy}(3) & R_{yy}(2) & R_{yy}(1) & R_{yy}(0) & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \end{bmatrix} = \begin{bmatrix} \sigma_\epsilon^2 \\ 0 \\ 0 \\ 0 \\ \vdots \end{bmatrix} \tag{12.1.11}$$

These equations are known as the normal equations of linear prediction [915–928]. They provide the solution to both the signal modeling and the linear prediction problems. They determine the model parameters {a_1, a_2, ...; σ_ε²} of the signal y_n directly in terms of the experimentally accessible quantities R_yy(k). To render them computationally manageable, the infinite matrix equation (12.1.11) must be reduced to a finite one, and furthermore, the quantities R_yy(k) must be estimated from actual data samples of y_n. We discuss these matters next.

12.2 Autoregressive Models

In general, the number of prediction coefficients {a_1, a_2, ...} is infinite since the predictor is based on the infinite past. However, there is an important exception to this; namely, when the process y_n is autoregressive. In this case, the signal model B(z) is an all-pole filter of the type

$$B(z) = \frac{1}{A(z)} = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_p z^{-p}} \tag{12.2.1}$$

which implies that the prediction filter is a polynomial

$$A(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_p z^{-p} \tag{12.2.2}$$

The signal generator for y_n is the following difference equation, driven by the uncorrelated sequence ε_n:

$$y_n + a_1 y_{n-1} + a_2 y_{n-2} + \cdots + a_p y_{n-p} = \epsilon_n \tag{12.2.3}$$

and the optimal prediction of y_n is simply given by:

$$\hat{y}_{n/n-1} = -\big[a_1 y_{n-1} + a_2 y_{n-2} + \cdots + a_p y_{n-p}\big] \tag{12.2.4}$$

In this case, the best prediction of y_n depends only on the past p samples {y_{n−1}, y_{n−2}, ..., y_{n−p}}. The infinite set of equations (12.1.10) or (12.1.11) are still satisfied even though only the first p + 1 coefficients {1, a_1, a_2, ..., a_p} are nonzero.

The (p + 1)×(p + 1) portion of Eq. (12.1.11) is sufficient to determine the (p + 1) model parameters {a_1, a_2, ..., a_p; σ_ε²}:

$$\begin{bmatrix} R_{yy}(0) & R_{yy}(1) & R_{yy}(2) & \cdots & R_{yy}(p) \\ R_{yy}(1) & R_{yy}(0) & R_{yy}(1) & \cdots & R_{yy}(p-1) \\ R_{yy}(2) & R_{yy}(1) & R_{yy}(0) & \cdots & R_{yy}(p-2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ R_{yy}(p) & R_{yy}(p-1) & R_{yy}(p-2) & \cdots & R_{yy}(0) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \sigma_\epsilon^2 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{12.2.5}$$

Such equations may be solved efficiently by Levinson's algorithm, which requires O(p²) operations and O(p) storage locations to obtain the a_i's, instead of the O(p³) operations and O(p²) storage locations that would be required if the inverse of the autocorrelation matrix R_yy were to be computed. The finite set of model parameters {a_1, a_2, ..., a_p; σ_ε²} determines the signal model of y_n completely. Setting z = e^{jω} into Eq. (12.1.5), we find a simple parametric representation of the power spectrum of the AR signal y_n:

$$S_{yy}(\omega) = \frac{\sigma_\epsilon^2}{\big|A(\omega)\big|^2} = \frac{\sigma_\epsilon^2}{\big|1 + a_1 e^{-j\omega} + a_2 e^{-2j\omega} + \cdots + a_p e^{-j\omega p}\big|^2} \tag{12.2.6}$$
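As a quick check of Eq. (12.2.6), the following Python sketch evaluates the AR power spectrum for a given set of model parameters; the function name and the choice σ_ε² = 1 for the model of Example 12.1.1 are illustrative assumptions, not taken from the text.

```python
import numpy as np

def ar_spectrum(a, sigma2, omega):
    """Evaluate S_yy(omega) = sigma2 / |A(omega)|^2, as in Eq. (12.2.6).
    a = [1, a1, ..., ap] are the prediction-error filter coefficients."""
    a = np.asarray(a, dtype=float)
    k = np.arange(len(a))
    # A(omega) = sum_k a_k e^{-j omega k}, evaluated at each frequency
    A = np.array([np.dot(a, np.exp(-1j * w * k)) for w in omega])
    return sigma2 / np.abs(A) ** 2

# Model of Example 12.1.1: A(z) = 1 - 0.9 z^{-1} + 0.2 z^{-2}; sigma_e^2 = 1 assumed
omega = np.linspace(0, np.pi, 256)
S = ar_spectrum([1.0, -0.9, 0.2], 1.0, omega)
```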

In practice, the normal equations (12.2.5) provide a means of determining approximate estimates for the model parameters {a_1, a_2, ..., a_p; σ_ε²}. Typically, a block of length N of recorded data is available

$$y_0, y_1, y_2, \ldots, y_{N-1}$$

There are many different methods of extracting reasonable estimates of the model parameters using this block of data. We mention: (1) the autocorrelation or Yule-Walker method, (2) the covariance method, and (3) Burg's method. There are also some variations of these methods. The first method, the Yule-Walker method, is perhaps the most obvious and straightforward one. In the normal equations (12.2.5), one simply replaces the ensemble autocorrelations R_yy(k) by the corresponding sample autocorrelations computed from the given block of data; that is,

$$\hat{R}_{yy}(k) = \frac{1}{N} \sum_{n=0}^{N-1-k} y_{n+k}\, y_n\,, \quad \text{for} \quad 0 \le k \le p \tag{12.2.7}$$

where only the first p + 1 lags are needed in Eq. (12.2.5). We must have, of course, p ≤ N − 1. As discussed in Sec. 1.13, the resulting estimates of the model parameters {â_1, â_2, ..., â_p; σ̂_ε²} may be used now in a number of ways; examples include obtaining an estimate of the power spectrum of the sequence y_n,

$$\hat{S}_{yy}(\omega) = \frac{\hat{\sigma}_\epsilon^2}{\big|\hat{A}(\omega)\big|^2} = \frac{\hat{\sigma}_\epsilon^2}{\big|1 + \hat{a}_1 e^{-j\omega} + \hat{a}_2 e^{-2j\omega} + \cdots + \hat{a}_p e^{-j\omega p}\big|^2}$$

or, representing the block of N samples y_n in terms of a few (i.e., p + 1) filter parameters. To synthesize the original samples one would generate white noise ε_n of variance σ̂_ε² and send it through the generator filter whose coefficients are the estimated values; that is, the filter

$$\hat{B}(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \hat{a}_1 z^{-1} + \hat{a}_2 z^{-2} + \cdots + \hat{a}_p z^{-p}}$$

The Yule-Walker analysis procedure, also referred to as the autocorrelation method of linear prediction [917], is summarized in Fig. 12.2.1.

Fig. 12.2.1 Yule-Walker Analysis Algorithm.
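The Yule-Walker procedure of Eqs. (12.2.5) and (12.2.7) can be sketched in a few lines of Python. This is only an illustrative stand-in for the text's function yw: it solves the normal equations with a generic Toeplitz solve instead of Levinson's recursion, which is developed in the next section.

```python
import numpy as np
from scipy.linalg import toeplitz

def sample_autocorr(y, p):
    """Sample autocorrelations R_hat(0), ..., R_hat(p) of Eq. (12.2.7)."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    return np.array([np.dot(y[k:], y[:N - k]) / N for k in range(p + 1)])

def yule_walker(y, p):
    """Estimate {a1,...,ap; sigma_e^2} from a data block via Eq. (12.2.5)."""
    R = sample_autocorr(y, p)
    # Lower p rows of (12.2.5): Toeplitz(R[0..p-1]) a = -[R(1),...,R(p)]
    a = np.linalg.solve(toeplitz(R[:p]), -R[1:p + 1])
    # First row of (12.2.5): sigma_e^2 = R(0) + a1 R(1) + ... + ap R(p)
    sigma2 = R[0] + np.dot(a, R[1:p + 1])
    return np.concatenate(([1.0], a)), sigma2
```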

12.3 Linear Prediction and the Levinson Recursion

In the last section, we saw that if the signal being predicted is autoregressive of order p, then the optimal linear predictor collapses to a pth order predictor. The infinite dimensional Wiener filtering problem collapses to a finite dimensional one. A geometrical way to understand this property is to say that the projection of y_n on the subspace spanned by the entire past {y_{n−i}, 1 ≤ i < ∞} is the same as the projection of y_n onto the subspace spanned only by the past p samples; namely, {y_{n−i}, 1 ≤ i ≤ p}. This is a consequence of the difference equation (12.2.3) generating y_n.

If the process y_n is not autoregressive, these two projections will be different. For any given p, the projection of y_n onto the past p samples will still provide the best linear prediction of y_n that can be made on the basis of these p samples. As p increases, more and more past information is taken into account, and we expect the prediction of y_n to become better and better in the sense of yielding a smaller mean-square prediction error.

In this section, we consider the finite-past prediction problem and discuss its efficient solution via the Levinson recursion [915–928]. For sufficiently large values of p, it may be considered to be an adequate approximation to the full prediction problem and hence also to the modeling problem.

Consider a stationary time series y_n with autocorrelation function R(k) = E[y_{n+k} y_n]. For any given p, we seek the best linear predictor of the form

$$\hat{y}_n = -\big[a_1 y_{n-1} + a_2 y_{n-2} + \cdots + a_p y_{n-p}\big] \tag{12.3.1}$$

The p prediction coefficients {a_1, a_2, ..., a_p} are chosen to minimize the mean-square prediction error

$$\mathcal{E} = E[e_n^2] \tag{12.3.2}$$

where e_n is the prediction error

$$e_n = y_n - \hat{y}_n = y_n + a_1 y_{n-1} + a_2 y_{n-2} + \cdots + a_p y_{n-p} \tag{12.3.3}$$

Differentiating Eq. (12.3.2) with respect to each coefficient a_i, i = 1, 2, ..., p, yields the orthogonality equations

$$E[e_n\, y_{n-i}] = 0\,, \quad \text{for} \quad i = 1, 2, \ldots, p \tag{12.3.4}$$

which express the fact that the optimal predictor ŷ_n is the projection onto the span of the past p samples; that is, {y_{n−i}, i = 1, 2, ..., p}. Inserting the expression (12.3.3) for e_n into Eq. (12.3.4), we obtain p linear equations for the coefficients

$$\sum_{j=0}^{p} a_j\, E[y_{n-j}\, y_{n-i}] = \sum_{j=0}^{p} R(i-j)\, a_j = 0\,, \quad \text{for} \quad i = 1, 2, \ldots, p \tag{12.3.5}$$

Using the conditions (12.3.4), we also find for the minimized value of the prediction error σ_e² = E:

$$\sigma_e^2 = \mathcal{E} = E[e_n^2] = E[e_n\, y_n] = \sum_{j=0}^{p} R(j)\, a_j \tag{12.3.6}$$

Equations (12.3.5) and (12.3.6) can be combined into the (p+1)×(p+1) matrix equation

$$\begin{bmatrix} R(0) & R(1) & R(2) & \cdots & R(p) \\ R(1) & R(0) & R(1) & \cdots & R(p-1) \\ R(2) & R(1) & R(0) & \cdots & R(p-2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ R(p) & R(p-1) & R(p-2) & \cdots & R(0) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \sigma_e^2 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{12.3.7}$$

which is identical to Eq. (12.2.5) for the autoregressive case. It is also the truncated version of the infinite matrix equation (12.1.11) for the full prediction problem.

Instead of solving the normal equations (12.3.7) directly, we would like to embed this problem into a whole class of similar problems; namely, those of determining the best linear predictors of orders p = 1, p = 2, p = 3, ..., and so on. This approach will lead to Levinson's algorithm and to the so-called lattice realizations of linear prediction filters. Pictorially this class of problems is illustrated below, where [1, a_{11}], [1, a_{21}, a_{22}], [1, a_{31}, a_{32}, a_{33}], ..., represent the best predictors of orders p = 1, 2, 3, ..., respectively. It was necessary to attach an extra index indicating the order of the predictor.

Levinson's algorithm is an iterative procedure that constructs the next predictor from the previous one. In the process, all optimal predictors of lower orders are also computed. Consider the predictors of orders p and p + 1, below

y_{n−p−1}    y_{n−p}     ···   y_{n−2}     y_{n−1}     y_n
             a_{pp}      ···   a_{p2}      a_{p1}      1
a_{p+1,p+1}  a_{p+1,p}   ···   a_{p+1,2}   a_{p+1,1}   1

$$e_p(n) = y_n + a_{p1} y_{n-1} + a_{p2} y_{n-2} + \cdots + a_{pp} y_{n-p}$$
$$e_{p+1}(n) = y_n + a_{p+1,1} y_{n-1} + a_{p+1,2} y_{n-2} + \cdots + a_{p+1,p+1} y_{n-p-1}$$

Our objective is to construct the latter in terms of the former. We will use the approach of Robinson and Treitel, based on gapped functions [925]. Suppose that the best predictor of order p, [1, a_{p1}, a_{p2}, ..., a_{pp}], has already been constructed. The corresponding gapped function is

$$g_p(k) = E[e_p(n)\, y_{n-k}] = E\Big[\Big(\sum_{i=0}^{p} a_{pi}\, y_{n-i}\Big) y_{n-k}\Big] = \sum_{i=0}^{p} a_{pi}\, R(k-i) \tag{12.3.8}$$

It has a gap of length p, that is,

$$g_p(k) = 0\,, \quad \text{for} \quad 1 \le k \le p$$

These gap conditions are the same as the orthogonality equations (12.3.4). Using g_p(k) we now construct a new gapped function g_{p+1}(k) of gap p + 1. To do this, first we reflect g_p(k) about the origin; that is, g_p(k) → g_p(−k). The reflected function has a gap of length p but at negative times. A delay of (p + 1) time units will realign this gap with the original gap. This follows because if 1 ≤ k ≤ p, then 1 ≤ p + 1 − k ≤ p. The reflected-delayed function will be g_p(p + 1 − k).

Since both g_p(k) and g_p(p + 1 − k) have exactly the same gap, it follows that so will any linear combination of them. Therefore,

$$g_{p+1}(k) = g_p(k) - \gamma_{p+1}\, g_p(p+1-k) \tag{12.3.9}$$

will have a gap of length at least p. We now select the parameter γ_{p+1} so that g_{p+1}(k) acquires an extra gap point; its gap is now of length p + 1. The extra gap condition is

$$g_{p+1}(p+1) = g_p(p+1) - \gamma_{p+1}\, g_p(0) = 0$$

which may be solved for

$$\gamma_{p+1} = \frac{g_p(p+1)}{g_p(0)}$$

Evaluating Eq. (12.3.8) at k = p + 1, and using the fact that the value of the gapped function at k = 0 is the minimized value of the mean-squared error, that is,

$$E_p = E\big[e_p(n)^2\big] = E[e_p(n)\, y_n] = g_p(0) \tag{12.3.10}$$

we finally find

$$\gamma_{p+1} = \frac{\Delta_p}{E_p} \tag{12.3.11}$$

where we set Δ_p = g_p(p + 1),

$$\Delta_p = \sum_{i=0}^{p} a_{pi}\, R(p+1-i) = \big[R(p+1),\, R(p),\, R(p-1),\, \ldots,\, R(1)\big] \begin{bmatrix} 1 \\ a_{p1} \\ a_{p2} \\ \vdots \\ a_{pp} \end{bmatrix} \tag{12.3.12}$$

The coefficients γ_{p+1} are called reflection, PARCOR, or Schur coefficients. This terminology will become clear later. Evaluating Eq. (12.3.9) at k = 0 and using g_p(p + 1) = γ_{p+1} g_p(0), we also find a recursion for the quantity E_{p+1} = g_{p+1}(0):

$$E_{p+1} = g_{p+1}(0) = g_p(0) - \gamma_{p+1}\, g_p(p+1) = g_p(0) - \gamma_{p+1} \cdot \gamma_{p+1}\, g_p(0)\,, \quad \text{or,}$$

$$E_{p+1} = (1 - \gamma_{p+1}^2)\, E_p \tag{12.3.13}$$

This represents the minimum value of the mean-square prediction error E[e_{p+1}(n)²] for the predictor of order p + 1. Since both E_p and E_{p+1} are nonnegative, it follows that the factor (1 − γ_{p+1}²) will be nonnegative and less than one. It represents the improvement in the prediction obtained by using a predictor of order p + 1 instead of a predictor of order p. It also follows that γ_{p+1} has magnitude less than one, |γ_{p+1}| ≤ 1.

To find the new prediction coefficients, a_{p+1,i}, we use the fact that the gapped functions are equal to the convolution of the corresponding prediction-error filters with the autocorrelation function of y_n:

$$g_p(k) = \sum_{i=0}^{p} a_{pi}\, R(k-i) \quad \Rightarrow \quad G_p(z) = A_p(z)\, S_{yy}(z)$$

$$g_{p+1}(k) = \sum_{i=0}^{p+1} a_{p+1,i}\, R(k-i) \quad \Rightarrow \quad G_{p+1}(z) = A_{p+1}(z)\, S_{yy}(z)$$

where S_yy(z) represents the power spectral density of y_n. Taking z-transforms of both sides of Eq. (12.3.9), we find

$$G_{p+1}(z) = G_p(z) - \gamma_{p+1}\, z^{-(p+1)}\, G_p(z^{-1})\,, \quad \text{or,}$$

$$A_{p+1}(z)\, S_{yy}(z) = A_p(z)\, S_{yy}(z) - \gamma_{p+1}\, z^{-(p+1)}\, A_p(z^{-1})\, S_{yy}(z^{-1})$$

where we used the fact that the reflected gapped function g_p(−k) has z-transform G_p(z^{−1}), and therefore the delayed (by p + 1) as well as reflected gapped function g_p(p + 1 − k) has z-transform z^{−(p+1)} G_p(z^{−1}). Since S_yy(z) = S_yy(z^{−1}) because of the symmetry relations R(k) = R(−k), it follows that S_yy(z) is a common factor in all the terms. Therefore, we obtain a relationship between the new best prediction-error filter A_{p+1}(z) and the old one A_p(z):

$$A_{p+1}(z) = A_p(z) - \gamma_{p+1}\, z^{-(p+1)}\, A_p(z^{-1}) \quad \text{(Levinson recursion)} \tag{12.3.14}$$

Taking inverse z-transforms, we find

$$\begin{bmatrix} 1 \\ a_{p+1,1} \\ a_{p+1,2} \\ \vdots \\ a_{p+1,p} \\ a_{p+1,p+1} \end{bmatrix} = \begin{bmatrix} 1 \\ a_{p1} \\ a_{p2} \\ \vdots \\ a_{pp} \\ 0 \end{bmatrix} - \gamma_{p+1} \begin{bmatrix} 0 \\ a_{pp} \\ a_{p,p-1} \\ \vdots \\ a_{p1} \\ 1 \end{bmatrix} \tag{12.3.15}$$

which can also be written as

$$a_{p+1,i} = a_{pi} - \gamma_{p+1}\, a_{p,p+1-i}\,, \quad \text{for} \quad 1 \le i \le p$$
$$a_{p+1,p+1} = -\gamma_{p+1}$$

Introducing the reverse polynomial A_p^R(z) = z^{−p} A_p(z^{−1}), we may write Eq. (12.3.14) as

$$A_{p+1}(z) = A_p(z) - \gamma_{p+1}\, z^{-1} A_p^R(z) \tag{12.3.16}$$

Taking the reverse of both sides, we find

$$A_{p+1}(z^{-1}) = A_p(z^{-1}) - \gamma_{p+1}\, z^{p+1} A_p(z)$$

$$A_{p+1}^R(z) = z^{-(p+1)} A_{p+1}(z^{-1}) = z^{-(p+1)} A_p(z^{-1}) - \gamma_{p+1}\, A_p(z)\,, \quad \text{or,}$$

$$A_{p+1}^R(z) = z^{-1} A_p^R(z) - \gamma_{p+1}\, A_p(z) \tag{12.3.17}$$

Equation (12.3.17) is, in a sense, redundant, but it will prove convenient to think of the Levinson recursion as a recursion on both the forward, A_p(z), and the reverse, A_p^R(z), polynomials. Equations (12.3.16) and (12.3.17) may be combined into a 2×2 matrix recursion equation, referred to as the forward Levinson recursion:

$$\begin{bmatrix} A_{p+1}(z) \\ A_{p+1}^R(z) \end{bmatrix} = \begin{bmatrix} 1 & -\gamma_{p+1} z^{-1} \\ -\gamma_{p+1} & z^{-1} \end{bmatrix} \begin{bmatrix} A_p(z) \\ A_p^R(z) \end{bmatrix} \quad \text{(forward recursion)} \tag{12.3.18}$$

The recursion is initialized at p = 0 by setting

$$A_0(z) = A_0^R(z) = 1 \quad \text{and} \quad E_0 = R(0) = E[y_n^2] \tag{12.3.19}$$

which corresponds to no prediction at all. We summarize the computational steps of the Levinson algorithm:

1. Initialize at p = 0 using Eq. (12.3.19).
2. At stage p, the filter A_p(z) and error E_p are available.
3. Using Eq. (12.3.11), compute γ_{p+1}.
4. Using Eq. (12.3.14) or Eq. (12.3.18), determine the new polynomial A_{p+1}(z).
5. Using Eq. (12.3.13), update the mean-square prediction error to E_{p+1}.
6. Go to stage p + 1.
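The six steps above translate almost directly into code. The following Python function is an illustrative sketch of the Levinson iteration (a rough analogue of the text's function lev, not a transcription of it); it returns all lower-order prediction-error filters, the reflection coefficients, and the errors E_0, ..., E_M.

```python
import numpy as np

def levinson(R, M):
    """Levinson recursion for the normal equations (12.3.7).
    R: autocorrelation lags [R(0), R(1), ..., R(M)].
    Returns: list of prediction-error filters A_p, reflection coeffs gamma_p, errors E_p."""
    R = np.asarray(R, dtype=float)
    A = [np.array([1.0])]        # A_0(z) = 1, Eq. (12.3.19)
    E = [R[0]]                   # E_0 = R(0)
    gamma = []
    for p in range(M):
        a = A[p]
        # Delta_p = sum_i a_pi R(p+1-i), Eq. (12.3.12)
        Delta = sum(a[i] * R[p + 1 - i] for i in range(p + 1))
        g = Delta / E[p]         # Eq. (12.3.11)
        gamma.append(g)
        # Eq. (12.3.15): a_{p+1} = [a_p, 0] - gamma_{p+1} * [0, reversed(a_p)]
        a_new = np.concatenate((a, [0.0])) - g * np.concatenate(([0.0], a[::-1]))
        A.append(a_new)
        E.append((1.0 - g ** 2) * E[p])   # Eq. (12.3.13)
    return A, np.array(gamma), np.array(E)
```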

The iteration may be continued until the final desired order is reached. The dependence on the autocorrelation R(k) of the signal y_n is entered through Eq. (12.3.11) and E_0 = R(0). To reach stage p, only the p + 1 autocorrelation lags {R(0), R(1), ..., R(p)} are required. At the pth stage, the iteration already has provided all the prediction filters of lower order, and all the previous reflection coefficients. Thus, an alternative parametrization of the pth order predictor is in terms of the sequence of reflection coefficients {γ_1, γ_2, ..., γ_p} and the prediction error E_p:

$$\{E_p,\, a_{p1},\, a_{p2},\, \ldots,\, a_{pp}\} \;\Longleftrightarrow\; \{E_p,\, \gamma_1,\, \gamma_2,\, \ldots,\, \gamma_p\}$$

One may pass from one parameter set to another. And both sets are equivalent to the autocorrelation set {R(0), R(1), ..., R(p)}. The alternative parametrization of the autocorrelation function R(k) of a stationary random sequence in terms of the equivalent set of reflection coefficients is a general result [929,930], and has also been extended to the multichannel case [931].

If the process y_n is autoregressive of order p, then as soon as the Levinson recursion reaches this order, it will provide the autoregressive coefficients {a_1, a_2, ..., a_p}, which are also the best prediction coefficients for the full (i.e., based on the infinite past) prediction problem. Further continuation of the Levinson recursion will produce nothing new; all prediction coefficients (and all reflection coefficients) of order higher than p will be zero, so that A_q(z) = A_p(z) for all q > p.

The four functions lev, frwlev, bkwlev, and rlev allow the passage from one parameter set to another. The function lev is an implementation of the computational sequence outlined above. The input to the function is the final desired order of the predictor, say M, and the vector of autocorrelation lags {R(0), R(1), ..., R(M)}. Its output is the lower-triangular matrix L whose rows are the reverse of all the lower order prediction-error filters. For example, for M = 4 the matrix L would be

$$L = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ a_{11} & 1 & 0 & 0 & 0 \\ a_{22} & a_{21} & 1 & 0 & 0 \\ a_{33} & a_{32} & a_{31} & 1 & 0 \\ a_{44} & a_{43} & a_{42} & a_{41} & 1 \end{bmatrix} \tag{12.3.20}$$

The first column of L contains the negatives of all the reflection coefficients. This follows from the Levinson recursion (12.3.14), which implies that the negative of the highest coefficient of the pth prediction-error filter is the pth reflection coefficient; namely,

$$\gamma_p = -a_{pp}\,, \quad p = 1, 2, \ldots, M \tag{12.3.21}$$

This choice for L is justified below and in Sec. 12.9. The function lev also produces the vector of mean-square prediction errors {E_0, E_1, ..., E_M} according to the recursion (12.3.13).

The function frwlev is an implementation of the forward Levinson recursion (12.3.18) or (12.3.15). Its input is the set of reflection coefficients {γ_1, γ_2, ..., γ_M} and its output is the set of all prediction-error filters up to order M, that is, A_p(z), p = 1, 2, ..., M. Again, this output is arranged into the matrix L.

The function bkwlev is the inverse operation to frwlev. Its input is the vector of prediction-error filter coefficients [1, a_{M1}, a_{M2}, ..., a_{MM}] of the final order M, and its output is the matrix L containing all the lower order prediction-error filters. The set of reflection coefficients are extracted from the first column of L. This function is based on the inverse of the matrix equation (12.3.18). Shifting p down by one unit, we write Eq. (12.3.18) as

$$\begin{bmatrix} A_p(z) \\ A_p^R(z) \end{bmatrix} = \begin{bmatrix} 1 & -\gamma_p z^{-1} \\ -\gamma_p & z^{-1} \end{bmatrix} \begin{bmatrix} A_{p-1}(z) \\ A_{p-1}^R(z) \end{bmatrix} \tag{12.3.22}$$

Its inverse is

$$\begin{bmatrix} A_{p-1}(z) \\ A_{p-1}^R(z) \end{bmatrix} = \frac{1}{1 - \gamma_p^2} \begin{bmatrix} 1 & \gamma_p \\ \gamma_p z & z \end{bmatrix} \begin{bmatrix} A_p(z) \\ A_p^R(z) \end{bmatrix} \quad \text{(backward recursion)} \tag{12.3.23}$$

At each stage p, start with A_p(z) and extract γ_p = −a_{pp} from the highest coefficient of A_p(z). Then, use Eq. (12.3.23) to obtain the polynomial A_{p−1}(z). The iteration begins at the given order M and proceeds downwards to p = M − 1, M − 2, ..., 1, 0.

The function rlev generates the set of autocorrelation lags {R(0), R(1), ..., R(M)} from the knowledge of the final prediction-error filter A_M(z) and final prediction error E_M. It calls bkwlev to generate all the lower order prediction-error filters, and then it reconstructs the autocorrelation lags using the gapped function condition g_p(p) = Σ_{i=0}^p a_{pi} R(p−i) = 0, which may be solved for R(p) in terms of R(p−i), i = 1, 2, ..., p, as follows:

$$R(p) = -\sum_{i=1}^{p} a_{pi}\, R(p-i)\,, \quad p = 1, 2, \ldots, M \tag{12.3.24}$$

For example, the first few iterations of Eq. (12.3.24) will be:

$$R(1) = -a_{11} R(0)$$
$$R(2) = -\big[a_{21} R(1) + a_{22} R(0)\big]$$
$$R(3) = -\big[a_{31} R(2) + a_{32} R(1) + a_{33} R(0)\big]$$

To get this recursion started, the value of R(0) may be obtained from Eq. (12.3.13). Using Eq. (12.3.13) repeatedly, and E_0 = R(0), we find

$$E_M = (1 - \gamma_1^2)(1 - \gamma_2^2) \cdots (1 - \gamma_M^2)\, R(0) \tag{12.3.25}$$

Since the reflection coefficients are already known (from the call to bkwlev) and E_M is given, this equation provides the right value for R(0).

The function schur, based on the Schur algorithm and discussed in Sec. 12.10, is an alternative to lev. The logical interconnection of these functions is shown below.
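As an illustration of the backward recursion (12.3.22)–(12.3.23), the following Python sketch (an illustrative analogue of the text's bkwlev, not its actual code) steps a given prediction-error filter down in order, extracting the reflection coefficients γ_p = −a_pp along the way.

```python
import numpy as np

def backward_levinson(aM):
    """Backward Levinson recursion, Eq. (12.3.23).
    aM: coefficients [1, a_M1, ..., a_MM] of the final prediction-error filter.
    Returns: list of all lower-order filters A_0, ..., A_M and the reflection coefficients."""
    a = np.asarray(aM, dtype=float)
    M = len(a) - 1
    filters = [a]
    gamma = np.zeros(M)
    for p in range(M, 0, -1):
        g = -a[-1]                       # gamma_p = -a_pp, Eq. (12.3.21)
        gamma[p - 1] = g
        aR = a[::-1]                     # reverse polynomial A_p^R
        # A_{p-1}(z) = [A_p(z) + gamma_p A_p^R(z)] / (1 - gamma_p^2);
        # the last coefficient becomes zero and is dropped
        a = ((a + g * aR) / (1.0 - g ** 2))[:-1]
        filters.append(a)
    return filters[::-1], gamma
```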

Example 12.3.1: Given the autocorrelation lags

$$\{R(0), R(1), R(2), R(3), R(4)\} = \{128,\; -64,\; 80,\; -88,\; 89\}$$

Find all the prediction-error filters A_p(z) up to order four, the four reflection coefficients, and the corresponding mean-square prediction errors. Below, we simply state the results obtained using the function lev:

$$A_1(z) = 1 + 0.5 z^{-1}$$
$$A_2(z) = 1 + 0.25 z^{-1} - 0.5 z^{-2}$$
$$A_3(z) = 1 - 0.375 z^{-2} + 0.5 z^{-3}$$
$$A_4(z) = 1 - 0.25 z^{-1} - 0.1875 z^{-2} + 0.5 z^{-3} - 0.5 z^{-4}$$

The reflection coefficients are the negatives of the highest coefficients; thus,

$$\{\gamma_1, \gamma_2, \gamma_3, \gamma_4\} = \{-0.5,\; 0.5,\; -0.5,\; 0.5\}$$

The vector of mean-squared prediction errors is given by

$$\{E_0, E_1, E_2, E_3, E_4\} = \{128,\; 96,\; 72,\; 54,\; 40.5\}$$

Sending the above vector of reflection coefficients through the function frwlev would generate the above set of polynomials. Sending the coefficients of A_4(z) through bkwlev would generate the same set of polynomials. Sending the coefficients of A_4(z) and E_4 = 40.5 through rlev would recover the original autocorrelation lags R(k), k = 0, 1, 2, 3, 4.

The Yule-Walker method (see Sec. 12.2) can be used to extract the linear prediction parameters from a given set of signal samples. From a given length-N block of data

$$y_0, y_1, y_2, \ldots, y_{N-1}$$

compute the sample autocorrelations {R̂(0), R̂(1), ..., R̂(M)} using, for example, Eq. (12.2.7), and send them through the Levinson recursion. The function yw implements the Yule-Walker method. The input to the function is the data vector of samples {y_0, y_1, ..., y_{N−1}} and the desired final order M of the predictor. Its output is the set of all prediction-error filters up to order M, arranged in the matrix L, and the vector of mean-squared prediction errors up to order M, that is, {E_0, E_1, ..., E_M}.

Example 12.3.2: Given the signal samples

$$\{y_0, y_1, y_2, y_3, y_4\} = \{1, 1, 1, 1, 1\}$$

determine all the prediction-error filters up to order four. Using the fourth order predictor, predict the sixth value in the above sequence, i.e., the value of y_5.

The sample autocorrelation of the above signal is easily computed using the methods of Chapter 1. We find (ignoring the 1/N normalization factor):

$$\{\hat{R}(0), \hat{R}(1), \hat{R}(2), \hat{R}(3), \hat{R}(4)\} = \{5, 4, 3, 2, 1\}$$

Sending these lags through the function lev we find the prediction-error filters:

$$A_1(z) = 1 - 0.8 z^{-1}$$
$$A_2(z) = 1 - 0.889 z^{-1} + 0.111 z^{-2}$$
$$A_3(z) = 1 - 0.875 z^{-1} + 0.125 z^{-3}$$
$$A_4(z) = 1 - 0.857 z^{-1} + 0.143 z^{-4}$$

Therefore, the fourth order prediction of y_n given by Eq. (12.3.1) is

$$\hat{y}_n = 0.857\, y_{n-1} - 0.143\, y_{n-4}$$

which gives ŷ_5 = 0.857 − 0.143 = 0.714.
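Assuming the illustrative levinson() sketch given after the algorithm summary is available, the numbers quoted in Example 12.3.1 can be reproduced as follows.

```python
# Reproduce Example 12.3.1 with the illustrative levinson() sketch from above
R = [128, -64, 80, -88, 89]
A, gamma, E = levinson(R, M=4)

print(A[4])    # expected: [ 1.  -0.25  -0.1875  0.5  -0.5 ]
print(gamma)   # expected: [-0.5  0.5  -0.5  0.5]
print(E)       # expected: [128.  96.  72.  54.  40.5]
```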

The results of this section can also be derived from those of Sec. 1.8 by invoking stationarity and making the proper identification of the various quantities. The data vector y and the subvectors ȳ and ỹ are identified with y = y_{p+1}(n), ȳ = y_p(n), and ỹ = y_p(n−1), where

$$\mathbf{y}_{p+1}(n) = \begin{bmatrix} y_n \\ y_{n-1} \\ \vdots \\ y_{n-p} \\ y_{n-p-1} \end{bmatrix}, \quad \mathbf{y}_p(n) = \begin{bmatrix} y_n \\ y_{n-1} \\ \vdots \\ y_{n-p} \end{bmatrix}, \quad \mathbf{y}_p(n-1) = \begin{bmatrix} y_{n-1} \\ y_{n-2} \\ \vdots \\ y_{n-p-1} \end{bmatrix} \tag{12.3.26}$$

It follows from stationarity that the autocorrelation matrices of these vectors are independent of the absolute time instant n; therefore, we write

$$R_p = E\big[\mathbf{y}_p(n)\, \mathbf{y}_p(n)^T\big] = E\big[\mathbf{y}_p(n-1)\, \mathbf{y}_p(n-1)^T\big]\,, \quad R_{p+1} = E\big[\mathbf{y}_{p+1}(n)\, \mathbf{y}_{p+1}(n)^T\big]$$

It is easily verified that R_p is the order-p autocorrelation matrix defined in Eq. (12.3.7) and that the order-(p+1) autocorrelation matrix R_{p+1} admits the block decompositions

$$R_{p+1} = \begin{bmatrix} R(0) & \mathbf{r}_a^T \\ \mathbf{r}_a & R_p \end{bmatrix} = \begin{bmatrix} R_p & \mathbf{r}_b \\ \mathbf{r}_b^T & R(0) \end{bmatrix}$$

It follows, in the notation of Sec. 1.8, that R̄ = R̃ = R_p and ρ_a = ρ_b = R(0), and

$$\mathbf{r}_a = \begin{bmatrix} R(1) \\ \vdots \\ R(p+1) \end{bmatrix}, \quad \mathbf{r}_b = \begin{bmatrix} R(p+1) \\ \vdots \\ R(1) \end{bmatrix}$$

Thus, r_a and r_b are the reverse of each other. It follows that the backward predictors are the reverse of the forward ones. Therefore, Eq. (12.3.14) is the same as Eq. (1.8.40), with the identifications

$$\mathbf{a} = \mathbf{a}_{p+1}\,, \quad \mathbf{b} = \mathbf{b}_{p+1}\,, \quad \bar{\mathbf{a}} = \tilde{\mathbf{a}} = \mathbf{a}_p\,, \quad \bar{\mathbf{b}} = \tilde{\mathbf{b}} = \mathbf{b}_p$$

where for the unknowns {E3,a31,a32,a33}. The corresponding prediction-error filter is ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 a + + −1 −2 −3 p 1,p 1 A3(z)= 1 + a31z + a32z + a33z ⎢ ⎥ ⎢ ⎥ 1 app ⎢ a + ⎥ ⎢ a + ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ p 1,1 ⎥ ⎢ p 1,p ⎥ ⎢ a ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎢ p1 ⎥ ⎢ . ⎥ a + = ⎢ . ⎥ , b + = ⎢ . ⎥ a = ⎢ ⎥ , b = ⎢ . ⎥ and the minimum value of the prediction error is E3. The solution is obtained in an p 1 ⎢ . ⎥ p 1 ⎢ . ⎥ p ⎢ . ⎥ p ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ . ⎦ ⎣ ap1 ⎦ iterative manner, by solving a family of similar matrix equations of lower dimensionality. ⎣ ap+1,p ⎦ ⎣ ap+1,1 ⎦ app 1 Starting at the upper left corner, ap+1,p+1 1

= R = R ¯ = ˜ = = = Symbolically, bp ap , bp+1 ap+1. We have Ea Eb Ep and γa γb γp+1. Thus, Eq. (12.3.15) may be written as a 0 a 0 = p − = p − ap+1 γp+1 γp+1 R (12.3.27) 0 bp 0 ap

The normal Eqs. (12.3.7) can be written for orders p and p + 1 in the compact form of Eqs. (1.8.38) and (1.8.12) the R matrices are successively enlarged until the desired dimension is reached (4×4in this example). Therefore, one successively solves the matrix equations 1 up R a = E u ,R+ a + = E + u + , u = , u + = (12.3.28) ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ p p p p p 1 p 1 p 1 p 1 p 0 p 1 0 R0 R1 R2 1 E2 R0 R1 1 E1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ [R ][1]= [E ], = , ⎣ R1 R0 R1 ⎦ ⎣ a21 ⎦ = ⎣ 0 ⎦ T 0 0 Recognizing that Eq. (12.3.12) can be written as Δp = a rb, it follows that the re- R1 R0 a11 0 p R2 R1 R0 a22 0 flection coefficient equation (12.3.11) is the same as (1.8.42). The rows of the matrix L defined by Eq. (12.3.20) are the reverse of the forward predictors; that is, the backward The solution of each problem is obtained in terms of the solution of the previous predictors of successive orders. Thus, L is the same as that defined in Eq. (1.8.13). The one. In this manner, the final solution is gradually built up. In the process, one also rows of the matrix U defined in Eq. (1.8.30) are the forward predictors, with the first finds all the lower order prediction-error filters. row being the predictor of highest order. For example, The iteration is based on two key properties of the autocorrelation matrix: first, ⎡ ⎤ the autocorrelation matrix of a given size contains as subblocks all the lower order 1 a a a a ⎢ 41 42 43 44 ⎥ autocorrelation matrices; and second, the autocorrelation matrix is reflection invariant. ⎢ ⎥ ⎢ 01a31 a32 a33 ⎥ ⎢ ⎥ That is, it remains invariant under interchange of its columns and then its rows. This U = ⎢ 00 1a21 a22 ⎥ ⎢ ⎥ interchanging operation is equivalent to the similarity transformation by the “reversing” ⎣ ⎦ 0001a11 matrix J having 1’s along its anti-diagonal, e.g., 00001 ⎡ ⎤ 0001 ⎢ ⎥ Comparing L with U, we note that one is obtained from the other by reversing its ⎢ 0010⎥ = J = ⎢ ⎥ (12.4.1) rows and then its columns; formally, U JLJ, where J is the corresponding reversing ⎣ 0100⎦ matrix. 1000

The invariance property means that the autocorrelation matrix commutes with the 12.4 Levinson’s Algorithm in Matrix Form matrix J − JRJ 1 = R (12.4.2) In this section, we illustrate the mechanics of the Levinson recursion—cast in matrix form—by explicitly carrying out a few of the recursions given in Eq. (12.3.15). The This property immediately implies that if the matrix equation is satisfied: objective of such recursions is to solve normal equations of the type ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ R R R R a b ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎢ 0 1 2 3 ⎥ ⎢ 0 ⎥ ⎢ 0 ⎥ ⎢ R R R R ⎥ ⎢ a ⎥ ⎢ b ⎥ R0 R1 R2 R3 1 E3 ⎢ 1 0 1 2 ⎥ ⎢ 1 ⎥ = ⎢ 1 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎢ R R R R ⎥ ⎢ a ⎥ ⎢ 0 ⎥ R2 R1 R0 R1 a2 b2 ⎢ 1 0 1 2 ⎥ ⎢ 31 ⎥ = ⎢ ⎥ ⎣ R2 R1 R0 R1 ⎦ ⎣ a32 ⎦ ⎣ 0 ⎦ R3 R2 R1 R0 a3 b3 R3 R2 R1 R0 a33 0 526 12. Linear Prediction 12.4. Levinson’s Algorithm in Matrix Form 527 then the following equation is also satisfied: Step 2 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ R R R R a b ⎢ 0 1 2 3 ⎥ ⎢ 3 ⎥ ⎢ 3 ⎥ We wish to solve ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎢ R R R R ⎥ ⎢ a ⎥ ⎢ b ⎥ R R R 1 E ⎢ 1 0 1 2 ⎥ ⎢ 2 ⎥ = ⎢ 2 ⎥ 0 1 2 2 ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ R2 R1 R0 R1 a1 b1 ⎣ R1 R0 R1 ⎦ ⎣ a21 ⎦ = ⎣ 0 ⎦ (12.4.4) R3 R2 R1 R0 a0 b0 R2 R1 R0 a22 0 The steps of the Levinson algorithm are explicitly as follows: Try an expression of the form: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 0 Step 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ a21 ⎦ = ⎣ a11 ⎦ − γ2 ⎣ a11 ⎦ , with γ2 to be determined · = Solve R0 1 E0. This defines E0. Then enlarge to the next size by padding a zero, that a22 0 1 is, R0 R1 1 E0 Acting on both sides by the 3×3 autocorrelation matrix and using Step 1, we find = , this defines Δ0. Then, also R1 R0 0 Δ0 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ E E Δ ⎢ 2 ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥ R0 R1 0 Δ0 ⎣ 0 ⎦ = ⎣ 0 ⎦ − γ ⎣ 0 ⎦ , or, = , by reversal invariance 2 R1 R0 1 E0 0 Δ1 E1 These are the preliminaries to Step 1. E2 = E1 − γ2Δ1 , 0 = Δ1 − γ2E1 , or Δ 1 Step 1 = 1 = − 2 = γ2 ,E2 (1 γ2)E1 , where Δ1 R2,R1 E1 a11 We wish to solve R0 R1 1 E1 These define γ2 and E2. As a preliminary to Step 3, enlarge Eq. (12.4.4) to the next = (12.4.3) R1 R0 a11 0 size by padding a zero Try an expression of the form ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ R0 R1 R2 R3 1 E2 1 1 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ = − ⎢ R1 R0 R1 R2 ⎥ ⎢ a21 ⎥ ⎢ 0 ⎥ γ1 ⎢ ⎥ ⎢ ⎥ = ⎢ ⎥ , this defines Δ2. Then, also a11 0 1 ⎣ R2 R1 R0 R1 ⎦ ⎣ a22 ⎦ ⎣ 0 ⎦ R3 R2 R1 R0 0 Δ2 R0 R1 Acting on both sides by and using the results of Step 0, we obtain ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ R1 R0 R R R R 0 Δ ⎢ 0 1 2 3 ⎥ ⎢ ⎥ ⎢ 2 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ R R 1 R R 1 R R 0 R1 R0 R1 R2 a22 0 0 1 = 0 1 − 0 1 ⎢ ⎥ ⎢ ⎥ = ⎢ ⎥ , by reversal invariance γ1 , or, ⎣ R2 R1 R0 R1 ⎦ ⎣ a21 ⎦ ⎣ 0 ⎦ R1 R0 a11 R1 R0 0 R1 R0 1 R3 R2 R1 R0 1 E2 E1 E0 Δ0 = − γ1 , or, 0 Δ0 E0 Step 3

E1 = E0 − γ1Δ0 , 0 = Δ0 − γ1E0 , or We wish to solve ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ Δ0 = = − = − 2 = R0 R1 R2 R3 1 E3 γ1 ,E1 E0 γ1Δ0 (1 γ1)E0 , where Δ0 R1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ E0 ⎢ R R R R ⎥ ⎢ a ⎥ ⎢ 0 ⎥ ⎢ 1 0 1 2 ⎥ ⎢ 31 ⎥ = ⎢ ⎥ These define γ and E . As a preliminary to Step 2, enlarge Eq. (12.4.3) to the next (12.4.5) 1 1 ⎣ R2 R1 R0 R1 ⎦ ⎣ a32 ⎦ ⎣ 0 ⎦ size by padding a zero ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ R3 R2 R1 R0 a33 0 R R R 1 E ⎢ 0 1 2 ⎥ ⎢ ⎥ ⎢ 1 ⎥ Try an expression of the form: ⎣ R R R ⎦ ⎣ a ⎦ = ⎣ 0 ⎦ , this defines Δ . Then, also 1 0 1 11 1 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ R2 R1 R0 0 Δ1 1 1 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ a31 ⎢ a21 a22 R0 R1 R2 0 Δ1 ⎢ ⎥ = ⎥ − γ3 ⎢ ⎥ , with γ3 to be determined ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ a32 ⎦ ⎣ a22 ⎦ ⎣ a21 ⎦ ⎣ R1 R0 R1 ⎦ ⎣ a11 ⎦ = ⎣ 0 ⎦ , by reversal invariance a33 0 1 R2 R1 R0 1 E1 528 12. Linear Prediction 12.5. Autocorrelation Sequence Extensions 529

Acting on both sides by the 4×4 autocorrelation matrix and using Step 2, we obtain It follows from the Levinson recursion that all prediction-error filters of order greater

⎡ ⎤ ⎡ ⎤ ⎡ ⎤ than p will remain equal to the pth filter, Ap(z)= Ap+1(z)= Ap+2(z)=···. Therefore, E3 E2 Δ2 = ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ the corresponding whitening filter will be A(z) Ap(z), that is, an ⎢ 0 ⎥ ⎢ 0 ⎥ ⎢ 0 ⎥ ⎢ ⎥ = ⎢ ⎥ − γ ⎢ ⎥ , or, of order p. With the exception of the above autoregressive extension that leads to an ⎣ 0 ⎦ ⎣ 0 ⎦ 3 ⎣ 0 ⎦ all-pole signal model, the extendibility conditions |γp+i| < 1, i ≥ 1, do not necessarily 0 Δ2 E2 guarantee that the resulting signal model will be a rational (pole-zero) model.

E3 = E2 − γ3Δ2 , 0 = Δ2 − γ3E2 , or Example 12.5.1: Consider the three numbers {R(0), R(1), R(2)}={8, 4, −1}. The Levinson ⎡ ⎤ recursion gives {γ1,γ2}={0.5, −0.5} and {E1,E2}={6, 4.5}. Thus, the above numbers 1 Δ2 2 ⎢ ⎥ qualify to be autocorrelation lags. The corresponding prediction-error filters are γ = ,E= (1 − γ )E , where Δ = R ,R ,R ⎣ a21 ⎦ 3 E 3 3 2 2 3 2 1 ⎡ ⎤ ⎡ ⎤ 2 a 1 1 22 1 1 ⎢ ⎥ ⎢ ⎥ a1 = = , a2 = ⎣ a21 ⎦ = ⎣ −0.75 ⎦ Clearly, the procedure can be continued to higher and higher orders, as required a11 −0.5 a22 0.5 in each problem. Note that at each step, we used the order-updating Eqs. (1.8.40) in conjunction with Eq. (1.8.47). The next lag in this sequence can be chosen according to Eq. (12.5.1) R(3)= γ3E2 − a21R(2)+a22R(1) = 4.5γ3 − 2.75 12.5 Autocorrelation Sequence Extensions where γ3 is any number in the interval −1 <γ3 < 1 . The resulting possible values of In this section, we discuss the problem of extending an autocorrelation function and R(3) are plotted below versus γ3 . In particular, the autoregressive extension corresponds = =−  the related issues of singular autocorrelation matrices. The equivalence between an to γ3 0, which gives R(3) 2.75. autocorrelation function and the set of reflection coefficients provides a convenient and systematic way to (a) test whether a given finite set of numbers are the autocorrelation lags of a stationary signal and (b) extend a given finite set of autocorrelation lags to arbitrary lengths while preserving the autocorrelation property. For a finite set of numbers {R(0), R(1),...,R(p)} to be the lags of an autocorre- lation function, it is necessary and sufficient that all reflection coefficients, extracted from this set via the Levinson recursion, have magnitude less than one; that is, |γi| < 1, for i = 1, 2,...,p, and also that R(0)> 0. These conditions are equivalent to the pos- itive definiteness of the autocorrelation matrix Rp. The proof follows from the fact The end-points, γp+1 =±1, of the allowed interval (−1, 1) correspond to the two that the positivity of Rp is equivalent to the conditions on the prediction errors Ei > 0, possible extreme values of R(p + 1): for i = 1, 2,...,p. In turn, these conditions are equivalent to E = R(0)> 0 and and, 0 + =± − + − +···+ through Eq. (12.3.13), to the reflection coefficients having magnitude less than one. R(p 1) Ep ap1R(p) ap2R(p 1) appR(1) { } The problem of extending a finite set R(0), R(1),...,R(p) of autocorrelation lags = − 2 = In this case, the corresponding prediction error vanishes Ep+1 (1 γp+1)Ep 0. is to find a number R(p+1) such that the extended set {R(0), R(1),...,R(p), R(p+1)} This makes the resulting order-(p + 1) autocorrelation matrix Rp+1 singular. The pre- is still an autocorrelation sequence. This can be done by parametrizing R(p + 1) in diction filter becomes either the symmetric (if γp+1 =−1) or antisymmetric (if γp+1 = 1) terms of the next reflection coefficient γ + . Solving Eq. (12.3.12) for R(p + 1) and p 1 combination using Eq. (12.3.11), we obtain a 0 = p + = + −1 R + = − + − +···+ ap+1 R ,Ap+1(z) Ap(z) z Ap (z) , or, R(p 1) γp+1Ep ap1R(p) ap2R(p 1) appR(1) (12.5.1) 0 ap − a 0 Any number γp+1 in the range 1 <γp+1 < 1 will give rise to an acceptable value for = p − = − −1 R ap+1 R ,Ap+1(z) Ap(z) z Ap (z) R(p+1) . The choice γp+1 = 0 is special and corresponds to the so-called autoregressive 0 ap or maximum entropy extension of the autocorrelation function (see Problem 12.16). 
If this choice is repeated to infinity, we will obtain the set of reflection coefficients In either case, it can be shown that the zeros of the polynomial Ap+1(z) lie on the unit circle, and that the prediction filter ap+1 becomes an eigenvector of Rp+1 with + + = {γ1,γ2,...,γp, 0, 0,...} zero eigenvalue; namely, Rp 1ap 1 0. This follows from the normal Eqs. (12.3.28) Rp+1ap+1 = Ep+1up+1 and Ep+1 = 0. 530 12. Linear Prediction 12.5. Autocorrelation Sequence Extensions 531

± Example 12.5.2: Consider the extended autocorrelation sequence of Example 12.5.1 defined unit circle coinciding with the sinusoids present in R, namely, z = e jω1 . The other two

by the singular choice γ3 =−1. Then, R(3)=−4.5 − 2.75 =−7.25. The corresponding eigenvectors of R are order-3 prediction-error filter is computed using the order-2 predictor and the Levinson ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 0 0 recursion ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ √ ⎥ = ⎣ ⎦ = ⎣ ⎦ = ⎣ ⎦ = ⎣ ⎦ 1 1 0 1 c cos ω1 0.5 , d sin ω1 √3/2 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ a ⎥ ⎢ −0.75 ⎥ ⎢ 0.5 ⎥ ⎢ −0.25 ⎥ cos 2ω1 −0.5 sin 2ω1 3/2 = ⎢ 31 ⎥ = ⎢ ⎥ − ⎢ ⎥ = ⎢ ⎥ a3 ⎣ ⎦ ⎣ ⎦ γ3 ⎣ − ⎦ ⎣ − ⎦ a32 0.5 0.75 0.25 √ a33 0 1 1 both belonging to eigenvalue λ = 3. Their norm is c = d = 3/2. The three eigen-

vectors a2, c, d are mutually orthogonal. It is easily verified that the matrix R may be rep- − −1 − It is symmetric about its middle. Its zeros, computed as the solutions of (1 0.25z resented in the form R = 2ccT + 2ddT, which, after normalizing c and d to unit norm, is −2 + −3 = + −1 − −1 + −2 = 0.25z z ) (1 z )(1 1.25z z ) 0 are recognized as the eigendecomposition of R, We can also express R in terms of its complex √ = † + ∗ T 5 ± j 39 sinusoidal components in the form R ss s s , where z =−1 ,z= ⎡ ⎤ 8 1 ⎢ ⎥ † ∗ − − s = c + jd = ⎣ ejω1 ⎦ , s = s T = 1,e jω1 ,e 2jω1 and lie on the unit circle. Finally, we verify that a3 is an eigenvector of R3 with zero e2jω1 eigenvalue: ⎡ ⎤ ⎡ ⎤ 84−1 −7.25 1 Example 12.5.4: Similarly, one can verify that the four autocorrelation lags {8, 4, −1, −7.25} ⎢ ⎥ ⎢ ⎥ ⎢ 484−1 ⎥ ⎢ −0.25 ⎥ of the singular matrix of Example 12.5.2 can be represented in the sinusoidal form R a = ⎢ ⎥ ⎢ ⎥ = 0  3 3 ⎣ −148 4⎦ ⎣ −0.25 ⎦ = jω1k + jω2k + jω3k = −7.25 −14 8 1 R(k) P1e P2e P3e , for k 0, 1, 2, 3

= = = Singular autocorrelation matrices, and the associated symmetric or antisymmetric where P1 8/13, P2 P3 96/13, and ωi correspond to the zeros of the prediction prediction filters with zeros on the unit circle, find application in the method of line filter a3, namely, √ √ spectrum pairs (LSP) of speech analysis [937]. They are also intimately related to the 5 + j 39 5 − j 39 jω1 jω2 jω3 e =−1 ,e = ,e = , so that, ω3 =−ω2 eigenvector methods of spectrum estimation, such as Pisarenko’s method of harmonic 8 8 retrieval, discussed in Sec. 14.2. This connection arises from the property that singular autocorrelation matrices (with nonsingular principal minors) admit a representation as The matrix itself has the sinusoidal representation a sum of sinusoidal components [938], the frequencies of which are given precisely by ⎡ ⎤ 1 the zeros, on the unit circle, of the corresponding prediction filter. This sinusoidal ⎢ jω ⎥ † † † ⎢ e i ⎥ R = P s s + P s s + P s s , where s = ⎢ ⎥ representation is equivalent to the eigen-decomposition of the matrix. The prediction 1 1 1 2 2 2 3 3 3 i ⎣ e2jωi ⎦ filter can, alternatively, be computed as the eigenvector belonging to zero eigenvalue. e3jωi The proof of these results can be derived as a limiting case; namely, the noise-free case, of the more general eigenvector methods that deal with sinusoids in noise. A direct Had we chosen the value γ3 = 1 in Example 12.5.2, we would have found the extended lag T proof is suggested in Problem 14.10. R(3)= 1.75 and the antisymmetric order-3 prediction-error filter a3 = [1, −1.25, 1.25, −1] , ⎡ ⎤ whose zeros are on the unit circle: 21−1 √ √ ⎢ ⎥ 1 + j 63 1 − j 63 Example 12.5.3: Consider the autocorrelation matrix R = ⎣ 12 1⎦. It is easily verified ejω1 = 1 ,ejω2 = ,ejω3 = −11 2 8 8 that the corresponding autocorrelation lags R(k) admit the sinusoidal representation with R(k) admitting the sinusoidal representation

jω1k −jω1k R(k)= 2 cos(ω1k)= e + e , for k = 0, 1, 2 R(k)= P1 + 2P2 cos(ω2k)= [8, 4, −1, 1.75], for k = 0, 1, 2, 3

where ω1 = π/3. Sending these lags through the Levinson recursion, we find {γ1,γ2}= where P1 = 24/7 and P2 = 16/7.  {0.5, −1} and {E1,E2}={1.5, 0}. Thus, R singular. Its eigenvalues are {0, 3, 3}. The T T corresponding prediction filters are a1 = [1, −0.5] and a2 = [1, −1, 1] . It is easily

verified that a2 is an eigenvector of R with zero eigenvalue, i.e., Ra2 = 0. The corresponding −1 −2 eigenfilter A2(z)= 1 − z + z , is symmetric about its middle and has zeros on the 532 12. Linear Prediction 12.6. Split Levinson Algorithm 533

12.6 Split Levinson Algorithm or, −1 −1 Fp+2(z)= (1 + z )Fp+1(z)−αp+1z Fp(z) (12.6.4) The main computational burden of Levinson’s algorithm is 2p multiplications per stage, where αp+1 = (1 + γp+1)(1 − γp). In block diagram form arising from the p multiplications in Eq. (12.3.15) and in the computation of the inner product (12.3.12). Thus, for M stages, the algorithm requires

M 2 p = M(M + 1) p=1 or, O(M2) multiplications. This represents a factor of M savings over solving the normal equations (12.3.7) by direct matrix inversion, requiring O(M3) operations. The savings −1 can be substantial considering that in speech processing M = 10–15, and in seismic Because Fp(z) has order p and is delayed by z , the coefficient form of (12.6.4) is = ⎡ ⎤ processing M 100–200. Progress in VLSI hardware has motivated the development 0 of efficient parallel implementations of Levinson’s algorithm and its variants [939–958]. fp+1 0 ⎢ ⎥ fp+2 = + − αp+1 ⎣ fp ⎦ (12.6.5) 0 f + With M parallel processors, the complexity of the algorithm is typically reduced by p 1 0 another factor of M to O(M) or O(M log M) operations. −1 An interesting recent development is the realization that Levinson’s algorithm has The recursion is initialized by F0(z)= 2 and F1(z)= 1 + z . Because of the sym- some inherent redundancy, which can be exploited to derive more efficient versions metric nature of the polynomial Fp(z) only half of its coefficients need be updated by of the algorithm allowing an additional 50% reduction in computational complexity. Eqs. (12.6.4) or (12.6.5). To complete the recursion, we need an efficient way to update These versions were motivated by a new stability test for linear prediction polynomials the coefficients αp+1. Taking the dot product of both sides of Eq. (12.6.2) with the row by Bistritz [959], and have been termed Split Levinson or Immitance-Domain Levinson vector R(0), R(1),...,R(p) , we obtain algorithms [960–967]. They are based on efficient three-term recurrence relations for + R = − R(0),...,R(p) ap R(0), . . . , R(p) ap (1 γp) R(0),...,R(p) fp the symmetrized or antisymmetrized prediction polynomials. Following [960], we define the order-p symmetric polynomial The first term is recognized as the gapped function gp(0)= Ep, and the second term as gp(p)= 0. Dividing by 1 − γp and denoting τp = Ep/(1 − γp), we obtain − ap−1 0 = + 1 R = + p Fp(z) Ap−1(z) z Ap−1(z) , fp R (12.6.1) 0 ap−1 τp = R(0), R(1),...,R(p) fp = R(i)fpi (12.6.6) i=0 The coefficient vector fp is symmetric about its middle; that is, fp0 = fpp = 1 and fpi = ap−1,i + ap−1,p−i = fp,p−i, for i = 1, 2,...,p− 1. Thus, only half of the vector Because of the symmetric nature of fp the quantity τp can be computed using only fp, is needed to specify it completely. Using the backward recursion (12.3.22) to write half of the terms in the above inner product. For example, if p is odd, the above sum Ap−1(z) in terms of Ap(z), we obtain the alternative expression may be folded to half its terms

(p−1)/2 1 R −1 R 1 R Fp = (Ap + γpA )+z (γpzAp + zA ) = [Ap + A ], or, 2 p p p τp = R(i)+R(p − i) fpi 1 − γp 1 − γp i=0 − = + R − = + R (1 γp)Fp(z) Ap(z) Ap (z) , (1 γp)fp ap ap (12.6.2) Because Eqs. (12.6.5) and (12.6.6) can be folded in half, the total number of multi- = The polynomial Ap(z) and its reverse may be recovered from the knowledge of the plications per stage will be 2(p/2) p, as compared with 2p for the classical Levinson symmetric polynomials Fp(z). Writing Eq. (12.6.1) for order p + 1, we obtain Fp+1(z)= algorithm. This is how the 50% reduction in computational complexity arises. The re- − + 1 R + Ap(z) z Ap (z). This equation, together with Eq. (12.6.2), may be solved for Ap(z) cursion is completed by noting that αp 1 can be computed in terms of τp by R and Ap (z), yielding τp+1 αp+1 = (12.6.7) − − −1 − − τp Fp+1(z) (1 γp)z Fp(z) R (1 γp)Fp(z) Fp+1(z) Ap(z)= ,A(z)= (12.6.3) 1 − z−1 p 1 − z−1 This follows from Eq. (12.3.13), 2 Inserting these expressions into the forward Levinson recursion (12.3.16) and can- τ + E + 1 − γ 1 − γ + p 1 = p 1 p = p 1 − = + − = celing the common factor 1/(1 − z−1), we obtain a three-term recurrence relation for (1 γp) (1 γp+1)(1 γp) αp+1 τp 1 − γp+1 Ep 1 − γp+1 Fp(z): A summary of the algorithm, which also includes a recursive computation of the −1 −1 −1 Fp+2 − (1 − γp+1)z Fp+1 = Fp+1 − (1 − γp)z Fp − γp+1z (1 − γp)Fp − Fp+1 reflection coefficients, is as follows: 534 12. Linear Prediction 12.7. Analysis and Synthesis Lattice Filters 535

T 1. Initialize with τ0 = E0 = R(0), γ0 = 0, f0 = [2], f1 = [1, 1] . τ3 = R(0), R(1), R(2), R(3) f3 = R(0)+R(3) f30 + R(1)+R(2) f31 = 36

2. At stage p, the quantities τp,γp, fp, fp+1 are available. which gives α3 = τ3/τ2 = 36/144 = 0.25 and γ3 =−1 + α3/(1 − γ2)=−0.5. Next, we 3. Compute τ + from Eq. (12.6.6), using only half the terms in the sum. p 1 compute f4 and τ4 =− + − 4. Compute αp+1 from Eq. (12.6.7), and solve for γp+1 1 αp+1/(1 γp). ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 0 0 1 5. Compute fp+2 from Eq. (12.6.5), using half of the coefficients. ⎡ ⎤ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ − ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 0 ⎢ 0.25 ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥ ⎢ 0.5 ⎥ 6. Go to stage p + 1. f3 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ f = + − α ⎣ f2 ⎦ = ⎢ −0.25 ⎥ + ⎢ −0.25 ⎥ − 0.25 ⎢ 1 ⎥ = ⎢ −0.75 ⎥ 4 0 f 3 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 3 0 ⎣ 1 ⎦ ⎣ −0.25 ⎦ ⎣ 1 ⎦ ⎣ 0.5 ⎦ After the final desired order is reached, the linear prediction polynomial can be 0 1 0 1 recovered from Eq. (12.6.3), which can be written recursively as τ4 = R(0), R(1), R(2), R(3), R(4) f4 api = ap,i−1 + fp+1.i − (1 − γp)fp,i−1 ,i= 1, 2,...,p (12.6.8) = R(0)+R(4) f40 + R(1)+R(3) f41 + R(2)f42 = 81 with ap0 = 1, or vectorially, = = = =− + − = which gives α4 τ4/τ3 81/36 2.25 and γ4 1 α4/(1 γ3) 0.5. The final prediction filter a can be computed using Eq. (12.6.9) or (12.6.10). To avoid computing f ap 0 0 4 5 = + fp+1 − (1 − γp) (12.6.9) we use Eq. (12.6.10), which gives 0 ap fp ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 0 1 0 0 Using the three-term recurrence (12.6.5), we may replace f + in terms of f and f − , ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ p 1 p p 1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ a41 ⎥ ⎢ 1 ⎥ ⎢ 0.5 ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥ and rewrite Eq. (12.6.9) as ⎢ ⎥ ⎢ ⎥ ⎢ − ⎥ ⎢ ⎥ ⎢ − ⎥ ⎢ a42 ⎥ ⎢ a41 ⎥ ⎢ 0.75 ⎥ ⎢ 0.5 ⎥ ⎢ 0.25 ⎥ ⎢ ⎥ = ⎢ ⎥ + ⎢ ⎥ + 0.5 ⎢ ⎥ − 2.25 ⎢ ⎥ ⎡ ⎤ ⎢ a ⎥ ⎢ a ⎥ ⎢ 0.5 ⎥ ⎢ −0.75 ⎥ ⎢ −0.25 ⎥ ⎢ 43 ⎥ ⎢ 42 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 0 ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ap 0 fp 0 ⎢ ⎥ a44 a43 1 0.5 1 = + + γp − αp ⎣ fp−1 ⎦ (12.6.10) 0 a 0 f 0 a44 0 1 0 p p 0 = − − − T  and in the z-domain with solution a4 [1, 0.25, 0.1875, 0.5, 0.5] .
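A compact implementation of the split Levinson recursion, Eqs. (12.6.4)–(12.6.8), might look as follows. This is an illustrative Python sketch (not the text's code); for the lags {128, −64, 80, −88, 89} of Example 12.6.1 it reproduces the reflection coefficients {−0.5, 0.5, −0.5, 0.5} and the order-4 prediction-error filter a_4.

```python
import numpy as np

def split_levinson(R, M):
    """Split Levinson recursion. Returns the reflection coefficients gamma_1..gamma_M
    and the order-M prediction-error filter a_M."""
    R = np.asarray(R, dtype=float)
    f_prev, f_curr = np.array([2.0]), np.array([1.0, 1.0])   # f_0, f_1
    tau_prev = R[0]                                          # tau_0 = R(0)
    gamma = [0.0]                                            # gamma_0 = 0
    for p in range(M):
        tau = np.dot(R[:p + 2], f_curr)                      # Eq. (12.6.6) at order p+1
        alpha = tau / tau_prev                               # Eq. (12.6.7)
        gamma.append(-1.0 + alpha / (1.0 - gamma[-1]))
        # Eq. (12.6.5): f_{p+2} = [f_{p+1},0] + [0,f_{p+1}] - alpha*[0,f_p,0]
        f_next = (np.concatenate((f_curr, [0.0])) + np.concatenate(([0.0], f_curr))
                  - alpha * np.concatenate(([0.0], f_prev, [0.0])))
        f_prev, f_curr, tau_prev = f_curr, f_next, tau
    # Recover a_M from Eq. (12.6.8): a_{M,i} = a_{M,i-1} + f_{M+1,i} - (1-gamma_M) f_{M,i-1}
    a = np.zeros(M + 1)
    a[0] = 1.0
    for i in range(1, M + 1):
        a[i] = a[i - 1] + f_curr[i] - (1.0 - gamma[M]) * f_prev[i - 1]
    return np.array(gamma[1:]), a

g, a4 = split_levinson([128, -64, 80, -88, 89], 4)
# expected: g = [-0.5, 0.5, -0.5, 0.5], a4 = [1, -0.25, -0.1875, 0.5, -0.5]
```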

−1 −1 −1 Ap(z)= z Ap(z)+(1 + γpz )Fp(z)−αpz Fp−1(z) (12.6.11) 12.7 Analysis and Synthesis Lattice Filters Example 12.6.1: We rederive the results of Example 12.3.1 using this algorithm, showing ex- T plicitly the computational savings. Initialize with τ0 = R(0)= 128, f0 = [2], f1 = [1, 1] . The Levinson recursion, expressed in the 2×2 matrix form of Eq. (12.3.18) forms the ba- Using Eq. (12.6.6), we compute sis of the so-called lattice, or ladder, realizations of the prediction-error filters and their inverses [917]. Remembering that the prediction-error sequence ep(n) is the convolu- τ1 = R(0), R(1) f1 = R(0)+R(1) f10 = 128 − 64 = 64 tion of the prediction-error filter [1,ap1,ap2,...,app] with the original data sequence y , that is, Thus, α = τ /τ = 64/128 = 0.5 and γ =−1 + α =−0.5. Using Eq. (12.6.5) we find n 1 1 0 1 1 + = + + +···+ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ep (n) yn ap1yn−1 ap2yn−2 appyn−p (12.7.1) 0 1 0 0 1 f 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ = 1 + − ⎣ f ⎦ = ⎣ 1 ⎦ + ⎣ 1 ⎦ − ⎣ 2 ⎦ = ⎣ 1 ⎦ we find in the z-domain f2 α1 0 0.5 + 0 f1 = 0 0 1 0 1 Ep (z) Ap(z)Y(z) (12.7.2) + where we changed the notation slightly and denoted ep(n) by ep (n). At this point, and compute τ 2 it proves convenient to introduce the backward prediction-error sequence, defined in τ2 = R(0), R(1), R(2) f2 = R(0)+R(2) f20 + R(1)f21 = 144 terms of the reverse of the prediction-error filter, as follows:

E−(z) = AR(z)Y(z) Thus, α2 = τ2/τ1 = 144/64 = 2.25 and γ2 =−1 + α2/(1 − γ1)=−1 + 2.25/1.5 = 0.5. p p (12.7.3) Next, compute f3 and τ3 − e (n) = yn−p + ap1yn−p+1 + ap2yn−p+2 +···+appyn ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ p ⎡ ⎤ 1 0 0 1 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ R f 0 ⎢ ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥ ⎢ −0.25 ⎥ where A (z) is the reverse of Ap(z), namely, = 2 + − ⎣ ⎦ = ⎢ ⎥ + ⎢ ⎥ − ⎢ ⎥ = ⎢ ⎥ p f3 α2 f1 ⎣ ⎦ ⎣ ⎦ 2.25 ⎣ ⎦ ⎣ ⎦ 0 f2 1 1 1 −0.25 0 R = −p −1 = + −1 + −2 +···+ −(p−1) + −p 0 1 0 1 Ap (z) z Ap(z ) app ap,p−1z ap,p−2z ap1z z 536 12. Linear Prediction 12.7. Analysis and Synthesis Lattice Filters 537

− The signal sequence ep (n) may be interpreted as the postdiction error in postdicting and in block diagram form the value of yn−p on the basis of the future p samples {yn−p+1,yn−p+2,...,yn−1,yn}, as shown below

These recursions are initialized at p = 0by

Actually, the above choice of postdiction coefficients is the optimal one that mini- ± = = ± = E0 (z) A0(z)Y(z) Y(z) and e0 (n) yn (12.7.8) mizes the mean-square postdiction error + Eqs. (12.7.7) are identical to (1.8.50), with the identifications e → e (n), e¯ → − 2 = a p+1 a E[ep (n) ] min (12.7.4) + → − → − − ep (n), eb ep+1(n), e˜b ep (n 1) the last following from Eq. (12.3.26). This is easily shown by inserting Eq. (12.7.3) into (12.7.4) and using stationarity The lattice realization of the prediction-error filter is based on the recursion (12.7.7). Starting at p = 0, the output of the pth stage of (12.7.7) becomes the input of the ⎡⎛ ⎞ ⎤ p 2 p (p + 1)th stage, up to the final desired order p = M. This is depicted in Fig. 12.7.1. ⎢ ⎥ − 2 = ⎣⎝ ⎠ ⎦ = E[ep (n) ] E apmyn−p+m apmE[yn−p+myn−p+k]apk m=0 m,k=0

p = − = + 2 apmR(m k)apk E[ep (n) ] m,k=0 which shows that the forward and the backward prediction error criteria are the same, thus, having the same solution for the optimal coefficients. We can write Eqs. (12.7.1) Fig. 12.7.1 Analysis lattice filter. and (12.7.3) vectorially ⎡ ⎤ y At each time instant n the numbers held in the M delay registers of the lattice can ⎢ n ⎥ ⎢ yn−1 ⎥ be taken as the internal state of the lattice. The function lattice is an implementation of + ⎢ ⎥ = 1 ⎢ ⎥ = aTy (12.7.5a) ± ep (n) [ ,ap1,...,app] ⎢ . ⎥ p p(n) Fig. 12.7.1. At each instant n, the function takes two overall inputs e0 (n), makes M calls ⎣ . ⎦ to the function section that implements the single lattice section (12.7.7), produces the y − ± n p two overall outputs e (n), and updates the internal state of the lattice in preparation ⎡ ⎤ M y for the next call. By allowing the reflection coefficients to change between calls, the ⎢ n ⎥ ⎢ y − ⎥ function can also be used in adaptive lattice filters. ⎢ n 1 ⎥ + − = ⎢ ⎥ = RT = T Eqs. (12.7.3) imply that the transfer function from the input y to the output e (n) ep (n) [app,ap,p−1,...,1] ⎢ . ⎥ ap yp(n) bp yp(n) (12.7.5b) n M ⎣ . ⎦ . is the desired prediction-error filter AM(z), whereas the transfer function from yn to y − − R n p eM(n) is the reversed filter AM(z). The lattice realization is therefore equivalent to the direct-form realization They are recognized as the forward and backward prediction errors ea and eb of Eq. (1.8.9). Multiplying both sides of the Levinson recursion (12.3.18) by Y(z), we cast it + e (n)= y + a y − + a y − +···a y − in the equivalent form in terms of the forward and backward prediction-error sequences: M n M1 n 1 M2 n 2 MM n M + −1 + E (z) 1 −γ + z E (z) p+1 = p 1 p − − −1 − (12.7.6) Ep+1(z) γp+1 z Ep (z) and in the time domain + = + − − − ep+1(n) ep (n) γp+1ep (n 1) (12.7.7) − = − − − + ep+1(n) ep (n 1) γp+1ep (n) 538 12. Linear Prediction 12.8. Alternative Proof of the Minimum-Phase Property 539 realized directly in terms of the prediction coefficients. It is depicted below
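Eq. (12.7.7), iterated for p = 0, 1, ..., M−1 as in Fig. 12.7.1, is easy to simulate sample by sample. The following Python sketch (an illustrative analogue of the text's lattice and section functions, not their actual code) processes one input sample at a time, holding the delayed backward errors e_p^−(n−1) as the internal state of the lattice.

```python
import numpy as np

class AnalysisLattice:
    """Sample-by-sample analysis lattice of Fig. 12.7.1, based on Eq. (12.7.7):
        e+_{p+1}(n) = e+_p(n) - gamma_{p+1} * e-_p(n-1)
        e-_{p+1}(n) = e-_p(n-1) - gamma_{p+1} * e+_p(n)
    """
    def __init__(self, gamma):
        self.gamma = np.asarray(gamma, dtype=float)   # [gamma_1, ..., gamma_M]
        self.w = np.zeros(len(self.gamma))            # delayed e-_p(n-1), p = 0..M-1

    def step(self, y):
        ef = eb = y                                   # e+_0(n) = e-_0(n) = y_n, Eq. (12.7.8)
        for p, g in enumerate(self.gamma):
            eb_delayed = self.w[p]                    # e-_p(n-1)
            self.w[p] = eb                            # stored for the next time instant
            ef, eb = ef - g * eb_delayed, eb_delayed - g * ef
        return ef, eb                                 # e+_M(n), e-_M(n)

# Example usage with the reflection coefficients of Example 12.3.1 (illustrative):
# lat = AnalysisLattice([-0.5, 0.5, -0.5, 0.5])
# forward_errors = [lat.step(y)[0] for y in data]
```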

Fig. 12.7.2 Synthesis lattice filter.

Lattice structures based on the split Levinson algorithm can also be developed [960,961]. They are obtained by cascading the block diagram realizations of Eq. (12.6.4) for differ- ent values of αp. The output signals from each section are defined by

p The synthesis filter 1/AM(z) can also be realized in a lattice form. The input to + ep(n)= fpiyn−i ,Ep(z)= Fp(z)Y(z) the synthesis filter is the prediction error sequence eM(n) and its output is the original i=0 sequence yn : Multiplying both sides of Eq. (12.6.1) by Y(z) we obtain the time-domain expression

= + + − − Its direct-form realization is: ep(n) ep−1(n) ep−1(n 1)

Similarly, multiplying both sides of Eq. (12.6.4) by Y(z) we obtain the recursions

ep+2(n)= ep+1(n)+ep+1(n − 1)−αpep(n − 1)

They are initialized by e0(n)= 2yn and e1(n)= yn + yn−1. Putting together the various sections we obtain the lattice-type realization

+ For the lattice realization, since yn corresponds to e0 (n), we must write Eq. (12.7.7) + + = in an order-decreasing form, starting at eM(n) and ending with e0 (n) yn. Rearranging the terms of the first of Eq. (12.7.7), we have The forward prediction error may be recovered from Eq. (12.6.3) or Eq. (12.6.11) by + = + + − − ep (n) ep+1(n) γp+1ep (n 1) multiplying both sides with Y(z); for example, using Eq. (12.6.11) we find (12.7.9) − − + = − − + + ep+1(n) ep (n 1) γp+1ep (n) = + + − − ep (n) ep−1(n) ep(n) γpep(n) αpep−1(n 1) which can be realized as shown below: 12.8 Alternative Proof of the Minimum-Phase Property

The synthesis filter 1/AM(z) must be stable and causal, which requires all the M zeros of the prediction-error filter AM(z) to lie inside the unit circle on the complex z-plane. We have already presented a proof of this fact which was based on the property that + 2 the coefficients of AM(z) minimized the mean-squared prediction error E[e (n) ]. Note the difference in signs in the upper and lower adders. Putting together the M Here, we present an alternative proof based on the Levinson recursion and the fact that stages from p = M to p = 0, we obtain the synthesis lattice filter shown in Fig. 12.7.2. 540 12. Linear Prediction 12.8. Alternative Proof of the Minimum-Phase Property 541

all reflection coefficients γp have magnitude less than one [920]. From the definition then Eq. (12.8.4) implies that γp+1 will have magnitude less than one: (12.7.3), it follows that |γp+1|≤1 , for each p = 0, 1,... (12.8.5) − e (n − 1)= yn−p−1 + ap1yn−p + ap2yn−p+1 +···+appyn−1 (12.8.1) p To prove the minimum-phase property of AM(z) we must show that all of its M zeros are inside the unit circle. We will do this by induction. Let Zp and Np denote the This quantity represents the estimation error of postdicting yn−p−1 on the basis of the number of zeros and poles of Ap(z) that lie inside the unit circle. Levinson’s recursion, p future samples {yn−p,yn−p+1,...,yn−1}. Another way to say this is that the linear Eq. (12.3.13), expresses Ap+1(z) as the sum of Ap(z) and a correction term F(z)= combination of these p samples is the projection of y − − onto the subspace of random n p 1 − −1 R γp+1z Ap (z), that is, variables spanned by {yn−p,yn−p+1,...,yn−1}; that is, Ap+1(z)= Ap(z)+F(z) − − = − − − − − { − − + − } ep (n 1) yn p 1 (projection of yn p 1 onto yn p,yn p 1,...,yn 1 ) (12.8.2) Using the inequality (12.8.5) and the fact that Ap(z) has the same magnitude spectrum R + as Ap (z), we find the inequality On the other hand, ep (n) given in Eq. (12.7.1) is the estimation error of yn based on           = − −1 R  =   ≤   the same set of samples. Therefore, F(z) γp+1z Ap (z) γp+1Ap(z) Ap(z) + = − { } for z = ejω on the unit circle. Then, the argument principle and Rouche’s theorem imply ep (n) yn (projection of yn onto yn−p,yn−p+1,...,yn−1 ) (12.8.3) that the addition of the function F(z) will not affect the difference Np − Zp of poles The samples {yn−p,yn−p+1,...,yn−1} are the intermediate set of samples between yn−p−1 and zeros contained inside the unit circle. Thus, and yn as shown below: Np+1 − Zp+1 = Np − Zp

Since the only pole of Ap(z) is the multiple pole of order p at the origin arising from −p the term z , it follows that Np = p. Therefore,

(p + 1)−Zp+1 = p − Zp , or, Therefore, according to the discussion in Sec. 1.7, the PARCOR coefficient between Zp+1 = Zp + 1 yn−p−1 and yn with the effect of intermediate samples removed is given by Starting at p = 0 with A0(z)= 1, we have Z0 = 0. It follows that + − − E ep (n)ep (n 1) PARCOR = Z = p − 2 p E ep (n − 1) which states that all the p zeros of the polynomial Ap(z) lie inside the unit circle. This is precisely the reflection coefficient γ + of Eq. (12.3.11). Indeed, using Eq. (12.8.1) p 1 Another way to state this result is: “A necessary and sufficient condition for a poly- and the gap conditions, gp(k)= 0, k = 1, 2,...,p,wefind nomial AM(z) to have all of its M zeros strictly inside the unit circle is that all re- { } + − − = + + + +···+ flection coefficients γ1,γ2,...,γM resulting from AM(z) via the backward recursion E ep (n)ep (n 1) E ep (n)(yn−p−1 ap1yn−p ap2yn−p+1 appyn−1) Eq. (12.3.21) have magnitude strictly less than one.” This is essentially equivalent to the = gp(p + 1)+ap1gp(p)+ap2gp(p − 1)+···appgp(1) well-known Schur-Cohn test of stability [968–971]. The function bkwlev can be used in this regard to obtain the sequence of reflection coefficients. The Bistritz test [959], = gp(p + 1) mentioned in Sec. 12.6, is an alternative stability test.
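This stability test is easy to carry out numerically: perform the backward recursion (12.3.21) on the coefficients of AM(z) (the function bkwlev mentioned above) and check that every recovered reflection coefficient has magnitude less than one. The following is an illustrative Python sketch, not the text's code; it assumes that no intermediate reflection coefficient has magnitude exactly equal to one.

    import numpy as np

    def bkwlev_like(a):
        """Backward Levinson recursion: from A_M(z) = [1, a1, ..., aM] recover
        the reflection coefficients [gamma_1, ..., gamma_M]."""
        a = np.asarray(a, dtype=float)
        M = len(a) - 1
        gammas = np.zeros(M)
        for p in range(M, 0, -1):
            g = -a[p]                             # gamma_p = -a_pp, cf. Eq. (12.10.24)
            gammas[p - 1] = g
            a = (a + g * a[::-1]) / (1.0 - g**2)  # step down to A_{p-1}(z)
            a = a[:-1]                            # drop the last (now zero) coefficient
        return gammas

    def is_minimum_phase(a):
        return bool(np.all(np.abs(bkwlev_like(a)) < 1.0))

    # the two polynomials of Example 12.8.1 below:
    print(bkwlev_like([1, -2.60, 2.55, -2.80, 0.50]))   # [0.4, -0.5, 2.0, -0.5]  -> not minimum phase
    print(bkwlev_like([1, -1.40, 1.47, -1.30, 0.50]))   # [0.4, -0.5, 0.8, -0.5]  -> minimum phase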

Similarly, invoking stationarity and Eq. (12.7.4), Example 12.8.1: Test the minimum phase property of the polynomials − − 2 = − 2 = + 2 = = − −1 + −2 − −3 + −4 E ep (n 1) E ep (n) E ep (n) gp(0) (a) A(z) 1 2.60z 2.55z 2.80z 0.50z (b) A(z)= 1 − 1.40z−1 + 1.47z−2 − 1.30z−3 + 0.50z−4 Thus, the reflection coefficient γp+1 is really a PARCOR coefficient: Sending the coefficients of each through the function bkwlev, we find the set of reflection E e+(n)e−(n − 1) E e+(n)e−(n − 1) coefficients p p  p p γp+1 = − = (12.8.4) E e (n − 1)2 − 2 + 2 p E ep (n − 1) E ep (n) (a) {0.4, −0.5, 2.0, −0.5} (b) {0.4, −0.5, 0.8, −0.5} Using the Schwarz inequality with respect to the inner product E[uv], that is,   Since among (a) there is one reflection coefficient of magnitude greater than one, case (a) E[uv]2 ≤ E[u2]E[v2] will not be minimum phase, whereas case (b) is.  542 12. Linear Prediction 12.9. Orthogonality of Backward Prediction Errors—Cholesky Factorization 543

12.9 Orthogonality of Backward Prediction Errors—Cholesky which can be rearranged into the vector form ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ Factorization − e0 (n) 1000yn ⎢ − ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ e (n) ⎥ ⎢ a 100⎥ ⎢ y − ⎥ − = ⎢ 1 ⎥ = ⎢ 11 ⎥ ⎢ n 1 ⎥ = Another interesting structural property of the lattice realizations is that, in a certain e (n) ⎣ − ⎦ ⎣ ⎦ ⎣ ⎦ Ly(n) (12.9.4) − e2 (n) a22 a21 10 yn−2 sense, the backward prediction errors ep (n) are orthogonal to each other. To see this, − e3 (n) a33 a32 a31 1 yn−3 consider the case M = 3, and form the matrix product ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ It is identical to Eq. (1.8.15). Using Eq. (12.9.1), it follows now that the covariance R0 R1 R2 R3 1 a11 a22 a33 E0 000 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ − = T ⎢ R R R R ⎥ ⎢ 01a a ⎥ ⎢ ∗ E 00⎥ matrix of e (n) is diagonal; indeed, since R E[y(n)y(n) ], ⎢ 1 0 1 2 ⎥ ⎢ 21 32 ⎥ = ⎢ 1 ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ∗∗ ⎦ R2 R1 R0 R1 00 1a31 E2 0 − − T T R − − = E[e (n)e (n) ]= LRL = D (12.9.5) R R R R 0001 ∗∗∗E e e  3 2  1 0       3 

R LT L1 which can also be expressed component-wise as the zero-lag cross-correlation Because the normal equations (written upside down) are satisfied by each prediction- − − R − − (0)= E e (n)e (n) = δ E (12.9.6) error filter, the right-hand side will be a lower-triangular matrix. The “don’t care” entries ep eq p q pq p ∗ have been denoted by s. Multiply from the left by L to get Thus, at each time instant n, the backward prediction errors e−(n) are mutually ⎡ ⎤ p E 000 uncorrelated (orthogonal) with each other. The orthogonality conditions (12.9.6) and the ⎢ 0 ⎥ ⎢ ∗ E 00⎥ lower-triangular nature of L render the transformation (12.9.4) equivalent to the Gram- T = = ⎢ 1 ⎥ LRL LL1 = T ⎣ ∗∗E2 0 ⎦ Schmidt orthogonalization of the data vector y(n) [yn,yn−1,yn−2,yn−3] . Equation ∗∗∗E3 (12.9.1), written as R = L−1DL−T Since L is by definition lower-triangular, the right-hand side will still be lower tri- angular. But the left-hand side is symmetric. Thus, so is the right-hand side and as a corresponds to an LU Cholesky factorization of the covariance matrix R. − = result it must be diagonal. We have shown that Since the backward errors ep (n), p 0, 1, 2,...,M, for an Mth order predictor are generated at the output of each successive lattice segment of Fig. 12.7.1, we may view LRLT = D = diag{E ,E ,E ,E } (12.9.1) 0 1 2 3 the analysis lattice filter as an implementation of the Gram-Schmidt orthogonalization T or, written explicitly of the vector y(n)= [yn,yn−1,yn−2,...,yn−M] . ⎡ ⎤⎡ ⎤⎡ ⎤ ⎡ ⎤ It is interesting to note, in this respect, that this implementation requires only knowl- 1000R0 R1 R2 R3 1 a11 a22 a33 E0 000 ⎢ ⎥⎢ ⎥⎢ ⎥ ⎢ ⎥ edge of the reflection coefficients {γ1,γ2,...,γM}. ⎢ a 100⎥⎢ R R R R ⎥⎢ 01a a ⎥ ⎢ 0 E 00⎥ ⎢ 11 ⎥⎢ 1 0 1 2 ⎥⎢ 21 32 ⎥=⎢ 1 ⎥ The data vector y(n) can also be orthogonalized by means of the forward predictors, ⎣ a a 10⎦⎣ R R R R ⎦⎣ 00 1a ⎦ ⎣ 00E 0 ⎦ 22 21 2 1 0 1 31 2 using the matrix U. This representation, however, is not as conveniently realized by a a a 1 R R R R 0001 000E 33 32 31 3 2 1 0 3 the lattice structure because the resulting orthogonalized vector consists of forward This is identical to Eq. (1.8.17). The pqth element of this matrix equation is then prediction errors that are orthogonal, but not at the same time instant. This can be seen T = from the definition of the forward errors bp Rbq δpqEp (12.9.2) ⎡ ⎤ ⎡ ⎤ ⎡ + ⎤ T 1 a31 a32 a33 yn e3 (n) where bp and bq denote the pth and qth columns of L . These are recognized as the ⎢ ⎥ ⎢ ⎥ ⎢ + ⎥ ⎢ 01a a ⎥ ⎢ y − ⎥ ⎢ e (n − 1) ⎥ backward prediction-error filters of orders p and q. Eq. (12.9.2) implies then the or- = ⎢ 21 22 ⎥ ⎢ n 1 ⎥ = ⎢ 2 ⎥ Uy(n) ⎣ ⎦ ⎣ ⎦ ⎣ + − ⎦ 00 1a11 yn−2 e1 (n 2) thogonality of the backward prediction-error filters with respect to an inner product + 0001 y − e (n − 3) xTRy. n 3 0 The backward prediction errors e−(n) can be expressed in terms of the b s and the p p Thus, additional delays must be inserted at the forward outputs of the lattice struc- T vector of samples y(n)= [y ,y − ,y − ,y − ] , as follows: n n 1 n 2 n 3 ture to achieve orthogonalization. For this reason, the backward outputs, being mutually − = = T = e0 (n) [1, 0, 0, 0]y(n) b0 y(n) yn orthogonal at the same time instant n, are preferred. The corresponding UL factoriza- − T tion of R is in this basis e (n)= [a11, 1, 0, 0]y(n)= b y(n)= a11yn + yn−1 1 1 T = { } − = = T = + + URU diag E3,E2,E1,E0 e2 (n) [a22,a21, 1, 0]y(n) b2 y(n) a22yn a21yn−1 yn−2 − = = T = + + + This is the reverse of Eq. 
(12.9.1)) obtained by acting on both sides by the reversing e3 (n) [a33,a32,a31, 1]y(n) b3 y(n) a33yn a32yn−1 a31yn−2 yn−3 2 (12.9.3) matrix J and using the fact that U = JLJ, the invariance of R = JRJ, and J = I. 544 12. Linear Prediction 12.9. Orthogonality of Backward Prediction Errors—Cholesky Factorization 545
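The factorization (12.9.1) is easy to verify numerically: Levinson's recursion generates the rows of L (the reversed prediction-error filters) together with the errors Ep, and the product L R L^T should come out diagonal. The following Python sketch is illustrative only (the text's own implementation is the MATLAB function lev, described later in this section); lev_like is a hypothetical name.

    import numpy as np
    from scipy.linalg import toeplitz

    def lev_like(R):
        """Levinson recursion on the lags R[0..M]. Returns the lower-triangular L
        of Eq. (12.9.4), whose rows are the reversed prediction-error filters,
        and the mean-square errors E_0, ..., E_M."""
        R = np.asarray(R, dtype=float)
        M = len(R) - 1
        a = np.array([1.0])                      # A_0(z)
        E = np.zeros(M + 1)
        E[0] = R[0]
        L = np.zeros((M + 1, M + 1))
        L[0, 0] = 1.0
        for p in range(1, M + 1):
            gamma = np.dot(a, R[p:0:-1]) / E[p - 1]           # sum_i a_{p-1,i} R(p-i) / E_{p-1}
            a = np.r_[a, 0.0] - gamma * np.r_[0.0, a[::-1]]   # Levinson recursion (12.3.18)
            E[p] = (1.0 - gamma**2) * E[p - 1]
            L[p, :p + 1] = a[::-1]               # row p = [a_pp, ..., a_p1, 1]
        return L, E

    # check L R L^T = D on the lags of Example 12.3.1
    R = [128.0, -64.0, 80.0, -88.0, 89.0]
    L, E = lev_like(R)
    D = L @ toeplitz(R) @ L.T
    print(np.round(D, 8))     # diagonal, with entries E = [128, 96, 72, 54, 40.5]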

= R = RT T= T The above orthogonalization may also be understood in the z-domain: since the where b3 a3 [αα3 , 1] [a33,a32,a31, 1] . This is identical to Eq. (1.8.28). − backward prediction error ep (n) is the output of the reversed prediction-error filter Thus, through Levinson’s algorithm, as the prediction coefficients α3 and error E3 R Ap (z) driven by the data sequence yn, we have for the cross-density are obtained, the inverse of R may be updated to the next higher order. Eq. (12.9.11) also suggests an efficient way of solving more general normal equations of the type − − − = R R 1 Sep eq (z) Ap (z)Syy(z)Aq (z ) ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ R R R R h r ⎢ 0 1 2 3 ⎥ ⎢ 30 ⎥ ⎢ 0 ⎥ Integrating this expression over the unit circle and using Eq. (12.9.6), we find ⎢ R1 R0 R1 R2 ⎥ ⎢ h31 ⎥ ⎢ r1 ⎥ R3h3 = ⎢ ⎥ ⎢ ⎥ = ⎢ ⎥ = r3 (12.9.12)   ⎣ R2 R1 R0 R1 ⎦ ⎣ h32 ⎦ ⎣ r2 ⎦ − dz dz R R 1 = − − R R R R h r Ap (z)Syy(z)Aq (z ) Sep eq (z) 3 2 1 0 33 3 u.c. 2πjz u.c. 2πjz (12.9.7) − − = − − = = for a given right-hand vector r3. Such normal equations arise in the design of FIR Wiener Rep eq (0) E ep (n)eq (n) δpqEp filters; for example, Eq. (11.3.9). The solution for h3 is obtained recursively from the R solution of similar linear equations of lower order. For example, let h2 be the solution that is, the reverse polynomials Ap (z) are mutually orthogonal with respect to the above of the previous order inner product defined by the (positive-definite) weighting function Syy(z). Equation (12.9.7) is the z-domain expression of Eq. (12.9.2). This result establishes an intimate ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ R R R h r connection between the linear prediction problem and the theory of orthogonal polyno- ⎢ 0 1 2 ⎥ ⎢ 20 ⎥ ⎢ 0 ⎥ R h = ⎣ R R R ⎦ ⎣ h ⎦ = ⎣ r ⎦ = r mials on the unit circle developed by Szego¨ [972,973]. 2 2 1 0 1 21 1 2 R R R h r The LU factorization of R implies a UL factorization of the inverse of R; that is, 2 1 0 22 2 solving Eq. (12.9.1) for R−1 we have: where the right-hand side vector r2 is part of r3. Then, Eq. (12.9.11) implies a recursive − − relationship between h and h : R 1 = LTD 1L (12.9.8) 3 2 ⎡ ⎤ ⎡ ⎤ 1 1 1 −1 + R RT R + R + RT Since the Levinson recursion generates all the lower order prediction-error filters, it ⎢ R2 α3 α3 α3 ⎥ ⎢ h2 α3 (r3 α3 r2) ⎥ − ⎢ E3 E3 ⎥ r2 ⎢ E3 ⎥ h = R 1r = = essentially generates the inverse of R. 3 3 3 ⎣ 1 1 ⎦ ⎣ 1 ⎦ RT r3 + RT The computation of this inverse may also be done recursively in the order, as follows. α3 (r3 α3 r2) E3 E3 E3 To keep track of the order let us use an extra index = R = T= RT T In terms of the reverse prediction-error filter b3 a3 [a33,a32,a31, 1] [αα3 , 1] , −1 = T −1 R3 L3 D3 L3 (12.9.9) we may write The matrix L3 contains as a submatrix the matrix L2; in fact, h 1 1 = 2 + = + RT = T h3 c b3 , where c (r3 α3 r2) b3 r3 (12.9.13) ⎡ ⎤ 0 E3 E3 1000 ⎢ ⎥ ⎢ a11 100 ⎥ L2 0 Thus, the recursive updating of the solution h must be done by carrying out the aux- L3 = ⎢ ⎥ = (12.9.10) ⎣ a a 1 0 ⎦ RT 2 22 21 α3 1 iliary updating of the prediction-error filters. The method requires O(M ) operations, a33 a32 a31 1 compared to O(M3) if the inverse of R were to be computed directly. This recursive method of solving general normal equations, developed by Robinson RT where α3 denotes the transpose of the reverse of the vector of prediction coefficients; and Treitel, has been reviewed elsewhere [921,922,974–976] Some additional insight into RT = −1 namely, α3 [a33,a32,a21]. 
The diagonal matrix D3 may also be block divided in the properties of these recursions can be gained by using the Toeplitz property of R. the same manner: − This property together with the symmetric nature of R imply that R commutes with the D 1 0 −1 = 2 reversing matrix: D3 T ⎡ ⎤ 0 1 0001 ⎢ ⎥ Inserting these block decompositions into Eq. (12.9.9) and using the lower order ⎢ 0010⎥ −1 J3 = ⎢ ⎥ = J ,J3R3J3 = R3 (12.9.14) −1 = T −1 ⎣ 0100⎦ 3 result R2 L2 D2 L2,wefind ⎡ ⎤ 1000 −1 1 R RT 1 R + α α α −1 −1 ⎢ R2 α3 α3 α3 ⎥ R 0 Therefore, even though the inverse R is not Toeplitz, it still commutes with this − ⎢ E3 E3 ⎥ 2 1 3 R 1 = = + b bT (12.9.11) 3 ⎣ 1 1 ⎦ T 3 3 reversing matrix; that is, RT 0 0 E3 − − α3 J R 1J = R 1 (12.9.15) E3 E3 3 3 3 3 546 12. Linear Prediction 12.10. Schur Algorithm 547
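The order-update (12.9.13) translates directly into a fast solver for normal equations of the type (12.9.12): each Levinson step provides the reversed prediction-error filter b_p and the error E_p, and the solution is then updated one order at a time, which is the Robinson-Treitel procedure mentioned above. The following Python sketch is illustrative only (hypothetical names, O(M^2) operations, and an arbitrary right-hand side used for the check).

    import numpy as np
    from scipy.linalg import toeplitz

    def toeplitz_solve(R, r):
        """Solve the Toeplitz system R_M h = r, with R_M built from the lags R[0..M],
        by the order updates h_p = [h_{p-1}; 0] + c_p b_p of Eq. (12.9.13),
        where b_p is the reversed prediction-error filter and c_p = (b_p . r_p)/E_p."""
        R = np.asarray(R, dtype=float)
        r = np.asarray(r, dtype=float)
        a = np.array([1.0])                      # A_0(z)
        E = R[0]                                 # E_0
        h = np.array([r[0] / R[0]])              # order-0 solution
        for p in range(1, len(R)):
            gamma = np.dot(a, R[p:0:-1]) / E     # Levinson step
            a = np.r_[a, 0.0] - gamma * np.r_[0.0, a[::-1]]
            E = (1.0 - gamma**2) * E
            b = a[::-1]                          # reversed filter of order p
            c = np.dot(b, r[:p + 1]) / E         # c_p of Eq. (12.9.13)
            h = np.r_[h, 0.0] + c * b            # order update
        return h

    # consistency check against a direct solve
    R = [128.0, -64.0, 80.0, -88.0, 89.0]
    r = [1.0, 2.0, 3.0, 4.0, 5.0]                # arbitrary right-hand side
    print(np.allclose(toeplitz_solve(R, r), np.linalg.solve(toeplitz(R), r)))   # True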

The effect of this symmetry property on the block decomposition Eq. (12.9.11) may 12.10 Schur Algorithm be seen by decomposing J3 also as The Schur algorithm has its roots in the original work of Schur on the theory of functions T 0 J2 0 1 = = bounded in the unit disk [985,986]. It is an important signal processing tool in a variety J3 T 1 0 J2 0 of contexts, such as linear prediction and signal modeling, fast matrix factorizations, filter synthesis, inverse scattering, and other applications [987–1007]. where J2 is the lower order reversing matrix. Combining Eq. (12.9.15) with Eq. (12.9.11), In linear prediction, Schur’s algorithm is an efficient alternative to Levinson’s al- we find gorithm and can be used to compute the set of reflection coefficients from the auto- ⎡ ⎤ 1 1 correlation lags and also to compute the conventional LU Cholesky factorization of the −1 + R RT R T R2 α3 α3 α3 0 1 ⎢ ⎥ 0 J2 autocorrelation matrix. The Schur algorithm is essentially the gapped function recursion −1 = −1 = ⎢ E3 E3 ⎥ R3 J3R3 J3 ⎣ ⎦ T (12.3.9). It proves convenient to work simultaneously with Eq. (12.3.9) and its reverse. J2 0 1 RT 1 1 0 α3 E3 E3 We define the forward and backward gapped functions of order p + + − − R = g (k)= E[e (n)y − ], g (k)= E[e (n)y − ] (12.10.1) or, since R2 commutes with J2, and J2α3 α3, we have p p n k p p n k ⎡ ⎤ 1 1 T The forward one is identical to that of Eq. (12.3.8). The backward one is the convo- ⎢ α3 ⎥ T = R − ⎢ E3 E3 ⎥ 0 0 1 lution of the backward filter bp ap with the autocorrelation function; that is, R 1 = = + a aT (12.9.16) 3 ⎣ 1 1 ⎦ −1 3 3 −1 + T 0 R2 E3 α3 R2 α3α3 p p E3 E3 + = − − = − gp (k) apiR(k i) , gp (k) bpiR(k i) (12.10.2) −1 i=0 i=0 which is the same as Eq. (1.8.35). Both ways of expressing R3 given by Eqs. (12.9.16) and (12.9.11), are useful. They may be combined as follows: Eq. (12.9.16) gives for the where bpi = ap,p−i. In the z-domain, we have ijth matrix element: + = − = R Gp (z) Ap(z)Syy(z) , Gp (z) Ap (z)Syy(z) (12.10.3) −1 = −1 + T 1 = −1 + −1 (R3 )ij (R2 α3α3 E3)i−1,j−1 (R2 )i−1,j−1 α3iα3jE3 −1 Using Syy(z)= Syy(z ), it follows that which valid for 1 ≤ i, j ≤ 3. On the other hand, from Eq. (12.9.11) we have − = R = −p −1 −1 = −p + −1 Gp (z) Ap (z)Syy(z) z Ap(z )Syy(z ) z Gp (z ) −1 −1 R R −1 (R ) − − = (R ) − − +α α E 3 i 1,j 1 2 i 1,j 1 3i 3j 3 and in the time domain: − = + − which is valid also for 1 ≤ i, j ≤ 3. Subtracting the two to cancel the common term gp (k) gp (p k) (12.10.4) −1 (R2 )i−1,j−1, we obtain the Goberg-Semencul-Trench-Zohar recursion [977–981]: Thus, the backward gapped function is the reflected and delayed version of the forward one. However, the delay is only p units—one less than required to completely −1 = −1 + T − R RT −1 ≤ ≤ (R3 )ij (R3 )i−1,j−1 (αα3α3 α3 α3 )ijE3 , 1 i, j 3 (12.9.17) align the gaps. Therefore, the forward and backward gapped functions have slightly

−1 different gaps of length p; namely, which allows the building-up of R3 along each diagonal, provided one knows the “bound- ary” values to get these recursions started. But these are: + = = gp (k) 0 , for k 1, 2,...,p (12.10.5) −1 = −1 −1 = −1 = −1 ≤ ≤ − = = − (R3 )00 E3 ,(R3 )i0 (R3 )0i a3iE3 , 1 i, j 3 (12.9.18) gp (k) 0 , for k 0, 1,...,p 1
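A short sketch of how the recursion (12.9.17), started from the boundary values (12.9.18), builds up the inverse autocorrelation matrix from the highest-order prediction-error filter and its error. This is an illustrative Python fragment only, written for the real symmetric case with hypothetical names; the filter and error used in the check are those of Examples 12.3.1 and 12.10.1.

    import numpy as np
    from scipy.linalg import toeplitz

    def toeplitz_inverse(a, E):
        """Gohberg-Semencul-Trench-Zohar build-up of R^{-1} from the order-M
        prediction-error filter a = [1, a1, ..., aM] and the error E,
        using Eqs. (12.9.17)-(12.9.18)."""
        a = np.asarray(a, dtype=float)
        M = len(a) - 1
        atil = np.concatenate([[0.0], a[:0:-1]])   # atil[i] = a_{M+1-i}, i >= 1
        Rinv = np.zeros((M + 1, M + 1))
        Rinv[0, :] = a / E                         # boundary values (12.9.18)
        Rinv[:, 0] = a / E
        for i in range(1, M + 1):
            for j in range(1, M + 1):
                Rinv[i, j] = Rinv[i - 1, j - 1] + (a[i] * a[j] - atil[i] * atil[j]) / E
        return Rinv

    a4, E4 = [1, -0.25, -0.1875, 0.5, -0.5], 40.5    # order-4 filter and error of Example 12.3.1
    Rinv = toeplitz_inverse(a4, E4)
    print(np.allclose(Rinv @ toeplitz([128, -64, 80, -88, 89]), np.eye(5)))   # True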

Thus, from the prediction-error filter a3 and its reverse, the entire inverse of the By the definition (12.10.1), the gap conditions of the backward function are equiva- autocorrelation matrix may be built up. Computationally, of course, the best procedure lent to the orthogonality conditions for the backward predictor; namely, that the esti- − { = − } is to use Eq. (12.9.8), where L and D are obtained as byproducts of the Levinson re- mation error ep (n) be orthogonal to the observations yn−k,k 0, 1,...,p 1 that cursion. The function lev of the appendix starts with the M + 1 autocorrelation lags make up the estimate of yn−p. Inserting the lattice recursions (12.7.7) into (12.10.1), or {R(0), R(1),...,R(M)} and generates the required matrices L and D. The main reason using the polynomial recursions (12.3.18) into (12.10.3), we obtain the lattice recursions for the existence of fast algorithms for Toeplitz matrices can be traced to the nesting for the gapped functions, known as the Schur recursions property that the principal submatrices of a Toeplitz matrix are simply the lower order + = + − − − gp+1(k) gp (k) γp+1gp (k 1) Toeplitz submatrices. Similar fast algorithms have been developed for other types of (12.10.6) − = − − − + structured matrices, such as Hankel and Vandermonde matrices [982–984]. gp+1(k) gp (k 1) γp+1gp (k) 548 12. Linear Prediction 12.10. Schur Algorithm 549

− = − − − T or, in matrix form where e (n) e0 (n), e1 (n), ...,eM(n) is the decorrelated vector of backward pre- diction errors. Following Eq. (1.8.57), we multiply (12.10.12) from the left by the lower + + g (k) 1 −γ + g (k) − p+1 = p 1 p triangular matrix L, and using the transformation e (n)= Ly(n) and Eq. (12.9.5), we − − − − gp+1(k) γp+1 1 gp (k 1) obtain − − − LG = LE[y(n)e (n)T]= E[e (n)e (n)T]= D ± = They are initialized by g0 (k) R(k). The first term of Eq. (12.10.6) is identical to Eq. (12.3.9) and the second term is the reverse of Eq. (12.3.9) obtained by the substitution Therefore, G is essentially the inverse of L + k → p+1−k. The forward gap condition g (p+1)= 0 can be solved for the reflection − p+1 G = L 1D (12.10.13) coefficient g+(p + 1) p Using Eq. (12.9.1), we obtain the conventional LU Cholesky factorization of the auto- γp+1 = − (12.10.7) gp (p) correlation matrix R in the form − + Note that Eq. (12.10.4) implies g (p)= g (0)= E , and therefore, Eq. (12.10.7) is − − − − − p p p R = L 1DL T = (GD 1)D(D 1GT)= GD 1GT (12.10.14) the same as Eq. (12.3.11). For an Mth order predictor, we only need to consider the ± = values gp (k), for k 0, 1,...,M. We arrange these values (for the backward function) The backward gapped functions are computed by iterating the Schur recursions into the column vector ⎡ ⎤ ≤ ≤ ≤ ≤ − (12.10.6) for 0 k M and 0 p M. One computational simplification is that, g (0) ± ⎢ p ⎥ because of the presence of the gap, the functions g (k) need only be computed for ⎢ − ⎥ p ⎢ gp (1) ⎥ ≤ ≤ + = − = ⎢ ⎥ p k M (actually, gp (p) 0 could also be skipped). This gives rise to the Schur gp ⎢ . ⎥ (12.10.8) ⎣ . ⎦ algorithm: − gp (M) ± = = 0. Initialize in order by g0 (k) R(k), k 0, 1,...,M. = − ± ≤ ≤ By virtue of the gap conditions (12.10.5), the first p entries, k 0, 1,...,p 1, of 1. At stage p, we have available gp (k) for p k M. + + this vector are zero. Therefore, we may construct the lower-triangular matrix having the gp (p 1) − 2. Compute γp+1 = − . gp s as columns gp (p) = − − ··· − G [g0 , g1 , , gM] (12.10.9) 3. For p + 1 ≤ k ≤ M, compute For example, if M = 3, + = + − − − gp+1(k) gp (k) γp+1gp (k 1) − ⎡ − ⎤ = − − − + gp+1(k) gp (k 1) γp+1gp (k) g0 (0) 000 ⎢ − − ⎥ ⎢ g0 (1)g1 (1) 00⎥ + G = ⎢ − − − ⎥ 4. Go to stage p 1. ⎣ ⎦ − g0 (2)g1 (2)g2 (2) 0 = − − − − 5. At the final order M, set EM gM(M). g0 (3)g1 (3)g2 (3)g3 (3) The function schur is an implementation of this algorithm. The inputs to the func- The first column of G consists simply of the M + 1 autocorrelation lags: tion are the order M and the lags {R(0), R(1),...,R(M)}. The outputs are the pa- ⎡ ⎤ rameters {EM,γ1,γ2,...,γM}. This function is a simple alternative to lev. It may be R(0) ⎢ ⎥ ⎢ ⎥ used in conjunction with frwlev, bkwlev, and rlev, to pass from one linear prediction ⎢ R(1) ⎥ − = ⎢ ⎥ parameter set to another. The function schur1 is a small modification of schur that, in g0 ⎢ . ⎥ (12.10.10) ⎣ . ⎦ addition to the reflection coefficients, outputs the lower triangular Cholesky factor G. R(M) The prediction errors can be read off from the main diagonal of G, that is, EP = G(p, p), p = 0, 1,...,M. The main diagonal consists of the prediction errors of successive orders, namely, − = = Example 12.10.1: Sending the five autocorrelation lags, {128, −64, 80, −88, 89}, of Example gp (p) Ep, for p 0, 1,...,M. 
Stacking the values of definition (12.10.1) into a vector, 12.3.1 through schur1 gives the set of reflection coefficients {γ1,γ2,γ3,γ4}={−0.5, 0.5, we can write compactly, − −0.5, 0.5}, and the matrix G g = E ep(n)y(n) (12.10.11) p ⎡ ⎤ T 128 0 0 0 0 where y(n)= [yn,yn−1,...,yn−M] is the data vector for an Mth order predictor. Thus, ⎢ ⎥ ⎢ − ⎥ ⎢ 64 96 0 0 0 ⎥ the matrix G can be written as in Eq. (1.8.56) ⎢ ⎥ G = ⎢ 80 −24 72 0 0 ⎥ ⎢ ⎥ ⎣ −88 36 0 54 0 ⎦ = − − − = − T G E y(n) e0 (n), e1 (n),...,eM(n) E y(n)e (n) (12.10.12) 89 −43.513.513.540.5 550 12. Linear Prediction 12.10. Schur Algorithm 551

Recall that the first column should be the autocorrelation lags and the main diagonal should Example 12.10.2: Send the autocorrelation lags of Example 12.10.1 into the lattice filter of consist of the mean square prediction errors. It is easily verified that GD−1GT = R.  Fig. 12.7.1 (with all its delay registers initialized to zero), arrange the forward/backward ± outputs from the pth section into the column vectors, yp , and put these columns together The computational bottleneck of the classical Levinson recursion is the computation to form the output matrices Y±. The result is, of the inner product (12.3.12). The Schur algorithm avoids this step by computing γp+1 ⎡ ⎤ ⎡ ⎤ 128 64 −64 64 −64 128 128 128 128 128 as the ratio of the two gapped function values (12.10.7). Moreover, at each stage p, the ⎢ ⎥ ⎢ ⎥ ⎢ − − ⎥ ⎢ − − − − ⎥ ⎢ 64 96 64 80 96 ⎥ ⎢ 64 0 32 64 96 ⎥ computations indicated in step 3 of the algorithm can be done in parallel. Thus, with − ⎢ ⎥ + ⎢ ⎥ Y = ⎢ 80 −24 72 64 −96 ⎥ ,Y= ⎢ 80 48 0 32 72 ⎥ M parallel processors, the overall computation can be reduced to O(M) operations. As ⎢ ⎥ ⎢ ⎥ ⎣ −88 36 0 54 64 ⎦ ⎣ −88 −48 −36 0 −32 ⎦ formulated above, the Schur algorithm is essentially equivalent to the Le Roux-Gueguen 89 −43.513.513.540.5 89 45 27 27 0 fixed-point algorithm [990]. The possibility of a fixed-point implementation arises from the fact that all gapped functions have a fixed dynamic range, bounded by − ± The lower-triangular part of Y agrees with G. The forward/backward outputs yp can be    ±  computed using, for example, the function lattice. They can also be computed directly by g (k) ≤ R(0) (12.10.15) p convolving the prediction filters with the input. For example, the backward filter of order 4 given in Example 12.3.1 is aR = [−0.5, 0.5, −0.1875, −0.25, 1]T. Convolving it with the This is easily seen by applying the Schwarz inequality to definition (12.10.1) and 4 autocorrelation sequence gives the last column of Y− using Ep ≤ R(0)     [128, −64, 80, −88, 89]∗[−0.5, 0.5, −0.1875, −0.25, 1]= [−64, 96, −96, 64, 40.5,... ]  ± 2 =  ± 2 ≤ ± 2 2 ≤ ≤ 2 gp (k) E[ep (n)yn−k] E[ep (n) ]E[yn−k] EpR(0) R(0) Convolving the forward filter a with the autocorrelation sequence gives the last column The Schur algorithm admits a nice filtering interpretation in terms of the lattice struc- 4 of the matrix Y+ ture. By definition, the gapped functions are the convolution of the forward/backward ± pth order prediction filters with the autocorrelation sequence R(k). Therefore, gp (k) [128, −64, 80, −88, 89]∗[1, −0.25, −0.1875, 0.5, −0.5]= [128, −96, 72, −32, 0,... ] will be the outputs from the pth section of the lattice filter, Fig. 12.7.1, driven by the input R(k). Moreover, Eq. (12.10.6) states that the (p + 1)st reflection coefficient is Note that we are interested only in the outputs for 0 ≤ k ≤ M = 4. The last 4 outputs (in obtainable as the ratio of the two inputs to the (p + 1)st lattice section, at time instant general, the last p outputs for a pth order filter) of these convolutions were not shown. + − = − + − They correspond to the transient behavior of the filter after the input is turned off.  p 1 (note that gp (p) gp (p 1 1) is outputted at time p from the pth section and is delayed by one time unit before it is inputted to the (p + 1)st section at time p + 1.) 
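The algorithm listed above is short enough to sketch directly. The following Python version is illustrative only (the text's implementations are the MATLAB functions schur and schur1; schur_like is a hypothetical name). It returns the reflection coefficients, the final error E_M, and the lower-triangular Cholesky factor G of Eq. (12.10.13), and reproduces the numbers of Example 12.10.1.

    import numpy as np

    def schur_like(R):
        """Schur algorithm on the lags R[0..M]: returns (gammas, E_M, G), where the
        columns of G are the backward gapped functions, so that R = G D^{-1} G^T."""
        R = np.asarray(R, dtype=float)
        M = len(R) - 1
        gplus = R.copy()                     # g0+(k) = R(k)
        gminus = R.copy()                    # g0-(k) = R(k)
        G = np.zeros((M + 1, M + 1))
        G[:, 0] = gminus
        gammas = np.zeros(M)
        for p in range(M):
            gamma = gplus[p + 1] / gminus[p]             # Eq. (12.10.7)
            gammas[p] = gamma
            shifted = np.r_[0.0, gminus[:-1]]            # g_p^-(k-1)
            gplus, gminus = gplus - gamma * shifted, shifted - gamma * gplus   # Eq. (12.10.6)
            gminus[:p + 1] = 0.0                         # gap: g-_{p+1}(k) = 0 for k <= p
            G[:, p + 1] = gminus
        return gammas, gminus[M], G

    gammas, EM, G = schur_like([128, -64, 80, -88, 89])   # lags of Example 12.3.1
    print(gammas)        # [-0.5, 0.5, -0.5, 0.5]
    print(np.diag(G))    # prediction errors E_p: [128, 96, 72, 54, 40.5]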
± It is also possible to derive a split or immitance-domain version of the Schur al- The correct values of the gapped functions gp (k) are obtained when the input to the lattice filter is the infinite double-sided sequence R(k). If we send in the finite causal gorithm that achieves a further 50% reduction in computational complexity [962,963]. sequence Thus, with M parallel processors, the complexity of the Schur algorithm can be reduced x(k)={R(0), R(1),...,R(M),0, 0,...} to O(M/2) operations. We define a symmetrized or split gapped function in terms of the symmetric polynomial Fp(z) defined in Eq. (12.6.1) then, because of the initial and final transient behavior of the filter, the outputs of the pth section will agree with g±(k) only for p ≤ k ≤ M. To see this, let y±(k) denote the p p p = − = two outputs. Because of the causality of the input and filter and the finite length of the gp(k) fpi R(k i) , Gp(z) Fp(z)Syy(z) (12.10.16) = input, the convolutional filtering equation will be i 0 It can be thought of as the output of the filter F (z) driven by the autocorrelation min{p,k} min{p,k} p + = − = − sequence. Multiplying both sides of Eq. (12.6.1) by Syy(z) and using the definition yp (k) api x(k i) api R(k i) = + + −1 − i=max{0,k−M} i=max{0,k−M} (12.10.3), we obtain Gp(z) Gp−1(z) z Gp−1(z), or, in the time domain

This agrees with Eq. (12.10.2) only after time p and before time M, that is, = + + − − gp(k) gp−1(k) gp−1(k 1) (12.10.17) ± = ± ≤ ≤ yp (k) gp (k) , only for p k M Similarly, Eq. (12.6.2) gives T The column vector y− = y−(0), y−(1),...,y−(M) , formed by the first M back- − = + + − p p p p (1 γp)gp(k) gp (k) gp (k) (12.10.18) − ≤ ≤ ward output samples of the pth section, will agree with gp only for the entries p k − = − − − M. Thus, the matrix of backward outputs Y [y0 , y1 ,...,yM] formed by the columns It follows from Eqs. (12.10.4) and (12.10.18) or from the symmetry property of Fp(z) − = − = yp will agree with G only in its lower-triangular part. But this is enough to determine that gp(k) gp(p k), and in particular, gp(0) gp(p). The split Levinson algorithm of G because its upper part is zero. Sec. 12.6 requires the computation of the coefficients αp+1 = τp+1/τp. Setting k = 0in 552 12. Linear Prediction 12.11. Lattice Realizations of FIR Wiener Filters 553

= − ± the definition (12.10.16) and using the reflection symmetry R(i) R( i), we recognize Gp (z) is an all-pass stable transfer function, otherwise known as a lossless bounded real that the inner product of Eq. (12.6.6) is τp = gp(0)= gp(p). Therefore, the coefficient function [971]: αp+1 can be written as the ratio of the two gapped function values − R −1 −p Gp (z) Ap (z) app + ap,p−1z +···+z S (z)= = = (12.10.22) + p + −1 −p gp+1(p 1) Gp (z) Ap(z) 1 + ap1z +···+appz αp+1 = (12.10.19) gp(p) The all-pass property follows from the fact that the reverse polynomial AR(z) has Because the forward and backward gapped functions have overlapping gaps, it fol- the same magnitude response as Ap(z). The stability property follows from the minimum- = = − lows that gp(k) will have gap gp(k) 0, for k 1, 2,...,p 1. Therefore, for an Mth phase property of the polynomials Ap(z), which in turn is equivalent to all reflection order predictor, we only need to know the values of gp(k) for p ≤ k ≤ M. These coefficients having magnitude less than one. Such functions satisfy the boundedness can be computed by the following three-term recurrence, obtained by multiplying the property     recurrence (12.6.4) by Syy(z) Sp(z) ≤ 1 , for |z|≥1 (12.10.23) with equality attained on the unit circle. Taking the limit z →∞, it follows from gp+2(k)= gp+1(k)+gp+1(k − 1)−αp+1gp(k − 1) (12.10.20) Eq. (12.10.22) that the reflection coefficient γp is obtainable from Sp(z) by −1 Using F0(z)= 2 and F1(z)= 1 + z , it follows from the definition that g0(k)= Sp(∞)= app =−γp (12.10.24) 2R(k) and g1(k)= R(k)+R(k−1). To initialize τ0 correctly, however, we must choose g (0)= R(0), so that τ = g (0)= R(0). Thus, we are led to the following split Schur 0 0 0 Using the backward Levinson recursion reqs5.3.23, we obtain a new all-pass function algorithm: G− (z) AR (z) z(γ A + AR) = = + − = = p−1 = p−1 = p p p 0. Initialize by g0(k) 2R(k), g1(k) R(k) R(k 1), for k 1, 2,...,M, and Sp−1(z) + R Gp−1(z) Ap−1(z) Ap + γpAp g0(0)= R(0), γ0 = 0. ≤ ≤ + ≤ 1. At stage p, we have available γp, gp(k) for p k M, and gp+1(k) for p 1 or, dividing numerator and denominator by Ap(z) k ≤ M. S (z)+γ 2. Compute α + from Eq. (12.10.19) and solve for γ + =−1 + α + /(1 − γ ). p p p 1 p 1 p 1 p Sp−1(z)= z (12.10.25) 1 + γpSp(z) 3. For p + 2 ≤ k ≤ M, compute gp+2(k) using Eq. (12.10.20) 4. Go to stage p + 1. This is Schur’s original recursion [985]. Applying this recursion repeatedly from

some initial value p = M down to p = 0, with S0(z)= 1, will give rise to the set of Recalling that E = τ (1 − γ ), we may set at the final order E = τ (1 − γ )= p p p M M M reflection or Schur coefficients {γ1,γ2,...,γM}. The starting all-pass function SM(z) − gM(M)(1 γM). Step 3 of the algorithm requires only one multiplication for each k, will be stable if and only if all reflection coefficients have magnitude less than one. We whereas step 3 of the ordinary Schur algorithm requires two. This reduces the compu- note finally that there is an intimate connection between the Schur algorithm and inverse tational complexity by 50%. The function schur2 (see Appendix B) is an implemen- scattering problems [991,995,1001,1002,1005–1007,1053]. tation of this algorithm. The inputs to the function are the order M and the lags In Sec. 12.13, we will see that the lattice recursions (12.10.6) describe the forward {R(0), R(1),...,R(M)}. The outputs are the parameters {E ,γ ,γ ,...,γ }. The M 1 2 M and backward moving waves incident on a layered structure. The Schur function Sp(z) function can be modified easily to include the computation of the backward gapped will correspond to the overall reflection response of the structure, and the recursion functions g−(k), which are the columns of the Cholesky matrix G. This can be done by p (12.10.25) will describe the successive removal of the layers. The coefficients γp will the recursion represent the elementary reflection coefficients at the layer interfaces. This justifies the − = − − + − − gp (k) gp (k 1) (1 γp)gp(k) gp+1(k) (12.10.21) term reflection coefficients for the γs. + ≤ ≤ − = = − where p 1 k M, with starting value gp (p) Ep gp(p)(1 γp). This recur- sion will generate the lower-triangular part of G. Equation (12.10.21) follows by writing 12.11 Lattice Realizations of FIR Wiener Filters Eq. (12.10.17) for order (p + 1) and subtracting it from Eq. (12.10.18). Note, also, that Eq. (12.10.17) and the bound (12.10.15) imply the bound |g (k)|≤2R(0), which allows p In this section, we combine the results of Sections 11.3 and 12.9 to derive alternative a fixed-point implementation. realizations of Wiener filters that are based on the Gram-Schmidt lattice structures. We finish this section by discussing the connection of the Schur algorithm to Schur’s Consider the FIR Wiener filtering problem of estimating a desired signal xn, on the basis original work. It follows from Eq. (12.10.3) that the ratio of the two gapped functions of the related signal yn, using an Mth order filter. The I/O equation of the optimal filter is given by Eq. (11.3.8). The vector of optimal weights is determined by solving the set of 554 12. Linear Prediction 12.11. Lattice Realizations of FIR Wiener Filters 555 normal equations, given by Eq. (11.3.9). The discussion of the previous section suggests  This is easily recognized as the projection of xn onto the subspace spanned by − − − { that Eq. (11.3.9) can be solved efficiently using the Levinson recursion. Defining the data e0 (n), e1 (n),...,eM(n) , which is the same as that spanned by the data vector yn, vector ⎡ ⎤ yn−1,...,yn−M}. Indeed, it follows from Eqs. (12.11.7) and (12.11.2) that y ⎢ n ⎥ ⎢ ⎥ T = T −1 = T T −1 −1 ⎢ yn−1 ⎥ g h L E[xny(n) ]E[y(n)y(n) ] L = ⎢ ⎥ y(n) ⎢ . ⎥ (12.11.1) . − T −T −1 − − T −T −1 −1 ⎣ . ⎦ = E[xne (n) ]L L E[e (n)e (n) ]L L yn−M − T − − T −1 = E[xne (n) ]E[e (n)e (n) ] we rewrite Eq. 
(11.3.9) in the compact matrix form = − − − E[xne0 (n)]/E0,E[xne1 (n)]/E1,..., E[xneM(n)]/EM Ryyh = rxy (12.11.2) so that the estimate of xn can be expressed as where Ryy is the (M +1)×(M +1) autocorrelation matrix of y(n), and rxy, the (M +1)- − T − − T −1 − T T −1 vector of cross-correlations between xn, and y(n), namely, xˆn = E[xne (n) ]E[e (n)e (n) ] e (n)= E[xny(n) ]E[y(n)y(n) ] y(n) ⎡ ⎤ R (0) ⎢ xy ⎥ The key to the lattice realization of the optimal filtering equation (12.11.8) is the ⎢ ⎥ ⎢ Rxy(1) ⎥ observation that the analysis lattice filter of Fig. 12.7.1 for the process y , provides, in R = E y(n)y(n)T , r = E[x y(n)]= ⎢ ⎥ (12.11.3) n yy xy n ⎢ . ⎥ − ⎣ . ⎦ its successive lattice stages, the signals ep (n) which are required in the sum (12.11.8). Thus, if the weight vector g is known, an alternative realization of the optimal filter will Rxy(M) be as shown in Fig. 12.11.1. By comparison, the direct form realization using Eq. (12.11.5) and h is the (M + 1)-vector of optimal weights operates directly on the vector y(n), which, at each time instant n, is available at the ⎡ ⎤ tap registers of the filter. This is depicted in Fig. 12.11.2. h ⎢ 0 ⎥ ⎢ ⎥ Both types of realizations can be formulated adaptively, without requiring prior ⎢ h1 ⎥ h = ⎢ ⎥ (12.11.4) knowledge of the filter coefficients or the correlation matrices Ryy and rxy. We will ⎢ . ⎥ ⎣ . ⎦ discuss adaptive implementations in Chap. 16. If Ryy and rxy are known, or can be es- hM timated, then the design procedure for both the lattice and the direct form realizations is implemented by the following three steps: The I/O equation of the filter, Eq. (12.9.4), is

T 1. Using Levinson’s algorithm, implemented by the function lev, perform the LU xˆn = h y(n)= h0yn + h1yn−1 +···+hMyn−M (12.11.5) Cholesky factorization of Ryy, to determine the matrices L and D. Next, consider the Gram-Schmidt transformation of Eq. (12.9.4) from the data vector 2. The vector of weights g can be computed in terms of the known quantities L, D, rxy y(n) to the decorrelated vector e−(n): ⎡ ⎤ ⎡ ⎤ as follows: e−(n) y ⎢ 0 ⎥ ⎢ n ⎥ = −T = −T −1 = −T T −1 = −1 − g L h L Ryy rxy L L D L rxy D Lrxy ⎢ e (n) ⎥ ⎢ yn−1 ⎥ − ⎢ 1 ⎥ ⎢ ⎥ e (n)= Ly(n) or, ⎢ ⎥ = L ⎢ ⎥ (12.11.6) ⎢ . ⎥ ⎢ . ⎥ ⎣ . ⎦ ⎣ . ⎦ 3. The vector h can be recovered from g by h = LTg. − eM(n) yn−M The function firw is an implementation of this design procedure. The inputs to Inserting (12.11.6) into (12.11.5), we find the function are the order M and the correlation lags Ryy(0), Ryy(1),...,Ryy(M) T −1 − and Rxy(0), Rxy(1),...,Rxy(M) . The outputs are the quantities L, D, g, and h. The xˆn = h L e (n) estimate (12.11.8) may also be written recursively in the order of the filter. If we denote, Defining the (M + 1)-vector p −T g = L h (12.11.7) = − xˆp(n) giei (n) (12.11.9) we obtain the alternative I/O equation for the Wiener filter: i=0

M we obtain the recursion T − − − − − xˆn = g e (n)= gpe (n)= g0e (n)+g1e (n)+···+gMe (n) (12.11.8) p 0 1 M = + − = p=0 xˆp(n) xˆp−1(n) gpep (n) , p 0, 1,...,M (12.11.10) 556 12. Linear Prediction 12.11. Lattice Realizations of FIR Wiener Filters 557

Fig. 12.11.2 Direct-form realization of FIR Wiener filter.
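The three-step design procedure described above (implemented in the text by the function firw) is compact enough to sketch. The following Python fragment is illustrative only; firw_like is a hypothetical name, and the sample lags used in the demonstration are the ones quoted in the noise-canceling example that follows.

    import numpy as np

    def firw_like(M, Ryy, Rxy):
        """FIR Wiener design: Levinson -> L, D (step 1); g = D^{-1} L r_xy (step 2);
        h = L^T g (step 3)."""
        Ryy = np.asarray(Ryy, dtype=float)[:M + 1]
        rxy = np.asarray(Rxy, dtype=float)[:M + 1]
        a = np.array([1.0])
        E = np.zeros(M + 1); E[0] = Ryy[0]
        L = np.zeros((M + 1, M + 1)); L[0, 0] = 1.0
        for p in range(1, M + 1):                        # step 1: Levinson recursion
            gamma = np.dot(a, Ryy[p:0:-1]) / E[p - 1]
            a = np.r_[a, 0.0] - gamma * np.r_[0.0, a[::-1]]
            E[p] = (1.0 - gamma**2) * E[p - 1]
            L[p, :p + 1] = a[::-1]
        g = (L @ rxy) / E                                # step 2: lattice weights
        h = L.T @ g                                      # step 3: direct-form weights
        return L, np.diag(E), g, h

    Ryy = [2.5116, 1.8909, 1.2914, 0.6509, 0.3696, 0.2412, 0.1363]
    Rxy = [0.7791, -0.3813, 0.0880, -0.3582, 0.0902, -0.0684, 0.0046]
    L, D, g, h = firw_like(6, Ryy, Rxy)
    print(np.round(g, 4))    # compare with the lattice weights quoted in the example
    print(np.round(h, 4))    # compare with the direct-form weights quoted there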

= − E = 2 Using gp E[xnep (n)]/Ep, we find for p E[ep(n) ]

p E = 2 − − = E − − p E[xn] giE[xnei (n)] p−1 gpE[xnep (n)] i=0 Fig. 12.11.1 Lattice realization of FIR Wiener filter. = E − − 2 = E − 2 p−1 E[xnep (n)] /Ep p−1 gpEp E E = Thus, p is smaller than p−1. This result shows explicitly how the estimate is con- initialized as xˆ−1(n) 0. The quantity xˆp(n) is the projection of xn on the subspace − − stantly improved as the length of the filter is increased. The nice feature of the lat- spanned by e (n), e (n),...,e−(n) , which by virtue of the lower-triangular nature 0 1 p tice realization is that the filter length can be increased simply by adding more lattice of the matrix L is the same space as that spanned by {yn,yn−1,...,yn−p}. Thus, xˆp(n) sections without having to recompute the weights gp of the previous sections. A re- represents the optimal estimate of xn based on a pth order filter. Similarly, xˆp−1(n) alization equivalent to Fig. 12.11.1, but which shows explicitly the recursive construc- represents the optimal estimate of xn based on the (p − 1)th order filter; that is, based tion (12.11.10) of the estimate of xn and of the estimation error (12.11.11), is shown in on the past p − 1 samples {yn,yn−1,...,yn−p+1}. These two subspaces differ by yn−p. − Fig. 12.11.3. The term ep (n) is by construction the best postdiction error of estimating yn−p from { } − The function lwf is an implementation of the lattice Wiener filter of Fig. 12.11.3. The the samples yn,yn−1,...,yn−p+1 ; that is, ep (n) is the orthogonal complement of yn−p − function dwf implements the direct-form Wiener filter of Fig. 12.11.2. Each call to these projected on that subspace. Therefore, the term gpe (n) in Eq. (12.11.10) represents p functions transforms a pair of input samples {x, y} into the pair of output samples the improvement in the estimate of xn that results by taking into account the additional {x,ˆ e} and updates the internal state of the filter. Successive calls over n = 0, 1, 2,..., past value yn−p; it represents that part of xn that cannot be estimated in terms of the will transform the input sequences {xn,yn} into the output sequences {xˆn,en}. In both subspace {yn,yn−1,...,yn−p+1}. The estimate xˆp(n) of xn is better than xˆp−1(n) in the realizations, the internal state of the filter is taken to be the vector of samples stored sense that it produces a smaller mean-squared estimation error. To see this, define the − in the delays of the filter; that is, w (n)= e (n − 1), p = 1, 2,...,M for the lattice estimation errors in the two cases p p−1 case, and wp(n)= yn−p, p = 1, 2,...,M for the direct-form case. By allowing the filter coefficients to change between calls, these functions can be used in adaptive implemen- ep(n)= xn − xˆp(n) , ep−1(n)= xn − xˆp−1(n) tations. Using the recursion (12.11.10), we find Next, we present a Wiener filter design example for a noise canceling application. The primary and secondary signals x(n) and y(n) are of the form = − − ep(n) ep−1(n) gpep (n) (12.11.11) x(n)= s(n)+v1(n) , y(n)= v2(n) 558 12. Linear Prediction 12.11. Lattice Realizations of FIR Wiener Filters 559

Fig. 12.11.3 Lattice realization of FIR Wiener filter.

where s(n) is a desired signal corrupted by noise v1(n). The signal v2(n) is correlated with v1(n) but not with s(n), and provides a reference noise signal. The noise canceler is to be implemented as a Wiener filter of order M, realized either in the direct or the lattice form. It is shown below:

Its basic operation is that of a correlation canceler; that is, the optimally designed filter H(z) will transform the reference noise v2(n) into the best replica of v1(n), and then proceed to cancel it from the output, leaving a clean signal s(n). For the purpose of the simulation, we took s(n) to be a simple sinusoid

    s(n) = sin(ω0 n) ,   ω0 = 0.075π [rads/sample]

and v1(n) and v2(n) were generated by the difference equations

    v1(n) = −0.5 v1(n−1) + v(n)
    v2(n) = 0.8 v2(n−1) + v(n)

driven by a common, zero-mean, unit-variance, uncorrelated sequence v(n). The difference equations establish a correlation between the two noise components v1 and v2, which is exploited by the canceler to effect the noise cancellation.

Figs. 12.11.4 and 12.11.5 show 100 samples of the signals x(n), s(n), and y(n) generated by a particular realization of v(n).

Fig. 12.11.4 Noise corrupted sinusoid.

Fig. 12.11.5 Reference noise.

For M = 4 and M = 6, the sample autocorrelation and cross-correlation lags, Ryy(k), Rxy(k), k = 0, 1, ..., M, were computed and sent through the function firw to get the filter weights g and h. The reference signal yn was filtered through H(z) to get the estimate x̂n, which is really an estimate of v1(n), and the estimation error e(n) = x(n) − x̂(n), which is really an estimate of s(n). This estimate of s(n) is shown in Figs. 12.11.6 and 12.11.7, for the cases M = 4 and M = 6, respectively. The improvement afforded by a higher order filter is evident.

Fig. 12.11.6 Output of noise canceler (M = 4).

Fig. 12.11.7 Output of noise canceler (M = 6).

For the particular realization of x(n) and y(n) that we used, the sample correlations Ryy(k), Rxy(k), k = 0, 1, ..., M, were:

    Ryy = [2.5116, 1.8909, 1.2914, 0.6509, 0.3696, 0.2412, 0.1363]
    Rxy = [0.7791, −0.3813, 0.0880, −0.3582, 0.0902, −0.0684, 0.0046]

and the resulting vector of lattice weights gp, p = 0, 1, ..., M, reflection coefficients γp, p = 1, 2, ..., M, and direct-form weights hm, m = 0, 1, ..., M were, for M = 6,

    g = [0.3102, −0.8894, 0.4706, −0.2534, 0.1571, −0.0826, 0.0398]
    γ = [0.7528, −0.1214, −0.1957, 0.1444, 0.0354, −0.0937]
    h = [0.9713, −1.2213, 0.6418, −0.3691, 0.2245, −0.1163, 0.0398]

To get the g and γ of the case M = 4, simply ignore the last two entries in the above. The corresponding h is in this case:

    h = [0.9646, −1.2262, 0.6726, −0.3868, 0.1571]

Using the results of Problems 12.25 and 12.26, we may compute the theoretical filter weights for this example, and note that they compare fairly well with the estimated ones that were based on the length-100 data blocks. For M = 6, we have:

    g = [0.2571, −0.9286, 0.4643, −0.2321, 0.1161, −0.0580, 0.0290]
    γ = [0.8, 0, 0, 0, 0, 0]
    h = [1, −1.3, 0.65, −0.325, 0.1625, −0.0812, 0.0290]

As we discussed in Sec. 1.8, the lattice realizations based on the backward orthogonal basis have three major advantages over the direct-form realizations: (a) the filter processes non-redundant information only, and hence adaptive implementations would adapt faster; (b) the design of the optimal filter weights g does not require any matrix inversion; and (c) the lower-order portions of g are already optimal. Moreover, it appears that adaptive versions of the lattice realizations have better numerical properties than the direct-form versions. In array processing problems, because the data vector y(n) does not have the tapped-delay line form (12.11.1), the Gram-Schmidt orthogonalization cannot be done by a simple lattice filter. It requires a more complicated structure that basically amounts to carrying out the lower-triangular linear transformation (12.11.6). The benefits, however, are the same. We discuss adaptive versions of Gram-Schmidt preprocessors for arrays in Chap. 16.

12.12 Autocorrelation, Covariance, and Burg's Methods

As mentioned in Sec. 12.3, the finite order linear prediction problem may be thought of as an approximation to the infinite order prediction problem. For large enough order p of the predictor, the prediction-error filter Ap(z) may be considered to be an adequate approximation to the whitening filter A(z) of the process yn. In this case, the prediction-error sequence e_p^+(n) is approximately white, and the inverse synthesis filter 1/Ap(z) is an approximation to the signal model B(z) of yn. Thus, we have obtained an approximate solution to the signal modeling problem depicted below:

The variance of e_p^+(n) is Ep. Depending on the realization one uses, the model parameters are either the set {ap1, ap2, ..., app; Ep}, or {γ1, γ2, ..., γp; Ep}. Because these can be determined by solving a simple linear system of equations, that is, the normal equations (12.3.7), this approach to the modeling problem has become widespread. In this section, we present three widely used methods of extracting the model parameters from a given block of measured signal values yn [917,920,926,927,1008–1018]. These methods are:

1. The autocorrelation, or Yule-Walker, method
2. The covariance method.
3. Burg's method.

We have already discussed the Yule-Walker method, which consists simply of replacing the theoretical autocorrelations Ryy(k) with the corresponding sample autocorrelations R̂yy(k) computed from the given frame of data. This method, like the other

two, can be justified on the basis of an appropriate least-squares minimization criterion where R(k)ˆ denotes the sample autocorrelation of the length-N data sequence yn: + 2 obtained by replacing the ensemble averages E[ep (n) ] by appropriate time averages. N−1−k The theoretical minimization criteria for the optimal forward and backward predic- R(k)ˆ = R(ˆ −k)= y + y , 0 ≤ k ≤ N − 1 tors are n k n n=0 + 2 = − 2 = E[ep (n) ] min ,E[ep (n) ] min (12.12.1) + − where the usual normalization factor 1/N has been ignored. This equation is identical where ep (n) and ep (n) are the result of filtering yn through the prediction-error fil- ˆ T R T to Eq. (12.12.3) with R replaced by R. Thus, the minimization of the time-average index ter a = [1,a ,...,a ] and its reverse a = [a ,a − ,...,a , 1] , respectively; p1 pp pp p,p 1 p1 (12.12.5) with respect to the prediction coefficients will lead exactly to the same set of namely, normal equations (12.3.7) with R replaced by Rˆ. The positive definiteness of the sample + = + + +···+ ep (n) yn ap1yn−1 ap2yn−2 appyn−p autocorrelation matrix also guarantees that the resulting prediction-error filter will be (12.12.2) − minimum phase, and thus also that all reflection coefficients will have magnitude less e (n) = y − + a y − + + a y − + +···+a y p n p p1 n p 1 p2 n p 2 pp n than one. ± 2. The covariance method replaces Eq. (12.12.1) by the time average Note that in both cases the mean-square value of ep (n) can be expressed in terms of the (p + 1)×(p + 1) autocorrelation matrix N−1 E = + 2= ep (n) min (12.12.6) R(i, j)= R(i − j)= E[y + − y ]= E[y − y − ], 0 ≤ i, j ≤ p n i j n n j n i n=p as follows where the summation in n is such that the filter does not run off the ends of the data + 2 = − 2 = T E[ep (n) ] E[ep (n) ] a Ra (12.12.3) block, as shown below:

Consider a frame of length N of measured values of yn

y0,y1,...,yN−1

1. The Yule-Walker,orautocorrelation, method replaces the ensemble average (12.12.1) by the least-squares time-average criterion To explain the method and to see its potential problems with stability, consider a N+p−1 simple example of a length-three sequence and a first-order predictor: E = + 2= ep (n) min (12.12.4) n=0 + + = where ep (n) is obtained by convolving the length-(p 1) prediction-error filter a T [1,ap1,...,app] with the length-N data sequence yn. The length of the sequence + + + − = + ep (n) is, therefore, N (p 1) 1 N p, which justifies the upper-limit in the summation of Eq. (12.12.4). This convolution operation is equivalent to assuming that the block of data yn has been extended both to the left and to the right by padding it 2 with zeros and running the filter over this extended sequence. The last p output samples + + + E = e (n)2= e (1)2+e (2)2= (y + a y )2+(y + a y )2 e+(n), N ≤ n ≤ N + p − 1, correspond to running the filter off the ends of the data 1 1 1 1 11 0 2 11 1 p n=1 sequence to the right. These terms arise because the prediction-error filter has memory Differentiating with respect to a and setting the derivative to zero gives of p samples. This is depicted below: 11

(y1 + a11y0)y0 + (y2 + a11y1)y1 = 0 y y + y y a =− 1 0 2 1 11 2 + 2 y0 y1

Note that the denominator does not depend on the variable y2 and therefore it is Inserting Eq. (12.12.2) into (12.12.4), it is easily shown that E can be expressed in the possible, if y2 is large enough, for a11 to have magnitude greater than one, making equivalent form the prediction-error filter nonminimal phase. Although this potential stability problem N+p−1 p E = + 2= ˆ − = T ˆ exists, this method has been used with good success in speech processing, with few, ep (n) aiR(i j)aj a Ra (12.12.5) n=0 i,j=0 if any, such stability problems. The autocorrelation method is sometimes preferred in 564 12. Linear Prediction 12.12. Autocorrelation, Covariance, and Burg’s Methods 565 speech processing because the resulting normal equations have a Toeplitz structure and zero we find N−1 + − their solution can be obtained efficiently using Levinson’s algorithm. However, similar ∂E + ∂ep (n) − ∂ep (n) = 2 e (n) + e (n) = 0 ways of solving the covariance equations have been developed recently that are just as p p ∂γp n=p ∂γp ∂γp efficient [1013]. Using the lattice relationships 3. Although the autocorrelation method is implemented efficiently, and the resulting prediction-error filter is guaranteed to be minimum phase, it suffers from the effect of + = + − − − ep (n) ep−1(n) γpep−1(n 1) windowing the data sequence yn, by padding it with zeros to the left and to the right. (12.12.9) − = − − − + This reduces the accuracy of the method somewhat, especially when the data record N ep (n) ep−1(n 1) γpep−1(n) is short. In this case, the effect of windowing is felt more strongly. The proper way both valid for p ≤ n ≤ N − 1 if the filter is not to run off the ends of the data, we find to extend the sequence yn, if it must be extended, is a way compatible with the signal model generating this sequence. Since we are trying to determine this model, the fairest the condition N−1 way of proceeding is to try to use the available data block in a way which is maximally + − − + − + = ep (n)ep−1(n 1) ep (n)ep−1(n) 0 , or, noncommittal as to what the sequence is like beyond the ends of the block. n=p Burg’s method, also known as the maximum entropy method (MEM), arose from the N−1 desire on the one hand not to run off the ends of the data, and, on the other, to always + − − − − − + − − − + + = ep−1(n) γpep−1(n 1) ep−1(n 1) ep−1(n 1) γpep−1(n) ep−1(n) 0 result in a minimum-phase filter. Burg’s minimization criterion is to minimize the sum- n=p squared of both the forward and the backward prediction errors: which can be solved for γp to give N−1 + − − E = e (n)2+e (n)2 = min (12.12.7) N1 p p + − − n=p 2 ep−1(n)ep−1(n 1) n=p γp = (12.12.10) where the summation range is the same as in the covariance method, but with both the N−1 + 2+ − − 2 forward and the reversed filters running over the data, as shown: ep−1(n) ep−1(n 1) n=p

This expression for γp is of the form

2a · b γp = |a|2 +|b|2

where a and b are vectors. Using the Schwarz inequality, it is easily verified that γp has magnitude less than one. Equations (12.12.8) through (12.12.10) define Burg’s method. If the minimization is performed with respect to the coefficients api, it is still possi- The computational steps are summarized below: ble for the resulting prediction-error filter not to be minimum phase. Instead, Burg sug- gests an iterative procedure: Suppose that the prediction-error filter [1,ap−1,1,ap−1,2, 0. Initialize in order as follows: ...,ap−1,p−1] of order (p − 1) has already been determined. Then, to determine the N−1 prediction-error filter of order p, one needs to know the reflection coefficient γp and to + − 1 e (n)= e (n)= y , for 0 ≤ n ≤ N − 1 , and A (z)= 1,E = y2 apply the Levinson recursion: 0 0 n 0 0 N n n=0 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 1. At stage (p − 1), we have available the quantities: ⎢ a ⎥ ⎢ a − ⎥ ⎢ a − − ⎥ ⎢ p1 ⎥ ⎢ p 1,1 ⎥ ⎢ p 1,p 1 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ap2 ⎥ ⎢ ap−1,2 ⎥ ⎢ ap−1,p−2 ⎥ ± − ≤ ≤ − ⎢ ⎥ = ⎢ ⎥ − ⎢ ⎥ Ap−1(z), Ep−1, and ep−1(n), for p 1 n N 1 ⎢ . ⎥ ⎢ . ⎥ γp ⎢ . ⎥ (12.12.8) ⎢ . ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ap,p−1 ap−1,p−1 ap−1,1 2. Using Eq. (12.12.10), compute the reflection coefficient γp.

app 0 1 3. Using (12.12.8), compute Ap(z). 4. Using (12.12.9), compute e±(n), for p ≤ n ≤ N − 1. To guarantee the minimum-phase property, the reflection coefficient γp must have p = − 2 magnitude less than one. The best choice for γp is that which minimizes the perfor- 5. Update the mean-square error by Ep (1 γp)Ep−1. mance index (12.12.7). Differentiating with respect to γp and setting the derivative to 6. Go to stage p. 566 12. Linear Prediction 12.12. Autocorrelation, Covariance, and Burg’s Methods 567

The function burg is an implementation of this method. The inputs to the function are the vector of data samples {y0,y1,...,yN−1} and the desired final order M of the predictor. The outputs are all the prediction-error filters of order up to M, arranged as usual into the lower triangular matrix L, and the corresponding mean-square prediction errors {E0,E1,...,EM}.
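The steps above translate directly into code. The following Python sketch is illustrative only (the text's implementation is the MATLAB function burg; burg_like is a hypothetical name); it reproduces the model of Example 12.12.1 below.

    import numpy as np

    def burg_like(y, M):
        """Burg's method on the data block y[0..N-1]: returns the order-M
        prediction-error filter, the reflection coefficients, and E_0..E_M."""
        y = np.asarray(y, dtype=float)
        N = len(y)
        ef = y.copy()                        # e+_0(n) = y_n
        eb = y.copy()                        # e-_0(n) = y_n
        a = np.array([1.0])                  # A_0(z)
        E = [np.mean(y**2)]                  # E_0 = (1/N) sum y_n^2 (step 0)
        gammas = []
        for p in range(1, M + 1):
            # Eq. (12.12.10), with the sums running over p <= n <= N-1
            num = 2.0 * np.dot(ef[p:], eb[p - 1:N - 1])
            den = np.dot(ef[p:], ef[p:]) + np.dot(eb[p - 1:N - 1], eb[p - 1:N - 1])
            gamma = num / den
            gammas.append(gamma)
            a = np.r_[a, 0.0] - gamma * np.r_[0.0, a[::-1]]        # Eq. (12.12.8)
            ef_new = ef[p:] - gamma * eb[p - 1:N - 1]              # Eq. (12.12.9)
            eb_new = eb[p - 1:N - 1] - gamma * ef[p:]
            ef = np.r_[np.zeros(p), ef_new]      # keep absolute time indexing n = 0..N-1
            eb = np.r_[np.zeros(p), eb_new]
            E.append((1.0 - gamma**2) * E[-1])   # step 5
        return a, np.array(gammas), np.array(E)

    y = [4.684, 7.247, 8.423, 8.650, 8.640, 8.392]     # data of Example 12.12.1
    a, gammas, E = burg_like(y, 2)
    print(np.round(a, 3), np.round(E, 3))    # approximately [1, -1.757, 0.779] and E_2 = 0.60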

Example 12.12.1: The length-six block of data

yn = [4.684, 7.247, 8.423, 8.650, 8.640, 8.392]

for n = 0, 1, 2, 3, 4, 5, is known to have been generated by sending zero-mean, unit-variance,

white-noise n through the difference equation Fig. 12.12.1 LPC analysis and synthesis of speech.

yn − 1.70 yn−1 + 0.72 yn−2 = εn

−1 A1(z) = 1 − 0.987z ,E1 = 1.529 Now, suppose the sequence to be tested, yT(n), is filtered through the whitening

−1 −2 filter of the reference signal A2(z) = 1 − 1.757z + 0.779z ,E2 = 0.60

−1 −2 We note that the theoretical first-order filter obtained from A2(z)= 1 − 1.70z + 0.72z −1 via the backward Levinson recursion is A1(z)= 1 − 0.9884 z .  resulting in the output signal eT(n). The mean output power is easily expressed as The resulting set of LPC model parameters, from any of the above analysis methods, π π † dω   dω can be used in a number of ways as suggested in Sec. 1.13. One of the most successful 2 = = =  2 E[eT(n) ] aRRTaR SeTeT (ω) AR(ω) SyTyT (ω) applications has been to the analysis and synthesis of speech [920,1019–1027]. Each −π 2π −π 2π frame of speech, of duration of the order of 20 msec, is subjected to the Yule-Walker π   σ2  2 T dω = AR(ω)   analysis method to extract the corresponding set of model parameters. The order M −  2 π AT(ω) 2π of the predictor is typically 10–15. Pitch and voiced/unvoiced information are also extracted. The resulting set of parameters represents that speech segment. where RT is the autocorrelation matrix of yT(n). On the other hand, if yT(n) is filtered To synthesize the segment, the set of model parameters are recalled from memory through its own whitening filter, it will produce T(n). Thus, in this case and used in the synthesizer to drive the synthesis filter. The latter is commonly realized † σ2 = E[ (n)2]= a R a as a lattice filter. Lattice realizations are preferred because they are much better well- T T T T T behaved under quantization of their coefficients (i.e., the reflection coefficients) than the direct-form realizations [920,1023,1024]. A typical speech analysis and synthesis It follows that   2 † π  2 system is shown in Fig. 12.12.1. E[eT(n) ] aRRTaR AR(ω) dω = † =   (12.12.11) E[ (n)2] −  2 2π Linear predictive modeling techniques have also been applied to EEG signal process- T aTRTaT π AT(ω) ing in order to model EEG spectra, to classify EEGs automatically, to detect EEG transients The log of this quantity is Itakura’s LPC distance measure that might have diagnostic significance, and to predict the onset of epileptic seizures ! !   2 † π  2 [1028–1035]. E[eT(n) ] aRRTaR AR(ω) dω d(aT, aR)= log = log † = log   LPC methods have been applied successfully to signal classification problems such E[ (n)2] −  2 2π T aTRTaT π AT(ω) as speech recognition [1022,1036–1041] or the automatic classification of EEGs [1032]. Distance measures between two sets of model parameters extracted from two signal In practice, the quantities aT, RT, and aR are extracted from a frame of yT(n) and a frames can be used as measures of similarity between the frames. Itakura’s LPC distance frame of yR(n). If the model parameters are equal, the distance is zero. This distance 568 12. Linear Prediction 12.13. Dynamic Predictive —Waves in Layered Media 569 measure effectively provides a comparison between the two spectra of the processes The seismic problem is somewhat different. Here it is not the transmitted wave that yT and yR, but instead of comparing them directly, a prewhitening of yT(n) is carried is experimentally accessible, but rather the overall reflected wave: out by sending it through the whitening filter of the other signal. If the two spectra are close, the filtered signal eT(n) will be close to white—that is, with a spectrum close to being flat; a measure of this flatness is precisely the above integrated spectrum of Eq. (12.12.11).
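A sketch of this distance computation, with both sets of model parameters extracted by the autocorrelation method, is given below. It is illustrative Python only, with hypothetical names, and is not the text's code.

    import numpy as np
    from scipy.linalg import toeplitz

    def lpc_autocorr(y, M):
        """Order-M LPC by the autocorrelation (Yule-Walker) method; returns the
        prediction-error filter a and the sample autocorrelation lags R[0..M]."""
        y = np.asarray(y, dtype=float)
        R = np.array([np.dot(y[:len(y) - k], y[k:]) for k in range(M + 1)])
        a, E = np.array([1.0]), R[0]
        for p in range(1, M + 1):
            gamma = np.dot(a, R[p:0:-1]) / E
            a = np.r_[a, 0.0] - gamma * np.r_[0.0, a[::-1]]
            E = (1.0 - gamma**2) * E
        return a, R

    def itakura_distance(y_test, y_ref, M):
        """Itakura LPC distance of Eq. (12.12.11) between two signal frames:
        d(a_T, a_R) = log[(a_R^T R_T a_R) / (a_T^T R_T a_T)]."""
        aT, RT_lags = lpc_autocorr(y_test, M)
        aR, _ = lpc_autocorr(y_ref, M)
        RT = toeplitz(RT_lags)
        return float(np.log((aR @ RT @ aR) / (aT @ RT @ aT)))

The distance is zero when the two frames yield the same model parameters and positive otherwise, since a_T minimizes the quadratic form a^T R_T a over all monic filters of order M.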

12.13 Dynamic Predictive Deconvolution—Waves in Layered Media

The analysis and synthesis lattice filters, implemented via the Levinson recursion, were obtained within the context of linear prediction. Here, we would like to point out the remarkable fact that the same analysis and synthesis lattice structures also occur naturally in the problem of wave propagation in layered media [920–925,974,976,1010,1019,1042–1059]. This is perhaps the reason behind the great success of linear prediction methods in speech and seismic signal processing. In fact, historically many linear prediction techniques were originally developed within the context of these two application areas.

In speech, the vocal tract is modeled as an acoustic tube of varying cross-sectional area. It can be approximated by the piece-wise constant area approximation shown below. The acoustic impedance of a sound wave varies inversely with the tube area

Z = \frac{\rho c}{A}

where ρ, c, A are the air density, speed of sound, and tube area, respectively. Therefore, as the sound wave propagates from the glottis to the lips, it will suffer reflections every time it encounters an interface; that is, every time it enters a tube segment of different diameter. Multiple reflections will be set up within each segment and the tube will reverberate in a complicated manner depending on the number of segments and the diameter of each segment.

By measuring the speech wave that eventually comes out of the lips, it is possible to remove, or deconvolve, the reverberatory effects of the tube and, in the process, extract the tube parameters, such as the areas of the segments or, equivalently, the reflection coefficients at the interfaces. During speech, the configuration of the vocal tract tube changes continuously. But being a mechanical system, it does so fairly slowly, and for short periods of time (of the order of 20–30 msec) it may be assumed to maintain a fixed configuration. From each such short segment of speech, a set of configuration parameters (e.g., reflection coefficients) may be extracted. This set may be used to synthesize the speech segment.

The seismic problem is somewhat different. Here it is not the transmitted wave that is experimentally accessible, but rather the overall reflected wave. An impulsive input to the earth, such as a dynamite explosion near the surface, will set up seismic elastic waves propagating downwards. As the various earth layers are encountered, reflections will take place. Eventually each layer will be reverberating and an overall reflected wave will be measured at the surface. On the basis of this reflected wave, the layered structure (i.e., reflection coefficients, impedances, etc.) must be extracted by deconvolution techniques. These are essentially identical to the linear prediction methods.

In addition to geophysical and speech applications, this wave problem and the associated inverse problem of extracting the structure of the medium from the observed (reflected or transmitted) response have a number of other applications. Examples include the probing of dielectric materials by electromagnetic waves, the study of the optical properties of thin films, the probing of tissues by ultrasound, and the design of broadband terminations of transmission lines. The mathematical analysis of such wave propagation problems has been done more or less independently in each of these application areas, and is well known dating back to the time of Stokes.

In this type of wave propagation problem there are always two associated propagating field quantities, the ratio of which is constant and equal to the corresponding characteristic impedance of the propagation medium. Examples of these include the electric and magnetic fields in the case of EM waves, the air pressure and particle volume velocity for sound waves, the stress and particle displacement for seismic waves, and the voltage and current waves in the case of TEM transmission lines.

As a concrete example, we have chosen to present in some detail the case of EM waves propagating in lossless dielectrics. The simplest and most basic scattering problem arises when there is a single interface separating two semi-infinite dielectrics of characteristic impedances Z and Z′, where E_+ and E_− are the right and left moving electric fields in medium Z, and E_+′ and E_−′ are those in medium Z′. The arrows indicate the directions of propagation; the fields are perpendicular to these directions. Matching the boundary conditions (i.e., continuity of the tangential fields at the interface) gives the two equations:

E_+ + E_- = E_+' + E_-'    (continuity of electric field)

\frac{1}{Z}(E_+ - E_-) = \frac{1}{Z'}(E_+' - E_-')    (continuity of magnetic field)

Introducing the reflection and transmission coefficients,

\rho = \frac{Z - Z'}{Z + Z'}\,,\quad \tau = 1 + \rho\,,\quad \rho' = -\rho\,,\quad \tau' = 1 + \rho' = 1 - \rho    (12.13.1)

the above equations can be written in a transmission matrix form

\begin{bmatrix} E_+ \\ E_- \end{bmatrix} = \frac{1}{\tau}\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\begin{bmatrix} E_+' \\ E_-' \end{bmatrix}    (12.13.2)

The flow of energy carried by these waves is given by the Poynting vector

P = \frac{1}{2}\,\mathrm{Re}\Big[(E_+ + E_-)^*\,\frac{1}{Z}(E_+ - E_-)\Big] = \frac{1}{2Z}\big(E_+^* E_+ - E_-^* E_-\big)    (12.13.3)

One consequence of the above matching conditions is that the total energy flow to the right is preserved across the interface; that is,

\frac{1}{2Z}\big(E_+^* E_+ - E_-^* E_-\big) = \frac{1}{2Z'}\big(E_+'^* E_+' - E_-'^* E_-'\big)    (12.13.4)

It proves convenient to absorb the factors 1/2Z and 1/2Z′ into the definitions of the fields by renormalizing them as follows:

\begin{bmatrix} E_+ \\ E_- \end{bmatrix} \to \frac{1}{\sqrt{2Z}}\begin{bmatrix} E_+ \\ E_- \end{bmatrix}\,,\qquad \begin{bmatrix} E_+' \\ E_-' \end{bmatrix} \to \frac{1}{\sqrt{2Z'}}\begin{bmatrix} E_+' \\ E_-' \end{bmatrix}

Then, Eq. (12.13.4) reads

E_+^* E_+ - E_-^* E_- = E_+'^* E_+' - E_-'^* E_-'    (12.13.5)

and the matching equations (12.13.2) can be written in the normalized form

\begin{bmatrix} E_+ \\ E_- \end{bmatrix} = \frac{1}{t}\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\begin{bmatrix} E_+' \\ E_-' \end{bmatrix}\,,\qquad t = \sqrt{1-\rho^2} = \sqrt{\tau\tau'}    (12.13.6)

They may also be written in a scattering matrix form that relates the outgoing fields to the incoming ones, as follows:

\begin{bmatrix} E_+' \\ E_- \end{bmatrix} = \begin{bmatrix} t & \rho' \\ \rho & t' \end{bmatrix}\begin{bmatrix} E_+ \\ E_-' \end{bmatrix} = S \begin{bmatrix} E_+ \\ E_-' \end{bmatrix}    (12.13.7)

This is the most elementary scattering matrix of all, and ρ and t are the most elementary reflection and transmission responses. From these, the reflection and transmission response of more complicated structures can be built up. In the more general case, we have a dielectric structure consisting of M slabs stacked together as shown in Fig. 12.13.1.

Fig. 12.13.1 Layered structure.

The media to the left and right in the figure are assumed to be semi-infinite. The reflection and transmission responses (from the left, or from the right) of the structure are defined as the responses of the structure to an impulse (incident from the left, or from the right) as shown in Fig. 12.13.2.

Fig. 12.13.2 Reflection and transmission responses.

The corresponding scattering matrix is defined as

S = \begin{bmatrix} T & R' \\ R & T' \end{bmatrix}

and by linear superposition, the relationship between arbitrary incoming and outgoing waves is

\begin{bmatrix} E_+' \\ E_- \end{bmatrix} = \begin{bmatrix} T & R' \\ R & T' \end{bmatrix}\begin{bmatrix} E_+ \\ E_-' \end{bmatrix}

The inverse scattering problem that we pose is how to extract the detailed properties of the layered structure, such as the reflection coefficients ρ_0, ρ_1, . . . , ρ_M, from the knowledge of the scattering matrix S; that is, from observations of the reflection response R or the transmission response T.

Without loss of generality, we may assume the M slabs have equal travel time. We denote the common one-way travel time by T_1 and the two-way travel time by T_2 = 2T_1.
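As a small numerical check of Eqs. (12.13.1)–(12.13.7), the sketch below (not part of the OSP toolbox; the impedance values are arbitrary) verifies that, with ρ′ = −ρ and t′ = t, the elementary scattering matrix is unitary—which is just the energy conservation condition (12.13.5):

```matlab
% Single-interface sketch with hypothetical impedances Z and Zp (standing for Z').
Z   = 377;  Zp = 150;
rho = (Z - Zp)/(Z + Zp);     % reflection coefficient, Eq. (12.13.1)
t   = sqrt(1 - rho^2);       % normalized transmission coefficient, Eq. (12.13.6)
S   = [t, -rho; rho, t];     % elementary scattering matrix with rho' = -rho, t' = t
disp(S.'*S)                  % equals the 2x2 identity, i.e., energy conservation (12.13.5)
```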

As an impulse δ(t) is incident from the left on interface M, there will be immediately interface. This is shown below. a reflected wave and a transmitted wave into medium M. When the latter reaches inter- face M − 1, part of it will be transmitted into medium M − 1, and part will be reflected back towards interface M where it will be partially rereflected towards M − 1 and par- tially transmitted to the left into medium M + 1, thus contributing towards the overall reflection response. Since the wave had to travel to interface M − 1 and back, this latter contribution will occur at time T2. Similarly, another wave will return back to interface M due to reflection from the second interface M − 2; this wave will return 2T2 seconds later and will add to the contribution from the zig-zag path within medium M which is also returning at 2T2, and so on. The timing diagram below shows all the possible return paths up to time t = 3T2, during which the original impulse can only travel as far as interface M − 3: The matching equations are: E+ 1 1 ρ E+ m = m m = − 2 1/2 − − ,tm (1 ρm) (12.13.8) Em tm ρm 1 Em

− − Since the left-moving wave Em is the delayed replica of Em−1 by T1 seconds, and + + Em is the advanced replica of Em−1 by T1 seconds, it follows that

+ = 1/2 + − = −1/2 − Em z Em−1 ,Em z Em−1

or, in matrix form + E+ z1/2 0 E m = m−1 − −1/2 − (12.13.9) Em 0 z Em−1 When we add the contributions of all the returned waves we see that the reflection where the variable z−1 was defined above and represents the two-way travel time delay, response will be a linear superposition of returned impulses while z−1/2 represents the one-way travel time delay. Combining the matching and prop- ± ∞ agation equations (12.13.8) and (12.13.9), we obtain the desired relationship between Em ± R(t)= Rkδ(t − kT2) and Em−1: + − + k=0 E z1/2 1 ρ z 1 E m = m m−1 − −1 − (12.13.10) E t ρ z E − It has a Fourier transform expressible more conveniently as the z-transform m m m m 1 Or, written in a convenient vector notation ∞ −k jωT2 R(z)= Rkz ,z= e , (here, ω is in rads/sec) Em(z)= ψm(z)Em−1(z) (12.13.11) k=0 where we defined We observe that R is periodic in frequency ω with period 2π/T2, which plays a role analogous to the sampling frequency in a sample-data system. Therefore, it is enough E+ (z) z1/2 1 ρ z−1 − = m = m to specify R within the Nyquist interval [ π/T2, π/T2]. Em(z) − ,ψm(z) −1 (12.13.12) E (z) t ρm z Next, we develop the lattice recursions that facilitate the solution of the direct and m m the inverse scattering problems. Consider the mth slab and let E± be the right/left m The “match-and-propagate” transition matrix ψm(z) has two interesting properties; moving waves incident on the left side of the mth interface. To relate them to the same −1 namely, defining ψ¯ m(z)= ψm(z ) ± − quantities Em−1 incident on the left side of the (m 1)st interface, first we use the matching equations to “pass” to the other side of the mth interface and into the mth 10 ψ¯ (z)TJ ψ (z)= J ,J= (12.13.13) slab, and then we propagate these quantities to reach the left side of the (m − 1)st m 3 m 3 3 0 −1 01 ψ¯ (z)= J ψ (z)J ,J= (12.13.14) m 1 m 1 1 10 574 12. Linear Prediction 12.13. Dynamic Predictive Deconvolution—Waves in Layered Media 575

where J1,J3 are recognized as two of the three Pauli spin matrices. From Eq. (12.3.13), which implies that the quantity A¯m(z)Am(z)−B¯m(z)Bm(z) is independent of z: we have with E¯± (z)= E± (z−1): m m ¯ −¯ = 2 Am(z)Am(z) Bm(z)Bm(z) σm (12.13.20) ¯+ + − ¯− − = ¯T = ¯T ¯ T = ¯T ¯ EmEm EmEm EmJ3Em Em−1ψmJ3ψmEm−1 Em−1J3Em−1 R R (12.13.15) Property (12.13.14) implies that Cm and Dm are the reverse polynomials Bm and Am, = ¯+ + − ¯− − Em−1Em−1 Em−1Em−1 respectively; indeed AR CR A¯ C¯ which is equivalent to energy conservation, according to Eq. (12.13.5). The second prop- m m = −m m m = −m m/2 ¯ ··· ¯ ¯ R R z ¯ ¯ z z σmψm ψ1ψ0 erty, Eq. (12.13.14), expresses time-reversal invariance and allows the construction of a Bm Dm Bm Dm second, linearly independent, solution of the recursive equations (12.13.11), Using the − Am Cm property J2 = I, we have = z m/2σ J (ψ ···ψ )J = J J (12.13.21) 1 m 1 m 0 1 1 B D 1 m m − E¯ ˆ = ¯ = m = ¯ ¯ = ¯ ¯ = ˆ 01 A C 01 D B Em J1Em ¯+ J1ψmEm−1 J1ψmJ1J1Em−1 ψmEm−1 (12.13.16) = m m = m m Em 10 Bm Dm 10 Cm Am

The recursions (12.13.11) may be iterated now down to m = 0. By an additional R R from which it follows that Cm(z)= B (z) and Dm(z)= A (z). The definition (12.13.18) boundary match, we may pass to the right side of interface m = 0: m m implies also the recursion = ··· = ··· R −1 R Em ψmψm−1 ψ1E0 ψmψm−1 ψ1ψ0E0 A B 1 ρ z A − B m m = m m 1 m−1 R −1 R Bm Am ρm z Bm−1 Am−1 where we defined ψ0 by 1 1 ρ0 Therefore each column of the ABCD matrix satisfies the same recursion. To sum- ψ0 = t0 ρ0 1 marize, we have or, more explicitly A (z) BR (z) 1 ρ z−1 1 ρ z−1 1 ρ m m = m ··· 1 0 R −1 −1 (12.13.22) + B (z) A (z) ρ z ρ z ρ 1 E+ zm/2 1 ρ z−1 1 ρ z−1 1 ρ E m m m 1 0 m = m ··· 1 0 0 − −1 −1 − Em tmtm−1 ···t1t0 ρm z ρ1 z ρ0 1 E0 with the lattice recursion (12.13.17) −1 A (z) 1 ρ z A − (z) To deal with this product of matrices, we define m = m m 1 −1 (12.13.23) Bm(z) ρm z Bm−1(z) A C 1 ρ z−1 1 ρ z−1 1 ρ m m = m ··· 1 0 = −1 −1 (12.13.18) and the property (12.13.20). The lattice recursion is initialized at m 0 by: Bm Dm ρm z ρ1 z ρ0 1 R A0(z) B0 (z) 1 ρ0 −1 A (z)= 1 ,B(z)= ρ , or, = (12.13.24) where Am,Cm,Bm,Dm are polynomials of degree m in the variable z . The energy 0 0 0 R B0(z) A0 (z) ρ0 1 conservation and time-reversal invariance properties of the ψm matrices imply similar properties for these polynomials. Writing Eq. (12.13.18) in terms of the ψms, we have Furthermore, it follows from the lattice recursion (12.13.23) that the reflection co- efficients ρm always appear in the first and last coefficients of the polynomials Am(z) Am Cm −m/2 and Bm(z), as follows = z σmψmψm−1 ···ψ1ψ0 Bm Dm am(0)= 1 ,am(m)= ρ0ρm ,bm(0)= ρm ,bm(m)= ρ0 (12.13.25) where we defined the quantity Eq. (12.13.17) for the field components reads now "m 2 1/2 + m/2 R + σm = tmtm−1 ···t1t0 = (1 − ρ ) (12.13.19) Em z Am Bm E0 i = − = − R i 0 Em σm Bm Am E0

Property (12.13.13) implies the same for the above product of matrices; that is, with Setting m = M, we find the relationship between the fields incident on the dielectric −1 A¯m(z)= Am(z ), etc., slab structure from the left to those incident from the right: + + A¯ C¯ 10 A C 10 E zM/2 A BR E m m m m = 2 M = M M 0 ¯ ¯ − − σm − R − (12.13.26) Bm Dm 0 1 Bm Dm 0 1 EM σM BM AM E0 576 12. Linear Prediction 12.13. Dynamic Predictive Deconvolution—Waves in Layered Media 577

It describes the effect of adding a layer. Expanding it in a power series, we have = + − 2 −1 − − 2 −1 2 +··· Rm(z) ρm (1 ρm) z Rm−1(z) (1 ρm)ρm z Rm−1(z)

It can be verified easily that the various terms in this sum correspond to the multiple reflections taking place within the mth layer, as shown below: All the multiple reflections and reverberatory effects of the structure are buried in the transition matrix R AM BM R BM AM In reference to Fig. 12.13.2, the reflection and transmission responses R, T, R ,T of the structure can be obtained from Eq. (12.13.26) by noting that M/2 R M/2 R 1 z AM BM T 0 z AM BM R = R , = R R σM BM AM 0 T σM BM AM 1 The first term in the expansion is always ρm; that is, ρm = Rm(∞). Thus, from the which may be combined into one equation: knowledge of Rm(z) we may extract ρm. With ρm known, we may invert Eq. (12.13.28) to get Rm−1(z) from which we can extract ρm−1; and so on, we may extract the series 10 zM/2 A BR TR = M M of reflection coefficients. The inverse of Eq. (12.13.28), which describes the effect of RT B AR 01 σM M M removing a layer, is Rm(z)−ρm that can be written as follows: Rm−1(z)= z (12.13.29) 1 − ρmRm(z) − M/2 R 1 −1 − z AM BM 10 TR 10 T 0 1 R Up to a difference in the sign of ρm, this is recognized as the Schur recursion R = = σM BM AM RT 01 R 1 0 T 01 (12.10.25). It provides a nice physical interpretation of that recursion; namely, the Schur functions represent the overall reflection responses at the successive layer interfaces, Solving these for the reflection and transmission responses, we find: which on physical grounds must be stable, causal, and bounded |Rm(z)|≤1 for all z in B (z) σ z−M/2 their region of convergence that includes, at least, the unit circle and all the points out- = M = M R(z) , T(z) side it. We may also derive a recursion for the transmission responses, which requires AM(z) AM(z) (12.13.27) the simultaneous recursion of Rm(z): BR (z) σ z−M/2 R (z)=− M ,T (z)= M AM(z) AM(z) −1/2 tmz Tm−1(z) 1/2 tmTm(z) T (z)= ,T− (z)= z (12.13.30) Note that T(z)= T (z). Since on physical grounds the transmission response T(z) m −1 m 1 1 + ρmz Rm−1(z) 1 − ρmRm(z) must be a stable and causal z-transform, it follows that necessarily the polynomial −M/2 AM(z) must be a minimum-phase polynomial. The overall delay factor z in T(z) is The dynamic predictive deconvolution method is an alternative method of extracting of no consequence. It just means that before anything can be transmitted through the the sequence of reflection coefficients and is discussed below. structure, it must traverse all M slabs, each with a travel time delay of T1 seconds; that The equations (12.13.27) for the scattering responses R, T, R ,T imply the unitarity is, with overall delay of MT1 seconds. of the scattering matrix S given by Let R − (z) and T − (z) be the reflection and transmission responses based on m 1 m 1 − TR m 1 layers. The addition of one more layer will change the responses to Rm(z) and S = RT Tm(z). Using the lattice recursions, we may derive a recursion for these responses:

−1 Bm(z) ρmAm−1(z)+z Bm−1(z) that is, R (z)= = T −1 T m −1 S(z)¯ S(z)= S(z ) S(z)= I (12.13.31) Am(z) Am−1(z)+ρmz Bm−1(z) where I is the 2×2 unit matrix. On the unit circle z = ejωT2 the scattering matrix Dividing numerator and denominator by Am−1(z) we obtain becomes a unitary matrix: S(ω)†S(ω)= I. Component-wise, Eq. (12.13.31) becomes −1 ρm + z Rm−1(z) R (z)= (12.13.28) m −1 TT¯ + RR¯ = T¯ T + R¯ R = 1 , TR¯ + RT¯ = 0 (12.13.32) 1 + ρmz Rm−1(z) 578 12. Linear Prediction 12.13. Dynamic Predictive Deconvolution—Waves in Layered Media 579
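Before turning to the inversion procedure, it may help to see the forward problem in code. The following is a minimal MATLAB sketch (it is not the toolbox function scatt) that iterates the lattice recursion (12.13.23), starting from the initialization (12.13.24), and then computes the first N samples of the reflection response R(z) = B_M(z)/A_M(z) of Eq. (12.13.27) by power-series division:

```matlab
% Forward problem sketch: reflection coefficients -> lattice polynomials -> R(n).
rho = [0.5 0.5 0.5 0.5 0.5];        % example reflection coefficients rho_0,...,rho_M
N   = 16;                           % number of reflection-response samples
A = 1;  B = rho(1);                 % initialization A_0 = 1, B_0 = rho_0, Eq. (12.13.24)
for m = 2:length(rho)
    A1 = [A 0] + rho(m)*[0 B];      % A_m(z) = A_{m-1}(z) + rho_m z^{-1} B_{m-1}(z)
    B1 = rho(m)*[A 0] + [0 B];      % B_m(z) = rho_m A_{m-1}(z) + z^{-1} B_{m-1}(z)
    A = A1;  B = B1;
end
R = filter(B, A, [1 zeros(1,N-1)]); % power-series coefficients of B_M(z)/A_M(z)
```

For the equal-coefficient example {0.5, 0.5, 0.5, 0.5, 0.5}, this reproduces the last rows of the exact matrices shown in Fig. 12.13.3, A_4 = [1, 1, 0.9375, 0.625, 0.25] and B_4 = [0.5, 0.875, 1.0313, 0.875, 0.5], as well as the leading response samples R(0) = 0.5 and R(1) = 0.375 listed there.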

Robinson and Treitel’s dynamic predictive deconvolution method [974] of solving equations: the inverse scattering problem is based on the above unitarity equation. In the inverse ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ φ(0)φ(1)φ(2) ··· φ(M) 1 σ2 problem, it is required to extract the set of reflection coefficients from measurements of ⎢ ⎥ ⎢ ⎥ ⎢ M ⎥ ⎢ ··· − ⎥ ⎢ ⎥ ⎢ ⎥ either the reflection response R or the transmission response T. In speech processing it ⎢ φ(1)φ(0)φ(1) φ(M 1) ⎥ ⎢ aM(1) ⎥ ⎢ 0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ φ(2)φ(1)φ(0) ··· φ(M − 2) ⎥ ⎢ aM(2) ⎥ = ⎢ 0 ⎥ is the transmission response that is available. In geophysical applications, or in studying ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ . . . . ⎥ ⎢ . ⎥ ⎢ . ⎥ the reflectivity properties of thin films, it is the reflection response that is available. The ⎣ . . . . ⎦ ⎣ . ⎦ ⎣ . ⎦ problem of designing terminations of transmission lines also falls in the latter category. φ(M) φ(M − 1) φ(M − 2) ··· φ(0) aM(M) 0 In this case, an appropriate termination is desired that must have a specified reflection (12.13.37) response R(z); for example, to be reflectionless over a wide band of frequencies about which can be solved efficiently using Levinson’s algorithm. Having obtained AM(z) some operating frequency. and noting the BM(z)= AM(z)R(z), the coefficients of the polynomial BM(z) may be The solution of both types of problems follows the same steps. First, from the recovered by convolution: knowledge of the reflection response R(z), or the transmission response T(z), the spectral function of the structure is defined: n bM(n)= aM(n − m)R(m) , n = 0, 1,...,M (12.13.38) σ2 m=0 Φ(z)= 1 − R(z)R(z)¯ = T(z)T(z)¯ = M (12.13.33) A (z)A¯ (z) M M Having obtained both AM(z) and BM(z) and noting that ρM = bM(0), the lattice recursion (12.13.23) may be inverted to recover the polynomials A − (z) and B − (z) This is recognized as the power spectrum of the transmission response, and it is of M 1 M 1 as well as the next reflection coefficient ρ − = b − (0), and so on. The inverse of the the autoregressive type. Thus, linear prediction methods can be used in the solution. M 1 M 1 lattice recursion matrix is In the time domain, the autocorrelation lags φ(k) of the spectral function are ob- tained from the sample autocorrelations of the reflection sequence, or the transmission −1 1 ρ z−1 1 1 −ρ m = m sequence: −1 2 ρm z 1 − ρ −ρmzz φ(k)= δ(k)−C(k)= D(k) (12.13.34) m where C(k) and D(k) are the sample autocorrelations of the reflection and transmission Therefore, the backward recursion becomes: time responses: Am−1(z) 1 1 −ρm Am(z) ρ = b (0), = (12.13.39) m m 2 − C(k)= R(n + k)R(n) , D(k)= T(n + k)T(n) (12.13.35) Bm−1(z) 1 − ρm ρmzz Bm(z) n n In this manner, all the reflection coefficients {ρ0,ρ1,...,ρM} can be extracted. The In practice, only a finite record of the reflection (or transmission) sequence will be computational algorithm is summarized as follows: available, say {R(0), R(1),...,R(N − 1)}. Then, an approximation to C(k) must be used, as follows: 1. Measure R(0), R(1),...,R(N− 1). 2. Select a reasonable value for the number of slabs M. N−1−k + C(k)= R(n + k)R(n) , k = 0, 1,...,M (12.13.36) 3. Compute the M 1 sample autocorrelation lags C(0), C(1), . . . , C(M) of the re- n=0 flection response R(n), using Eq. (12.13.36). 4. Compute φ(k)= δ(k)−C(k), k = 0, 1,...,M. The polynomial AM(z) may be recovered from the knowledge of the first M lags of 5. Using Levinson’s algorithm, solve the normal equations (12.13.37) for the coeffi- the spectral function; that is, {φ(0), φ(1),...,φ(M)}. 
The determining equations for cients of AM(z). the coefficients of AM(z) are precisely the normal equations of linear prediction. In the present context, they may be derived directly by noting that Φ(z) is a stable spectral 6. Convolve AM(z) with R(z) to find BM(z). density and is already factored into its minimum-phase factors in Eq. (12.13.33). Thus, 7. Compute ρM = bM(0) and iterate the backward recursion (12.13.39) from m = M writing down to m = 0. σ2 Φ(z)A (z)= M M −1 The function dpd is an implementation of the dynamic predictive deconvolution pro- AM(z ) cedure. The inputs to the function are N samples of the reflection response {R(0), R(1), . . . , R(N− it follows that the right-hand side is expandable in positive powers of z; the negative 1)} and the number of layers M. The outputs are the lattice polynomials Ai(z) and powers of z in the left-hand side must be set equal to zero. This gives the normal 580 12. Linear Prediction 12.13. Dynamic Predictive Deconvolution—Waves in Layered Media 581

Bi(z), for i = 0, 1,...,M, arranged in the two lower-triangular matrices A and B whose ⎡ ⎤ 1.0000 0000 rows hold the coefficients of these polynomials; that is, A(i, j)= a (j),or ⎢ ⎥ i ⎢ ⎥ ⎢ 1.0000 0.2500 0 0 0 ⎥ ⎢ ⎥ A = ⎢ 1.0000 0.5000 0.2500 0 0 ⎥ i exact ⎢ ⎥ k R(k) = −j ⎣ 1.0000 0.7500 0.5625 0.2500 0 ⎦ Ai(z) A(i, j)z 00.5000 = 1.0000 1.0000 0.9375 0.6250 0.2500 j 0 10.3750 ⎡ ⎤ 1.0000 0000 20.1875 and similarly for Bi(z). The function invokes the function lev to solve the normal ⎢ ⎥ ⎢ ⎥ 30.0234 ⎢ 1.0000 0.2509 0 0 0 ⎥ equations (12.13.34). The forward scattering problem is implemented by the function ⎢ ⎥ A = ⎢ 1.0000 0.5009 0.2510 0 0 ⎥ 4 −0.0586 scatt, whose inputs are the set of reflection coefficients {ρ ,ρ ,...,ρ } and whose extract ⎢ ⎥ 0 1 M ⎣ 1.0000 0.7509 0.5638 0.2508 0 ⎦ 5 −0.1743 outputs are the lattice polynomials A (z) and B (z), for i = 0, 1,...,M, as well as a i i 1.0000 1.0009 0.9390 0.6263 0.2504 60.1677 pre-specified number N of reflection response samples {R(0), R(1),...,R(N− 1)}.It ⎡ ⎤ 70.0265 0.5000 0000 − utilizes the forward lattice recursion (12.13.23) to obtain the lattice polynomials, and ⎢ ⎥ 8 0.0601 ⎢ ⎥ then computes the reflection response samples by taking the inverse z-transform of ⎢ 0.5000 0.5000 0 0 0 ⎥ 9 −0.0259 ⎢ ⎥ B = ⎢ 0.5000 0.6250 0.5000 0 0 ⎥ Eq. (12.13.27). exact ⎢ ⎥ 10 0.0238 ⎣ 0.5000 0.7500 0.7500 0.5000 0 ⎦ Next, we present a number of deconvolution examples simulated by means of the 11 0.0314 0.5000 0.8750 1.0313 0.8750 0.5000 − functions scatter and dpd. In each case, we specified the five reflection coefficients 12 0.0225 ⎡ ⎤ 13 −0.0153 of a structure consisting of four layers. Using scatter we generated the exact lattice 0.5010 0000 ⎢ ⎥ ⎢ ⎥ 14 0.0109 polynomials whose coefficients are arranged in the matrices A and B, and also generated ⎢ 0.5000 0.5010 0 0 0 ⎥ ⎢ ⎥ 15 0.0097 = B = ⎢ 0.5000 0.6255 0.5010 0 0 ⎥ 16 samples of the reflection response R(n), n 0, 1,...,15. These 16 samples were extract ⎢ ⎥ sent through the dpd function to extract the lattice polynomials A and B. ⎣ 0.5000 0.7505 0.7510 0.5010 0 ⎦ The first figure of each example displays a table of the reflection response samples 0.5000 0.8755 1.0323 0.8764 0.5010 and the exact and extracted polynomials. Note that the first column of the matrix B is the vector of reflection coefficients, according to Eq. (12.13.25). The remaining two Fig. 12.13.3 Reflection response and lattice polynomials. graphs of each example show the reflection response R in the time domain and in the frequency domain. Note that the frequency response is plotted only over one Nyquist time response R(k) frequency response |R(ω)|2 0.6 1 interval [0, 2π/T2], and it is symmetric about the Nyquist frequency π/T2. Figs. 12.13.3 and 12.13.4 correspond to the case of equal reflection coefficients 0.5 0.8 {ρ0,ρ1,ρ2,ρ3,ρ4}={0.5, 0.5, 0.5, 0.5, 0.5}. 0.4 In Figs. 12.13.5 and 12.13.6 the reflection coefficients have been tapered somewhat 0.3 0.6 at the ends (windowed) and are {0.3, 0.4, 0.5, 0.4, 0.3}. Note the effect of tapering on 0.2 the lobes of the reflection frequency response. Figs. 12.13.7 and 12.13.8 correspond 0.1 0.4 to the set of reflection coefficients {0.1, 0.2, 0.3, 0.2, 0.1}. Note the broad band of fre- quencies about the Nyquist frequency for which there is very little reflection. In con- 0 0.2 trast, the example in Figs. 12.13.9 and 12.13.10 exhibits high reflectivity over a broad −0.1 band of frequencies about the Nyquist frequency. 
Its set of reflection coefficients is {0.5, −0.5, 0.5, −0.5, 0.5}.

Fig. 12.13.4 Reflection responses in the time and frequency domains.

In this section we have discussed the inverse problem of unraveling the structure of a medium from the knowledge of its reflection response. The connection of the dynamic predictive deconvolution method to the conventional inverse scattering methods based on the Gelfand-Levitan-Marchenko approach [1054] has been discussed in [1043,1055,1056]. The lattice recursions characteristic of the wave propagation problem were derived as a direct consequence of the boundary conditions at the interfaces between media, whereas the lattice recursions of linear prediction were a direct consequence of the Gram-Schmidt orthogonalization process and the minimization of the prediction-error performance index. Is there a deeper connection between these two problems [1005–1007]? One notable result in this direction has been to show that the Cholesky factorization of Toeplitz or near-Toeplitz matrices via the Schur algorithm can be cast in a wave propagation model and derived as a simple consequence of energy conservation [1002].
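The seven-step inversion procedure summarized above can also be sketched in a few lines of MATLAB. The following is only a sketch (it is not the toolbox function dpd, and it solves the normal equations (12.13.37) directly with backslash rather than with Levinson's algorithm); save it as dpd_sketch.m:

```matlab
% Sketch of the inversion procedure: measured reflection response -> reflection coefficients.
% Inputs: R(1:N), with R(k+1) = R(k), and the assumed number of slabs M.
function rho = dpd_sketch(R, M)
    R = R(:).';  N = length(R);
    C = zeros(1, M+1);
    for k = 0:M                                  % sample autocorrelations, Eq. (12.13.36)
        C(k+1) = R(1:N-k) * R(1+k:N).';
    end
    phi = [1, zeros(1,M)] - C;                   % phi(k) = delta(k) - C(k)
    A = toeplitz(phi) \ [1; zeros(M,1)];         % normal equations (12.13.37)
    A = (A / A(1)).';                            % normalize so that a_M(0) = 1
    B = conv(A, R);  B = B(1:M+1);               % B_M(z) = A_M(z)R(z), Eq. (12.13.38)
    rho = zeros(1, M+1);
    for m = M:-1:0                               % backward recursion (12.13.39)
        r = B(1);  rho(m+1) = r;
        if m > 0
            Anew = (A(1:m)   - r*B(1:m))   / (1 - r^2);
            Bnew = (B(2:m+1) - r*A(2:m+1)) / (1 - r^2);
            A = Anew;  B = Bnew;
        end
    end
end
```

Feeding it the 16 response samples generated by the forward sketch given earlier, rho = dpd_sketch(R, 4), recovers the five reflection coefficients approximately—in the same way that the extracted matrices in the examples differ slightly from the exact ones.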

Fig. 12.13.5 Reflection response and lattice polynomials (exact and extracted A, B matrices and response samples R(k) for the tapered coefficients {0.3, 0.4, 0.5, 0.4, 0.3}).

Fig. 12.13.6 Reflection responses in the time and frequency domains.

Fig. 12.13.7 Reflection response and lattice polynomials (exact and extracted A, B matrices and response samples R(k) for the coefficients {0.1, 0.2, 0.3, 0.2, 0.1}).

Fig. 12.13.8 Reflection responses in the time and frequency domains.

⎡ ⎤ 1.0000 0 0 0 0 ⎢ ⎥ ⎢ − ⎥ ⎢ 1.0000 0.2500 0 0 0 ⎥ ⎢ ⎥ A = ⎢ 1.0000 −0.5000 0.2500 0 0 ⎥ exact ⎢ ⎥ k R(k) 12.14 Least-Squares Waveshaping and Spiking Filters ⎣ 1.0000 −0.7500 0.5625 −0.2500 0 ⎦ 00.5000 1.0000 −1.0000 0.9375 −0.6250 0.2500 1 −0.3750 In linear prediction, the three practical methods of estimating the prediction error filter ⎡ ⎤ 1.0000 0 0 0 0 20.1875 coefficients were all based on replacing the ensemble mean-square minimization crite- ⎢ ⎥ ⎢ − ⎥ 3 −0.0234 ⎢ 1.0000 0.2509 0 0 0 ⎥ rion by a least-squares criterion based on time averages. Similarly, the more general ⎢ ⎥ A = ⎢ 1.0000 −0.5009 0.2510 0 0 ⎥ 4 −0.0586 Wiener filtering problem may be recast in terms of such time averages. A practical for- extract ⎢ ⎥ ⎣ 1.0000 −0.7509 0.5638 −0.2508 0 ⎦ 50.1743 mulation, which is analogous to the Yule-Walker or autocorrelation method, is as follows 1.0000 −1.0009 0.9390 −0.6263 0.2504 60.1677 [974,975,1010,1059]. Given a record of available data ⎡ ⎤ 7 −0.0265 0.5000 0 0 0 0 − ⎢ ⎥ 8 0.0601 y0,y1,...,yN ⎢ − ⎥ ⎢ 0.5000 0.5000 0 0 0 ⎥ 90.0259 ⎢ ⎥ B = ⎢ 0.5000 −0.6250 0.5000 0 0 ⎥ find the best linear FIR filter of order M exact ⎢ ⎥ 10 0.0238 ⎣ −0.5000 0.7500 −0.7500 0.5000 0 ⎦ 11 −0.0314 h ,h ,...,h 0.5000 −0.8750 1.0313 −0.8750 0.5000 12 −0.0225 0 1 M ⎡ ⎤ 13 0.0153 0.5010 0 0 0 0 which reshapes yn into a desired signal xn, specified in terms of the samples: ⎢ ⎥ ⎢ − ⎥ 14 0.0109 ⎢ 0.5000 0.5010 0 0 0 ⎥ ⎢ ⎥ 15 −0.0097 x0,x1,...,xN+M B = ⎢ 0.5000 −0.6255 0.5010 0 0 ⎥ extract ⎢ ⎥ − − ⎣ 0.5000 0.7505 0.7510 0.5010 0 ⎦ where for consistency of convolution, we assumed we know N + M + 1 samples of the 0.5000 −0.8755 1.0323 −0.8764 0.5010 desired signal. The actual convolution output of the waveshaping filter will be:

min(n,M) Fig. 12.13.9 Reflection response and lattice polynomials. xˆn = hmxn−m , 0 ≤ n ≤ N + M (12.14.1) m=max(0,n−N) time response R(k) frequency response |R(ω)|2 0.6 1 and the estimation error:

0.4 en = xn − xˆn , 0 ≤ n ≤ N + M (12.14.2) 0.8

0.2 As the optimality criterion, we choose the least-squares criterion: 0.6 N+M 0 E = 2 = en min (12.14.3) 0.4 n=0 −0.2 The optimal filter weights hm are selected to minimize E. It is convenient to recast 0.2 −0.4 the above in a compact matrix form. Define the (N + M + 1)×(M + 1) convolution data matrix Y, the (M + 1)×1 vector of filter weights h, the (N + M + 1)×1 vector of desired −0.6 0 0 2 4 6 8 10 12 14 0 0.2 0.4 0.6 0.8 1 samples x, (and estimates xˆ and estimation errors e), as follows: time k frequency in cycles/sample ⎡ ⎤ y 00··· 0 ⎢ 0 ⎥ Fig. 12.13.10 Reflection responses in the time and frequency domains. ⎢ ··· ⎥ ⎢ y1 y0 0 0 ⎥ ⎢ ⎥ ⎢ y2 y1 y0 ··· 0 ⎥ ⎡ ⎤ ⎡ ⎤ ⎢ ⎥ ⎢ . . . . ⎥ h0 x0 ⎢ . . . . ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ . . . . ⎥ ⎢ h ⎥ ⎢ x ⎥ ⎢ ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥ Y = ⎢ yN yN−1 yN−2 ··· yN−M ⎥ , h = ⎢ ⎥ , x = ⎢ ⎥ (12.14.4) ⎢ ⎥ ⎢ . ⎥ ⎢ . ⎥ ⎢ ··· ⎥ ⎣ . ⎦ ⎣ . ⎦ ⎢ 0 yN yN−1 yN−M+1 ⎥ ⎢ ⎥ h x + ⎢ 00 yN ··· yN−M+2 ⎥ M N M ⎢ ⎥ ⎢ . . . . ⎥ ⎣ . . . . ⎦ 00 0 ··· yN 586 12. Linear Prediction 12.14. Least-Squares Waveshaping and Spiking Filters 587
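The compact matrix form lends itself directly to a numerical solution. The following is a minimal MATLAB sketch (it is not the toolbox function firw) that builds the convolution data matrix of Eq. (12.14.4) and minimizes the criterion (12.14.3) with a backslash solve; it assumes the Signal Processing Toolbox function convmtx, and the data y and desired signal x shown are merely illustrative placeholders:

```matlab
% Least-squares waveshaping sketch in the matrix form of Eq. (12.14.4).
y = randn(101,1);                  % hypothetical available data y_0,...,y_N (N = 100)
M = 10;                            % filter order
x = [1; zeros(length(y)+M-1,1)];   % hypothetical desired signal: a spike at the origin
Y = convmtx(y, M+1);               % (N+M+1)x(M+1) convolution data matrix Y
h = Y \ x;                         % filter minimizing E = ||x - Y*h||^2, Eq. (12.14.3)
e = x - Y*h;  E = e'*e;            % estimation error and minimized criterion
```

When Y^T Y is invertible, this backslash solution coincides with the normal-equation solution developed next; it is shown here only to make the criterion concrete.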

Equations (12.14.1) through (12.14.3) now become applications. More generally. the vector x may be chosen to be any one of the unit vectors ⎡ ⎤ xˆ = Yh , e = x − xˆ , E = eTe (12.14.5) 0 ⎢ ⎥ ⎢ . ⎥ ⎢ . ⎥ E ⎢ . ⎥ Minimizing with respect to the weight vector h results in the orthogonality equations: ⎢ ⎥ ⎢ 0 ⎥ ⎢ ⎥ T = T − = = = ⎢ ⎥ ← = + Y e Y (x Yh) 0 (12.14.6) x ui ⎢ 1 ⎥ ith slot ,i 0, 1,...,N M (12.14.13) ⎢ ⎥ ⎢ 0 ⎥ ⎢ ⎥ which are equivalent to the normal equations: ⎢ . ⎥ ⎣ . ⎦ YTYh = YTx (12.14.7) 0 which corresponds to a unit impulse occurring at the ith time instant instead of at the Solving for h,wefind − − origin; that is, x = δ(n − i). The actual output from the spiking filter is given by h = (YTY) 1YTx = R 1r (12.14.8) n where the quantities xˆ = Px = Pui = ith column of P (12.14.14) R = YTY, r = YTx (12.14.9) Thus, the ith column of the matrix P is the output of the ith spiking filter which may be recognized (see Sec. 1.11) as the (M+1)×(M+1) autocorrelation matrix formed attempts to compress yn into a spike with i delays. The corresponding ith filter is by the sample autocorrelations Rˆ (0), Rˆ (1),...Rˆ (M) of y , and as the (M +1)×1 −1 T yy yy yy n h = R Y ui. Therefore, the columns of the matrix vector of sample cross-correlations Rˆxy(0), Rˆxy(1),...Rˆxy(M) between the desired and −1 T T −1 T the available vectors xn and yn. We have already used this expression for the weight H = R Y = (Y Y) Y (12.14.15) vector h in the example of Sec. 12.11. Here we have justified it in terms of the least- squares criterion (12.14.3). The function firw may be used to solve for the weights are all the optimal spiking filters. The estimation error of the ith filter is (12.14.8) and, if so desired, to give the corresponding lattice realization. The actual E = T − = − filter output xˆ is expressed as i ui (I P)ui 1 Pii (12.14.16)

− xˆ = Yh = YR 1YTx = Px (12.14.10) where Pii, is the ith diagonal element of P. Since the delay i may be positioned anywhere from i = 0toi = N + M, there are N + M + 1 such spiking filters, each with error Ei. where Among these, there will be one that has the optimal delay i which corresponds to the −1 T T −1 T P = YR Y = Y(Y Y) Y (12.14.11) smallest of the Eis; or, equivalently, to the maximum of the diagonal elements Pii. The design procedure for least-squares spiking filters for a given finite signal y , The error vector becomes e = (I − P)x. The “performance” matrix P is a projection n n = 0, 1,...,N− 1 is summarized as follows: matrix, and thus, so is (I − P). Then, the error square becomes 1. Compute R = YTY. E = eTe = xT(I − P)2x = xT(I − P)x (12.14.12) 2. Compute the inverse R−1 (preferably by the Levinson recursion). − The (N + M + 1)×(N + M + 1) matrix P has trace equal to M + 1, as can be checked 3. Compute H = R 1YT = all the spiking filters. easily. Since its eigenvalues as a projection matrix are either 0 or 1, it follows that in 4. Compute P = YH = YR−1YT = all spiking filter outputs. order for the sum of all the eigenvalues (the trace) to be equal to M + 1, there must 5. Select that column i of P for which Pii is the largest. necessarily be M + 1 eigenvalues that are equal to 1, and N eigenvalues equal to 0. − Therefore, the matrix P has rank M + 1, and if the desired vector x is selected to be any If the Levinson-Cholesky algorithm is used to compute the inverse R 1, this design of the M + 1 eigenvectors belonging to eigenvalue 1, the corresponding estimation error procedure becomes fairly efficient. An implementation of the procedure is given by the will be zero. function spike. The inputs to the function are the N + 1 samples {y0,y1,...,yN}, the Among all possible waveshapes that may be chosen for the desired vector x,of desired order M of the spiking filter, and a so-called “prewhitening” or Backus-Gilbert particular importance are the spikes, or impulses. In this case, x is a unit impulse, say parameter , which will be explained below. The outputs of the function are the matrices at the origin; that is, xn = δn. The convolution xˆn = hn ∗ yn of the corresponding filter P and H. with yn is the best least-squares approximation to the unit impulse. In other words, hn is To explain the role of the parameter , let us go back to the waveshaping problem. the best least-squares inverse filter to yn that attempts to reshape, or compress, yn into When the data sequence yn to be reshaped into xn is inaccurately known—if, for example, a unit impulse. Such least squares inverse filters are used extensively in deconvolution it has been contaminated by white noise vn—the least-squares minimization criterion 588 12. Linear Prediction 12.14. Least-Squares Waveshaping and Spiking Filters 589

(12.14.3) can be extended slightly to accomplish the double task of (1) producing the away,” the resulting reflection sequence xn may be subjected to the dynamic predictive best estimate of xn and (2) reducing the noise at the output of the filter hn as much as deconvolution procedure to unravel the earth structure. Or, fn may represent the im- possible. pulse response of a channel, or a magnetic recording medium, which broadens and blurs The input to the filter is the noisy sequence yn + vn and its output is hn ∗ yn + hn ∗ (intersymbol interference) the desired message xn. vn = xˆn + un, where we set un = hn ∗ vn. The term un represents the filtered noise. The least-squares inverse spiking filters offer a way to solve the deconvolution prob- The minimization criterion (12.14.3) may be replaced by lem: Simply design a least-squares spiking filter hn corresponding to the blurring func- tion fn; that is, hn ∗ fn  δn, in the least-squares sense. Then, filtering yn through hn E = e2 + λE[u2 ]= min (12.14.17) n n will recover the desired signal xn: n = ∗ = ∗ ∗  ∗ = where λ is a positive parameter which can be chosen by the user. Large λ emphasizes xˆn hn yn (hn fn) xn δn xn xn (12.14.21) large reduction of the output noise, but this is done at the expense of resolution; that is, If the ith spiking filter is used, which compresses f into an impulse with i delays, at the expense of obtaining a very good estimate. On the other hand, small λ emphasizes n h ∗ f  δ(n − i), then the desired signal x will be recovered with a delay of i units higher resolution but with lesser noise reduction. This tradeoff between resolution and n n n of time. noise reduction is the basic property of this performance index. Assuming that vn is 2 This and all other approaches to deconvolution work well when the data yn are not white with variance σv, we have noisy. In presence of noise, Eq. (12.14.20) becomes M 2 = 2 2 = 2 T = ∗ + E[un] σv hn σv h h yn fn xn vn (12.14.22) n=0 2 where vn may be assumed to be zero-mean white noise of variance σ . Even if the Thus, Eq. (12.14.17) may be written as v blurring function fn is known exactly and a good least-squares inverse filter hn can be E = T + 2 T = designed, the presence of the noise term can distort the deconvolved signal beyond e e λσv h h min (12.14.18) recognition. This may be explained as follows. Filtering yn through the inverse filter hn Its minimization with respect to h gives the normal equations: results in hn ∗ yn = (hn ∗ fn)∗xn + hn ∗ vn  xn + un T + 2 = T (Y Y λσv I)h Y x (12.14.19) where un = hn ∗ vn is the filtered noise. Its variance is from which it is evident that the diagonal of YTY is shifted by an amount λσ2; that is, v M 2 = 2 T = 2 2 λσ2 E[un] σv h h σv hn ˆ −→ ˆ + 2 ≡ + ˆ = v = Ryy(0) Ryy(0) λσv (1 )Ryy(0),  n 0 Rˆyy(0) which, depending on the particular shape of hn may be much larger than the original In practice,  may be taken to be a few percent or less. It is evident from Eq. (12.14.19) 2 variance σv. This happens, for example, when fn consists mainly of low frequencies. that one beneficial effect of the parameter  is the stabilization of the inverse of the For h to compress f into a spike with a high frequency content, the impulse response T + 2 n n matrix Y Y λσv I. T hn itself must be very spiky, which can result in values for h h which are greater than The main usage of spiking filters is in deconvolution problems [59,60,95,144–146], one. 
where the desired and the available signals x and y are related to each other by the n n To combat the effects of noise, the least-squares design criterion for h must be convolutional relationship changed by adding to it a term λE[u2 ] as was done in Eq. (12.14.17). The modified n design criterion is then yn = fn ∗ xn = fmxn−m (12.14.20) m M E = − ∗ 2+ 2 2 where fn is a “blurring” function which is assumed to be approximately known. The ba- (δn hn fn) λσv hn n n=0 sic deconvolution problem is to recover xn from yn if fn is known. For example, yn may represent the image of an object xn recorded through an optical system with a point- which effectively amounts to changing the autocorrelation lag Rˆff(0) into (1+)Rˆff(0). spread function f . Or, y might represent the recorded seismic trace arising from the n n The first term in this performance index tries to produce a good inverse filter; the second excitation of the layered earth by an impulsive waveform f (the source wavelet) which n term tries to minimize the output power of the noise after filtering by the deconvolu- is convolved with the reflection impulse response xn of the earth (in the previous sec- tion filter hn. Note that conceptually this index is somewhat different from that of tion xn was denoted by Rn.) If the effect of the source wavelet fn can be “deconvolved 590 12. Linear Prediction 12.14. Least-Squares Waveshaping and Spiking Filters 591

Eq. (12.14.17), because now v represents the noise in the data y whereas there v signal to be spiked impulse response of spiking filter n n n 1 20 represented inaccuracies in the knowledge of the wavelet fn. spiking delay = 25 15 In this approach to deconvolution we are not attempting to determine the best least- 0.8 10 squares estimate of the desired signal xn, but rather the best least-squares inverse to 0.6 the blurring function fn. If the second order statistics of xn were known, we could, of 5 0.4 course, determine the optimal (Wiener) estimate xˆn of xn. This is also done in many 0 applications. 0.2 −5 The performance of the spiking filters and their usage in deconvolution are illus- 0 −10 trated by the following example: The blurring function fn to be spiked was chosen as ⎧ −0.2 −15 ⎨ g(n − 25), n = 0, 1,...,65 = −0.4 −20 fn ⎩ 0 20 40 60 80 100 120 140 160 180 200 0 20 40 60 80 100 120 140 160 180 200 0, for other n time samples time samples where g(k) was the “gaussian hat” function: Fig. 12.14.1 Spiking filter and its inverse.

2 g(k)= cos(0.15k)exp(−0.004k ) spiked signal, spiking delay = 25 performance index P(k,k) 1 100

The signal xn to be recovered was taken to be the series of delayed spikes: 0.8 spiked signal original 80 9 0.6 = − xn aiδ(n ni) 60 0.4 i=0

0.2 percent where the amplitudes ai and delays ni were chosen as 40 0 ai = 1, 0.8, 0.5, 0.95, 0.7, 0.5, 0.3, 0.9, 0.5, 0.85 20 −0.2 ni = 25, 50, 60, 70, 80, 90, 100, 120, 140, 160 −0.4 0 0 20 40 60 80 100 120 140 160 180 200 0 20 40 60 80 100 120 140 160 180 200 for i = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. time samples spiking delay position k Fig. 12.14.1 shows the signal fn to be spiked. Since the gaussian hat is symmetric about the origin, we chose the spiking delay to be at i = 25. The order of the spiking Fig. 12.14.2 Deconvolved signal and performance index. filter hn was M = 50. The right graph in Fig. 12.14.1 shows the impulse response hn versus time. Note the spiky nature of hn which is required here because fn has a fairly The approximate signal fn is shown in Fig. 12.14.4. The spiking filter was designed low frequency content. Fig. 12.14.2 shows the results of the convolution h ∗ f , which n n on the basis of f rather than fn. The result of filtering the same composite signal yn − n is the best least-squares approximation to the impulse δ(n 25). through the corresponding inverse filter is shown on the right in Fig. 12.14.4. The delays The “goodness” of the spiking filter is judged by the diagonal entries of the per- and amplitudes ai and ni are not well resolved, but the basic nature of xn can still be = formance matrix P, according to Eq. (12.14.16). For the chosen delay k 25, we find seen. Inspecting Fig. 12.14.1 we note the large spikes that are present in the impulse P(25, 25)= 0.97. To obtain a better picture of the overall performance of the spiking response hn of the inverse filter; these can cause the amplification of any additive noise filters, on the right in Fig. 12.14.2 we have plotted the diagonal elements P(k, k) versus T component. Indeed, the noise reduction ratio of the filter hn is h h = 612, thus it will = k. It is seen that the chosen delay k 25 is nearly optimal. Fig. 12.14.3 shows the tend to amplify even small amounts of noise. composite signal y obtained by convolving f and x , according to Eq. (12.14.20). n n n To study the effect of noise, we have added a noise term vn, as in Eq. (12.14.22), with Fig. 12.14.3 shows on the right the deconvolved signal xn according to Eq. (12.14.21). −4 variance equal to 10 (this corresponds to just 1% of the amplitude a0); the composite The recovery of the amplitudes a and delays n of x is very accurate. These results i i n signal yn is shown on the left in Fig. 12.14.5. One can barely see the noise. Yet, after represent the idealistic case of noise-free data y and perfect knowledge of the blurring n filtering with the inverse filter hn of Fig. 12.14.1, the noise component is amplified to function f . To study the sensitivity of the deconvolution technique to inaccuracies in n a great extent. The result of deconvolving the noisy yn with hn is shown on the right the knowledge of the signal fn we have added a small high frequency perturbation on in Fig. 12.14.5. To reduce the effects of noise, the prewhitening parameter  must be fn as follows: chosen to be nonzero. Even a small nonzero value of  can have a beneficial effect. The = + − fn fn 0.05 sin 1.5(n 25) 592 12. Linear Prediction 12.14. Least-Squares Waveshaping and Spiking Filters 593

Fig. 12.14.3 Composite and deconvolved signal (panels: composite signal; deconvolution of composite signal).

Fig. 12.14.4 Composite and deconvolved signal (panels: approximate signal; deconvolution based on approximate signal).

Fig. 12.14.5 Composite and deconvolved signal (panels: composite signal plus noise; deconvolution of noisy data, ε = 0).

Fig. 12.14.6 Deconvolved signals (deconvolution of noisy data with ε = 0.0001 and ε = 0.001).

graphs in Fig. 12.14.6 show the deconvolved signal xn when the filter hn was designed replicas of fn. with the choices  = 0.0001 and  = 0.001, respectively. Note the trade-off between the 4. In the presence of noise in the data yn to deconvolved, some improvement may noise reduction and the loss of resolution in the recovered spikes of xn. result by introducing a nonzero value for the prewhitening parameter , where Based on the studies of Robinson and Treitel [974], Oldenburg [1065], and others, effectively the sample autocorrelation Rff(0) is replaced by (1 + )Rˆff(0). The the following summary of the use of the above deconvolution method may be made: trade-off is a resulting loss of resolution. 1. If the signal f to be spiked is a minimum-phase signal, the optimal spiking delay n The deconvolution problem of Eq. (12.14.20) and (12.14.22) has been approached by must be chosen at the origin i = 0. The optimality of this choice is not actually a wide variety of other methods. Typically, a finite number of samples yn, n = 0, 1,...,N seen until the filter order M is sufficiently high. The reason for this choice has to T is available. Collecting these into a vector y = [y0,y1,...,yN] , we write Eq. (12.14.22) do with the minimum-delay property of such signals which implies that most of in an obvious vectorial form their energy is concentrated at the beginning, therefore, they may be more easily y = Fx + v (12.14.23) compressed to spikes with zero delay. Instead of determining an approximate inverse filter for the blurring function F,an 2. If fn is a mixed-delay signal, as in the above example, then the optimal spiking delay will have some intermediate value. alternative method is to attempt to determine the best—in some sense—vector x which is compatible with these equations. A popular method is based on the least-squares 3. Even if the shape of fn is not accurately known, the deconvolution procedure based on the approximate fn might have some partial success in deconvolving the 594 12. Linear Prediction 12.15. Computer Project – ARIMA Modeling 595 criterion [1060]. Y = load(’airline.dat’); % in OSP data file folder N Y = Y’; Y = Y(:); % concatenate rows E = 2 = T = − T − = vn v v (y Fx) (y Fx) min (12.14.24) y = log(Y); % log data n=0 The data represent monthly airline passengers for the period Jan. 1949 to Dec. 1960. That is, x is chosen so as to minimize E. Setting the derivative with respect to x to There are N = 144 data points. In this experiment, we will work with a subset of the zero gives the standard least-squares solution first n0 = 108 points, that is, yn,0≤ n ≤ n0 − 1, and attempt to predict the future 36 − xˆ = (FTF) 1FTy months of data (108 + 36 = 144.) A prewhitening term can be added to the right of the performance index to stabilize a. Plot Yn and the log-data yn = ln Yn versus n and note the yearly periodicity. Note the indicated inverse how the log-transformation tends to equalize the apparent increasing amplitude of T T E = v v + λx x the original data. with solution xˆ = (FTF +λI)−1FTy. Another approach that has been used with success b. Compute and plot the normalized sample ACF, ρ = R(k)/R(0), of the zero-mean is based on the L -norm criterion k 1 log-data for lags 0 ≤ k ≤ 40 and note the peaks at multiples of 12 months. N = − −1 − −12 E = |vn|=min (12.14.25) c. Let xn (1 Z )(1 Z )yn in the model of Eq. (12.15.1). The signal xn follows n=0 an MA model with spectral density:

This quantity is referred to as the L1 norm of the vector v. The minimization of this 2 −1 −12 12 Sxx(z)= σ (1 − bz )(1 − bz)(1 − Bz )(1 − Bz ) norm with respect to x may be formulated as a linear programming problem [1063– ε 1073]. It has been observed that this method performs very well in the presence of Multiply the factors out and perform an inverse z-transform to determine the auto- noise, and it tends to ignore a few “bad” data points—that is, those for which the noise correlation lags Rxx(k). Show in particular, that value vn might be abnormally high—in favor of the good points, whereas the standard Rxx(1) b Rxx(12) B least-squares method based on the L2-norm (12.14.24) will spend all its efforts trying =− , =− (12.15.3) 2 2 to minimize the few large terms in the sum (12.14.24), and might not result in as good Rxx(0) 1 + b Rxx(0) 1 + B an estimate of x as it would if the few bad data points were to be ignored. We discuss Filter the subblock y , 0 ≤ n ≤ n − 1 through the filter (1 − z−1)(1 − z−12) to this approach further in Sec. 15.11 n 0 determine x . You may discard the first 13 transient outputs of x . Use the rest Another class of deconvolution methods are iterative methods, reviewed in [1074]. n n of x to calculate its sample ACF and apply Eq. (12.15.3) to solve for the model Such methods, like the linear programming method mentioned above, offer the addi- n parameters b, B. tional option of enforcing priori constraints that may be known to be satisfied by x, for example, positivity, band-limiting, or time-limiting constraints. The imposition of such This simple method gives values that are fairly close to the Box/Jenkins values deter- constraints can improve the restoration process dramatically. mined by maximum likelihood methods, namely, b = 0.4 and B = 0.6, see Ref. [22].
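A minimal MATLAB sketch for part (c) is given below. It assumes the OSP data file airline.dat is on the MATLAB path and that the Signal Processing Toolbox function xcorr is available; the mean-removal step is optional, since the doubly differenced data are approximately zero-mean.

```matlab
% Part (c) sketch: estimate b, B of the airline model from the sample ACF of
% x_n = (1 - z^-1)(1 - z^-12) y_n, using Eq. (12.15.3).
Y = load('airline.dat');  Y = Y';  y = log(Y(:));      % log data, as above
n0 = 108;
x = filter(conv([1 -1], [1 zeros(1,11) -1]), 1, y(1:n0));
x = x(14:end);                                         % discard the first 13 transient outputs
x = x - mean(x);                                       % optional
r = xcorr(x, 12, 'biased');  R = r(13:25);             % lags 0,1,...,12
rho1  = R(2)/R(1);   b = roots([rho1,  1, rho1 ]);  b = b(abs(b) < 1);   % -b/(1+b^2) = rho1
rho12 = R(13)/R(1);  B = roots([rho12, 1, rho12]);  B = B(abs(B) < 1);   % -B/(1+B^2) = rho12
```

The quadratic equations solved in the last two lines are just Eq. (12.15.3) rearranged; the roots with magnitude less than one are the minimum-phase choices for b and B.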

d. Consider, next, the model of Eq. (12.15.2) with a second-order AR model i.e., A(z) = = − −12 = − 12.15 Computer Project – ARIMA Modeling filter order of p 2. Define xn (1 Z )yn yn yn−12 and denote its auto- correlation function by Rk. Since xn follows an AR(2) model, its model parameters 2 The Box-Jenkins airline data set has served as a benchmark in testing seasonal ARIMA a1,a2,σε can be computed from: models. In particular, it has led to the popular “airline model”, which, for monthly data −1 a R R R with yearly periodicity, is defined by the following innovations signal model: 1 =− 0 1 1 2 = + + ,σε R0 a1R1 a2R2 (12.15.4) a2 R1 R0 R2 −1 −12 −1 −12 (1 − Z )(1 − Z )yn = (1 − bZ )(1 − BZ )εn (12.15.1) Using the data subset yn, 0 ≤ n ≤ n0 − 1, calculate the signal xn and discard the −1 where Z denotes the delay operator and b, B are constants such that |b| < 1 and first 12 transient samples. Using the rest of xn, compute its sample ACF, Rˆk for |B| < 1. In this experiment, we briefly consider this model, but then replace it with the 0 ≤ k ≤ M with M = 40, and use the first 3 computed lags Rˆ0, Rˆ1, Rˆ2 in Eqs. (12.15.4) 2 following ARIMA model, to estimate a1,a2,σε (in computing the ACF, the signal xn need not be replaced by its zero-mean version). −12 1 −1 −2 −p (1 − Z )yn = εn , A(z)= 1 + a1z + a2z +···+apz (12.15.2) Because of the assumed autoregressive model, it is possible to calculate all the au- A(Z) tocorrelation lags Rk for k ≥ p + 1 from the first p + 1 lags. This can accomplished The airline data set can be loaded with the MATLAB commands: by the MATLAB “autocorrelation sequence extension” function: 596 12. Linear Prediction 12.15. Computer Project – ARIMA Modeling 597

M=40; Rext = acext(R(1:p+1), zeros(1,M-p)); Consider the following model for the first n0 points of the log data yn that assumes a quadratic trend and 12-month, 6-month, and 4-month periodicities, i.e., three har- ≤ ≤ − On the same graph, plot the sample and extended ACFs, R(k)ˆ and Rext(k) versus monics of the fundamental frequency of 1/12 cycle/month, for 0 n n0 1, 0 ≤ k ≤ M normalized by their lag-0 values. 2 2πn 2πn yˆn = c0 + c1n + c2n +c3 sin + c4 cos + e. Let xˆ denote the estimate/prediction of x based on the data subset {x ,m ≤ 12 12 n n m n0 − 1}. Clearly, xˆn = xn,ifn ≤ n0 − 1. Writing Eq. (12.15.2) recursively, we obtain 4πn 4πn +c5 sin + c6 cos (12.15.7) for the predicted values into the future beyond n0: 12 12 =− + + 6πn 6πn xn a1xn−1 a2xn−2 εn +c7 sin + c8 cos (12.15.5) 12 12 xˆn =− a1xˆn−1 + a2xˆn−2 , for n ≥ n0 Carry out a least-squares fit to determine the nine coefficients ci, that is, determine

where we set εˆn = 0 because all the observations are in the strict past of εn when the ci that minimize the quadratic performance index, ≥ n n0. − n0 1 2 Calculate the predicted values 36 steps into the future, xˆn for n0 ≤ n ≤ N − 1, using J = yn − yˆn = min (12.15.8) n=0 the fact that xˆn = xn,ifn ≤ n0 − 1. Once you have the predicted xn’s, you can calculate the predicted yn’s by the recursive equation: In addition, carry out a fit based on the L1 criterion, yˆ = yˆ − + xˆ (12.15.6) n n 12 n − n0 1    J = yn − yˆn = min (12.15.9) where you must use yˆn = yn,ifn ≤ n0 − 1. Compute yˆn for n0 ≤ n ≤ N − 1, and n=0 plot it on the same graph with the original data yn,0≤ n ≤ N − 1. Indicate the start of the prediction horizon with a vertical line at n = n0 (see example graph at end.) Show that the coefficients are, for the two criteria, = f. Repeat parts (d,e) using an AR(4) model (i.e., p 4), with signal model equations: L2 L1 c0 4.763415 4.764905 x = y − y − ,x=− a x − + a x − + a x − + a x − + ε n n n 12 n 1 n 1 2 n 2 3 n 3 4 n 4 n c1 0.012010 0.011912 c2 −0.000008 −0.000006 and normal equations: c3 0.036177 0.026981 c −0.135907 −0.131071 ⎡ ⎤ ⎡ ⎤−1 ⎡ ⎤ 4 a R R R R R ⎢ 1 ⎥ ⎢ 0 1 2 3 ⎥ ⎢ 1 ⎥ c5 0.063025 0.067964 ⎢ a ⎥ ⎢ R R R R ⎥ ⎢ R ⎥ ⎢ 2 ⎥ =−⎢ 1 0 1 2 ⎥ ⎢ 2 ⎥ 2 = + + + + c6 0.051890 0.051611 ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ,σε R0 a1R1 a2R2 a3R3 a4R4 a3 R2 R1 R0 R1 R3 c7 −0.025511 −0.017619 a4 R3 R2 R1 R0 R4 c8 −0.009168 −0.012753

with predictions: Using the found coefficients for each criterion, evaluate the fitted quantity yˆn of Eq. (12.15.7) over the entire data record, 0 ≤ n ≤ N − 1, and plot it on the same xˆ =− a xˆ − + a xˆ − + a xˆ − + a xˆ − n 1 n 1 2 n 2 3 n 3 4 n 4 graph together with the actual data, and with the AR(4) prediction. It appears that

g. Inspecting the log-data, it is evident that there is an almost linear trend as well as a twelve-month, or even a six-month, periodicity. In this part, we will attempt to fit the data using a conventional basis-functions method and then use this model to predict the last 36 months.

Consider the following model for the first n_0 points of the log data y_n, which assumes a quadratic trend and 12-month, 6-month, and 4-month periodicities, i.e., three harmonics of the fundamental frequency of 1/12 cycle/month, for 0 ≤ n ≤ n_0 - 1:

\[
\hat y_n = c_0 + c_1 n + c_2 n^2
+ c_3 \sin\frac{2\pi n}{12} + c_4 \cos\frac{2\pi n}{12}
+ c_5 \sin\frac{4\pi n}{12} + c_6 \cos\frac{4\pi n}{12}
+ c_7 \sin\frac{6\pi n}{12} + c_8 \cos\frac{6\pi n}{12} \tag{12.15.7}
\]

Carry out a least-squares fit to determine the nine coefficients c_i, that is, determine the c_i that minimize the quadratic performance index,

\[
J = \sum_{n=0}^{n_0-1} \big( y_n - \hat y_n \big)^2 = \min \tag{12.15.8}
\]

In addition, carry out a fit based on the L_1 criterion,

\[
J = \sum_{n=0}^{n_0-1} \big| y_n - \hat y_n \big| = \min \tag{12.15.9}
\]

Show that the coefficients are, for the two criteria:

            L2           L1
    c0    4.763415     4.764905
    c1    0.012010     0.011912
    c2   -0.000008    -0.000006
    c3    0.036177     0.026981
    c4   -0.135907    -0.131071
    c5    0.063025     0.067964
    c6    0.051890     0.051611
    c7   -0.025511    -0.017619
    c8   -0.009168    -0.012753

Using the found coefficients for each criterion, evaluate the fitted quantity ŷ_n of Eq. (12.15.7) over the entire data record, 0 ≤ n ≤ N - 1, and plot it on the same graph together with the actual data, and with the AR(4) prediction. It appears that this approach also works well, but over a shorter prediction interval, i.e., 24 months instead of 36 for the AR(4) method.
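A hedged sketch of the least-squares part, assuming y, N, n0 as above; the L_1 fit of Eq. (12.15.9) is not shown (it requires, for example, an iteratively reweighted least-squares loop or a linear-programming solver).

    n  = (0:n0-1)';                                  % fitting interval 0 <= n <= n0-1
    F  = [ones(n0,1), n, n.^2, ...
          sin(2*pi*n/12), cos(2*pi*n/12), ...
          sin(4*pi*n/12), cos(4*pi*n/12), ...
          sin(6*pi*n/12), cos(6*pi*n/12)];           % basis functions of Eq. (12.15.7)
    c  = F \ y(1:n0);                                % L2 solution of Eq. (12.15.8)
    nn = (0:N-1)';                                   % evaluate the fit over the full record
    Fall = [ones(N,1), nn, nn.^2, ...
            sin(2*pi*nn/12), cos(2*pi*nn/12), ...
            sin(4*pi*nn/12), cos(4*pi*nn/12), ...
            sin(6*pi*nn/12), cos(6*pi*nn/12)];
    yfit = Fall*c;                                   % compare with y and the AR(4) prediction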

Some example graphs for this experiment are included below.

[Example graphs: "airline data" and "airline log data" (versus months, 0–144); "AR(2) model" and "AR(4) model" (log data and prediction, with the prediction horizon marked); "AR(4) and fitting models" (data, AR(4), fit); "ACF of log data, R_k/R_0" and "ACF of deseasonalized data, R_k/R_0" (sample ACF and AR(4) ACF versus lag k).]

12.16 Problems

12.1 (a) Following the methods of Sec. 12.1, show that the optimal filter for predicting D steps into the future, i.e., estimating y(n + D) on the basis of {y(m); m ≤ n}, is given by

\[
H(z) = \frac{1}{B(z)} \big[ z^D B(z) \big]_+
\]

(b) Express [z^D B(z)]_+ in terms of B(z) itself and the first D - 1 impulse response coefficients b_m, m = 1, 2, ..., D - 1 of B(z).

(c) For the two random signals y_n defined in Examples 12.1.1 and 12.1.2, find the optimal prediction filters for D = 2 and D = 3, and write the corresponding I/O equations.

12.2 Consider the order-p autoregressive sequence y_n defined by the difference equation (12.2.3). Show that a direct consequence of this difference equation is that the projection of y_n onto the subspace spanned by the entire past {y_{n-i}; 1 ≤ i < ∞} is the same as the projection of y_n onto the subspace spanned only by the past p samples {y_{n-i}; 1 ≤ i ≤ p}.

12.3 (a) Show that the performance index (12.3.2) may be written as

\[
\mathcal{E} = E[e_n^2] = \mathbf{a}^T R\, \mathbf{a}
\]

where a = [1, a_1, ..., a_p]^T is the order-p prediction-error filter, and R the autocorrelation matrix of y_n; that is, R_{ij} = E[y_{n-i} y_{n-j}].

(b) Derive Eq. (12.3.7) by minimizing the index E with respect to the weights a, subject to the linear constraint that a_0 = 1, and incorporating this constraint by means of a Lagrange multiplier.

12.4 Take the inverse z-transform of Eq. (12.3.17) and compare the resulting equation with Eq. (12.3.15).

12.5 Verify that Eqs. (12.3.22) and (12.3.23) are inverses of each other.

12.6 A fourth-order all-pole random signal process y(n) is represented by the following set of signal model parameters (reflection coefficients and input variance):

\[
\{\gamma_1, \gamma_2, \gamma_3, \gamma_4, \sigma_\varepsilon^2\} = \{0.5,\ -0.5,\ 0.5,\ -0.5,\ 40.5\}
\]

(a) Using the Levinson recursion, find the prediction-error filter A_4(z).

(b) Determine σ_y^2 = R_yy(0). Using intermediate results from part (a), determine the autocorrelation lags R_yy(k), k = 1, 2, 3, 4.

12.7 The first five lags of the autocorrelation function of a fourth-order autoregressive random sequence y(n) are

\[
\{R(0), R(1), R(2), R(3), R(4)\} = \{256,\ 128,\ -32,\ -16,\ 22\}
\]

Determine the best prediction-error filters and the corresponding mean-square errors of orders p = 1, 2, 3, 4 by using Levinson's algorithm in matrix form.
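As a quick numerical cross-check of the hand computation (not a substitute for it), one can use MATLAB's levinson function from the Signal Processing Toolbox; the text's own routine for this task is lev. A minimal sketch:

    % Numerical check of Problem 12.7 (illustrative only)
    R = [256 128 -32 -16 22];
    for p = 1:4
        [a, Ep] = levinson(R, p);   % order-p prediction-error filter and mean-square error
        fprintf('p = %d, E_p = %.4f\n', p, Ep);
        disp(a);
    end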

12.8 The fourth-order prediction-error filter and mean-square prediction error of a random signal have been determined to be

\[
A_4(z) = 1 - 1.25 z^{-1} + 1.3125 z^{-2} - z^{-3} + 0.5 z^{-4}\,, \qquad E_4 = 0.81
\]

Using the function rlev, determine the autocorrelation lags R(k), 0 ≤ k ≤ 4, the four reflection coefficients, and all the lower order prediction-error filters.

12.9 Verify the results of Example 12.3.1 using the functions lev, frwlev, bkwlev, and rlev, as required.

12.10 (a) Given the five signal samples

\[
\{y_0, y_1, y_2, y_3, y_4\} = \{1, -1, 1, -1, 1\}
\]

compute the corresponding sample autocorrelation lags R̂(k), k = 0, 1, 2, 3, 4, and send them through the function lev to determine the fourth-order prediction-error filter A_4(z).

(b) Predict the sixth sample in this sequence.

(c) Repeat (a) and (b) for the sequence of samples {1, 2, 3, 4, 5}.

12.11 Find the infinite autoregressive or maximum-entropy extension of the two autocorrelation sequences

(a) {R(0), R(1)} = {1, 0.5}

(b) {R(0), R(1), R(2)} = {4, 0, 1}

In both cases, determine the corresponding power spectrum density S_yy(z) and from it calculate the R(k) for all lags k.

12.12 Write Eq. (12.3.24) for order p + 1. Derive Eq. (12.5.1) from Eq. (12.3.24) by replacing the filter a_{p+1} in terms of the filter a_p via the Levinson recursion.

12.13 Do Problem 12.7 using the split Levinson algorithm.

12.14 Draw the lattice realization of the analysis and synthesis filters A_4(z) and 1/A_4(z) obtained in Problems 12.6, 12.7, and 12.8.

12.15 Test the minimum-phase property of the two polynomials

\[
\begin{aligned}
A(z) &= 1 - 1.08 z^{-1} + 0.13 z^{-2} + 0.24 z^{-3} - 0.5 z^{-4} \\
A(z) &= 1 + 0.18 z^{-1} - 0.122 z^{-2} - 0.39 z^{-3} - 0.5 z^{-4}
\end{aligned}
\]

12.16 (a) The entropy of an M-dimensional random vector is defined by S = -\int p(\mathbf{y}) \ln p(\mathbf{y})\, d^M y. Show that the entropy of a zero-mean gaussian y with covariance matrix R is given, up to an additive constant, by S = \tfrac{1}{2} \ln(\det R).

(b) With the help of the LU factorization (12.9.1), show that the ratio of the determinants of an order-M autocorrelation matrix and its order-p (p < M) submatrix is

\[
\frac{\det R_M}{\det R_p} = \prod_{i=p+1}^{M} E_i
\]

(c) Consider all possible autocorrelation extensions of the set {R(0), R(1), ..., R(p)} up to order M. For gaussian processes, use the results in parts (a) and (b) to show that the particular extension defined by the choice γ_i = 0, i = p + 1, ..., M maximizes the entropy of the order-M process; hence, the name maximum entropy extension.

12.17 Consider the LU factorization LRL^T = D of an order-M autocorrelation matrix R. Denote by b_p^T, p = 0, 1, ..., M the rows of L. They are the backward prediction filters with zeros padded to their ends to make them (M + 1)-dimensional vectors.

(a) Show that the inverse factorization R^{-1} = L^T D^{-1} L can be written as

\[
R^{-1} = \sum_{p=0}^{M} \frac{1}{E_p}\, \mathbf{b}_p \mathbf{b}_p^T
\]

(b) Define the "phasing" vectors s(z) = [1, z^{-1}, z^{-2}, ..., z^{-M}]^T. Show that the z-transform of an order-M filter and its inverse can be expressed compactly as

\[
A(z) = \mathbf{s}(z)^T \mathbf{a}\,, \qquad
\mathbf{a} = \oint_{\text{u.c.}} A(z)\, \mathbf{s}(z^{-1})\, \frac{dz}{2\pi j z}
\]

(c) Define the "kernel" vector k(w) = R^{-1} s(w). The z-transform of this vector is called a reproducing kernel [972,973,981]. Show that it can be written in the alternative forms

\[
K(z, w) = \mathbf{s}(z)^T \mathbf{k}(w) = \mathbf{k}(z)^T \mathbf{s}(w)
= \mathbf{k}(z)^T R\, \mathbf{k}(w) = \mathbf{s}(z)^T R^{-1} \mathbf{s}(w)
\]

(d) Let J denote the (M + 1)×(M + 1) reversing matrix. Show that J s(z) = z^{-M} s(z^{-1}), and that K(z, w) = z^{-M} w^{-M} K(z^{-1}, w^{-1}).

(e) Show that K(z, w) admits the following representations in terms of the backward and forward prediction polynomials

\[
K(z, w) = \sum_{p=0}^{M} \frac{1}{E_p}\, B_p(z) B_p(w)
= \sum_{p=0}^{M} \frac{1}{E_p}\, A_p(z) A_p(w)\, z^{-(M-p)} w^{-(M-p)}
\]

12.18 Let S_yy(z) be the power spectral density of the autocorrelation function R(k) from which we build the matrix R of the previous problem. Show that R and R^{-1} admit the following representations in terms of the phasing and kernel vectors:

\[
R = \oint_{\text{u.c.}} S_{yy}(z)\, \mathbf{s}(z^{-1})\, \mathbf{s}(z)^T\, \frac{dz}{2\pi j z}\,, \qquad
R^{-1} = \oint_{\text{u.c.}} S_{yy}(z)\, \mathbf{k}(z^{-1})\, \mathbf{k}(z)^T\, \frac{dz}{2\pi j z}
\]

Then, show the reproducing kernel property

\[
K(z, w) = \oint_{\text{u.c.}} K(z, u^{-1})\, K(w, u)\, S_{yy}(u)\, \frac{du}{2\pi j u}
\]

12.19 (a) Let s_p(z) = [1, z^{-1}, z^{-2}, ..., z^{-p}]^T. Using the order-updating formulas for R_p^{-1}, show that the kernel vector k_p(w) = R_p^{-1} s_p(w) satisfies the following order-recursive equations

\[
\mathbf{k}_p(w) = \begin{bmatrix} \mathbf{k}_{p-1}(w) \\ 0 \end{bmatrix} + \frac{1}{E_p}\, \mathbf{b}_p B_p(w)\,, \qquad
\mathbf{k}_p(w) = \begin{bmatrix} 0 \\ w^{-1} \mathbf{k}_{p-1}(w) \end{bmatrix} + \frac{1}{E_p}\, \mathbf{a}_p A_p(w)
\]

(b) Show that the corresponding reproducing kernels satisfy

\[
\begin{aligned}
K_p(z, w) &= K_{p-1}(z, w) + \frac{1}{E_p}\, B_p(z) B_p(w) \\
K_p(z, w) &= z^{-1} w^{-1} K_{p-1}(z, w) + \frac{1}{E_p}\, A_p(z) A_p(w)
\end{aligned}
\]

(c) Using part (b), show the Christoffel-Darboux formula [972,973,981]

\[
K_p(z, w) = \frac{1}{E_p}\, \frac{A_p(z) A_p(w) - z^{-1} w^{-1} B_p(z) B_p(w)}{1 - z^{-1} w^{-1}}
\]

(d) Let z_i be the ith zero of the prediction polynomial A_p(z). Using part (c), evaluate K_p(z_i, z_i^*) and thereby show that necessarily |z_i| ≤ 1. This is yet another proof of the minimum-phase property of the prediction-error filters. Show further that if the prediction filter a_p is symmetric, i.e., a_p = a_p^R, then its zeros lie on the unit circle.

(e) Show the Christoffel-Darboux formula [972,973,981]

\[
K_{p-1}(z, w) = \frac{1}{E_p}\, \frac{A_p(z) A_p(w) - B_p(z) B_p(w)}{1 - z^{-1} w^{-1}}
\]

and use this expression to prove the result in (d) that |z_i| ≤ 1.

12.20 Do Problem 12.7 using the Schur algorithm, determine the Cholesky factor G, and verify R = G D^{-1} G^T by explicit matrix multiplication.

12.21 For the Example 12.10.2, compute the entries of the output matrices Y± by directly convolving the forward/backward prediction filters with the input autocorrelation lags.

12.22 Do Problem 12.7 using the split Schur algorithm, and determine the Cholesky factor G by the recursion (12.10.21).

12.23 (a) Show the identity

\[
\left| \frac{-a^* + z^{-1}}{1 - a z^{-1}} \right|^2
= 1 - \frac{\big(1 - |z^{-1}|^2\big)\big(1 - |a|^2\big)}{|1 - a z^{-1}|^2}
\]

(b) Using part (a), show that the all-pass Schur function S_p(z) defined by Eq. (12.10.22) satisfies the boundedness inequality (12.10.23), with equality attained on the unit circle. Show that it also satisfies |S_p(z)| > 1 for |z| < 1.

12.24 Define the Schur function

\[
S_3(z) = \frac{0.125 - 0.875 z^{-2} + z^{-3}}{1 - 0.875 z^{-1} + 0.125 z^{-3}}
\]

Carry out the recursions (12.10.24) and (12.10.25) to construct the lower order Schur functions S_p(z), p = 2, 1, 0, and, in the process, extract the corresponding reflection coefficients.

12.25 Consider a generalized version of the simulation example discussed in Sec. 12.11, defined by

\[
x(n) = s(n) + v_1(n)\,, \qquad y(n) = v_2(n)
\]

where

\[
\begin{aligned}
s(n)   &= \sin(\omega_0 n + \phi) \\
v_1(n) &= a_1 v_1(n-1) + v(n) \\
v_2(n) &= a_2 v_2(n-1) + v(n)
\end{aligned}
\]

where v(n) is zero-mean, unit-variance, white noise, and φ is a random phase independent of v(n). This ensures that the s(n) component is uncorrelated with v_1(n) and v_2(n).

(a) Show that

\[
R_{xy}(k) = \frac{a_1^k}{1 - a_1 a_2}\,, \qquad
R_{yy}(k) = \frac{a_2^k}{1 - a_2^2}\,, \qquad k \ge 0
\]

(b) Show that the infinite-order Wiener filter for estimating x(n) on the basis of y(n) has a (causal) impulse response

\[
h_0 = 1\,, \qquad h_k = (a_1 - a_2)\, a_1^{k-1}\,, \quad k \ge 1
\]

(c) Next, consider the order-M FIR Wiener filter. Send the theoretical correlations of part (a) for k = 0, 1, ..., M through the function firw to obtain the theoretical Mth order Wiener filter realized both in the direct and the lattice forms. Draw these realizations. Compare the theoretical values of the weights h, g, and γ with the simulated values presented in Sec. 12.11 that correspond to the choice of parameters M = 4, a_1 = -0.5, and a_2 = 0.8. Also compare the answer for h with the first (M + 1) samples of the infinite-order Wiener filter impulse response of part (b).

(d) Repeat (c) with M = 6.

12.26 A closed form solution of Problem 12.25 can be obtained as follows.

(a) Show that the inverse of the (M + 1)×(M + 1) autocorrelation matrix defined by the autocorrelation lags R_yy(k), k = 0, 1, ..., M of Problem 12.25(a) is given by the tridiagonal matrix:

\[
R_{yy}^{-1} =
\begin{bmatrix}
1    & -a_2 & 0    & \cdots & 0    & 0    \\
-a_2 & b    & -a_2 & \cdots & 0    & 0    \\
0    & -a_2 & b    & \cdots & 0    & 0    \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0    & 0    & 0    & \cdots & b    & -a_2 \\
0    & 0    & 0    & \cdots & -a_2 & 1
\end{bmatrix}
\]

where b = 1 + a_2^2.

(b) Using this inverse, show that the optimal Mth order Wiener filter has impulse response

\[
h_0 = 1\,, \qquad h_k = (a_1 - a_2)\, a_1^{k-1}\,, \quad 1 \le k \le M - 1\,, \qquad
h_M = \frac{a_1 - a_2}{1 - a_1 a_2}\, a_1^{M-1}
\]

(c) Show that the lattice weights g can be obtained from h by the backward substitution

\[
g_M = h_M\,, \qquad g_m = a_2\, g_{m+1} + h_m\,, \quad m = M-1, M-2, \ldots, 1, 0
\]

(d) For M = 4, a_1 = -0.5, a_2 = 0.8, compute the numerical values of h and g using the above expressions and compare them with those of Problem 12.25(c).

12.27 Computer Experiment. Consider the noise canceling example of Sec. 12.11 and Problem 12.25, defined by the choice of parameters

\[
\omega_0 = 0.075\pi \ \text{[radians/sample]}\,, \quad \phi = 0\,, \quad a_1 = -0.5\,, \quad a_2 = 0.8\,, \quad M = 4
\]

(a) Generate 100 samples of the signals x(n), s(n), and y(n). On the same graph, plot x(n) and s(n) versus n. Plot y(n) versus n.

(b) Using these samples, compute the sample correlations R̂_yy(k), R̂_xy(k), for k = 0, 1, ..., M, and compare them with the theoretical values obtained in Problem 12.25(a).

(c) Send these lags through the function firw to get the optimal Wiener filter weights h and g, and the reflection coefficients γ. Draw the lattice and direct-form realizations of the Wiener filter.

(d) Filter y(n) through the Wiener filter realized in the lattice form, and plot the output e(n) = x(n) - x̂(n) versus n.

(e) Repeat (d) using the direct-form realization of the Wiener filter.

(f) Repeat (d) when M = 6.

12.28 The following six samples

\[
\{y_0, y_1, y_2, y_3, y_4, y_5\} = \{4.684,\ 7.247,\ 8.423,\ 8.650,\ 8.640,\ 8.392\}
\]

have been generated by sending zero-mean unit-variance white noise ε_n through the difference equation

\[
y_n = a_1 y_{n-1} + a_2 y_{n-2} + \varepsilon_n
\]

where a_1 = 1.70 and a_2 = -0.72. Iterating Burg's method by hand, obtain estimates of the model parameters a_1, a_2, and σ_ε^2.

12.29 Derive Eq. (12.12.11).

12.30 Computer Experiment. Ten samples from a fourth-order autoregressive process y(n) are given below. It is desired to extract the model parameters {a_1, a_2, a_3, a_4, σ_ε^2} as well as the equivalent parameter set {γ_1, γ_2, γ_3, γ_4, σ_ε^2}.

(a) Determine these parameters using Burg's method.

(b) Repeat using the Yule-Walker method.

Note: The exact parameter values by which the above simulated samples were generated are

\[
\{a_1, a_2, a_3, a_4, \sigma_\varepsilon^2\} = \{-2.2137,\ 2.9403,\ -2.2697,\ 0.9606,\ 1\}
\]

    n    y(n)
    0     4.503
    1   -10.841
    2   -24.183
    3   -25.662
    4   -14.390
    5     1.453
    6    10.980
    7    13.679
    8    15.517
    9    15.037

12.31 Using the continuity equations at an interface, derive the transmission matrix equation (12.13.2) and the energy conservation equation (12.13.4).

12.32 Show Eq. (12.13.6).

12.33 Fig. 12.13.2 defines the scattering matrix S. Explain how the principle of linear superposition may be used to show the general relationship

\[
\begin{bmatrix} E_+' \\ E_- \end{bmatrix}
= S \begin{bmatrix} E_+ \\ E_-' \end{bmatrix}
\]

between incoming and outgoing fields.

12.34 Show the two properties of the matrix ψ_m(z) stated in Eqs. (12.13.13) and (12.13.14).

12.35 Show Eqs. (12.13.25).

12.36 The reflection response of a stack of four dielectrics has been found to be

\[
R(z) = \frac{-0.25 + 0.0313 z^{-1} + 0.2344 z^{-2} - 0.2656 z^{-3} + 0.25 z^{-4}}
{1 - 0.125 z^{-1} + 0.0664 z^{-3} - 0.0625 z^{-4}}
\]

Determine the reflection coefficients {ρ_0, ρ_1, ρ_2, ρ_3, ρ_4}.

12.37 Computer Experiment. It is desired to probe the structure of a stack of dielectrics from its reflection response. To this end, a unit impulse is sent incident on the stack and the reflection response is measured as a function of time. It is known in advance (although this is not necessary) that the stack consists of four equal travel-time slabs stacked in front of a semi-infinite medium. Thirteen samples of the reflection response are collected as shown here. Determine the reflection coefficients {ρ_0, ρ_1, ρ_2, ρ_3, ρ_4} by means of the dynamic predictive deconvolution procedure.

    k    R(k)
    0   -0.2500
    1    0.0000
    2    0.2344
    3   -0.2197
    4    0.2069
    5    0.0103
    6    0.0305
    7   -0.0237
    8    0.0093
    9   -0.0002
    10   0.0035
    11  -0.0017
    12   0.0004

12.38 Computer Experiment. Generate the results of Figures 5.16–5.17 and 5.25–5.26.

12.39 Computer Experiment. This problem illustrates the use of the dynamic predictive deconvolution method in the design of broadband terminations of transmission lines. The termination is constructed by the cascade of M equal travel-time segments of transmission lines such that the overall reflection response of the structure approximates the desired reflection response. The characteristic impedances of the various segments are obtainable from the reflection coefficients {ρ_0, ρ_1, ..., ρ_M}. The reflection response R(ω) of the structure is a periodic function of ω with period ω_s = 2π/T_2, where T_2 is the two-way travel time delay of each segment. The design procedure is illustrated by the following example: The desired frequency response R(ω) is defined over one Nyquist period, as shown in Fig. 12.16.1:

\[
R(\omega) =
\begin{cases}
0\,,   & \text{for } 0.25\,\omega_s \le \omega \le 0.75\,\omega_s \\
0.9\,, & \text{for } 0 \le \omega < 0.25\,\omega_s \ \text{and}\ 0.75\,\omega_s < \omega \le \omega_s
\end{cases}
\]

[Figure] Fig. 12.16.1 Desired reflection frequency response.

(a) Using the Fourier series method of designing digital filters, design an N = 21-tap filter with impulse response R(k), k = 0, 1, ..., N - 1, whose frequency response approximates the desired response defined above. Window the designed reflection impulse response R(k) by a length-N Hamming window. Plot the magnitude frequency response of the windowed reflection series over one Nyquist interval 0 ≤ ω ≤ ω_s.

(b) For M = 6, send the N samples of the windowed reflection series through the dynamic predictive deconvolution function dpd to obtain the polynomials A_M(z) and B_M(z) and the reflection coefficients {ρ_0, ρ_1, ..., ρ_M}. Plot the magnitude response of the structure; that is, plot

\[
|R(\omega)| = \left| \frac{B_M(z)}{A_M(z)} \right|\,, \qquad
z = \exp(j\omega T_2) = \exp\!\left( 2\pi j\, \frac{\omega}{\omega_s} \right)
\]

and compare it with the windowed response of part (a). To facilitate the comparison, plot both responses of parts (a) and (b) on the same graph.

(c) Repeat part (b) for M = 2, M = 3, and M = 10.

(d) Repeat parts (a) through (c) for N = 31 reflection series samples.

(e) Repeat parts (a) through (c) for N = 51.

12.40 Show that the performance matrix P defined by Eq. (12.14.11) has trace equal to M + 1.

12.41 Computer Experiment. Reproduce the results of Figs. 12.14.1 through 12.14.6.

12.42 Computer Experiment – ARIMA Models. The file GNPC96.dat contains the data for the U.S. Real Gross National Product (in billions of chained 2005 dollars, measured quarterly and seasonally adjusted.) In this experiment, you will test whether an ARIMA(p, 1, 0) model is appropriate for these data.

Extract the data from 1980 onwards and take their log. This results in N = 119 observations, y_n = ln(GNP_n), n = 0, 1, ..., N - 1. Because of the upward trend of the data, define the length-(N-1) differenced signal z_n = y_n - y_{n-1}, for n = 1, 2, ..., N - 1. Then, subtract its sample mean, say μ, to get the signal x_n = z_n - μ, for n = 1, 2, ..., N - 1.

a. Calculate and plot the first M = 24 sample autocorrelation lags R(k), 0 ≤ k ≤ M, of the signal x_n. Send these into the Levinson algorithm to determine and plot the corresponding reflection coefficients γ_p, for p = 1, 2, ..., M. Add on that graph the 95% confidence bands, that is, the horizontal lines at ±1.96/√N. Based on this plot, determine a reasonable value for the order p of an autoregressive model of fitting the x_n data.

b. Check the chosen value of p against also the FPE, AIC, and MDL criteria, that is, by plotting them versus p = 1, 2, ..., M, and identifying their minimum:

\[
\text{FPE}_p = E_p\, \frac{N + p + 1}{N - p - 1}\,, \qquad
\text{AIC}_p = N \ln E_p + 2(p + 1)\,, \qquad
\text{MDL}_p = N \ln E_p + (p + 1)\ln N
\]

c. For the chosen value of p, use the Yule-Walker method to determine the linear prediction-error filter a_p of order p, then calculate the one-step-ahead prediction x̂_{n/n-1}, add the mean μ to get the prediction ẑ_{n/n-1}, and undo the differencing operation to compute the prediction ŷ_{n/n-1} of y_n. There is a small subtlety here that has to do with the initial value of y_n. On the same graph plot y_n and its prediction.

d. Repeat part (c) for a couple of other values of p, say p = 1 and p = 10.

e. Calculate the DTFT |X(ω)|^2 of the data x_n over 0 ≤ ω ≤ π. For the chosen order p, calculate the corresponding AR power spectrum, but this time use the Burg method to determine the order-p prediction filter. Plot the DTFT and AR spectra on the same graph, but for convenience in comparing them, normalize both spectra to their maximum values. Investigate if higher values of p can model these spectra better, for example, try the orders p = 10 and p = 15. Some example graphs are included below.

[Example graphs: "Quarterly GNP and its prediction" (log(GNP) versus year, data and prediction for p = 10); reflection coefficients γ_p versus p; "DTFT and Burg spectra" (normalized magnitude square versus ω/π, DTFT compared with AR(10) and with AR(15)).]

12.43 Computer Experiment – Wiener Filter Design. It is desired to design a Wiener filter to enhance a sinusoidal signal buried in noise. The noisy sinusoidal signal is given by

\[
x_n = s_n + v_n\,, \qquad \text{where} \quad s_n = \sin(\omega_0 n)
\]

with ω_0 = 0.075π. The noise component v_n is related to the secondary signal y_n by

\[
v_n = y_n + y_{n-1} + y_{n-2} + y_{n-3} + y_{n-4} + y_{n-5} + y_{n-6}
\]

a. Generate N = 200 samples of the signal y_n by assuming that it is an AR(4) process with reflection coefficients:

\[
\{\gamma_1, \gamma_2, \gamma_3, \gamma_4\} = \{0.5,\ -0.5,\ 0.5,\ -0.5\}
\]

The variance σ_ε^2 of the driving white noise of the model must be chosen in such a way as to make the variance σ_v^2 of the noise component v_n approximately σ_v^2 = 0.5, such that the two terms s_n and v_n of x_n have approximately equal strengths, that is, 0 dB signal-to-noise ratio.

(This can be done by writing v_n = c^T y(n) and therefore, σ_v^2 = c^T R c, where c is a 7-dimensional column vector of ones, and R is the order-6 autocorrelation matrix. You can then write σ_v^2 = σ_y^2 c^T R_norm c, where R_norm has all its entries normalized by R(0) = σ_y^2. You can easily determine R_norm by doing a maximum entropy extension to order six, starting with the four reflection coefficients and setting γ_5 = γ_6 = 0.)

In generating y_n make sure that the transients introduced by the filter have died out. Then, generate the corresponding N samples of the signal x_n. On the same graph, plot x_n together with the desired signal s_n. On a separate graph (but using the same vertical scales as the previous one) plot the reference signal y_n versus n.

b. For M = 4, design a Wiener filter of order M based on the generated signal blocks {x_n, y_n}, n = 0, 1, ..., N - 1, and realize it in both the direct and lattice forms.

c. Using the lattice form, filter the signals x_n, y_n through the designed filter and generate the outputs x̂_n, e_n. Explain why e_n should be an estimate of the desired signal s_n. On the same graph, plot e_n and s_n using the same vertical scales as in part (a).

d. Repeat parts (b) and (c) for filter orders M = 5, 6, 7, 8. Discuss the improvement obtained with increasing order. What is the smallest M that would, at least theoretically, result in e_n = s_n? Some example graphs are included below.

[Example graphs: "Wiener filter inputs" (y(n), x(n), s(n) versus time samples n, 0–200) and "Wiener filter – error output, M=6" (e(n) and s(n) versus n).]

13
Kalman Filtering

13.1 State-Space Models

The Kalman filter is based on a state/measurement model of the form:

\[
\begin{aligned}
\mathbf{x}_{n+1} &= A_n \mathbf{x}_n + \mathbf{w}_n && \text{(state model)} \\
\mathbf{y}_n     &= C_n \mathbf{x}_n + \mathbf{v}_n && \text{(measurement model)}
\end{aligned}
\tag{13.1.1}
\]

where x_n is a p-dimensional state vector and y_n, an r-dimensional vector of observations. The p×p state-transition matrix A_n and r×p measurement matrix C_n may depend on time n. The signals w_n, v_n are assumed to be mutually independent, zero-mean, white-noise signals with known covariance matrices Q_n and R_n:

\[
\begin{aligned}
E[\mathbf{w}_n \mathbf{w}_i^T] &= Q_n \delta_{ni} \\
E[\mathbf{v}_n \mathbf{v}_i^T] &= R_n \delta_{ni} \\
E[\mathbf{w}_n \mathbf{v}_i^T] &= 0
\end{aligned}
\tag{13.1.2}
\]

The model is iterated starting at n = 0. The initial state vector x_0 is assumed to be random and independent of w_n, v_n, but with a known mean x̄_0 = E[x_0] and covariance matrix Σ_0 = E[(x_0 - x̄_0)(x_0 - x̄_0)^T]. We will assume, for now, that x_0, w_n, v_n are normally distributed, and therefore, their statistical description is completely determined by their means and covariances. A non-zero cross-covariance E[w_n v_i^T] = S_n δ_{ni} may also be assumed. A scalar version of the model was discussed in Chap. 11.

The covariance matrices Q_n, R_n have dimensions p×p and r×r, but they need not have full rank (which would mean that some of the components of x_n or y_n would, in an appropriate basis, be noise-free). For example, to allow the possibility of fewer state noise components, the model (13.1.1) is often written in the form:

\[
\begin{aligned}
\mathbf{x}_{n+1} &= A_n \mathbf{x}_n + G_n \mathbf{w}_n && \text{(state model)} \\
\mathbf{y}_n     &= C_n \mathbf{x}_n + \mathbf{v}_n     && \text{(measurement model)}
\end{aligned}
\tag{13.1.3}
\]
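To make the model (13.1.1) concrete, the following is a hedged MATLAB sketch that simulates a small time-invariant example; the choices p = 2, r = 1 and the particular matrices A, C, Q, R below are illustrative assumptions, not values from the text.

    % Simulate the state/measurement model of Eq. (13.1.1) for a constant-velocity
    % target: p = 2 states (position, velocity), r = 1 noisy position measurement.
    A = [1 1; 0 1];                    % state-transition matrix A_n (time-invariant)
    C = [1 0];                         % measurement matrix C_n
    Q = 0.1*eye(2);                    % covariance of w_n
    R = 1;                             % covariance of v_n
    N = 200;
    x = zeros(2,N);  y = zeros(1,N);
    x(:,1) = [0; 0.1];                 % initial state x_0 (taken deterministic here)
    for n = 1:N-1
        y(n)     = C*x(:,n) + sqrt(R)*randn;          % measurement model
        x(:,n+1) = A*x(:,n) + sqrtm(Q)*randn(2,1);    % state model
    end
    y(N) = C*x(:,N) + sqrt(R)*randn;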
