

Block Matrices With L-Block-Banded Inverse: Inversion Algorithms

Amir Asif, Senior Member, IEEE, and José M. F. Moura, Fellow, IEEE

IEEE Transactions on Signal Processing, vol. 53, no. 2, pp. 630-642, February 2005

Abstract—Block-banded matrices generalize banded matrices. We study the properties of positive definite full matrices P whose inverses A are L-block-banded. We show that, for such matrices, the blocks in the L-block band of P completely determine P; namely, all blocks of P outside its L-block band are computed from the blocks in the L-block band of P. We derive fast inversion algorithms for P and its inverse A that, when compared to direct inversion, are faster by two orders of magnitude of the linear dimension of the constituent blocks. We apply these inversion algorithms to successfully develop fast approximations to Kalman-Bucy filters in applications with high dimensional states where the direct inversion of the error covariance matrix is computationally unfeasible.

Index Terms—Block-banded matrix, Cholesky factorization, covariance matrix, Gauss-Markov random process, Kalman-Bucy filter, matrix inversion, sparse matrices.

I. INTRODUCTION

Block-banded matrices and their inverses arise frequently in signal processing applications, including autoregressive or moving average image modeling, covariances of Gauss-Markov random processes (GMrp) [1], [2], or with finite difference numerical approximations to partial differential equations. For example, the point spread function (psf) in image restoration problems has a block-banded structure [3]. Block-banded matrices are also used to model the correlation of cyclostationary processes in periodic time series [4]. We are motivated by signal processing problems with large dimensional states where a straightforward implementation of a recursive estimation algorithm, such as the Kalman-Bucy filter (KBf), is prohibitively expensive. In particular, the inversion of the error covariance matrices in the KBf is computationally intensive, precluding the direct implementation of the KBf in such problems.

Several approaches [3]-[10] have been proposed in the literature for the inversion of (scalar, not block) banded matrices; this review is intended only to provide context to the work and should not be treated as complete or exhaustive. In banded matrices, the entries in the band diagonals are scalars. In contrast with banded matrices, there are much fewer results published for block-banded matrices. Any block-banded matrix is also banded; therefore, the methods in [3]-[10] could in principle be applicable to block-banded matrices. Extension of inversion algorithms designed for banded matrices to block-banded matrices generally fails to exploit the sparsity patterns within each block and between blocks; therefore, they are not optimal [11]. Further, algorithms that use the block-banded structure of the matrix to be inverted are computationally more efficient than those that just manipulate scalars, because with block-banded matrices more computations are associated with a given data movement than with banded matrices. An example of an inversion algorithm that uses the inherent structure of the constituent blocks in a block-banded matrix to its advantage is in [3]; this reference proposes a fast algorithm for inverting block Toeplitz matrices with Toeplitz blocks. References [12]-[17] describe alternative algorithms for matrices with a similar Toeplitz structure.

This paper develops results for positive definite and symmetric L-block-banded matrices A and their inverses P. An example of P is the covariance matrix of a Gauss-Markov random field (GMrp), see [2], with its inverse A referred to as the information matrix. Unlike existing approaches, no additional structure or constraint is assumed on A; in particular, the algorithms in this paper do not require A to be Toeplitz. Our inversion algorithms generalize our earlier work presented in [18] for tridiagonal block matrices to block matrices with an arbitrary bandwidth L. We show that the matrix P, whose inverse A is an L-block-banded matrix, is completely defined by the blocks within its L-block band. In other words, when P has an L-block-banded inverse, P is highly structured: any block entry of P outside its L-block band can be obtained from the block entries within the L-block diagonals of P. The paper proves this fact, which is at first sight surprising, and derives the following algorithms for block matrices P whose inverses A are L-block-banded:

1) Inversion of P: An inversion algorithm for P that uses only the block entries in the L-block band of P. This is a very efficient inversion algorithm for such P; it is faster than direct inversion by two orders of magnitude of the linear dimension of the blocks used.

2) Inversion of A: A fast inversion algorithm for the L-block-banded matrix A that is faster than its direct inversion by up to one order of magnitude of the linear dimension of its constituent blocks.

Compared with the scalar banded representations, the block-banded implementations of Algorithms 1 and 2 provide computational savings of up to three orders of magnitude of the dimension of the constituent blocks used to represent P and its inverse A.

Manuscript received January 7, 2003; revised February 24, 2004. This work was supported in part by the Natural Science and Engineering Research Council (NSERC) of Canada under Grants 229862-01 and 228415-03 and by the United States Office of Naval Research under Grant N00014-97-1-0800. The associate editor coordinating the review of this paper and approving it for publication was Prof. Trac D. Tran. A. Asif is with the Department of Computer Science and Engineering, York University, Toronto, ON M3J 1P3, Canada. J. M. F. Moura is with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA. Digital Object Identifier 10.1109/TSP.2004.840709.

The inversion algorithms are then used to develop alternative, computationally efficient approximations of the KBf. These near-optimal implementations are obtained by imposing an L-block-banded structure on the inverse of the error covariance matrix (the information matrix) and correspond to modeling the error field as a reduced-order Gauss-Markov random process (GMrp). Controlled simulations show that our KBf implementations lead to results that are virtually indistinguishable from the results for the conventional KBf.

The paper is organized as follows. In Section II, we define the notation used to represent block-banded matrices and derive three important properties for L-block-banded matrices. These properties express the block entries of an L-block-banded matrix in terms of the block entries of its inverse, and vice versa. Section III applies the results derived in Section II to derive inversion algorithms for an L-block-banded matrix A and its inverse P. We also consider special block-banded matrices that have additional zero block diagonals within the first L block diagonals. To illustrate the application of the inversion algorithms, we apply them in Section IV to inverting large covariance matrices in the context of an approximation algorithm for a large state space KBf problem. These simulations show an almost perfect agreement between the approximate filter and the exact KBf estimate. Finally, Section V summarizes the paper.

II. BANDED MATRIX RESULTS

A. Notation

Consider a positive-definite symmetric matrix P represented by its constituent blocks P_ij, 1 <= i, j <= N. The matrix P is assumed to have an L-block-banded inverse A, 0 <= L << N, with the structure

A = {A_ij}, with A_ij = 0 for |i - j| > L, 1 <= i, j <= N,   (1)

where the square blocks A_ij and the zero square blocks 0 are of order I. A diagonal block matrix is a 0-block-banded matrix (L = 0). A tridiagonal block matrix has exactly one nonzero block diagonal above and below the main diagonal and is therefore a 1-block-banded matrix (L = 1), and similarly for higher values of L. Unless otherwise specified, we use calligraphic fonts to denote matrices (e.g., P or A) with dimensions NI x NI. Their constituent blocks are denoted by capital letters (e.g., P_ij or A_ij), with the subscript (i, j) representing their location inside the full matrix in terms of the number of block rows i and block columns j. The blocks P_ij (or A_ij) are of dimensions I x I, implying that there are N block rows and N block columns in matrix P (or A). A comma is used in the subscript only where confusion may otherwise arise, as in P_{i,i+1}.

To be concise, we borrow the MATLAB notation to refer to an ordered combination of blocks (MATLAB is a registered trademark of Mathworks). A principal submatrix of A spanning block rows and block columns i through j, i <= j, is denoted by

A(i:j, i:j) = {A_mn}, i <= m, n <= j.   (2)

The Cholesky factorization of A results in the Cholesky factor U that is an upper triangular matrix. To indicate that the matrix U is the upper triangular Cholesky factor of A, we often work with the notation U = chol(A). Lemma 1.1 shows that the Cholesky factor of the L-block-banded matrix A has exactly L nonzero block diagonals above the main diagonal.

Lemma 1.1: A positive definite and symmetric matrix A is L-block-banded if and only if (iff) the constituent blocks U_ij in its upper triangular Cholesky factor U satisfy

U_ij = 0   for j > i + L.   (3)

The proof of Lemma 1.1 is included in the Appendix.

The inverse of the Cholesky factor U is an upper triangular matrix whose structure is given in (4): the lower diagonal entries in U^{-1} are zero blocks and, more importantly, the main diagonal entries in U^{-1} are the block inverses of the corresponding blocks in U. These features are used next to derive three important theorems for L-block-banded matrices, where we show how to obtain the following:

a) the block entries of the Cholesky factor U from a selected number of blocks of P, without inverting the full matrix P;
b) the block entries of P recursively from the blocks of U, without inverting the complete Cholesky factor U; and
c) the block entries P_ij, |i - j| > L, outside the first L diagonals of P from the blocks within the first L diagonals of P.

Since we operate at a block level, the three theorems offer considerable computational savings over direct computations of U from P and vice versa. The proofs are included in the Appendix.
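The band structure asserted by Lemma 1.1 is easy to check numerically. The short sketch below is our illustration rather than the paper's code; it assumes the MATLAB-style convention A = U^T U for the upper triangular Cholesky factor and uses NumPy. It builds a random L-block-banded, symmetric positive definite A and verifies that its Cholesky factor has no nonzero blocks beyond the Lth block diagonal.

```python
import numpy as np

def random_block_banded(N, I, L, seed=0):
    """Random symmetric positive definite A with A_ij = 0 for |i - j| > L,
    built as A = U^T U from a random upper block-banded factor U."""
    rng = np.random.default_rng(seed)
    U = np.zeros((N * I, N * I))
    for i in range(N):
        for j in range(i, min(i + L, N - 1) + 1):
            U[i*I:(i+1)*I, j*I:(j+1)*I] = rng.standard_normal((I, I))
        # make the diagonal blocks well conditioned so that A is positive definite
        U[i*I:(i+1)*I, i*I:(i+1)*I] += (I + 1) * np.eye(I)
    return U.T @ U

N, I, L = 8, 3, 2
A = random_block_banded(N, I, L)
C = np.linalg.cholesky(A)          # lower factor: A = C C^T, so U = C^T is upper
U = C.T
# Lemma 1.1: U_ij = 0 for j > i + L
for i in range(N):
    for j in range(N):
        if j > i + L:
            assert np.allclose(U[i*I:(i+1)*I, j*I:(j+1)*I], 0.0, atol=1e-8)
print("Cholesky factor of the L-block-banded A is itself L-block-banded.")
```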

B. Theorems

Theorem 1: The Cholesky blocks {U_{i,i}, U_{i,i+1}, ..., U_{i,i+L}} on block row i of the Cholesky factor U of an L-block-banded matrix A are determined from the principal submatrix P(i:i+L, i:i+L) of P = A^{-1} by left multiplying the inverse of this principal submatrix by a block column vector whose only nonzero block entry is at the top; this is expression (5), valid for 1 <= i <= N - L. For block rows N - L < i < N, the dimensions of the required principal submatrix of P are further reduced, and the same relation holds with P(i:i+L, i:i+L) replaced by P(i:N, i:N), as shown by (6); expression (7) gives the corresponding relation for the last block row.

Theorem 1 shows how the blocks of the Cholesky factor U are determined from the blocks of the L-banded P. Equations (5) and (6) show that the Cholesky blocks on block row i of U only involve the blocks of P in the principal submatrix that lies in the neighborhood of these Cholesky blocks. In other words, all block rows of the Cholesky factor U can be determined independently of each other by selecting the appropriate principal submatrix of P and then applying (5). For block row i, the required principal submatrix of P spans block rows (and block columns) i through i + L.

An alternative to (5) is obtained by right multiplying both sides of (5) by the transpose of U_{i,i} and rearranging terms to get (8), in which the right-hand side is the inverse of the principal submatrix P(i:i+L, i:i+L) applied to the block column vector whose top entry is the identity block of order I and whose remaining entries are zero blocks. Equation (8) is solved for the Cholesky product terms involving U_{i,i} and the blocks U_{i,i+k}, 0 <= k <= L. The blocks U_{i,i} are obtained by Cholesky factorization of the first term, the product of U_{i,i} with its transpose. For this factorization to be well defined, the first term in (8) must be positive definite. This is easily verified: since the block P(i:i+L, i:i+L) is a principal submatrix of P, its inverse is positive definite; the top left entry on the right-hand side of (8) is obtained by selecting the first I x I principal submatrix of the inverse of P(i:i+L, i:i+L), which is then positive definite as desired.

We now proceed with Theorem 2, which expresses the blocks of P in terms of the Cholesky blocks U_ij.

Theorem 2: The upper triangular blocks P_ij, i <= j <= i + L, in P, with A = P^{-1} being L-block-banded, are obtained recursively from the Cholesky blocks of the Cholesky factor U = chol(A) by (9) for the diagonal blocks, (10) for the off-diagonal blocks within the band, and (11) for the last block row. Theorem 2 states that the blocks on block row i and within the first L block diagonals of P can be evaluated from the corresponding Cholesky blocks {U_{i,i}, ..., U_{i,i+L}} and the L-banded blocks in the lower block rows of P, i.e., the blocks P_mn with m, n > i and |m - n| <= L.

To illustrate the recursive nature of the computations, consider computing a diagonal block P_ii of a matrix P that, for example, has a 2-block-banded inverse. This requires computing the in-band blocks on block rows i through N, which are evaluated in a reverse zig-zag order: the last diagonal block P_NN is calculated first, followed by the remaining in-band blocks of block row N, then the in-band blocks of block row N - 1, and so on, until P_ii is reached.
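Expressions (5)-(8) are not fully legible in this copy, so the following sketch, which is ours and not the authors' code, illustrates the computation that Theorem 1 describes under the assumption that the factorization convention is A = U^T U with U upper triangular: for each block row i, the first block column of the inverse of the principal submatrix P(i:i+L, i:i+L) yields the product terms from which U_{i,i} and the off-diagonal Cholesky blocks are recovered.

```python
import numpy as np

def cholesky_row_from_P(P, i, N, I, L):
    """Recover the nonzero Cholesky blocks U_{i,i}, ..., U_{i,i+L} of A = P^{-1}
    from the principal submatrix P(i:i+L, i:i+L) only (Theorem 1, stated here in
    the A = U^T U convention).  Block indices are 0-based."""
    hi = min(i + L, N - 1)                      # last block row/column of the principal submatrix
    M = P[i*I:(hi+1)*I, i*I:(hi+1)*I]           # principal submatrix of P
    E = np.zeros(((hi - i + 1) * I, I))
    E[:I, :] = np.eye(I)
    B = np.linalg.solve(M, E)                   # first block column of M^{-1}, cf. (8)
    Uii = np.linalg.cholesky(B[:I, :]).T        # B_0 = U_ii^T U_ii  ->  upper factor U_ii
    blocks = {(i, i): Uii}
    for k in range(1, hi - i + 1):
        Bk = B[k*I:(k+1)*I, :]                  # B_k = U_{i,i+k}^T U_ii
        blocks[(i, i + k)] = np.linalg.solve(Uii.T, Bk.T)   # U_{i,i+k} = U_ii^{-T} B_k^T
    return blocks

# small check against a direct factorization
rng = np.random.default_rng(1)
N, I, L = 6, 2, 2
U = np.triu(rng.standard_normal((N * I, N * I)))
for r in range(N * I):
    U[r, r] += N                                # well-conditioned positive diagonal
    U[r, min(r + L * I + 1, N * I):] = 0.0      # keep U inside the L-block band
A = U.T @ U
P = np.linalg.inv(A)
U_ref = np.linalg.cholesky(A).T
row = 1
for (a, b), val in cholesky_row_from_P(P, row, N, I, L).items():
    assert np.allclose(val, U_ref[a*I:(a+1)*I, b*I:(b+1)*I], atol=1e-8)
print("Theorem 1 check passed for block row", row)
```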

Next, we present Theorem 3, which expresses the block entries outside the first L diagonals of P in terms of its blocks within the first L diagonals.

Theorem 3: Let A = P^{-1} be L-block-banded and j > i + L. Then the block P_ij is given by (12), an expression that involves only blocks of P lying strictly closer to the main block diagonal than P_ij.

This theorem shows that the blocks P_ij, j > i + L, outside the L-band of P are determined from the blocks within its L-band. In other words, the matrix P is completely specified by its first L block diagonals; any block outside the L-block diagonals can be evaluated recursively from blocks within the L-block diagonals. In this paper, we refer to the blocks in the L-block band of P as the significant blocks and to the blocks outside the L-block band as the nonsignificant blocks. By Theorem 3, the nonsignificant blocks are determined from the significant blocks of P.

To illustrate the recursive order by which the nonsignificant blocks are evaluated from the significant blocks, consider an example where we compute a block P_ij, with j = i + 5, in a matrix P that we assume has a 3-block-banded inverse A. First, write P_ij as given by Theorem 3. All blocks on the right-hand side of the resulting expression are significant blocks, i.e., they lie in the 3-block band of P, except one block on the fourth block diagonal, which is a nonsignificant block. Therefore, we need to compute that block first. By applying Theorem 3 again, we see that this block can be computed directly from the significant blocks, i.e., from blocks that are all within the 3-block band of P, so that no additional nonsignificant blocks of P are needed. As a general rule, to compute the block entries outside the L-block band of P, we should first compute the blocks on the (L+1)th block diagonal from the significant blocks, followed by the blocks on the (L+2)th block diagonal, and so on, until all blocks outside the band have been computed.

We now restate, as Corollaries 1.1-3.1, Theorems 1-3 for matrices with tridiagonal block-banded inverses, i.e., for L = 1. These corollaries are the results in [18].

Corollary 1.1: The Cholesky blocks {U_{i,i}, U_{i,i+1}} of a tridiagonal (1-block-banded) matrix A can be computed directly from the main diagonal blocks P_{i,i} and the first upper diagonal blocks P_{i,i+1} of P = A^{-1} using expressions (13) and (14), each of which requires only chol(.) factorizations and inversions of I x I blocks.

Corollary 2.1: The main and the first upper block diagonal entries {P_{i,i}, P_{i,i+1}} of P with a tridiagonal block-banded inverse A can be evaluated from the Cholesky factors {U_{i,i}, U_{i,i+1}} of A by the recursions (15) and (16).

Corollary 3.1: Given the main and the first upper block diagonal entries {P_{i,i}, P_{i,i+1}} of P with a tridiagonal block-banded inverse A, any nonsignificant upper triangular block entry of P can be computed from its significant blocks by expression (17), where the notation defined in (18) is used for the products of first upper diagonal blocks with the inverses of the corresponding main diagonal blocks.

We note that in (17), the block P_ij is expressed in terms of blocks on the main diagonal and on the first upper diagonal. Thus, any nonsignificant block in P is computed directly from the significant blocks, without the need for a recursion.
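Expressions (17) and (18) are not legible here, but for L = 1 the content of Corollary 3.1 is the familiar Markov-chain identity: an out-of-band block is a product of first-upper-diagonal blocks and inverses of main-diagonal blocks. The sketch below is our illustration, not the paper's code; it verifies this identity numerically for a matrix with a tridiagonal block-banded inverse.

```python
import numpy as np

def nonsignificant_block(P_diag, P_upper, i, j):
    """P_ij for j > i + 1 when P^{-1} is 1-block-banded, from significant blocks
    only: P_ij = (prod_{k=i}^{j-2} P_{k,k+1} P_{k+1,k+1}^{-1}) P_{j-1,j}.
    P_diag[k] = P_{k,k}, P_upper[k] = P_{k,k+1} (0-based block indices)."""
    out = np.eye(P_diag[0].shape[0])
    for k in range(i, j - 1):
        out = out @ P_upper[k] @ np.linalg.inv(P_diag[k + 1])
    return out @ P_upper[j - 1]

# build a random SPD A that is 1-block-banded, take P = A^{-1}, and compare
rng = np.random.default_rng(2)
N, I = 7, 2
U = np.zeros((N * I, N * I))
for b in range(N):
    U[b*I:(b+1)*I, b*I:(b+1)*I] = rng.standard_normal((I, I)) + (I + 2) * np.eye(I)
    if b < N - 1:
        U[b*I:(b+1)*I, (b+1)*I:(b+2)*I] = rng.standard_normal((I, I))
A = U.T @ U                      # 1-block-banded by construction
P = np.linalg.inv(A)
P_diag  = [P[b*I:(b+1)*I, b*I:(b+1)*I] for b in range(N)]
P_upper = [P[b*I:(b+1)*I, (b+1)*I:(b+2)*I] for b in range(N - 1)]
i, j = 1, 5                      # a nonsignificant (out-of-band) block
assert np.allclose(nonsignificant_block(P_diag, P_upper, i, j),
                   P[i*I:(i+1)*I, j*I:(j+1)*I], atol=1e-8)
print("Corollary 3.1 identity verified for block", (i, j))
```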

III. INVERSION ALGORITHMS

In this section, we apply Theorems 1 and 2 to derive computationally efficient algorithms to invert the full symmetric positive definite matrix P with an L-block-banded inverse A, and to solve the converse problem of inverting the symmetric positive definite L-block-banded matrix A to obtain its full inverse P. We also include results from simulations that illustrate the computational savings provided by Algorithms 1 and 2 over direct inversion of the matrices. In this section, the matrix P is NI x NI, i.e., it consists of N x N blocks of order I. We only count the multiplication operations, assuming that inversion or multiplication of generic I x I matrices requires on the order of I^3 floating-point operations (flops).

A. Inversion of Matrices With Block-Banded Inverses

Algorithm 1 (from P to A): This algorithm computes the L-block-banded inverse A from the blocks of P using Steps 1 and 2. Since A is symmetric (and similarly for P), we only compute the upper triangular blocks of A (or P).

Step 1: Starting with i = 1, the Cholesky blocks {U_{i,i}, ..., U_{i,i+L}} are calculated recursively using Theorem 1. The blocks on block row i, for example, are calculated using (8), which computes the Cholesky product terms involving U_{i,i} and U_{i,i+k}, 0 <= k <= L. The main diagonal Cholesky blocks U_{i,i} are obtained by solving for the Cholesky factors of the first product term. The off-diagonal Cholesky blocks U_{i,i+k}, 1 <= k <= L, are evaluated by multiplying the corresponding entity calculated in (8) by the inverse of U_{i,i}.

Step 2: The upper triangular block entries A_ij, i <= j <= i + L, in the information matrix A are determined from expression (19), which is obtained by expanding A in terms of the constituent blocks of its Cholesky factor U.

Alternative Implementation: A second implementation of Algorithm 1 that avoids Cholesky factorization is obtained by expressing (19) directly in terms of the Cholesky product terms, as in (20), and solving (8) for the Cholesky products involving U_{i,i} and U_{i,i+k}, 0 <= k <= L, instead of for the individual Cholesky blocks. Throughout the manuscript, we use this implementation whenever we refer to Algorithm 1.

Computations: In (8), the principal submatrix P(i:i+L, i:i+L) is of order (L+1)I. Multiplying its inverse by the block column vector in (8) is equivalent to selecting its first block column. Thus, only (L+1) out of (L+1)^2 block entries of the inverse of P(i:i+L, i:i+L) are needed, reducing by a factor of (L+1) the computations required to invert the principal submatrix. (Equation (20) also inverts, once for each block row i, one I x I matrix; performing this inversion N times requires on the order of N I^3 flops, a lower-order term that does not affect the order of the count in (21).) The number of flops to calculate the Cholesky product terms on block row i of U is therefore on the order of ((L+1)I)^3 / (L+1), or (L+1)^2 I^3. The total number of flops to compute the Cholesky product terms for the N block rows of U is then given by (21), on the order of N (L+1)^2 I^3.
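A compact sketch of Algorithm 1 is given below. It is our own illustration rather than the authors' code: it assumes the A = U^T U Cholesky convention, recovers the Cholesky product terms row by row from principal submatrices of P as in Step 1, and then assembles the L-block-banded A as in Step 2. For readability it stores full dense matrices; the algorithm of the paper manipulates only the in-band blocks.

```python
import numpy as np

def block_banded_inverse(P, N, I, L):
    """Algorithm 1 (sketch): from the full P (with L-block-banded inverse),
    compute A = P^{-1} using only principal submatrices of P of order (L+1)I."""
    U = np.zeros_like(P)
    for i in range(N):
        hi = min(i + L, N - 1)
        M = P[i*I:(hi+1)*I, i*I:(hi+1)*I]          # principal submatrix P(i:i+L, i:i+L)
        E = np.zeros(((hi - i + 1) * I, I))
        E[:I, :] = np.eye(I)
        B = np.linalg.solve(M, E)                  # Cholesky product terms, cf. (8)
        Uii = np.linalg.cholesky(B[:I, :]).T       # U_ii from the first product term
        U[i*I:(i+1)*I, i*I:(i+1)*I] = Uii
        for k in range(1, hi - i + 1):
            Bk = B[k*I:(k+1)*I, :]
            U[i*I:(i+1)*I, (i+k)*I:(i+k+1)*I] = np.linalg.solve(Uii.T, Bk.T)
    return U.T @ U                                  # Step 2: A = U^T U (L-block-banded)

# round-trip check on a random test case
rng = np.random.default_rng(3)
N, I, L = 10, 3, 2
R = np.zeros((N * I, N * I))
for i in range(N):
    R[i*I:(i+1)*I, i*I:(i+1)*I] = rng.standard_normal((I, I)) + (I + 2) * np.eye(I)
    for j in range(i + 1, min(i + L, N - 1) + 1):
        R[i*I:(i+1)*I, j*I:(j+1)*I] = rng.standard_normal((I, I))
A_true = R.T @ R                                    # SPD and L-block-banded
P = np.linalg.inv(A_true)
A_est = block_banded_inverse(P, N, I, L)
print("max error:", np.abs(A_est - A_true).max())
```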

In Step 2 of Algorithm 1, the number of summation terms in (20) used to compute each block A_ij is small and, except for the first few initial block rows, does not depend on i; each term involves two block multiplications. Since there are roughly N(L+1) nonzero blocks in the upper half of the L-block-banded inverse A, the resulting flop count for Step 2 is given by (22). The total number of flops to compute A using Algorithm 1 is the sum of (21) and (22), given by (23); this is smaller than the number of flops required for direct inversion of the NI x NI matrix P by two orders of magnitude of the linear dimension I of the blocks used. As an aside, it may be noted that Step 1 of Algorithm 1 computes the Cholesky factor of an L-block-banded matrix and can therefore also be used for the Cholesky factorization of A.

B. Inversion of L-Block-Banded Matrices

Algorithm 2 (from A to P): This algorithm calculates P from its L-block-banded inverse A using the following two steps.

Step 1: Calculate the Cholesky blocks U_ij from A. These can be evaluated recursively using expressions (24)-(26): (24) obtains the main diagonal Cholesky blocks U_{i,i} through a chol(.) factorization of the block in (27), which is the corresponding block of A with the contributions of the previously computed block rows of U subtracted; (25) obtains the off-diagonal Cholesky blocks U_{i,i+k} from the same quantities; and (26) gives the boundary condition (b.c.) for the first block row. Equations (24)-(26) are derived by rearranging terms in (19).

Step 2: Starting with the last block row, the block entries P_ij are determined recursively from the Cholesky blocks U_ij using Theorems 2 and 3.

Alternative Implementation: Computing U_{i,i} from (24) demands that the matrix in (27) be positive definite. Numerical errors with badly conditioned matrices may cause the factorization of this matrix to fail. The Cholesky factorization can be avoided by noting that Theorem 2 requires only the Cholesky product terms, which in turn use the blocks of A. We can therefore avoid the Cholesky factorization of the matrix in (27) by replacing Step 1 as follows.

Step 1: Calculate the product terms given by (28)-(30) for all block rows i and offsets k within the band, with the boundary conditions stated for the first block row. We will use implementation (28)-(30) in conjunction with Step 2 for the inversion of L-block-banded matrices.
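The sketch below illustrates Algorithm 2 under the same assumptions as before (A = U^T U convention, dense storage for readability). Step 1 here is a standard banded Cholesky factorization; Step 2 computes the significant blocks of P by a backward block recursion of the kind stated in Theorem 2, without ever forming the full inverse.

```python
import numpy as np

def banded_matrix_inverse(A, N, I, L):
    """Algorithm 2 (sketch): from the L-block-banded A, compute the significant
    blocks of P = A^{-1}.  Returns a dict {(i, j): P_ij} for i <= j <= i + L."""
    U = np.linalg.cholesky(A).T          # Step 1: upper factor, A = U^T U (L-block-banded)
    Ub = lambda i, j: U[i*I:(i+1)*I, j*I:(j+1)*I]
    P = {}
    for i in range(N - 1, -1, -1):       # Step 2: backward recursion over block rows
        hi = min(i + L, N - 1)
        for j in range(hi, i - 1, -1):
            # U_ii P_ij = delta_ij U_ii^{-T} - sum_{k=i+1}^{i+L} U_ik P_kj
            rhs = np.zeros((I, I))
            if i == j:
                rhs += np.linalg.inv(Ub(i, i)).T
            for k in range(i + 1, hi + 1):
                Pkj = P[(k, j)] if k <= j else P[(j, k)].T
                rhs -= Ub(i, k) @ Pkj
            P[(i, j)] = np.linalg.solve(Ub(i, i), rhs)
    return P

# check the significant blocks against a direct dense inversion
rng = np.random.default_rng(4)
N, I, L = 9, 2, 3
R = np.zeros((N * I, N * I))
for i in range(N):
    R[i*I:(i+1)*I, i*I:(i+1)*I] = rng.standard_normal((I, I)) + (I + 2) * np.eye(I)
    for j in range(i + 1, min(i + L, N - 1) + 1):
        R[i*I:(i+1)*I, j*I:(j+1)*I] = rng.standard_normal((I, I))
A = R.T @ R
P_blocks = banded_matrix_inverse(A, N, I, L)
P_dense = np.linalg.inv(A)
err = max(np.abs(P_blocks[(i, j)] - P_dense[i*I:(i+1)*I, j*I:(j+1)*I]).max()
          for (i, j) in P_blocks)
print("max error on significant blocks:", err)
```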

Computations: Since the summation term in (28) is obtained directly by iterating over the previous block rows, (28) in Step 1 of Algorithm 2 only involves additions and does not require multiplications. Equation (29) requires one I x I inversion per block row; as explained earlier for (20), performing this inversion for each of the N block rows requires on the order of N I^3 flops, which does not affect the order of the overall count. Equation (30) computes the remaining product terms on each block row and involves two matrix multiplications per term. Adding these contributions, the number of flops required in Step 1 of Algorithm 2 is given by (31).

Step 2 of Algorithm 2 uses Theorem 2 (and, for blocks outside the band, Theorem 3) to compute the blocks P_ij. Each block typically requires a small number of multiplications of I x I blocks, and the resulting flop count for Step 2 is given by (32).

Adding the results from (31) and (32) gives the total number of flops for Algorithm 2, expression (33), which is an improvement of approximately a factor of the linear block dimension over direct inversion of the matrix A.

Fig. 1. Number of flops required to invert a full matrix P with L-block-banded inverse using Algorithm 1 for L = 2, 4, 8, and 16. The plots are normalized by the number of flops required to invert P directly.

C. Simulations

In Figs. 1 and 2, we plot the results of Monte Carlo simulations that quantify the savings in floating-point operations (flops) resulting from Algorithms 1 and 2 over the direct inversion of the matrices. The plots are normalized by the total number of flops required in the direct inversion; therefore, the region below unity in these figures corresponds to a number of computations smaller than the number required by the direct inversion of the matrices. This region represents the computational savings of our algorithms over direct inversion. In each case, the dimension of the constituent blocks in P (or in A) is kept constant at 5 x 5, whereas the parameter N, denoting the number of blocks on the main diagonal in P (or A), is varied from 1 to 50; the maximum dimension of the matrices P and A in the simulation is therefore (250 x 250). Except for the few initial cases, where the overhead involved in indexing and identifying constituent blocks exceeds the savings provided by Algorithm 2, both algorithms exhibit considerable savings over direct inversion. For Algorithm 1, the computations can be reduced by a factor of 10-100, whereas for Algorithm 2, the savings can be by a factor of 10. Higher savings will result with larger matrices.

Fig. 2. Number of flops required to invert an L-block-banded matrix A using Algorithm 2 for L = 2, 4, 8, and 16. The plots are normalized by the number of flops required to invert A directly.

D. Choice of Block Dimensions

In this subsection, we compare different implementations of Algorithms 1 and 2 obtained by varying the size of the blocks in the matrix P. For example, consider the representation (34) for P, where the superscript in P^(k) denotes the size k of the blocks used in the representation; the matrix P^(k) is expressed in terms of (k x k) blocks. For k = I, P^(I) reduces to P, which is the same as the representation used earlier. Similar representations are used for A^(k), the block-banded inverse of P^(k). Further, assume that the block bandwidth for A^(k), expressed in terms of (k x k) blocks, is L_k; the representation of the same matrix that uses blocks of larger dimensions has a correspondingly smaller block bandwidth. By following the procedure of Sections III-A and B, it can be shown that the flops required to compute i) the block-banded matrix A^(k) from P^(k) using Algorithm 1 and ii) P^(k) from A^(k) using Algorithm 2 are given by (35) and (36), respectively.

TABLE I. Number of flops for Algorithms 1 and 2, expressed as a function of the block size (k x k) used. Flops for Algorithm 1 are expressed as a multiple of N I_s^3 and for Algorithm 2 as a multiple of N^2 I_s^3.

To illustrate the effect of the dimensions of the constituent blocks used in P (or A) on the computational complexity of our algorithms, we consider an example where A is assumed to be 1-block-banded for the largest block size considered. Different implementations of Algorithms 1 and 2 can be derived by varying the factor by which the dimensions of the constituent blocks in P are reduced. As shown in the top row of Table I, several block sizes are considered. As the dimensions of the blocks are reduced, the block bandwidth of the matrix changes as well; the values of the block bandwidth corresponding to each block size are shown in row 2 of Table I. In row 3, we list the flops required to invert P using the different implementations of Algorithm 1 for the respective values of block size and block bandwidth shown in rows 1 and 2. Similarly, in row 4, we tabulate the number of flops required to invert A using the different implementations of Algorithm 2.

Table I illustrates the computational gain obtained in Algorithms 1 and 2 with blocks of larger dimensions. The increase in the number of computations in the two algorithms when smaller blocks are used can be mainly attributed to the additional zero blocks that are included in the outermost block diagonals of A when the size of the constituent blocks in P is reduced. Because the inversion algorithms do not recognize these blocks as zero blocks and therefore cannot use them to simplify the computations, their inclusion increases the overall number of flops used by the two algorithms. If the dimensions of the constituent blocks used in inverting P with an L-block-banded inverse are halved, it follows from (35) that the number of flops in Algorithm 1 increases accordingly; similarly, if the dimensions of the constituent blocks used in inverting the L-block-banded matrix A are halved, it follows from (36) that the number of flops in Algorithm 2 increases. Assuming the block size to be a power of 2, this argument can be repeated until P is expressed in terms of scalars. For such scalar implementations, it can be shown that the number of flops in Algorithms 1 and 2 increases further, roughly by a factor that grows with the block dimension, over the block-banded implementation based on I x I blocks.
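The bookkeeping behind rows 1 and 2 of Table I is simple: splitting each I x I block into smaller k x k blocks enlarges the block bandwidth roughly in proportion to I/k, because the same scalar band must now be covered by more, smaller blocks. The helper below is ours, for illustration only; it computes the block bandwidth of a given matrix for any block size that divides its dimension.

```python
import numpy as np

def block_bandwidth(M, k):
    """Smallest L such that all k x k blocks M_ij with |i - j| > L are zero."""
    n = M.shape[0] // k
    L = 0
    for i in range(n):
        for j in range(n):
            if abs(i - j) > L and np.any(M[i*k:(i+1)*k, j*k:(j+1)*k] != 0):
                L = abs(i - j)
    return L

# example: a matrix that is 1-block-banded for 4 x 4 blocks
n_blocks, I = 10, 4
A = np.zeros((n_blocks * I, n_blocks * I))
for i in range(n_blocks):
    for j in range(max(0, i - 1), min(n_blocks, i + 2)):
        A[i*I:(i+1)*I, j*I:(j+1)*I] = 1.0
for k in (4, 2, 1):
    print(f"block size {k}: block bandwidth {block_bandwidth(A, k)}")
# block size 4 -> L = 1; block size 2 -> L = 3; block size 1 -> L = 7
```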

E. Sparse Block-Banded Matrices

An interesting application of the inversion algorithms is to invert a matrix A that is not only L-block-banded but is also constrained in having all odd-numbered block diagonals within the first L block diagonals, both above and below the main block diagonal, consisting of zero blocks; this structure is expressed in (37). By an appropriate permutation of the block diagonals, such an L-block-banded matrix can be reduced to a block-banded matrix of lower bandwidth. Alternatively, Lemma 1.1 and Theorems 1-3 can be applied directly, with the following results.

1) The structure of the upper triangular Cholesky factor U is similar to that of A, with its block entries given by (38) and (39); in other words, the blocks on all odd-numbered diagonals of U are zero blocks.

2) In evaluating the nonzero Cholesky blocks of U (or the nonzero blocks of the inverse A), the only blocks required from P are the blocks corresponding to the nonzero block diagonals in A (or U); Theorem 1 reduces to (40). Recall that the blocks of P required to compute U are referred to as the significant blocks; in this example, the significant blocks are the ones listed in (41).

3) The blocks on all odd-numbered diagonals in the full matrix P, which is the inverse of a sparse L-block-banded matrix A with zero blocks on its odd-numbered diagonals, are themselves zero blocks. This is verified from Theorem 2, which reduces to (42) for the main diagonal blocks; the off-diagonal significant entries are given by (43), with the boundary conditions (44) and (45).

4) Theorem 3 simplifies to expression (46) for the nonsignificant blocks.

Result 4 illustrates that only the even-numbered significant blocks in P are used in calculating its nonsignificant blocks.

IV. APPLICATION TO KALMAN-BUCY FILTERING

For typical image-based applications in computer vision and the physical sciences, the visual fields of interest are often specified by spatial local interactions. In other cases, these fields are modeled by finite difference equations obtained from discretizing partial differential equations. Consequently, the state matrices in the state equation (47) (which includes a forcing term) are block-banded and sparse, and a similar structure exists for the matrix in the observation model (48). The dimension of the state vector is on the order of the number of pixels in the field. Due to this large dimensionality, it is usually only practical to observe a fraction of the field, so the observations in the observation model (48) are fairly sparse. This is typically the case with remote sensing platforms on board orbiting satellites.

Implementation of optimal filters such as the KBf to estimate the field in such cases requires storage and manipulation of matrices whose dimensions are on the order of the number of pixels, which is computationally not practical. To obtain a practical implementation, we approximate the non-Markov error process at each time iteration of the KBf by a Gauss-Markov random process (GMrp) of order L. This is equivalent to approximating the error covariance matrix (the expected outer product of the estimation error) by a matrix whose inverse is L-block-banded. We note that it is the inverse of the covariance that is block-banded; the covariance itself is still a full matrix. In the context of image compression, first-order GMrp approximations have been used to model noncausal images; in [2] and [22], for example, an uncorrelated error field is generated recursively by subtracting a GMrp-based prediction of the intensity of each pixel in the image from its actual value.

L-Block-Banded Approximation: The approximated matrix is obtained directly from the error covariance matrix in a single step by retaining its significant blocks, i.e., the blocks within its L-block band, as in (49). The nonsignificant blocks, if required, are obtained by applying Theorem 3 to the significant blocks retained in (49). In [6], it is shown that such GMrp approximations optimize the Kullback-Leibler mean information distance criterion under certain constraints.

The resulting implementations of the KBf obtained by approximating the error field with a GMrp are referred to as the local KBf [18], [19], where we introduced the local KBf for a first-order GMrp. This corresponds to approximating the inverse of the error covariance matrix (the information matrix) with a 1-block-banded matrix. In this section, we derive several implementations of the local KBf using different values of the order L of the GMrp approximation and explore the tradeoff between the approximation error and the block bandwidth L of the approximation. We carry out this study using the frame-reconstruction problem studied in [20], where a sequence of (100 x 100) images of the moving tip of a quadratic cone is synthesized. The surface translates across the image frame with a constant velocity whose components along the two frame axes are both 0.2 pixels/frame; the corresponding dynamical equation is (50).
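Before turning to the experiment, here is a small numerical illustration of the approximation step (49). It is our sketch, not the paper's code, and it works out the first-order (L = 1) case: the significant blocks are copied from the full covariance, the remaining blocks are extended with the Corollary 3.1 products, and the resulting matrix is checked to have a 1-block-banded inverse.

```python
import numpy as np

rng = np.random.default_rng(5)
N, I, L = 6, 2, 1
X = rng.standard_normal((N * I, 3 * N * I))
P = X @ X.T / (3 * N * I) + 0.1 * np.eye(N * I)   # a generic full covariance (not Markov)

blk = lambda M, i, j: M[i*I:(i+1)*I, j*I:(j+1)*I]
Pt = np.zeros_like(P)
for i in range(N):
    for j in range(i, N):
        if j <= i + L:
            B = blk(P, i, j)                       # significant block: copy from P, cf. (49)
        else:                                      # nonsignificant block: Theorem 3 products
            B = blk(P, i, i + 1)
            for k in range(i + 1, j):
                B = B @ np.linalg.inv(blk(P, k, k)) @ blk(P, k, k + 1)
        Pt[i*I:(i+1)*I, j*I:(j+1)*I] = B
        Pt[j*I:(j+1)*I, i*I:(i+1)*I] = B.T

At = np.linalg.inv(Pt)
for i in range(N):
    for j in range(N):
        if abs(i - j) > L:
            assert np.allclose(blk(At, i, j), 0.0, atol=1e-7)
print("approximation matches P on its band and has a 1-block-banded inverse")
```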

Since the spatial coordinates take only integer values in the discrete dynamical model on which the filters are based, we use a finite difference model obtained by discretizing (50) with the leap-frog method [21]. The dynamical equation in (50) is a simplified case of the thin-plate model with the spatial coherence constraint that is suitable for surface interpolation. We assume that data are available at a few spatial points on adjacent rows of the field, and a different row is observed at each iteration. The forcing term and the observation noise are both assumed to be independent, identically distributed (iid) random variables with a Gaussian distribution of zero mean and unit variance, or a signal-to-noise ratio (SNR) of 10 dB. Our discrete state and observation models are different from [20].
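For completeness, the following schematic shows one local KBf time step as we read the procedure: a standard Kalman-Bucy prediction and update, with the error covariance re-approximated so that its inverse stays L-block-banded. The state-space matrices, noise covariances, and the banded-approximation helper are placeholders; this is an illustration of the structure, not the authors' implementation.

```python
import numpy as np

def banded_approx(P, I, L):
    # Placeholder: in the local KBf this returns the order-L GMrp approximation of
    # P, i.e., the covariance that matches P on its L-block band and whose inverse
    # is L-block-banded (computed from the significant blocks with Algorithm 1).
    # Here it simply returns P so that the schematic below runs as-is.
    return P

def local_kbf_step(x, P, F, Q, H, R, y, I, L):
    """One (schematic) local KBf iteration: predict, update, then re-impose the
    L-block-banded structure on the inverse of the error covariance."""
    x_pred = F @ x                                          # prediction
    P_pred = banded_approx(F @ P @ F.T + Q, I, L)
    S = H @ P_pred @ H.T + R                                # measurement update
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = banded_approx((np.eye(P.shape[0]) - K @ H) @ P_pred, I, L)
    return x_new, P_new

# minimal exercise with arbitrary small dimensions (purely illustrative)
rng = np.random.default_rng(6)
N, I, L = 5, 2, 1
n = N * I
F = np.eye(n) + 0.1 * np.diag(np.ones(n - 1), 1)   # sparse, banded state matrix (illustrative)
Q = 0.01 * np.eye(n)
H = np.zeros((I, n)); H[:, :I] = np.eye(I)          # observe one block row per iteration
Rn = 0.1 * np.eye(I)
x, P = np.zeros(n), np.eye(n)
y = H @ rng.standard_normal(n)
x, P = local_kbf_step(x, P, F, Q, H, Rn, y, I, L)
print("updated state and covariance shapes:", x.shape, P.shape)
```

In an actual implementation the covariances would be stored and propagated through their significant blocks only, with Algorithms 1 and 2 providing the conversions between the covariance and information forms.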

Fig. 3. Comparison of MSEs (normalized with the actual field's energy) of the optimal KBf versus the local KBfs with L = 1 to 4. The plots for L > 1 overlap the plot for the optimal KBf.

Fig. 4. Comparison of 2-norm differences between the error covariance matrices of the optimal KBf and the local KBfs with L = 1 to 4.

Fig. 3 shows the evolution over time of the mean square error (MSE) for the estimated fields obtained from the optimal KBf and the local KBfs. In each case, the MSE is normalized with the energy present in the field. The solid line in Fig. 3 corresponds to the MSE for the exact KBf, whereas the dotted line is the MSE obtained for the local KBf using a 1-block-banded approximation. The MSE plots for the higher order (L > 1) block-banded approximations are so close to the exact KBf that they are indistinguishable in the plot from the MSE of the optimal KBf. It is clear from the plots that the local KBf follows closely the optimal KBf, showing that the reduced-order GMrp approximation is a fairly good approximation to the problem.

To quantify the approximation to the error covariance matrix, we plot in Fig. 4 the 2-norm difference between the error covariance matrix of the optimal KBf and those of the local KBfs with L = 1, 2, and 4 block-banded approximations. The 2-norm differences are normalized with the 2-norm magnitude of the error covariance matrix of the optimal KBf. The plots show that, after a small transient, the difference between the error covariance matrices of the optimal KBf and the local KBfs is small, with the approximation improving as the value of L is increased. An interesting feature for L = 1 is the sharp bump in the plots around iterations 6 to 10. The bump reduces and subsequently disappears for higher values of L.

Discussion: The experiments included in this section were performed to make two points. First, we apply the inversion algorithms to derive practical implementations of the KBf. For applications with large dimensional fields, the inversion of the error covariance matrix is computationally intensive, precluding the direct implementation of the KBf in such problems. By approximating the error field with a reduced-order GMrp, we impose a block-banded structure on the inverse of the covariance matrix (the information matrix). Algorithms 1 and 2 invert the approximated error covariance matrix with a much lower computational cost, allowing the local KBf to be successfully implemented. Second, we illustrate that the estimates from the local KBf are in almost perfect agreement with those of the direct KBf, indicating that the local KBf is a fairly good approximation of the direct KBf. In our simulations, a relatively small value of the block bandwidth L is sufficient for an effective approximation of the covariance matrix. Intuitively, this can be explained by the effect of the strength of the state process and noise on the structure of the error covariance matrix in the KBf. When the process noise is low, the covariance matrix approaches the structure imposed by the state matrix; since the state matrix is block-banded, it makes sense to update the L-block diagonals of the error covariance matrix. On the other hand, when the process noise is high, the prediction is close to providing no information about the unknown state, and the structural constraints on the inverse of the covariance matrix have little effect. As far as the measurements are concerned, only a few adjacent rows of the field are observed during each time iteration. In the error covariance matrix obtained from the filtering step of the KBf, the blocks corresponding to these observed rows are more significantly affected than the others. These blocks lie close to the main block diagonal of the error covariance matrix, which the local KBf updates in any case. As such, little difference is observed between the exact and local KBfs.

V. SUMMARY

The paper derives inversion algorithms for L-block-banded matrices and for matrices whose inverses are L-block-banded. The algorithms show that the inverse of an L-block-banded matrix is completely specified by the first L block entries adjacent to the main diagonal; any block entry outside the L-block diagonals can therefore be obtained recursively from the block entries within the L-block diagonals. Compared to direct inversion of a matrix, the algorithms provide computational savings of up to two orders of magnitude of the dimension of the constituent blocks used. Finally, we apply our inversion algorithms to approximate covariance matrices in signal processing applications, as in certain problems of Kalman-Bucy filtering (KBf), where the state is large but a block-banded structure occurs due to the nature of the state equations and the sparsity of the block measurements. In these problems, direct inversion of the covariance matrix is computationally intensive due to the large dimensions of the state fields. The block-banded approximation to the inverse of the error covariance matrix makes the KBf implementation practical, reducing the computational complexity of the KBf by at least two orders of the linear dimensions of the estimated field. Our simulations show that the resulting KBf implementations are practically feasible and lead to results that are virtually indistinguishable from the results of the conventional KBf.

APPENDIX

In the Appendix, we provide proofs for Lemma 1.1 and Theorems 1-3.

Lemma 1.1: The lemma is proved by induction on L, the block bandwidth of A.

Case L = 0: For L = 0, A is a block diagonal matrix. Lemma 1.1 states that A is block diagonal iff its Cholesky factor U is also block diagonal. This case is verified by expanding A = U^T U in terms of the constituent blocks of A and U; we verify the if and only if statements separately.

If statement: Assume that U is block diagonal, i.e., U_ij = 0 for i != j. By expanding A = U^T U, it is straightforward to derive that A_ij = 0 for i != j; A is therefore a block diagonal matrix if its Cholesky factor U is block diagonal.

Only if statement: To prove that U is block diagonal for the case L = 0, we use a nested induction on the block row i. Expanding A = U^T U gives (51). For block row i = 1, (51) reduces to A_1j = U_11^T U_1j, 1 <= j <= N. The first block on block row 1 is A_11 = U_11^T U_11. The block A_11 is a principal submatrix of a positive definite matrix A; hence, it is positive definite, and since U_11 is a Cholesky factor of A_11, U_11 is invertible. Since A_1j = 0 for j != 1 and U_11 is invertible, the off-diagonal entries U_1j, j != 1, on block row 1 are therefore zero blocks. By induction, assume that all upper triangular off-diagonal entries on the first i - 1 block rows of U are zero blocks. Substituting block row i in (51) gives (52); from the previous induction steps, the contributions of block rows 1 through i - 1 in (52) are all zero blocks, so (52) reduces to (53). For j = i, the block A_ii is positive definite, implying that its Cholesky factor U_ii is invertible. For j > i, (53) then implies that the off-diagonal entries on block row i of U are zero blocks. This completes the proof of Lemma 1.1 for the diagonal case L = 0.

Case L = k: By induction, we assume that the matrix A is k-block-banded iff its Cholesky factor U is upper triangular with only its first k upper diagonals consisting of nonzero blocks.

Case L = k + 1: We prove the if and only if statements separately.

If statement: The Cholesky factor U is upper triangular, with only its main and upper k + 1 block diagonals being nonzero. Expanding A = U^T U gives (54). For block row i and j > i + k + 1, the Cholesky blocks appearing in (55) lie outside the (k+1)-block diagonals and are therefore zero blocks; substituting these values in (55) proves that A_ij = 0 for j > i + k + 1, so the matrix A is (k+1)-block-banded. (There is a small variation in the number of terms at the boundary; the proofs for the boundary conditions follow along similar lines and are not explicitly included here.)

Only if statement: Given that the matrix A is (k+1)-block-banded, we show that its Cholesky factor U is upper triangular with only the main and upper k + 1 block diagonals being nonzero. We prove by induction on the block row i. For block row i = 1, the expansion reduces to (56), with A_11 = U_11^T U_11, which proves that the Cholesky factor U_11 is invertible. Substituting j > 1 + (k + 1) in (56), where A_1j = 0, it is then trivial to show that U_1j = 0 for j > 1 + (k + 1). By induction, assume that the Cholesky blocks outside the first k + 1 upper diagonals are zero blocks on block rows 1 through i - 1. Substituting block row i in the expansion, and noting that the contributions of the previous block rows outside the band are zero blocks, shows that U_ij = 0 for j > i + k + 1 on block row i as well. The matrix U is therefore upper triangular, with only its main and first k + 1 upper block diagonals being nonzero.

Theorem 1: Theorem 1 is proved by induction on L.

Case L = 1: For L = 1, Theorem 1 reduces to Corollary 1.1, proved in [18].

Case L = k: By the induction step, Theorem 1 is valid for a k-block-banded matrix, as stated in (57): the Cholesky blocks {U_{i,i}, ..., U_{i,i+k}} on block row i of the Cholesky factor U are obtained by left multiplying the inverse of the principal submatrix P(i:i+k, i:i+k) of P by the (k+1)-block column vector with only one nonzero block entry at the top. The dimensions of the principal submatrix and of the block column vector depend on the block bandwidth k, i.e., on the number of nonzero Cholesky blocks on block row i of U.

Case L = k + 1: From Lemma 1.1, a (k+1)-block-banded matrix A has a Cholesky factor U that is upper triangular and (k+1)-block-banded. Block row i of the Cholesky factor now has the nonzero entries {U_{i,i}, ..., U_{i,i+k+1}}. By induction from the previous case, these Cholesky blocks can be calculated by left multiplying the inverse of the principal submatrix P(i:i+k+1, i:i+k+1) of P by the (k+2)-block column vector with one nonzero block entry at the top, as in (58). To prove this result, we express the relation between P, U, and the inverse of the Cholesky factor in the form (59) and substitute the structure of U^{-1} from (4). We prove (59) by induction on the block row i, 1 <= i <= N. For block row i = N, equating the block elements in (59) gives the relation between P_NN and U_NN, which results in the first boundary condition for Theorem 1. By induction, assume that Theorem 1 is valid for block row i + 1. To prove Theorem 1 for block row i, equate the block elements on both sides of (59) corresponding to block row i; this gives expressions (60)-(62), which, written in matrix-vector form, prove Theorem 1 for block row i.

Theorem 2: Theorem 2 is proved by induction on L.

Case L = 1: For L = 1, Theorem 2 reduces to Corollary 2.1, proved in [18].

Case L = k: By the induction step, Theorem 2 is valid for a k-block-banded matrix. By rearranging terms, Theorem 2 for L = k is expressed as (63) and (64), where the dimensions of the block row vector and of the block column vector are one block by k blocks and k blocks by one block, respectively. In evaluating these expressions, the procedure for selecting the constituent blocks in the block row vector of P and in the block column vector of U is straightforward: for the block row vector, we select the blocks on block row i spanning block columns i + 1 through i + k; similarly, for the block column vector, we select the blocks on block column i spanning block rows i + 1 through i + k. The number of spanned block rows (or block columns) depends on the block bandwidth k.

Case L = k + 1: By induction from the case L = k, the dimensions of the block column vectors and block row vectors in (63) and (64) increase by one block. The block row vector derived from P in (63) now spans block columns i + 1 through i + k + 1 along block row i of P, and the block column vector of Cholesky blocks in (63) now spans block rows i + 1 through i + k + 1 of block column i. Below, we verify Theorem 2 for the case L = k + 1 by a nested induction on the block row i, i = N, ..., 1. For block row i = N, Theorem 2 becomes the relation between P_NN and U_NN, which is proven directly by rearranging the terms of the first boundary condition in Theorem 1. Assume that Theorem 2 is valid for block rows i + 1 through N. For block row i, Theorem 2 is proven by right multiplying (60)-(62) by the inverse of U_{i,i} and solving for the blocks P_ij, i <= j <= i + k + 1: to prove (9), right multiplying (60) by the inverse of U_{i,i} gives (65), which proves Theorem 2 for the diagonal block; equation (10) is then verified individually for the remaining blocks on block row i by right multiplying (61) and (62) on both sides by the inverse of U_{i,i} and solving for P_ij.

TABLE II. Case L = k: submatrices needed to calculate the nonsignificant entry P_ij in a k-block-banded matrix. For each entry, the table lists its dimensions and the block row vector, principal submatrix, and block column vector used in its computation.

TABLE III. Case L = k + 1: submatrices needed to calculate the nonsignificant entry P_ij in a (k+1)-block-banded matrix. For each entry, the table lists its dimensions and the block row vector, principal submatrix, and block column vector used in its computation.

Theorem 3: Theorem 3 is proved through induction on L.

Case L = 1: For L = 1, Theorem 3 reduces to Corollary 3.1, proved in [18].

Case L = k: By induction, assume that Theorem 3 is valid for a k-block-banded matrix, i.e., any nonsignificant entry P_ij, j > i + k, is expressed as in (66). To calculate the nonsignificant entries P_ij, the submatrices of P that are needed are shown in Table II; the number of blocks selected in each case depends on the block bandwidth k.

Case L = k + 1: By induction from the previous case, the submatrices required to compute the nonsignificant entry P_ij of P with a (k+1)-block-banded inverse are appended with the additional blocks shown in Table III: the block row vector is appended with an additional block from block row i; the principal submatrix of P is appended with an additional row, consisting of blocks from block row i + k + 1 spanning block columns i + 1 through i + k + 1 of P, and an additional column, from block column i + k + 1 spanning block rows i + 1 through i + k + 1 of P; similarly, the block column vector is appended with an additional block from block column j.

We prove Theorem 3 for a (k+1)-block-banded matrix by a nested induction on the block row i. For the first block row considered, i.e., the highest block row that contains a nonsignificant entry, the only nonsignificant entry on that block row is the one in the last block column. Theorem 3 for this entry is verified by equating the corresponding block on both sides of expression (67), which can be written in the form (68). To express the Cholesky product terms in (68) in terms of P, substitute the appropriate column index in (61) and (62), left multiply by the corresponding block inverse, and rearrange terms; the resulting expression is (69). Substituting the value of the Cholesky product terms from (69) in (68) proves Theorem 3 for this block row.

By induction, assume that Theorem 3 is valid for block row i + 1. The proof for block row i is similar to the proof for the base block row. The nonsignificant blocks on block row i are the blocks P_ij with j > i + k + 1. Equating the block entries on both sides of the equation gives (70).

Solving (70) for P_ij and rearranging terms gives (71). To express the Cholesky product terms in (71) in terms of P, we left multiply (61) and (62) by the appropriate block inverses and rearrange terms; the resulting expression is (72). Substituting the Cholesky product terms from (72) in (71) proves Theorem 3 for block row i.

REFERENCES

[1] J. W. Woods, "Two-dimensional discrete Markovian fields," IEEE Trans. Inf. Theory, vol. IT-18, no. 2, pp. 232-240, Mar. 1972.
[2] J. M. F. Moura and N. Balram, "Recursive structure of noncausal Gauss Markov random fields," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 334-354, Mar. 1992.
[3] S. J. Reeves, "Fast algorithm for solving block-banded Toeplitz systems with banded Toeplitz blocks," in Proc. Int. Conf. Acoust., Speech, Signal Process., vol. 4, May 2002, pp. 3325-3329.
[4] M. Chakraborty, "An efficient algorithm for solving general periodic Toeplitz systems," IEEE Trans. Signal Process., vol. 46, no. 3, pp. 784-787, Mar. 1998.
[5] G. S. Ammar and W. B. Gragg, "Superfast solution of real positive definite Toeplitz systems," SIAM J. Matrix Anal. Applicat., vol. 9, pp. 61-76, Jan. 1988.
[6] A. Kavcic and J. M. F. Moura, "Matrices with banded inverses: Inversion algorithms and factorization of Gauss-Markov processes," IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1495-1509, Jul. 2000.
[7] S. Chandrasekaran and A. H. Syed, "A fast stable solver for nonsymmetric Toeplitz and quasi-Toeplitz systems of linear equations," SIAM J. Matrix Anal. Applicat., vol. 19, no. 1, pp. 107-139, Jan. 1998.
[8] P. Dewilde and A. J. van der Veen, "Inner-outer factorization and the inversion of locally finite systems of equations," Linear Algebra Applicat., vol. 313, no. 1-3, pp. 53-100, Jul. 2000.
[9] A. M. Erishman and W. Tinney, "On computing certain elements of the inverse of a sparse matrix," Commun. ACM, vol. 18, pp. 177-179, Mar. 1975.
[10] I. S. Duff, A. M. Erishman, and W. Tinney, Direct Methods for Sparse Matrices. Oxford, U.K.: Oxford Univ. Press, 1992.
[11] G. H. Golub and C. V. Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins Univ. Press, 1996, pp. 133-192.
[12] J. Chun and T. Kailath, Digital Signal Processing and Parallel Algorithms. New York: Springer-Verlag, 1991, pp. 215-236.
[13] N. Kalouptsidis, G. Carayannis, and D. Manolakis, "Fast algorithms for block-Toeplitz matrices with Toeplitz entries," Signal Process., vol. 6, no. 3, pp. 77-81, 1984.
[14] T. Kailath, S.-Y. Kung, and M. Morf, "Displacement ranks of matrices and linear equations," J. Math. Anal. Applicat., vol. 68, pp. 395-407, 1979.
[15] D. A. Bini and B. Meini, "Toeplitz systems," SIAM J. Matrix Anal. Applicat., vol. 20, no. 3, pp. 700-719, Mar. 1999.
[16] A. E. Yagle, "A fast algorithm for Toeplitz-block-Toeplitz linear systems," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 3, Phoenix, AZ, May 2001, pp. 1929-1932.
[17] C. Corral, "Inversion of matrices with prescribed structured inverses," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 2, May 2002, pp. 1501-1504.
[18] A. Asif and J. M. F. Moura, "Data assimilation in large time-varying multidimensional fields," IEEE Trans. Image Process., vol. 8, no. 11, pp. 1593-1607, Nov. 1999.
[19] A. Asif and J. M. F. Moura, "Fast inversion of L-block-banded matrices and their inverses," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 2, Orlando, FL, May 2002, pp. 1369-1372.
[20] T. M. Chin, W. C. Karl, and A. S. Willsky, "Sequential filtering for multi-frame visual reconstruction," Signal Process., vol. 28, no. 1, pp. 311-333, 1992.
[21] J. C. Strikwerda, Finite Difference Schemes and Partial Differential Equations. Pacific Grove, CA: Wadsworth, Brooks/Cole, 1989.
[22] A. Asif and J. M. F. Moura, "Image codec by noncausal prediction, residual mean removal, and cascaded VQ," IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 1, pp. 42-55, Feb. 1996.

Amir Asif (M'97-SM'02) received the B.S. degree in electrical engineering with the highest honor and distinction in 1990 from the University of Engineering and Technology, Lahore, Pakistan, and the M.Sc. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, in 1993 and 1996, respectively. He has been an Associate Professor of computer science and engineering at York University, North York, ON, Canada, since 2002. Prior to this, he was on the faculty of Carnegie Mellon University, where he was a Research Engineer from 1997 to 1999, and of the Technical University of British Columbia, Vancouver, BC, Canada, where he was an Assistant/Associate Professor from 1999 to 2002. His research interests lie in statistical signal processing (one and multidimensional), image and video processing, multimedia communications, and genomic signal processing. He has authored over 30 technical contributions, including invited ones, published in international journals and conference proceedings. Dr. Asif has been a Technical Associate Editor for the IEEE Signal Processing Letters since 2002. He served on the advisory committees of the Science Council of British Columbia from 2000 to 2002. He has organized two IEEE conferences on signal processing theory and applications and served on the technical committees of several international conferences. He has received several distinguishing awards, including the 2004 Excellence in Teaching Award from the Faculty of Science and Engineering and the 2003 Mildred Baptist Teaching Excellence Award from the Department of Computer Science and Engineering. He is a member of the Professional Engineering Society of Ontario.

José M. F. Moura (S'71-M'75-SM'90-F'94) received the engenheiro electrotécnico degree in 1969 from Instituto Superior Técnico (IST), Lisbon, Portugal, and the M.Sc., E.E., and D.Sc. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (M.I.T.), Cambridge, in 1973 and 1975, respectively. He has been a Professor of electrical and computer engineering at Carnegie Mellon University, Pittsburgh, PA, since 1986. From 1999 to 2000, he was a Visiting Professor of electrical engineering at M.I.T. Prior to this, he was on the faculty of IST from 1975 to 1984. He was visiting Genrad Associate Professor of electrical engineering and computer science at M.I.T. from 1984 to 1986 and a Visiting Research Scholar at the Department of Aerospace Engineering, University of Southern California, Los Angeles, during the summers of 1978 to 1981. His research interests include statistical signal, image, and video processing, and communications. He has published over 270 technical contributions, is the co-editor of two books, holds six patents on image and video processing and digital communications with the US Patent Office, and has given numerous invited seminars at US and European universities and laboratories. Dr. Moura served as Vice-President for Publications for the IEEE Signal Processing Society (SPS), where he was an elected member of the Board of Governors from 2000 to 2002. He was also Vice-President for Publications for the IEEE Sensors Council from 2000 to 2002. He is on the Editorial Board of the IEEE Proceedings, the IEEE Signal Processing Magazine, and the ACM Transactions on Sensor Networks. He chaired the IEEE TAB Transactions Committee from 2002 to 2003, which joins the about 80 Editors-in-Chief of the IEEE Transactions. He was the Editor-in-Chief for the IEEE Transactions on Signal Processing from 1995 to 1999 and Acting Editor-in-Chief for the IEEE Signal Processing Letters from December 2001 to May 2002. He has been a member of several Technical Committees of the SPS and was on the IEEE Press Board from 1991 to 1995. Dr. Moura is a corresponding member of the Academy of Sciences of Portugal (Section of Sciences). He was awarded the 2003 IEEE Signal Processing Society Meritorious Service Award and, in 2000, the IEEE Millennium Medal. He is affiliated with several IEEE societies, Sigma Xi, AMS, AAAS, IMS, and SIAM.