The Pennsylvania State University
The Graduate School
Department of Computer Science and Engineering
PARALLEL BOUNDARY ELEMENT SOLUTIONS OF BLOCK
CIRCULANT LINEAR SYSTEMS FOR ACOUSTIC RADIATION
PROBLEMS WITH ROTATIONALLY SYMMETRIC
BOUNDARY SURFACES
A Thesis in
Computer Science and Engineering
by
Kenneth D. Czuprynski
© 2012 Kenneth D. Czuprynski
Submitted in Partial Fulfillment of the Requirements for the Degree of
Master of Science
May 2012

The thesis of Kenneth D. Czuprynski was reviewed and approved* by the following:
Suzanne M. Shontz Assistant Professor of Computer Science and Engineering Thesis Adviser
Jesse L. Barlow Professor of Computer Science and Engineering
John B. Fahnline Assistant Professor of Acoustics
Raj Acharya Professor of Computer Science and Engineering Head of the Department of Computer Science and Engineering
*Signatures are on file in the Graduate School.

Abstract
Coupled finite element/boundary element (FE/BE) formulations are commonly used to solve structural-acoustic problems where a vibrating structure is idealized as being submerged in a fluid that extends to infinity in all directions. Typically in FE/BE formulations, the structural analysis is performed using the finite element method, and the acoustic analysis is performed using the boundary element method. In general, the problem is solved frequency by frequency, and the coefficient matrix for the boundary element analysis is fully populated, so little can be done to alleviate the storage and computational requirements. Because acoustic boundary element calculations require approximately six elements per wavelength to produce accurate solutions, the boundary element formulation is limited to relatively low frequencies. However, when the outer surface of the structure is rotationally symmetric, the system of linear equations becomes block circulant. We propose a parallel algorithm for distributed memory systems which takes advantage of the underlying concurrency of the inversion formula for block circulant matrices. By using the structure of the coefficient matrix in tandem with a distributed memory system setting, we show that the storage and computational requirements are substantially lessened.

Table of Contents
List of Figures
Chapter 1. Introduction
1.1 Acoustic Radiation Problems
1.2 Boundary Element Method
1.3 The Fourier Matrix and Fast Fourier Transform
1.4 Circulant Matrices
Chapter 2. Literature Review
Chapter 3. Problem Formulation
3.1 Coefficient Matrix Derivation
3.2 Block Circulant Inversion
3.3 Invertibility
Chapter 4. Parallel Solution Algorithm
4.1 Block DFT Algorithm
4.2 Block FFT Algorithm
4.3 System Solves
4.4 Parallel Algorithm
Chapter 5. Theoretical Timing Analysis
5.1 Parallel Linear System Solve
5.2 Block DFT Using the DFT Algorithm
5.3 Block DFT Using the FFT Algorithm
5.4 Bounds
Chapter 6. Numerical Experiments
6.1 Experiment 1
6.2 Experiment 2
6.3 Numerical Results
6.3.1 Experiment 1
6.3.2 Experiment 2
Chapter 7. Conclusions
Appendix. BEM Code
A.1 STATIC MULTIPOLE ARRAYS
A.1.1 Sequential
A.1.1.1 General Case
A.1.1.2 Rotationally Symmetric
A.1.2 Parallel
A.1.2.1 General Case
A.1.2.2 Rotationally Symmetric
A.2 COEFF MATRIX
A.2.1 Sequential
A.2.1.1 General Case
A.2.1.2 Rotationally Symmetric
A.2.2 Parallel
A.2.2.1 General Case
A.2.2.2 Rotationally Symmetric
A.3 SOURCE AMPLITUDES MODES
A.3.1 Sequential
A.3.1.1 General Case
A.3.1.2 Rotationally Symmetric
A.3.2 Parallel
A.3.2.1 General Case
A.3.2.2 Rotationally Symmetric
A.4 SOURCE POWER
A.4.1 Sequential
A.4.1.1 General Case
A.4.1.2 Rotationally Symmetric
A.4.2 Parallel
A.4.2.1 General Case
A.4.2.2 Rotationally Symmetric
A.5 MODAL RESISTANCE
A.5.1 Sequential
A.5.1.1 General Case
A.5.1.2 Rotationally Symmetric
A.5.2 Parallel
A.5.2.1 General Case
A.5.2.2 Rotationally Symmetric
References

List of Figures
1.1 Radix 2 element interaction pattern obtained from [18].
3.1 A propeller with three times rotational symmetry [37].
3.2 A four times rotationally symmetric sketch of a propeller.
4.1 Initial data distribution assumed in the DFT computation for the case P = m = 4.
4.2 The DFT computation for the case P = m = 4. Each arrow indicates the communication of a processor's owned submatrix to a neighboring processor in the direction of the arrow.
4.3 Parallel block DFT data decomposition for P > m.
4.4 Parallel block DFT data decomposition and processor groupings for P > m.
4.5 Process illustrating the distributed FFT. Lines crossing to different processors indicate communication from left to right. Note the output is in reverse bit-reversed order relative to numbering starting at zero; that is, A1 is element 0, A2 is element 1, etc.
4.6 Processor grid creation for P = 16 and m = 4.
6.1 Runtime comparison using the DFT algorithm for varying P and N with m = 4.
6.2 Runtime comparison using the FFT algorithm for varying P and N with m = 4.
6.3 Speedups using the DFT algorithm for varying P and N with m = 4.
6.4 Speedups using the FFT algorithm for varying P and N with m = 4.
6.5 Efficiency using the DFT algorithm for varying N and P with m = 4.
6.6 Efficiency using the FFT algorithm for varying N and P with m = 4.
6.7 Runtime comparison using the DFT algorithm for varying P and N with m = 8.
6.8 Runtime comparison using the FFT algorithm for varying P and N with m = 8.
6.9 Speedup comparison using the DFT algorithm for varying P and N when m = 8.
6.10 Speedup comparison using the FFT algorithm for varying P and N when m = 8.
6.11 Efficiency comparison using the DFT algorithm for varying P and N when m = 8.
6.12 Efficiency comparison using the FFT algorithm for varying P and N when m = 8.
Chapter 1
Introduction
Coupled finite element/boundary element (FE/BE) formulations are commonly used to solve structural-acoustic problems where a vibrating structure is idealized as being submerged in a fluid that extends to infinity in all directions. Typically in FE/BE formulations, the structural analysis is performed using the finite element method, and the acoustic analysis is performed using the boundary element method (BEM). The boundary element formulation is advantageous for the acoustic radiation problem because only the outer surface of the structure in contact with the acoustic medium is discretized. This formulation also allows us to neglect meshing the infinite fluid exterior to the structure, as would be required if the finite element method were used instead.
Using the BEM, we compute the radiated sound field of a vibrating structure Ω ⊂ R³. The main obstacle in computing the sound radiation is solving the linear system of equations that enforces the specified boundary conditions. In the context of the BEM, this requires the solution of a dense, complex linear system. In general, the problem is solved frequency by frequency, and the coefficient matrix for the boundary element analysis is fully populated and exhibits no exploitable structure. The size, N², of the coefficient matrix is directly correlated with the level of discretization, N, used for the surface in question. Because acoustic boundary element calculations require approximately six elements per wavelength to produce accurate solutions, the boundary element formulation is limited to relatively low frequencies. For high frequency problems, and for problems which involve large and/or complex surfaces, these matrices are large, dense, and unstructured; therefore, there is little which can be done to alleviate the storage and computational requirements. Iterative solvers and preconditioners have been investigated [4, 5, 28] and are a natural choice for large problems because the cost of direct solvers can become prohibitive. While the computational requirements can be lessened by iterative methods, the storage requirements can still present a problem. One obvious solution is to perform the solve in a distributed memory parallel setting. A distributed memory parallel algorithm distributes the workload and allows the storage of the matrix to be split among many individual systems with local memories, thereby increasing the total available memory. In addition, because linear systems are ubiquitous throughout scientific computation, libraries exist for their efficient parallel solution. In particular, because the matrix is dense, Scalable LAPACK (ScaLAPACK) [6] is a favored choice.
While in general these matrices exhibit no exploitable structure, when the boundary surface is rotationally symmetric, the coefficient matrix is block circulant. A circulant matrix is one in which each row is a circular shift of the row above it. One property of circulant matrices is that they are all diagonalizable by the Fourier matrix. Therefore, the discrete or fast Fourier transform (DFT/FFT) can be used in the solution of the system. These results generalize to the block case and can be used in the solution of block circulant linear systems arising from acoustic radiation problems involving rotationally symmetric boundary surfaces. In addition, the inversion formula for block circulant matrices is highly amenable to parallel computation.
We propose an algorithm for distributed memory systems which takes advantage of the underlying concurrency of the inversion formula for block circulant matrices. By using the structure of the coefficient matrix in tandem with a distributed system setting, the storage and computational limitations are substantially lessened. Therefore, the algorithm allows larger and higher frequency acoustic radiation problems to be explored.
1.1 Acoustic Radiation Problems
The goal is to compute the radiated sound field due to a vibrating structure Ω ⊂ R³ subject to given boundary conditions. The governing partial differential equation (PDE) for acoustic radiation problems is the Helmholtz equation, i.e.,
$$\nabla^2 u(p) + k^2 u(p) = 0, \qquad p \in \Omega_+, \tag{1.1}$$

where $\nabla^2$ is the Laplacian, $k = \omega/c$ is the wave number, $\omega$ is the angular frequency, and $c$ is the speed of sound in the chosen medium. $\Omega_+ = \mathbb{R}^3 \setminus \Omega$ denotes the region exterior to $\Omega$. In structural acoustics problems, it is common for the velocity distribution over the boundary of $\Omega$, denoted by $\partial\Omega$, to be specified. This equates to the Neumann boundary condition

$$\frac{\partial u(p)}{\partial n_p} = f(p), \qquad p \in \partial\Omega, \tag{1.2}$$

where $\partial/\partial n_p$ denotes differentiation in the direction of the outward normal at $p \in \partial\Omega$. In addition, to ensure all radiated waves are outgoing, the Sommerfeld radiation condition

$$\lim_{r \to \infty} r \left( \frac{\partial u(p)}{\partial r} - i k u(p) \right) = 0 \tag{1.3}$$

is enforced, where $r$ is the distance of $p$ from a fixed origin. Therefore, in order to solve for the radiated sound field due to $\Omega$, a solution to the Helmholtz equation (1.1), subject to equations (1.2) and (1.3), must be found.
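As an illustrative aside (not part of the thesis), the boundary value problem (1.1)–(1.3) can be sanity-checked numerically: the outgoing spherical wave $u(p) = e^{ikr}/r$, $r = |p|$, satisfies the Helmholtz equation away from its source point. The sketch below verifies this with central finite differences; the wave number `k` and the evaluation point `p` are arbitrary choices.

```python
import numpy as np

k = 2.0  # illustrative wave number

def u(p):
    """Outgoing spherical wave e^{ikr}/r, r = |p|."""
    r = np.linalg.norm(p)
    return np.exp(1j * k * r) / r

p = np.array([0.7, -0.3, 0.5])  # arbitrary point away from the origin
h = 1e-3                        # finite-difference step

# Second-order central-difference Laplacian, one term per coordinate axis.
laplacian = sum((u(p + h * e) - 2 * u(p) + u(p - h * e)) / h**2
                for e in np.eye(3))

# Helmholtz residual (1.1); zero up to finite-difference truncation error.
residual = laplacian + k**2 * u(p)
assert abs(residual) < 1e-2 * abs(k**2 * u(p))
```

The Sommerfeld condition (1.3) is also satisfied by this wave, since $\partial u/\partial r - iku = -e^{ikr}/r^2$ decays faster than $1/r$.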
1.2 Boundary Element Method
The boundary element method is a method for the numerical solution of PDEs which have an equivalent boundary integral representation. The BEM reformulates the PDE into an equivalent boundary integral equation (BIE), which is then solved numerically. The benefit of the formulation is that it reduces the problem to one over the boundary. However, because the BEM requires an equivalent BIE formulation, if the PDE cannot be represented as an equivalent BIE, the BEM cannot be used. The remainder of this section outlines the BEM within the context of an acoustic radiation problem.
Consider a vibrating structure $\Omega \subset \mathbb{R}^3$. The Helmholtz equation is the governing PDE for the radiated sound field produced by $\Omega$ and is given by (1.1). A standard boundary integral formulation of (1.1) yields the following equations

$$\frac{1}{4\pi} \int_{\partial\Omega} \left[ u(q) \frac{\partial G(p,q)}{\partial n_q} - G(p,q) \frac{\partial u(q)}{\partial n_q} \right] d(\partial\Omega) = u(p), \qquad p \in \Omega_+ \tag{1.4}$$

and

$$\frac{1}{2\pi} \int_{\partial\Omega} \left[ u(q) \frac{\partial G(p,q)}{\partial n_q} - G(p,q) \frac{\partial u(q)}{\partial n_q} \right] d(\partial\Omega) = u(p), \qquad p \in \partial\Omega, \tag{1.5}$$

where $G(p,q)$ is the Green's function, which can loosely be thought of as the effect the point $q$ has on the point $p$. In the context of an acoustic radiation problem, the Green's function corresponds to the fundamental solution of the Helmholtz equation and is given by $G(p,q) = e^{ik|p-q|}/|p-q|$, in which $|p-q|$ denotes the Euclidean distance between the points $p$ and $q$. A solution for $u$ in the exterior domain with respect to the points on the boundary is provided by (1.4). Therefore, if the quantities $u$ and $\partial u/\partial n_q$ are known over the boundary, the solution for the points in the exterior can be easily computed. In addition, (1.5) provides a means of solving for the aforementioned quantities. However, by applying the Fredholm alternative to (1.5), it is found that the solutions are not unique for all wave numbers $k$, and thus an alternative formulation is required [34]. Burton and Miller [9] showed how a unique solution can be derived. Differentiating (1.5) in the direction of the outward normal yields

$$\frac{1}{2\pi} \frac{\partial}{\partial n_p} \int_{\partial\Omega} \left[ u(q) \frac{\partial G(p,q)}{\partial n_q} - G(p,q) \frac{\partial u(q)}{\partial n_q} \right] d(\partial\Omega) = \frac{\partial u(p)}{\partial n_p}, \qquad p \in \partial\Omega. \tag{1.6}$$
Then constructing a linear combination of equations (1.5) and (1.6) using a purely imaginary coupling coefficient, $\beta$, produces a modified BIE formulation with a unique solution. The formulation is given by

$$\frac{1}{2\pi} \int_{\partial\Omega} \left[ u(q) \frac{\partial G(p,q)}{\partial n_q} - G(p,q) \frac{\partial u(q)}{\partial n_q} \right] d(\partial\Omega) + \beta \frac{1}{2\pi} \frac{\partial}{\partial n_p} \int_{\partial\Omega} \left[ u(q) \frac{\partial G(p,q)}{\partial n_q} - G(p,q) \frac{\partial u(q)}{\partial n_q} \right] d(\partial\Omega) = u(p) + \beta \frac{\partial u(p)}{\partial n_p}. \tag{1.7}$$

Assuming a Neumann boundary condition, (1.7) can be rearranged as follows:

$$\int_{\partial\Omega} u(q) \left( \beta \frac{\partial^2 G(p,q)}{\partial n_q \partial n_p} + \frac{\partial G(p,q)}{\partial n_q} \right) d(\partial\Omega) - 2\pi u(p) = \int_{\partial\Omega} \frac{\partial u(q)}{\partial n_q} \left( G(p,q) + \beta \frac{\partial G(p,q)}{\partial n_p} \right) d(\partial\Omega) + 2\pi\beta \frac{\partial u(p)}{\partial n_p}, \qquad p \in \partial\Omega. \tag{1.8}$$
Note, in the case of a Dirichlet boundary condition, $\partial u(q)/\partial n_q$ can be solved for by rearranging (1.8). Once $u(p)$ has been solved for over the boundary, the solution for all points in the exterior can be obtained. Therefore, a means of numerically solving equation (1.8) must be devised. For notational convenience, let $v(q) = \partial u(q)/\partial n_q$, and redefine portions of both integrands as
$$T(p,q) = \beta \frac{\partial^2 G(p,q)}{\partial n_q \partial n_p} + \frac{\partial G(p,q)}{\partial n_q} \tag{1.9}$$

and

$$H(p,q) = G(p,q) + \beta \frac{\partial G(p,q)}{\partial n_p}. \tag{1.10}$$

Equation (1.8) becomes

$$\int_{\partial\Omega} u(q)\, T(p,q)\, d(\partial\Omega) - 2\pi u(p) = \int_{\partial\Omega} v(q)\, H(p,q)\, d(\partial\Omega) + 2\pi\beta\, v(p), \qquad p \in \partial\Omega. \tag{1.11}$$
The next step in the BEM is to discretize the boundary surface, $\partial\Omega$, into smaller quadrilateral or triangular surface elements. After the discretization, the boundary can be represented as $\partial\Omega = \partial\Omega_1 \cup \partial\Omega_2 \cup \cdots \cup \partial\Omega_N$, where $\partial\Omega_i$ represents the $i$th surface element in the discretization of $\partial\Omega$ and $\partial\Omega_i \cap \partial\Omega_j = \emptyset$ for $i \neq j$.
Equation (1.11) can then be represented as

$$\sum_{i=1}^{N} \int_{\partial\Omega_i} u(q)\, T(p,q)\, d(\partial\Omega_i) - 2\pi u(p) = \sum_{i=1}^{N} \int_{\partial\Omega_i} v(q)\, H(p,q)\, d(\partial\Omega_i) + 2\pi\beta\, v(p), \qquad p \in \partial\Omega. \tag{1.12}$$

The most straightforward approach to numerically solving equation (1.12) is to assume $u(p)$ and $v(p)$ are constant along each surface element, $\partial\Omega_i$, $i = 1, \ldots, N$. Therefore, let $u(p) \approx u_j$ and $v(p) \approx v_j$ for $p \in \partial\Omega_j$, $j = 1, \ldots, N$. Under this assumption, equation (1.12) can be decomposed into $N$ equations, i.e., one equation for each surface element; that is,

$$\sum_{i=1}^{N} u_i \int_{\partial\Omega_i} T(p,q)\, d(\partial\Omega_i) - 2\pi u_j = \sum_{i=1}^{N} v_i \int_{\partial\Omega_i} H(p,q)\, d(\partial\Omega_i) + 2\pi\beta\, v_j, \qquad p \in \partial\Omega_j. \tag{1.13}$$
Equation (1.13) yields a solution for the $j$th surface element of the boundary. The boundary is constructed of $N$ surface elements; therefore, there are $N$ equations and $N$ unknowns in total. Using this, equation (1.13) can be expressed more concisely in matrix notation. Let

$$M = \begin{bmatrix} \int_{\partial\Omega_1} T(p,q)\, d(\partial\Omega_1) & \int_{\partial\Omega_2} T(p,q)\, d(\partial\Omega_2) & \cdots & \int_{\partial\Omega_N} T(p,q)\, d(\partial\Omega_N) \\ \int_{\partial\Omega_1} T(p,q)\, d(\partial\Omega_1) & \int_{\partial\Omega_2} T(p,q)\, d(\partial\Omega_2) & \cdots & \int_{\partial\Omega_N} T(p,q)\, d(\partial\Omega_N) \\ \vdots & \vdots & \ddots & \vdots \\ \int_{\partial\Omega_1} T(p,q)\, d(\partial\Omega_1) & \int_{\partial\Omega_2} T(p,q)\, d(\partial\Omega_2) & \cdots & \int_{\partial\Omega_N} T(p,q)\, d(\partial\Omega_N) \end{bmatrix},$$

where the $j$th row is evaluated at the collocation point $p \in \partial\Omega_j$. Similarly, let the column vector $b$ represent the right-hand side; that is,

$$b = \begin{bmatrix} \sum_{i=1}^{N} v_i \left[ \int_{\partial\Omega_i} H(p,q)\, d(\partial\Omega_i) \right] + 2\pi\beta\, v_1 \\ \sum_{i=1}^{N} v_i \left[ \int_{\partial\Omega_i} H(p,q)\, d(\partial\Omega_i) \right] + 2\pi\beta\, v_2 \\ \vdots \\ \sum_{i=1}^{N} v_i \left[ \int_{\partial\Omega_i} H(p,q)\, d(\partial\Omega_i) \right] + 2\pi\beta\, v_N \end{bmatrix},$$

where, again, the $j$th entry is evaluated at $p \in \partial\Omega_j$.
With a Neumann boundary condition, each $v_i$, $i = 1, \ldots, N$, is known, and the integrals can be computed via numerical quadrature. Therefore, the matrix $M$ and vector $b$ are known quantities. Using the new quantities, the linear system

$$(M - 2\pi I)u = b \tag{1.14}$$

can be used to solve for the approximation of $u$ over the boundary. Once we have an approximate solution for $u$ over the surface, (1.4) can be used to solve for $u$ in the exterior.
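The assembly pattern behind (1.13)–(1.14) can be sketched in a few lines of NumPy. The kernels `T` and `H` below are smooth stand-ins chosen for illustration only (they are not the Helmholtz kernels of (1.9)–(1.10)), and the "boundary" is a one-dimensional interval so that midpoint quadrature suffices; only the collocation structure mirrors the derivation.

```python
import numpy as np

# Stand-in smooth kernels, NOT the Helmholtz kernels of (1.9)-(1.10);
# they only illustrate the assembly pattern of (1.13)-(1.14).
def T(p, q):
    return np.exp(-(p - q) ** 2)

def H(p, q):
    return 1.0 / (1.0 + (p - q) ** 2)

N = 16                                  # number of surface elements
edges = np.linspace(0.0, 1.0, N + 1)    # a 1-D "boundary" split into N elements
mid = 0.5 * (edges[:-1] + edges[1:])    # collocation point p_j on each element
h = np.diff(edges)                      # element lengths (midpoint quadrature)
beta = 1j                               # purely imaginary coupling coefficient

# M[j, i] ~ integral of T(p_j, q) over element i, via the midpoint rule.
M = np.array([[T(mid[j], mid[i]) * h[i] for i in range(N)] for j in range(N)])

v = np.cos(2 * np.pi * mid)             # prescribed Neumann data v_i
# b_j = sum_i v_i * integral_i H(p_j, q) + 2*pi*beta*v_j, as in the vector b.
b = np.array([sum(v[i] * H(mid[j], mid[i]) * h[i] for i in range(N))
              for j in range(N)]) + 2 * np.pi * beta * v

# Solve (M - 2*pi*I) u = b for the element-wise constant approximation of u.
u = np.linalg.solve(M - 2 * np.pi * np.eye(N), b)
```

Replacing the stand-in kernels with singular-integral quadrature for $T$ and $H$ over true surface elements gives the dense complex system described in the text.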
It is difficult to precisely enforce the boundary conditions for the surface velocity at edges and corners when the basis functions are constructed using surface distributions of simple and dipole sources, as they are in Burton and Miller’s standard implementation.
To avoid this difficulty, it is possible to rewrite the solution in terms of surface-averaged quantities instead, which is common in acoustics. For example, surface-averaged pressures and volume velocities are commonly used in lumped parameter representations of transducers. Since the goal is no longer to match the boundary conditions on a point-by-point basis, it becomes permissible to simplify the solution by constructing the basis functions from discrete sources rather than distributions of sources. Using surface-averaged pressures and volume velocities as variables can also be shown to produce a solution that converges with mesh density, unlike the standard formulation, which can produce a less accurate solution as the mesh is refined. The solution is then derived in terms of source amplitudes rather than physical quantities, such as pressure or velocity.
For this type of indirect solution, an approach similar to Burton and Miller's can be used to prevent nonexistence/nonuniqueness difficulties. A hybrid "tripole" source type is created from a simple and dipole source with a complex-valued coupling coefficient, as is discussed by Hwang and Chang [19]. The numerical implementation discussed in this thesis is based on an indirect solution using tripole sources, but the basic formulation shares many characteristics with the standard Burton and Miller approach discussed previously.
1.3 The Fourier Matrix and Fast Fourier Transform
The Fourier matrix is given by

$$F = \frac{1}{\sqrt{n}} \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega_n^1 & \omega_n^2 & \cdots & \omega_n^{n-1} \\ 1 & \omega_n^2 & \omega_n^4 & \cdots & \omega_n^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \cdots & \omega_n^{(n-1)(n-1)} \end{bmatrix}, \tag{1.15}$$

where $\omega_n = e^{i2\pi/n}$, $i = \sqrt{-1}$, and normalizing by $1/\sqrt{n}$ makes $F$ unitary. The discrete Fourier transform (DFT) is defined as a matrix vector multiplication involving the Fourier matrix. That is,
$$y = Fx. \tag{1.16}$$

The vector $y$ is called the DFT of $x$. Similarly, the inverse discrete Fourier transform (IDFT) of $x$ is given by

$$y = F^{-1}x. \tag{1.17}$$

However, because $F$ has been defined to be unitary, (1.17) becomes

$$y = F^*x. \tag{1.18}$$
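These definitions are easy to verify numerically. The sketch below (illustrative, not thesis code) builds $F$ exactly as in (1.15) and checks that it is unitary, so $F^*$ inverts the DFT as in (1.18); with NumPy's opposite sign convention, $Fx$ coincides with $\sqrt{n}\,\cdot$ `numpy.fft.ifft(x)`.

```python
import numpy as np

n = 8
w = np.exp(2j * np.pi / n)              # omega_n = e^{i 2 pi / n}

# Fourier matrix (1.15): F[j, k] = omega_n^(j k) / sqrt(n).
F = w ** np.outer(np.arange(n), np.arange(n)) / np.sqrt(n)

# F is unitary: F F* = I, so the IDFT (1.17) is simply F* (1.18).
assert np.allclose(F @ F.conj().T, np.eye(n))

x = np.arange(n, dtype=complex)
y = F @ x                               # the DFT of x in this normalization
assert np.allclose(F.conj().T @ y, x)   # F* recovers x
```

Note the sign of the exponent in $\omega_n$: many libraries (including NumPy's `fft`) use $e^{-i2\pi/n}$ for the forward transform, so their forward transform corresponds to $F^*$ here, up to the $\sqrt{n}$ normalization.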
The Fourier matrix is highly structured, and this structure can be exploited to compute the DFT. The improved method of computing the DFT is called the fast Fourier transform (FFT) and was first introduced by Cooley and Tukey [12]. It was shown that for vectors with $n = 2^h$ elements, $h \in \mathbb{Z}^+$, the DFT can be computed in $O(n \log n)$ operations. Over the years, the method has been extended to handle vectors with an arbitrary number of elements; a comprehensive overview of these methods can be found in [11, 26]. This thesis uses the Cooley and Tukey version of the algorithm, now also termed the radix-2 FFT. We now give an overview of the radix-2 algorithm.
Assuming the first column and first row are indexed by 0, consider the element in the $k$th row and the $j$th column of the Fourier matrix, which is given by $\omega_n^{kj} = e^{i2\pi kj/n}$. Note then that each element is periodic in $n$. This can readily be seen by using Euler's formula. Applying Euler's formula, we have

$$\omega_n^{kj} = \cos\left(2\pi \frac{kj}{n}\right) + i \sin\left(2\pi \frac{kj}{n}\right). \tag{1.19}$$

Because $\sin$ and $\cos$ both have period $2\pi$, by (1.19), if $kj \geq n$, the elements begin to repeat. It follows that each element in the Fourier matrix can be represented by $\omega_n^k$ for $k = 0, \ldots, n-1$. For example, consider the four-by-four Fourier matrix
$$F_4 = \frac{1}{\sqrt{4}} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \omega_4^1 & \omega_4^2 & \omega_4^3 \\ 1 & \omega_4^2 & \omega_4^4 & \omega_4^6 \\ 1 & \omega_4^3 & \omega_4^6 & \omega_4^9 \end{bmatrix}. \tag{1.20}$$

By the periodicity of the elements, (1.20) becomes

$$F_4 = \frac{1}{\sqrt{4}} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \omega_4^1 & \omega_4^2 & \omega_4^3 \\ 1 & \omega_4^2 & 1 & \omega_4^2 \\ 1 & \omega_4^3 & \omega_4^2 & \omega_4^1 \end{bmatrix}. \tag{1.21}$$
The FFT algorithm uses properties of $\omega_n$ coupled with a divide and conquer strategy. The following derivation relies heavily on [11]; we follow their derivation closely.

Recall that $n = 2^h$ for $h \in \mathbb{Z}^+$, and consider the operation $y = Fx$. Expanding the matrix vector product gives

$$y_k = \sum_{j=0}^{n-1} x_j \omega_n^{jk}, \qquad k = 0, \ldots, n-1. \tag{1.22}$$
Equation (1.22) can be split into two summations: one containing all of the even terms, and one containing all of the odd terms, i.e.,

$$y_k = \sum_{j=0}^{n/2-1} x_{2j} \omega_n^{2jk} + \sum_{j=0}^{n/2-1} x_{2j+1} \omega_n^{(2j+1)k}, \qquad k = 0, \ldots, n-1. \tag{1.23}$$

A factor $\omega_n^k$ can be pulled out of the second summation, i.e.,

$$y_k = \sum_{j=0}^{n/2-1} x_{2j} \omega_n^{2jk} + \omega_n^k \sum_{j=0}^{n/2-1} x_{2j+1} \omega_n^{2jk}, \qquad k = 0, \ldots, n-1. \tag{1.24}$$

Using the fact that $\omega_n^2 = \omega_{n/2}$, (1.24) becomes

$$y_k = \sum_{j=0}^{n/2-1} x_{2j} \omega_{n/2}^{jk} + \omega_n^k \sum_{j=0}^{n/2-1} x_{2j+1} \omega_{n/2}^{jk}, \qquad k = 0, \ldots, n-1. \tag{1.25}$$

The next observation to make is that $\omega_{n/2}^{(k+n/2)j} = \omega_{n/2}^{kj}$ for $k = 0, \ldots, n/2-1$. That is, because $\omega_{n/2}$ has a smaller period, the elements begin to repeat sooner, and $k$, in turn, need not go beyond $n/2 - 1$. Therefore, (1.25) becomes

$$y_k = \sum_{j=0}^{n/2-1} x_{2j} \omega_{n/2}^{jk} + \omega_n^k \sum_{j=0}^{n/2-1} x_{2j+1} \omega_{n/2}^{jk}, \qquad k = 0, \ldots, n/2-1. \tag{1.26}$$
Looking more closely, each summation represents a DFT of length $n/2$. Therefore, a DFT of length $n$ can be broken into two DFTs, each half the size of the previous DFT. However, (1.26) contains only the first $n/2$ terms of $y$. Computing the remaining terms yields

$$y_{k+n/2} = \sum_{j=0}^{n/2-1} x_{2j} \omega_{n/2}^{j(k+n/2)} + \omega_n^{k+n/2} \sum_{j=0}^{n/2-1} x_{2j+1} \omega_{n/2}^{j(k+n/2)}, \qquad k = 0, \ldots, n/2-1. \tag{1.27}$$

We then obtain

$$y_{k+n/2} = \sum_{j=0}^{n/2-1} x_{2j} \omega_{n/2}^{jk} \omega_{n/2}^{jn/2} + \omega_n^{k+n/2} \sum_{j=0}^{n/2-1} x_{2j+1} \omega_{n/2}^{jk} \omega_{n/2}^{jn/2}, \qquad k = 0, \ldots, n/2-1. \tag{1.28}$$

Because $\omega_{n/2}^{jn/2} = 1$ and $\omega_n^{k+n/2} = -\omega_n^k$, (1.28) becomes

$$y_{k+n/2} = \sum_{j=0}^{n/2-1} x_{2j} \omega_{n/2}^{jk} - \omega_n^k \sum_{j=0}^{n/2-1} x_{2j+1} \omega_{n/2}^{jk}, \qquad k = 0, \ldots, n/2-1. \tag{1.29}$$

Therefore, the entire vector $y$ can be obtained by

$$y_k = \sum_{j=0}^{n/2-1} x_{2j} \omega_{n/2}^{jk} + \omega_n^k \sum_{j=0}^{n/2-1} x_{2j+1} \omega_{n/2}^{jk}, \qquad k = 0, \ldots, n/2-1, \tag{1.30}$$

$$y_{k+n/2} = \sum_{j=0}^{n/2-1} x_{2j} \omega_{n/2}^{jk} - \omega_n^k \sum_{j=0}^{n/2-1} x_{2j+1} \omega_{n/2}^{jk}, \qquad k = 0, \ldots, n/2-1.$$
Let $s_j = x_{2j}$ and $t_j = x_{2j+1}$ for $j = 0, \ldots, n/2-1$; that is, $s$ is the vector containing all the even-indexed elements of $x$, and $t$ is the vector containing all of its odd-indexed elements. Then (1.30) may be written as

$$[F_n x]_k = \left[ F_{n/2}\, s \right]_k + \omega_n^k \left[ F_{n/2}\, t \right]_k, \qquad k = 0, \ldots, n/2-1, \tag{1.31}$$

$$[F_n x]_{k+n/2} = \left[ F_{n/2}\, s \right]_k - \omega_n^k \left[ F_{n/2}\, t \right]_k, \qquad k = 0, \ldots, n/2-1.$$

From (1.31), the recursive nature of the algorithm should be clear. The DFT of a vector can be split into two DFTs of half the size. We can proceed to compute $F_{n/2}\,s$ and $F_{n/2}\,t$ in the same manner, recursing until the base case of a single element is reached. Algorithm 1.1 gives pseudocode for the algorithm.
Algorithm 1.1 Radix-2 FFT pseudocode.
1: Y = Radix-2FFT(X, n)
2: if n == 1 then
3:   return X;
4: else
5:   s = Radix-2FFT(Even(X), n/2);
6:   t = Radix-2FFT(Odd(X), n/2);
7:   for k = 0 to n/2 − 1 do
8:     Y_k = s_k + ω_n^k t_k;
9:     Y_{k+n/2} = s_k − ω_n^k t_k;
10:  end for
11: end if
12: return Y;
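As an illustration (not part of the thesis code), Algorithm 1.1 transcribes almost line for line into NumPy. Note that with the convention $\omega_n = e^{i2\pi/n}$ from (1.15), the computed transform is the unnormalized DFT $y_k = \sum_j x_j \omega_n^{jk}$; under NumPy's opposite sign convention this coincides with `n * numpy.fft.ifft(x)`.

```python
import numpy as np

def radix2_fft(x):
    """Recursive radix-2 FFT following Algorithm 1.1.

    Uses the convention omega_n = exp(i*2*pi/n), so the output is the
    unnormalized DFT y_k = sum_j x_j * omega_n^(j*k).
    Requires len(x) to be a power of two.
    """
    n = len(x)
    if n == 1:
        return np.asarray(x, dtype=complex)
    s = radix2_fft(x[0::2])                         # DFT of even-indexed elements
    t = radix2_fft(x[1::2])                         # DFT of odd-indexed elements
    w = np.exp(2j * np.pi * np.arange(n // 2) / n)  # twiddle factors omega_n^k
    # Butterfly step: lines 8-9 of Algorithm 1.1, vectorized over k.
    return np.concatenate([s + w * t, s - w * t])
```

For example, `radix2_fft(x)` agrees with `len(x) * np.fft.ifft(x)` for any power-of-two-length `x`.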
Algorithm 1.1 follows nicely from the derived mathematics; however, the recursion can be unrolled into an iterative format which will later facilitate the explanation of our parallel algorithm. The algorithm can be found in [24], and our explanation follows their discussion closely.
Algorithm 1.2 Iterative radix-2 FFT pseudocode as presented in [24].
1: Y = Radix-2FFT(X, Y, n)
2: r = log n;
3: R = X;
4: for m = 0 to r − 1 do
5:   S = R;
6:   for i = 0 to n − 1 do
7:     // Let (b_0 b_1 ... b_{r−1}) be the binary representation of i
8:     j = (b_0 ... b_{m−1} 0 b_{m+1} ... b_{r−1});
9:     k = (b_0 ... b_{m−1} 1 b_{m+1} ... b_{r−1});
10:    e = (b_m b_{m−1} ... b_0 0 ... 0);
11:    R_i = S_j + S_k ω_n^e;
12:  end for
13: end for
14: Y = R;
Algorithm 1.2 is the iterative version of Algorithm 1.1. Each iteration of the outer loop (line 4) represents one level of the recursion, starting with the deepest level. At each level of recursion, the output vector is updated from two entries of the given input vector and a power of the factor ω (lines 8 and 9 of Algorithm 1.1, and line 11 of Algorithm 1.2). Algorithm 1.1 uses the input to the function at each level of recursion to update the output vector, whereas Algorithm 1.2 uses binary representations of the index being modified.
The most relevant property to notice, with respect to the parallel algorithm, is the pattern of interaction between different elements of the input vector. Figure 1.1 shows which elements in the input vector, denoted x, are used in computing each element of the output vector, denoted X, for a vector of length n = 16.
Fig. 1.1 Radix 2 element interaction pattern obtained from [18].
In order to solidify this notion and to clarify the meaning behind Figure 1.1, consider the transformation of x(0). The elements of the initial input vector involved in the transformation of x(0) are: x(0), x(8), followed by modified versions of x(4), x(2), and x(1). Similarly, each element of the input vector in the diagram can be traced to see the elements of the initial vector involved in each computation.
A final note about FFTs concerns the ordering of the output. When the algorithm is run in place, such that it overwrites the array containing the initial data, the output is in bit-reversed order. This can be seen in Figure 1.1. For another example, let $n = 8$ and consider the computation $x = F_8 x$, where the vector $x$ is overwritten. This yields

$$\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix} \longmapsto \begin{bmatrix} x_0 \\ x_4 \\ x_2 \\ x_6 \\ x_1 \\ x_5 \\ x_3 \\ x_7 \end{bmatrix}.$$

The indices are converted to binary, and the bit string is reversed before being converted back into decimal. In the above example, consider the indices one and four: $(1)_{10} = (001)_2$, and flipping the bit string yields $(100)_2 = (4)_{10}$, so the elements with indices one and four exchange positions. This means that data migrates to bit-reversed order when the FFT is done in place.
1.4 Circulant Matrices
Circulant matrices are a subset of Toeplitz matrices with the added property that each row is a circular shift of the previous row. The matrix $C$ is circulant if it has the form

$$C = \begin{bmatrix} c_1 & c_2 & c_3 & \cdots & c_n \\ c_n & c_1 & c_2 & \cdots & c_{n-1} \\ c_{n-1} & c_n & c_1 & \cdots & c_{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_2 & c_3 & c_4 & \cdots & c_1 \end{bmatrix}.$$

Matrices of this form are uniquely determined by their first row and will be denoted by $C = \mathrm{circ}(c_1, c_2, \ldots, c_n)$.
A thorough treatment of circulant matrices is given in [13]. The important property of circulant matrices that is used heavily throughout this thesis concerns their eigenvalues and eigenvectors. Let $v = [c_1\ c_2\ c_3\ \ldots\ c_n]^T$ be the column vector constructed from the first row of a circulant matrix $C$. Then the eigenvalues of $C$ are given by

$$\lambda = Fv, \tag{1.32}$$

where $F$ is the unitary Fourier matrix [13]. That is, the discrete Fourier transform (DFT) of the first row of $C$ yields the eigenvalues of $C$. Further, the eigenvectors of a circulant matrix $C$ are given by the columns of the Fourier matrix of appropriate dimension. Thus, $C$ has the eigenvalue decomposition

$$C = F D F^*, \tag{1.33}$$

where $F$ is again the Fourier matrix, and $D$ is the diagonal matrix whose elements are the eigenvalues of $C$, i.e., $D = \mathrm{diag}(\lambda)$. This means that every circulant matrix of the same dimension has the same eigenvectors, and that the matrix $C$ is given by

$$C = F\, \mathrm{diag}(\lambda)\, F^*. \tag{1.34}$$
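This eigenvalue relation can be checked directly (an illustrative sketch, not thesis code). One subtlety worth noting: the eigenvalues are the *unnormalized* DFT of the first row, so relative to the unitary $F$ of (1.15) a factor of $\sqrt{n}$ appears; the check below therefore uses the unnormalized transform with $\omega_n = e^{i2\pi/n}$.

```python
import numpy as np

n = 6
rng = np.random.default_rng(0)
row = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # c_1, ..., c_n

# C[j, k] = row[(k - j) mod n]: each row is a circular shift of the previous.
C = np.array([[row[(k - j) % n] for k in range(n)] for j in range(n)])

w = np.exp(2j * np.pi / n)
# Eigenvalues: the unnormalized DFT of the first row, lam_m = sum_l c_l w^(m l).
lam = np.array([sum(row[l] * w**(m * l) for l in range(n)) for m in range(n)])

for m in range(n):
    f_m = w ** (m * np.arange(n))   # m-th column of the (unnormalized) Fourier matrix
    assert np.allclose(C @ f_m, lam[m] * f_m)   # eigenpair of C
```

Every circulant matrix of this size satisfies the same check with the same vectors `f_m`; only `lam` changes with the first row.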
With this decomposition, a formula for the inverse of $C$ can easily be obtained. The inverse of $C$ is given by

$$C^{-1} = F\, \mathrm{diag}(\lambda)^{-1} F^*. \tag{1.35}$$

This formulation can then be used to solve a linear system. Consider the linear system

$$Cx = b. \tag{1.36}$$

Left multiplication by $C^{-1}$ yields

$$x = C^{-1}b. \tag{1.37}$$

Substituting the expression for $C^{-1}$ given by (1.35) yields

$$x = F\, \mathrm{diag}(\lambda)^{-1} F^* b. \tag{1.38}$$

Rearranging gives

$$\mathrm{diag}(\lambda)\, F^* x = F^* b. \tag{1.39}$$

Let $\tilde{x} = F^*x$ and $\tilde{b} = F^*b$; then (1.39) becomes

$$\mathrm{diag}(\lambda)\tilde{x} = \tilde{b}, \tag{1.40}$$
whose solution is trivial. Therefore, the solution of the linear system equates to computing three DFTs and a backsolve involving a diagonal matrix. The steps are:

1. Compute $\lambda = Fv$.
2. Compute $\tilde{b} = F^*b$.
3. Solve $\mathrm{diag}(\lambda)\tilde{x} = \tilde{b}$.
4. Compute $x = F\tilde{x}$.

This formulation is advantageous because the most expensive operation needed is the computation of the DFT, which, in its crudest form, is a matrix vector multiplication and is thus $O(n^2)$. However, if permissible, the fast Fourier transform (FFT) can be used in place of the DFT, and the computation becomes $O(n \log n)$.
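The four steps above can be sketched in NumPy (illustrative, not thesis code). With $\omega_n = e^{i2\pi/n}$ as in (1.15), the unnormalized transform $Fv$ corresponds to `n * numpy.fft.ifft(v)`, and $F^*b$ corresponds to `numpy.fft.fft(b)` up to a factor $\sqrt{n}$; the $\sqrt{n}$ factors from steps 2 and 4 cancel, so the library calls can be used directly.

```python
import numpy as np

n = 8
rng = np.random.default_rng(1)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # first row of C
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # right-hand side
C = np.array([[v[(k - j) % n] for k in range(n)] for j in range(n)])

lam = n * np.fft.ifft(v)       # step 1: eigenvalues, the DFT of the first row
b_tilde = np.fft.fft(b)        # step 2: transform the right-hand side
x_tilde = b_tilde / lam        # step 3: trivial diagonal solve
x = np.fft.ifft(x_tilde)       # step 4: transform back

assert np.allclose(C @ x, b)   # x solves the circulant system Cx = b
```

Replacing the matrix-vector DFTs by the FFT calls above is exactly what reduces the solve from $O(n^2)$ to $O(n \log n)$.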
Chapter 2
Literature Review
Circulant matrices are a desirable structure in computation because of their relation to the fast Fourier transform (FFT). Therefore, many variations of circulant matrices have appeared throughout the literature and in a wide variety of contexts. These range from the solution of circulant tridiagonal and banded systems [32, 16, 15] to effective preconditioners [25], all of which exploit the computational relation to the FFT.
We are concerned with the solution of linear systems involving block circulant matrices and assume the blocks of the matrix are themselves dense and contain no additional structure. The desirable properties extend to the block case as well; namely, block circulant matrices are block diagonalizable by the block Fourier matrix. The generalization to the block case, however, means that the inversion/solution formulas must be extended. We first note that every block circulant matrix (BCM) can be mapped to an equivalent block matrix with circulant blocks (CBM) by multiplying by appropriate permutation matrices. Therefore, algorithms for solving BCMs and CBMs are equivalent.
Within engineering, block circulant matrices arise in many contexts when problems with periodicity properties are considered. These usually result when such periodic problems are solved by means of integral equations, which includes the BEM. Using the method of fundamental solutions [17], block circulant matrices have been investigated in the contexts of axisymmetric problems in potential theory [21], axisymmetric harmonic and biharmonic problems [38], linear elasticity [23, 22], and heat conduction [36]. In addition, scattering and radiation problems in electromagnetics have taken advantage of block circulant matrices in a variety of integral equation techniques [33, 30, 14, 20], including the BEM [40]. With respect to acoustics, a National Physical Laboratory technical report discussed some properties of rotationally symmetric problems for the BEM as applied to the Helmholtz equation [42].
Just as circulant matrices are a subset of Toeplitz matrices, block circulant matrices are a subset of block Toeplitz matrices. Therefore, it is not surprising that one of the first inversion algorithms applied to block circulant matrices was an inversion algorithm for block Toeplitz matrices [2]. Closed form solutions for the inversion of block circulant matrices were formalized in [27] and presented again more concisely in [41]. The sequential inversion formula shows that a BCM, $A$, has the decomposition $A = F_b D F_b^*$, in which $F_b$ represents the block Fourier matrix, and $D$ represents a block diagonal matrix. The blocks along the diagonal are obtained by computing the block DFT of the first block row of $A$; that is, if $v$ is defined to be the first block row of $A$, then $D = \mathrm{diag}\{F_b v\}$. The inverse is then given by $A^{-1} = F_b\, (\mathrm{diag}\{F_b v\})^{-1} F_b^*$, and only the blocks of the block diagonal matrix need to be inverted. Extending the closed form inversion formulations, an algorithm for solving block circulant linear systems was developed alongside many variants for circulant linear systems [10]. The solution of linear systems involving BCMs resulted from a straightforward application of the inversion formula. Following these efforts, [31] proposed an algorithm for the solution of CBMs. The most recent contribution to CBMs was given in [39]. The algorithm first diagonalizes each block of the matrix by the Fourier relation. The matrix is then a block matrix with diagonal blocks. The algorithm decomposes the matrix into a two-by-two block matrix and successively performs this decomposition on the first principal submatrix until a diagonal matrix is reached. The diagonal matrix is inverted, and the Schur complement formulation for the inverse of a two-by-two block matrix is successively used to compute the inverse of the entire matrix. All inversion/solution formulas of consequence use the spectral properties of circulant matrices; this is exploited in all of the aforementioned sequential algorithms.
While sequential solution algorithms have been fully developed, little work has been done on parallel algorithms for block circulant linear systems. A parallel solver for block Toeplitz matrices exists, which parallelizes the generalized Schur algorithm [3]. However, using a Toeplitz solver neglects the FFT and the potential concurrent calculations found in the BCM inversion formula. In fact, the only work we are aware of is a parallel solver for electromagnetic problems which considers the axisymmetric case [29]. The proposed parallel algorithm was for distributed memory systems and parallelized the inversion formulation for BCMs. The assumptions of that work differ from our own: the authors assume a larger number of blocks of smaller order and, in turn, assume that the number of processors is some fraction of the number of blocks in the matrix. This means each processor contains multiple blocks, denoted $q$, of the BCM. For each block owned by a processor, the corresponding portion of the right-hand side also resides on that processor. Consequently, when solving the block diagonal matrix, each processor can perform the solves of its $q$ blocks simultaneously. However, when solving the linear system, multiplications by the Fourier matrix are needed in order to obtain the block diagonal matrix, modify the right-hand side vector, and recover the solution vector. Under this distribution, multiplying by the Fourier matrix requires communication among the processors. Using the fact that block Fourier transforms can be decomposed into independent Fourier transforms, the algorithm performs an all-to-all communication to give each processor the data needed to compute an independent FFT. The authors tested the algorithm for BCMs with $m = 256$ blocks of order $n = 318$, $m = 128$ blocks of order $n = 189$, and $m = 64$ blocks of order $n = 93$. This is where our assumptions diverge significantly, and, as a result, our algorithm differs significantly in its implementation of the same inversion formula.
Chapter 3
Problem Formulation
Consider a rotationally symmetric vibrating structure, $\Omega \subset \mathbb{R}^3$. The rotational symmetry implies $\Omega$ can be constructed by rotations of a single element around a fixed axis. Define $\Omega^0$ to be a structure in $\mathbb{R}^3$, and let $\Omega^0_\theta$ represent the structure obtained by rotating $\Omega^0$ by angle $\theta$. Then, supposing $\Omega$ has $m$ rotational symmetries, $\Omega$ can be written as $\Omega = \Omega^0_0 \cup \Omega^0_{2\pi/m} \cup \Omega^0_{4\pi/m} \cup \cdots \cup \Omega^0_{(m-1)2\pi/m}$; that is,

$$\Omega = \bigcup_{k=0}^{m-1} \Omega^0_{k 2\pi / m}. \tag{3.1}$$
For example, for $m = 4$ the structure $\Omega$ can be written as

$$\Omega = \Omega^0_0 \cup \Omega^0_{\pi/2} \cup \Omega^0_{\pi} \cup \Omega^0_{3\pi/2}. \tag{3.2}$$
Note, the angle $\theta$ is relative to an initial orientation of the structure. This means that the structure being rotated can have any initial orientation; as long as the rotation is around a fixed axis and the rotation angle is uniform, the constructed structure is rotationally symmetric. Figure 3.1 shows a real-world example of a structure containing three rotational symmetries.
Fig. 3.1 A propeller with three times rotational symmetry [37].
3.1 Coefficient Matrix Derivation
Before beginning the algebraic derivation, we first present the underlying intuition. Figure 3.2 shows a sketch of a propeller with four times rotational symmetry. Consider the effect $\Omega^0_0$ has on $\Omega^0_{\pi/2}$, as well as the effect $\Omega^0_{\pi/2}$ has on $\Omega^0_{\pi}$. Because the blades are identical and $\operatorname{dist}(\Omega^0_0, \Omega^0_{\pi/2}) = \operatorname{dist}(\Omega^0_{\pi/2}, \Omega^0_{\pi})$, the entries in the coefficient matrix which describe the effect of $\Omega^0_0$ on $\Omega^0_{\pi/2}$ and of $\Omega^0_{\pi/2}$ on $\Omega^0_{\pi}$ will be identical. This continues for the remaining interactions of this form; therefore, the entries of the coefficient matrix due to the effect of $\Omega^0_0$ on $\Omega^0_{\pi/2}$, $\Omega^0_{\pi/2}$ on $\Omega^0_{\pi}$, $\Omega^0_{\pi}$ on $\Omega^0_{3\pi/2}$, and $\Omega^0_{3\pi/2}$ on $\Omega^0_0$ will be identical. This same idea is used for all of the remaining interactions to finish populating the coefficient matrix. The equality between interactions due to symmetry is what leads to the block circulant structure of the coefficient matrix.
Fig. 3.2 A four times rotationally symmetric sketch of a propeller.
This decomposition of the initial structure in $\mathbb{R}^3$ into the union of rotated structures gives insight into the structure of the coefficient matrix. Recall, in the derivation of the BEM, the solution over the boundary of the structure must first be solved in order to obtain the solution in the exterior domain. Consider only the base element $\Omega^0 = \Omega^0_0$ before any rotations. For clarity, we suppose $m = 2$ and use the standard boundary integral formulations given by (1.4) and (1.5). The integral formulations which guarantee uniqueness follow in the same manner. Assuming a Neumann boundary condition and rearranging into knowns and unknowns, the equation over the boundary of $\Omega^0_0$ is given by

$$\int_{\partial\Omega^0_0} \frac{\partial G(p,q)}{\partial n_q}\, u(q)\, d\left(\partial\Omega^0_0\right) - 2\pi u(p) = \int_{\partial\Omega^0_0} G(p,q)\, \frac{\partial u(q)}{\partial n_q}\, d\left(\partial\Omega^0_0\right), \quad p \in \partial\Omega^0_0. \tag{3.3}$$
Next, consider the solution of $u$ over the boundary element $\partial\Omega^0_{\pi/2}$; that is, the boundary surface obtained by rotating the base element $\Omega^0_0$ by 90 degrees. This yields the following boundary integral formulation:

$$\int_{\partial\Omega^0_{\pi/2}} \frac{\partial G(p,q)}{\partial n_q}\, u(q)\, d\left(\partial\Omega^0_{\pi/2}\right) - 2\pi u(p) = \int_{\partial\Omega^0_{\pi/2}} G(p,q)\, \frac{\partial u(q)}{\partial n_q}\, d\left(\partial\Omega^0_{\pi/2}\right), \quad p \in \partial\Omega^0_{\pi/2}. \tag{3.4}$$
As stand-alone structures, $\Omega^0_0$ and $\Omega^0_{\pi/2}$ are identical aside from their orientation. The boundaries, $\partial\Omega^0_0$ and $\partial\Omega^0_{\pi/2}$, are unaffected by rotations and are therefore identical. Equations (3.3) and (3.4) involve only points on the boundary and, therefore, assuming the Neumann conditions are identical for both equations, equality holds. Note that, by uniqueness and the equality of the right-hand sides, it follows that the left-hand sides must be identical.
Intuitively, (3.3) shows the relation between a point $p$ on $\partial\Omega^0_0$ and all the points $q$ on $\partial\Omega^0_0$. If a point $p$ is chosen on $\partial\Omega^0_0$, all of the points on $\partial\Omega^0_0$ contribute to the value of $u$ at that point. In this sense, an $N$-body problem is being solved. Similarly, if a point $p$ is chosen on $\partial\Omega^0_{\pi/2}$, all of the points on $\partial\Omega^0_{\pi/2}$ contribute to the value of $u$ at that point; however, $\partial\Omega^0_0$ and $\partial\Omega^0_{\pi/2}$ are identical. Therefore, under identical boundary conditions, the same $N$-body problem is being solved.
Now, consider the solution of $u$ over the boundary of the structure obtained by combining the two aforementioned structures, $\Omega^0_0$ and $\Omega^0_{\pi/2}$. The boundary is then given by $\partial\Omega = \partial\Omega^0_0 \cup \partial\Omega^0_{\pi/2}$, and the integral equation is

$$\int_{\partial\Omega} \frac{\partial G(p,q)}{\partial n_q}\, u(q)\, d\left(\partial\Omega\right) - 2\pi u(p) = \int_{\partial\Omega} G(p,q)\, \frac{\partial u(q)}{\partial n_q}\, d\left(\partial\Omega\right), \quad p \in \partial\Omega. \tag{3.5}$$
Using the rotational symmetries, equation (3.5) becomes
$$\int_{\partial\Omega^0_0} \frac{\partial G(p,q)}{\partial n_q}\, u(q)\, d\left(\partial\Omega^0_0\right) + \int_{\partial\Omega^0_{\pi/2}} \frac{\partial G(p,q)}{\partial n_q}\, u(q)\, d\left(\partial\Omega^0_{\pi/2}\right) - 2\pi u(p) = \tag{3.6}$$
$$\int_{\partial\Omega^0_0} G(p,q)\, \frac{\partial u(q)}{\partial n_q}\, d\left(\partial\Omega^0_0\right) + \int_{\partial\Omega^0_{\pi/2}} G(p,q)\, \frac{\partial u(q)}{\partial n_q}\, d\left(\partial\Omega^0_{\pi/2}\right), \quad p \in \partial\Omega^0_0 \cup \partial\Omega^0_{\pi/2}.$$
Redefine $v_1(p) = u(p)$ for $p \in \partial\Omega^0_0$ and $v_2(p) = u(p)$ for $p \in \partial\Omega^0_{\pi/2}$. In addition, define

$$\Gamma_0[v_1] = \int_{\partial\Omega^0_0} \frac{\partial G(p,q)}{\partial n_q}\, v_1(q)\, d\left(\partial\Omega^0_0\right), \quad \Gamma_{\pi/2}[v_2] = \int_{\partial\Omega^0_{\pi/2}} \frac{\partial G(p,q)}{\partial n_q}\, v_2(q)\, d\left(\partial\Omega^0_{\pi/2}\right),$$

$$\Sigma_0 = \int_{\partial\Omega^0_0} G(p,q)\, \frac{\partial v_1(q)}{\partial n_q}\, d\left(\partial\Omega^0_0\right), \quad \text{and} \quad \Sigma_{\pi/2} = \int_{\partial\Omega^0_{\pi/2}} G(p,q)\, \frac{\partial v_2(q)}{\partial n_q}\, d\left(\partial\Omega^0_{\pi/2}\right).$$
Note, the variables $v_1(p)$ and $v_2(p)$ are unknowns, and, therefore, $\Gamma_0[v_1]$ and $\Gamma_{\pi/2}[v_2]$ are defined as operators; whereas $\Sigma_0$ and $\Sigma_{\pi/2}$ are known quantities and are treated as known values. Using the newly-defined quantities, (3.6) can be split into two simultaneous equations over $\partial\Omega^0_0$ and $\partial\Omega^0_{\pi/2}$:
$$\Gamma_0[v_1] + \Gamma_{\pi/2}[v_2] - 2\pi v_1(p) = \Sigma_0 + \Sigma_{\pi/2}, \quad p \in \partial\Omega^0_0, \tag{3.7}$$
$$\Gamma_0[v_1] + \Gamma_{\pi/2}[v_2] - 2\pi v_2(p) = \Sigma_0 + \Sigma_{\pi/2}, \quad p \in \partial\Omega^0_{\pi/2}.$$
Upon appropriate discretization, (3.7) can be written as the following linear system
$$\begin{pmatrix} \Gamma_0 - 2\pi I & \Gamma_{\pi/2} \\ \Gamma_0 & \Gamma_{\pi/2} - 2\pi I \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} \Sigma_0 + \Sigma_{\pi/2} \\ \Sigma_0 + \Sigma_{\pi/2} \end{pmatrix}, \tag{3.8}$$

where $I$ is the identity matrix. Let $A$ denote the coefficient matrix in (3.8), and consider the entries $(\Gamma_0 - 2\pi I)$ and $(\Gamma_{\pi/2} - 2\pi I)$. By the previous arguments in establishing the equivalence of (3.3) and (3.4), it follows that
$$(\Gamma_0 - 2\pi I) = \left(\Gamma_{\pi/2} - 2\pi I\right). \tag{3.9}$$

This is true even when the right-hand sides of (3.3) and (3.4) are not identical. With this relation established, define $A_1 = (\Gamma_0 - 2\pi I) = (\Gamma_{\pi/2} - 2\pi I)$. Similarly, consider the entries $\Gamma_0$ and $\Gamma_{\pi/2}$. We would like to show $\Gamma_0 = \Gamma_{\pi/2}$. By definition,
$$\Gamma_0[v_1] = \int_{\partial\Omega^0_0} \frac{\partial G(p,q)}{\partial n_q}\, v_1(q)\, d\left(\partial\Omega^0_0\right), \tag{3.10}$$

and upon discretization as described in Section 1.2, we obtain

$$\Gamma_0[v_1] = \sum_{i=1}^{N} (v_1)_i \left( \int_{[\partial\Omega^0_0]_i} \frac{\partial G(p,q)}{\partial n_q}\, d\left([\partial\Omega^0_0]_i\right) \right). \tag{3.11}$$
The quantity $\Gamma_0[v_1]$ becomes the product $\Gamma_0 v_1$, in which $v_1$ is the discretization of the unknown $v_1(q)$, and $\Gamma_0$ is a matrix of known quantities populated by integrating the normal derivative of the Green's function over the individual surface elements of $\partial\Omega^0_0$.
In considering the discretization of $\Gamma_{\pi/2}[v_2]$, we obtain

$$\Gamma_{\pi/2}[v_2] = \sum_{i=1}^{N} (v_2)_i \left( \int_{[\partial\Omega^0_{\pi/2}]_i} \frac{\partial G(p,q)}{\partial n_q}\, d\left([\partial\Omega^0_{\pi/2}]_i\right) \right). \tag{3.12}$$
Again, the quantity $\Gamma_{\pi/2}[v_2]$ becomes the product $\Gamma_{\pi/2} v_2$, in which $v_2$ is the discretization of the unknown $v_2(q)$, and $\Gamma_{\pi/2}$ is a matrix of known quantities populated by integrating the normal derivative of the Green's function over the individual surface elements of $\partial\Omega^0_{\pi/2}$. Assuming the discretizations of the boundaries are the same, because the boundaries $\partial\Omega^0_0$ and $\partial\Omega^0_{\pi/2}$ are identical, the values populating $\Gamma_0$ and $\Gamma_{\pi/2}$ are identical, and thus $\Gamma_0 = \Gamma_{\pi/2}$. Let $A_2 = \Gamma_0 = \Gamma_{\pi/2}$; then, with the previously established definition $A_1 = (\Gamma_{\pi/2} - 2\pi I) = (\Gamma_0 - 2\pi I)$, the matrix $A$ comprising the linear system (3.8) has the form
$$A = \begin{pmatrix} A_1 & A_2 \\ A_2 & A_1 \end{pmatrix}, \tag{3.13}$$

which is a $2 \times 2$ block circulant matrix. In general, given $m$ rotational symmetries, an $m \times m$ block circulant matrix can be obtained.
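The block circulant structure derived above is easy to check numerically. The following sketch (an illustration assuming NumPy; the blocks are random stand-ins for the discretized $\Gamma$ operators) assembles the $2 \times 2$ block matrix of (3.13) and verifies that it is invariant under a simultaneous cyclic shift of its block rows and block columns, which characterizes block circulant matrices.

```python
import numpy as np

# Illustrative stand-ins for the blocks A_1 and A_2 of (3.13); in the BEM
# they would come from boundary integrals, here they are random matrices.
rng = np.random.default_rng(1)
n = 3
A1 = rng.standard_normal((n, n))
A2 = rng.standard_normal((n, n))

# Assemble the 2 x 2 block matrix of (3.13).
A = np.block([[A1, A2],
              [A2, A1]])

# Block cyclic shift S = P (Kronecker) I_n, where P is the 2 x 2 cyclic
# permutation. A block circulant matrix is invariant under this shift.
S = np.kron(np.roll(np.eye(2), 1, axis=0), np.eye(n))
assert np.allclose(S @ A @ S.T, A)
```

The same invariance holds for the general $m \times m$ block circulant matrix of the next section, with $P$ the $m \times m$ cyclic permutation.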
3.2 Block Circulant Inversion
Let $N = nm$. The coefficient matrix $A \in \mathbb{C}^{N \times N}$ arising from the BEM applied to an acoustic radiation problem with a rotationally symmetric boundary surface has the form

$$A = \begin{pmatrix} A_1 & A_2 & \cdots & A_m \\ A_m & A_1 & \cdots & A_{m-1} \\ A_{m-1} & A_m & \cdots & A_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ A_2 & A_3 & \cdots & A_1 \end{pmatrix}, \tag{3.14}$$

where each $A_j$, $j = 1, \ldots, m$, is contained in $\mathbb{C}^{n \times n}$ and is dense. The matrix $A$ is block circulant and therefore can be represented by circular shifts of its first block row. The circulant structure of $A$ is contained in the $m$ blocks forming the first block row of $A$.
Therefore, in order to perform block DFT operations, we need to scale the Fourier matrix $F \in \mathbb{C}^{m \times m}$ to the block Fourier matrix $F_b \in \mathbb{C}^{N \times N}$. The Fourier matrix $F$ is defined as

$$F = \frac{1}{\sqrt{m}} \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega_m^1 & \omega_m^2 & \cdots & \omega_m^{m-1} \\ 1 & \omega_m^2 & \omega_m^4 & \cdots & \omega_m^{2(m-1)} \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ 1 & \omega_m^{m-1} & \omega_m^{2(m-1)} & \cdots & \omega_m^{(m-1)(m-1)} \end{pmatrix}, \tag{3.15}$$
where $\omega_m = e^{i 2\pi/m}$, $i = \sqrt{-1}$, and normalizing by $1/\sqrt{m}$ makes $F$ unitary. Scaling each element of $F$ by the $n \times n$ identity matrix, $I_n$, produces the block Fourier matrix $F_b$. This is equivalent to the Kronecker product $F \otimes I_n$. After scaling, we have the block Fourier matrix

$$F_b = \frac{1}{\sqrt{m}} \begin{pmatrix} I_n & I_n & I_n & \cdots & I_n \\ I_n & I_n \omega_m^1 & I_n \omega_m^2 & \cdots & I_n \omega_m^{m-1} \\ I_n & I_n \omega_m^2 & I_n \omega_m^4 & \cdots & I_n \omega_m^{2(m-1)} \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ I_n & I_n \omega_m^{m-1} & I_n \omega_m^{2(m-1)} & \cdots & I_n \omega_m^{(m-1)(m-1)} \end{pmatrix}. \tag{3.16}$$
Next, the DFT relations needed for the inversion formula are established. Let $X \in \mathbb{C}^{N \times n}$ be the block column vector containing the first block row of $A$. The block DFT of $X$ is given by $\tilde{X} = F_b X$; that is,

$$\begin{pmatrix} \tilde{A}_1 \\ \tilde{A}_2 \\ \tilde{A}_3 \\ \vdots \\ \tilde{A}_m \end{pmatrix} = \frac{1}{\sqrt{m}} \begin{pmatrix} I_n & I_n & I_n & \cdots & I_n \\ I_n & I_n \omega_m^1 & I_n \omega_m^2 & \cdots & I_n \omega_m^{m-1} \\ I_n & I_n \omega_m^2 & I_n \omega_m^4 & \cdots & I_n \omega_m^{2(m-1)} \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ I_n & I_n \omega_m^{m-1} & I_n \omega_m^{2(m-1)} & \cdots & I_n \omega_m^{(m-1)(m-1)} \end{pmatrix} \begin{pmatrix} A_1 \\ A_2 \\ A_3 \\ \vdots \\ A_m \end{pmatrix}, \tag{3.17}$$

which is nothing more than a DFT of length $m$ with $n \times n$ matrices as coefficients in the transform. Using the formulation of the inverse in [41], we have
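The practical importance of (3.17) is that multiplying by $F_b$ never requires forming $F_b$ explicitly: the block DFT reduces to $n^2$ independent length-$m$ FFTs, one per entry position of the blocks. The following NumPy sketch (sizes and names are illustrative) checks this against the explicit Kronecker construction of (3.16); note that NumPy's `ifft` uses the $e^{+i2\pi/m}$ kernel, which matches $\omega_m$ here up to scaling.

```python
import numpy as np

def block_fourier(m, n):
    """F_b = F (Kronecker) I_n, with F as in (3.15): omega_m = exp(2i*pi/m),
    normalized by 1/sqrt(m) so that F_b is unitary."""
    w = np.exp(2j * np.pi / m)
    F = np.array([[w ** (j * k) for k in range(m)] for j in range(m)])
    return np.kron(F / np.sqrt(m), np.eye(n))

m, n = 4, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((m, n, n))      # the blocks A_1, ..., A_m

# Block DFT by explicit multiplication with F_b ...
Fb = block_fourier(m, n)
X_tilde = (Fb @ X.reshape(m * n, n)).reshape(m, n, n)

# ... equals independent length-m transforms along the block index
# (numpy's ifft uses the e^{+2*pi*i/m} kernel with a 1/m factor).
assert np.allclose(X_tilde, np.sqrt(m) * np.fft.ifft(X, axis=0))
```

This equivalence is what the parallel algorithm later exploits: the FFTs along the block index are completely independent of one another.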
$$A^{-1} = F_b\, \mathrm{diag}\{(\tilde{A}_1)^{-1}, (\tilde{A}_2)^{-1}, \ldots, (\tilde{A}_m)^{-1}\}\, F_b^{*}, \tag{3.18}$$

where $\mathrm{diag}\{(\tilde{A}_1)^{-1}, (\tilde{A}_2)^{-1}, \ldots, (\tilde{A}_m)^{-1}\}$ is a block diagonal matrix whose diagonal blocks are precisely the inverses of the blocks obtained from the DFT of the first block row of $A$. From the formula, we can derive the algorithm for the solution of a linear system.
Consider the system $Ax = b$; multiplying by $A^{-1}$ yields

$$x = A^{-1} b. \tag{3.19}$$
Substituting in the definition for $A^{-1}$ from (3.18), we obtain

$$x = F_b\, \mathrm{diag}\{(\tilde{A}_1)^{-1}, (\tilde{A}_2)^{-1}, \ldots, (\tilde{A}_m)^{-1}\}\, F_b^{*} b. \tag{3.20}$$
Rearranging yields

$$\mathrm{diag}\{\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_m\}\, F_b^{*} x = F_b^{*} b. \tag{3.21}$$
Let $\tilde{x} = F_b^{*} x$ and $\tilde{b} = F_b^{*} b$. This yields

$$\mathrm{diag}\{\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_m\}\, \tilde{x} = \tilde{b}. \tag{3.22}$$
Blocking the vectors $\tilde{x}$ and $\tilde{b}$ to match the block sizes of each $\tilde{A}_j$, it is easy to see we obtain $m$ independent linear systems to solve:

$$\tilde{A}_j \tilde{x}_j = \tilde{b}_j, \quad j = 1, \ldots, m. \tag{3.23}$$
The steps for the solution of the linear system $Ax = b$ are given by Algorithm 3.1. Each multiplication by the matrix $F_b$ or $F_b^{*}$ represents a block DFT or inverse DFT (IDFT) operation, respectively. It is worth noting that the system solves in line 3 of the algorithm are completely independent, which makes the algorithm very amenable to parallel implementation, as noted in [35].
Algorithm 3.1 Pseudocode for the sequential solution of a block circulant linear system.
1: Compute $\tilde{b} = F_b^{*} b$;
2: Compute $\tilde{X} = F_b X$;
3: Solve $\tilde{A}_j \tilde{x}_j = \tilde{b}_j$, $j = 1, \ldots, m$;
4: Compute $x = F_b \tilde{x}$;
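As a concrete illustration, Algorithm 3.1 can be realized in a few lines of NumPy. This is a sequential sketch only; the function name is our own, and NumPy's unnormalized `fft`/`ifft` stand in for the unitary $F_b$ and $F_b^{*}$, with the normalization factors canceling in the final solution.

```python
import numpy as np

def solve_block_circulant(first_block_row, b):
    """Solve A x = b, where A is block circulant with first block row
    A_1, ..., A_m, passed as an array of shape (m, n, n) as in (3.14).

    Mirrors Algorithm 3.1: block DFTs of b and of the first block row,
    m independent n x n solves, and an inverse block DFT.
    """
    m, n, _ = first_block_row.shape
    # Block DFT of the right-hand side: one length-m FFT per component.
    b_hat = np.fft.fft(b.reshape(m, n), axis=0)
    # Diagonal blocks: for the row structure A_{k,l} = A_{(l-k) mod m + 1}
    # of (3.14), the block eigenvalues are m * ifft(first block row)
    # taken along the block index.
    A_hat = m * np.fft.ifft(first_block_row, axis=0)
    # The m independent n x n solves -- the step that parallelizes trivially.
    x_hat = np.stack([np.linalg.solve(A_hat[j], b_hat[j]) for j in range(m)])
    # Inverse block DFT recovers the solution.
    return np.fft.ifft(x_hat, axis=0).reshape(m * n)
```

Only the $m$ blocks of the first block row are ever stored, so storage drops from $O(m^2 n^2)$ for the full matrix to $O(m n^2)$, and in a distributed setting each of the $m$ independent solves can be assigned to a different processor.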
3.3 Invertibility
Algorithm 3.1 requires the inversion of the blocks obtained from computing the DFT of the first block row of $A$. Therefore, assumptions on the invertibility of these blocks are required by the algorithm. This section shows that if the initial matrix $A$ is assumed to be nonsingular, then each diagonal block is also nonsingular.
In order to facilitate the proof, we first show that the block Fourier matrix given in (3.16) is unitary.
Lemma 3.1. The block Fourier matrix, $F_b$, as defined in (3.16), is unitary.
Proof. Recall, the $N \times N$ block Fourier matrix $F_b$ can be constructed as a Kronecker product of the unitary $m \times m$ Fourier matrix $F$ with the $n \times n$ identity matrix $I_n$. That is,

$$F_b = F \otimes I_n. \tag{3.24}$$
By the properties of Kronecker products [13], we have $(A \otimes B)^{*} = A^{*} \otimes B^{*}$. Therefore,

$$F_b^{*} = (F \otimes I_n)^{*} = F^{*} \otimes I_n^{*} = F^{*} \otimes I_n. \tag{3.25}$$
So $F_b^{*}$ can be constructed in the same fashion. Now consider $F_b^{-1}$. By the Kronecker product property $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$, for square nonsingular $A$ and $B$, we have

$$F_b^{-1} = F^{-1} \otimes I_n^{-1}. \tag{3.26}$$
However, the Fourier matrix $F$ is unitary, and thus

$$F_b^{-1} = F^{*} \otimes I_n. \tag{3.27}$$
It has been established that $F_b^{*} = F^{*} \otimes I_n$, and, therefore,

$$F_b^{-1} = F_b^{*}. \tag{3.28}$$
Thus $F_b$ is unitary.
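Lemma 3.1 is easy to sanity-check numerically for small $m$ and $n$. The following sketch (assuming NumPy; sizes are arbitrary) verifies both Kronecker identities used in the proof and the unitarity conclusion (3.28).

```python
import numpy as np

m, n = 5, 2
w = np.exp(2j * np.pi / m)
# F as in (3.15), F_b = F (Kronecker) I_n as in (3.24).
F = np.array([[w ** (j * k) for k in range(m)] for j in range(m)]) / np.sqrt(m)
Fb = np.kron(F, np.eye(n))

# (3.25): (F kron I_n)^* = F^* kron I_n
assert np.allclose(Fb.conj().T, np.kron(F.conj().T, np.eye(n)))
# (3.26)-(3.27): F_b^{-1} = F^{-1} kron I_n = F^* kron I_n
assert np.allclose(np.linalg.inv(Fb), np.kron(F.conj().T, np.eye(n)))
# (3.28): F_b^{-1} = F_b^*, i.e., F_b is unitary.
assert np.allclose(Fb.conj().T @ Fb, np.eye(m * n))
```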
Theorem 3.1. Let $A$ be a nonsingular block circulant matrix. Then the block diagonal matrix $\mathrm{diag}\{\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_m\}$ is nonsingular, where the $\tilde{A}_j$, $j = 1, \ldots, m$, are the blocks obtained by computing the block Fourier transform of the first block row of $A$.
Proof. Since $A$ is block circulant, we have

$$A = F_b\, \mathrm{diag}\{\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_m\}\, F_b^{*}. \tag{3.29}$$
Taking the determinant yields

$$\det(A) = \det\left(F_b\, \mathrm{diag}\{\tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_m\}\, F_b^{*}\right). \tag{3.30}$$
Using a property of determinants, we obtain