The Pennsylvania State University
The Graduate School
Department of Computer Science and Engineering

PARALLEL BOUNDARY ELEMENT SOLUTIONS OF BLOCK CIRCULANT LINEAR SYSTEMS FOR ACOUSTIC RADIATION PROBLEMS WITH ROTATIONALLY SYMMETRIC BOUNDARY SURFACES

A Thesis in Computer Science and Engineering
by Kenneth D. Czuprynski

© 2012 Kenneth D. Czuprynski

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science
May 2012

The thesis of Kenneth D. Czuprynski was reviewed and approved* by the following:

Suzanne M. Shontz, Assistant Professor of Computer Science and Engineering, Thesis Adviser
Jesse L. Barlow, Professor of Computer Science and Engineering
John B. Fahnline, Assistant Professor of Acoustics
Raj Acharya, Professor of Computer Science and Engineering, Head of the Department of Computer Science and Engineering

*Signatures are on file in the Graduate School.

Abstract

Coupled finite element/boundary element (FE/BE) formulations are commonly used to solve structural-acoustic problems where a vibrating structure is idealized as being submerged in a fluid that extends to infinity in all directions. Typically, in FE/BE formulations, the structural analysis is performed using the finite element method, and the acoustic analysis is performed using the boundary element method. In general, the problem is solved frequency by frequency, and the coefficient matrix for the boundary element analysis is fully populated, so little can be done to alleviate the storage and computational requirements. Because acoustic boundary element calculations require approximately six elements per wavelength to produce accurate solutions, the boundary element formulation is limited to relatively low frequencies. However, when the outer surface of the structure is rotationally symmetric, the system of linear equations becomes block circulant.
We propose a parallel algorithm for distributed memory systems which takes advantage of the underlying concurrency of the inversion formula for block circulant matrices. By using the structure of the coefficient matrix in tandem with a distributed memory system setting, we show that the storage and computational requirements are substantially lessened.

Table of Contents

List of Figures
Chapter 1. Introduction
    1.1 Acoustic Radiation Problems
    1.2 Boundary Element Method
    1.3 The Fourier Matrix and Fast Fourier Transform
    1.4 Circulant Matrices
Chapter 2. Literature Review
Chapter 3. Problem Formulation
    3.1 Coefficient Matrix Derivation
    3.2 Block Circulant Inversion
    3.3 Invertibility
Chapter 4. Parallel Solution Algorithm
    4.1 Block DFT Algorithm
    4.2 Block FFT Algorithm
    4.3 System Solves
    4.4 Parallel Algorithm
Chapter 5. Theoretical Timing Analysis
    5.1 Parallel Linear System Solve
    5.2 Block DFT using the DFT Algorithm
    5.3 Block DFT Using the FFT Algorithm
    5.4 Bounds
Chapter 6. Numerical Experiments
    6.1 Experiment 1
    6.2 Experiment 2
    6.3 Numerical Results
        6.3.1 Experiment 1
        6.3.2 Experiment 2
Chapter 7. Conclusions
Appendix. BEM Code
    A.1 STATIC MULTIPOLE ARRAYS
        A.1.1 Sequential
            A.1.1.1 General Case
            A.1.1.2 Rotationally Symmetric
        A.1.2 Parallel
            A.1.2.1 General Case
            A.1.2.2 Rotationally Symmetric
    A.2 COEFF MATRIX
        A.2.1 Sequential
            A.2.1.1 General Case
            A.2.1.2 Rotationally Symmetric
        A.2.2 Parallel
            A.2.2.1 General Case
            A.2.2.2 Rotationally Symmetric
    A.3 SOURCE AMPLITUDES MODES
        A.3.1 Sequential
            A.3.1.1 General Case
            A.3.1.2 Rotationally Symmetric
        A.3.2 Parallel
            A.3.2.1 General Case
            A.3.2.2 Rotationally Symmetric
    A.4 SOURCE POWER
        A.4.1 Sequential
            A.4.1.1 General Case
            A.4.1.2 Rotationally Symmetric
        A.4.2 Parallel
            A.4.2.1 General Case
            A.4.2.2 Rotationally Symmetric
    A.5 MODAL RESISTANCE
        A.5.1 Sequential
            A.5.1.1 General Case
        A.5.2 Rotationally Symmetric
        A.5.3 Parallel
            A.5.3.1 General Case
            A.5.3.2 Rotationally Symmetric
References

List of Figures

1.1 Radix 2 element interaction pattern obtained from [18].
3.1 A propeller with three times rotational symmetry [37].
3.2 A four times rotationally symmetric sketch of a propeller.
4.1 Initial data distribution assumed in the DFT computation for the case P = m = 4.
4.2 The DFT computation for the case P = m = 4. Each arrow indicates the communication of a processor's owned submatrix to a neighboring processor in the direction of the arrow.
4.3 Parallel block DFT data decomposition for P > m.
4.4 Parallel block DFT data decomposition and processor groupings for P > m.
4.5 Process illustrating the distributed FFT. Lines crossing to different processors indicate communication from left to right. Note the output is in reverse bit-reversed order relative to numbering starting at zero; that is, A1 is element 0; A2 is element 1, etc.
4.6 Processor grid creation for P = 16 and m = 4.
6.1 Runtime comparison using the DFT algorithm for varying P and N with m = 4.
6.2 Runtime comparison using the FFT algorithm for varying P and N with m = 4.
6.3 Speedups using the DFT algorithm for varying P and N with m = 4.
6.4 Speedups using the FFT algorithm for varying P and N with m = 4.
6.5 Efficiency using the DFT algorithm for varying N and P with m = 4.
6.6 Efficiency using the FFT algorithm for varying N and P with m = 4.
6.7 Runtime comparison using the DFT algorithm for varying P and N with m = 8.
6.8 Runtime comparison using the FFT algorithm for varying P and N with m = 8.
6.9 Speedup comparison using the DFT algorithm for varying P and N when m = 8.
6.10 Speedup comparison using the FFT algorithm for varying P and N when m = 8.
6.11 Efficiency comparison using the DFT algorithm for varying P and N when m = 8.
6.12 Efficiency comparison using the FFT algorithm for varying P and N when m = 8.

Chapter 1
Introduction

Coupled finite element/boundary element (FE/BE) formulations are commonly used to solve structural-acoustic problems where a vibrating structure is idealized as being submerged in a fluid that extends to infinity in all directions. Typically, in FE/BE formulations, the structural analysis is performed using the finite element method, and the acoustic analysis is performed using the boundary element method (BEM). The boundary element formulation is advantageous for the acoustic radiation problem because only the outer surface of the structure in contact with the acoustic medium is discretized. This formulation also removes the need to mesh the infinite fluid domain exterior to the structure, as would be required if the finite element method were used instead.

Using the BEM, we compute the radiated sound field of a vibrating structure Ω ⊂ R³. The main obstacle in computing the sound radiation is solving the linear system of equations to enforce the specified boundary conditions. In the context of the BEM, this requires the solution of a dense, complex linear system.
In general, the problem is solved frequency by frequency, and the coefficient matrix for the boundary element analysis is fully populated and exhibits no exploitable structure. The size, N², of the coefficient matrix grows with the level of discretization, N, used for the surface in question. Because acoustic boundary element calculations require approximately six elements per wavelength to produce accurate solutions, the boundary element formulation is limited to relatively low frequencies. For high frequency problems, and for problems which involve large and/or complex surfaces, these matrices are large, dense, and unstructured; therefore, little can be done to alleviate the storage and computational requirements.

Iterative solvers and preconditioners have been investigated [4, 5, 28] and are a natural choice for large problems because the cost of direct solvers can become prohibitive. While the computational requirements can be lessened by iterative methods, the storage requirements can still present a problem. One obvious solution is to perform the solve in a distributed memory parallel setting. A distributed memory parallel algorithm distributes the workload and allows the storage of the matrix to be split between many individual systems with local memories, thereby increasing the total available memory. In addition, because linear systems are ubiquitous throughout scientific computation, libraries exist for their efficient parallel solution. In particular, because the matrix is dense, Scalable LAPACK (ScaLAPACK) [6] is a favored choice.

While in general these matrices exhibit no exploitable structure, when the boundary surface is rotationally symmetric, the coefficient matrix is block circulant. A circulant matrix is one in which each row is a circular shift of the row above it. One property of circulant matrices is that they are all diagonalizable by the Fourier matrix.
Therefore, the Discrete or Fast Fourier Transform (D/FFT) can be used in the solution of the system. These results generalize to the block case and can be used in the solution of block circulant linear systems arising from acoustic radiation problems involving rotationally symmetric boundary surfaces. In addition, the inversion formula for block circulant matrices is highly amenable to parallel computation.
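
To make the block circulant idea concrete, the following NumPy sketch solves a small block circulant system by transforming along the block index and then solving m independent n × n systems, which is where the concurrency comes from. It is an illustration only, not the thesis's implementation: the function names, the block ordering convention (block (j, k) of the matrix is taken to be the ((k − j) mod m)-th block of the first block row, so each block row is a right circular shift of the one above), and the test sizes are all assumptions made for this example.

```python
import numpy as np


def block_circulant(blocks):
    """Assemble the dense (m*n) x (m*n) matrix from its first block row.

    blocks has shape (m, n, n). Assumed convention: block (j, k) of the
    result is blocks[(k - j) % m].
    """
    m, n, _ = blocks.shape
    A = np.empty((m * n, m * n), dtype=blocks.dtype)
    for j in range(m):
        for k in range(m):
            A[j * n:(j + 1) * n, k * n:(k + 1) * n] = blocks[(k - j) % m]
    return A


def solve_block_circulant(blocks, b):
    """Solve A x = b for the block circulant A defined by `blocks`.

    A DFT along the block index decouples the system into m independent
    n x n systems D[p] @ x_hat[p] = b_hat[p], where
    D[p] = sum_l blocks[l] * exp(+2j*pi*l*p/m) and the hats denote
    np.fft.fft applied along the block index.
    """
    m, n, _ = blocks.shape
    # np.fft.ifft uses the +i exponent and a 1/m factor, so rescale by m.
    D = m * np.fft.ifft(blocks, axis=0)
    b_hat = np.fft.fft(b.reshape(m, n), axis=0)
    # The m small solves are independent of one another; here they are
    # batched in one call, but each could run on a separate processor.
    x_hat = np.linalg.solve(D, b_hat[..., None])[..., 0]
    return np.fft.ifft(x_hat, axis=0).reshape(m * n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n = 4, 3  # hypothetical sizes: m-fold symmetry, n unknowns per block
    blocks = (rng.uniform(-0.5, 0.5, (m, n, n))
              + 1j * rng.uniform(-0.5, 0.5, (m, n, n)))
    blocks[0] += 10.0 * np.eye(n)  # keep the example well conditioned
    b = rng.standard_normal(m * n) + 1j * rng.standard_normal(m * n)

    x = solve_block_circulant(blocks, b)
    residual = np.linalg.norm(block_circulant(blocks) @ x - b)
    print("residual norm:", residual)
```

Only the m blocks of the first block row are stored in this sketch, rather than the full dense matrix, and the dense assembly in `block_circulant` is used solely to check the result.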