Fast High-Resolution Techniques Applied to the Direction of Arrival (DOA) of Signals Problem

Total Pages: 16

File Type: PDF, Size: 1020 KB

Calhoun: The NPS Institutional Archive, Theses and Dissertations, Thesis Collection, 1992-09.
Fast High-Resolution Techniques Applied to the Direction of Arrival (DOA) of Signals Problem
Ferreira Filho, Milton Pinto. Monterey, California: Naval Postgraduate School.
http://hdl.handle.net/10945/23673

Master's thesis, September 1992, 113 pages. Unclassified. Approved for public release; distribution is unlimited. The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government. Responsible individual: Monique P. Fargues, Naval Postgraduate School, Monterey, CA 93943-5000.

Author: Milton Pinto Ferreira Filho, Lieutenant Commander, Brazilian Navy; B.S., Brazil Naval Academy; B.S.E.E., University of Sao Paulo, Brazil. Submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering.

Subject terms: high resolution, minimum norm, direction of arrival, angle of arrival, eigenstructure, autocorrelation matrix, correlation matrix, tracking, moving sources.

ABSTRACT

Subspace decomposition methods are a very useful way to extract signal information via eigen-based estimators. Although these techniques are very accurate, they are usually expensive to update, which makes them difficult to implement in real-time applications. The Rank-Revealing QR (RRQR) factorization introduced by Chan offers an attractive alternative for performing the subspace selection. In this work, we apply the RRQR algorithm to the Direction of Arrival (DOA) problem to track moving sources using passive linear arrays. In addition to the regular RRQR algorithm originally proposed by Chan, this thesis introduces an improvement. This refinement uses the signal subspace information and requires very little additional computation. It takes advantage of the Hermitian property of the signal correlation matrix and is implicitly equivalent to applying one subspace iteration step to the estimated signal subspace. Simulations show that the performance obtained is equivalent to that obtained using classical eigen-based techniques. Alternative algorithms for finding an approximation of the smallest singular vector of the correlation matrix are discussed and compared to the original method. The final product is an adaptive algorithm that allows tracking of DOA angles when moving sources are present. An analysis of the number of flops needed to execute the adaptive algorithm based on the RRQR factorization to track the DOA of moving sources is included, showing its feasibility for real-time implementations.

TABLE OF CONTENTS

I. INTRODUCTION
   A. PROBLEM CONTEXT
   B. THEORETICAL BACKGROUND
II. THE RANK-REVEALING QR (RRQR) FACTORIZATION ALGORITHM
   A. THE RANK-REVEALING QR ALGORITHM
   B. USING THE RRQR ALGORITHM TO FIND THE NOISE AND SIGNAL SUBSPACES OF THE NOISE-FREE AUTOCORRELATION MATRIX
   C. USING THE GIVENS ROTATION TO REFACTOR RΠ
III. METHODOLOGY FOR APPROXIMATING THE SMALLEST RIGHT SINGULAR VECTOR
   A. THE INVERSE ITERATION METHOD TO FIND THE SMALLEST SINGULAR VECTOR
   B. THE INVERSE ITERATION METHOD TO FIND THE SMALLEST EIGENVECTOR
   C. THE INCREMENTAL CONDITION ESTIMATOR
IV. COMPARISON BETWEEN THE PERFORMANCE OF THE RRQR ALGORITHM ITERATED FROM DIMENSION n UNTIL n-m+1 AND FROM DIMENSION n UNTIL n-r+1
V. COMPARING THE PERFORMANCE OF THE MINIMUM NORM AND MUSIC SPECTRAL ESTIMATORS
VI. USING THE ADAPTIVE RRQR ALGORITHM TO TRACK A MOVING SIGNAL
   A. THE ALGORITHM
   B. REFINEMENT ON THE RESULTS OF THE ADAPTIVE CASE
   C. SIMULATION RESULTS
   D. COMPUTATIONAL EFFICIENCY OF THE RRQR APPROXIMATION
   E. INTEGRATING THE RRQR ALGORITHM INTO A REAL-TIME CASE
VII. CONCLUSIONS
APPENDICES A THROUGH M
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST

LIST OF TABLES

Table 1: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST SING. VECTORS, RANDOM MATRIX, 1000 TRIALS.
Table 2: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST SING. VECTORS, RANDOM MATRIX WITH LOWER TRIANGLE PORTION EQUAL TO ZERO, 1000 TRIALS.
Table 3: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST SING. VECTORS, UPPER TRIANGULAR MATRIX RESULTING FROM QR FACT. W/O PIVOTING, 1000 TRIALS.
Table 4: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST SING. VECTORS, UPPER TRIANGULAR MATRIX RESULTING FROM QR FACT. W/ PIVOTING, 1000 TRIALS.
Table 5: COMPARISON BETWEEN THE FOUR CASES EXAMINED IN TABLES 1 THROUGH 4.
Table 6: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST EIGENVECTORS, RANDOM MATRIX, 1000 TRIALS.
Table 7: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST EIGENVECTORS, RANDOM MATRIX WITH LOWER TRIANGLE PORTION EQUAL TO ZERO, 1000 TRIALS.
Table 8: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST EIGENVECTORS, UPPER TRIANGULAR MATRIX RESULTING FROM QR FACT. W/O PIVOTING, 1000 TRIALS.
Table 9: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST EIGENVECTORS, UPPER TRIANGULAR MATRIX RESULTING FROM QR FACT. W/ PIVOTING, 1000 TRIALS.
Table 10: COMPARISON BETWEEN THE FOUR CASES CONSIDERED FOR THE SVD VERSUS THE EIGENVECTOR INVERSE ITERATION METHOD.
Table 11: RESULTS FOR THE HYPOTHESIS TEST CONDUCTED ON THE COMPARISON BETWEEN THE SVD VERSUS THE EIGENVECTOR INVERSE ITERATION METHOD.
Table 12: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST SING. VECTORS, RANDOM MATRIX, 1000 TRIALS.
Table 13: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST SING. VECTORS, RANDOM MATRIX WITH LOWER TRIANGLE PORTION EQUAL TO ZERO, 1000 TRIALS.
Table 14: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST SING. VECTORS, UPPER TRIANGULAR MATRIX RESULTING FROM QR FACT. W/O PIVOTING, 1000 TRIALS.
Table 15: INNER PRODUCT MAGNITUDE BETWEEN ESTIMATED AND TRUE SMALLEST …
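To make the subspace idea in the abstract concrete, here is a minimal MATLAB sketch (illustrative, not the thesis code): QR with column pivoting stands in for Chan's RRQR to split the sample correlation matrix into signal and noise subspaces, and a MUSIC-type pseudospectrum is scanned for peaks. The array geometry, noise level, and all variable names are assumptions of the sketch.

```matlab
% Minimal sketch: DOA estimation on a uniform linear array, using
% column-pivoted QR as a stand-in for the rank-revealing QR subspace split.
n = 8;                           % number of sensors (assumed)
d = 0.5;                         % element spacing in wavelengths (assumed)
doa = [-20 35];                  % true source angles in degrees (assumed)
m = numel(doa);                  % number of sources
snaps = 200;                     % number of snapshots
a = @(th) exp(1j*2*pi*d*(0:n-1).'*sind(th));      % steering vector(s)
S = (randn(m,snaps) + 1j*randn(m,snaps))/sqrt(2); % unit-power sources
W = (randn(n,snaps) + 1j*randn(n,snaps))/sqrt(2); % sensor noise
X = a(doa)*S + 0.1*W;            % array snapshots
Rxx = (X*X')/snaps;              % sample correlation matrix
[Q,~,~] = qr(Rxx);               % three outputs: column-pivoted QR
En = Q(:,m+1:n);                 % trailing columns ~ noise subspace
th = -90:0.1:90;                 % scan grid
P = zeros(size(th));
for k = 1:numel(th)
    v = a(th(k));
    P(k) = 1/real(v'*(En*En')*v);   % MUSIC-type pseudospectrum
end
% plot(th,10*log10(P/max(P)))  % peaks should appear near -20 and 35 deg
```

The point of the thesis is that this subspace split can be maintained cheaply as new snapshots arrive, rather than recomputed from scratch with an eigendecomposition.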
Recommended publications
  • 2 Homework Solutions 18.335 - Fall 2004
2 Homework Solutions, 18.335 - Fall 2004

2.1 Count the number of floating point operations required to compute the QR decomposition of an m-by-n matrix using (a) Householder reflectors, (b) Givens rotations.

(a) See Trefethen pp. 74-75. Answer: $2mn^2 - \tfrac{2}{3}n^3$ flops.

(b) Following the same procedure as in part (a) we get the same "volume", namely $\tfrac{1}{2}mn^2 - \tfrac{1}{6}n^3$. The only difference we have here comes from the number of flops required for calculating the Givens matrix. This operation requires 6 flops (instead of 4 for the Householder reflectors), and hence in total we need $3mn^2 - n^3$ flops.

2.2 Trefethen 5.4. Let the SVD of $A$ be $A = U \Sigma V^*$. Denote by $v_i$ the columns of $V$, by $u_i$ the columns of $U$, and by $\sigma_i$ the singular values of $A$. We want to find $x = (x_1, x_2)$ and $\lambda$ such that
$$\begin{pmatrix} 0 & A^* \\ A & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \lambda \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
This gives $A^* x_2 = \lambda x_1$ and $A x_1 = \lambda x_2$. Multiplying the first equation by $A$ and substituting the second equation gives $A A^* x_2 = \lambda^2 x_2$. From this we may conclude that $x_2$ is a left singular vector of $A$; the same can be done to see that $x_1$ is a right singular vector of $A$. From this the $2m$ eigenvectors are found to be
$$x_i^{\pm} = \frac{1}{\sqrt{2}} \begin{pmatrix} v_i \\ \pm u_i \end{pmatrix}, \qquad i = 1, \dots, m,$$
corresponding to the eigenvalues $\lambda = \pm \sigma_i$. Therefore we get the eigenvalue decomposition
$$\begin{pmatrix} 0 & A^* \\ A & 0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} V & V \\ U & -U \end{pmatrix} \begin{pmatrix} \Sigma & 0 \\ 0 & -\Sigma \end{pmatrix} \left( \frac{1}{\sqrt{2}} \begin{pmatrix} V & V \\ U & -U \end{pmatrix} \right)^*.$$

2.3 If $A = R + uv^*$, where $R$ is an upper triangular matrix and $u$ and $v$ are (column) vectors, describe an algorithm to compute the QR decomposition of $A$ in $O(n^2)$ time.
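A hedged MATLAB sketch of one standard way to answer 2.3 (my own names and structure, not the official solution): rotate u to a multiple of e1 with Givens rotations, which turns R + uv' into an upper Hessenberg matrix, then chase the subdiagonal away with a second pass of rotations. Each pass applies O(n) rotations at O(n) cost apiece, giving the required O(n^2) total.

```matlab
% Sketch: QR-update of A = R + u*v' in O(n^2) via Givens rotations.
n = 6;
R = triu(randn(n)); u = randn(n,1); v = randn(n,1);
A = R + u*v';                      % the matrix whose QR we want
Q = eye(n);
for i = n-1:-1:1                   % pass 1: zero out u from the bottom up
    G = planerot(u(i:i+1));        % 2x2 Givens matrix zeroing u(i+1)
    u(i:i+1) = G*u(i:i+1);
    R(i:i+1,:) = G*R(i:i+1,:);     % each rotation adds one subdiagonal
    Q(:,i:i+1) = Q(:,i:i+1)*G';    % accumulate the orthogonal factor
end
H = R + u*v';                      % u is now norm(u)*e1, so H is Hessenberg
for i = 1:n-1                      % pass 2: restore triangular form
    G = planerot(H(i:i+1,i));
    H(i:i+1,:) = G*H(i:i+1,:);
    Q(:,i:i+1) = Q(:,i:i+1)*G';
end
disp(norm(Q*H - A))   % ~ machine precision; A = Q*H with H upper triangular
```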
  • A Fast Algorithm for the Recursive Calculation of Dominant Singular Subspaces
A fast algorithm for the recursive calculation of dominant singular subspaces. N. Mastronardi (a), M. Van Barel (b,*), R. Vandebril (b). (a) Istituto per le Applicazioni del Calcolo, CNR, via Amendola 122/D, 70126 Bari, Italy; (b) Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001 Leuven, Belgium. Journal of Computational and Applied Mathematics 218 (2008) 238-246, www.elsevier.com/locate/cam. Received 26 September 2006. Abstract: In many engineering applications it is required to compute the dominant subspace of a matrix A of dimension m × n, with m ≫ n. Often the matrix A is produced incrementally, so all the columns are not available simultaneously. This problem arises, e.g., in image processing, where each column of the matrix A represents an image of a given sequence leading to a singular value decomposition-based compression [S. Chandrasekaran, B.S. Manjunath, Y.F. Wang, J. Winkeler, H. Zhang, An eigenspace update algorithm for image analysis, Graphical Models and Image Process. 59 (5) (1997) 321-332]. Furthermore, the so-called proper orthogonal decomposition approximation uses the left dominant subspace of a matrix A where a column consists of a time instance of the solution of an evolution equation, e.g., the flow field from a fluid dynamics simulation. Since these flow fields tend to be very large, only a small number can be stored efficiently during the simulation, and therefore an incremental approach is useful [P. Van Dooren, Gramian based model reduction of large-scale dynamical systems, in: Numerical Analysis 1999, Chapman & Hall, CRC Press, London, Boca Raton, FL, 2000, pp.
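The flavor of such a recursive update can be sketched in a few lines of MATLAB (a generic rank-k update, not the authors' fast algorithm; function and variable names are mine): project the new column onto the current basis, extend the basis by the normalized residual, re-diagonalize a small core matrix, and truncate back to rank k.

```matlab
% Generic sketch of one incremental dominant-subspace step.  U (m x k,
% orthonormal) and S (k x k, diagonal) track the dominant subspace;
% c is the newly arrived column of A.
function [U,S] = appendColumn(U,S,c)
    k = size(S,1);
    p = U'*c;                       % coordinates of c in current subspace
    r = c - U*p;                    % out-of-subspace residual
    rho = norm(r);
    q = r/max(rho,eps);             % new orthonormal direction
    K = [S, p; zeros(1,k), rho];    % small (k+1) x (k+1) core matrix
    [Uk,Sk,~] = svd(K);             % cheap: cost depends only on k
    U = [U, q]*Uk(:,1:k);           % rotate and truncate back to rank k
    S = Sk(1:k,1:k);
end
```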
  • Numerical Linear Algebra (Revised February 15, 2010): 4.1 The LU Decomposition
Numerical Linear Algebra, Revised February 15, 2010. 4.1 The LU Decomposition: The Elementary Matrices and badgauss. In the previous chapters we talked a bit about solving systems of the form Lx = b and Ux = b, where L is lower triangular and U is upper triangular. In the exercises for Chapter 2 you were asked to write a program x = lusolver(L,U,b) which solves LUx = b using forsub and backsub. We now address the problem of representing a matrix A as a product of a lower triangular L and an upper triangular U. Recall our old friend badgauss:

```matlab
function B = badgauss(A)
m = size(A,1);
B = A;
for i = 1:m-1
    for j = i+1:m
        a = -B(j,i)/B(i,i);          % multiplier that zeros B(j,i)
        B(j,:) = a*B(i,:) + B(j,:);  % type-3 elementary row operation
    end
end
```

The heart of badgauss is the elementary row operation of type 3, B(j,:) = a*B(i,:) + B(j,:), where a = -B(j,i)/B(i,i). Note also that the index j is greater than i, since the loop is for j = i+1:m. As we know from linear algebra, an elementary operation of type 3 can be viewed as matrix multiplication EA, where E is an elementary matrix of type 3. E looks just like the identity matrix, except that E(j,i) = a, where j, i, and a are as in the MATLAB code above. In particular, E is a lower triangular matrix, and moreover the entries on the diagonal are 1's. We call such a matrix unit lower triangular.
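badgauss discards the multipliers, but recording them is exactly what produces the LU factorization the chapter is after. A minimal sketch under the same no-pivoting assumption (all pivots nonzero), with my own function name:

```matlab
% Sketch: badgauss with the multipliers kept.  Each multiplier becomes the
% (j,i) entry of a unit lower triangular L, so A = L*U when no pivot is zero.
function [L,U] = lunopivot(A)
    m = size(A,1);
    L = eye(m); U = A;
    for i = 1:m-1
        for j = i+1:m
            L(j,i) = U(j,i)/U(i,i);          % equals -a in badgauss
            U(j,:) = U(j,:) - L(j,i)*U(i,:); % same type-3 row operation
        end
    end
end
```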
  • Massively Parallel Poisson and QR Factorization Solvers
Computers Math. Applic. Vol. 31, No. 4/5, pp. 19-26, 1996. Copyright © 1996 Elsevier Science Ltd. Massively Parallel Poisson and QR Factorization Solvers. M. Lucká, Institute for Control Theory and Robotics, Slovak Academy of Sciences, Dúbravská cesta 9, 842 37 Bratislava, Slovak Republic ([email protected]); M. Vajteršic, Institute of Informatics, Slovak Academy of Sciences, Dúbravská cesta 9, 840 00 Bratislava, P.O. Box 56, Slovak Republic ([email protected]); E. Viktorinová, Institute for Control Theory and Robotics, Slovak Academy of Sciences, Dúbravská cesta 9, 842 37 Bratislava, Slovak Republic ([email protected]). Abstract: The paper brings a massively parallel Poisson solver for a rectangle domain and parallel algorithms for computation of the QR factorization of a dense matrix A by means of Householder reflections and Givens rotations. The computer model under consideration is a SIMD mesh-connected toroidal n x n processor array. The Dirichlet problem is replaced by its finite-difference analog on an M x N (M + 1, N are powers of two) grid. The algorithm is composed of parallel fast sine transform and cyclic odd-even reduction blocks and runs in a fully parallel fashion. Its computational complexity is $O(MN \log L / n^2)$, where $L = \max(M+1, N)$. A parallel proposal of QR factorization by the Householder method zeros all subdiagonal elements in each column and updates all elements of the given submatrix in parallel. For the second method with Givens rotations, the parallel scheme of Sameh and Kuck was chosen, where the disjoint rotations can be computed simultaneously.
  • A Survey of Direct Methods for Sparse Linear Systems
A survey of direct methods for sparse linear systems. Timothy A. Davis, Sivasankaran Rajamanickam, and Wissam M. Sid-Lakhdar. Technical Report, Department of Computer Science and Engineering, Texas A&M University, April 2016, http://faculty.cse.tamu.edu/davis/publications.html. To appear in Acta Numerica. Wilkinson defined a sparse matrix as one with enough zeros that it pays to take advantage of them.[1] This informal yet practical definition captures the essence of the goal of direct methods for solving sparse matrix problems. They exploit the sparsity of a matrix to solve problems economically: much faster and using far less memory than if all the entries of a matrix were stored and took part in explicit computations. These methods form the backbone of a wide range of problems in computational science. A glimpse of the breadth of applications relying on sparse solvers can be seen in the origins of matrices in published matrix benchmark collections (Duff and Reid 1979a; Duff, Grimes and Lewis 1989a; Davis and Hu 2011). The goal of this survey article is to impart a working knowledge of the underlying theory and practice of sparse direct methods for solving linear systems and least-squares problems, and to provide an overview of the algorithms, data structures, and software available to solve these problems, so that the reader can both understand the methods and know how best to use them. [1] Wilkinson actually defined it in the negation: "The matrix may be sparse, either with the non-zero elements concentrated ... or distributed in a less systematic manner. We shall refer to a matrix as dense if the percentage of zero elements or its distribution is such as to make it uneconomic to take advantage of their presence." (Wilkinson and Reinsch 1971), page 191, emphasis in the original.
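Wilkinson's definition is easy to see in action. A hedged MATLAB illustration (sizes arbitrary): a tridiagonal system solved in sparse form touches O(n) entries, while the dense equivalent stores and factors all n^2 of them.

```matlab
% Illustration of why exploiting zeros pays: sparse vs. dense direct solve.
n = 2000;
e = ones(n,1);
A = spdiags([e, -2*e, e], -1:1, n, n);  % sparse tridiagonal matrix
b = randn(n,1);
x = A\b;          % sparse direct solve: O(n) work and storage here
% x = full(A)\b;  % same answer, but n^2 stored entries and dense-factor cost
```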
  • Parallel Codes for Computing the Numerical Rank
Linear Algebra and its Applications 275-276 (1998) 451-470. Parallel codes for computing the numerical rank. Gregorio Quintana-Ortí*, Enrique S. Quintana-Ortí. Abstract: In this paper we present an experimental comparison of several numerical tools for computing the numerical rank of dense matrices. The study includes the well-known SVD, the URV decomposition, and several rank-revealing QR factorizations: the QR factorization with column pivoting, and two QR factorizations with restricted column pivoting. Two different parallel programming methodologies are analyzed in our paper. First, we present block-partitioned algorithms for the URV decomposition and rank-revealing QR factorizations which provide efficient implementations on shared memory environments. Furthermore, we also present parallel distributed algorithms, based on the message-passing paradigm, for computing rank-revealing QR factorizations on multicomputers. © 1998 Elsevier Science Inc. All rights reserved. 1. Introduction. Consider the matrix $A \in \mathbb{R}^{m \times n}$ (w.l.o.g. $m \geq n$), and define its rank as $r := \mathrm{rank}(A) = \dim(\mathrm{range}(A))$. The rank of a matrix is related to its singular values $\sigma(A) = \{\sigma_1, \sigma_2, \dots, \sigma_p\}$, $p = \min(m,n)$, since $r$ equals the number of nonzero singular values. *Corresponding author (e-mail: [email protected]). All authors were partially supported by the Spanish CICYT project TIC 96-1062-C03-03. This work was partially carried out while the author was visiting the Mathematics and Computer Science Division at Argonne National Laboratory.
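For reference, the SVD-based definition the introduction starts from takes two lines of MATLAB; the tolerance rule shown is the common default (the one MATLAB's rank() uses), not necessarily the paper's choice, and the test matrix is an arbitrary example of mine.

```matlab
% Numerical rank from the singular values of A, with a default-style
% tolerance; rank-revealing QR codes aim for the same answer more cheaply.
A = randn(50,8)*randn(8,20);     % example 50x20 matrix of rank 8
s = svd(A);
tol = max(size(A))*eps(s(1));    % tolerance scaled to the largest sigma
r = sum(s > tol);                % numerical rank (here r = 8)
```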
  • The Sparse Matrix Transform for Covariance Estimation and Analysis of High Dimensional Signals
The Sparse Matrix Transform for Covariance Estimation and Analysis of High Dimensional Signals. Guangzhi Cao*, Member, IEEE, Leonardo R. Bachega, and Charles A. Bouman, Fellow, IEEE. Abstract: Covariance estimation for high dimensional signals is a classically difficult problem in statistical signal analysis and machine learning. In this paper, we propose a maximum likelihood (ML) approach to covariance estimation, which employs a novel non-linear sparsity constraint. More specifically, the covariance is constrained to have an eigen decomposition which can be represented as a sparse matrix transform (SMT). The SMT is formed by a product of pairwise coordinate rotations known as Givens rotations. Using this framework, the covariance can be efficiently estimated using greedy optimization of the log-likelihood function, and the number of Givens rotations can be efficiently computed using a cross-validation procedure. The resulting estimator is generally positive definite and well-conditioned, even when the sample size is limited. Experiments on a combination of simulated data, standard hyperspectral data, and face image sets show that the SMT-based covariance estimates are consistently more accurate than both traditional shrinkage estimates and recently proposed graphical lasso estimates for a variety of different classes and sample sizes. An important property of the new covariance estimate is that it naturally yields a fast implementation of the estimated eigen-transformation using the SMT representation. In fact, the SMT can be viewed as a generalization of the classical fast Fourier transform (FFT) in that it uses "butterflies" to represent an orthonormal transform. However, unlike the FFT, the SMT can be used for fast eigen-signal analysis of general non-stationary signals.
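The key structural point, that an SMT is a product of K Givens rotations and therefore costs O(K) to apply like FFT butterflies, can be sketched as follows (the data layout and names are my assumptions, not the authors' code):

```matlab
% Sketch: applying an SMT-style operator, i.e., a product of K Givens
% rotations, to a vector.  rot(k,:) = [i j theta] rotates coordinates (i,j).
function y = smtApply(rot, x)
    y = x;
    for k = 1:size(rot,1)
        i = rot(k,1); j = rot(k,2);
        c = cos(rot(k,3)); s = sin(rot(k,3));
        t = [c, -s; s, c] * [y(i); y(j)];   % one planar "butterfly"
        y(i) = t(1); y(j) = t(2);
    end
end
```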
  • Chapter 3 LEAST SQUARES PROBLEMS
Chapter 3: LEAST SQUARES PROBLEMS

[Figure: surveying sketch showing stations A through F above the sea.]

One application is geodesy and surveying. Let z = elevation, and suppose we have the measurements: $z_A \approx 1$, $z_B \approx 2$, $z_C \approx 3$, $z_B - z_A \approx 1$, $z_C - z_B \approx 2$, $z_C - z_A \approx 1$. This is overdetermined and inconsistent:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} z_A \\ z_B \\ z_C \end{pmatrix} \approx \begin{pmatrix} 1. \\ 2. \\ 3. \\ 1. \\ 2. \\ 1. \end{pmatrix}.$$
Another application is fitting a curve $y = ax + b$ to given data:
$$\begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \approx \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$$
More generally, $A_{m \times n} x_n \approx b_m$, where A is known exactly, $m \geq n$, and b is subject to independent random errors of equal variance (which can be achieved by scaling the equations). Gauss (1821) proved that the "best" solution x minimizes $\|b - Ax\|_2$, i.e., the sum of the squares of the residual components. Hence, we might write $Ax \approx_2 b$. Even if the 2-norm is inappropriate, it is the easiest to work with. Sections: 3.1 The Normal Equations; 3.2 QR Factorization; 3.3 Householder Reflections; 3.4 Givens Rotations; 3.5 Gram-Schmidt Orthogonalization; 3.6 Singular Value Decomposition.

3.1 The Normal Equations. Recall the inner product $x^T y$ for $x, y \in \mathbb{R}^m$. (What is the geometric interpretation of the inner product in $\mathbb{R}^3$?) A sequence of vectors $x_1, x_2, \dots, x_n$ in $\mathbb{R}^m$ is orthogonal if $x_i^T x_j = 0$ for $i \neq j$, and orthonormal if $x_i^T x_j = \delta_{ij}$. Two subspaces S, T are orthogonal if $x \in S,\ y \in T \Rightarrow x^T y = 0$. If $X = [x_1, x_2, \dots, x_n]$, then orthonormal means $X^T X = I$.
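The surveying example is small enough to solve directly. A quick MATLAB check (mine, not the chapter's): backslash on a rectangular system computes the least-squares solution, internally via a QR factorization of the kind developed in sections 3.2 through 3.4.

```matlab
% The elevation example as a least-squares problem: min ||b - A*z||_2.
A = [ 1  0  0;     % z_A ~ 1
      0  1  0;     % z_B ~ 2
      0  0  1;     % z_C ~ 3
     -1  1  0;     % z_B - z_A ~ 1
      0 -1  1;     % z_C - z_B ~ 2
     -1  0  1 ];   % z_C - z_A ~ 1
b = [1; 2; 3; 1; 2; 1];
z = A\b;           % least-squares solution for [z_A; z_B; z_C]
```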
  • SVD TR1690.pdf
Computer Sciences Department. Computing the Singular Value Decomposition of 3 × 3 matrices with minimal branching and elementary floating point operations. Aleka McAdams (1,2), Andrew Selle (1), Rasmus Tamstorf (1), Joseph Teran (2,1), Eftychios Sifakis (3,1). Technical Report #1690, May 2011. (1) Walt Disney Animation Studios; (2) University of California, Los Angeles; (3) University of Wisconsin, Madison. Abstract: A numerical method for the computation of the Singular Value Decomposition of 3 × 3 matrices is presented. The proposed methodology robustly handles rank-deficient matrices and guarantees orthonormality of the computed rotational factors. The algorithm is tailored to the characteristics of SIMD or vector processors. In particular, it does not require any explicit branching beyond simple conditional assignments (as in the C++ ternary operator ?:, or the SSE4.1 instruction VBLENDPS), enabling trivial data-level parallelism for any number of operations. The algorithm first determines the factor V by computing the eigenanalysis of the matrix $A^T A = V \Sigma^2 V^T$, which is symmetric and positive semi-definite. This is accomplished via a modified Jacobi iteration where the Jacobi factors are approximated using inexpensive, elementary arithmetic operations as described in section 2. Since the symmetric eigenanalysis also produces $\Sigma^2$, and consequently $\Sigma$ itself, the remaining factor U can theoretically be obtained as $U = A V \Sigma^{-1}$; however, this process is not applicable when A (and as a result, $\Sigma$) is singular, and can also lead to substantial loss of orthogonality in U for an ill-conditioned, yet nonsingular, A.
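For context, one exact symmetric Jacobi step looks like the MATLAB sketch below; the paper's contribution is to approximate the rotation angle with cheap, branch-free arithmetic rather than the square-root computation shown here. The formulas follow the standard two-sided Jacobi recipe; the function name is mine.

```matlab
% One classical Jacobi step on a symmetric matrix S, zeroing S(p,q).
% The 3x3 SVD method approximates this rotation; this is the exact version.
function [S,c,s] = jacobiStep(S,p,q)
    if S(p,q) ~= 0
        tau = (S(q,q) - S(p,p)) / (2*S(p,q));
        if tau >= 0, t = 1/(tau + sqrt(1 + tau^2));  % stable smaller root
        else,        t = 1/(tau - sqrt(1 + tau^2)); end
        c = 1/sqrt(1 + t^2); s = t*c;
    else
        c = 1; s = 0;                                % nothing to zero
    end
    J = eye(size(S,1)); J([p q],[p q]) = [c s; -s c];
    S = J'*S*J;                                      % S(p,q) is now 0
end
```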
  • Efficient Scalable Algorithms for Hierarchically Semiseparable Matrices
EFFICIENT SCALABLE ALGORITHMS FOR HIERARCHICALLY SEMISEPARABLE MATRICES. Shen Wang, Xiaoye S. Li, Jianlin Xia, Yingchong Situ, and Maarten V. de Hoop. Abstract: Hierarchically semiseparable (HSS) matrix structures are very useful in the development of fast direct solvers for both dense and sparse linear systems. For example, they can be built into parallel sparse solvers for Helmholtz equations arising from frequency-domain wave equation modeling prevailing in the oil and gas industry. Here, we study HSS techniques in large-scale scientific computing by considering the scalability of HSS structures and by developing massively parallel HSS algorithms. These algorithms include parallel rank-revealing QR factorizations, HSS constructions with hierarchical compression, ULV HSS factorizations, and HSS solutions. BLACS and ScaLAPACK are used as our portable libraries. We construct contexts (sub-communicators) on each node of the HSS tree and exploit the governing 2D-block-cyclic data distribution scheme widely used in ScaLAPACK. Some tree techniques for symmetric positive definite HSS matrices are generalized to nonsymmetric problems in parallel. These parallel HSS methods are very useful for sparse solutions when they are used as kernels for intermediate dense matrices. Numerical examples (including 3D ones) confirm the performance of our implementations and demonstrate the potential of structured methods in parallel computing, especially parallel direct solutions of large linear systems. Related techniques are also useful for the parallelization of other rank-structured methods. Key words: HSS matrix, parallel HSS algorithm, low-rank property, compression, HSS construction, direct solver. AMS subject classifications: 15A23, 65F05, 65F30, 65F50. 1. Introduction. In recent years, rank structured matrices have attracted much attention and have been widely used in the fast solutions of various partial differential equations, integral equations, and eigenvalue problems.
  • Discontinuous Plane Rotations and the Symmetric Eigenvalue Problem
December 4, 2000. Discontinuous Plane Rotations and the Symmetric Eigenvalue Problem. Edward Anderson, Lockheed Martin Services Inc. ([email protected]). Elementary plane rotations are one of the building blocks of numerical linear algebra and are employed in reducing matrices to condensed form for eigenvalue computations and during the QR algorithm. Unfortunately, their implementation in standard packages such as EISPACK, the BLAS, and LAPACK lacks the continuity of their mathematical formulation, which makes results from software that uses them sensitive to perturbations. Test cases illustrating this problem will be presented, and reparations to the standard software proposed. 1.0 Introduction. Unitary transformations are frequently used in numerical linear algebra software to reduce dense matrices to bidiagonal, tridiagonal, or Hessenberg form as a preliminary step towards finding the eigenvalues or singular values. Elementary plane rotation matrices (also called Givens rotations) are used to selectively reduce values to zero, such as when reducing a band matrix to tridiagonal form, or during the QR algorithm when attempting to deflate a tridiagonal matrix. The mathematical properties of rotations were studied by Wilkinson [9] and are generally not questioned. However, Wilkinson based his analysis on an equation for computing a rotation matrix that is different from those in common use in the BLAS [7] and LAPACK [2]. While the classical formula is continuous in the angle of rotation, the BLAS and LAPACK algorithms have lines of discontinuity that can lead to abrupt changes in the direction of computed eigenvectors. This makes comparing results difficult and proving backward stability impossible. It is easy to restore continuity in the generation of Givens rotations even while scaling to avoid overflow and underflow, and we show how in this report.
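A simplified MATLAB paraphrase of one continuous convention (the overflow/underflow scaling the report also handles is omitted, and the routine name is mine): pick the rotation so that r inherits the sign of f, which removes the discontinuity lines of the common BLAS/LAPACK formulas.

```matlab
% Sketch: Givens generation with a continuity convention.  r carries the
% sign of f, so [c s; -s c]*[f; g] = [r; 0] varies continuously in f and g
% away from f = 0.  No scaling safeguards, for clarity only.
function [c,s,r] = givensCont(f,g)
    if g == 0
        c = 1; s = 0; r = f;
    elseif f == 0
        c = 0; s = sign(g); r = abs(g);
    else
        d = hypot(f,g);
        c = abs(f)/d;
        s = sign(f)*g/d;
        r = sign(f)*d;
    end
end
```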
  • A Parallel Butterfly Algorithm
A Parallel Butterfly Algorithm. Citation: Poulson, Jack, Laurent Demanet, Nicholas Maxwell, and Lexing Ying. "A Parallel Butterfly Algorithm." SIAM Journal on Scientific Computing 36, no. 1 (February 4, 2014): C49-C65. © 2014, Society for Industrial and Applied Mathematics. As published: http://dx.doi.org/10.1137/130921544. Citable link: http://hdl.handle.net/1721.1/88176. Terms of use: article made available in accordance with the publisher's policy and may be subject to US copyright law; refer to the publisher's site for terms of use. Abstract: The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform $\int_{\mathbb{R}^d} K(x,y)\, g(y)\, dy$ at large numbers of target points when the kernel, $K(x,y)$, is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with $O(N^d)$ quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most $O(r^2 N^d \log N)$. A parallelization of the butterfly algorithm is introduced which, assuming a message latency of $\alpha$ and per-process inverse bandwidth of $\beta$, executes in at most $O\!\left(r^2 \tfrac{N^d}{p} \log N + \left(\beta r \tfrac{N^d}{p} + \alpha\right) \log p\right)$ time using p processes.