Computer Sciences Department
Technical Report #1690
May 2011

Computing the Singular Value Decomposition of 3 × 3 matrices with minimal branching and elementary floating point operations

Aleka McAdams 1,2   Andrew Selle 1   Rasmus Tamstorf 1   Joseph Teran 2,1   Eftychios Sifakis 3,1

1 Walt Disney Animation Studios
2 University of California, Los Angeles
3 University of Wisconsin, Madison

Abstract

A numerical method for the computation of the Singular Value Decomposition of 3 × 3 matrices is presented. The proposed methodology robustly handles rank-deficient matrices and guarantees orthonormality of the computed rotational factors. The algorithm is tailored to the characteristics of SIMD or vector processors. In particular, it does not require any explicit branching beyond simple conditional assignments (as in the C++ ternary operator ?:, or the SSE4.1 instruction VBLENDPS), enabling trivial data-level parallelism for any number of operations. Furthermore, no trigonometric or other expensive operations are required; the only floating point operations utilized are addition, multiplication, and an inexact (yet fast) reciprocal square root which is broadly available on current SIMD/vector architectures. The performance observed approaches the limit of making the 3 × 3 SVD a memory-bound (as opposed to CPU-bound) operation on current SMP platforms.

Keywords: singular value decomposition, Jacobi eigenvalue algorithm

1 Method overview

Let A be a real-valued 3 × 3 matrix. A factorization of A as

    A = U Σ V^T

is guaranteed to exist, where U and V are 3 × 3 real orthogonal matrices and Σ is a 3 × 3 diagonal matrix with real and nonnegative diagonal entries. Since the matrix product U Σ V^T remains invariant if the same permutation is applied to the columns of U, V and to the diagonal entries of Σ, a common convention is to choose Σ so that its diagonal entries appear in non-increasing order. The exact convention followed in our method is slightly different, specifically:

• The orthogonal factors U and V will be true rotation matrices by construction (i.e. det(U) = det(V) = 1). This is contrasted with the possibility of U or V having a determinant of −1, which corresponds to a rotation combined with a reflection.

• The diagonal entries of Σ will be sorted in decreasing order of magnitude, but will not necessarily be non-negative (relaxing the non-negativity constraint is necessary to allow U and V to be true rotations, since the determinant of A could be either positive or negative). More specifically, the singular value with the smallest magnitude (σ3, or Σ33) will have the same sign as det(A), while the two larger singular values σ1, σ2 will be non-negative.

These conventions are motivated by applications in graphics which require the orthogonal matrices U and V to correspond to real 3D spatial rotations. In any case, different conventions can be enforced as a post-process with simple manipulations such as negating and/or permuting singular values and vectors.

The algorithm first determines the factor V by computing the eigenanalysis of the matrix A^T A = V Σ^2 V^T, which is symmetric and positive semi-definite. This is accomplished via a modified Jacobi iteration where the Jacobi factors are approximated using inexpensive, elementary arithmetic operations as described in section 2. Since the symmetric eigenanalysis also produces Σ^2, and consequently Σ itself, the remaining factor U can theoretically be obtained as U = A V Σ^-1; however this process is not applicable when A (and as a result, Σ) is singular, and can also lead to substantial loss of orthogonality in U for an ill-conditioned, yet nonsingular, matrix A. Another possibility is to form A V = U Σ and observe that this matrix contains the columns of U, each scaled by the respective diagonal entry of Σ. The Gram-Schmidt process could generate U as the orthonormal basis for A V, yet this method would still suffer from instability in the case of near-zero singular values. Our approach is based on the QR factorization of the matrix A V, using Givens rotations. With proper attention to some special cases, as described in section 4, the Givens QR procedure is guaranteed to produce a factor Q (= U) which is exactly orthogonal by construction, while the upper-triangular factor R will be shown to be in fact diagonal and identical to Σ up to sign flips of the diagonal entries.

The novel contribution of our approach lies in the computation of the Jacobi/Givens rotations without the use of expensive arithmetic such as trigonometric functions, square roots, or even division; the only necessary operations are addition, multiplication, and an inexact reciprocal square root function (which is available in many architectures, and is often faster not only than square root but than standard division as well). The method is robust for any real matrix, and converges to a given accuracy within a fixed small number of iterations.

2 Symmetric eigenanalysis

The first step in computing the decomposition A = U Σ V^T is to compute the eigenanalysis of the symmetric, positive semidefinite matrix S = A^T A = V Σ^2 V^T. This will be performed by a variant of the Jacobi eigenvalue algorithm [Golub and van Loan 1989].

2.1 Jacobi iteration

We provide a summary presentation of the classical Jacobi eigenvalue algorithm; for a more detailed exposition the reader is referred to [Golub and van Loan 1989]. The Jacobi process constructs a sequence of similarity transforms

    S^(k+1) = [Q^(k)]^T S^(k) Q^(k).
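As a concrete reference point, one such similarity transform S^(k+1) = [Q^(k)]^T S^(k) Q^(k) can be sketched in pure Python. This is illustrative only: the function and variable names are ours, and it uses the classical exact-trigonometry angle choice that the paper later replaces with cheaper arithmetic.

```python
import math

def givens_conjugate(S, p, q):
    """One classical Jacobi step on a symmetric 3x3 matrix: build the
    Givens rotation Q(p, q, theta) that annihilates S[p][q], and return
    B = Q^T S Q together with Q. Indices p, q are 0-based."""
    app, aqq, apq = S[p][p], S[q][q], S[p][q]
    if app == aqq:
        theta = math.pi / 4.0   # degenerate case: c = s = 1/sqrt(2)
    else:
        # tan(2*theta) = 2*a_pq / (a_pp - a_qq), with |theta| < pi/4
        theta = 0.5 * math.atan(2.0 * apq / (app - aqq))
    c, s = math.cos(theta), math.sin(theta)
    Q = [[float(i == j) for j in range(3)] for i in range(3)]
    Q[p][p], Q[p][q], Q[q][p], Q[q][q] = c, -s, s, c
    # B = Q^T S Q, computed with plain triple products
    QtS = [[sum(Q[k][i] * S[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
    B = [[sum(QtS[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    return B, Q

S = [[2.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 1.0]]
B, Q = givens_conjugate(S, 0, 1)
print(B[0][1])   # annihilated entry: zero up to roundoff
```

Note that the conjugation preserves symmetry and the trace of S, while driving the selected off-diagonal pair to zero.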
Each matrix Q^(k) is constructed as a Givens rotation, of the form

                 [ 1 ··· 0 ···  0 ··· 0 ]
                 [ :     :      :     : ]
                 [ 0 ··· c ··· −s ··· 0 ]  ← row p
    Q(p, q, θ) = [ :     :      :     : ]
                 [ 0 ··· s ···  c ··· 0 ]  ← row q
                 [ :     :      :     : ]
                 [ 0 ··· 0 ···  0 ··· 1 ]

where c = cos(θ) and s = sin(θ). The objective of every conjugation with a Givens matrix Q^(k) = Q(p, q, θ) is to bring the next iterate S^(k+1) closer to a diagonal form, by eliminating the off-diagonal entries s^(k)_pq and s^(k)_qp (i.e. by enforcing that s^(k+1)_pq = s^(k+1)_qp = 0). It can be shown that

    Σ_{i≠j} [s^(k+1)_ij]^2 = Σ_{i≠j} [s^(k)_ij]^2 − 2 [s^(k)_pq]^2.

Therefore, after each Jacobi iteration, the sum of squared off-diagonal entries of S is reduced by the sum of squares of the two annihilated off-diagonal entries. There is a natural choice of the entries to be eliminated that can easily be seen to generate a convergent process: if we annihilate the maximum off-diagonal entry of a 3 × 3 matrix, we are guaranteed to annihilate at least 1/3 of the off-diagonal sum-of-squares (in fact, at each iteration after the first, this reduction will be at least 1/2 of the previous sum of squares, since previous iterations leave only one other non-zero off-diagonal pair).

The Jacobi iteration rapidly drives the iterates S^(k) to a diagonal form. For 3 × 3 matrices, 10–15 iterations are typically sufficient to diagonalize S to within single-precision roundoff error. Although the previous argument suggests at least linear convergence, the order becomes in fact quadratic once S^(k) has been brought somewhat close to diagonal form. Additionally, it can be shown that the same asymptotic order of convergence is attained even if the off-diagonal elements to be eliminated are selected in a fixed, cyclic order, i.e.

    (p, q) = (1, 2), (1, 3), (2, 3), (1, 2), (1, 3), ...

It remains to compute the Givens angle θ. Restricting attention to the 2 × 2 submatrix of S^(k) formed by rows and columns p and q (with entries denoted a11, a12, a22), the angle is chosen such that

    B = Q^T A Q = [  c  s ] [ a11 a12 ] [ c −s ]
                  [ −s  c ] [ a12 a22 ] [ s  c ]

                = [ c^2 a11 + 2cs a12 + s^2 a22        cs(a22 − a11) + (c^2 − s^2) a12 ]
                  [ cs(a22 − a11) + (c^2 − s^2) a12    s^2 a11 − 2cs a12 + c^2 a22     ]

is a diagonal matrix. Therefore, we need to enforce that

    b12 = b21 = cs(a22 − a11) + (c^2 − s^2) a12 = 0.

If a22 − a11 = 0, we can simply select θ = π/4 (or equivalently, c = s = 1/√2). Otherwise, the previous condition can be rewritten as:

    cs / (c^2 − s^2) = a12 / (a11 − a22)

    2 cos θ sin θ / (cos^2 θ − sin^2 θ) = 2 a12 / (a11 − a22)

    2 tan θ / (1 − tan^2 θ) = 2 a12 / (a11 − a22)        (1)

    tan(2θ) = 2 a12 / (a11 − a22)                        (2)

From equation (2) we have θ = (1/2) arctan(2 a12 / (a11 − a22)). If the arc-tangent function returns values in the interval (−π/2, π/2), this expression will guarantee |θ| < π/4, as required for convergence of cyclic Jacobi. Naturally, the use of (forward and inverse) trigonometric functions will significantly increase the cost of this computation. An alternative technique is described in [Golub and van Loan 1989], where the quadratic equation (1) is solved to yield tan θ directly, from which the sine and cosine are computed algebraically. This approach still requires a minimum of two reciprocals, one square root, and one exact reciprocal-of-square-root in addition to any multiply/add operations (an extra reciprocal plus a square root will be needed if branching instructions are to be avoided).

Our proposed optimization quickly computes an approximate form of the Givens rotation, where inaccuracy may be introduced both in the angle computation and as a constant scaling of the rotation matrix. However, the nature of this approximation is such that the Jacobi procedure is only impacted by a minimal deceleration, while the accuracy of the converged result is not compromised.
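Putting the pieces of this section together, the classical cyclic Jacobi procedure (exact trigonometric rotations, i.e. the baseline that our approximation accelerates) can be sketched as follows. All names are illustrative, the loop uses 0-based indices for the cyclic order (1,2), (1,3), (2,3), and the sweep count is chosen generously since each sweep performs three Givens rotations.

```python
import math

def matmul(A, B):
    # Plain 3x3 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def jacobi_eigenanalysis(S, sweeps=5):
    """Cyclic Jacobi on a symmetric 3x3 matrix S. Returns (D, V) with
    V orthogonal and V^T S V ~= D diagonal, i.e. S ~= V D V^T."""
    D = [row[:] for row in S]
    V = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        for p, q in ((0, 1), (0, 2), (1, 2)):   # fixed cyclic order
            if D[p][p] == D[q][q]:
                theta = math.pi / 4.0           # degenerate case
            else:
                # tan(2*theta) = 2*a_pq / (a_pp - a_qq), |theta| < pi/4
                theta = 0.5 * math.atan(2.0 * D[p][q] / (D[p][p] - D[q][q]))
            c, s = math.cos(theta), math.sin(theta)
            Q = [[float(i == j) for j in range(3)] for i in range(3)]
            Q[p][p], Q[p][q], Q[q][p], Q[q][q] = c, -s, s, c
            D = matmul(matmul(transpose(Q), D), Q)  # D <- Q^T D Q
            V = matmul(V, Q)                        # accumulate rotations
    return D, V

S = [[4.0, 1.0, 2.0],
     [1.0, 3.0, 0.5],
     [2.0, 0.5, 1.0]]
D, V = jacobi_eigenanalysis(S)
off = max(abs(D[i][j]) for i in range(3) for j in range(3) if i != j)
print(off)   # far below single-precision roundoff after 15 rotations
```

Running this on a symmetric matrix formed as S = A^T A yields the factor V of the SVD pipeline from section 1; the quadratic convergence discussed above is visible in how quickly the off-diagonal magnitude collapses across sweeps.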