Projection and Its Importance in Scientific Computing
Projection and its Importance in Scientific Computing
Stan Tomov
EECS Department
The University of Tennessee, Knoxville
March 22, 2017
COSC 594 Lecture Notes
Slide 1 / 41

Contact information
office : Claxton 317
phone : (865) 974-6317
email : [email protected]
Additional reference materials:
[1] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods (2nd edition), http://netlib2.cs.utk.edu/linalg/html_templates/Templates.html
[2] Yousef Saad, Iterative Methods for Sparse Linear Systems (1st edition), http://www-users.cs.umn.edu/~saad/books.html
Slide 2 / 41

Topics as related to high-performance scientific computing
[Diagram: Projection in Scientific Computing, connected to sparse matrices and parallel implementations; PDEs, numerical solution, tools, etc.; and iterative methods]
Slide 3 / 41

Topics on new architectures – multicore, GPUs (CUDA & OpenCL), MIC
[Same diagram: Projection in Scientific Computing, connected to sparse matrices and parallel implementations; PDEs, numerical solution, tools, etc.; and iterative methods]
Slide 4 / 41

Outline
Part I – Fundamentals
Part II – Projection in Linear Algebra
Part III – Projection in Functional Analysis (e.g. PDEs)
HPC with Multicore and GPUs
Slide 5 / 41

Part I
Fundamentals
Slide 6 / 41

Projection in Scientific Computing
[an example – in solvers for PDE discretizations]
A model leading to self-consistent iteration, with a need for high-performance diagonalization and orthogonalization routines
Slide 7 / 41

What is Projection?
Here are two examples:
(from linear algebra) P : orthogonal projection of vector u on e; the error (u – Pu) is orthogonal to the vector e.
(from functional analysis) u = f(x) on [0, 1]; P : best approximation (projection) of f(x) in span{ e ≡ 1 } ⊂ C[0,1]; the error (u – Pu) is orthogonal to e.
Slide 8 / 41

Definition
Projection is a linear transformation P from a linear space V to itself such that P² = P.
Equivalently: let V be the direct sum of subspaces V1 and V2,
  V = V1 ⊕ V2,
i.e. for u ∈ V there are unique u1 ∈ V1 and u2 ∈ V2 s.t. u = u1 + u2.
Then P : V → V1 is defined for u ∈ V as Pu ≡ u1.
Slide 9 / 41

Importance in Scientific Computing
To compute approximations Pu ≈ u where dim V1 << dim V, V = V1 ⊕ V2, when computation directly in V is not feasible or even possible. A few examples:
– Interpolation (via polynomial interpolation/projection)
– Image compression
– Sparse iterative linear solvers and eigensolvers
– Finite Element/Volume Method approximations
– Least-squares approximations, etc.
Slide 10 / 41

Projection in R²
In R² with the Euclidean inner product, i.e. for x, y ∈ R²
  (x, y) = x1 y1 + x2 y2  ( = yᵀx = x⋅y )  and  || x || = (x, x)^(1/2),
the orthogonal projection of vector u on e is
  Pu = ( (u, e) / || e ||² ) e   (Exercise)
i.e. for || e || = 1,  Pu = (u, e) e.
Slide 11 / 41

Projection in Rⁿ / Cⁿ
Similarly to R², let P be the orthogonal projection of u into span{e1, ..., em}, m ≤ n, where ei, i = 1 ... m, is an orthonormal basis, i.e. (ei, ej) = 0 for i ≠ j and (ei, ej) = 1 for i = j. Then
  Pu = (u, e1) e1 + ... + (u, em) em   (Exercise)
where (u, e1) e1 is the orthogonal projection of u on e1.
Slide 12 / 41

How to get an orthonormal basis?
Can get one from every subspace by Gram-Schmidt orthogonalization:
Input : m linearly independent vectors x1, ..., xm
Output : m orthonormal vectors x1, ..., xm
1. x1 = x1 / || x1 ||
2. do i = 2, m
3.   xi = xi - (xi, x1) x1 - ... - (xi, xi-1) xi-1   (Exercise: xi ⊥ x1, ..., xi-1; each term (xi, xj) xj is the orthogonal projection of xi on xj)
4.   xi = xi / || xi ||
5. enddo
Known as Classical Gram-Schmidt (CGS) orthogonalization
Slide 13 / 41
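As a concrete companion to the CGS loop above, here is a minimal NumPy sketch; the function name cgs, the matrix-of-columns interface, and the small random test at the end are illustrative choices, not part of the slides.

```python
import numpy as np

def cgs(X):
    """Classical Gram-Schmidt: orthonormalize the columns of X.

    X is n-by-m with linearly independent columns; returns Q whose columns
    are orthonormal and span the same subspace.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    Q = np.zeros((n, m))
    for i in range(m):
        v = X[:, i].copy()
        # Step 3 of the slide: subtract from xi its orthogonal projections
        # onto the already computed q1, ..., q_{i-1}.
        for j in range(i):
            v -= (X[:, i] @ Q[:, j]) * Q[:, j]
        # Step 4: normalize.
        Q[:, i] = v / np.linalg.norm(v)
    return Q

# Example: orthonormalize 4 random vectors in R^10 and check Q^T Q ≈ I.
Q = cgs(np.random.default_rng(0).standard_normal((10, 4)))
print(np.linalg.norm(Q.T @ Q - np.eye(4)))
```

Every inner product in the inner loop uses the original column X[:, i]; replacing it with the partially updated v is exactly the change that gives the modified variant on the next slide.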
How to get an orthonormal basis?
What if we replace line 3 with the following (3')?
3.  xi = xi - (xi, x1) x1 - ... - (xi, xi-1) xi-1
3'. do j = 1, i-1
      xi = xi – (xi, xj) xj
    enddo
Equivalent in exact arithmetic (Exercise), but not with round-off errors (next)!
Known as Modified Gram-Schmidt (MGS) orthogonalization
Slide 14 / 41

CGS vs MGS
[Results from Julien Langou]
[Figure: scalability of MGS and CGS on two different clusters; time (sec) vs number of processors (1 to 64) for matrices of size m = 500, 1000, 2000, 4000 per processor, n = 100]
Slide 15 / 41

[Results from Julien Langou]
[Figure: accuracy of MGS vs CGS on matrices of increasing condition number K(A); || I – QᵀQ || and || A – QR || / || A || plotted against K(A)]
Slide 16 / 41

QR factorization
Let A = [x1, ..., xm] be the input for CGS/MGS and Q = [q1, ..., qm] the output; R is the upper m × m triangular matrix defined from the CGS/MGS. Then
  A = Q R
Slide 17 / 41

Other QR factorizations
What about the following? [known as Cholesky QR]
1. G = AᵀA
2. G = L Lᵀ (Cholesky factorization)
3. Q = A (Lᵀ)⁻¹
Does Q have orthonormal columns (i.e. QᵀQ = I), i.e. is A = Q Lᵀ a QR factorization? (Exercise)
When is this feasible, and how does it compare to CGS and MGS?
Slide 18 / 41

Other QR factorizations
Feasible when n >> m.
Allows efficient parallel implementation: blocking both computation and communication.
[Figure: G = AᵀA accumulated block by block across processors P1, P2, ...]
Investigate numerical accuracy and scalability (compare to CGS and MGS) (Exercise).
Slide 19 / 41
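A minimal NumPy sketch of the Cholesky QR construction above, under the assumption that A is tall and skinny with full column rank; the function name cholesky_qr, the test matrix, and the two printed error measures (chosen to mirror || I – QᵀQ || and || A – QR || / || A || from slide 16) are illustrative.

```python
import numpy as np

def cholesky_qr(A):
    """Cholesky QR sketch for a tall, skinny A (n >> m, full column rank).

    Returns Q with orthonormal columns and R = L^T so that A = Q R.
    """
    G = A.T @ A                      # 1. m-by-m Gram matrix (a single reduction)
    L = np.linalg.cholesky(G)        # 2. G = L L^T (Cholesky factorization)
    Q = np.linalg.solve(L, A.T).T    # 3. Q = A (L^T)^{-1}, a triangular solve in practice
    return Q, L.T

# Quick numerical check, mirroring the error measures on slide 16.
rng = np.random.default_rng(0)
A = rng.standard_normal((10000, 100))
Q, R = cholesky_qr(A)
print(np.linalg.norm(Q.T @ Q - np.eye(100)))           # || I - Q^T Q ||
print(np.linalg.norm(A - Q @ R) / np.linalg.norm(A))   # || A - Q R || / || A ||
```

Forming G = AᵀA is a single reduction, which is why this variant blocks and parallelizes well; the price is that the conditioning of G is the square of that of A, which is worth keeping in mind for the accuracy comparison in the exercise.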
How is it done in LAPACK?
Using Householder reflectors
  H = I – 2 w wᵀ
w = ? so that H x1 = e1
w = ... (compute or look at the reference books)
[Figure: reflection of x about the hyperplane with normal w, giving Hx]
Allows us to construct the reduction Xk = Hk-1 ... H1 X, with Q the product of the reflectors (Exercise).
LAPACK implementation : "delayed update" of the trailing matrix + "accumulate transformation" to apply it as BLAS 3
Slide 20 / 41

Part II
Projection in Linear Algebra
Slide 21 / 41

Projection into a general basis
How to define projection without orthogonalization of a basis?
– Sometimes it is not feasible to orthogonalize
– Often the case in functional analysis (e.g. Finite Element Method, Finite Volume Method, etc.) where the basis consists of "linearly independent" functions (more later, and in Lecture 2)
We saw that if X = [x1, ..., xm] is an orthonormal basis,
(*)  Pu = (u, x1) x1 + ... + (u, xm) xm
How does (*) change if the columns of X are just linearly independent?
  Pu = ?
Slide 22 / 41

Projection into a general basis
The problem: find the coefficients C = (c1 ... cm)ᵀ in
  Pu = c1 x1 + c2 x2 + ... + cm xm = X C
so that u – Pu ⊥ span{x1, ..., xm}, or equivalently so that the error e in
(1)  u = P u + e
is ⊥ span{x1, ..., xm}.
Slide 23 / 41

Projection into a general basis
(1)  u = P u + e = c1 x1 + c2 x2 + ... + cm xm + e
Multiply (1) on both sides by a "test" vector/function xj (terminology from functional analysis) for j = 1, ..., m:
  (u, xj) = c1 (x1, xj) + c2 (x2, xj) + ... + cm (xm, xj) + (e, xj),  where (e, xj) = 0,
i.e., m equations for m unknowns. In matrix notation
  (XᵀX) C = Xᵀu   (Exercise), i.e. Xᵀ(Pu) = Xᵀu
XᵀX is the so-called Gram matrix (nonsingular; why?) => there exists a unique solution C.
Slide 24 / 41

Normal equations
The system (XᵀX) C = Xᵀu is also known as the Normal Equations.
The Method of Normal Equations: finding the projection (approximation) Pu ≡ XC of u in X by solving the Normal Equations system.
Slide 25 / 41

Least Squares (LS)
Equivalently, the system
  (XᵀX) C = Xᵀu
also gives the solution of the LS problem
  min_{C ∈ Rᵐ} || X C – u ||
since || v1 – u ||² = || (v1 – Pu) – e ||² = || v1 – Pu ||² + || e ||² ≥ || e ||² = || Pu – u ||²  for ∀ v1 ∈ V1.
Slide 26 / 41

LS
Note that the usual notation for LS is: for A ∈ Rⁿˣᵐ, b ∈ Rⁿ, find min_{x ∈ Rᵐ} || A x – b ||.
Solving LS with QR factorization:
Let A = Q R, so QᵀA = R = [ R1 ; 0 ] with R1 of size m × m and the zero block of size (n – m) × m, and Qᵀb = [ c ; d ] with c ∈ Rᵐ, d ∈ Rⁿ⁻ᵐ. Then
  || Ax – b ||² = || QᵀAx – Qᵀb ||² = || R1 x – c ||² + || d ||²,
i.e. we get the minimum if x is such that R1 x = c.
Slide 27 / 41

Projection and iterative solvers
The problem : solve Ax = b in Rⁿ.
Iterative solution: at iteration i, extract an approximate xi from just a subspace V = span[v1, ..., vm] of Rⁿ.
How? As on slide 22, impose constraints:
  b – Ax ⊥ subspace W = span[w1, ..., wm] of Rⁿ, i.e.
(*)  (Ax, wi) = (b, wi) for ∀ wi ∈ W = span[w1, ..., wm]
Conditions (*) are also known as Petrov-Galerkin conditions.
The projection is orthogonal when V and W are the same (Galerkin conditions), or oblique when V and W are different.
Slide 28 / 41

Matrix representation
Let V = [v1, ..., vm], W = [w1, ..., wm] ...
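To tie the normal-equations and least-squares slides together, here is a small NumPy sketch; the synthetic basis X, the vector u, and all variable names are illustrative. The QR route uses numpy.linalg.qr, which relies on the Householder-based LAPACK factorization mentioned on slide 20.

```python
import numpy as np

# Projection of u onto span{x1, ..., xm} for a general, non-orthogonal basis.
# The basis X and the vector u are synthetic data for illustration only.
rng = np.random.default_rng(1)
n, m = 50, 5
X = rng.standard_normal((n, m))      # columns x1, ..., xm (linearly independent)
u = rng.standard_normal(n)

# Method of normal equations: (X^T X) C = X^T u, then Pu = X C.
G = X.T @ X                          # Gram matrix (nonsingular for independent columns)
C = np.linalg.solve(G, X.T @ u)
Pu = X @ C

# Equivalent least-squares view: C minimizes || X C - u ||.
# Here via QR, solving R1 x = c as on the LS slide.
Q, R = np.linalg.qr(X)               # reduced QR (Householder-based in LAPACK)
C_qr = np.linalg.solve(R, Q.T @ u)

print(np.linalg.norm(C - C_qr))      # the two coefficient vectors agree
print(X.T @ (u - Pu))                # e = u - Pu is orthogonal to every xj (entries ~ 0)
```

In exact arithmetic the two coefficient vectors coincide; the last print checks the defining property of the projection, namely that the residual e = u – Pu is orthogonal to every basis vector xj.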