Sparse Approximate Inverses

Another approach for approximating A^-1 is norm minimization:

    min_{M in P} ||AM - I||

over some sparsity pattern P. Which norm should we choose? It should be
- analytic (to allow the explicit solution of this problem), and
- easy to compute (in parallel).

The optimal choice is the Frobenius norm:

    ||A||_F^2 := Σ_{i,j=1}^n a_{i,j}^2 = Σ_{j=1}^n ||A_{.,j}||_2^2 = trace(A^T A) = <A, A>

SPAI in parallel

First, we choose the pattern P in a static way a priori, e.g. as the pattern of A. Then

    min_{M in P} ||AM - I||_F^2 = min_{M in P} Σ_{j=1}^n ||(AM - I) e_j||_2^2
                                = Σ_{j=1}^n min_{M_j in P_j} ||A M_j - e_j||_2^2

Hence, to minimize the Frobenius norm, we have to solve n least squares problems in the sparse columns M_j of M. This can be done fully in parallel! But what do the LS problems cost?

SPAI and LS

Denote by J_j the set of allowed indices in the j-th column of M, so that M_j has nonzero entries only in the positions J_j, e.g. M_j = (0, *, 0, *, 0, 0)^T. Then only the columns A(:, J_j) enter the residual:

    min_{M_j in P_j} ||A M_j - e_j||_2 = min ||A(:, J_j) M_j(J_j) - e_j||_2

SPAI and Sparse LS

A(:, J_j) is a sparse rectangular matrix. Let I_j be the index set of its nonzero rows, the "shadow" of J_j. We can reduce the sparse LS problem to the small dense matrix A(I_j, J_j):

    min ||A(I_j, J_j) M_j(J_j) - e_j(I_j)||_2 = min ||A~ M~_j - e~_j||_2

Solving the small LS problem for A~ = A(I_j, J_j) by Householder QR yields M~_j and hence M_j.

Computing M_k

[Figure: sparsity patterns of A and M_k with the index sets J_k and I_k in A M_k - e_k]

Delete superfluous zeros in the least squares problem:
- For the index set J_k in M_k keep only the columns A(:, J_k).
- In A(:, J_k) keep only the nonzero rows A(I_k, J_k).
- Solve the small least squares problem in A(I_k, J_k), e.g. by QR decomposition (Householder method).
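To make the column-wise procedure concrete, the following is a minimal Python/numpy sketch of static-pattern SPAI. It is an illustrative reconstruction, not the reference SPAI code; the function name spai_static and the default choice of the pattern of A are our own.

    import numpy as np
    import scipy.sparse as sp

    def spai_static(A, pattern=None):
        """SPAI with a static pattern (default: the pattern of A itself)."""
        A = sp.csc_matrix(A)
        n = A.shape[1]
        P = sp.csc_matrix(A if pattern is None else pattern)
        cols = []
        for j in range(n):
            # J_j: allowed row indices in column j of M
            Jj = P.indices[P.indptr[j]:P.indptr[j + 1]]
            AJ = A[:, Jj].toarray()
            # I_j: "shadow" of J_j, i.e. the nonzero rows of A(:, J_j)
            Ij = np.nonzero(np.any(AJ != 0.0, axis=1))[0]
            # right-hand side e_j restricted to I_j
            ej = np.zeros(len(Ij))
            ej[Ij == j] = 1.0
            # small dense LS problem ||A(I_j,J_j) m - e_j(I_j)||_2
            mj, *_ = np.linalg.lstsq(AJ[Ij, :], ej, rcond=None)
            col = np.zeros(n)
            col[Jj] = mj
            cols.append(col)
        return sp.csc_matrix(np.column_stack(cols))

    # toy example: tridiagonal A, M on the pattern of A
    A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(8, 8), format="csc")
    M = spai_static(A)
    print(np.linalg.norm((A @ M).toarray() - np.eye(8), "fro"))

Since every column is independent, the loop over j is exactly what would be distributed over the processors in a parallel implementation.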
Sparsity Pattern?

A^-1 will in general no longer be sparse! As an a priori choice of a good approximate sparsity pattern for M we can use the pattern of
- A^k or (A^T)^k,
- (A^T A)^k A^T for some k = 1, 2,
- A_ε for a sparsified A,
- or a combination of the above.

Here A_ε is obtained by sparsification of A: delete all entries with |a_{i,j}| < ε.

Dynamic Pattern Finding (SPAI)

- Start with a thin approximate pattern J_k for M_k.
- Compute the optimal column M_{k,opt}(J_k) by LS.
- Find a new entry j for M_k such that M_{k,opt}(J_k) + λ e_j has a smaller residual. With the residual r_k = A M_k - e_k and A_j = A e_j:

    min_λ ||A (M_k + λ e_j) - e_k||_2^2 = min_λ ||r_k + λ A_j||_2^2
                                        = min_λ ( ||r_k||_2^2 + 2 λ r_k^T A_j + λ^2 ||A_j||_2^2 )

Choose an index j with r_k^T A_j != 0 and

    λ_opt = - (r_k^T A_j) / ||A_j||_2^2 ,  with improvement  min = ||r_k||_2^2 - (r_k^T A_j)^2 / ||A_j||_2^2 .

One can also find the overall optimal new entry.

Block SPAI

Partition the given matrix into small blocks (2 x 2 or 3 x 3) and apply the Frobenius norm minimization with a blockwise pattern.
Advantages: an underlying block structure will also appear in the pattern of A^-1 -> improved pattern; block operations are more efficient.

Iterative SPAI

Start with the pattern of A -> M_1. Construct M_2 relative to the new matrix A M_1, construct M_3 relative to the new matrix A M_1 M_2, and so on.
Advantage: cheaper, but inferior approximation.

Factorized SPAI, FSPAI

Approximate the inverse Cholesky factor of A = L_A L_A^T:

    A^-1 = (L_A L_A^T)^-1 = L_A^-T L_A^-1 ≈ L^T L ,  with  L ≈ L_A^-1  from  min_L ||L L_A - I||_F .

The normal equations give A(J_k, J_k) L_k(J_k) = α_k e_k, so the columns of L can be computed from A alone. The same L results from

    min trace(L^T A L) / det(L^T A L)^{1/n} .

SPAI and Target Matrices

    min_M ||AM - P||_F

Assume P is a good sparse preconditioner for A. Improve P by computing M from the above Frobenius norm minimization (then A M ≈ P, i.e. M P^-1 ≈ A^-1).
Application: assume that A is given by two parts, e.g. an advection part and a diffusion part. We can choose P as the Laplacian relative to the diffusion part (easy to solve), and then we add M to improve P relative to the advection part.

Probing

Find a preconditioner of special form (tridiagonal, banded) for preconditioning a matrix that is not given explicitly, but only by its action on certain vectors, e.g. e^T S = f^T.

Example: a Schur complement S or a general dense/implicit matrix. Choose e.g. e = (1, 1, ..., 1)^T and the preconditioner as a diagonal matrix D = diag(d_1, ..., d_n). Then we have to satisfy e^T D = e^T S = f^T. After the computation of f the solution is given by d_j = f_j.
Disadvantage: only very special patterns for M and special probing vectors e can be used.
Example: tridiagonal probing.

MSPAI Probing

Generalize the Frobenius norm minimization to

    min_M ||C M - B||_F = min_M || (C_0; ρ u^T) M - (B_0; ρ v^T) ||_F ,

where (X; y^T) denotes X with the row y^T appended. For example,

    min_M || (A; ρ e^T A) M - (I; ρ e^T) ||_F :

the original SPAI is extended by an additional norm minimization to deliver especially good results on the vector e. Similarly,

    min_M || (I; ρ e^T) (C_0 M - B_0) ||_F = min_M || W (C_0 M - B_0) ||_F ,
    min_M || (I; ρ e^T) (A M - I) ||_F = min_M || W (A M - I) ||_F .

MSPAI allows general probing with any pattern and any collection of probing vectors e:

    min_{M in P} ||M - S~||_F^2 + ρ^2 ||e^T M - e^T S||_2^2

with a sparse approximation S~ of S.

Further research:

(1) Apply MSPAI with the probing feature to multigrid or regularization problems (smoother, preconditioner).

(2) Iterative SPAI: Compute a first preconditioner M_1 and consider the column residuals R_k = e_k - A M_{1,k}. Sort the columns of M_1 according to the size of the error with a permutation P: M_1 P has as first columns the indices with small error. Consider

    P^T (A M_1 - I) P = A~ M~_1 - I ,  M~_1 = (M_good, M_bad) .

Look for the next factor M_2 of the form M_2 = ( (I; 0), M_{2p} ). Apply SPAI for M_{2p},

    min_{M_{2p}} || A~ M~_1 M_{2p} - (0; I) ||_F ,

and repeat.

Available code: SPAI, Block SPAI, and MSPAI are available in parallel versions, e.g. for PETSc. Another current project: factorized (Block) SPAI for symmetric matrices.

Iterative methods in parallel

(1) Asynchronous iterations. Iteration function x = Φ(x), Φ: R^n -> R^n, using p processors, with x divided into p subblocks:

    for k = 1, 2, ...
        for i = 1:p
            x_{new,i} = Φ_i(x_1, ..., x_p)  if i in s(k)
            x_{new,i} = x_{old,i}           if i not in s(k)

where {s(k)} is a sequence of nonempty subsets of {1, ..., p} in which all indices appear infinitely often for k -> infinity. This code describes an asynchronous fixed point iteration. Convergence follows if the spectral radius of the corresponding Gauss-Seidel (GS) iteration is less than 1.

(2) Block methods: Consider A x_j = b_j, j = 1, ..., k simultaneously. Collect B = (b_1 ... b_k) and generate the block Krylov subspace K(A, B) = (B, AB, A^2 B, ...).

(3) Parallel cg:

    while (sqrt(gamma) > epsilon * error_0) {
        if (iteration > 1)
            q = r + (gamma / gamma_old) * q;  /* local vector update */
        v = A * q;                            /* MVP */
        delta = dot(v, q);                    /* scalar product */
        alpha = gamma / delta;
        x = x + alpha * q;                    /* local vector updates */
        r = r - alpha * v;
        gamma_old = gamma;
        gamma = dot(r, r);                    /* scalar product */
        iteration = iteration + 1;
    }

The two dot products are the synchronisation points: each requires global communication (MPI_Allreduce).

Use non-blocking collective operations in order to hide this communication! They allow an overlap of numerical computation and communication. In MPI-1 this is only possible for point-to-point communication (MPI_ISEND and MPI_IRECV); non-blocking collectives will be included in the new MPI-3 standard. Until then, additional libraries are necessary for collective operations, e.g. LibNBC (non-blocking collectives).

Pseudo-code for a non-blocking reduction:

    MPI_Request req;
    MPI_Status stat;
    int sbuf1[SIZE], rbuf1[SIZE], buf2[SIZE];

    /* compute sbuf1 */
    compute(sbuf1, SIZE);
    /* start non-blocking allreduce of sbuf1 */
    MPI_Iallreduce(sbuf1, rbuf1, SIZE, MPI_INT, MPI_SUM,
                   MPI_COMM_WORLD, &req);
    /* compute buf2 (independent of sbuf1): computation and
       communication overlap here */
    compute(buf2, SIZE);
    /* synchronisation */
    MPI_Wait(&req, &stat);
    /* final computation uses the data in rbuf1 */
    evaluate(rbuf1, buf2, SIZE);
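For comparison, the same overlap pattern can be sketched in Python with mpi4py, which exposes the MPI-3 non-blocking collectives (Iallreduce). This is a hypothetical illustration; the buffer contents and the final evaluation are placeholder computations of our own.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    SIZE = 1024

    # compute sbuf1
    sbuf1 = np.arange(SIZE, dtype="i")
    rbuf1 = np.empty(SIZE, dtype="i")

    # start the non-blocking allreduce of sbuf1
    req = comm.Iallreduce(sbuf1, rbuf1, op=MPI.SUM)

    # compute buf2 (independent of sbuf1); this work overlaps with
    # the allreduce communication still in flight
    buf2 = np.full(SIZE, comm.Get_rank(), dtype="i")

    # synchronisation
    req.Wait()

    # final computation uses the reduced data in rbuf1
    result = rbuf1 + buf2
    if comm.Get_rank() == 0:
        print(result[:4])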
Application to cg: in cg, non-blocking collectives are only useful for the matrix-vector product, since the result of each dot product is needed immediately afterwards.

Assume that the vector is distributed over a 2D processor array, and assume a "matrix-free" problem: the matrix does not have to be stored, but is given only implicitly.

Non-blocking matrix-vector product:

    void matrix_vector_mult(struct array_2d *v_in, struct array_2d *v_out,
                            struct comm_data_t *comm_data)
    {
        fill_buffers(v_in, &comm_data->send_buffers);
        start_send_boundaries(comm_data);
        volume_mult(v_in, v_out, comm_data);  /* overlap: work on inner data */
        finish_send_boundaries(comm_data);
        mult_boundaries(v_out, &comm_data->recv_buffers);
    }

The sending of the boundary values to the neighbors overlaps with the computations on the inner data.

6. Domain Decomposition

Consider an elliptic PDE on a region Ω with boundary Γ, e.g. with Dirichlet boundary conditions. Example:

    Δu = u_xx + u_yy = ∂^2u/∂x^2 + ∂^2u/∂y^2 = f(x, y)  in Ω
    u(x, y) = g(x, y)  on Γ

Parallel solution?

[Figure: region Ω with boundary Γ]

Overlapping DD

Partition the region Ω into two regions Ω1 and Ω2, with new boundaries Γ1 and Γ2 which are partially given by the old boundary Γ and partially by unknown artificial parts Γ'1 and Γ'2.

[Figure: overlapping subdomains Ω1, Ω2 with boundaries Γ1, Γ2 and artificial boundaries Γ'1, Γ'2]

Overlapping DD II

We can discretize and solve the given PDE on Ω1 with boundary Γ1, but we need the values of u(x, y) on the new artificial boundary Γ'1. In a first step we can assume arbitrary values on Γ'1, e.g. u(x, y) = 0, and then solve the linear system relative to the region Ω1. The same can be done in parallel for Ω2. The values of the resulting solution on Γ'2 can then be used to compute, in the next step, the solution for the region Ω2 with boundary Γ2; in the same way the resulting solution on Γ'1 is used to compute the solution for the region Ω1 with boundary Γ1. In this way we generate solutions on the subdomains which provide approximate values for the unknown boundary of the other subdomain; a small numerical sketch of this alternating procedure follows below.
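To illustrate the alternating procedure, here is a small numpy sketch for the 1D model problem -u'' = f on (0, 1) with homogeneous Dirichlet boundary conditions and two overlapping subdomains. The discretization, the domain split, and all names are our own illustrative choices.

    import numpy as np

    # grid: interior points 1..n, mesh width h, u(0) = u(1) = 0
    n = 99
    h = 1.0 / (n + 1)
    f = np.ones(n)                 # right-hand side f(x) = 1
    u = np.zeros(n + 2)            # current iterate incl. boundary values

    # overlapping subdomains (interior point indices); overlap is 40..59
    dom1 = np.arange(1, 60)        # Omega_1
    dom2 = np.arange(40, n + 1)    # Omega_2

    def solve_subdomain(u, dom):
        """Solve -u'' = f on one subdomain, taking Dirichlet data from
        the current iterate u at the two artificial boundary points."""
        m = len(dom)
        T = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
        rhs = f[dom - 1].copy()
        rhs[0] += u[dom[0] - 1] / h**2     # value left of the subdomain
        rhs[-1] += u[dom[-1] + 1] / h**2   # value right of the subdomain
        u[dom] = np.linalg.solve(T, rhs)

    # alternating Schwarz iteration: each solve uses the newest
    # boundary values produced by the other subdomain
    for it in range(20):
        solve_subdomain(u, dom1)
        solve_subdomain(u, dom2)

    x = np.linspace(0.0, 1.0, n + 2)
    print("max error:", np.max(np.abs(u - 0.5 * x * (1.0 - x))))

With u initialized to zero, each sweep plays the role of one step described above: every subdomain solve uses the latest values of the other solution at its artificial boundary point, and the iterates converge to the global solution (here compared against the exact solution u = x(1 - x)/2).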
