Sparse Days September 6Th – 8Th, 2017 CERFACS, Toulouse, France
Total Page:16
File Type:pdf, Size:1020Kb
Sparse Days September 6th – 8th, 2017 CERFACS, Toulouse, France Sparse Days – September 6th – 8th, 2017 – CERFACS Preliminary program Wednesday, September 6th 2017 13:00 – 14:00 registration 14:00 – 14:15 welcome message Session 1 (14:15 – 15:30) 14:15 – 15:40 Block Low-Rank multifrontal solvers: complexity, performance, and scalability Theo Mary (IRIT) 15:40 – 16:05 Sparse Supernodal Solver exploiting Low-Rankness Property Grégoire Pichon (Inria Bordeaux – Sud-Ouest) 15:05 – 15:30 Recursive inverse factorization Anton Artemov (Uppsala University) Coffee break (15:30 – 16:00) Session 2 (16:00 – 17:15) 16:00 – 16:25 Algorithm based on spectral analysis to detect numerical blocks in matrices Luce le Gorrec (IRIT – APO) 16:25 – 16:50 Computing Sparse Tensor Decompositions using Dimension Trees Oguz Kaya (ENS Lyon) 16:50 – 17:15 Numerical analysis of dynamic centrality Philip Knight (University of Strathclyde) Thursday, September 7th 2017 Session 3 (09:45 – 11:00) 09:45 – 10:10 Parallel Algorithm Design via Approximation Alex Pothen (Purdue University) 10:10 – 10:35 Accelerating the Mondriaan sparse matrix partitioner Rob Bisseling (Utrecht University) 10:35 – 11:00 LS-GPart: a global, distributed ordering library François-Henry Rouet (Livermore Software Technology Corporation) Coffee break (11:00 – 11:25) Session 4 (11:25 – 13:05) 11:25 – 11:50 A Tale of Two Codes Robert Lucas (Livermore Software Technology Corportion) 11:50 – 12:15 Direct solution of sparse systems of linear equations with sparse multiple right hand sides Gilles Moreau (ENS Lyon) 12:15 – 12:40 Averaged information splitting and multi-frontal solvers for heterogeneous high-throughput data analysis Shengxin Zhu (Xi'an Jiaotong – Liverpool University) 12:40 – 13:05 Permuting Spiked Matrices to Triangular Form in the Linear Programming LU Update Lukas Schork (University of Edinburgh, UK) Lunch break (13:05 – 15:00) Session 5 (15:00 – 16:15) 15:00 – 15:25 Factorization Based Sparse Solvers and Preconditioners for Exascale Sherry Li (Lawrence Berkeley National Laboratory) 15:25 – 15:50 Scalability Analysis of Sparse Matrix Computations on Many-core Processors Weifeng Liu (Norwegian University of Science and Technology) 15:50 – 16:15 Refactoring Sparse Triangular Solver on Sunway TaihuLight Many-core Supercomputer Wei Xue (Tsinghua University) Coffee break (16:15 – 16:45) 16:45 – 17:30 Review of Sparse Days by Iain Duff. Reception dinner (19:45) Friday, September 8th 2017 Session 6 (09:15 – 10:55) 09:15 – 09:40 On solving mixed sparse-dense linear least-squares problems Part I: A Schur complement approach 09:40 – 10:05 Part II: A preconditioned conjugate gradient method Jennifer Scott (STFC Rutherford Appleton Laboratory); and Miroslav Tuma (Charles University and Czech Academy of Sciences) 10:05 – 10:30 A Parallel Hierarchical Low-Rank Solver for General Sparse Matrices Erik Boman (Sandia National Laboratories) 10:30 – 10:55 Effective and robust SIF preconditioners for general SPD sparse matrices Jianlin Xia (Purdue University) Coffee break (10:55 – 11:20) Session 7 (11:20 – 13:00) 11:20 – 11:45 High Performance Block Incomplete LU Factorization Matthias Bollhoefer (TU Braunschweig) 11:15 – 12:10 An optimal Q-OR method for solving nonsymmetric linear systems Gérard Meurant 12:10 – 12:35 Preconditioning for non-symmetric Toeplitz matrices with application to time-dependent PDEs Andrew Wathen (Oxford University) 12:35 – 13:00 Efficient Parallel iterative solvers for high order DG methods Matthias Maischak (Brunel University) Lunch (13:00) Recursive inverse factorization Anton Artemov (Uppsala University, Sweden) In this talk we will present a parallel algorithm for the inverse factorization of symmetric positive definite matrices. This algorithm utilizes a hierarchical representation of the matrix and computes an inverse factor by combining an iterative refinement procedure with a recursive binary principal submatrix decomposition. Although our algorithm should be generally applicable, this work was mainly motivated by the need for efficient and scalable inverse factorization of the basis set overlap matrix in large scale electronic structure calculations. We show that for such matrices the computational cost increases only linearly with system size. We will also demonstrate the weak scaling behavior of the algorithm and give theoretical justification of the scaling behavior based on critical path length estimation. Accelerating the Mondriaan sparse matrix partitioner Rob Bisseling (Mathematical Institute, Utrecht University, Netherlands) The Mondriaan package for sparse matrix partitioning has been developed since 2002 and currently it has reached version 4.1. Its latest addition is the medium-grain method for 2D partitioning, which significantly reduces the communication volume of a parallel sparse matrix-vector multiplication. The present talk is about recent developments to be included in version 4.2 of Mondriaan, which focus on improving the speed of computing the partitioning. One improvement is the use of free nonzeros in the matrix, which can be moved after a split to improve the load balance without adversely affecting the communication volume. This leaves more room for reducing communication in subsequent splits. Another improvement for certain sparse matrices is obtained by trying to find connected components after every split that satisfy the imposed imbalance constraint. If this succeeds, a zero-volume split is found very quickly. The total savings in computation time of these and other improvement for a set of 900 sparse matrices is about 25 per cent. This is joint work with Marco van Oort High Performance Block Incomplete LU Factorization Matthias Bollhoefer (TU Braunschweig, Germany) Many application problems that lead to solving linear systems make use of preconditioned Krylov subspace solvers to compute their solution. Among the most popular preconditioning approaches are incomplete factorization methods either as single--level approaches or within a multilevel framework. We will present a block incomplete triangular factorization that is based on skillfully blocking the system initially and throughout the factorization. This approach allows for the use of cache-optimized dense matrix kernels such level-3 BLAS or LAPACK. We will demonstrate how this block approach may significantly outperform the scalar method on modern architectures, paving the way for its prospective use inside various multilevel incomplete factorization approaches or other applications where the core part relies on an incomplete factorization. A Parallel Hierarchical Low-Rank Solver for General Sparse Matrices Erik Boman (Sandia National Laboratories, USA) Hierarchical methods that exploit low-rank structure have attracted much attention lately. For sparse matrices, many methods rely on low-rank structure within dense frontal matrices. We present a different approach by Eric Darve et al. that works on the entire sparse matrix, and is in several ways simpler. The method is most useful as a preconditioner, though it could also be used as a direct solver. We show that the method can be viewed as an H2 solver and also as an extension to the block ILU(0) preconditioner. We briefly discuss a parallel code developed in collaboration with Stanford, and show some results. Joint work with Chao Chen, Eric Darve, Siva Rajamanickam, and Ray Tuminaro. Computing Sparse Tensor Decompositions using Dimension Trees Oguz Kaya (ENS Lyon, France) Tensor decompositions have increasingly been used in many applications domains including computer vision, signal processing, graph analytics, healthcare data analysis, recommender systems, and many other related machine learning problems. In many of these applications, the tensor representation of data is big, sparse, and of low-rank, and an efficient computation of tensor decompositions is indispensible to be able to analyze datasets of massive scale. Two major computational kernels involved at the core of the iterative tensor decomposition algorithms are tensor- times-vector (TTV) and -matrix (TTM) multiplications. In this talk, we propose a novel algorithmic framework along with an associated sparse tensor data structure that enables an efficient computation of these two operations for high dimensional sparse tensors. With this framework, we enable storing and reusing partial TTV and TTM results, thereby reducing the cost per iteration to O(NlogN) TTM/TTV calls from O(N^2) in the traditional scheme. Second, we propose an efficient tensor storage scheme representing an N-dimensional tensor using O(NlogN) index arrays, and enabling the partial computations in this framework. Third, we propose shared memory parallelization of TTV and TTMs using this sparse data representation. With all these optimizations, we up to 6x speedup over the state of the art implementations using 32 dimensional sparse tensors. Numerical analysis of dynamic centrality Philip Knight (University of Strathclyde, UK) Centrality measures have proven to be a vital tool for analysing static networks. In recent years, many of these measures have been adapted for use on dynamic networks. In particular, spectral measures such as Katz centrality have been extended to take into account time's arrow. We show that great care must be taken in applying such measures due to the inherent ill-conditioning in the associated matrices (an ill-conditioning which can get so extreme that the matrices involved can have a numerical rank of 1). At the same time, the ill-conditioning can reduce computational