Jack Dongarra: Supercomputing Expert and Mathematical Software Specialist
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Recursive Approach in Sparse Matrix LU Factorization
51 Recursive approach in sparse matrix LU factorization Jack Dongarra, Victor Eijkhout and the resulting matrix is often guaranteed to be positive Piotr Łuszczek∗ definite or close to it. However, when the linear sys- University of Tennessee, Department of Computer tem matrix is strongly unsymmetric or indefinite, as Science, Knoxville, TN 37996-3450, USA is the case with matrices originating from systems of Tel.: +865 974 8295; Fax: +865 974 8296 ordinary differential equations or the indefinite matri- ces arising from shift-invert techniques in eigenvalue methods, one has to revert to direct methods which are This paper describes a recursive method for the LU factoriza- the focus of this paper. tion of sparse matrices. The recursive formulation of com- In direct methods, Gaussian elimination with partial mon linear algebra codes has been proven very successful in pivoting is performed to find a solution of Eq. (1). Most dense matrix computations. An extension of the recursive commonly, the factored form of A is given by means technique for sparse matrices is presented. Performance re- L U P Q sults given here show that the recursive approach may per- of matrices , , and such that: form comparable to leading software packages for sparse ma- LU = PAQ, (2) trix factorization in terms of execution time, memory usage, and error estimates of the solution. where: – L is a lower triangular matrix with unitary diago- nal, 1. Introduction – U is an upper triangular matrix with arbitrary di- agonal, Typically, a system of linear equations has the form: – P and Q are row and column permutation matri- Ax = b, (1) ces, respectively (each row and column of these matrices contains single a non-zero entry which is A n n A ∈ n×n x where is by real matrix ( R ), and 1, and the following holds: PPT = QQT = I, b n b, x ∈ n and are -dimensional real vectors ( R ). -
Abstract 1 Introduction
Implementation in ScaLAPACK of Divide-and-Conquer Algorithms for Banded and Tridiagonal Linear Systems A. Cleary Department of Computer Science University of Tennessee J. Dongarra Department of Computer Science University of Tennessee Mathematical Sciences Section Oak Ridge National Laboratory Abstract Described hereare the design and implementation of a family of algorithms for a variety of classes of narrow ly banded linear systems. The classes of matrices include symmetric and positive de - nite, nonsymmetric but diagonal ly dominant, and general nonsymmetric; and, al l these types are addressed for both general band and tridiagonal matrices. The family of algorithms captures the general avor of existing divide-and-conquer algorithms for banded matrices in that they have three distinct phases, the rst and last of which arecompletely paral lel, and the second of which is the par- al lel bottleneck. The algorithms have been modi ed so that they have the desirable property that they are the same mathematical ly as existing factorizations Cholesky, Gaussian elimination of suitably reordered matrices. This approach represents a departure in the nonsymmetric case from existing methods, but has the practical bene ts of a smal ler and more easily hand led reduced system. All codes implement a block odd-even reduction for the reduced system that al lows the algorithm to scale far better than existing codes that use variants of sequential solution methods for the reduced system. A cross section of results is displayed that supports the predicted performance results for the algo- rithms. Comparison with existing dense-type methods shows that for areas of the problem parameter space with low bandwidth and/or high number of processors, the family of algorithms described here is superior. -
Overview of Iterative Linear System Solver Packages
Overview of Iterative Linear System Solver Packages Victor Eijkhout July, 1998 Abstract Description and comparison of several packages for the iterative solu- tion of linear systems of equations. 1 1 Intro duction There are several freely available packages for the iterative solution of linear systems of equations, typically derived from partial di erential equation prob- lems. In this rep ort I will give a brief description of a numberofpackages, and giveaninventory of their features and de ning characteristics. The most imp ortant features of the packages are which iterative metho ds and preconditioners supply; the most relevant de ning characteristics are the interface they present to the user's data structures, and their implementation language. 2 2 Discussion Iterative metho ds are sub ject to several design decisions that a ect ease of use of the software and the resulting p erformance. In this section I will give a global discussion of the issues involved, and how certain p oints are addressed in the packages under review. 2.1 Preconditioners A go o d preconditioner is necessary for the convergence of iterative metho ds as the problem to b e solved b ecomes more dicult. Go o d preconditioners are hard to design, and this esp ecially holds true in the case of parallel pro cessing. Here is a short inventory of the various kinds of preconditioners found in the packages reviewed. 2.1.1 Ab out incomplete factorisation preconditioners Incomplete factorisations are among the most successful preconditioners devel- op ed for single-pro cessor computers. Unfortunately, since they are implicit in nature, they cannot immediately b e used on parallel architectures. -
Accelerating the LOBPCG Method on Gpus Using a Blocked Sparse Matrix Vector Product
Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product Hartwig Anzt and Stanimire Tomov and Jack Dongarra Innovative Computing Lab University of Tennessee Knoxville, USA Email: [email protected], [email protected], [email protected] Abstract— the computing power of today’s supercomputers, often accel- erated by coprocessors like graphics processing units (GPUs), This paper presents a heterogeneous CPU-GPU algorithm design and optimized implementation for an entire sparse iter- becomes challenging. ative eigensolver – the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) – starting from low-level GPU While there exist numerous efforts to adapt iterative lin- data structures and kernels to the higher-level algorithmic choices ear solvers like Krylov subspace methods to coprocessor and overall heterogeneous design. Most notably, the eigensolver technology, sparse eigensolvers have so far remained out- leverages the high-performance of a new GPU kernel developed side the main focus. A possible explanation is that many for the simultaneous multiplication of a sparse matrix and a of those combine sparse and dense linear algebra routines, set of vectors (SpMM). This is a building block that serves which makes porting them to accelerators more difficult. Aside as a backbone for not only block-Krylov, but also for other from the power method, algorithms based on the Krylov methods relying on blocking for acceleration in general. The subspace idea are among the most commonly used general heterogeneous LOBPCG developed here reveals the potential of eigensolvers [1]. When targeting symmetric positive definite this type of eigensolver by highly optimizing all of its components, eigenvalue problems, the recently developed Locally Optimal and can be viewed as a benchmark for other SpMM-dependent applications. -
Life As a Developer of Numerical Software
A Brief History of Numerical Libraries Sven Hammarling NAG Ltd, Oxford & University of Manchester First – Something about Jack Jack’s thesis (August 1980) 30 years ago! TOMS Algorithm 589 Small Selection of Jack’s Projects • Netlib and other software repositories • NA Digest and na-net • PVM and MPI • TOP 500 and computer benchmarking • NetSolve and other distributed computing projects • Numerical linear algebra Onto the Rest of the Talk! Rough Outline • History and influences • Fortran • Floating Point Arithmetic • Libraries and packages • Proceedings and Books • Summary Ada Lovelace (Countess Lovelace) Born Augusta Ada Byron 1815 – 1852 The language Ada was named after her “Is thy face like thy mother’s, my fair child! Ada! sole daughter of my house and of my heart? When last I saw thy young blue eyes they smiled, And then we parted,-not as now we part, but with a hope” Childe Harold’s Pilgramage, Lord Byron Program for the Bernoulli Numbers Manchester Baby, 21 June 1948 (Replica) 19 Kilburn/Tootill Program to compute the highest proper factor 218 218 took 52 minutes 1.5 million instructions 3.5 million store accesses First published numerical library, 1951 First use of the word subroutine? Quality Numerical Software • Should be: – Numerically stable, with measures of quality of solution – Reliable and robust – Accompanied by test software – Useful and user friendly with example programs – Fully documented – Portable – Efficient “I have little doubt that about 80 per cent. of all the results printed from the computer are in error to a much greater extent than the user would believe, ...'' Leslie Fox, IMA Bulletin, 1971 “Giving business people spreadsheets is like giving children circular saws. -
Numerical and Parallel Libraries
Numerical and Parallel Libraries Uwe Küster University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Uwe Küster Slide 1 Höchstleistungsrechenzentrum Stuttgart Numerical Libraries Public Domain commercial vendor specific Uwe Küster Slide 2 Höchstleistungsrechenzentrum Stuttgart 33. — Numerical and Parallel Libraries — 33. 33-1 Overview • numerical libraries for linear systems – dense – sparse •FFT • support for parallelization Uwe Küster Slide 3 Höchstleistungsrechenzentrum Stuttgart Public Domain Lapack-3 linear equations, eigenproblems BLAS fast linear kernels Linpack linear equations Eispack eigenproblems Slatec old library, large functionality Quadpack numerical quadrature Itpack sparse problems pim linear systems PETSc linear systems Netlib Server best server http://www.netlib.org/utk/papers/iterative-survey/packages.html Uwe Küster Slide 4 Höchstleistungsrechenzentrum Stuttgart 33. — Numerical and Parallel Libraries — 33. 33-2 netlib server for all public domain numerical programs and libraries http://www.netlib.org Uwe Küster Slide 5 Höchstleistungsrechenzentrum Stuttgart Contents of netlib access aicm alliant amos ampl anl-reports apollo atlas benchmark bib bibnet bihar blacs blas blast bmp c c++ cephes chammp cheney-kincaid clapack commercial confdb conformal contin control crc cumulvs ddsv dierckx diffpack domino eispack elefunt env f2c fdlibm fftpack fishpack fitpack floppy fmm fn fortran fortran-m fp gcv gmat gnu go graphics harwell hence hompack hpf hypercube ieeecss ijsa image intercom itpack -
Over-Scalapack.Pdf
1 2 The Situation: Parallel scienti c applications are typically Overview of ScaLAPACK written from scratch, or manually adapted from sequen- tial programs Jack Dongarra using simple SPMD mo del in Fortran or C Computer Science Department UniversityofTennessee with explicit message passing primitives and Mathematical Sciences Section Which means Oak Ridge National Lab oratory similar comp onents are co ded over and over again, co des are dicult to develop and maintain http://w ww .ne tl ib. org /ut k/p e opl e/J ackDo ngarra.html debugging particularly unpleasant... dicult to reuse comp onents not really designed for long-term software solution 3 4 What should math software lo ok like? The NetSolve Client Many more p ossibilities than shared memory Virtual Software Library Multiplicityofinterfaces ! Ease of use Possible approaches { Minimum change to status quo Programming interfaces Message passing and ob ject oriented { Reusable templates { requires sophisticated user Cinterface FORTRAN interface { Problem solving environments Integrated systems a la Matlab, Mathematica use with heterogeneous platform Interactiveinterfaces { Computational server, like netlib/xnetlib f2c Non-graphic interfaces Some users want high p erf., access to details {MATLAB interface Others sacri ce sp eed to hide details { UNIX-Shell interface Not enough p enetration of libraries into hardest appli- Graphic interfaces cations on vector machines { TK/TCL interface Need to lo ok at applications and rethink mathematical software { HotJava Browser -
Prospectus for the Next LAPACK and Scalapack Libraries
Prospectus for the Next LAPACK and ScaLAPACK Libraries James Demmel1, Jack Dongarra23, Beresford Parlett1, William Kahan1, Ming Gu1, David Bindel1, Yozo Hida1, Xiaoye Li1, Osni Marques1, E. Jason Riedy1, Christof Voemel1, Julien Langou2, Piotr Luszczek2, Jakub Kurzak2, Alfredo Buttari2, Julie Langou2, Stanimire Tomov2 1 University of California, Berkeley CA 94720, USA, 2 University of Tennessee, Knoxville TN 37996, USA, 3 Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA, 1 Introduction Dense linear algebra (DLA) forms the core of many scienti¯c computing appli- cations. Consequently, there is continuous interest and demand for the devel- opment of increasingly better algorithms in the ¯eld. Here 'better' has a broad meaning, and includes improved reliability, accuracy, robustness, ease of use, and most importantly new or improved algorithms that would more e±ciently use the available computational resources to speed up the computation. The rapidly evolving high end computing systems and the close dependence of DLA algo- rithms on the computational environment is what makes the ¯eld particularly dynamic. A typical example of the importance and impact of this dependence is the development of LAPACK [1] (and later ScaLAPACK [2]) as a successor to the well known and formerly widely used LINPACK [3] and EISPACK [3] libraries. Both LINPACK and EISPACK were based, and their e±ciency depended, on optimized Level 1 BLAS [4]. Hardware development trends though, and in par- ticular an increasing Processor-to-Memory speed gap of approximately 50% per year, started to increasingly show the ine±ciency of Level 1 BLAS vs Level 2 and 3 BLAS, which prompted e®orts to reorganize DLA algorithms to use block matrix operations in their innermost loops. -
Randnla: Randomized Numerical Linear Algebra
review articles DOI:10.1145/2842602 generation mechanisms—as an algo- Randomization offers new benefits rithmic or computational resource for the develop ment of improved algo- for large-scale linear algebra computations. rithms for fundamental matrix prob- lems such as matrix multiplication, BY PETROS DRINEAS AND MICHAEL W. MAHONEY least-squares (LS) approximation, low- rank matrix approxi mation, and Lapla- cian-based linear equ ation solvers. Randomized Numerical Linear Algebra (RandNLA) is an interdisci- RandNLA: plinary research area that exploits randomization as a computational resource to develop improved algo- rithms for large-scale linear algebra Randomized problems.32 From a foundational per- spective, RandNLA has its roots in theoretical computer science (TCS), with deep connections to mathemat- Numerical ics (convex analysis, probability theory, metric embedding theory) and applied mathematics (scientific computing, signal processing, numerical linear Linear algebra). From an applied perspec- tive, RandNLA is a vital new tool for machine learning, statistics, and data analysis. Well-engineered implemen- Algebra tations have already outperformed highly optimized software libraries for ubiquitous problems such as least- squares,4,35 with good scalability in par- allel and distributed environments. 52 Moreover, RandNLA promises a sound algorithmic and statistical foundation for modern large-scale data analysis. MATRICES ARE UBIQUITOUS in computer science, statistics, and applied mathematics. An m × n key insights matrix can encode information about m objects ˽ Randomization isn’t just used to model noise in data; it can be a powerful (each described by n features), or the behavior of a computational resource to develop discretized differential operator on a finite element algorithms with improved running times and stability properties as well as mesh; an n × n positive-definite matrix can encode algorithms that are more interpretable in the correlations between all pairs of n objects, or the downstream data science applications. -
C++ API for BLAS and LAPACK
2 C++ API for BLAS and LAPACK Mark Gates ICL1 Piotr Luszczek ICL Ahmad Abdelfattah ICL Jakub Kurzak ICL Jack Dongarra ICL Konstantin Arturov Intel2 Cris Cecka NVIDIA3 Chip Freitag AMD4 1Innovative Computing Laboratory 2Intel Corporation 3NVIDIA Corporation 4Advanced Micro Devices, Inc. February 21, 2018 This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, in support of the nation's exascale computing imperative. Revision Notes 06-2017 first publication 02-2018 • copy editing, • new cover. 03-2018 • adding a section about GPU support, • adding Ahmad Abdelfattah as an author. @techreport{gates2017cpp, author={Gates, Mark and Luszczek, Piotr and Abdelfattah, Ahmad and Kurzak, Jakub and Dongarra, Jack and Arturov, Konstantin and Cecka, Cris and Freitag, Chip}, title={{SLATE} Working Note 2: C++ {API} for {BLAS} and {LAPACK}}, institution={Innovative Computing Laboratory, University of Tennessee}, year={2017}, month={June}, number={ICL-UT-17-03}, note={revision 03-2018} } i Contents 1 Introduction and Motivation1 2 Standards and Trends2 2.1 Programming Language Fortran............................ 2 2.1.1 FORTRAN 77.................................. 2 2.1.2 BLAST XBLAS................................. 4 2.1.3 Fortran 90.................................... 4 2.2 Programming Language C................................ 5 2.2.1 Netlib CBLAS.................................. 5 2.2.2 Intel MKL CBLAS................................ 6 2.2.3 Netlib lapack cwrapper............................. 6 2.2.4 LAPACKE.................................... 7 2.2.5 Next-Generation BLAS: \BLAS G2"..................... -
A Peer-To-Peer Framework for Message Passing Parallel Programs
A Peer-to-Peer Framework for Message Passing Parallel Programs Stéphane GENAUD a;1, and Choopan RATTANAPOKA b;2 a AlGorille Team - LORIA b King Mongkut’s University of Technology Abstract. This chapter describes the P2P-MPI project, a software framework aimed at the development of message-passing programs in large scale distributed net- works of computers. Our goal is to provide a light-weight, self-contained software package that requires minimum effort to use and maintain. P2P-MPI relies on three features to reach this goal: i) its installation and use does not require administra- tor privileges, ii) available resources are discovered and selected for a computation without intervention from the user, iii) program executions can be fault-tolerant on user demand, in a completely transparent fashion (no checkpoint server to config- ure). P2P-MPI is typically suited for organizations having spare individual comput- ers linked by a high speed network, possibly running heterogeneous operating sys- tems, and having Java applications. From a technical point of view, the framework has three layers: an infrastructure management layer at the bottom, a middleware layer containing the services, and the communication layer implementing an MPJ (Message Passing for Java) communication library at the top. We explain the de- sign and the implementation of these layers, and we discuss the allocation strategy based on network locality to the submitter. Allocation experiments of more than five hundreds peers are presented to validate the implementation. We also present how fault-management issues have been tackled. First, the monitoring of the infras- tructure itself is implemented through the use of failure detectors. -
Evolving Software Repositories
1 Evolving Software Rep ositories http://www.netli b.org/utk/pro ject s/esr/ Jack Dongarra UniversityofTennessee and Oak Ridge National Lab oratory Ron Boisvert National Institute of Standards and Technology Eric Grosse AT&T Bell Lab oratories 2 Pro ject Fo cus Areas NHSE Overview Resource Cataloging and Distribution System RCDS Safe execution environments for mobile co de Application-l evel and content-oriented to ols Rep ository interop erabili ty Distributed, semantic-based searching 3 NHSE National HPCC Software Exchange NASA plus other agencies funded CRPC pro ject Center for ResearchonParallel Computation CRPC { Argonne National Lab oratory { California Institute of Technology { Rice University { Syracuse University { UniversityofTennessee Uniform interface to distributed HPCC software rep ositories Facilitation of cross-agency and interdisciplinary software reuse Material from ASTA, HPCS, and I ITA comp onents of the HPCC program http://www.netlib.org/nhse/ 4 Goals: Capture, preserve and makeavailable all software and software- related artifacts pro duced by the federal HPCC program. Soft- ware related artifacts include algorithms, sp eci cations, designs, do cumentation, rep ort, ... Promote formation, growth, and interop eration of discipline-oriented rep ositories that organize, evaluate, and add value to individual contributions. Employ and develop where necessary state-of-the-art technologies for assisting users in nding, understanding, and using HPCC software and technologies. 5 Bene ts: 1. Faster development of high-quality software so that scientists can sp end less time writing and debugging programs and more time on research problems. 2. Less duplication of software development e ort by sharing of soft- ware mo dules.