A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations Hasan Metin Aktulga, Md
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
CUDA 6 and Beyond
MUMPS USERS DAYS JUNE 1ST / 2ND 2017 Programming heterogeneous architectures with libraries: A survey of NVIDIA linear algebra libraries François Courteille |Principal Solutions Architect, NVIDIA |[email protected] ACKNOWLEDGEMENTS Joe Eaton , NVIDIA Ty McKercher , NVIDIA Lung Sheng Chien , NVIDIA Nikolay Markovskiy , NVIDIA Stan Posey , NVIDIA Steve Rennich , NVIDIA Dave Miles , NVIDIA Peng Wang, NVIDIA Questions: [email protected] 2 AGENDA Prolegomena NVIDIA Solutions for Accelerated Linear Algebra Libraries performance on Pascal Rapid software development for heterogeneous architecture 3 PROLEGOMENA 124 NVIDIA Gaming VR AI & HPC Self-Driving Cars GPU Computing 5 ONE ARCHITECTURE BUILT FOR BOTH DATA SCIENCE & COMPUTATIONAL SCIENCE AlexNet Training Performance 70x Pascal [CELLR ANGE] 60x 16nm FinFET 50x 40x CoWoS HBM2 30x 20x [CELLR ANGE] NVLink 10x [CELLR [CELLR ANGE] ANGE] 0x 2013 2014 2015 2016 cuDNN Pascal & Volta NVIDIA DGX-1 NVIDIA DGX SATURNV 65x in 3 Years 6 7 8 8 9 NVLINK TO CPU IBM Power Systems Server S822LC (codename “Minsky”) DDR4 DDR4 2x IBM Power8+ CPUs and 4x P100 GPUs 115GB/s 80 GB/s per GPU bidirectional for peer traffic IB P8+ CPU P8+ CPU IB 80 GB/s per GPU bidirectional to CPU P100 P100 P100 P100 115 GB/s CPU Memory Bandwidth Direct Load/store access to CPU Memory High Speed Copy Engines for bulk data movement 1615 UNIFIED MEMORY ON PASCAL Large datasets, Simple programming, High performance CUDA 8 Enable Large Oversubscribe GPU memory Pascal Data Models Allocate up to system memory size CPU GPU Higher Demand -
Singular Value Decomposition (SVD)
San José State University Math 253: Mathematical Methods for Data Visualization Lecture 5: Singular Value Decomposition (SVD) Dr. Guangliang Chen Outline • Matrix SVD Singular Value Decomposition (SVD) Introduction We have seen that symmetric matrices are always (orthogonally) diagonalizable. That is, for any symmetric matrix A ∈ Rn×n, there exist an orthogonal matrix Q = [q1 ... qn] and a diagonal matrix Λ = diag(λ1, . , λn), both real and square, such that A = QΛQT . We have pointed out that λi’s are the eigenvalues of A and qi’s the corresponding eigenvectors (which are orthogonal to each other and have unit norm). Thus, such a factorization is called the eigendecomposition of A, also called the spectral decomposition of A. What about general rectangular matrices? Dr. Guangliang Chen | Mathematics & Statistics, San José State University3/22 Singular Value Decomposition (SVD) Existence of the SVD for general matrices Theorem: For any matrix X ∈ Rn×d, there exist two orthogonal matrices U ∈ Rn×n, V ∈ Rd×d and a nonnegative, “diagonal” matrix Σ ∈ Rn×d (of the same size as X) such that T Xn×d = Un×nΣn×dVd×d. Remark. This is called the Singular Value Decomposition (SVD) of X: • The diagonals of Σ are called the singular values of X (often sorted in decreasing order). • The columns of U are called the left singular vectors of X. • The columns of V are called the right singular vectors of X. Dr. Guangliang Chen | Mathematics & Statistics, San José State University4/22 Singular Value Decomposition (SVD) * * b * b (n>d) b b b * b = * * = b b b * (n<d) * b * * b b Dr. -
Segment-Based Clustering of Hyperspectral Images Using Tree-Based Data Partitioning Structures
algorithms Article Segment-Based Clustering of Hyperspectral Images Using Tree-Based Data Partitioning Structures Mohamed Ismail and Milica Orlandi´c* Department of Electronic Systems , NTNU: Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway; [email protected] * Correspondence: [email protected] Received: 2 October 2020; Accepted: 8 December 2020; Published: 10 December 2020 Abstract: Hyperspectral image classification has been increasingly used in the field of remote sensing. In this study, a new clustering framework for large-scale hyperspectral image (HSI) classification is proposed. The proposed four-step classification scheme explores how to effectively use the global spectral information and local spatial structure of hyperspectral data for HSI classification. Initially, multidimensional Watershed is used for pre-segmentation. Region-based hierarchical hyperspectral image segmentation is based on the construction of Binary partition trees (BPT). Each segmented region is modeled while using first-order parametric modelling, which is then followed by a region merging stage using HSI regional spectral properties in order to obtain a BPT representation. The tree is then pruned to obtain a more compact representation. In addition, principal component analysis (PCA) is utilized for HSI feature extraction, so that the extracted features are further incorporated into the BPT. Finally, an efficient variant of k-means clustering algorithm, called filtering algorithm, is deployed on the created BPT structure, producing the final cluster map. The proposed method is tested over eight publicly available hyperspectral scenes with ground truth data and it is further compared with other clustering frameworks. The extensive experimental analysis demonstrates the efficacy of the proposed method. -
Eigen Values and Vectors Matrices and Eigen Vectors
EIGEN VALUES AND VECTORS MATRICES AND EIGEN VECTORS 2 3 1 11 × = [2 1] [3] [ 5 ] 2 3 3 12 3 × = = 4 × [2 1] [2] [ 8 ] [2] • Scale 3 6 2 × = [2] [4] 2 3 6 24 6 × = = 4 × [2 1] [4] [16] [4] 2 EIGEN VECTOR - PROPERTIES • Eigen vectors can only be found for square matrices • Not every square matrix has eigen vectors. • Given an n x n matrix that does have eigenvectors, there are n of them for example, given a 3 x 3 matrix, there are 3 eigenvectors. • Even if we scale the vector by some amount, we still get the same multiple 3 EIGEN VECTOR - PROPERTIES • Even if we scale the vector by some amount, we still get the same multiple • Because all you’re doing is making it longer, not changing its direction. • All the eigenvectors of a matrix are perpendicular or orthogonal. • This means you can express the data in terms of these perpendicular eigenvectors. • Also, when we find eigenvectors we usually normalize them to length one. 4 EIGEN VALUES - PROPERTIES • Eigenvalues are closely related to eigenvectors. • These scale the eigenvectors • eigenvalues and eigenvectors always come in pairs. 2 3 6 24 6 × = = 4 × [2 1] [4] [16] [4] 5 SPECTRAL THEOREM Theorem: If A ∈ ℝm×n is symmetric matrix (meaning AT = A), then, there exist real numbers (the eigenvalues) λ1, …, λn and orthogonal, non-zero real vectors ϕ1, ϕ2, …, ϕn (the eigenvectors) such that for each i = 1,2,…, n : Aϕi = λiϕi 6 EXAMPLE 30 28 A = [28 30] From spectral theorem: Aϕ = λϕ 7 EXAMPLE 30 28 A = [28 30] From spectral theorem: Aϕ = λϕ ⟹ Aϕ − λIϕ = 0 (A − λI)ϕ = 0 30 − λ 28 = 0 ⟹ λ = 58 and -
Accelerating the LOBPCG Method on Gpus Using a Blocked Sparse Matrix Vector Product
Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product Hartwig Anzt and Stanimire Tomov and Jack Dongarra Innovative Computing Lab University of Tennessee Knoxville, USA Email: [email protected], [email protected], [email protected] Abstract— the computing power of today’s supercomputers, often accel- erated by coprocessors like graphics processing units (GPUs), This paper presents a heterogeneous CPU-GPU algorithm design and optimized implementation for an entire sparse iter- becomes challenging. ative eigensolver – the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) – starting from low-level GPU While there exist numerous efforts to adapt iterative lin- data structures and kernels to the higher-level algorithmic choices ear solvers like Krylov subspace methods to coprocessor and overall heterogeneous design. Most notably, the eigensolver technology, sparse eigensolvers have so far remained out- leverages the high-performance of a new GPU kernel developed side the main focus. A possible explanation is that many for the simultaneous multiplication of a sparse matrix and a of those combine sparse and dense linear algebra routines, set of vectors (SpMM). This is a building block that serves which makes porting them to accelerators more difficult. Aside as a backbone for not only block-Krylov, but also for other from the power method, algorithms based on the Krylov methods relying on blocking for acceleration in general. The subspace idea are among the most commonly used general heterogeneous LOBPCG developed here reveals the potential of eigensolvers [1]. When targeting symmetric positive definite this type of eigensolver by highly optimizing all of its components, eigenvalue problems, the recently developed Locally Optimal and can be viewed as a benchmark for other SpMM-dependent applications. -
Image Segmentation Based on Multiscale Fast Spectral Clustering Chongyang Zhang, Guofeng Zhu, Minxin Chen, Hong Chen, Chenjian Wu
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XX, NO. X, XX 2018 1 Image Segmentation Based on Multiscale Fast Spectral Clustering Chongyang Zhang, Guofeng Zhu, Minxin Chen, Hong Chen, Chenjian Wu Abstract—In recent years, spectral clustering has become one In recent years, researchers have proposed various ap- of the most popular clustering algorithms for image segmenta- proaches to large-scale image segmentation. The approaches tion. However, it has restricted applicability to large-scale images are based on three following strategies: constructing a sparse due to its high computational complexity. In this paper, we first propose a novel algorithm called Fast Spectral Clustering similarity matrix, using Nystrom¨ approximation and using based on quad-tree decomposition. The algorithm focuses on representative points. The following approaches are based the spectral clustering at superpixel level and its computational on constructing a sparse similarity matrix. In 2000, Shi and 3 complexity is O(n log n) + O(m) + O(m 2 ); its memory cost Malik [18] constructed the similarity matrix of the image by is O(m), where n and m are the numbers of pixels and the using the k-nearest neighbor sparse strategy to reduce the superpixels of a image. Then we propose Multiscale Fast Spectral complexity of constructing the similarity matrix to O(n) and Clustering by improving Fast Spectral Clustering, which is based to reduce the complexity of eigen-decomposing the Laplacian on the hierarchical structure of the quad-tree. The computational 3 complexity of Multiscale Fast Spectral Clustering is O(n log n) matrix to O(n 2 ) by using the Lanczos algorithm. -
A Parallel Lanczos Algorithm for Eigensystem Calculation
A Parallel Lanczos Algorithm for Eigensystem Calculation Hans-Peter Kersken / Uwe Küster Eigenvalue problems arise in many fields of physics and engineering science for example in structural engineering from prediction of dynamic stability of structures or vibrations of a fluid in a closed cavity. Their solution consumes a large amount of memory and CPU time if more than very few (1-5) vibrational modes are desired. Both make the problem a natural candidate for parallel processing. Here we shall present some aspects of the solution of the generalized eigenvalue problem on parallel architectures with distributed memory. The research was carried out as a part of the EC founded project ACTIVATE. This project combines various activities to increase the performance vibro-acoustic software. It brings together end users form space, aviation, and automotive industry with software developers and numerical analysts. Introduction The algorithm we shall describe in the next sections is based on the Lanczos algorithm for solving large sparse eigenvalue problems. It became popular during the past two decades because of its superior convergence properties compared to more traditional methods like inverse vector iteration. Some experiments with an new algorithm described in [Bra97] showed that it is competitive with the Lanczos algorithm only if very few eigenpairs are needed. Therefore we decided to base our implementation on the Lanczos algorithm. However, when implementing the Lanczos algorithm one has to pay attention to some subtle algorithmic details. A state of the art algorithm is described in [Gri94]. On parallel architectures further problems arise concerning the robustness of the algorithm. We implemented the algorithm almost entirely by using numerical libraries. -
Singular Value Decomposition (SVD) 2 1.1 Singular Vectors
Contents 1 Singular Value Decomposition (SVD) 2 1.1 Singular Vectors . .3 1.2 Singular Value Decomposition (SVD) . .7 1.3 Best Rank k Approximations . .8 1.4 Power Method for Computing the Singular Value Decomposition . 11 1.5 Applications of Singular Value Decomposition . 16 1.5.1 Principal Component Analysis . 16 1.5.2 Clustering a Mixture of Spherical Gaussians . 16 1.5.3 An Application of SVD to a Discrete Optimization Problem . 22 1.5.4 SVD as a Compression Algorithm . 24 1.5.5 Spectral Decomposition . 24 1.5.6 Singular Vectors and ranking documents . 25 1.6 Bibliographic Notes . 27 1.7 Exercises . 28 1 1 Singular Value Decomposition (SVD) The singular value decomposition of a matrix A is the factorization of A into the product of three matrices A = UDV T where the columns of U and V are orthonormal and the matrix D is diagonal with positive real entries. The SVD is useful in many tasks. Here we mention some examples. First, in many applications, the data matrix A is close to a matrix of low rank and it is useful to find a low rank matrix which is a good approximation to the data matrix . We will show that from the singular value decomposition of A, we can get the matrix B of rank k which best approximates A; in fact we can do this for every k. Also, singular value decomposition is defined for all matrices (rectangular or square) unlike the more commonly used spectral decomposition in Linear Algebra. The reader familiar with eigenvectors and eigenvalues (we do not assume familiarity here) will also realize that we need conditions on the matrix to ensure orthogonality of eigenvectors. -
A Singularly Valuable Decomposition: the SVD of a Matrix Dan Kalman
A Singularly Valuable Decomposition: The SVD of a Matrix Dan Kalman Dan Kalman is an assistant professor at American University in Washington, DC. From 1985 to 1993 he worked as an applied mathematician in the aerospace industry. It was during that period that he first learned about the SVD and its applications. He is very happy to be teaching again and is avidly following all the discussions and presentations about the teaching of linear algebra. Every teacher of linear algebra should be familiar with the matrix singular value deco~??positiolz(or SVD). It has interesting and attractive algebraic properties, and conveys important geometrical and theoretical insights about linear transformations. The close connection between the SVD and the well-known theo1-j~of diagonalization for sylnmetric matrices makes the topic immediately accessible to linear algebra teachers and, indeed, a natural extension of what these teachers already know. At the same time, the SVD has fundamental importance in several different applications of linear algebra. Gilbert Strang was aware of these facts when he introduced the SVD in his now classical text [22, p. 1421, obselving that "it is not nearly as famous as it should be." Golub and Van Loan ascribe a central significance to the SVD in their defini- tive explication of numerical matrix methods [8, p, xivl, stating that "perhaps the most recurring theme in the book is the practical and theoretical value" of the SVD. Additional evidence of the SVD's significance is its central role in a number of re- cent papers in :Matlgenzatics ivlagazine and the Atnericalz Mathematical ilironthly; for example, [2, 3, 17, 231. -
Learning on Hypergraphs: Spectral Theory and Clustering
Learning on Hypergraphs: Spectral Theory and Clustering Pan Li, Olgica Milenkovic Coordinated Science Laboratory University of Illinois at Urbana-Champaign March 12, 2019 Learning on Graphs Graphs are indispensable mathematical data models capturing pairwise interactions: k-nn network social network publication network Important learning on graphs problems: clustering (community detection), semi-supervised/active learning, representation learning (graph embedding) etc. Beyond Pairwise Relations A graph models pairwise relations. Recent work has shown that high-order relations can be significantly more informative: Examples include: Understanding the organization of networks (Benson, Gleich and Leskovec'16) Determining the topological connectivity between data points (Zhou, Huang, Sch}olkopf'07). Graphs with high-order relations can be modeled as hypergraphs (formally defined later). Meta-graphs, meta-paths in heterogeneous information networks. Algorithmic methods for analyzing high-order relations and learning problems are still under development. Beyond Pairwise Relations Functional units in social and biological networks. High-order network motifs: Motif (Benson’16) Microfauna Pelagic fishes Crabs & Benthic fishes Macroinvertebrates Algorithmic methods for analyzing high-order relations and learning problems are still under development. Beyond Pairwise Relations Functional units in social and biological networks. Meta-graphs, meta-paths in heterogeneous information networks. (Zhou, Yu, Han'11) Beyond Pairwise Relations Functional units in social and biological networks. Meta-graphs, meta-paths in heterogeneous information networks. Algorithmic methods for analyzing high-order relations and learning problems are still under development. Review of Graph Clustering: Notation and Terminology Graph Clustering Task: Cluster the vertices that are \densely" connected by edges. Graph Partitioning and Conductance A (weighted) graph G = (V ; E; w): for e 2 E, we is the weight. -
Exact Diagonalization of Quantum Lattice Models on Coprocessors
Exact diagonalization of quantum lattice models on coprocessors T. Siro∗, A. Harju Aalto University School of Science, P.O. Box 14100, 00076 Aalto, Finland Abstract We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5. Keywords: Tight binding, Hubbard model, exact diagonalization, GPU, CUDA, MIC, Xeon Phi 1. Introduction hopping amplitude is denoted by t. The tight-binding model describes free electrons hopping around a lattice, and it gives a In recent years, there has been tremendous interest in utiliz- crude approximation of the electronic properties of a solid. The ing coprocessors in scientific computing, including condensed model can be made more realistic by adding interactions, such matter physics[1, 2, 3, 4]. Most of the work has been done as on-site repulsion, which results in the well-known Hubbard on graphics processing units (GPU), resulting in impressive model[9]. In our basis, however, such interaction terms are di- speedups compared to CPUs in problems that exhibit high data- agonal, rendering their effect on the computational complexity parallelism and benefit from the high throughput of the GPU. -
Fast Approximate Spectral Clustering
Fast Approximate Spectral Clustering Donghui Yan Ling Huang Michael I. Jordan University of California Intel Research Lab University of California Berkeley, CA 94720 Berkeley, CA 94704 Berkeley, CA 94720 [email protected] [email protected] [email protected] ABSTRACT variety of methods have been developed over the past several decades Spectral clustering refers to a flexible class of clustering proce- to solve clustering problems [15, 19]. A relatively recent area dures that can produce high-quality clusterings on small data sets of focus has been spectral clustering, a class of methods based but which has limited applicability to large-scale problems due to on eigendecompositions of affinity, dissimilarity or kernel matri- its computational complexity of O(n3) in general, with n the num- ces [20, 28, 31]. Whereas many clustering methods are strongly ber of data points. We extend the range of spectral clustering by tied to Euclidean geometry, making explicit or implicit assump- developing a general framework for fast approximate spectral clus- tions that clusters form convex regions in Euclidean space, spectral tering in which a distortion-minimizing local transformation is first methods are more flexible, capturing a wider range of geometries. applied to the data. This framework is based on a theoretical anal- They often yield superior empirical performance when compared ysis that provides a statistical characterization of the effect of local to competing algorithms such as k-means, and they have been suc- distortion on the mis-clustering rate. We develop two concrete in- cessfully deployed in numerous applications in areas such as com- stances of our general framework, one based on local k-means clus- puter vision, bioinformatics, and robotics.