Computational Techniques for Solving the Sparse Matrix Eigenvalue Problem for Semiconductor Bandstructure Calculation


Samuel SMITH
University of Florida
samuelsmith@ufl.edu

Abstract

The tight binding model used to efficiently model nanoscale systems is computationally bound by the symmetric matrix eigenvalue problem. We present an overview of semiconductor crystallography and computational techniques to efficiently model these crystal structures. Using a generated crystal structure, we calculate the tight binding Hamiltonian matrix that governs the system. We then explore the computational tools for calculating the eigenvalues of this matrix and benchmark several popular packages. Finally, we use these tools to compute the bandgap for quantum dots of various dimensions.

CONTENTS

I Crystal Properties and Generation
  I-A Basis and primitive vectors
  I-B Schrödinger equation and the Bloch theorem
  I-C Zincblende lattice
  I-D Wurtzite lattice
  I-E Adjacency matrix generation
  I-F Asymptotically optimal connectivity mapping algorithm
II Generation of the Tight Binding Hamiltonian
  II-A Same atom terms
  II-B Passivation of surface effects
  II-C Nearest neighbor terms
III Eigenvalue Computation
  III-A Overview of the eigenvalue problem for sparse symmetric matrices
  III-B Possible solutions and selection criteria
  III-C ARPACK
  III-D Trilinos and Anasazi
  III-E Other Python features for future work
IV System and benchmarks
  IV-A Overview of test system
  IV-B High Performance LINPACK Benchmark
  IV-C Comparison of eigenvalue solvers
V Bandgap Calculation for Quantum Dots
VI Conclusions and Future Work
References
Appendix

I. CRYSTAL PROPERTIES AND GENERATION

A. Basis and primitive vectors

All major semiconductors used today in industry are crystalline materials [1].
The key feature that differentiates crystals from other solid matter is that they are spatially periodic. This allows the structure of the crystal to be completely described by a single unit cell. A crystal can be described mathematically using primitive vectors $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$. The full lattice is generated by integral combinations of these primitive vectors:

$$\mathbf{R}' = \mathbf{R} + m_1\mathbf{a}_1 + m_2\mathbf{a}_2 + m_3\mathbf{a}_3,$$

where $\mathbf{R}$ is any known lattice point and $m_1, m_2, m_3 \in \mathbb{Z}$. For more complicated crystals like the zincblende structure, described later, two simple lattices are in superposition. For structures like these, we can define basis vectors $\mathbf{b}_1$ and $\mathbf{b}_2$ which describe the relative offsets of the two lattices.

B. Schrödinger equation and the Bloch theorem

In quantum mechanics, the Schrödinger wave equation is used to describe the behavior of systems. For the time-independent case, we write this equation as:

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + V(\mathbf{r})\right]\psi = E\psi,$$

where $\hbar$ is the reduced Planck constant, $m$ is the mass of the particle, $V(\mathbf{r})$ is the spatially dependent potential energy, $\psi$ is the wavefunction, and $E$ is the energy eigenvalue. We define the operator acting on the wavefunction on the left-hand side as the Hamiltonian, $\hat{H}$. Using this definition, we can rewrite the Schrödinger equation as:

$$\hat{H}\psi = E\psi.$$

A useful result for the Schrödinger equation for particles in a periodic potential structure like a crystal is the Bloch theorem [1]. The Bloch theorem states that for particles in a periodic potential, the eigenfunctions of the Hamiltonian are the product of a plane wave $e^{i\mathbf{k}\cdot\mathbf{r}}$ and some function $u_{\mathbf{k}}(\mathbf{r})$ with the same periodicity as the lattice. We can write this as:

$$\psi_{\mathbf{k}}(\mathbf{r}) = e^{i\mathbf{k}\cdot\mathbf{r}}\,u_{\mathbf{k}}(\mathbf{r}).$$

We note that

$$u_{\mathbf{k}}(\mathbf{r}) = u_{\mathbf{k}}(\mathbf{r} + \mathbf{R}),$$

where $\mathbf{R}$ is any lattice translation vector.

C. Zincblende lattice

The zincblende structure and the closely related diamond structure are perhaps the most important crystal structures in the semiconductor industry.
Silicon and germanium (group IV semiconductors) have a diamond structure, and gallium arsenide (a III-V semiconductor) has a zincblende structure [2]. The only major difference between these structures is that the anion and cation species are the same for the diamond structure and different for the zincblende structure. The primitive vectors for the zincblende structure with lattice constant $a$ and orthogonal basis $[\hat{x}, \hat{y}, \hat{z}]$ are:

$$\mathbf{a}_1 = \tfrac{1}{2}a\hat{y} + \tfrac{1}{2}a\hat{z}$$
$$\mathbf{a}_2 = \tfrac{1}{2}a\hat{x} + \tfrac{1}{2}a\hat{z}$$
$$\mathbf{a}_3 = \tfrac{1}{2}a\hat{x} + \tfrac{1}{2}a\hat{y}.$$

The basis vectors are:

$$\mathbf{b}_1 = \mathbf{0}$$
$$\mathbf{b}_2 = \tfrac{1}{4}\mathbf{a}_1 + \tfrac{1}{4}\mathbf{a}_2 + \tfrac{1}{4}\mathbf{a}_3,$$

where $\mathbf{b}_1$ is the basis for the cation sites and $\mathbf{b}_2$ is the basis for the anion sites.

A silicon quantum dot is shown in figure 1. A quantum dot is a structure that is fully confined in all three dimensions. While it is locally periodic, it has well defined boundary conditions. This leads to properties dramatically different from those of the bulk material. The dimension shown in the caption for the picture refers to the number of iterations for each sublattice (anionic and cationic). A 3 × 3 × 3 crystal has 54 atoms: 27 anion sites and 27 cation sites.

Fig. 1. Silicon quantum dot (3 × 3 × 3)

D. Wurtzite lattice

In the early stages of the project, the wurtzite structure was also considered along with the similar hexagonal diamond structure. Gallium nitride, a common wide bandgap semiconductor, has this structure. It is possible [3] to make silicon into this structure, but it is not commonly done. The wurtzite lattice has more complicated primitive vectors than the zincblende lattice:

$$\mathbf{a}_1 = \tfrac{1}{2}a\hat{x} - \tfrac{1}{2}\sqrt{3}\,a\hat{y}$$
$$\mathbf{a}_2 = \tfrac{1}{2}a\hat{x} + \tfrac{1}{2}\sqrt{3}\,a\hat{y}$$
$$\mathbf{a}_3 = c\hat{z},$$

where $c/a = (8/3)^{1/2}$. The basis vectors are:

$$\mathbf{b}_1 = \tfrac{1}{3}\mathbf{a}_1 + \tfrac{2}{3}\mathbf{a}_2$$
$$\mathbf{b}_2 = \tfrac{2}{3}\mathbf{a}_1 + \tfrac{1}{3}\mathbf{a}_2 + \tfrac{1}{2}\mathbf{a}_3$$
$$\mathbf{b}_3 = \tfrac{1}{3}\mathbf{a}_1 + \tfrac{2}{3}\mathbf{a}_2 + u\,\mathbf{a}_3$$
$$\mathbf{b}_4 = \tfrac{2}{3}\mathbf{a}_1 + \tfrac{1}{3}\mathbf{a}_2 + \left(\tfrac{1}{2} + u\right)\mathbf{a}_3,$$

where $u = 3/8$. For the wurtzite lattice, $\mathbf{b}_1$ and $\mathbf{b}_2$ are basis vectors for the cation sites, and $\mathbf{b}_3$ and $\mathbf{b}_4$ are basis vectors for the anion sites.
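As a minimal sketch of how the primitive and basis vectors for the two lattices translate into atomic coordinates, the Python fragment below enumerates integral combinations of the primitive vectors and adds each basis offset. The function name `lattice_sites`, the choice of $a = 1$, and the representation of basis vectors as fractional coefficients are assumptions made for illustration; this is not the code used in the project. The third wurtzite primitive vector is assumed to lie along $\hat{z}$.

```python
from itertools import product
from math import sqrt


def lattice_sites(primitive, basis, cells):
    """Return one list of Cartesian sites per basis vector.

    primitive: three Cartesian primitive vectors (a1, a2, a3).
    basis:     basis offsets as fractional coefficients of a1, a2, a3.
    cells:     number of iterations along each primitive vector.
    """
    def combine(coeffs):
        # Cartesian position of coeffs[0]*a1 + coeffs[1]*a2 + coeffs[2]*a3.
        return tuple(sum(c * vec[k] for c, vec in zip(coeffs, primitive))
                     for k in range(3))

    sublattices = [[] for _ in basis]
    for m in product(*(range(n) for n in cells)):
        for sites, b in zip(sublattices, basis):
            sites.append(combine(tuple(mi + bi for mi, bi in zip(m, b))))
    return sublattices


a = 1.0  # lattice constant (illustrative units)

# Zincblende: b1 = 0 (cations), b2 = (a1 + a2 + a3) / 4 (anions).
zb_primitive = [(0, a / 2, a / 2), (a / 2, 0, a / 2), (a / 2, a / 2, 0)]
zb_basis = [(0, 0, 0), (0.25, 0.25, 0.25)]

# Wurtzite: c/a = sqrt(8/3), u = 3/8; a3 is assumed to point along z.
c, u = a * sqrt(8 / 3), 3 / 8
wz_primitive = [(a / 2, -sqrt(3) * a / 2, 0), (a / 2, sqrt(3) * a / 2, 0), (0, 0, c)]
wz_basis = [(1 / 3, 2 / 3, 0), (2 / 3, 1 / 3, 0.5),
            (1 / 3, 2 / 3, u), (2 / 3, 1 / 3, 0.5 + u)]

cations, anions = lattice_sites(zb_primitive, zb_basis, (3, 3, 3))
print(len(cations) + len(anions))  # 54
```

For the 3 × 3 × 3 zincblende case this reproduces the 54 sites (27 per sublattice) quoted for the silicon quantum dot; the same function accepts `wz_primitive` and `wz_basis` to produce the four wurtzite sublattices.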
A wurtzite structure generated by the old MATLAB code used at the start of the project is shown in figure 2.

Fig. 2. Wurtzite quantum dot

E. Adjacency matrix generation

For computational modeling of a crystal system [4], we begin by iterating through integral combinations of the primitive vectors, added to the appropriate basis vectors where applicable. The number of iterations is bounded by the variables xcells, ycells, and zcells, which specify the largest multiple allowed for each primitive vector. The sites for the anions and cations are stored in an n × 3 array for fast access. A matrix A over GF(2) of dimension n × m is created to store the connections between the n anions and the m cations (usually, m = n). Each element A_ij is defined as 1 if anion i is a nearest neighbor of cation j and 0 otherwise. This matrix is generated by iterating over every cation site for every anion site.

F. Asymptotically optimal connectivity mapping algorithm

Generating or iterating over a connectivity matrix is an inherently inefficient operation, as we must perform an operation for every cation for every anion. This has O(n^2) complexity, where n is the number of atoms in the system. As n grows large, the calculations quickly become intractable. This looping structure is made even worse by the slow loop execution of most interpreted programming languages like MATLAB and Python. We present a new crystal generation algorithm that avoids these problems. We begin by finding the coordinates of all the atoms in the usual manner, by forming linear combinations of primitive vectors and applying an affine transformation to offset either the anionic or the cationic sites (this analysis was performed with a simple zincblende structure, but could easily be extended to other crystal types of arbitrary shape).
We improved this part slightly by multiplying all the vectors by constants chosen so that every lattice point has integer coordinates, which makes comparisons fast and eliminates round-off difficulties. There is no harm in doing so, because any time a real distance is needed another proportionality factor can be applied. After generating a list of all the sites, we sort the list of cation sites using Timsort, an O(n log n) sort that can take advantage of any ordering already present in the list. Once we have a sorted list of cation sites, we iterate over all the anion sites. For each anion, we apply the translation for each of the four possible bonding directions and perform an efficient O(log n) binary search on the sorted list of cations to determine whether that cation is actually present in the system. If the atom is found, it is recorded in an adjacency list.

Adjacency lists are used because low degree graphs (crystals with this sort of connectivity are essentially isomorphic to low degree non-planar graphs) are more efficiently represented by lists than by matrices. This holds even for sparse matrices, since in most programming languages it is easier to create an iterator for a list than for a sparse matrix. The list stores only connected sites and can thus be iterated over in linear time. The overall asymptotic time complexity for the adjacency list generation is O(n log n). Using this data structure, it will be possible to generate the sparse Hamiltonian (described later) much faster. Using a single processor, connectivity lists (including bonding directions for each site) for a 101306-atom system were generated in just under 40 seconds.
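The sort-then-search procedure described in this section can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the project's code: the function names, the scaling of the zincblende vectors by 4/a (so that the primitive vectors become (0,2,2), (2,0,2), (2,2,0) and the anion offset becomes (1,1,1)), and the numbering of the four bonding directions are all choices made here. Python's built-in sort stands in for the Timsort step and `bisect_left` for the binary search.

```python
from bisect import bisect_left
from itertools import product

# Zincblende vectors scaled by 4/a so that every site has integer coordinates.
A1, A2, A3 = (0, 2, 2), (2, 0, 2), (2, 2, 0)
ANION_OFFSET = (1, 1, 1)  # b2 in scaled units; b1 = 0 for the cations
# Translations from an anion to its four possible cation neighbors.
BOND_DIRECTIONS = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]


def build_sites(xcells, ycells, zcells):
    """Generate integer cation and anion coordinates for a zincblende dot."""
    cations, anions = [], []
    for m1, m2, m3 in product(range(xcells), range(ycells), range(zcells)):
        c = tuple(m1 * A1[k] + m2 * A2[k] + m3 * A3[k] for k in range(3))
        cations.append(c)
        anions.append(tuple(c[k] + ANION_OFFSET[k] for k in range(3)))
    return cations, anions


def adjacency_list(cations, anions):
    """For each anion, list (cation index, direction) pairs of bonded cations.

    Cation indices refer to positions in the returned sorted cation list.
    """
    sorted_cations = sorted(cations)  # O(n log n)
    bonds = []
    for anion in anions:
        found = []
        for direction, d in enumerate(BOND_DIRECTIONS):
            target = tuple(anion[k] + d[k] for k in range(3))
            pos = bisect_left(sorted_cations, target)  # O(log n) lookup
            if pos < len(sorted_cations) and sorted_cations[pos] == target:
                found.append((pos, direction))
        bonds.append(found)
    return sorted_cations, bonds


cations, anions = build_sites(3, 3, 3)
sorted_cations, bonds = adjacency_list(cations, anions)
print(sum(len(b) for b in bonds))  # 81 bonds in the 3 x 3 x 3 dot
```

Each anion costs four binary searches, so the search phase is O(n log n), matching the cost of the sort. Surface anions simply end up with fewer than four entries, which is the information a later passivation step would need.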