High-Performance Tensor Contraction Without Transposition
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Tensor Manipulation in GPL Maxima
Tensor Manipulation in GPL Maxima Viktor Toth http://www.vttoth.com/ February 1, 2008 Abstract GPL Maxima is an open-source computer algebra system based on DOE-MACSYMA. GPL Maxima included two tensor manipulation packages from DOE-MACSYMA, but these were in various states of disrepair. One of the two packages, CTENSOR, implemented component-based tensor manipulation; the other, ITENSOR, treated tensor symbols as opaque, manipulating them based on their index properties. The present paper describes the state in which these packages were found, the steps that were needed to make the packages fully functional again, and the new functionality that was implemented to make them more versatile. A third package, ATENSOR, was also implemented; fully compatible with the identically named package in the commercial version of MACSYMA, ATENSOR implements abstract tensor algebras. 1 Introduction GPL Maxima (GPL stands for the GNU Public License, the most widely used open source license construct) is the descendant of one of the world’s first comprehensive computer algebra systems (CAS), DOE-MACSYMA, developed by the United States Department of Energy in the 1960s and the 1970s. It is currently maintained by 18 volunteer developers, and can be obtained in source or object code form from http://maxima.sourceforge.net/. Like other computer algebra systems, Maxima has tensor manipulation capability. This capability was developed in the late 1970s. Documentation is scarce regarding these packages’ origins, but a select collection of e-mail messages by various authors survives, dating back to 1979-1982, when these packages were actively maintained at M.I.T. When this author first came across GPL Maxima, the tensor packages were effectively non-functional. -
ARM HPC Ecosystem
ARM HPC Ecosystem Darren Cepulis HPC Segment Manager ARM Business Segment Group HPC Forum, Santa Fe, NM 19th April 2017 © ARM 2017 ARM Collaboration for Exascale Programs United States Japan ARM is currently a participant in Fujitsu and RIKEN announced that the two Department of Energy funded Post-K system targeted at Exascale will pre-Exascale projects: Data be based on ARMv8 with new Scalable Movement Dominates and Fast Vector Extensions. Forward 2. European Union China Through FP7 and Horizon 2020, James Lin, vice director for the Center ARM has been involved in several of HPC at Shanghai Jiao Tong University funded pre-Exascale projects claims China will build three pre- including the Mont Blanc program Exascale prototypes to select the which deployed one of the first architecture for their Exascale system. ARM prototype HPC systems. The three prototypes are based on AMD, SunWei TaihuLight, and ARMv8. 2 © ARM 2017 ARM HPC deployments starting in 2H2017 Tw o recent announcements about ARM in HPC in Europe: 3 © ARM 2017 Japan Exascale slides from Fujitsu at ISC’16 4 © ARM 2017 Foundational SW Ecosystem for HPC . Linux OS’s – RedHat, SUSE, CENTOS, UBUNTU,… Commercial . Compilers – ARM, GNU, LLVM,… . Libraries – ARM, OpenBLAS, BLIS, ATLAS, FFTW… . Parallelism – OpenMP, OpenMPI, MVAPICH2,… . Debugging – Allinea, RW Totalview, GDB,… Open-source . Analysis – ARM, Allinea, HPCToolkit, TAU,… . Job schedulers – LSF, PBS Pro, SLURM,… . Cluster mgmt – Bright, CMU, warewulf,… Predictable Baseline 5 © ARM 2017 – now on ARM Functional Supported packages / components Areas OpenHPC defines a baseline. It is a community effort to Base OS RHEL/CentOS 7.1, SLES 12 provide a common, verified set of open source packages for Administrative Conman, Ganglia, Lmod, LosF, ORCM, Nagios, pdsh, HPC deployments Tools prun ARM’s participation: Provisioning Warewulf Resource Mgmt. -
Electromagnetic Fields from Contact- and Symplectic Geometry
ELECTROMAGNETIC FIELDS FROM CONTACT- AND SYMPLECTIC GEOMETRY MATIAS F. DAHL Abstract. We give two constructions that give a solution to the sourceless Maxwell's equations from a contact form on a 3-manifold. In both construc- tions the solutions are standing waves. Here, contact geometry can be seen as a differential geometry where the fundamental quantity (that is, the con- tact form) shows a constantly rotational behaviour due to a non-integrability condition. Using these constructions we obtain two main results. With the 3 first construction we obtain a solution to Maxwell's equations on R with an arbitrary prescribed and time independent energy profile. With the second construction we obtain solutions in a medium with a skewon part for which the energy density is time independent. The latter result is unexpected since usually the skewon part of a medium is associated with dissipative effects, that is, energy losses. Lastly, we describe two ways to construct a solution from symplectic structures on a 4-manifold. One difference between acoustics and electromagnetics is that in acoustics the wave is described by a scalar quantity whereas an electromagnetics, the wave is described by vectorial quantities. In electromagnetics, this vectorial nature gives rise to polar- isation, that is, in anisotropic medium differently polarised waves can have different propagation and scattering properties. To understand propagation in electromag- netism one therefore needs to take into account the role of polarisation. Classically, polarisation is defined as a property of plane waves, but generalising this concept to an arbitrary electromagnetic wave seems to be difficult. To understand propaga- tion in inhomogeneous medium, there are various mathematical approaches: using Gaussian beams (see [Dah06, Kac04, Kac05, Pop02, Ral82]), using Hadamard's method of a propagating discontinuity (see [HO03]) and using microlocal analysis (see [Den92]). -
Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations
Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations Albert Hartono,1 Alexander Sibiryakov,1 Marcel Nooijen,3 Gerald Baumgartner,4 David E. Bernholdt,6 So Hirata,7 Chi-Chung Lam,1 Russell M. Pitzer,2 J. Ramanujam,5 and P. Sadayappan1 1 Dept. of Computer Science and Engineering 2 Dept. of Chemistry The Ohio State University, Columbus, OH, 43210 USA 3 Dept. of Chemistry, University of Waterloo, Waterloo, Ontario N2L BG1, Canada 4 Dept. of Computer Science 5 Dept. of Electrical and Computer Engineering Louisiana State University, Baton Rouge, LA 70803 USA 6 Computer Sci. & Math. Div., Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA 7 Quantum Theory Project, University of Florida, Gainesville, FL 32611 USA Abstract. Complex tensor contraction expressions arise in accurate electronic struc- ture models in quantum chemistry, such as the Coupled Cluster method. Transfor- mations using algebraic properties of commutativity and associativity can be used to significantly decrease the number of arithmetic operations required for evalua- tion of these expressions, but the optimization problem is NP-hard. Operation mini- mization is an important optimization step for the Tensor Contraction Engine, a tool being developed for the automatic transformation of high-level tensor contraction expressions into efficient programs. In this paper, we develop an effective heuristic approach to the operation minimization problem, and demonstrate its effectiveness on tensor contraction expressions for coupled cluster equations. 1 Introduction Currently, manual development of accurate quantum chemistry models is very tedious and takes an expert several months to years to develop and debug. The Tensor Contrac- tion Engine (TCE) [2, 1] is a tool that is being developed to reduce the development time to hours/days, by having the chemist specify the computation in a high-level form, from which an efficient parallel program is automatically synthesized. -
0 BLIS: a Framework for Rapidly Instantiating BLAS Functionality
0 BLIS: A Framework for Rapidly Instantiating BLAS Functionality FIELD G. VAN ZEE and ROBERT A. VAN DE GEIJN, The University of Texas at Austin The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for rapidly in- stantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental innovation is that virtually all computation within level-2 (matrix-vector) and level-3 (matrix-matrix) BLAS operations can be expressed and optimized in terms of very simple kernels. While others have had similar insights, BLIS reduces the necessary kernels to what we believe is the simplest set that still supports the high performance that the computational science community demands. Higher-level framework code is generalized and imple- mented in ISO C99 so that it can be reused and/or re-parameterized for different operations (and different architectures) with little to no modification. Inserting high-performance kernels into the framework facili- tates the immediate optimization of any BLAS-like operations which are cast in terms of these kernels, and thus the framework acts as a productivity multiplier. Users of BLAS-dependent applications are given a choice of using the the traditional Fortran-77 BLAS interface, a generalized C interface, or any other higher level interface that builds upon this latter API. Preliminary performance of level-2 and level-3 operations is observed to be competitive with two mature open source libraries (OpenBLAS and ATLAS) as well as an established commercial product (Intel MKL). Categories and Subject Descriptors: G.4 [Mathematical Software]: Efficiency General Terms: Algorithms, Performance Additional Key Words and Phrases: linear algebra, libraries, high-performance, matrix, BLAS ACM Reference Format: ACM Trans. -
The Riemann Curvature Tensor
The Riemann Curvature Tensor Jennifer Cox May 6, 2019 Project Advisor: Dr. Jonathan Walters Abstract A tensor is a mathematical object that has applications in areas including physics, psychology, and artificial intelligence. The Riemann curvature tensor is a tool used to describe the curvature of n-dimensional spaces such as Riemannian manifolds in the field of differential geometry. The Riemann tensor plays an important role in the theories of general relativity and gravity as well as the curvature of spacetime. This paper will provide an overview of tensors and tensor operations. In particular, properties of the Riemann tensor will be examined. Calculations of the Riemann tensor for several two and three dimensional surfaces such as that of the sphere and torus will be demonstrated. The relationship between the Riemann tensor for the 2-sphere and 3-sphere will be studied, and it will be shown that these tensors satisfy the general equation of the Riemann tensor for an n-dimensional sphere. The connection between the Gaussian curvature and the Riemann curvature tensor will also be shown using Gauss's Theorem Egregium. Keywords: tensor, tensors, Riemann tensor, Riemann curvature tensor, curvature 1 Introduction Coordinate systems are the basis of analytic geometry and are necessary to solve geomet- ric problems using algebraic methods. The introduction of coordinate systems allowed for the blending of algebraic and geometric methods that eventually led to the development of calculus. Reliance on coordinate systems, however, can result in a loss of geometric insight and an unnecessary increase in the complexity of relevant expressions. Tensor calculus is an effective framework that will avoid the cons of relying on coordinate systems. -
Appendix a Relations Between Covariant and Contravariant Bases
Appendix A Relations Between Covariant and Contravariant Bases The contravariant basis vector gk of the curvilinear coordinate of uk at the point P is perpendicular to the covariant bases gi and gj, as shown in Fig. A.1.This contravariant basis gk can be defined as or or a gk g  g ¼  ðA:1Þ i j oui ou j where a is the scalar factor; gk is the contravariant basis of the curvilinear coordinate of uk. Multiplying Eq. (A.1) by the covariant basis gk, the scalar factor a results in k k ðgi  gjÞ: gk ¼ aðg : gkÞ¼ad ¼ a ÂÃk ðA:2Þ ) a ¼ðgi  gjÞ : gk gi; gj; gk The scalar triple product of the covariant bases can be written as pffiffiffi a ¼ ½¼ðg1; g2; g3 g1  g2Þ : g3 ¼ g ¼ J ðA:3Þ where Jacobian J is the determinant of the covariant basis tensor G. The direction of the cross product vector in Eq. (A.1) is opposite if the dummy indices are interchanged with each other in Einstein summation convention. Therefore, the Levi-Civita permutation symbols (pseudo-tensor components) can be used in expression of the contravariant basis. ffiffiffi p k k g g ¼ J g ¼ðgi  gjÞ¼Àðgj  giÞ eijkðgi  gjÞ eijkðgi  gjÞ ðA:4Þ ) gk ¼ pffiffiffi ¼ g J where the Levi-Civita permutation symbols are defined by 8 <> þ1ifði; j; kÞ is an even permutation; eijk ¼ > À1ifði; j; kÞ is an odd permutation; : A:5 0ifi ¼ j; or i ¼ k; or j ¼ k ð Þ 1 , e ¼ ði À jÞÁðj À kÞÁðk À iÞ for i; j; k ¼ 1; 2; 3 ijk 2 H. -
The BLAS API of BLASFEO: Optimizing Performance for Small Matrices
The BLAS API of BLASFEO: optimizing performance for small matrices Gianluca Frison1, Tommaso Sartor1, Andrea Zanelli1, Moritz Diehl1;2 University of Freiburg, 1 Department of Microsystems Engineering (IMTEK), 2 Department of Mathematics email: [email protected] February 5, 2020 This research was supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) via eco4wind (0324125B) and DyConPV (0324166B), and by DFG via Research Unit FOR 2401. Abstract BLASFEO is a dense linear algebra library providing high-performance implementations of BLAS- and LAPACK-like routines for use in embedded optimization and other applications targeting relatively small matrices. BLASFEO defines an API which uses a packed matrix format as its native format. This format is analogous to the internal memory buffers of optimized BLAS, but it is exposed to the user and it removes the packing cost from the routine call. For matrices fitting in cache, BLASFEO outperforms optimized BLAS implementations, both open-source and proprietary. This paper investigates the addition of a standard BLAS API to the BLASFEO framework, and proposes an implementation switching between two or more algorithms optimized for different matrix sizes. Thanks to the modular assembly framework in BLASFEO, tailored linear algebra kernels with mixed column- and panel-major arguments are easily developed. This BLAS API has lower performance than the BLASFEO API, but it nonetheless outperforms optimized BLAS and especially LAPACK libraries for matrices fitting in cache. Therefore, it can boost a wide range of applications, where standard BLAS and LAPACK libraries are employed and the matrix size is moderate. In particular, this paper investigates the benefits in scientific programming languages such as Octave, SciPy and Julia. -
Accelerating the “Motifs” in Machine Learning on Modern Processors
Accelerating the \Motifs" in Machine Learning on Modern Processors Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering Jiyuan Zhang B.S., Electrical Engineering, Harbin Institute of Technology Carnegie Mellon University Pittsburgh, PA December 2020 c Jiyuan Zhang, 2020 All Rights Reserved Acknowledgments First and foremost, I would like to thank my advisor | Franz. Thank you for being a supportive and understanding advisor. You encouraged me to pursue projects I enjoyed and guided me to be the researcher that I am today. When I was stuck on a difficult problem, you are always there to offer help and stand side by side with me to figure out solutions. No matter what the difficulties are, from research to personal life issues, you can always consider in our shoes and provide us your unreserved support. Your encouragement and sense of humor has been a great help to guide me through the difficult times. You are the key enabler for me to accomplish the PhD journey, as well as making my PhD journey less suffering and more joyful. Next, I would like to thank the members of my thesis committee { Dr. Michael Garland, Dr. Phil Gibbons, and Dr. Tze Meng Low. Thank you for taking the time to be my thesis committee. Your questions and suggestions have greatly helped shape this work, and your insights have helped me under- stand the limitations that I would have overlooked in the original draft. I am grateful for the advice and suggestions I have received from you to improve this dissertation. -
Introduchon to Arm for Network Stack Developers
Introducon to Arm for network stack developers Pavel Shamis/Pasha Principal Research Engineer Mvapich User Group 2017 © 2017 Arm Limited Columbus, OH Outline • Arm Overview • HPC SoLware Stack • Porng on Arm • Evaluaon 2 © 2017 Arm Limited Arm Overview © 2017 Arm Limited An introduc1on to Arm Arm is the world's leading semiconductor intellectual property supplier. We license to over 350 partners, are present in 95% of smart phones, 80% of digital cameras, 35% of all electronic devices, and a total of 60 billion Arm cores have been shipped since 1990. Our CPU business model: License technology to partners, who use it to create their own system-on-chip (SoC) products. We may license an instrucBon set architecture (ISA) such as “ARMv8-A”) or a specific implementaon, such as “Cortex-A72”. …and our IP extends beyond the CPU Partners who license an ISA can create their own implementaon, as long as it passes the compliance tests. 4 © 2017 Arm Limited A partnership business model A business model that shares success Business Development • Everyone in the value chain benefits Arm Licenses technology to Partner • Long term sustainability SemiCo Design once and reuse is fundamental IP Partner Licence fee • Spread the cost amongst many partners Provider • Technology reused across mulBple applicaons Partners develop • Creates market for ecosystem to target chips – Re-use is also fundamental to the ecosystem Royalty Upfront license fee OEM • Covers the development cost Customer Ongoing royalBes OEM sells • Typically based on a percentage of chip price -
Gicheru3012018jamcs43211.Pdf
Journal of Advances in Mathematics and Computer Science 30(1): 1-15, 2019; Article no.JAMCS.43211 ISSN: 2456-9968 (Past name: British Journal of Mathematics & Computer Science, Past ISSN: 2231-0851) Decomposition of Riemannian Curvature Tensor Field and Its Properties James Gathungu Gicheru1* and Cyrus Gitonga Ngari2 1Department of Mathematics, Ngiriambu Girls High School, P.O.Box 46-10100, Kianyaga, Kenya. 2Department of Mathematics, Computing and Information Technology, School of Pure and Applied Sciences, University of Embu, P.O.Box 6-60100, Embu, Kenya. Authors’ contributions This work was carried out in collaboration between both authors. Both authors read and approved the final manuscript. Article Information DOI: 10.9734/JAMCS/2019/43211 Editor(s): (1) Dr. Dragos-Patru Covei, Professor, Department of Applied Mathematics, The Bucharest University of Economic Studies, Piata Romana, Romania. (2) Dr. Sheng Zhang, Professor, Department of Mathematics, Bohai University, Jinzhou, China. Reviewers: (1) Bilal Eftal Acet, Adıyaman University, Turkey. (2) Çiğdem Dinçkal, Çankaya University, Turkey. (3) Alexandre Casassola Gonçalves, Universidade de São Paulo, Brazil. Complete Peer review History: http://www.sciencedomain.org/review-history/27899 Received: 07 July 2018 Accepted: 22 November 2018 Original Research Article Published: 21 December 2018 _______________________________________________________________________________ Abstract Decomposition of recurrent curvature tensor fields of R-th order in Finsler manifolds has been studied by B. B. Sinha and G. Singh [1] in the publications del’ institute mathematique, nouvelleserie, tome 33 (47), 1983 pg 217-220. Also Surendra Pratap Singh [2] in Kyungpook Math. J. volume 15, number 2 December, 1975 studied decomposition of recurrent curvature tensor fields in generalised Finsler spaces. -
Tensor Calculus
A Primer on Tensor Calculus David A. Clarke Saint Mary’s University, Halifax NS, Canada [email protected] June, 2011 Copyright c David A. Clarke, 2011 Contents Preface ii 1 Introduction 1 2 Definition of a tensor 3 3 The metric 9 3.1 Physical components and basis vectors ..................... 11 3.2 The scalar and inner products .......................... 14 3.3 Invariance of tensor expressions ......................... 17 3.4 The permutation tensors ............................. 18 4 Tensor derivatives 21 4.1 “Christ-awful symbols” .............................. 21 4.2 Covariant derivative ............................... 25 5 Connexion to vector calculus 30 5.1 Gradient of a scalar ................................ 30 5.2 Divergence of a vector .............................. 30 5.3 Divergence of a tensor .............................. 32 5.4 The Laplacian of a scalar ............................. 33 5.5 Curl of a vector .................................. 34 5.6 The Laplacian of a vector ............................ 35 5.7 Gradient of a vector ............................... 35 5.8 Summary ..................................... 36 5.9 A tensor-vector identity ............................. 37 6 Cartesian, cylindrical, spherical polar coordinates 39 6.1 Cartesian coordinates ............................... 40 6.2 Cylindrical coordinates .............................. 40 6.3 Spherical polar coordinates ............................ 41 7 An application to viscosity 42 i Preface These notes stem from my own need to refresh my memory on the fundamentals