GEMM-Like Tensor-Tensor Contraction


GEMM-like Tensor-Tensor Contraction

Paul Springer, Paolo Bientinesi
AICES, RWTH Aachen University, Aachen, Germany
[email protected], [email protected]

Super Computing '17, November 2017, Denver, CO, USA

1 INTRODUCTION

Tensor contractions (TCs)—a generalization of matrix-matrix multiplications (GEMM) to multiple dimensions—arise in a wide range of scientific computations such as machine learning [1], spectral element methods [10], quantum chemistry calculations [2], multidimensional Fourier transforms [4] and climate simulations [3]. Despite the close connection between TCs and matrix-matrix products, the performance of the former is in general vastly inferior to that of an optimized GEMM. To close this gap, we propose a novel approach: GEMM-like Tensor-Tensor (GETT) contraction [8].

GETT captures the essential features of a high-performance GEMM. In line with a highly optimized GEMM, and in contrast to previous approaches to TCs, our high-performance implementation of the GETT approach is fully vectorized, exploits the CPU's cache hierarchy, avoids explicit preprocessing steps, and operates on arbitrary dimensional subtensors while preserving stride-one memory accesses. Another key design principle is that GETT utilizes tensor transpositions [9] to pack subtensors into the caches, yielding highly efficient packing routines.

A central challenge to high-performance tensor contractions—in contrast to matrix-matrix multiplications—is the increased dimensionality of tensors, which often results in suboptimal memory accesses; as a consequence, the performance of many tensor contractions is limited by the system's memory bandwidth (i.e., bandwidth-bound) rather than by its floating-point units (i.e., compute-bound). Hence, a systematic way to exploit the system's caches is critical to increase the performance of TCs and push them from the bandwidth-bound regime into the compute-bound regime.

Previous approaches to TCs, such as Loop-over-GEMM (LoG) or Transpose-Transpose-GEMM-Transpose (TTGT), rely on highly optimized matrix-matrix multiplications to attain high performance. However, both approaches suffer from inherent disadvantages. Foremost, both LoG and TTGT exhibit suboptimal I/O cost, making them especially less effective in the bandwidth-bound regime. Moreover, TTGT requires additional auxiliary memory, increasing the memory footprint of the tensor contraction by up to 2×, while LoG is not applicable to all tensor contractions without relying on a strided (suboptimal) GEMM implementation.

Simultaneously to our research, Matthews [6] recently introduced another "native" tensor contraction algorithm (TBLIS) that also adopts a GEMM-like design, attaining the same I/O cost as an equivalently sized matrix-matrix product. TBLIS—in contrast to GETT—does not offer multi-dimensional packing routines, which can result in strided memory accesses. However, TBLIS is already available as a C++ library that avoids the code-generation process currently required by GETT.

2 GEMM-LIKE TENSOR CONTRACTION

Let A, B, and C be tensors (i.e., multi-dimensional arrays). A tensor contraction is

    C_{I_C} ← α · A_{I_A} · B_{I_B} + β · C_{I_C},    (1)

where I_A, I_B, and I_C denote symbolic index sets. We further define three important index sets: I_m := I_A ∩ I_C, I_n := I_B ∩ I_C, and I_k := I_A ∩ I_B, respectively denoting the free indices of A, the free indices of B, and the contracted indices. Moreover, we adopt the Einstein notation, indicating that all indices of I_k are implicitly summed (i.e., contracted).

Adhering to this definition, we can reformulate Equation (1) such that its similarity to a matrix-matrix multiplication becomes apparent:

    C_{Π^C(I_m ∪ I_n)} ← α · A_{Π^A(I_m ∪ I_k)} · B_{Π^B(I_n ∪ I_k)} + β · C_{Π^C(I_m ∪ I_n)},    (2)

where Π^A, Π^B, and Π^C denote arbitrary permutations of the indices respectively belonging to A, B, and C. The only difference between tensor contractions and matrix-matrix multiplication is that the former allows |I_m|, |I_n|, |I_k| ≥ 1, while the latter requires that |I_m| = |I_n| = |I_k| = 1.

    // N-loop
    for n = 1 : n_C : S_{I_n}                           (1)
        // K-loop (contracted)
        for k = 1 : k_C : S_{I_k}                       (2)
            B̂ = identify_subtensor(B, n, k)
            // pack B̂ into B̃
            B̃ = packB(B̂)                                (3)
            // M-loop
            for m = 1 : m_C : S_{I_m}                   (4)
                Â = identify_subtensor(A, m, k)
                // pack Â into Ã
                Ã = packA(Â)                            (5)
                Ĉ = identify_subtensor(C, m, n)
                // matrix-matrix product: Ĉ ← α Ã × B̃ + β Ĉ
                macroKernel(Ã, B̃, Ĉ, α, β)              (6)

Listing 1: GETT design.

Blocking & Packing. Listing 1 outlines the GEMM-like design of GETT, which is inspired by BLIS [11]. GETT begins by blocking along the indices of I_n (1) and I_k (2); at this stage, the size of the subtensor B̂ of B is constrained, and B̂ can be packed into a local buffer B̃ via tensor transpositions (3). The next steps involve blocking along the indices of I_m (4) and packing the subtensor Â of A into a local buffer Ã (5). At this point the data is suitably prepared for the macro-kernel—a specialized form of an in-cache matrix-matrix product (6). This design resembles that of a high-performance GEMM, while the additional complexity due to the higher dimensionality of the tensors has been moved into the packing routines.
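To make the index-set notation of Equations (1) and (2) concrete, the following minimal sketch (our own illustration; the shapes and names are not from the paper) evaluates the contraction C[m1, m2, n] = Σ_k A[m1, k, m2] · B[n, k], for which I_m = {m1, m2}, I_n = {n}, and I_k = {k}, so |I_m| = 2 already exceeds what a plain GEMM allows:

```python
def contract(A, B, alpha=1.0, beta=0.0, C=None):
    """C[m1][m2][n] <- alpha * sum_k A[m1][k][m2] * B[n][k] + beta * C[m1][m2][n].
    Plain nested lists are used to keep the example dependency-free."""
    M1, K, M2 = len(A), len(A[0]), len(A[0][0])
    N = len(B)
    if C is None:
        C = [[[0.0] * N for _ in range(M2)] for _ in range(M1)]
    for m1 in range(M1):          # free index of A (in I_m)
        for m2 in range(M2):      # free index of A (in I_m)
            for n in range(N):    # free index of B (in I_n)
                acc = 0.0
                for k in range(K):  # Einstein summation over I_k
                    acc += A[m1][k][m2] * B[n][k]
                C[m1][m2][n] = alpha * acc + beta * C[m1][m2][n]
    return C
```

Note that the free indices of A need not be adjacent in memory (here k separates m1 and m2), which is exactly the layout freedom expressed by the permutations Π in Equation (2).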
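The blocked loop structure of Listing 1 can be sketched as follows. This is a schematic re-implementation under simplifying assumptions: the operands are plain matrices (so "packing" reduces to copying a contiguous tile), the block sizes and helper names are our own, and the macro-kernel is written as naive loops rather than GETT's vectorized in-cache kernel:

```python
def gett_like_gemm(A, B, C, alpha, beta, mC=2, nC=2, kC=2):
    """Schematic GETT/BLIS-style blocked multiply C <- alpha*A@B + beta*C,
    with A of size M x K, B of size K x N, C of size M x N (lists of lists)."""
    M, K, N = len(A), len(A[0]), len(B[0])
    for n0 in range(0, N, nC):            # (1) N-loop
        for k0 in range(0, K, kC):        # (2) K-loop (contracted)
            # (3) pack the B-block into a local buffer (L3-resident in GETT)
            Bt = [[B[k][n] for n in range(n0, min(n0 + nC, N))]
                  for k in range(k0, min(k0 + kC, K))]
            first_k = (k0 == 0)
            for m0 in range(0, M, mC):    # (4) M-loop
                # (5) pack the A-block into a local buffer (L2-resident in GETT)
                At = [[A[m][k] for k in range(k0, min(k0 + kC, K))]
                      for m in range(m0, min(m0 + mC, M))]
                # (6) macro-kernel: multiply the packed blocks in cache
                for mi, m in enumerate(range(m0, min(m0 + mC, M))):
                    for ni, n in enumerate(range(n0, min(n0 + nC, N))):
                        acc = sum(At[mi][ki] * Bt[ki][ni]
                                  for ki in range(len(At[0])))
                        # beta is applied only on the first K-block;
                        # later K-blocks accumulate into C
                        C[m][n] = alpha * acc + (beta if first_k else 1.0) * C[m][n]
    return C
```

In GETT proper, steps (3) and (5) are multi-dimensional tensor transpositions rather than tile copies; that is where the extra dimensionality of a tensor contraction is absorbed.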
Figure 1: Macro-kernel & micro-kernel. (Diagram showing the packed blocks Ã and B̃ moving through main memory, the L3-, L2-, and L1-caches, and registers; block sizes m_C, n_C, k_C and register-tile sizes m_R, n_R.)

Macro-Kernel. The sizes of the packed subtensors are chosen such that Ã and B̃ respectively reside in the L2- and L3-caches during the invocation of the macro-kernel (see Figure 1); the macro-kernel exposes two additional loops around a micro-kernel that computes a fat outer product of two appropriate slivers of Ã and B̃ that fit into the L1-cache; the sizes m_R and n_R are chosen such that the corresponding tile of C̃ is kept in registers.

Parallelism. GETT exposes a total of five loops around the micro-kernel that can be parallelized in a nested fashion [7]; a performance heuristic selects an appropriate parallelization strategy that maximizes load-balancing while staying within the cache limitations. Special care has to be taken when the loop associated with I_k (i.e., the 4th loop around the micro-kernel) ought to be parallelized, due to the simultaneous updates to the same portion of C. Thus, the 4th loop is only parallelized if the available parallelism along the I_m and I_n dimensions is too small (i.e., C is very small); in that case GETT manages multiple copies of C and reduces these copies after the macro-kernel has completed.

Figure 2: Performance for TCs with varying arithmetic intensity (sorted w.r.t. GETT). Host: 2× Intel Xeon E5-2680 v3 using 24 threads. (Plot of GFLOPs/s per core over roughly 1000 test cases for TBLIS, Eigen, GETT without parallel-k, and GETT with parallel-k.)

… frameworks. These speedups highlight GETT's excellent performance. Moreover, comparing the orange triangles (with parallel-k) with the red triangles (without parallel-k) indicates that some tensor contractions benefit from GETT's ability to parallelize the loop associated with I_k.

3 CONCLUSION

We presented GETT, a novel GEMM-like approach for tensor contractions. GETT—similarly to a high-performance GEMM—is fully vectorized, blocks for all levels of the cache hierarchy as well as for registers, and parallelizes all five loops around its micro-kernel individually to yield good load-balancing. Moreover, it utilizes highly optimized tensor transpositions for its packing routines to attain high bandwidth. These desirable properties reflect positively on GETT's performance.

REFERENCES

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. https://www.tensorflow.org
[2] Rodney J. Bartlett and Monika Musiał. 2007. Coupled-cluster theory in quantum chemistry. Reviews of Modern Physics 79, 1 (2007), 291–352.
[3] John Drake, Ian Foster, John Michalakes, Brian Toonen, and Patrick Worley. 1995. Design and performance of a scalable parallel community climate model. Parallel Comput. 21, 10 (1995), 1571–1591.
[4] Matteo Frigo and Steven G. Johnson. 2005. The Design and Implementation of FFTW3. Proc. IEEE 93, 2 (Feb 2005), 216–231.
[5] Gaël Guennebaud, Benoît Jacob, et al. 2010. Eigen v3.3.4. http://eigen.tuxfamily.org. (2010).
[6] Devin A. Matthews. 2016. High-Performance Tensor Contraction without BLAS. CoRR abs/1607.00291 (2016). http://arxiv.org/abs/1607.00291
[7] Tyler M. Smith, Robert A. van de Geijn, Mikhail Smelyanskiy, Jeff R. Hammond, and Field G. Van Zee. 2014. Anatomy of High-Performance Many-Threaded Matrix Multiplication. In 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2014).
[8] Paul Springer and Paolo Bientinesi. 2016. Design of a High-Performance GEMM-like Tensor-Tensor Multiplication. Technical Report. arXiv:1607.00145. Under review for ACM Transactions on Mathematical Software.
[9] Paul Springer, Tong Su, and Paolo Bientinesi. 2017. HPTT: A High-Performance Tensor Transposition C++ Library.
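The multiple-copies-of-C strategy described in the Parallelism paragraph can be illustrated in miniature. The sketch below is our own simplification (matrix operands, invented names, and a sequential loop standing in for parallel workers): each worker accumulates its share of the contracted range into a private copy of C, and the copies are reduced afterwards, avoiding simultaneous updates to the same portion of C:

```python
def parallel_k_contraction(A, B, num_workers=2):
    """C[m][n] = sum_k A[m][k] * B[k][n], with the k-range split across
    workers. Each worker writes into a private copy of C; the copies are
    reduced at the end, as GETT does after the macro-kernel completes."""
    M, K, N = len(A), len(A[0]), len(B[0])
    chunk = (K + num_workers - 1) // num_workers
    copies = [[[0.0] * N for _ in range(M)] for _ in range(num_workers)]
    for w in range(num_workers):  # stands in for concurrent workers
        for k in range(w * chunk, min((w + 1) * chunk, K)):
            for m in range(M):
                for n in range(N):
                    copies[w][m][n] += A[m][k] * B[k][n]
    # reduction of the private copies into the final C
    return [[sum(copies[w][m][n] for w in range(num_workers))
             for n in range(N)] for m in range(M)]
```

The trade-off matches the paper's heuristic: the extra copies cost memory and a reduction pass, so splitting the contracted loop only pays off when C is too small to provide parallelism along I_m and I_n.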