Design of a High-Performance GEMM-Like Tensor-Tensor Multiplication

Paul Springer and Paolo Bientinesi
Aachen Institute for Advanced Study in Computational Engineering Science
Austin, Sep. 20th 2016

Paul Springer (AICES), Tensor Contraction Code Generator, Sep. 20th 2016

Outline

  1 Introduction
  2 GEMM-like Tensor-Tensor Multiplication
  3 Tensor Contraction Code Generator
  4 Performance
  5 Conclusion and Future Work

Introduction

  • Tensors can be thought of as higher-dimensional matrices
  • Tensor contractions can be thought of as higher-dimensional GEMMs
  • Essentially three approaches:
      • Nested loops
      • Transpose-Transpose-GEMM-Transpose (TTGT)
      • Loops over GEMM (LoG)
  • We propose a novel approach: GETT [1]
      • Akin to a high-performance GEMM implementation
      • Adopts the BLIS methodology: breaking through the BLAS layer
  • Tensor Contraction Code Generator (TCCG)
      • Combines GETT, TTGT and LoG into a unified tool

[1] Paul Springer and Paolo Bientinesi. "Design of a high-performance GEMM-like Tensor-Tensor Multiplication". In: TOMS, in review.

Matrix-Matrix Multiplication

Let $A \in \mathbb{R}^{M \times K}$, $B \in \mathbb{R}^{K \times N}$ and $C \in \mathbb{R}^{M \times N}$ be 2D tensors:

  $C_{m,n} \leftarrow \sum_{k} A_{m,k} B_{k,n}$

In Einstein notation the sum over the repeated index $k$ is implicit:

  $C_{m,n} \leftarrow A_{m,k} B_{k,n}$

Naive GEMM:

  // N-loop
  for j = 0 : N - 1
    // M-loop
    for i = 0 : M - 1
      tmp = 0
      // K-loop (contracted)
      for k = 0 : K - 1
        tmp += A_{i,k} B_{k,j}
      // update C
      C_{i,j} = α tmp + β C_{i,j}

High-performance GEMM (A_hat, B_hat, C_hat denote sub-matrices; A_tilde, B_tilde denote their packed copies):

  // N-loop
  for n = 0 : nc : N - 1
    // K-loop (contracted)
    for k = 0 : kc : K - 1
      B_hat = identify_submatrix(B, n, k)
      // pack B_hat into B_tilde
      B_tilde = packB(B_hat)      // B_tilde ∈ R^{kc × nc}
      // M-loop
      for m = 0 : mc : M - 1
        A_hat = identify_submatrix(A, m, k)
        // pack A_hat into A_tilde
        A_tilde = packA(A_hat)    // A_tilde ∈ R^{mc × kc}
        C_hat = identify_submatrix(C, m, n)
        // matrix-matrix product: A_tilde B_tilde
        macroKernel(A_tilde, B_tilde, C_hat, α, β)

Tensor Contractions

Tensor contraction examples:

  $C_{m,n} \leftarrow A_{m,k} B_{k,n}$
  $C_{m_1,m_2,n} \leftarrow A_{m_1,m_2,k} B_{k,n}$
  $C_{m_1,n,m_2} \leftarrow A_{m_1,m_2,k} B_{k,n}$
  $C_{m_1,n_1,n_2,m_2} \leftarrow A_{m_1,m_2,k} B_{n_2,k,n_1}$
  $C_{m_1,n_1,n_2,m_2} \leftarrow A_{m_1,k_1,m_2,k_2} B_{k_2,n_2,k_1,n_1}$
  $C_{m_1,n_1,n_2,m_2,n_3} \leftarrow A_{m_1,k_1,m_2,k_2} B_{n_3,k_2,n_2,k_1,n_1}$
  ...

⇒ Quite similar to GEMM.

GETT

Tensor-Tensor Multiplication (Einstein notation)

Let the input tensors $\mathcal{A} \in \mathbb{R}^{S^A_1 \times S^A_2 \times \dots \times S^A_{r_A}}$ and $\mathcal{B} \in \mathbb{R}^{S^B_1 \times S^B_2 \times \dots \times S^B_{r_B}}$ update the output tensor $\mathcal{C} \in \mathbb{R}^{S^C_1 \times S^C_2 \times \dots \times S^C_{r_C}}$:

  $\mathcal{C}_{\Pi^C(I_m \cup I_n)} \leftarrow \alpha \, \mathcal{A}_{\Pi^A(I_m \cup I_k)} \, \mathcal{B}_{\Pi^B(I_n \cup I_k)} + \beta \, \mathcal{C}_{\Pi^C(I_m \cup I_n)}$

The index sets $I_m$, $I_n$ and $I_k$ are critical:

  • $I_m := \{m_1, m_2, \dots, m_\gamma\}$: free indices of $\mathcal{A}$
  • $I_n := \{n_1, n_2, \dots, n_\zeta\}$: free indices of $\mathcal{B}$
  • $I_k := \{k_1, k_2, \dots, k_\xi\}$: contracted indices

GETT

  // N-loops
  for n_1 = 1 : S_{n_1}
    // ... remaining N-loops omitted ...
    for n_ζ = 1 : S_{n_ζ}
      // M-loops
      for m_1 = 1 : S_{m_1}
        // ... remaining M-loops omitted ...
        for m_γ = 1 : S_{m_γ}
          tmp = 0
          // K-loops (contracted)
          for k_1 = 1 : S_{k_1}
            // ..
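
The slides name nested loops and TTGT as two of the existing approaches without showing them side by side. The following is a minimal sketch, not the authors' code, for the example contraction $C_{m_1,n_1,n_2,m_2} \leftarrow A_{m_1,m_2,k} B_{n_2,k,n_1}$. It assumes NumPy, row-major arrays, and $\alpha = 1$, $\beta = 0$. The function names contract_nested_loops and contract_ttgt are hypothetical and chosen only for illustration.

```python
import numpy as np


def contract_nested_loops(A, B):
    """Reference: C[m1,n1,n2,m2] = sum_k A[m1,m2,k] * B[n2,k,n1] with explicit loops."""
    M1, M2, K = A.shape
    N2, Kb, N1 = B.shape
    assert K == Kb, "contracted extents must match"
    C = np.zeros((M1, N1, N2, M2))
    for n1 in range(N1):                  # N-loops (free indices of B)
        for n2 in range(N2):
            for m1 in range(M1):          # M-loops (free indices of A)
                for m2 in range(M2):
                    tmp = 0.0
                    for k in range(K):    # K-loop (contracted)
                        tmp += A[m1, m2, k] * B[n2, k, n1]
                    C[m1, n1, n2, m2] = tmp
    return C


def contract_ttgt(A, B):
    """Same contraction via Transpose-Transpose-GEMM-Transpose (TTGT)."""
    M1, M2, K = A.shape
    N2, Kb, N1 = B.shape
    assert K == Kb, "contracted extents must match"
    # A is already ordered (m1, m2, k): flatten the free indices into rows.
    A_mat = A.reshape(M1 * M2, K)
    # Transpose B from (n2, k, n1) to (k, n1, n2), then flatten to K x (N1*N2).
    B_mat = B.transpose(1, 2, 0).reshape(K, N1 * N2)
    # A single matrix-matrix multiplication does all the arithmetic.
    C_mat = A_mat @ B_mat
    # Unfold to (m1, m2, n1, n2) and transpose to the requested (m1, n1, n2, m2).
    return C_mat.reshape(M1, M2, N1, N2).transpose(0, 2, 3, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4, 5))    # extents (M1, M2, K), chosen arbitrarily
    B = rng.standard_normal((2, 5, 6))    # extents (N2, K, N1), chosen arbitrarily
    print(np.allclose(contract_nested_loops(A, B), contract_ttgt(A, B)))
```

In this sketch the TTGT variant makes explicit copies for the transposed B and for the final transpose of C. That transposition overhead is the cost TTGT pays in exchange for reusing a tuned GEMM.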
