
Design of a High-Performance GEMM-like Tensor-Tensor Multiplication Paul Springer and Paolo Bientinesi Aachen Institute for Advanced Study in Computational Engineering Science Austin, Sep. 20th 2016 Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 1 / 19 Outline 1 Introduction 2 GEMM-like Tensor-Tensor Multiplication 3 Tensor Contraction Code Generator 4 Performance 5 Conclusion and Future Work Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 2 / 19 Nested loops Transpose-Transpose-GEMM-Transpose (TTGT) Loops over GEMM (LoG) Akin to a high-performance GEMM implementation Adopts the BLIS methodology: Breaking through the BLAS layer combine GETT, TTGT and LoG into a unified tool Essentially three approaches: We propose a novel approach: GETT1 Tensor Contraction Code Generator (TCCG) Introduction Tensors can be thought of as higher dimensional matrices Tensor contraction can be thought of as higher dimensional GEMMs 1Paul Springer and Paolo Bientinesi. \Design of a high-performance GEMM-like Tensor-Tensor Multiplication". In: TOMS, in review (). Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 3 / 19 Akin to a high-performance GEMM implementation Adopts the BLIS methodology: Breaking through the BLAS layer combine GETT, TTGT and LoG into a unified tool We propose a novel approach: GETT1 Tensor Contraction Code Generator (TCCG) Introduction Tensors can be thought of as higher dimensional matrices Tensor contraction can be thought of as higher dimensional GEMMs Essentially three approaches: Nested loops Transpose-Transpose-GEMM-Transpose (TTGT) Loops over GEMM (LoG) 1Paul Springer and Paolo Bientinesi. \Design of a high-performance GEMM-like Tensor-Tensor Multiplication". In: TOMS, in review (). Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 3 / 19 combine GETT, TTGT and LoG into a unified tool Tensor Contraction Code Generator (TCCG) Introduction Tensors can be thought of as higher dimensional matrices Tensor contraction can be thought of as higher dimensional GEMMs Essentially three approaches: Nested loops Transpose-Transpose-GEMM-Transpose (TTGT) Loops over GEMM (LoG) We propose a novel approach: GETT1 Akin to a high-performance GEMM implementation Adopts the BLIS methodology: Breaking through the BLAS layer 1Paul Springer and Paolo Bientinesi. \Design of a high-performance GEMM-like Tensor-Tensor Multiplication". In: TOMS, in review (). Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 3 / 19 Introduction Tensors can be thought of as higher dimensional matrices Tensor contraction can be thought of as higher dimensional GEMMs Essentially three approaches: Nested loops Transpose-Transpose-GEMM-Transpose (TTGT) Loops over GEMM (LoG) We propose a novel approach: GETT1 Akin to a high-performance GEMM implementation Adopts the BLIS methodology: Breaking through the BLAS layer Tensor Contraction Code Generator (TCCG) combine GETT, TTGT and LoG into a unified tool 1Paul Springer and Paolo Bientinesi. \Design of a high-performance GEMM-like Tensor-Tensor Multiplication". In: TOMS, in review (). Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 3 / 19 Matrix-Matrix Multiplication Matrix-Matrix Multiplication M×K K×N M×N A 2 R , B 2 R and C 2 R be 2D tensors: P Cm;n k Am;k Bk;n Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 4 / 19 Matrix-Matrix Multiplication Matrix-Matrix Multiplication (Einstein notation) M×K K×N M×N A 2 R , B 2 R and C 2 R be 2D tensors: Cm;n Am;k Bk;n Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 4 / 19 Matrix-Matrix Multiplication Matrix-Matrix Multiplication (Einstein notation) M×K K×N M×N A 2 R , B 2 R and C 2 R be 2D tensors: Cm;n Am;k Bk;n //N-Loop for j = 0 : N − 1 //M-Loop for i = 0 : M − 1 tmp = 0 //K-Loop(contracted) for k = 0 : K − 1 tmp += Ai;k Bk;j // updateC Ci;j = α tmp + βCi;j Naive GEMM. Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 4 / 19 Matrix-Matrix Multiplication Matrix-Matrix Multiplication (Einstein notation) M×K K×N M×N A 2 R , B 2 R and C 2 R be 2D tensors: Cm;n Am;k Bk;n //N-Loop for n = 0 : nc : N − 1 //K-Loop(contracted) for k = 0 : kc : K − 1 Bb = identify_submatrix(B , n, k ) // pack Bb into Be kc×nc Be = packB (Bb)// Be 2 R //M-Loop //N-Loop for m = 0 : mc : M − 1 for j = 0 : N − 1 //M-Loop Ab = identify_submatrix(A, m, k ) for i = 0 : M − 1 // pack Ab into Ae tmp = 0 mc×kc //K-Loop(contracted) Ae = packA (Ab)// Ae 2 R for k = 0 : K − 1 Cb = identify_submatrix(C , m, n) tmp += Ai;k Bk;j // updateC // matrix-matrix product: AeBe Ci;j = α tmp + βCi;j macroKernel(Ae; Be; Cb; α, β) Naive GEMM. High-performance GEMM. Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 4 / 19 Cm1;m2;n Am1;m2;k Bk;n Cm1;n;m2 Am1;m2;k Bk;n Cm1;n1;n2;m2 Am1;m2;k Bn2;k;n1 Cm1;n1;n2;m2 Am1;k1;m2;k2 Bk2;n2;k1;n1 Cm1;n1;n2;m2;n3 Am1;k1;m2;k2 Bn3;k2;n2;k1;n1 ... Tensor Contractions Tensor contraction examples: Cm;n Am;k Bk;n Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19 Cm1;n;m2 Am1;m2;k Bk;n Cm1;n1;n2;m2 Am1;m2;k Bn2;k;n1 Cm1;n1;n2;m2 Am1;k1;m2;k2 Bk2;n2;k1;n1 Cm1;n1;n2;m2;n3 Am1;k1;m2;k2 Bn3;k2;n2;k1;n1 ... Tensor Contractions Tensor contraction examples: Cm;n Am;k Bk;n Cm1;m2;n Am1;m2;k Bk;n Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19 Cm1;n1;n2;m2 Am1;m2;k Bn2;k;n1 Cm1;n1;n2;m2 Am1;k1;m2;k2 Bk2;n2;k1;n1 Cm1;n1;n2;m2;n3 Am1;k1;m2;k2 Bn3;k2;n2;k1;n1 ... Tensor Contractions Tensor contraction examples: Cm;n Am;k Bk;n Cm1;m2;n Am1;m2;k Bk;n Cm1;n;m2 Am1;m2;k Bk;n Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19 Cm1;n1;n2;m2 Am1;k1;m2;k2 Bk2;n2;k1;n1 Cm1;n1;n2;m2;n3 Am1;k1;m2;k2 Bn3;k2;n2;k1;n1 ... Tensor Contractions Tensor contraction examples: Cm;n Am;k Bk;n Cm1;m2;n Am1;m2;k Bk;n Cm1;n;m2 Am1;m2;k Bk;n Cm1;n1;n2;m2 Am1;m2;k Bn2;k;n1 Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19 Cm1;n1;n2;m2;n3 Am1;k1;m2;k2 Bn3;k2;n2;k1;n1 ... Tensor Contractions Tensor contraction examples: Cm;n Am;k Bk;n Cm1;m2;n Am1;m2;k Bk;n Cm1;n;m2 Am1;m2;k Bk;n Cm1;n1;n2;m2 Am1;m2;k Bn2;k;n1 Cm1;n1;n2;m2 Am1;k1;m2;k2 Bk2;n2;k1;n1 Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19 Tensor Contractions Tensor contraction examples: Cm;n Am;k Bk;n Cm1;m2;n Am1;m2;k Bk;n Cm1;n;m2 Am1;m2;k Bk;n Cm1;n1;n2;m2 Am1;m2;k Bn2;k;n1 Cm1;n1;n2;m2 Am1;k1;m2;k2 Bk2;n2;k1;n1 Cm1;n1;n2;m2;n3 Am1;k1;m2;k2 Bn3;k2;n2;k1;n1 ... Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19 Tensor Contractions Tensor contraction examples: Cm;n Am;k Bk;n Cm1;m2;n Am1;m2;k Bk;n Cm1;n;m2 Am1;m2;k Bk;n Cm1;n1;n2;m2 Am1;m2;k Bn2;k;n1 Cm1;n1;n2;m2 Am1;k1;m2;k2 Bk2;n2;k1;n1 Cm1;n1;n2;m2;n3 Am1;k1;m2;k2 Bn3;k2;n2;k1;n1 ... ) Quite similar to GEMM. Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 5 / 19 Im := fm1; m2; :::; mγ g: free indices of A In := fn1; n2; :::; nζ g: free indices of B Ik := fk1; k2; :::; kξg: contracted indices These index sets Im; In and Ik are critical GETT Tensor-Tensor Multiplication (Einstein notation) SA×SA×:::SA SB×SB×:::SB Let the input tensors A 2 R 1 2 rA , and B 2 R 1 2 rB SC×SC×:::SC update the output tensor C 2 R 1 2 rC : C C αA A B B + βC C : Π (Im[In) Π (Im[Ik ) Π (In[Ik ) Π (Im[In) Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 6 / 19 GETT Tensor-Tensor Multiplication (Einstein notation) SA×SA×:::SA SB×SB×:::SB Let the input tensors A 2 R 1 2 rA , and B 2 R 1 2 rB SC×SC×:::SC update the output tensor C 2 R 1 2 rC : C C αA A B B + βC C : Π (Im[In) Π (Im[Ik ) Π (In[Ik ) Π (Im[In) These index sets Im; In and Ik are critical Im := fm1; m2; :::; mγ g: free indices of A In := fn1; n2; :::; nζ g: free indices of B Ik := fk1; k2; :::; kξg: contracted indices Paul Springer (AICES) Tensor Contraction Code Generator Sep. 20th 2016 6 / 19 GETT 1 //N-Loops 2 for n1 = 1 : Sn1 3 //... remainingN-loops omitted... 4 for nζ = 1 : Snζ 5 //M-Loops 6 for m1 = 1 : Sm1 7 //... remainingM-loops omitted... 8 for mγ = 1 : Smγ 9 tmp = 0 10 //K-Loops(contracted) 11 for k1 = 1 : Sk1 12 //..
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages51 Page
-
File Size-