What Is a Tensor? (Part II)
Lek-Heng Lim, University of Chicago

goals of these tutorials
• a very elementary introduction to tensors
• through the lens of linear algebra and numerical linear algebra
• highlight their roles in computations
• trace the chronological development through three definitions:
  ① a multi-indexed object that satisfies the tensor transformation rules
  ② a multilinear map
  ③ an element of a tensor product of vector spaces
• last week: ①, the trickiest one to discuss
• today: ② and ③, the much easier modern definitions

recap from part I
• take-home idea: a tensor is defined by its transformation rule
• example: is
$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix}$$
a tensor? the question makes no sense: the definition requires a context
• if it comes from a measurement of stress,
$$\begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{bmatrix},$$
then it is a contravariant 2-tensor
• if we are interested in its eigenvalues and eigenvectors,
$$(XAX^{-1})Xv = \lambda Xv,$$
then it is a mixed 2-tensor
• if we are interested in its Hadamard product,
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \circ \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} & a_{12}b_{12} & a_{13}b_{13} \\ a_{21}b_{21} & a_{22}b_{22} & a_{23}b_{23} \\ a_{31}b_{31} & a_{32}b_{32} & a_{33}b_{33} \end{bmatrix},$$
then it is not a tensor
• indices tell us nothing: $A \in \mathbb{R}^{m\times n}$ has two indices, but if the transformation rule is
$$A' = XA = [Xa_1, \dots, Xa_n] \quad\text{or}\quad A' = X^{-T}A = [X^{-T}a_1, \dots, X^{-T}a_n],$$
then it is a covariant or contravariant 1-tensor respectively
• e.g., the Householder QR algorithm and equivariant neural networks: all covariant 1-tensors, no matter how many indices
• we want a coordinate-free definition that does not depend on indices

tensors via multilinear maps
• let $V_1, \dots, V_d$ and $W$ be vector spaces
• a multilinear map, or $d$-linear map, is a map $\Phi \colon V_1 \times \cdots \times V_d \to W$ that is linear in each argument separately
• explicitly,
$$\Phi(v_1, \dots, \lambda v_k + \lambda' v_k', \dots, v_d) = \lambda \Phi(v_1, \dots, v_k, \dots, v_d) + \lambda' \Phi(v_1, \dots, v_k', \dots, v_d)$$
for all $v_1 \in V_1, \dots, v_k, v_k' \in V_k, \dots, v_d \in V_d$ and $\lambda, \lambda' \in \mathbb{R}$
• write $\mathsf{M}^d(V_1, \dots, V_d; W)$ for the set of all such maps
• write $\mathsf{M}^1(V; W) = \mathsf{L}(V; W)$ for linear maps

why important
• linearity principle: almost any natural process is linear in small amounts almost everywhere
• multilinearity principle: if we keep all but one factor constant, the varying factor obeys the principle of linearity
• Hooke's and Ohm's laws are both linear, but not if we pass a current through the spring or stretch the resistor

d = 1
• let $\mathcal{B} = \{v_1, \dots, v_m\}$ be a basis of $V$ and $\mathcal{B}^* = \{v_1^*, \dots, v_m^*\}$ the dual basis
• any $v \in V$ is uniquely represented by $a \in \mathbb{R}^m$:
$$V \ni v = a_1 v_1 + \cdots + a_m v_m \;\longmapsto\; [v]_{\mathcal{B}} = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix} \in \mathbb{R}^m$$
• any $\varphi \in V^*$ is uniquely represented by $b \in \mathbb{R}^m$:
$$V^* \ni \varphi = b_1 v_1^* + \cdots + b_m v_m^* \;\longmapsto\; [\varphi]_{\mathcal{B}^*} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} \in \mathbb{R}^m$$
• if $\mathcal{C} = \{v_1', \dots, v_m'\}$ is another basis and $X \in \mathbb{R}^{m\times m}$ is defined by
$$v_j' = \sum_{i=1}^m x_{ij} v_i,$$
then the change-of-basis theorem gives
$$[v]_{\mathcal{C}} = X^{-1}[v]_{\mathcal{B}}, \qquad [\varphi]_{\mathcal{C}^*} = X^{T}[\varphi]_{\mathcal{B}^*}$$
• this recovers the transformation rules for contravariant and covariant 1-tensors:
$$a' = X^{-1}a, \qquad b' = X^{T}b$$
• vectors → contravariant 1-tensors; linear functionals → covariant 1-tensors

d = 2
• bases $\mathcal{A} = \{u_1, \dots, u_n\}$ on $U$ and $\mathcal{B} = \{v_1, \dots, v_m\}$ on $V$
• a linear operator $\Phi \colon U \to V$ has matrix representation $[\Phi]_{\mathcal{A},\mathcal{B}} = A \in \mathbb{R}^{m\times n}$, where
$$\Phi(u_j) = \sum_{i=1}^m a_{ij} v_i$$
• in new bases $\mathcal{A}'$ and $\mathcal{B}'$, write $[\Phi]_{\mathcal{A}',\mathcal{B}'} = A' \in \mathbb{R}^{m\times n}$
• change-of-basis theorem: if $X \in \mathrm{GL}(m)$ is the change-of-basis matrix on $V$ and $Y \in \mathrm{GL}(n)$ is the change-of-basis matrix on $U$, then
$$A' = X^{-1}AY$$
• special case $U = V$ with $\mathcal{A} = \mathcal{B}$ and $\mathcal{A}' = \mathcal{B}'$:
$$A' = X^{-1}AX,$$
which recovers the transformation rule for mixed 2-tensors
• a bilinear functional is a map $\beta \colon U \times V \to \mathbb{R}$ that is linear in each argument
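The change-of-basis theorem for linear operators is easy to sanity-check numerically. The following NumPy sketch (all variable names are my own, not from the slides) verifies $A' = X^{-1}AY$ on random data, and checks the mixed 2-tensor special case $A' = X^{-1}AX$ by comparing characteristic polynomials, since similar matrices share eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))   # matrix of Phi : U -> V in the old bases
X = rng.standard_normal((m, m))   # columns: new basis of V in old V-coordinates
Y = rng.standard_normal((n, n))   # columns: new basis of U in old U-coordinates

# change-of-basis theorem for linear operators: A' = X^{-1} A Y
A_new = np.linalg.solve(X, A @ Y)

# check on a random vector: old coordinates are u_old = Y u_new,
# and the new coordinates of Phi(u) are X^{-1} (A u_old)
u_new = rng.standard_normal(n)
assert np.allclose(A_new @ u_new, np.linalg.solve(X, A @ (Y @ u_new)))

# special case U = V: mixed 2-tensor rule A' = X^{-1} A X
B = rng.standard_normal((m, m))
B_new = np.linalg.solve(X, B @ X)
# similar matrices have the same characteristic polynomial
assert np.allclose(np.poly(B), np.poly(B_new))
```

Using `np.linalg.solve` instead of forming `np.linalg.inv(X)` explicitly is the usual numerically preferable choice.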
• explicitly, a bilinear functional $\beta$ satisfies
$$\beta(\lambda u + \lambda' u', v) = \lambda\beta(u, v) + \lambda'\beta(u', v), \qquad \beta(u, \lambda v + \lambda' v') = \lambda\beta(u, v) + \lambda'\beta(u, v')$$
for all $u, u' \in U$, $v, v' \in V$, $\lambda, \lambda' \in \mathbb{R}$
• matrix representation of $\beta$: $[\beta]_{\mathcal{A},\mathcal{B}} = A \in \mathbb{R}^{m\times n}$, given by $a_{ij} = \beta(u_i, v_j)$
• change-of-basis theorem: if $[\beta]_{\mathcal{A}',\mathcal{B}'} = A' \in \mathbb{R}^{m\times n}$, then
$$A' = X^{T}AY$$
• special case $U = V$, $\mathcal{A} = \mathcal{B}$, $\mathcal{A}' = \mathcal{B}'$:
$$A' = X^{T}AX,$$
which recovers the transformation rule for covariant 2-tensors
• linear operators → mixed 2-tensors; bilinear functionals → covariant 2-tensors

1- and 2-tensor transformation rules
  contravariant 1-tensor:  $a' = X^{-1}a$        or  $a' = Xa$
  covariant 1-tensor:      $a' = X^{T}a$         or  $a' = X^{-T}a$
  covariant 2-tensor:      $A' = X^{T}AX$        or  $A' = X^{-T}AX^{-1}$
  contravariant 2-tensor:  $A' = X^{-1}AX^{-T}$  or  $A' = XAX^{T}$
  mixed 2-tensor:          $A' = X^{-1}AX$       or  $A' = XAX^{-1}$
  contravariant 2-tensor:  $A' = X^{-1}AY^{-T}$  or  $A' = XAY^{T}$
  covariant 2-tensor:      $A' = X^{T}AY$        or  $A' = X^{-T}AY^{-1}$
  mixed 2-tensor:          $A' = X^{-1}AY$       or  $A' = XAY^{-1}$

d = 3
• a bilinear operator $B \colon U \times V \to W$ satisfies
$$B(\lambda u + \lambda' u', v) = \lambda B(u, v) + \lambda' B(u', v), \qquad B(u, \lambda v + \lambda' v') = \lambda B(u, v) + \lambda' B(u, v')$$
• with bases $\mathcal{A} = \{u_1, \dots, u_n\}$, $\mathcal{B} = \{v_1, \dots, v_m\}$, $\mathcal{C} = \{w_1, \dots, w_p\}$,
$$B(u_i, v_j) = \sum_{k=1}^{p} a_{ijk} w_k$$
• change-of-basis theorem: if $[B]_{\mathcal{A},\mathcal{B},\mathcal{C}} = A$ and $[B]_{\mathcal{A}',\mathcal{B}',\mathcal{C}'} = A' \in \mathbb{R}^{m\times n\times p}$, then
$$A' = (X^{T}, Y^{T}, Z^{-1}) \cdot A$$
• a trilinear functional $\tau \colon U \times V \times W \to \mathbb{R}$ satisfies
$$\tau(\lambda u + \lambda' u', v, w) = \lambda\tau(u, v, w) + \lambda'\tau(u', v, w),$$
$$\tau(u, \lambda v + \lambda' v', w) = \lambda\tau(u, v, w) + \lambda'\tau(u, v', w),$$
$$\tau(u, v, \lambda w + \lambda' w') = \lambda\tau(u, v, w) + \lambda'\tau(u, v, w')$$
• with bases $\mathcal{A} = \{u_1, \dots, u_n\}$, $\mathcal{B} = \{v_1, \dots, v_m\}$, $\mathcal{C} = \{w_1, \dots, w_p\}$, set $\tau(u_i, v_j, w_k) = a_{ijk}$
• change-of-basis theorem: if $[\tau]_{\mathcal{A},\mathcal{B},\mathcal{C}} = A$ and $[\tau]_{\mathcal{A}',\mathcal{B}',\mathcal{C}'} = A' \in \mathbb{R}^{m\times n\times p}$, then
$$A' = (X^{T}, Y^{T}, Z^{T}) \cdot A$$

extends to arbitrary order
• bilinear operators → mixed 3-tensors of covariant order 2 and contravariant order 1
• trilinear functionals → covariant 3-tensors
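The multilinear matrix multiplication $(X^T, Y^T, Z^T)\cdot A$ in the change-of-basis theorems above can be written as a single `einsum` call. A NumPy sketch (the random setup is my own illustration) that also checks the point of the covariant rule: the value $\tau(u,v,w)$ computed from coordinates does not depend on the choice of bases:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 2, 3, 4
A = rng.standard_normal((n, m, p))   # [tau]: a_ijk = tau(u_i, v_j, w_k)
X = rng.standard_normal((n, n))      # change-of-basis matrices on U, V, W
Y = rng.standard_normal((m, m))
Z = rng.standard_normal((p, p))

# covariant 3-tensor rule A' = (X^T, Y^T, Z^T) . A, entrywise
# a'_{pqr} = sum_{i,j,k} x_{ip} y_{jq} z_{kr} a_{ijk}
A_new = np.einsum('ip,jq,kr,ijk->pqr', X, Y, Z, A)

# tau(u, v, w) evaluated from coordinates must be basis-independent:
b, c, d = rng.standard_normal(n), rng.standard_normal(m), rng.standard_normal(p)
val_old = np.einsum('ijk,i,j,k->', A, b, c, d)
val_new = np.einsum('pqr,p,q,r->', A_new,
                    np.linalg.solve(X, b),   # contravariant coords: b' = X^{-1} b
                    np.linalg.solve(Y, c),
                    np.linalg.solve(Z, d))
assert np.allclose(val_old, val_new)
```

The arguments transform contravariantly ($X^{-1}$) while the coefficient hypermatrix transforms covariantly ($X^T$), which is exactly why the evaluated number is invariant.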
• this recovers all the transformation rules in definition ①:
  covariant $d$-tensor: $A' = (X_1^{T}, X_2^{T}, \dots, X_d^{T}) \cdot A$
  contravariant $d$-tensor: $A' = (X_1^{-1}, X_2^{-1}, \dots, X_d^{-1}) \cdot A$
  mixed $d$-tensor: $A' = (X_1^{-1}, \dots, X_p^{-1}, X_{p+1}^{T}, \dots, X_d^{T}) \cdot A$
• definition ② is the easiest definition of a tensor; it was adopted in books c. 1980s

issue
• there are many more multilinear maps than there are types of tensors
• $d = 2$:
  linear operators $\Phi \colon U \to V$, $\Phi \colon U \to V^*$, …, $\Phi \colon U^* \to V^*$
  bilinear functionals $\beta \colon U \times V \to \mathbb{R}$, $\beta \colon U^* \times V \to \mathbb{R}$, …, $\beta \colon U^* \times V^* \to \mathbb{R}$
• $d = 3$:
  bilinear operators $B \colon U \times V \to W$, $B \colon U^* \times V \to W$, …, $B \colon U^* \times V^* \to W^*$
  trilinear functionals $\tau \colon U^* \times V \times W \to \mathbb{R}$, $\tau \colon U \times V^* \times W \to \mathbb{R}$, …, $\tau \colon U^* \times V^* \times W^* \to \mathbb{R}$
  more complicated maps $\Phi_1 \colon U \to \mathsf{L}(V; W)$, $\Phi_2 \colon \mathsf{L}(U; V) \to W$, $\beta_1 \colon U \times \mathsf{L}(V; W) \to \mathbb{R}$, $\beta_2 \colon \mathsf{L}(U; V) \times W \to \mathbb{R}$
• the possibilities increase exponentially with the order $d$
• there ought to be only as many as there are types of transformation rules:
  covariant 2-tensor: $\Phi \colon U \to V^*$;  $\beta \colon U \times V \to \mathbb{R}$
  contravariant 2-tensor: $\Phi \colon U^* \to V$;  $\beta \colon U^* \times V^* \to \mathbb{R}$
  mixed 2-tensor: $\Phi \colon U \to V$;  $\beta \colon U^* \times V \to \mathbb{R}$;  $\Phi \colon U^* \to V^*$;  $\beta \colon U \times V^* \to \mathbb{R}$
• definition ③ accomplishes this without reference to the transformation rules

imperfect fix
• only allow $W = \mathbb{R}$
• a $d$-tensor of contravariant order $p$ and covariant order $d - p$ is a multilinear functional
$$\varphi \colon V_1^* \times \cdots \times V_p^* \times V_{p+1} \times \cdots \times V_d \to \mathbb{R}$$
• this excludes vectors, by far the most common 1-tensors
• excludes linear operators, by far the most common 2-tensors
• excludes bilinear operators, by far the most common 3-tensors
• e.g., instead of talking about $v \in V$, one needs to talk about the linear functional $f \colon V^* \to \mathbb{R}$
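The general $d$-tensor rules can be sketched as mode-wise matrix products. The helper functions below are my own names, not from the slides; the sketch checks that for $d = 2$ with $p = 1$ the mixed rule collapses to the familiar $A' = X^{-1}AY$:

```python
import numpy as np

def multilinear_multiply(matrices, A):
    """(M_1, ..., M_d) . A: apply M_k along the k-th mode of A, i.e.
    A'_{j1...jd} = sum_{i1,...,id} (M_1)_{j1 i1} ... (M_d)_{jd id} A_{i1...id}."""
    for k, M in enumerate(matrices):
        # contract M's second index against mode k, then restore mode order
        A = np.moveaxis(np.tensordot(M, A, axes=(1, k)), 0, k)
    return A

def mixed_transform(Xs, A, p):
    """Mixed d-tensor rule: X_k^{-1} on the first p (contravariant) modes,
    X_k^T on the remaining d - p (covariant) modes."""
    Ms = [np.linalg.inv(X) for X in Xs[:p]] + [X.T for X in Xs[p:]]
    return multilinear_multiply(Ms, A)

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3, 4))
Xs = [rng.standard_normal((s, s)) for s in A.shape]

# p = 0: purely covariant rule (X_1^T, X_2^T, X_3^T) . A
A_cov = mixed_transform(Xs, A, p=0)

# for d = 2, p = 1 the rule collapses to A' = X^{-1} A Y
Am, Xm, Ym = (rng.standard_normal((3, 3)) for _ in range(3))
assert np.allclose(mixed_transform([Xm, Ym], Am, p=1),
                   np.linalg.inv(Xm) @ Am @ Ym)
```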
multilinear operators are useful
• favorite example: higher derivatives of multivariate functions
• but this requires a norm on $\mathsf{M}^d(V_1, \dots, V_d; W)$
• $\mathsf{M}^d(V_1, \dots, V_d; W)$ is itself a vector space
• if $V_1, \dots, V_d$ and $W$ are endowed with norms, then
$$\|\Phi\|_{\sigma} := \sup_{v_1, \dots, v_d \neq 0} \frac{\|\Phi(v_1, \dots, v_d)\|}{\|v_1\| \cdots \|v_d\|}$$
defines a norm on $\mathsf{M}^d(V_1, \dots, V_d; W)$
• slightly abused notation: the same $\|\cdot\|$ denotes norms on different spaces

higher-order derivatives
• $V$, $W$ normed spaces; $\Omega \subseteq V$ open
• the derivative of $f \colon \Omega \to W$ at $v \in \Omega$ is the linear operator $Df(v) \colon V \to W$ with
$$\lim_{h \to 0} \frac{\|f(v + h) - f(v) - [Df(v)](h)\|}{\|h\|} = 0$$
• since $Df(v) \in \mathsf{L}(V; W)$, apply the same definition to $Df \colon \Omega \to \mathsf{L}(V; W)$
• get $D^2 f(v) \colon V \to \mathsf{L}(V; W)$ as $D(Df)$:
$$\lim_{h \to 0} \frac{\|Df(v + h) - Df(v) - [D^2 f(v)](h)\|}{\|h\|} = 0$$
• apply recursively to get derivatives of arbitrary order:
$$Df(v) \in \mathsf{L}(V; W), \quad D^2 f(v) \in \mathsf{L}\bigl(V; \mathsf{L}(V; W)\bigr), \quad D^3 f(v) \in \mathsf{L}\bigl(V; \mathsf{L}(V; \mathsf{L}(V; W))\bigr), \quad D^4 f(v) \in \mathsf{L}\bigl(V; \mathsf{L}(V; \mathsf{L}(V; \mathsf{L}(V; W)))\bigr)$$
• how to avoid these nested spaces of linear maps? use multilinear maps:
$$\mathsf{L}\bigl(V; \mathsf{M}^{d-1}(V, \dots, V; W)\bigr) = \mathsf{M}^{d}(V, \dots, V; W)$$
• if $\Phi \colon V \to \mathsf{M}^{d-1}(V, \dots, V; W)$ is linear, then $[\Phi(h)](h_1, \dots, h_{d-1})$ is linear in $h$ for fixed $h_1, \dots, h_{d-1}$ and $(d-1)$-linear in $h_1, \dots, h_{d-1}$ for fixed $h$
• so $D^d f(v) \colon V \times \cdots \times V \to W$ may be regarded as a multilinear operator
• Taylor's theorem:
$$f(v + h) = f(v) + [Df(v)](h) + \frac{1}{2}[D^2 f(v)](h, h) + \cdots + \frac{1}{d!}[D^d f(v)](h, \dots, h) + R(h),$$
with remainder satisfying $\|R(h)\| / \|h\|^d \to 0$ as $h \to 0$

why not hypermatrices
• why not choose bases and just treat multilinear maps as hypermatrices $A \in \mathbb{R}^{n_1 \times \cdots \times n_d}$?
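Before turning to that question, the Taylor expansion just stated can be checked on a concrete example. The function below and its hand-computed derivatives are my own illustration, not from the slides; $D^2 f(v)$ appears as a genuine bilinear map, and the second-order remainder vanishes faster than $\|h\|^2$:

```python
import numpy as np

# f(v) = (v.v)^2, with derivatives worked out by hand:
#   Df(v): h -> 4 (v.v)(v.h)                         (a linear map)
#   D^2 f(v): (h1, h2) -> 8(v.h1)(v.h2) + 4(v.v)(h1.h2)   (a bilinear map)
def f(v):            return (v @ v) ** 2
def Df(v, h):        return 4 * (v @ v) * (v @ h)
def D2f(v, h1, h2):  return 8 * (v @ h1) * (v @ h2) + 4 * (v @ v) * (h1 @ h2)

rng = np.random.default_rng(3)
v, h = rng.standard_normal(3), rng.standard_normal(3)

# Taylor: f(v+h) = f(v) + Df(v)(h) + (1/2) D^2 f(v)(h,h) + R(h),
# with R(h)/|h|^2 -> 0 as h -> 0
def remainder(t):
    ht = t * h
    return abs(f(v + ht) - f(v) - Df(v, ht) - 0.5 * D2f(v, ht, ht)) / (ht @ ht)

# the next term is cubic, so R(h)/|h|^2 shrinks roughly linearly in |h|
r1, r2 = remainder(1e-2), remainder(1e-3)
assert r2 < r1
```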
• they may not be computable: writing down a hypermatrix given bases is in general #P-hard
• may not be possible: multilinear maps extend to modules, which may not have bases
• may not be useful: even for $d = 1, 2$, writing down a matrix is unhelpful when the vector spaces have special structures
• may not be meaningful: the "indices" may be continuous

writing down a hypermatrix is #P-hard
• for $0 \leq d_1 \leq d_2 \leq \cdots \leq d_n$, the generalized Vandermonde matrix is
$$\begin{bmatrix} x_1^{d_1} & x_2^{d_1} & \cdots & x_n^{d_1} \\ x_1^{d_2} & x_2^{d_2} & \cdots & x_n^{d_2} \\ \vdots & \vdots & & \vdots \\ x_1^{d_n} & x_2^{d_n} & \cdots & x_n^{d_n} \end{bmatrix}$$
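For illustration, the generalized Vandermonde matrix itself is straightforward to form entrywise; a NumPy sketch (the function name is mine), checked against the ordinary Vandermonde matrix in the special case $d = (0, 1, \dots, n-1)$:

```python
import numpy as np

def generalized_vandermonde(x, d):
    """Matrix with entries x_j ** d_i (row i = exponent d_i, column j = point x_j)."""
    x, d = np.asarray(x), np.asarray(d)
    return np.power.outer(x, d).T   # outer gives x_i ** d_j; transpose to x_j ** d_i

x = [2.0, 3.0, 5.0]
V = generalized_vandermonde(x, [0, 1, 2])   # ordinary Vandermonde for d = (0, 1, 2)
assert np.allclose(V, np.vander(x, increasing=True).T)

W = generalized_vandermonde(x, [0, 2, 5])   # gaps in the exponents are allowed
assert np.allclose(W[2], [32.0, 243.0, 3125.0])
```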