
The Kronecker Product: A Product of the Times

Charles Van Loan

Department of Computer Science

Cornell University

Presented at the SIAM Conference on Applied Linear Algebra, Monterey, California, October 26, 2009

The Kronecker Product

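$B \otimes C$ is a block matrix whose $ij$-th block is $b_{ij}C$. E.g.,

\[
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \otimes C
=
\begin{bmatrix} b_{11}C & b_{12}C \\ b_{21}C & b_{22}C \end{bmatrix}
\]

Also called the “Direct Product” or the “Tensor Product”.

Every $b_{ij}c_{kl}$ Shows Up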

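\[
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
\otimes
\begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix}
=
\begin{bmatrix}
b_{11}c_{11} & b_{11}c_{12} & b_{11}c_{13} & b_{12}c_{11} & b_{12}c_{12} & b_{12}c_{13} \\
b_{11}c_{21} & b_{11}c_{22} & b_{11}c_{23} & b_{12}c_{21} & b_{12}c_{22} & b_{12}c_{23} \\
b_{11}c_{31} & b_{11}c_{32} & b_{11}c_{33} & b_{12}c_{31} & b_{12}c_{32} & b_{12}c_{33} \\
b_{21}c_{11} & b_{21}c_{12} & b_{21}c_{13} & b_{22}c_{11} & b_{22}c_{12} & b_{22}c_{13} \\
b_{21}c_{21} & b_{21}c_{22} & b_{21}c_{23} & b_{22}c_{21} & b_{22}c_{22} & b_{22}c_{23} \\
b_{21}c_{31} & b_{21}c_{32} & b_{21}c_{33} & b_{22}c_{31} & b_{22}c_{32} & b_{22}c_{33}
\end{bmatrix}
\]

Basic Algebraic Properties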

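\[
\begin{aligned}
(B \otimes C)^T &= B^T \otimes C^T \\
(B \otimes C)^{-1} &= B^{-1} \otimes C^{-1} \\
(B \otimes C)(D \otimes F) &= BD \otimes CF \\
B \otimes (C \otimes D) &= (B \otimes C) \otimes D \\
C \otimes B &= (\text{Perfect Shuffle})^T (B \otimes C)(\text{Perfect Shuffle})
\end{aligned}
\]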
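A quick numerical check of the transpose and mixed-product rules, as a minimal sketch. The helper names (`kron`, `matmul`, `transpose`) and the small integer matrices are illustrative assumptions, not part of the talk.

```python
def kron(B, C):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[b * c for b in row_b for c in row_c]
            for row_b in B for row_c in C]

def matmul(X, Y):
    """Plain matrix-matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

# Arbitrary 2-by-2 integer test matrices (exact arithmetic, no rounding).
B = [[1, 2], [3, 4]]
C = [[0, 1], [1, 1]]
D = [[2, 0], [1, 3]]
F = [[1, -1], [2, 5]]

# (B kron C)(D kron F) == BD kron CF
mixed_product_holds = matmul(kron(B, C), kron(D, F)) == kron(matmul(B, D), matmul(C, F))

# (B kron C)^T == B^T kron C^T
transpose_holds = transpose(kron(B, C)) == kron(transpose(B), transpose(C))
```

Both flags should come out `True`; integer entries keep the comparison exact.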

R.A. Horn and C.R. Johnson (1991). Topics in Matrix Analysis, Cambridge University Press, NY.

Reshaping KP Computations

Suppose $B, C \in \mathbb{R}^{n \times n}$ and $x \in \mathbb{R}^{n^2}$.

The operation $y = (B \otimes C)x$ is $O(n^4)$:

y = kron(B,C)*x

The equivalent, reshaped operation $Y = CXB^T$ is $O(n^3)$:

y = reshape(C*reshape(x,n,n)*B',n*n,1)
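The same comparison can be sketched in plain Python to confirm that both routes give the same vector. This assumes column-major (MATLAB-style) reshaping; the helpers `vec` and `unvec` and the test data are hypothetical names introduced only for this illustration.

```python
def kron(B, C):
    return [[b * c for b in row_b for c in row_c]
            for row_b in B for row_c in C]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def vec(X):
    """Stack the columns of X (column-major order, as MATLAB's reshape does)."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]

def unvec(x, n):
    """Inverse of vec for an n-by-n matrix."""
    return [[x[i + j * n] for j in range(n)] for i in range(n)]

n = 3
B = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
C = [[2, 1, 1], [0, 3, 1], [1, 0, 2]]
x = list(range(1, n * n + 1))

# O(n^4) route: form the n^2-by-n^2 matrix explicitly.
y_slow = [sum(a * xi for a, xi in zip(row, x)) for row in kron(B, C)]

# O(n^3) route: y = vec(C * X * B^T) with X = reshape(x, n, n).
y_fast = vec(matmul(matmul(C, unvec(x, n)), transpose(B)))
```

The fast route never forms the $n^2$-by-$n^2$ matrix, which is where the savings come from.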

H.V. Henderson and S.R. Searle (1981). “The Vec-Permutation Matrix, The Vec Operator, and Kronecker Products: A Review,” Linear and Multilinear Algebra 9, 271–288.

Talk Outline

1. The 1800’s
   Origins: Z

2. The 1900’s
   Heightened Profile: ⊗

3. The 2000’s
   Future: ∞

The 1800’s

Z

Products and Deltas

$\otimes$ and $\delta_{ij}$

Leopold Kronecker (1823–1891)

Of course, the contributions go far beyond this...

E.T. Bell (1937). Men of Mathematics, Simon and Schuster, New York.
K. Hensel (1968). Leopold Kronecker’s Werke, Chelsea Publishing Company, New York.

Brief Survey of the Kronecker Delta

\[
U^T \delta_{ij} V = \delta_{ij}
\]

\[
\kappa_2(\delta_{ij}) = \frac{1}{\delta_{ij}}
\]

G.H. Golub and C. Van Loan (1996). Matrix Computations, 3rd Ed., Johns Hopkins University Press, Baltimore, Maryland.

Acknowledgement

H.V. Henderson, F. Pukelsheim, and S.R. Searle (1983). “On the History of the Kronecker Product,” Linear and Multilinear Algebra 14, 113–120.

Shayle Searle, Professor Emeritus, Cornell University (right)

Scandal!

H.V. Henderson, F. Pukelsheim, and S.R. Searle (1983). “On the History of the Kronecker Product,” Linear and Multilinear Algebra 14, 113–120.

Abstract

History reveals that what is today called the Kronecker product should be called the Zehfuss Product.

This fact is somewhat appreciated by the modern (numerical) linear algebra community:

R.A. Horn and C.R. Johnson (1991). Topics in Matrix Analysis, Cambridge University Press, NY, p. 254.
A.N. Langville and W.J. Stewart (2004). “The Kronecker product and stochastic automata networks,” J. Computational and Applied Mathematics 167, 429–447.

Who Was Zehfuss?

Born 1832.

Obscure professor of mathematics at University of Heidelberg for a while. Then went on to other things.

Wrote papers on ...

G. Zehfuss (1858). “Über eine gewisse Determinante,” Zeitschrift für Mathematik und Physik 3, 298–301.

Main Result a.k.a. “The Z Theorem”

If $B \in \mathbb{R}^{m \times m}$ and $C \in \mathbb{R}^{n \times n}$ then

\[
\det(B \otimes C) = \det(B)^n \det(C)^m
\]
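A small exact check of the Z theorem, as a sketch. The Laplace-expansion `det` helper and the integer test matrices are assumptions made for illustration; the expansion is only sensible for tiny matrices like the 6-by-6 product here.

```python
def kron(B, C):
    return [[b * c for b in row_b for c in row_c]
            for row_b in B for row_c in C]

def det(A):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for j, a in enumerate(A[0]):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * a * det(minor)
    return total

B = [[1, 2], [3, 4]]                    # m = 2, det(B) = -2
C = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]   # n = 3, det(C) = 7
m, n = len(B), len(C)

lhs = det(kron(B, C))                   # determinant of the 6-by-6 product
rhs = det(B) ** n * det(C) ** m         # det(B)^n * det(C)^m
```

Here `lhs` and `rhs` agree, both equal to $(-2)^3 \cdot 7^2 = -392$.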

Modern Proof

Note that $I_n \otimes B$ and $I_m \otimes C$ are block diagonal and take determinants in

\[
B \otimes C = (B \otimes I_n)(I_m \otimes C) = P (I_n \otimes B) P^T (I_m \otimes C)
\]

where $P$ is a perfect shuffle permutation.

Excerpts from Zehfuss(1858)

\[
\Delta =
\begin{vmatrix}
a_1A_1 & a_1B_1 & b_1A_1 & b_1B_1 & c_1A_1 & c_1B_1 & d_1A_1 & d_1B_1 \\
a_1A_2 & a_1B_2 & b_1A_2 & b_1B_2 & c_1A_2 & c_1B_2 & d_1A_2 & d_1B_2 \\
a_2A_1 & a_2B_1 & b_2A_1 & b_2B_1 & c_2A_1 & c_2B_1 & d_2A_1 & d_2B_1 \\
a_2A_2 & a_2B_2 & b_2A_2 & b_2B_2 & c_2A_2 & c_2B_2 & d_2A_2 & d_2B_2 \\
a_3A_1 & a_3B_1 & b_3A_1 & b_3B_1 & c_3A_1 & c_3B_1 & d_3A_1 & d_3B_1 \\
a_3A_2 & a_3B_2 & b_3A_2 & b_3B_2 & c_3A_2 & c_3B_2 & d_3A_2 & d_3B_2 \\
a_4A_1 & a_4B_1 & b_4A_1 & b_4B_1 & c_4A_1 & c_4B_1 & d_4A_1 & d_4B_1 \\
a_4A_2 & a_4B_2 & b_4A_2 & b_4B_2 & c_4A_2 & c_4B_2 & d_4A_2 & d_4B_2
\end{vmatrix}
\]

Excerpts from Zehfuss(1858)

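\[
p =
\begin{vmatrix}
a_1 & b_1 & c_1 & d_1 \\
a_2 & b_2 & c_2 & d_2 \\
a_3 & b_3 & c_3 & d_3 \\
a_4 & b_4 & c_4 & d_4
\end{vmatrix}
\quad \text{and} \quad
P =
\begin{vmatrix}
A_1 & B_1 \\
A_2 & B_2
\end{vmatrix}
\]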

\[
\Delta_{2\,2,2} = p_4^{\,2}\, P_2^{\,4}
\]

\[
\Delta_{2\,Mm} = p^{M} P^{m}
\]

Hensel (1891)

Student in Berlin 1880-1884.

Maintains that Kronecker presented the Z-theorem in his lectures.

K. Hensel (1891). “Über die Darstellung der Determinante eines Systems, welches aus zwei anderen componirt ist,” Acta Mathematica 14, 317–319.

The 1900’s

⊗

Muir (1911)

Attributes the Z-theorem to Zehfuss.

Calls $\det(B \otimes C)$ the “Zehfuss determinant.”

T. Muir (1911). The Theory of Determinants in the Historical Order of Development, Vols 1-4, Dover, NY.

Rutherford(1933)

Q. When are two Zehfuss matrices equal?

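\[
B \otimes C \overset{???}{=} F \otimes G
\]

Subscripting from zero, if $B$ ($m_b \times n_b$), $C$ ($m_c \times n_c$), $F$ ($m_f \times n_f$), $G$ ($m_g \times n_g$), then $(B \otimes C)_{ij} = (F \otimes G)_{ij}$ means

\[
B(\lfloor i/m_c \rfloor, \lfloor j/n_c \rfloor) \cdot C(i \bmod m_c,\; j \bmod n_c)
=
F(\lfloor i/m_g \rfloor, \lfloor j/n_g \rfloor) \cdot G(i \bmod m_g,\; j \bmod n_g)
\]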
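The zero-based index formula is easy to exercise directly. In the sketch below, `kron_entry` and `eye` are hypothetical helper names, and the identity-matrix pair is just one simple instance of two different factor pairs giving the same product.

```python
def kron(B, C):
    return [[b * c for b in row_b for c in row_c]
            for row_b in B for row_c in C]

def kron_entry(B, C, i, j):
    """Entry (i, j) of B kron C, subscripting from zero, via floor and mod."""
    mc, nc = len(C), len(C[0])
    return B[i // mc][j // nc] * C[i % mc][j % nc]

def eye(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

B = [[1, 2, 3], [4, 5, 6]]          # 2-by-3
C = [[7, 8], [9, 10], [11, 12]]     # 3-by-2
A = kron(B, C)                      # 6-by-6

entries_match = all(A[i][j] == kron_entry(B, C, i, j)
                    for i in range(6) for j in range(6))

# Factors of different shapes can still give equal Zehfuss matrices.
same_product = kron(eye(2), eye(3)) == kron(eye(3), eye(2))
```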

D.E. Rutherford (1933). “On the Condition that Two Zehfuss Matrices are Equal,” Bull. Amer. Math. Soc. 39, 801–808.

Z → ⊗ Why?

“...a series of influential texts at and after the turn of the century permanently associated Kronecker’s name with the “⊗” product and this terminology is nearly universal today.” Horn and Johnson (1991)

“...the textbook of Scott and Matthews (1904) which appeared four years after the publication of Rados’ paper, gave new life to the old error. This was probably due to the teaching of Pascal, whose second edition (1923) still propagates the error [of the first edition (1897).]” Muir (1927)

Heightened Profile Beginning in the 60s

Some Reasons

Regular Grids
Tensoring Low Dimension Ideas
Higher Order Statistics
Fast Transforms
Preconditioners
Quantum Computing
Tensor Decompositions/Approximations

C. Van Loan (2000). “The Ubiquitous Kronecker Product,” Journal of Computational and Applied Mathematics, 85–100.

Regular Grids

(M +1)-by-(N +1) discretization of the Laplacian on a rectangle...

\[
A = I_M \otimes T_N + T_M \otimes I_N
\]

\[
T_5 =
\begin{bmatrix}
2 & -1 & 0 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 \\
0 & -1 & 2 & -1 & 0 \\
0 & 0 & -1 & 2 & -1 \\
0 & 0 & 0 & -1 & 2
\end{bmatrix}
\]

F.W. Dorr (1970). “The Direct Solution of the Discrete Poisson Equation on a Rectangle,” SIAM Review 12, 248–263.
G.H. Golub and C.F. Van Loan (1996). Matrix Computations, 3rd Ed, Johns Hopkins University Press, Baltimore, MD.

Tensoring Low Dimension Ideas

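\[
\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} w_i f(x_i) = w^T f(x)
\]

\[
\int_{a_1}^{b_1}\!\int_{a_2}^{b_2}\!\int_{a_3}^{b_3} f(x,y,z)\,dx\,dy\,dz
\approx
\sum_{i=1}^{n_x}\sum_{j=1}^{n_y}\sum_{k=1}^{n_z} w_i^{(x)} w_j^{(y)} w_k^{(z)} f(x_i, y_j, z_k)
= \left(w^{(x)} \otimes w^{(y)} \otimes w^{(z)}\right)^T f(x \otimes y \otimes z)
\]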
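A sketch of the tensored rule on the unit cube. The choice of the 3-point Simpson rule on $[0,1]$ in each variable and the test integrand $f(x,y,z) = xy^2 + z$ are assumptions made for this illustration; Simpson integrates that polynomial exactly, so the result can be checked against the true value $2/3$.

```python
from fractions import Fraction
from itertools import product

def kron_vec(u, v):
    """Kronecker product of two vectors."""
    return [a * b for a in u for b in v]

# 3-point Simpson rule on [0, 1], reused for x, y and z (an illustrative choice).
nodes = [Fraction(0), Fraction(1, 2), Fraction(1)]
weights = [Fraction(1, 6), Fraction(4, 6), Fraction(1, 6)]

def f(x, y, z):
    return x * y**2 + z

# w = w_x kron w_y kron w_z; the last index varies fastest.
w = kron_vec(kron_vec(weights, weights), weights)

# f sampled on the grid in the matching order (z fastest).
fvals = [f(x, y, z) for x, y, z in product(nodes, nodes, nodes)]

approx = sum(wi * fi for wi, fi in zip(w, fvals))
```

The one-dimensional weight vector is the only ingredient; the 27 three-dimensional weights come entirely from the Kronecker product.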

A. Graham (1981). Kronecker Products and Matrix Calculus with Applications, Ellis Horwood Ltd, Chichester, England.

Higher Order Statistics

$E(xx^T)$

⇓

$E(x \otimes x)$

⇓

$E(x \otimes x \otimes \cdots \otimes x)$

Kronecker powers:

\[
\otimes^k A = A \otimes A \otimes \cdots \otimes A \quad (k \text{ times})
\]

T.F. Andre, R.D. Nowak, and B.D. Van Veen (1997). “Low Rank Estimation of Higher Order Statistics,” IEEE Trans. Signal Processing 45, 673–685.

Fast Transforms

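FFT

\[
F_{16} P_{16} = B_{16} (I_2 \otimes B_8)(I_4 \otimes B_4)(I_8 \otimes B_2)
\]

\[
B_4 =
\begin{bmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & \omega_4 \\
1 & 0 & -1 & 0 \\
0 & 1 & 0 & -\omega_4
\end{bmatrix}
\qquad
\omega_n = \exp(-2\pi i/n)
\]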
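The smallest nontrivial instance of this factorization, $F_4 P_4 = B_4 (I_2 \otimes B_2)$, can be checked numerically. This sketch assumes that $P_4$ is the bit-reversal (even-odd) permutation, that $B_2 = F_2$, and that $F_n$ has entries $\omega_n^{jk}$; those conventions are not spelled out on the slide.

```python
import cmath

def kron(B, C):
    return [[b * c for b in row_b for c in row_c]
            for row_b in B for row_c in C]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

n = 4
omega = cmath.exp(-2j * cmath.pi / n)      # omega_4

# DFT matrix with entries omega^(j*k), subscripting from zero.
F4 = [[omega ** (j * k) for k in range(n)] for j in range(n)]

# Assumed P_4: bit reversal, i.e. even-indexed columns first, then odd.
perm = [0, 2, 1, 3]
F4P4 = [[row[p] for p in perm] for row in F4]

I2 = [[1, 0], [0, 1]]
B2 = [[1, 1], [1, -1]]
B4 = [[1, 0, 1, 0],
      [0, 1, 0, omega],
      [1, 0, -1, 0],
      [0, 1, 0, -omega]]

rhs = matmul(B4, kron(I2, B2))

# Largest entrywise discrepancy between the two sides.
max_err = max(abs(a - b) for ra, rb in zip(F4P4, rhs) for a, b in zip(ra, rb))
```

Up to rounding, `max_err` is zero. The 16-point factorization on the slide applies the same butterfly step recursively.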

J. Granata, M. Conner, and R. Tolimieri (1992). “Recursive Fast Algorithms and the Role of the Tensor Product,” IEEE Transactions on Signal Processing 40, 2921–2930.
C. Van Loan (1992). Computational Frameworks for the Fast Fourier Transform, SIAM Publications, Philadelphia, PA.

Fast Transforms Cont’d

Haar Wavelet Transform

\[
W_n =
\begin{cases}
\left[\; W_m \otimes \begin{bmatrix} 1 \\ 1 \end{bmatrix} \;\middle|\; I_m \otimes \begin{bmatrix} 1 \\ -1 \end{bmatrix} \;\right] & \text{if } n = 2m > 1 \\[2ex]
[\,1\,] & \text{if } n = 1
\end{cases}
\]

Fast Gauss Transform

\[
g_{ij} = \exp\!\left(-\|s_j - t_i\|_2^2/\delta\right) \;\Rightarrow\; G = G_{near} + G_{far}
\]

$G_{near}$ involves a Kronecker Product

G. Strang (1993). “Wavelet Transforms Versus Fourier Transforms,” Bulletin of the American Mathematical Society, 28, 288–305.
X. Sun and Y. Bao (2003). “A Kronecker Product Representation of the Fast Gauss Transform,” SIAM J. Matrix Anal. Appl., 24, 768–786.

Preconditioners

If $A \approx B \otimes C$, then $B \otimes C$ has potential as a preconditioner.

It captures the essence of $A$.

It is easy to solve $(B \otimes C)z = r$.

Good Example: A band block Toeplitz with banded Toeplitz blocks. B and C chosen to be band Toeplitz.
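Why the solve is easy: reshaping turns $(B \otimes C)z = r$ into $CZB^T = R$, so $Z = C^{-1} R B^{-T}$ and only the small factors are ever inverted. A minimal sketch with 2-by-2 factors follows; the helper `inv2`, the column-major reshaping, and the test data are illustrative assumptions.

```python
from fractions import Fraction

def kron(B, C):
    return [[b * c for b in row_b for c in row_c]
            for row_b in B for row_c in C]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def inv2(M):
    """Exact inverse of a 2-by-2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

B = [[2, 1], [1, 3]]
C = [[4, 1], [0, 2]]
r = [1, 2, 3, 4]

# R = reshape(r, 2, 2) in column-major order.
R = [[r[i + 2 * j] for j in range(2)] for i in range(2)]

# Z = C^{-1} R B^{-T}: two small solves instead of one 4-by-4 solve.
B_inv_T = [list(col) for col in zip(*inv2(B))]
Z = matmul(matmul(inv2(C), R), B_inv_T)

# z = vec(Z), stacking columns.
z = [Z[i][j] for j in range(2) for i in range(2)]

# Multiply back with the explicit Kronecker product to confirm.
residual_check = [sum(a * zi for a, zi in zip(row, z)) for row in kron(B, C)]
```

In practice one would factor $B$ and $C$ rather than invert them; the explicit inverses here only keep the sketch short.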

J. Kamm and J.G. Nagy (2000). “Optimal Kronecker Product Approximation of Block Toeplitz Matrices,” SIAM J. Matrix Anal. and Appl., 22, 155–172.
J. Nagy and M. Kilmer (2006). “Kronecker Product Approximation for Three-Dimensional Imaging Applications,” IEEE Trans. Image Proc. 15, 604–613.

Quantum Computing

Filled with Kronecker powers of 2-by-2’s.

\[
H^{\otimes n} = H \otimes H \otimes \cdots \otimes H
\qquad
H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\]
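A small sketch of a Kronecker power of the 2-by-2 Hadamard matrix. To stay in exact integer arithmetic it drops the $1/\sqrt{2}$ scaling, so the check is $H_3 H_3^T = 2^3 I$ rather than orthogonality proper; `kron_power` is a hypothetical helper name.

```python
from functools import reduce

def kron(B, C):
    return [[b * c for b in row_b for c in row_c]
            for row_b in B for row_c in C]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def kron_power(A, k):
    """A kron A kron ... kron A (k factors, k >= 1)."""
    return reduce(kron, [A] * k)

H = [[1, 1], [1, -1]]      # unnormalized: the 1/sqrt(2) factor is omitted
H3 = kron_power(H, 3)      # 8-by-8

# Without the scaling, H3 * H3^T should equal 2^3 times the identity.
gram = matmul(H3, transpose(H3))
```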

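N.D. Mermin (2007). Quantum Computer Science, Cambridge University Press, Cambridge, England.

Tensor Decompositions/Approximation

E.g. Given $\mathcal{A} = \mathcal{A}(1{:}n, 1{:}n, 1{:}n, 1{:}n)$, find orthogonal

\[
Q = [\, q_1 \cdots q_n \,], \quad
U = [\, u_1 \cdots u_n \,], \quad
V = [\, v_1 \cdots v_n \,], \quad
W = [\, w_1 \cdots w_n \,]
\]

and a “core tensor” $\sigma$ so

\[
\mathrm{vec}(\mathcal{A}) \approx \sum_{i,j,k,\ell=1}^{n} \sigma_{ijk\ell}\; w_i \otimes v_j \otimes u_k \otimes q_\ell
\]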

T. Kolda and B. Bader (2009). “Tensor Decompositions and Applications,” SIAM Review 51, 455–500.

Descendants

1. The Left Kronecker Product
2. The Hadamard Product
3. The Tracy-Singh Product
4. The Khatri-Rao Product
5. The Generalized Kronecker Product
6. The Strong Kronecker Product
7. The Symmetric Kronecker Product
8. The Bi-Alternate Product

Left Kronecker Product

Definition:

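\[
B \overset{\text{Left}}{\otimes} \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}
= \begin{bmatrix} c_{11}B & c_{12}B \\ c_{21}B & c_{22}B \end{bmatrix}
= C \otimes B
\]

Fact:

If $B \in \mathbb{R}^{m_b \times n_b}$ and $C \in \mathbb{R}^{m_c \times n_c}$ then

\[
B \overset{\text{Left}}{\otimes} C = \Pi^T_{m_c,\, m_b m_c} (B \otimes C)\, \Pi_{n_c,\, n_b n_c}
\]

where $\Pi_{m_c,\, m_b m_c}$ and $\Pi_{n_c,\, n_b n_c}$ are perfect shuffles.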

F.A. Graybill (1969). Introduction to Matrices with Applications in Statistics, Wadsworth, Belmont, CA.

The Hadamard Product

Definition:

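\[
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}
\overset{\text{Had}}{\otimes}
\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix}
=
\begin{bmatrix} b_{11}c_{11} & b_{12}c_{12} \\ b_{21}c_{21} & b_{22}c_{22} \\ b_{31}c_{31} & b_{32}c_{32} \end{bmatrix}
\]

B ⊗Had C = B.*C

A. Smilde, R. Bro, and P. Geladi (2004). Multiway Analysis, John Wiley, Chichester, England.

The Hadamard Product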

Fact:

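If $\tilde{A} = B \otimes C$ and $B, C \in \mathbb{R}^{m \times n}$, then

B ⊗Had C = Ã(1:(m+1):m^2, 1:(n+1):n^2)

E.g.,

\[
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}
\otimes
\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix}
=
\begin{bmatrix}
b_{11}c_{11} & b_{11}c_{12} & b_{12}c_{11} & b_{12}c_{12} \\
b_{11}c_{21} & b_{11}c_{22} & b_{12}c_{21} & b_{12}c_{22} \\
b_{11}c_{31} & b_{11}c_{32} & b_{12}c_{31} & b_{12}c_{32} \\
b_{21}c_{11} & b_{21}c_{12} & b_{22}c_{11} & b_{22}c_{12} \\
b_{21}c_{21} & b_{21}c_{22} & b_{22}c_{21} & b_{22}c_{22} \\
b_{21}c_{31} & b_{21}c_{32} & b_{22}c_{31} & b_{22}c_{32} \\
b_{31}c_{11} & b_{31}c_{12} & b_{32}c_{11} & b_{32}c_{12} \\
b_{31}c_{21} & b_{31}c_{22} & b_{32}c_{21} & b_{32}c_{22} \\
b_{31}c_{31} & b_{31}c_{32} & b_{32}c_{31} & b_{32}c_{32}
\end{bmatrix}
\]

The Tracy-Singh Product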

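Definition:

\[
B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{bmatrix}
\qquad
C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}
\]

\[
B \overset{\text{TS}}{\otimes} C =
\begin{bmatrix}
B_{11} \otimes C_{11} & B_{11} \otimes C_{12} & B_{12} \otimes C_{11} & B_{12} \otimes C_{12} \\
B_{11} \otimes C_{21} & B_{11} \otimes C_{22} & B_{12} \otimes C_{21} & B_{12} \otimes C_{22} \\
B_{21} \otimes C_{11} & B_{21} \otimes C_{12} & B_{22} \otimes C_{11} & B_{22} \otimes C_{12} \\
B_{21} \otimes C_{21} & B_{21} \otimes C_{22} & B_{22} \otimes C_{21} & B_{22} \otimes C_{22} \\
B_{31} \otimes C_{11} & B_{31} \otimes C_{12} & B_{32} \otimes C_{11} & B_{32} \otimes C_{12} \\
B_{31} \otimes C_{21} & B_{31} \otimes C_{22} & B_{32} \otimes C_{21} & B_{32} \otimes C_{22}
\end{bmatrix}
\]

D.S. Tracy and R.P. Singh (1972). “A New Matrix Product and Its Applications in Partitioned Matrices,” Statistica Neerlandica 26, 143–157.

The Khatri-Rao Product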

Definition:

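\[
B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{bmatrix}
\qquad
C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \\ C_{31} & C_{32} \end{bmatrix}
\]

\[
B \overset{\text{K-R}}{\otimes} C =
\begin{bmatrix}
B_{11} \otimes C_{11} & B_{12} \otimes C_{12} \\
B_{21} \otimes C_{21} & B_{22} \otimes C_{22} \\
B_{31} \otimes C_{31} & B_{32} \otimes C_{32}
\end{bmatrix}
\]

C.R. Rao and S.K. Mitra (1971). Generalized Inverse of Matrices and Applications, John Wiley and Sons, New York.
A. Smilde, R. Bro, and P. Geladi (2004). Multiway Analysis, John Wiley, Chichester, England.

The Khatri-Rao Product

Fact: $B \overset{\text{KR}}{\otimes} C$ is a submatrix of $B \overset{\text{TS}}{\otimes} C$

\[
B \overset{\text{TS}}{\otimes} C =
\begin{bmatrix}
B_{11} \otimes C_{11} & B_{11} \otimes C_{12} & B_{12} \otimes C_{11} & B_{12} \otimes C_{12} \\
B_{11} \otimes C_{21} & B_{11} \otimes C_{22} & B_{12} \otimes C_{21} & B_{12} \otimes C_{22} \\
B_{11} \otimes C_{31} & B_{11} \otimes C_{32} & B_{12} \otimes C_{31} & B_{12} \otimes C_{32} \\
B_{21} \otimes C_{11} & B_{21} \otimes C_{12} & B_{22} \otimes C_{11} & B_{22} \otimes C_{12} \\
B_{21} \otimes C_{21} & B_{21} \otimes C_{22} & B_{22} \otimes C_{21} & B_{22} \otimes C_{22} \\
B_{21} \otimes C_{31} & B_{21} \otimes C_{32} & B_{22} \otimes C_{31} & B_{22} \otimes C_{32} \\
B_{31} \otimes C_{11} & B_{31} \otimes C_{12} & B_{32} \otimes C_{11} & B_{32} \otimes C_{12} \\
B_{31} \otimes C_{21} & B_{31} \otimes C_{22} & B_{32} \otimes C_{21} & B_{32} \otimes C_{22} \\
B_{31} \otimes C_{31} & B_{31} \otimes C_{32} & B_{32} \otimes C_{31} & B_{32} \otimes C_{32}
\end{bmatrix}
\]

The Generalized Kronecker Product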

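\[
\begin{Bmatrix} B_1 \\ B_2 \\ B_3 \\ B_4 \end{Bmatrix}
\overset{\text{gen}}{\otimes} C
=
\begin{bmatrix}
B_1 \overset{\text{Left}}{\otimes} C(1,:) \\
B_2 \overset{\text{Left}}{\otimes} C(2,:) \\
B_3 \overset{\text{Left}}{\otimes} C(3,:) \\
B_4 \overset{\text{Left}}{\otimes} C(4,:)
\end{bmatrix}
\]

Regalia and Mitra (1989). “Kronecker Products, Unitary Matrices, and Signal Processing Applications,” SIAM Review 31, 586–613.

The Generalized Kronecker Product

\[
\begin{Bmatrix} B_1 \\ B_2 \\ B_3 \\ B_4 \end{Bmatrix}
\overset{\text{GEN}}{\otimes}
\begin{Bmatrix} C_1 \\ C_2 \end{Bmatrix}
=
\begin{Bmatrix}
\begin{Bmatrix} B_1 \\ B_2 \end{Bmatrix} \overset{\text{gen}}{\otimes} C_1 \\[2ex]
\begin{Bmatrix} B_3 \\ B_4 \end{Bmatrix} \overset{\text{gen}}{\otimes} C_2
\end{Bmatrix}
\]

The Strong Kronecker Product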

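A multiplication of block matrices, but with Kronecker products instead of matrix-matrix products, e.g.,

\[
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
\overset{\text{Strong}}{\otimes}
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}
=
\begin{bmatrix}
B_{11} \otimes C_{11} + B_{12} \otimes C_{21} & B_{11} \otimes C_{12} + B_{12} \otimes C_{22} \\
B_{21} \otimes C_{11} + B_{22} \otimes C_{21} & B_{21} \otimes C_{12} + B_{22} \otimes C_{22}
\end{bmatrix}
\]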

W. De Launey and J. Seberry (1994). “The Strong Kronecker Product,” Journal of Combinatorial Theory, Series A 66, 192–213.

The Symmetric Kronecker Product

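The KP turns matrix equations into vector equations:

\[
CXB^T = G \;\Leftrightarrow\; (B \otimes C)\,\mathrm{vec}(X) = \mathrm{vec}(G)
\]

The symmetric Kronecker product does the same thing for matrix equations with symmetric solutions:

\[
\tfrac{1}{2}\left(CXB^T + BXC^T\right) = G \ \text{(symmetric)}
\;\Leftrightarrow\;
\left(B \overset{\text{sym}}{\otimes} C\right) \mathrm{svec}(X) = \mathrm{svec}(G)
\]

where

\[
\mathrm{svec}\begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{12} & x_{22} & x_{23} \\ x_{13} & x_{23} & x_{33} \end{bmatrix}
= \begin{bmatrix} x_{11} & \sqrt{2}\,x_{12} & x_{22} & \sqrt{2}\,x_{13} & \sqrt{2}\,x_{23} & x_{33} \end{bmatrix}^T
\]

F. Alizadeh, J-P.A. Haeberly, and M.L. Overton (1998). “Primal-Dual Interior Point Methods for Semidefinite Programming: Convergence Rates, Stability, and Numerical Results,” SIAM J. Optimization 8, 746–768.

Symmetric Kronecker Product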

Fact: If

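\[
P =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & \alpha & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \alpha & 0 & 0 \\
0 & \alpha & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \alpha & 0 \\
0 & 0 & 0 & \alpha & 0 & 0 \\
0 & 0 & 0 & 0 & \alpha & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\qquad
\alpha = 1/\sqrt{2}
\]

then $\mathrm{vec}(X) = P \cdot \mathrm{svec}(X)$ and

\[
B \overset{\text{sym}}{\otimes} C = P^T (B \otimes C) P
\]

H.V. Henderson and S.R. Searle (1979). “Vec and Vech Operators for Matrices, with Some Uses in Jacobians and Multivariate Statistics,” The Canadian Journal of Statistics 7, 65–81.

Bi-Alternate Product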

\[
B \overset{\text{Bi-Alt}}{\otimes} C = \tfrac{1}{2}\,(B \otimes C + C \otimes B)
\]

W. Govaerts (2000). Numerical Methods for Bifurcations of Dynamical Equilibria, SIAM Publications, Philadelphia, PA.

The 2000’s

∞

Three Predictions

Big N Will Mean Big d Will Mean KP

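\[
A =
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\otimes
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
\otimes \cdots \otimes
\begin{bmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{bmatrix}
\]

\[
N = 2^d
\]

G. Beylkin and M.J. Mohlenkamp (2005). “Algorithms for Numerical Analysis in High Dimensions,” SIAM J. Scientific Computing 26, 2133–2159.

Inevitable: Block Tensor → →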

Tensor-level thinking will require an ability to spot KP’s. E.g., if for all $1 \le m_i \le n$ we have

\[
\mathcal{B}(m_1, m_2, m_3, m_4) = \sum_{i_1, i_2, i_3, i_4 = 1}^{n} W(i_1, m_1)\, Y(i_2, m_2)\, X(i_3, m_3)\, Z(i_4, m_4)\, \mathcal{A}(i_1, i_2, i_3, i_4)
\]

then

\[
B = (W \otimes Y)^T A\, (X \otimes Z)
\]

NSF Workshop on Future Directions in Tensor-Based Computation and Modeling, 2009. http://www.cs.cornell.edu/cv/TenWork/FinalReport.pdf.

Data-Sparse Approximate Factorizations

New KP-based factorizations will widen the set of solvable huge problems.

Sample factorization...

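\[
A \approx (B_1 \otimes C_1)(B_2 \otimes C_2)(B_3 \otimes C_3) \cdots
\]

Log(Det(A)) via Zehfuss

If

\[
A \approx (B \otimes C)(D \otimes E)(F \otimes G) \cdots
\]

then the big log det problem becomes a bunch of smaller ones...

\[
\log(\det(A)) \approx
\begin{aligned}[t]
& n_c \log(\det(B)) + n_b \log(\det(C)) \; + \\
& n_e \log(\det(D)) + n_d \log(\det(E)) \; + \\
& n_g \log(\det(F)) + n_f \log(\det(G)) \; + \cdots
\end{aligned}
\]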
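The step from the Z theorem to this sum can be written out for two factors. The sketch below assumes each small matrix has a positive determinant so that the logarithms are real, and writes $n_b, n_c, n_d, n_e$ for the orders of $B, C, D, E$.

```latex
\begin{align*}
\det(B \otimes C) &= \det(B)^{n_c}\,\det(C)^{n_b}
  && \text{(Z theorem)} \\
\log\det(B \otimes C) &= n_c \log\det(B) + n_b \log\det(C) \\
\log\det\bigl((B \otimes C)(D \otimes E)\bigr)
  &= \log\det(B \otimes C) + \log\det(D \otimes E)
  && \text{(determinant of a product)} \\
  &= n_c \log\det(B) + n_b \log\det(C) + n_e \log\det(D) + n_d \log\det(E)
\end{align*}
```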

R.P. Barry and R.K. Pace (1999). “Monte Carlo Estimates of the Log Determinant of Large Sparse Matrices,” Linear Algebra and Its Applications 289, 41–54.