Graph Notation for Arrays

Hans G. Ehrbar
Economics Department, University of Utah
1645 Campus Center Drive, Salt Lake City, UT 84112-9300, U.S.A.
+1 801 581 7797, [email protected]

July 24, 2000

Abstract

A graph-theoretical notation for array concatenation represents arrays as bubbles with arms sticking out, each arm with a specified number of "fingers." Bubbles with one arm are vectors, with two arms matrices, etc. Arrays can only hold hands, i.e., "contract" along a given pair of arms, if the arms have the same number of fingers. There are three array concatenations: outer product, contraction, and direct sum. Special arrays are the unit vectors and the diagonal array, which is the branching point of several arms. Outer products and contractions are independent of the order in which they are performed and distributive with respect to the direct sum. Examples are given where this notation clarifies mathematical proofs.

1 Introduction

Besides scalars, vectors, and matrices, also higher arrays are necessary in statistics; for instance, the "covariance matrix" of a random matrix is really an array of rank 4, etc. Usually, such higher arrays are avoided in the applied sciences because of the difficulty of writing them on a two-dimensional sheet of paper. The following symbolic notation makes the structure of arrays explicit without writing them down element by element. It is hoped that this makes arrays easier to understand, and that this notation leads to simple high-level user interfaces for programming languages manipulating arrays.

2 Informal Survey of the Notation

Each array is symbolized by a rectangular tile with arms sticking out, similar to a molecule. Tiles with one arm are vectors, those with two arms matrices, those with more arms are arrays of higher rank (or "valence" as in [SS35], [Mor73], and [MS86, p. 12]), and those without arms are scalars. The arrays considered here are rectangular, not "ragged"; therefore, in addition to their rank we only need to know the dimension of each arm, which can be thought of as the number of fingers associated with this arm. Arrays can only hold hands (i.e., "contract" along two arms) if the hands have the same number of fingers.

Sometimes it is convenient to write the dimension of each arm at the end of the arm, i.e., an m x n matrix A can be represented as m -A- n. Matrix products are represented by joining the obvious arms: if B is n x q, then the matrix product AB is m -A- n -B- q or, in short, -A- -B-. The notation allows the reader to always tell which arm is which, even if the arms are not marked. If m -C- r is m x r, then the product C⊤A is

  C⊤A = [tile (1): the tile for C turned by 180 degrees, its West arm joined to the West arm of A, leaving the arms r and n free]

In the second representation, the tile representing C is turned by 180 degrees. Since the white part of the frame of C is at the bottom, not on the top, one knows that the West arm of C, not its East arm, is concatenated with the West arm of A. The transpose of m -C- r is r -C- m, i.e., it is not a different entity but the same entity in a different position. The order in which the elements are arranged on the page (or in computer memory) is not a part of the definition of the array itself. Likewise, there is no distinction between row vectors and column vectors. Vectors are usually, but not necessarily, written in such a way that their arm points West (column vector convention). If a and b are vectors, their scalar product a⊤b is the concatenation a- -b, which has no free arms, i.e., it is a scalar, and their outer product ab⊤ is -a  b-, which is a matrix. Juxtaposition of tiles represents the outer product, i.e., the array consisting of all the products of elements of the arrays represented by the tiles placed side by side.

The trace of a square matrix Q is the concatenation of its two arms with each other, a closed loop through Q, which is a scalar since no arms are sticking out. In general, concatenation of two arms of the same tile represents contraction, i.e., summation over equal values of the indices associated with these two arms. This notation makes it obvious that tr XY = tr YX, because by definition there is no difference between the closed loop X-Y and the closed loop Y-X; turned or flipped versions of these loops likewise represent the same array (here an array of rank zero, i.e., a scalar).

Each of these tiles can be evaluated in essentially two different ways. One way is:

1. Juxtapose the tiles for X and Y, i.e., form their outer product, which is an array of rank 4 with typical element x_mp y_qn.
2. Connect the East arm of X with the West arm of Y. This is a contraction, resulting in an array of rank 2, the matrix product XY, with typical element ∑_p x_mp y_pn.
3. Now connect the West arm of X with the East arm of Y. The result of this second contraction is a scalar, the trace tr XY = ∑_{p,m} x_mp y_pm.

An alternative sequence of operations evaluating this same graph would be:

1. Juxtapose the tiles for X and Y.
2. Connect the West arm of X with the East arm of Y to get the matrix product YX.
3. Now connect the East arm of X with the West arm of Y to get tr YX.

The result is the same; the notation does not specify which of these alternative evaluation paths is meant, and a computer receiving commands based on this notation can choose the most efficient evaluation path. Probably the most efficient evaluation path is given by (13) below: take the element-by-element product of X with the transpose of Y, and add all the elements of the resulting matrix. If the user specifies tr(XY), the computer is locked into one evaluation path: it first has to compute the matrix product XY, even if X is a column vector and Y a row vector and it would be much more efficient to compute it as tr(YX), and then form the trace, i.e., throw away all off-diagonal elements. If the closed loop X-Y is specified instead, the computer can choose the most efficient of a number of different evaluation paths transparently to the user. This advantage of the graphical notation is of course even more important if the graphs are more complex.
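As a concrete illustration, here are the three evaluation paths for this closed loop in base R (a small sketch of my own, independent of the arca package described in Section 9); all yield the same scalar:

    set.seed(1)
    X <- matrix(rnorm(12), 3, 4)
    Y <- matrix(rnorm(12), 4, 3)
    sum(diag(X %*% Y))   # contract East-West first, then close the loop: tr(XY)
    sum(diag(Y %*% X))   # the other contraction order: tr(YX)
    sum(X * t(Y))        # element-by-element product, then add: the path of (13)

The third line never forms a matrix product at all, which is why it is usually the cheapest evaluation of this graph.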

There is also the "diagonal" array ∆, which in the case of rank 3 can be written

  [tile (2): a three-armed ∆, each arm of dimension n, branching out of one point]

or similar configurations. It has 1's down the main diagonal and 0's elsewhere. It can be used to construct the diagonal matrix diag(x) of a vector (the square matrix with the vector in the diagonal and zeros elsewhere) as

  diag(x) = [tile (3): ∆ with one arm saturated by x, two arms of dimension n free],

the diagonal vector of a square matrix (i.e., the vector containing its diagonal elements) as

  [tile (4): two arms of ∆ joined to the two arms of A, one arm of ∆ free],

and the "Hadamard product" (element-by-element product) of two vectors x ∗ y as

  x ∗ y = [tile (5): ∆ with two arms saturated by x and y, one arm free].

The graphical representation of all these different matrix operations uses only a small number of atomic operations, which will be enumerated in Section 3, and each such graph can be evaluated in a number of different ways. In principle, each graph can be evaluated as follows: form the outer product of all arrays involved, and then contract along all those pairs of arms which are connected. For practical implementations it is more efficient to develop functions which connect two arrays along one or several of their arms without first forming outer products, and to add the arrays one by one, performing all contractions as early as possible.
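These constructions are easy to check numerically. A minimal base-R sketch (again my own illustration, not the arca package), with the rank-3 diagonal array built by brute force:

    n <- 4
    Delta <- array(0, c(n, n, n))
    for (i in 1:n) Delta[i, i, i] <- 1     # 1 on the main diagonal, 0 elsewhere

    x <- rnorm(n); y <- rnorm(n)
    A <- matrix(rnorm(n * n), n, n)

    # (3): saturating one arm of Delta with x gives diag(x)
    D <- apply(Delta, c(1, 2), function(v) sum(v * x))
    all.equal(D, diag(x))                        # TRUE

    # (4): joining two arms of Delta to the two arms of A gives diag. vector
    d <- apply(Delta, 1, function(M) sum(M * A))
    all.equal(d, diag(A))                        # TRUE

    # (5): saturating two arms with x and y gives the Hadamard product
    h <- apply(Delta, 1, function(M) sum(M * outer(x, y)))
    all.equal(h, x * y)                          # TRUE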

3 Axiomatic Development of Array Operations

The following sketch shows how this axiom system might be built up. I apologize for presenting a half-finished theory, and thank the conference organizers and referees for allowing me to do this. Since I am an economist I do not plan to develop the material presented here any further. Others are invited to take over. If you are interested in working on this, I would be happy to hear from you; email me at [email protected].

There are two kinds of special arrays: unit vectors and diagonal arrays. For every natural number m ≥ 1, m unit vectors m -i (i = 1, ..., m) exist. Despite the fact that the unit vectors are denoted here by numbers, there is no intrinsic ordering among them; they might as well have the names "red, green, blue, ...". (From (9) and other axioms below it will follow that each unit vector can be represented as an m-vector with 1 as one of the components and 0 elsewhere.)

For every rank ≥ 1 and dimension n ≥ 1 there is a unique diagonal array denoted by ∆. Their main properties are (6) and (7). (This and the other axioms must be formulated in such a way that it will be possible to show that the diagonal arrays of rank 1 are the "vectors of ones" ι which have 1 in every component; diagonal arrays of rank 2 are the identity matrices; and for higher ranks, all arms of a diagonal array have the same dimension, and their ijk... element is 1 if i = j = k = ... and 0 otherwise.) Perhaps it makes sense to define the diagonal array of rank 0 and dimension n to be the scalar n, and to declare all arrays which are everywhere 0-dimensional to be diagonal.

There are only three operations of arrays: their outer product, represented by writing them side by side; contraction, represented by the joining of arms; and the direct sum, which will be defined now. The direct sum is the operation by which a vector can be built up from scalars, a matrix from its row or column vectors, an array of rank 3 from its layers, etc. The direct sum of a set of r similar arrays (i.e., arrays which have the same number of arms, and corresponding arms have the same dimensions) is an array which has one additional arm, called the reference arm of the direct sum. If one "saturates" the reference arm with the ith unit vector, one gets the ith original array back, and this property defines the direct sum uniquely:

  [tile: if the direct sum ⊕_{i=1}^r A_i = S, where each A_i has arms of dimensions m, n, q and S has the additional reference arm of dimension r, then saturating the reference arm of S with the unit vector i yields A_i]

It is impossible to tell which is the first summand and which the second; the direct sum is an operation defined on finite sets of arrays (where different elements of a set may be equal to each other in every respect but still have different identities).

There is a broad rule of associativity: the order in which outer products and contractions are performed does not matter, as long as, at the end, the right arms are connected with each other. And there are distributive rules involving (contracted) outer products and direct sums.

Additional rules apply for the special arrays. If two different diagonal arrays join arms, the result is again a diagonal array. For instance, the following three concatenations of diagonal three-way arrays are identical, and they all evaluate to the (for a given dimension) unique diagonal array of rank 4:

  [tile (6): the three possible ways of joining one arm of a rank-3 ∆ to one arm of another rank-3 ∆ all equal the rank-4 diagonal array]

The diagonal array of rank 2 is neutral under concatenation, i.e., it can be written

  n -∆- n = n --- n,   (7)

because attaching it to any array will not change this array. (6) and (7) make it possible to represent diagonal arrays simply as the branching points of several arms. This will make the array notation even simpler. However, in the present introductory article, all diagonal arrays will be shown explicitly, and the vector of ones will be denoted m -ι instead of m -∆ or perhaps m -δ.

Unit vectors concatenate as follows:

  i- m -j = 1 if i = j, and 0 otherwise,   (8)

and the direct sum of all unit vectors is the diagonal array of rank 2:

  ⊕_{i=1}^n i = n -∆- n = n --- n.   (9)

I am sure there will be modifications if one works it all out in detail, but if done right, the number of axioms should be fairly small. Element-by-element addition of arrays is not an axiom because it can be derived: if one saturates the reference arm of a direct sum with the vector of ones, one gets the element-by-element sum of the arrays in this direct sum. Multiplication of an array by a scalar is also contained in the above system of axioms: it is simply the outer product with an array of rank zero.

Problem 1. Show that the saturation of an arm of a diagonal array with the vector of ones is the same as dropping this arm.

Answer. Since the vector of ones is the diagonal array of rank 1, this is a special case of the general concatenation rule for diagonal arrays.

Problem 2. Show that the diagonal matrix of the vector of ones is the identity matrix, i.e.,

  [tile (10): ∆ with one arm saturated by ι] = n --- n.

Answer. In view of (7), this is a special case of Problem 1.

Problem 3. A trivial array operation is the addition of an arm of dimension 1; for instance, this is how an n-vector can be turned into an n x 1 matrix. Is this operation contained in the above system of axioms?

Answer. It is a special case of the direct sum: the direct sum of one array only, the only effect of which is the addition of the reference arm.
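A quick numerical illustration of (8), (9), and the derived element-by-element addition, in base R with the reference arm realized as the last array index (my own encoding, for illustration only):

    n <- 3
    u <- diag(n)                 # columns of the identity are the unit vectors
    # (8): u[, i]' u[, j] is 1 if i = j and 0 otherwise
    crossprod(u[, 1], u[, 1])    # 1
    crossprod(u[, 1], u[, 2])    # 0
    # (9): stacking all n unit vectors along a reference arm gives the
    # rank-2 diagonal array, i.e. the identity matrix
    all.equal(sapply(1:n, function(i) u[, i]), diag(n))   # TRUE
    # Derived addition: saturating the reference arm of a direct sum
    # with the vector of ones yields the element-by-element sum
    A1 <- matrix(rnorm(4), 2); A2 <- matrix(rnorm(4), 2)
    DS <- array(c(A1, A2), c(2, 2, 2))     # direct sum, reference arm last
    all.equal(DS[, , 2], A2)               # unit vector 2 returns A2
    all.equal(apply(DS, c(1, 2), sum), A1 + A2)   # vector of ones sums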

From (9) and (7) it follows that every array of rank k can be represented as a direct sum of arrays of rank k - 1, and recursively, as iterated direct sums of those scalars which one gets by saturating all arms with unit vectors. Hence the following "extensionality property": if the arrays A and B are such that for all possible conformable choices of unit vectors κ₁, ..., κ₈

  [tile (11): A with all eight arms saturated by κ₁, ..., κ₈] = [B saturated in the same way],

then A = B. This is why the saturation of an array with unit vectors can be considered one of its "elements," i.e.,

  [tile (12): A saturated with κ₁, ..., κ₈] = a_{κ₁κ₂κ₃κ₄κ₅κ₆κ₇κ₈}.

From (8) and (9) follows that the concatenation of two arrays by joining one or more pairs of arms consists in forming all possible products and summing over those subscripts (arms) which are joined to each other. For instance, if

  m -A- n -B- r = m -C- r,

then c_µρ = ∑_{ν=1}^n a_µν b_νρ. This is one of the most basic facts if one thinks of arrays as collections of elements. From this point of view, the proposed notation is simply a graphical elaboration of Einstein's summation convention. But in the holistic approach taken by the proposed system of axioms, which is informed by category theory, it is an implication; it comes at the end, not the beginning.

Instead of considering arrays as bags filled with elements, with the associated false problem of specifying the order in which the elements are packed into the bag, this notation and system of axioms consider each array as an abstract entity, associated with a certain finite graph. These entities can be operated on as specified in the axioms, but the only time they lose their abstract character is when they are fully saturated, i.e., concatenated with each other in such a way that no free arms are left: in this case they become scalars. An array of rank 1 is not the same as a vector, although it can be represented as a vector, after an ordering of its elements has been specified. This ordering is not part of the definition of the array itself. (Some vectors, such as time series, have an intrinsic ordering, but I am speaking here of the simplest case where they do not.) Also the ordering of the arms is not specified, and the order in which a set of arrays is packed into its direct sum is not specified either. These axioms therefore make a strict distinction between the abstract entities themselves (which the user is interested in) and their various representations (which the computer worries about).

Maybe the following examples clarify these points. If you specify a set of colors as {red, green, blue}, then this representation has an ordering built in: red comes first, then green, then blue. However, this ordering is not part of the definition of the set; {green, red, blue} is the same set. The two notations are two different representations of the same set. Another example: mathematicians usually distinguish between the outer products A ⊗ B and B ⊗ A; there is a "natural isomorphism" between them, but they are two different objects. In the system of axioms proposed here these two notations are two different representations of the same object, as in the set example. This object is represented by a graph which has A and B as nodes, but it is not apparent from this graph which node comes first. Interesting conceptual issues are involved here. The proposed axioms are quite different from those of, e.g., [Mor73].
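Section 2 asked for functions which connect two arrays along one or several of their arms without first forming their outer product. The following contract() is a sketch of such a routine in base R; it is a hypothetical helper of my own (not the arca() function of Section 9), which reshapes both arrays so that a single matrix multiplication performs all requested contractions:

    contract <- function(A, B, armsA, armsB) {
      dA <- dim(A); dB <- dim(B)
      freeA <- setdiff(seq_along(dA), armsA)
      freeB <- setdiff(seq_along(dB), armsB)
      stopifnot(all(dA[armsA] == dB[armsB]))  # arms need equal "fingers"
      # move contracted arms last in A and first in B, then multiply
      M <- matrix(aperm(A, c(freeA, armsA)), prod(dA[freeA]))
      N <- matrix(aperm(B, c(armsB, freeB)), prod(dB[armsB]))
      array(M %*% N, c(dA[freeA], dB[freeB]))
    }
    A <- array(rnorm(24), c(2, 3, 4))
    B <- array(rnorm(12), c(3, 4))
    # join arm 2 of A with arm 1 of B and arm 3 of A with arm 2 of B:
    # c_mu = sum over nu, rho of a[mu, nu, rho] * b[nu, rho]
    C <- contract(A, B, c(2, 3), c(1, 2))
    all.equal(C[1], sum(A[1, , ] * B))   # TRUE

With matrices, contract(A, B, 2, 1) reproduces A %*% B; the point of the sketch is that any number of arm pairs can be joined in one call, as the graphs require.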

Problem 4. The trace of the product of two matrices can be written as tr(XY) = ι⊤(X ∗ Y⊤)ι. Use tile notation to show that this indeed gives tr(XY).

Answer. In analogy with (5), the Hadamard product of the two matrices X and Z, i.e., their element-by-element multiplication, is

  X ∗ Z = [tile: a ∆ on each side, each with one arm joined to X and one arm joined to Z].

If Z = Y⊤, one gets

  X ∗ Y⊤ = [the same tile with the tile for Y turned by 180 degrees].

Therefore one gets, using (10):

  ι⊤(X ∗ Y⊤)ι = [tile: the ∆'s saturated with ι drop away, leaving the closed loop X-Y] = tr(XY).   (13)

4 An Additional Notational Detail

Besides turning a tile by 90, 180, or 270 degrees, the notation proposed here also allows one to flip the tile over. The flipped tile (here drawn without its arms) is simply the original tile laid on its face; i.e., those parts of the frame which are black on the side visible to the reader are white on the opposite side and vice versa. If one flips a tile, the arms appear in a mirror-symmetric manner. For a matrix, flipping over is equivalent to turning by 180 degrees, i.e., there is no difference between the tile for the matrix A and its flipped-and-turned version. Since sometimes one and sometimes the other notation seems more natural, both will be used. For higher arrays, flipping over arranges the arms in a different fashion, which is sometimes convenient in order to keep the graphs uncluttered. It will be especially useful for differentiation. If one allows turning in 90-degree increments and flipping, each array can be represented in eight different positions, as shown here with a hypothetical array of rank 3:

  [figure: the same rank-3 tile, with arms of dimensions k, m, and n, shown in its eight positions]

The black-and-white pattern at the edge of the tile indicates whether and how much the tile has been turned and/or flipped over, so that one can keep track which arm is which. In the above example, the arm with dimension k will always be called the West arm, whatever position the tile is in.

5 Equality of Arrays and Extended Substitution

Given the flexibility of representing the same array in various positions for concatenation, specific conventions are necessary to determine when two such arrays in generalized positions are equal to each other. Expressions in which the tile on one side of the equal sign is turned or flipped relative to the tile on the other side are not allowed. The arms on both sides of the equal sign must be parallel, in order to make it clear which arm corresponds to which. A permissible way to write such identities is therefore

  [A, arms parallel to those of B] = [B]   and   [K] = [K, arms parallel].

One additional benefit of this tile notation is the ability to substitute arrays with different numbers of arms into an equation. This is also a necessity, since the number of possible arms is unbounded. This multiplicity can only be coped with because each arm in an identity written in this notation can be replaced by a bundle of many arms.

Extended substitution also makes it possible to extend definitions familiar from matrices to higher arrays. For instance, we want to be able to say that the array Ω is symmetric if and only if Ω equals its own flipped tile. This notion of symmetry is not limited to arrays of rank 2. The arms of this array may symbolize not just a single arm, but whole bundles of arms; for instance, an array of the form Σ satisfying [Σ = Σ flipped over its bundled arms] is symmetric according to this definition, and so is every scalar. Also notions familiar from matrices, such as that of a nonnegative definite matrix or of a matrix inverse or generalized inverse, can be extended to arrays in this way.

6 Vectorization and Kronecker Product

One conventional, generally accepted method to deal with arrays of rank > 2 is the Kronecker product. If A and B are both matrices, then their outer product in tile notation is

  [tile (14): A and B side by side, no arms joined].

Since this is an array of rank 4, there is no natural way to write its elements down on a sheet of paper. This is where the Kronecker product steps in. The Kronecker product of two matrices is their outer product written again as a matrix. Its definition includes a protocol for how to arrange the elements of an array of rank 4 as a matrix. Alongside the Kronecker product, also the vectorization operator is useful, which is a protocol for how to arrange the elements of a matrix as a vector, and also the so-called "commutation matrices" may become necessary. Here are the relevant definitions:

6.1 Vectorization of a Matrix

If A is a matrix, then vec(A) is the vector obtained by stacking its column vectors on top of each other, i.e.,

  if A = (a₁ ... aₙ), then vec(A) = (a₁; ...; aₙ).   (15)

The vectorization of a matrix is merely a different arrangement of the elements of the matrix on paper, just as the transpose of a matrix.

Problem 5. Show that tr(B⊤C) = (vec B)⊤ vec C.

Answer. Both sides are ∑_{j,i} b_ji c_ji. (26) is a proof in tile notation which does not have to look at the matrices involved element by element.
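In R, vec is simply as.vector applied to a matrix, which already stacks the columns; this also allows a numerical check of Problem 5 (a small illustration of my own):

    B <- matrix(rnorm(12), 3, 4)
    C <- matrix(rnorm(12), 3, 4)
    all.equal(as.vector(B), c(B[, 1], B[, 2], B[, 3], B[, 4]))           # TRUE
    # Problem 5: tr(B'C) = (vec B)'(vec C); both sides are sum of b_ji c_ji
    all.equal(sum(diag(t(B) %*% C)), sum(as.vector(B) * as.vector(C)))   # TRUE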

6.2 Kronecker Product of Matrices

Let A and B be two matrices, say A is m x n and B is r x q. Their Kronecker product A ⊗ B is the mr x nq matrix which in partitioned form can be written

  A ⊗ B = [ a₁₁B ... a₁ₙB ; ... ; a_m1 B ... a_mn B ].   (16)

This convention of how to write the elements of an array of rank 4 as a matrix is not symmetric, so that usually A ⊗ C ≠ C ⊗ A. Both Kronecker products represent the same abstract array, but they arrange it differently on the page. However, in many other respects, the Kronecker product maintains the properties of outer products.

Problem 6. [JHG+88, p. 965] Show that

  vec(ABC) = (C⊤ ⊗ A) vec(B).   (17)

Answer. Assume A is k x m, B is m x n, and C is n x p. Write A = (a₁⊤; ...; a_k⊤) in terms of its rows and B = (b₁ ... bₙ) in terms of its columns. Then

  (C⊤ ⊗ A) vec B = [ c₁₁A c₂₁A ... c_n1 A ; c₁₂A c₂₂A ... c_n2 A ; ... ; c₁ₚA c₂ₚA ... c_np A ] (b₁; ...; bₙ),

a vector of kp components whose θ-th block of k components (θ = 1, ..., p) is ∑_ν c_νθ A b_ν, with i-th entry ∑_ν c_νθ a_i⊤ b_ν. One obtains the same result by vectorizing the matrix ABC = A (b₁ ... bₙ) C: the θ-th column of ABC is A ∑_ν b_ν c_νθ, whose i-th entry is again ∑_ν c_νθ a_i⊤ b_ν.
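A one-line numerical spot-check of (17) in base R, where kronecker and as.vector implement ⊗ and vec:

    A <- matrix(rnorm(6), 2, 3)    # k x m
    B <- matrix(rnorm(12), 3, 4)   # m x n
    C <- matrix(rnorm(8), 4, 2)    # n x p
    all.equal(as.vector(A %*% B %*% C),
              as.vector(kronecker(t(C), A) %*% as.vector(B)))   # TRUE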

The main challenge in this automatic proof is to fit the many matrix rows, columns, and single elements involved on the same sheet of paper. Among the shuffling of matrix entries, it is easy to lose track of how the result comes about. Later, in equation (27), a compact and intelligible proof will be given in tile notation.

6.3 The Commutation Matrix

Besides the Kronecker product and the vectorization operator, also the "commutation matrix" [MN88, pp. 46/7], [Mag88, p. 35] is needed for certain operations involving arrays of higher rank. Assume A is m x n. Then the commutation matrix K^(m,n) is the mn x mn matrix which transforms vec A into vec(A⊤):

  K^(m,n) vec A = vec(A⊤)   (18)

The main property of the commutation matrix is that it allows one to commute the Kronecker product: for any m x n matrix A and r x q matrix B,

  K^(r,m) (A ⊗ B) K^(n,q) = B ⊗ A   (19)
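The commutation matrix is easy to build directly from its defining property (18); the following commutation() helper is my own naming, not from any package:

    commutation <- function(m, n) {
      # entry a_ij sits at position (j-1)m + i of vec A and at
      # position (i-1)n + j of vec A'
      K <- matrix(0, m * n, m * n)
      for (i in 1:m) for (j in 1:n) K[(i - 1) * n + j, (j - 1) * m + i] <- 1
      K
    }
    A <- matrix(rnorm(6), 2, 3)
    all.equal(commutation(2, 3) %*% as.vector(A),
              cbind(as.vector(t(A))))            # (18) holds
    commutation(2, 3)                            # reproduces (20) below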

Problem 7. Use (18) to compute K^(2,3).

Answer.

  K^(2,3) = [ 1 0 0 0 0 0
              0 0 1 0 0 0
              0 0 0 0 1 0
              0 1 0 0 0 0
              0 0 0 1 0 0
              0 0 0 0 0 1 ]   (20)

6.4 Kronecker Product and Vectorization in Tile Notation

The Kronecker product of m -A- n and r -B- q is the following concatenation of A and B with members of a certain family of three-way arrays Π(i, j):

  [tile (21): A ⊗ B = a Π-array whose Northeast arm is joined to A and whose Southeast arm is joined to B, with a mirror-image Π-array closing the other side; the West arms, of dimensions mr and nq, remain free]

Strictly speaking we should have written Π^(m,r) and Π^(n,q) for the two Π-arrays in (21), but the superscripts can be inferred from the context: the first superscript is the dimension of the Northeast arm, and the second that of the Southeast arm.

Vectorization uses a member of the same family, Π^(m,n), to convert the matrix n -A- m into the vector

  vec A = [tile (22): Π^(m,n) with its two East arms joined to the two arms of A, its West arm of dimension mn free]

This equation is a little awkward because the A here is an n x m matrix, while elsewhere it is an m x n matrix. It would have been more consistent with the lexicographical ordering used in the Kronecker product to define vectorization as the stacking of the row vectors; then some of the formulas would have looked more natural.

The array Π^(m,n), with West arm of dimension mn and East arms of dimensions m and n, exists for every m ≥ 1 and n ≥ 1. The dimension of the West arm is always the product of the dimensions of the two East arms. The elements of Π^(m,n) will be given in (28) below; but first I will list three important properties of these arrays and give examples of their application.

First of all, each Π^(m,n) satisfies

  [tile (23): two copies of Π^(m,n), joined along their West arms of dimension mn, equal the outer product of the m x m identity and the n x n identity]

Let us discuss the meaning of (23) in detail. The lefthand side of (23) shows the concatenation of two copies of the three-way array Π^(m,n) in a certain way that yields a 4-way array. Now look at the righthand side. The arm m --- m by itself (which was bent only in order to remove any doubt about which arm to the left of the equal sign corresponds to which arm to the right) represents the neutral element under concatenation (i.e., the m x m identity matrix). Writing two arrays next to each other without joining any arms represents their outer product, i.e., the array whose rank is the sum of the ranks of the arrays involved, and whose elements are all possible products of elements of the first array with elements of the second array.

The second identity satisfied by Π^(m,n) is

  [tile (24): two copies of Π^(m,n), joined along both pairs of East arms, equal the mn x mn identity matrix]

Finally, there is also associativity:

  [tile (25): splitting an arm of dimension mnp first into mn and p and then the mn-arm into m and n gives the same array as splitting it first into m and np and then the np-arm into n and p]

Here is the answer to Problem 5 in tile notation:

  tr B⊤C = [closed loop B-C] = [B and C each joined to a pair of Π^(m,n) arrays which meet along their mn arms] = (vec B)⊤ vec C   (26)

Equation (23) was central for obtaining the result. The answer to Problem 6 also relies on equation (23):

  (C⊤ ⊗ A) vec B = [tile chain: the Π-arrays of (21) and (22) around A, B, and C] = [after applying (23) twice, a single Π-array joined to the matrix product ABC] = vec ABC   (27)

6.5 Looking Inside the Kronecker Arrays

It is necessary to open up the arrays from the Π-family and look at them "element by element," in order to verify (21), (22), (23), (24), and (25). The elements of Π^(m,n), which can be written in tile notation by saturating the array with unit vectors, are

  π^(m,n)_θµν = [θ -Π with its East arms saturated by µ and ν] = 1 if θ = (µ - 1)n + ν, and 0 otherwise.   (28)

Note that for every θ there is exactly one µ and one ν such that π^(m,n)_θµν = 1; for all other values of µ and ν, π^(m,n)_θµν = 0.
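The Π-arrays can likewise be opened up in code. A base-R sketch following (28), checking (22) and (24) for m = 2, n = 3 (Pi() is my own helper name):

    Pi <- function(m, n) {
      P <- array(0, c(m * n, m, n))
      for (mu in 1:m) for (nu in 1:n) P[(mu - 1) * n + nu, mu, nu] <- 1
      P
    }
    P <- Pi(2, 3)                 # dims mn x m x n = 6 x 2 x 3
    # (22): contracting the two East arms with an n x m matrix A gives vec A
    A <- matrix(rnorm(6), 3, 2)
    all.equal(apply(P, 1, function(S) sum(S * t(A))), as.vector(A))  # TRUE
    # (24): joining both pairs of East arms gives the mn x mn identity
    G <- matrix(0, 6, 6)
    for (s in 1:6) for (t in 1:6) G[s, t] <- sum(P[s, , ] * P[t, , ])
    all.equal(G, diag(6))                                            # TRUE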

Writing ν -A- µ = a_νµ and θ -vec A = c_θ, (22) reads

  c_θ = ∑_{µ,ν} π^(m,n)_θµν a_νµ,   (29)

which coincides with definition (15) of vec A. One also checks that (21) is (16). Calling A ⊗ B = C, it follows from (21) that

  c_φθ = ∑_{µ,ν,ρ,κ} π^(m,r)_φµρ a_µν b_ρκ π^(n,q)_θνκ.   (30)

For 1 ≤ φ ≤ r one gets a nonzero π^(m,r)_φµρ only for µ = 1 and ρ = φ, and for 1 ≤ θ ≤ q one gets a nonzero π^(n,q)_θνκ only for ν = 1 and κ = θ. Therefore c_φθ = a₁₁ b_φθ for all elements of the matrix C with φ ≤ r and θ ≤ q. Etc.

The proof of (23) uses the fact that for every θ there is exactly one µ and one ν such that π^(m,n)_θµν ≠ 0:

  ∑_{θ=1}^{mn} π^(m,n)_θµν π^(m,n)_θωσ = 1 if µ = ω and ν = σ, and 0 otherwise.   (31)

Similarly, (24) and (25) can be shown by elementary but tedious proofs. The best verification of these rules is their implementation in a computer language; see Section 9 below.

6.6 The Commutation Matrix in Tile Notation

The simplest way to represent the commutation matrix K^(m,n) in a tile is

  K^(m,n) = [tile (32): Π^(m,n) and Π^(n,m) joined along both pairs of East arms, the two West arms of dimension mn free].

This should not be confused with the lefthand side of (24): K^(m,n) is composed of Π^(m,n) on its West and Π^(n,m) on its East side, while (24) contains Π^(m,n) twice. We will therefore use the following representation, mathematically equivalent to (32), which makes it easier to see the effects of K^(m,n):

  K^(m,n) = [tile (33): the same concatenation, drawn with the two East-arm connections crossing each other].

Problem 8. Using the definition (33), show that K^(m,n) K^(n,m) = I_mn, the mn x mn identity matrix.

Answer. You will need (23) and (24).

Problem 9. Prove (19) in tile notation.

Answer. Start with a tile representation of K^(r,m) (A ⊗ B) K^(n,q): a chain of Π-arrays with A and B in the middle. Now use (23) twice to cancel the two inner pairs of Π-arrays against each other; what remains is A and B concatenated with the two outer Π-arrays with crossed arms, which by (21) is precisely the tile representation of B ⊗ A.
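Both problems can at least be spot-checked numerically, reusing the commutation() helper from the sketch in Section 6.3:

    m <- 2; n <- 3
    # Problem 8: K(m,n) K(n,m) = I_mn
    all.equal(commutation(m, n) %*% commutation(n, m), diag(m * n))   # TRUE
    # Problem 9 / (19): K(r,m) (A (x) B) K(n,q) = B (x) A
    r <- 4; q <- 2
    A <- matrix(rnorm(m * n), m, n)
    B <- matrix(rnorm(r * q), r, q)
    all.equal(commutation(r, m) %*% kronecker(A, B) %*% commutation(n, q),
              kronecker(B, A))                                        # TRUE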

7 Higher Moments of Random Vectors

7.1 Identically Distributed, not Necessarily Normal

Let ε be a random vector of independent variables ε_i with zero expected value E[ε_i] = 0 and identical second and fourth moments. Call var[ε_i] = σ² and E[ε_i⁴] = σ⁴(γ₂ + 3), where γ₂ is the kurtosis. Then the following holds for the fourth moments:

  E[ε_i ε_j ε_k ε_l] =
    σ⁴(γ₂ + 3)  if i = j = k = l,
    σ⁴          if i = j ≠ k = l, or i = k ≠ j = l, or i = l ≠ j = k,
    0           otherwise.   (34)

It is not an accident that (34) is given element by element and not in matrix notation. It is not possible to do this, not even with the Kronecker product. But it is easy in tile notation:

  [tile (35): E of the four-armed array εε ⊗ εε equals σ⁴ times each of the three possible pairings of the four arms by unit matrices, plus γ₂σ⁴ times the rank-4 diagonal array ∆]

Element by element, (35) says E[ε_i ε_j ε_k ε_l] = σ⁴(δ_ij δ_kl + δ_ik δ_jl + δ_il δ_jk) + γ₂σ⁴ δ_ijkl, which reproduces (34) case by case.

Problem 10. [Seb77, pp. 14-16 and 52] Show that for any symmetric n x n matrices A and B, whose vectors of diagonal elements are a and b,

  E[(ε⊤Aε)(ε⊤Bε)] = σ⁴( tr A tr B + 2 tr(AB) + γ₂ a⊤b ).   (36)

Answer. (36) is an immediate consequence of (35); this step is now trivial due to linearity of the expected value: saturating the pairs of arms of (35) with A and B gives four closed graphs. The first term is tr AB. The second is tr AB⊤, but since A and B are symmetric, this is equal to tr AB. The third term is tr A tr B. What is the fourth term? Diagonal arrays exist with any number of arms, and any connected concatenation of diagonal arrays is again a diagonal array, see (6). For instance,

  [tile (37): two rank-3 diagonal arrays joined along one arm equal the rank-4 diagonal array].

From this, together with (4), one can see that the fourth term is the scalar product of the diagonal vectors of A and B.
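A Monte Carlo spot-check of (36), using iid centered uniform errors, for which σ² = 1/12 and γ₂ = -6/5 (an illustration of my own; agreement is only up to simulation noise):

    set.seed(7)
    n <- 3; N <- 2e5
    A <- crossprod(matrix(rnorm(n * n), n))   # symmetric
    B <- crossprod(matrix(rnorm(n * n), n))   # symmetric
    eps <- matrix(runif(n * N) - 0.5, n, N)   # iid centered uniform columns
    lhs <- mean(colSums(eps * (A %*% eps)) * colSums(eps * (B %*% eps)))
    s2 <- 1 / 12; g2 <- -6 / 5
    rhs <- s2^2 * (sum(diag(A)) * sum(diag(B)) + 2 * sum(diag(A %*% B)) +
                   g2 * sum(diag(A) * diag(B)))
    c(lhs, rhs)   # approximately equal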

7.2 Multivariate Normal Distribution

I will give a brief overview in tile notation of the higher moments of the multivariate standard normal z. All odd moments disappear, and the fourth moments are

  [tile (38): E of the four-armed array zz ⊗ zz equals the sum of the three possible pairings of the four arms by unit matrices]

i.e., E[z_i z_j z_k z_l] = δ_ij δ_kl + δ_ik δ_jl + δ_il δ_jk. Compared with (35), the last term, which depends on the kurtosis, is missing. What remains is a sum of outer products of unit matrices, with every possibility appearing exactly once. In the present case, it happens to be possible to write down the four-way arrays in (38) in terms of Kronecker products and the commutation matrix K^(n,n) introduced in (19): it is

  E[(zz⊤) ⊗ (zz⊤)] = I_{n²} + K^(n,n) + (vec I_n)(vec I_n)⊤.   (39)

Compare [Gra83, 10.9.2 on p. 361]. Here is a proof of (39) in tile notation:

  [tile (40): equation (38) wrapped on both sides in the Π-arrays which translate four-armed arrays into mn x mn matrices, giving three terms]

The first term is I_{n²} due to (24), the second is K^(n,n) due to (33), and the third is (vec I_n)(vec I_n)⊤ because of (22). Graybill [Gra83, p. 312] considers it a justification of the interest of the commutation matrix that it appears in the higher moments of the standard normal. In my view, the commutation matrix is ubiquitous only because the Kronecker notation blows up something as trivial as the crossing of two arms into a mysterious-sounding special matrix.

The sixth moments of the standard normal, in analogy to the fourth, are the sum of all the different possible outer products of unit matrices:

  [tile (41): E of the six-armed array zz ⊗ zz ⊗ zz equals the sum of the 15 possible pairings of the six arms by unit matrices]

Here is the principle of how these were written down: fix one branch, here the Southwest branch. First combine the Southwest branch with the Northwest one, and then you have three possibilities to pair up the others as in (38). Next combine the Southwest branch with the North branch, and you again have three possibilities for the others. Etc. This gives 15 possibilities altogether.
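A simulation check of (39) for n = 2 (reusing the commutation() sketch from Section 6.3; slow but simple):

    set.seed(42)
    n <- 2; N <- 1e5
    M <- matrix(0, n^2, n^2)
    for (r in 1:N) {
      z <- rnorm(n); zz <- tcrossprod(z)    # zz'
      M <- M + kronecker(zz, zz)
    }
    M / N                        # compare, entry by entry, with:
    diag(n^2) + commutation(n, n) + tcrossprod(as.vector(diag(n)))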

This can no longer be written as a Kronecker product; see [Gra83, 10.9.4 (3) on p. 363]. However, (41) can be applied directly, for instance in order to show (42), which is [Gra83, 10.9.12 (1) on p. 368]:

  E[(z⊤Az)(z⊤Bz)(z⊤Cz)] = tr(A)tr(B)tr(C) + 2tr(A)tr(BC) + 2tr(B)tr(AC) + 2tr(C)tr(AB) + 8tr(ABC).   (42)
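Before the tile proof below, (42) can be spot-checked by simulation (symmetric A, B, C; sixth moments converge slowly, so expect agreement only to a few percent):

    set.seed(9)
    n <- 3; N <- 3e5
    A <- crossprod(matrix(rnorm(n^2), n))
    B <- crossprod(matrix(rnorm(n^2), n))
    C <- crossprod(matrix(rnorm(n^2), n))
    z <- matrix(rnorm(n * N), n, N)
    lhs <- mean(colSums(z * (A %*% z)) * colSums(z * (B %*% z)) *
                colSums(z * (C %*% z)))
    tr <- function(M) sum(diag(M))
    rhs <- tr(A) * tr(B) * tr(C) + 2 * tr(A) * tr(B %*% C) +
           2 * tr(B) * tr(A %*% C) + 2 * tr(C) * tr(A %*% B) +
           8 * tr(A %*% B %*% C)
    c(lhs, rhs)   # approximately equal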

Problem 11. Assuming that A, B, and C are symmetric matrices, prove (42) in tile notation.

Answer. Saturating the three pairs of arms of the sixth-moment array (41) with A, B, and C gives, by linearity of the expected value,

  [tile (43): E[(z⊤Az)(z⊤Bz)(z⊤Cz)] equals the sum of 15 closed graphs, each connecting A, B, and C through unit matrices according to one of the 15 pairings]

These 15 summands are, in order, tr(B)tr(AC), tr(ABC) twice, tr(B)tr(AC), tr(ABC) four times, tr(A)tr(BC), tr(ABC) twice, tr(A)tr(BC), tr(C)tr(AB) twice, and tr(A)tr(B)tr(C). Collecting terms gives (42).

8 Array Differentiation

Here is the tile notation for matrix differentiation: if n -y depends on m -x, then the derivative

  [tile (44): ∂y/∂x, a tile with a free arm of dimension n and a free arm of dimension m]

is that array which satisfies

  [tile (45): ∂y/∂x concatenated with dx along the m-arm equals dy],

i.e.,

  (∂y/∂x⊤) dx = dy.   (46)

Extended substitutability applies here: y and x are not necessarily vectors; the arms with dimension m and n can represent different bundles of several arms.

The simplest matrix differentiation rule, for y = w⊤x, is

  ∂w⊤x/∂x⊤ = w⊤.   (47)

In tiles it is

  [tile (48): the derivative of the concatenation w-x with respect to x is w].

Here is the most basic matrix differentiation rule: if y = Ax is a linear vector function, then its derivative is that same linear vector function:

  ∂Ax/∂x⊤ = A,   (49)

or in tiles

  [tile (50): the derivative of the concatenation A-x with respect to x is A].

Problem 12. Show that

  ∂ tr(AX)/∂X⊤ = A.   (51)

In tiles it reads

  [tile (52): the derivative of the closed loop A-X with respect to X is A].

Answer. tr(AX) = ∑_{i,j} a_ij x_ji, i.e., the coefficient of x_ji is a_ij.

Here is a differentiation rule for the derivative of a matrix with respect to a matrix, first written element by element, and then in tiles: if Y = AXB, i.e., y_im = ∑_{j,k} a_ij x_jk b_km, then

  ∂y_im/∂x_jk = a_ij b_km,

because for every fixed i and m this sum contains only one term which has x_jk in it, namely a_ij x_jk b_km. In tiles:

  [tile (53): the derivative of the chain A-X-B with respect to X is A and B side by side, the arms formerly joined to X now free].
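A numerical derivative confirms (51): perturbing x_ji changes tr(AX) at rate a_ij, so the derivative arranged like X is A⊤ (illustration only):

    set.seed(3)
    A <- matrix(rnorm(6), 2, 3)   # 2 x 3
    X <- matrix(rnorm(6), 3, 2)   # 3 x 2
    h <- 1e-6
    num <- matrix(0, 3, 2)
    for (j in 1:3) for (i in 1:2) {
      Xp <- X; Xp[j, i] <- Xp[j, i] + h
      num[j, i] <- (sum(diag(A %*% Xp)) - sum(diag(A %*% X))) / h
    }
    all.equal(num, t(A), tolerance = 1e-6)   # TRUE up to rounding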

Equations (52) and (53) can be obtained from (48) and (50) by extended substitution, since a bundle of several arms can always be considered as one arm. For instance, (52) can be written

  [tile: the rule (52) with the two arms of X bundled into one arm],

and this is a special case of (48), since the two parallel arms can be treated as one arm. With a better development of the logic underlying this notation, it will not be necessary to formulate them as separate theorems; all matrix differentiation rules given so far are trivial applications of (50).

Here is one of the basic differentiation rules for a bilinear array concatenation: if

  [tile (54): y = a rank-3 array A with two of its arms saturated by x],

then

  [tile (55): ∂y/∂x = A with x attached to one of these arms, plus A with x attached to the other].

Proof. y_i = ∑_{j,k} a_ijk x_j x_k. For a given i, this has x_p² in the term a_ipp x_p², and it has x_p in the terms a_ipk x_p x_k where p ≠ k, and in a_ijp x_j x_p where j ≠ p. The derivatives of these terms are 2 a_ipp x_p + ∑_{k≠p} a_ipk x_k + ∑_{j≠p} a_ijp x_j, which simplifies to ∑_k a_ipk x_k + ∑_j a_ijp x_j. This is the (i, p) element of the array on the righthand side of (55).

But there are also other ways to have the array X occur twice in a concatenation Y. If Y = X⊤X, then y_ik = ∑_j x_ji x_jk, and therefore ∂y_ik/∂x_lm = 0 if m ≠ i and m ≠ k. Now assume m = i ≠ k: ∂y_ik/∂x_li = ∂(x_li x_lk)/∂x_li = x_lk. Now assume m = k ≠ i: ∂y_ik/∂x_lk = ∂(x_li x_lk)/∂x_lk = x_li. And if m = k = i, then one gets the sum of the two above: ∂y_ii/∂x_li = ∂x_li²/∂x_li = 2 x_li. In tiles this is

  [tile (56): the derivative of X⊤X with respect to X is X in one position plus X in the mirror-image position].

This rule is helpful for differentiating the multivariate Normal likelihood function.

A computer implementation of this tile notation should contain algorithms to automatically take the derivatives of these array concatenations.
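A numerical check of (56) at a single entry (the index choices are arbitrary):

    # for Y = X'X:  dy_ik/dx_lm = (m == i) x_lk + (m == k) x_li
    set.seed(4)
    X <- matrix(rnorm(12), 4, 3)
    h <- 1e-6
    i <- 2; k <- 3; l <- 1; m <- 2        # here m == i and m != k
    Xp <- X; Xp[l, m] <- Xp[l, m] + h
    num <- (crossprod(Xp)[i, k] - crossprod(X)[i, k]) / h
    exact <- (m == i) * X[l, k] + (m == k) * X[l, i]
    c(num, exact)    # approximately equal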

9 Internet Resources

The TeX macros used to typeset the tiles are available at www.econ.utah.edu/ehrbar/arca.sty.

My Econometrics class notes, www.econ.utah.edu/ehrbar/ecmet.pdf (5 Megabytes), contain more examples of this notation.

A pilot implementation of this type of array concatenation in R is available as an R package on my web site. It can be downloaded using the following R command:

    install.packages("arca", contriburl = "http://www.econ.utah.edu/ehrbar/R",
                     lib = "/usr/lib/R/library")

(you may need a different lib argument for your system). Besides special functions building the Π-arrays for Kronecker products, this package has one function, arca, which takes as arguments several arrays, together with a vector indicating which arms of which arrays are to be joined together. Several contractions can be specified in the same function call. This is not a production version; I merely used it to check the identities in Section 6.

18 References

[Gra83] Franklin A. Graybill. Matrices with Applications in Statistics. Wadsworth and Brooks/Cole, Pacific Grove, CA, second edition, 1983.

[JHG+88] George G. Judge, R. Carter Hill, William E. Griffiths, Helmut Lütkepohl, and Tsoung-Chao Lee. Introduction to the Theory and Practice of Econometrics. Wiley, New York, second edition, 1988.

[Mag88] Jan R. Magnus. Linear Structures. Oxford University Press, New York, 1988.

[MN88] Jan R. Magnus and Heinz Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley, Chichester, 1988.

[Mor73] Trenchard More, Jr. Axioms and theorems for a theory of arrays. IBM Journal of Research and Development, 17(2):135–175, March 1973.

[MS86] Parry Hiram Moon and Domina Eberle Spencer. Theory of Holors; A Generalization of Tensors. Cambridge University Press, 1986.

[Seb77] G. A. F. Seber. Linear Regression Analysis. Wiley, New York, 1977.

[SS35] J. A. Schouten and Dirk J. Struik. Einführung in die neueren Methoden der Differentialgeometrie, volume I. 1935.
