<<

Constructing Matrices using Power Series

David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. Created: December 11, 2007 Last Modified: December 29, 2007

Contents

1 Introduction 3 1.1 Power Series of Functions...... 3 1.2 Power Series Involving Matrices...... 3 1.3 Reduction of the Power Series...... 4 1.4 The Cayley-Hamilton Theorem...... 7 1.5 Ordinary Difference Equations...... 7 1.6 Generating Rotation Matrices from Skew-Symmetric Matrices...... 8

2 Matrices in 2D9

3 Rotations Matrices in 3D 11

4 Rotations Matrices in 4D 12 4.1 The Case δ = 0...... 13 4.2 The Case δ 6= 0...... 13 4.2.1 The Case ρ > 0...... 14 4.2.2 The Case ρ = 0...... 15 4.3 Summary of the Formulas...... 16 4.4 Source Code for Random Generation...... 17

5 Fixed Points, Invariant Spaces, and Direct Sums 19 5.1 Fixed Points...... 19 5.2 Invariant Spaces...... 19

1 5.3 Direct Sums...... 20 5.4 Rotations in 4D...... 21 5.4.1 The Case (d, e, f) = (0, 0, 0)...... 22 5.4.2 The Case (a, b, c) = (0, 0, 0)...... 23 5.4.3 The Case (d, e, f) 6= 0, (a, b, c) 6= 0, and δ = 0...... 24 5.4.4 The Case (d, e, f) 6= 0, (a, b, c) 6= 0, and δ 6= 0...... 25 5.4.5 A Summary of the Factorizations...... 27

References 29

2 1 Introduction

Although we tend to work with rotation matrices in two or three dimensions, sometimes the question arises about how to generate rotation matrices in arbitrary dimensions. This document describes a method for computing rotation matrices using power series of matrices. The approach is one you see in an undergraduate mathematics course on solving systems of linear differential equations with constant coefficients; for example, see [Bra83, Chapter 3].

1.1 Power Series of Functions

The natural exponential function is introduced in Calculus, namely, exp(x) = ex, where the base is (approx- . imately) e = 2.718281828. The function may be written as a power series

∞ x2 x3 xk X xk exp(x) = 1 + x + + + ··· + + ··· = (1) 2! 3! k! k! k=0

The power series is known to converge for any real√ number x. The formula extends to complex numbers z = x + iy, where x and y are real numbers and i = −1, ez = ex+iy = exeiy = exp(x) (cos(y) + i sin(y)) (2) The term exp(x) may be written as a power series using Equation (1). The trigonometric terms also have power series representations, ∞ y3 y5 X y2k+1 sin(y) = y − + − · · · = (−1)k (3) 3! 5! (2k + 1)! k=0 ∞ y2 y4 X y2k cos(y) = 1 − + − · · · = (−1)k (4) 2! 4! (2k)! k=0 The latter two power series also converge for any y.

1.2 Power Series Involving Matrices

The power series representations extend to functions whose inputs are square matrices rather than scalars, taking us into the realm of matrix algebra; for example, see [HJ85]. That is, exp(M), cos(M), and sin(M) are power series of the M, and they converge for all M. In particular, we are interested in exp(M) for square matrix M. A note of caution is in order. The -valued exponential function exp(x) has various properties that do not immediately extend to the matrix-valued function. For example, we know that exp(a + b) = ea+b = eaeb = ebea = eb+a = exp(b + a) (5) for any scalars a and b. This formula does not apply to all pairs of matrices A and B. The problem is that is not commutative, so reversing the order of terms exp(A) and exp(B) in the products generally produces different values. It is true, though that exp(A + B) = exp(A) exp(B), when AB = BA (6)

3 That is, as long as A and B commute in the multiplication, the usual property of power-of-sum-equals- product-of-powers applies. The power series for exp(M) of a square matrix M is formally defined as ∞ M 2 M k X M k exp(M) = I + M + + ··· + + ··· = (7) 2! k! k! k=0 and converges for all M. The first term, I, is the of the same size as M. The immediate question is how one goes about computing the power series. That is one of the goals of this document and is described next.

1.3 Reduction of the Matrix Power Series

Suppose that the n × n matrix M is diagonalizable; that is, suppose we may factor M = PDP −1, where −1 D = Diag(d1, . . . , dn) is a matrix and P is an with inverse P . The matrix k k −1 k k k powers are easily shown to be M = PD P , where D = Diag(d1 , . . . , dn). The power series factors as

P∞ M k P∞ Dk −1 exp(M) = k=0 k! = k=0 P k! P P∞ Dk  −1 = P k=0 k! P  k k  (8) P∞ d1 P∞ dn −1 = P Diag k=0 k! ,..., k=0 k! P = P Diag ed1 , . . . , edn  P −1

The expression is valid regardless of whether P and D have real-valued or complex-valued entries. For the purpose of numerical computation, when D has a real-valued entry, dj, then exp(dj) may be computed using any standard mathematics library. Naturally, the computation is only an approximation. When D has a complex-valued entry, dj = xj + iyj, then Equation (2) may be used to obtain exp(dj) = exp(xj)(cos(yj) + i sin(yj)). All of exp(xj), cos(yj), and sin(yj) may be computed using any standard mathematics library. The problem, though, is that not all matrices M are diagonalizable. A special case that arises often is a real-valued M. Such matrices have only real-valued eigenvalues and a complete of orthonormal eigenvectors. The eigenvalues are the diagonal entries of D and the corresponding eigenvectors become the columns of P . The orthonormality of the eigenvectors guarantees that P is orthogonal, so P −1 = P T (the is the inverse). A more general result is discussed in [HS74, Chapter 6], which applies to all matrices. A brief summary is provided here. A real-valued matrix S that is diagonalizable is said to be semisimple. A real-valued matrix N is nilpotent if there exists a power p ≥ 1 for which N p = 0. Generally, a real-valued matrix M may be uniquely decomposed as M = S + N, where S is real-valued and semisimple, N is real-valued and nilpotent, and −1 SN = NS. Because S is diagonalizable, we may factor it is S = PDP , where D = Diag(d1, . . . , dn) is a and P is invertible. What this means regarding the exponential function is

p−1 X N k exp(M) = exp(S + N) = exp(S) exp(N) = P Diag ed1 , . . . , edn  P −1 (9) k! k=0 The key here is that the power series for exp(N) is really a finite sum because N is nilpotent. Equation (9) shows us how to compute exp(M) for any matrix M; in particular, the equation may be implemented on a computer. Notice that in the special case of a symmetric matrix M, it must be that M = S and N = 0.

4 The discussion here allows for complex numbers. In particular, P and D might have complex-valued entries. −1 Instead, we may factor the n × n matrix S = QEQ as follows. Let S have r real-valued eigenvalues λ1 through λr and c real-valued eigenvalues α1 +iβ1 through αc +iβc, where n = r+c. Because S is real-valued, the complex-valued eigenvalues appear in pairs, αj + iβj and αj − iβj; thus, c is even, say, c = 2m. We may construct a basis of vectors for IRn that become the columns of the matrix Q and for which the matrix E has the block-diagonal form   λ1    ..   .       λr         α1 β1  E =   (10)      −β1 α1       ···         αm βm      −βm αm

Specifically, let Ui be a real-valued eigenvector corresponding to the real-valued eigenvalue λi for 1 ≤ i ≤ r. Let Vj = Xj +iYj be a complex-valued eigenvector corresponding to the complex-valued eigenvalue αj +iβj for 1 ≤ j ≤ c. The numbers α and β are real-valued and the vectors X and Y have real-valued components. Because S is real-valued, when µ is a complex-valued eigenvalue with eigenvector V, the conjugateµ ¯ has eigenvector V¯ , which is the componentwise conjugation of V. The matrix P has columns that are the eigenvectors, h i P = U1 ··· Ur V1 V¯ 1 ··· Vm V¯ m (11) To verify the factorization, h i SP = S U1 ··· Ur V1 V¯ 1 ··· Vm V¯ m h i = SU1 ··· SUr SV1 SV¯ 1 ··· SVm SV¯ m h i = λ1U1 ··· λrUr (α1 + iβ1)V1 (α1 − iβ1)V¯ 1 ··· (αm + iβm)Vm (αm − iβm)V¯ m = PD (12) where D is the diagonal matrix of eigenvalues. Applying the inverse of P to both sides of the equation produces S = PDP −1. Let µ = α + iβ be a complex-valued eigenvalue of S. Let V = X + iY be a complex-valued eigenvector for µ. We may write SV = (α + iβ)V in terms of real parts and imaginary parts,

SX + iSY = (αX − βY) + i(βX + αY) (13)

Equating real parts and imaginary parts,

SX = αX − βY,SY = βX + αY (14)

5 Writing this in block-matrix form,   h i h i α β S XY = XY   (15) −β α

Eigenvectors corresponding to two distinct eigenvalues are linearly independent. To see this, let e1 and e2 be distinct eigenvalues with corresponding eigenvectors W1 and W2, respectively. To demonstrate , we must show that 0 = c1W1 + c2W2 (16) implies c1 = c2 = 0. Applying S to Equation (16) leads to

0 = c1SW1 + c2SW2 = c1e1W1 + c2e2W2 (17)

Multiplying Equation (16) by e1 and subtracting from Equation (17) produces

0 = c2(e2 − e1)W2 (18)

The eigenvalues are distinct, so e2 − e1 6= 0. Eigenvectors are nonzero, so the only possibility to satisfy Equation (18) is c2 = 0, which in turn implies 0 = c1W1 and c1 = 0. This result implies that V and V¯ are linearly independent vectors, and consequently X = (V + V¯ )/2 and Y = (V − V¯ )/(2i) are linearly independent vectors. In the matrix P of Equation (11), we may replace each pair of complex-valued eigenvectors V and V¯ by the real-valued vectors X and Y to obtain a real-valued and invertible matrix h i Q = U1 ··· Ur X1 Y1 ··· Xm Ym (19) Using Equation (15), it may be verified that S = QEQ−1. The exponential of E has the block-diagonal form   eλ1    ..   .     λ   e r         cos β1 sin β1  exp(E) =  eα1  (20)      − sin β1 cos β1       ···         cos βm sin βm   eαm    − sin βm cos βm This follows from   αk βk Ek =   = αkI + βkJ (21) −βk αk where     1 0 0 1 I =   ,J =   ,IJ = JI (22) 0 1 −1 0

6 Because I and J commute in their product,   cos βk sin βk αk exp(Ek) = exp(αkI + βkJ) = exp(αkI) exp(βkJ) = e   (23) − sin βk cos βk

The matrix p−1 X N k exp(M) = Q exp(E)Q−1 (24) k! k=0 is therefore computed using only real-valued matrices. The heart of the construction relies on factoring M = S + N, computing the eigenvalues of S, and computing an orthogonal basis for IRn from S.

1.4 The Cayley-Hamilton Theorem

This is also a useful result that allows a reduction of the power series for exp(M) to a finite sum. The eigenvalues of an n × n matrix M are the roots to the characteristic polynomial

n n X k p(t) = det(M − tI) = p0 + p1t + ··· + pnt = pkt (25) k=0

n where I is the identity matrix. The degree of p is n and the coefficients are p0 through pn = (−1) . The characteristic equation is p(t) = 0. The Cayley-Hamilton Theorem states that if you formally substitute M into the characteristic equation, you obtain equality,

n n X k 0 = p(M) = p0I + p1M + ··· + M = pkM (26) k=0 Multiplying this equation by powers of M and reducing everytime the largest power is a multiple of n allows a reduction of the power series for exp(M) to

∞ n−1 X M k X exp(M) = = c M k (27) k! k k=0 k=0

for some coefficients c0 through cn−1. These coefficients are necessarily functions of the characteristic coef- ficients pj for 0 ≤ j ≤ n but the relationships are at first glance complicated to determine in closed form. This leads us into the topic of ordinary difference equations.

1.5 Ordinary Difference Equations

This is a topic whose theory parallels that of ordinary differential equations. It is a large topic that is not discussed in full detail here; a summary of it is provided in [Ebe03, Appendix D]. Generally, one has a ∞ sequence of numbers {xk}k=0 and a relationship that determines a term in the sequence from the previous n terms: xk+n = f(xk, ··· , xk+n−1), k ≥ 0 (28)

7 This is an explicit equation for xk+n. The equation is said to have degree n ≥ 1. The initial conditions are user-specified x0 through xn−1. An implicit equation is

F (xk, ··· , xk+n−1, xk+n) = 0, k ≥ 0 (29)

and are generally more difficult to solve because the xk+n term might occur in such a manner as to prevent solving for it explicitly. The subtopic of interest here is how to solve ordinary difference equations that are linear and have constant coefficients. The general equation of this form is written with all terms on one side of the equation as if it were implicit, but this is still an explicit equation because we may solve for xk+n,

pnxk+n + pn−1xk+n−1 + . . . p1xk+1 + p0xk = 0 (30) where p0 through pn are constants with pn 6= 0. Notice that the coefficients may be used to form the n n characteristic polynomial described in the previous section, p(t) = p0 + p1t + ··· + pnt with pn = (−1) . The method of solution for the linear equation with constant coefficients involves computing the roots of the characteristic equation. Let t1 through t` be the distinct roots of p(t) = 0. Let each root tj have multiplicity mj; necessarily, n = m1 +···+m`. The Fundamental Theorem of Algebra states that the polynomial factors into a product, ` Y mj p(t) = (t − tj) (31) j=1

Equation (30) has mj linearly independent solutions corresponding to tj, namely,

k k mj −1 k xk = tj , xk = ktj ,, ··· , xk = k tj (32)

This is the case even when tj is a complex-valued root. The general solution to Equation (30) is

` mj −1 ! X X s k xk = cjsk tj , k ≥ n (33) j=1 s=0 where the n coefficients cjs are determined from the initial conditions which are the user-specified values x0 through xn−1.

As it turns out, the finite difference equations and constructions apply equally as well when the xk are matrices. In this case, the cjs of Equation (33) are themselves matrices.

1.6 Generating Rotation Matrices from Skew-Symmetric Matrices

Equation (6) shows how to exponentiate a sum of matrices. As noted, exp(A + B) = exp(A) exp(B) as long as AB = BA. In particular, it is the case that I = exp(0) = exp(A + (−A)) = exp(A) exp(−A), where I is the identity matrix. Thus, we have the formula for inversion of an exponential of a matrix,

exp(A)−1 = exp(−A) (34)

The transpose of a sum of matrices is the sum of the transposes of the matrices, a property that also extends to the exponential power series. A consequence is

exp(A)T = exp AT (35)

8 Now consider a skew-symmetric matrix S. Such a matrix has the property ST = −S. Define R = exp(S). Using Equations (34) and (35),

RT = exp(S)T = exp ST = exp(−S) = exp(S)−1 = R−1 (36)

Because the inverse and transpose of R are the same matrix, R must be an . Although in the realm of advanced mathematics, it may be shown that in fact R is a . The essence of the argument is that the space of orthogonal matrices has two connected components, one for which the is 1 (rotation matrices) and one for which the determinant is −1 (reflection matrices). Each component is path connected. The curve of orthogonal matrices exp(tS) for t ∈ [0, 1] is a path connecting I (the case t = 0) and R = exp(S) (the case t = 1), so R and I must have the same determinant, which is 1, and R is therefore a rotation matrix.

2 Rotations Matrices in 2D

The general skew-symmetric matrix in 2D is   0 −1 S = θ   (37) 1 0 where θ is any real-valued number. The corresponding rotation matrix is R = exp(S). The characteristic equation for S is 0 = p(t) = det(S − tI) = t2 + θ2 (38) The Cayley-Hamilton Theorem guarantees that

S2 + θ2I = 0 (39) so that S2 = −θ2I. Higher powers of S are S3 = −θ2S, S4 = −θ2S2 = θ4I, and so on. Substituting these into the power series for exp(S) and grouping together the terms involving I and S produces

R = exp(S) S2 S3 S4 S5 = I + S + 2! + 3! + 4! + 5! + ··· θ2 θ2 θ4 θ5 = I + S − 2! I − 3! S + 4! I + 5! S − · · ·  2 4   2 4  = 1 − θ + θ − · · · I + 1 − θ + θ − · · · S 2! 4! 3! 5! (40)  sin(θ)  = cos(θ)I + θ S

  cos(θ) − sin(θ) =   sin(θ) cos(θ)

The rotation matrix may√ be computed using the eigendecomposition method of Equation (8). The eigenvalues of S are ±iθ, where i = −1. Corresponding eigenvectors are (±i, 1). The matrices D, P , and P −1 in the

9 decomposition are       iθ 0 i −i −1 1 1 i D =   ,P =   ,P =   (41) 0 −iθ 1 1 2i −1 i It is easily verified that S = PDP −1. The rotation matrix is computed to be

R = P exp(D)P −1

      iθ i −i e 0 1 1 i =     2i   1 1 0 e−iθ −1 i

  (42) eiθ +e−iθ eiθ −e−iθ 2 − 2i =   eiθ −e−iθ eiθ +e−iθ 2i 2

  cos(θ) − sin(θ) =   sin(θ) cos(θ) where we have used the identities e±iθ = cos(θ) ± i sin(θ). The rotation matrix may also be computed using ordinary finite differences. The linear difference equation is 2 Xk+2 + θ Xk = 0, k ≥ 0 (43) X0 = I,X1 = S

k We know from the construction that S = Xk. Equation (33) is

k k Xk = (iθ) C0 + (−iθ) C1 (44) for unknown coefficient matrices C0 and C1. The initial conditions imply

I = C0 + C1 (45) S = iθC0 − iθC1 and have the solution I−(i/θ)S C0 = 2 (46) I+(i/θ)S C1 = 2 The solution to the linear difference equation is I − (i/θ)S I + (i/θ)S Sk = X = (iθ)k + (−iθ)k (47) k 2 2 When k is even, say, k = 2p, Equation (47) reduces to

S2p = (−1)pθ2pI, p ≥ 1 (48)

10 When k is odd, say, k = 2p + 1, Equation (47) reduces to

S2p+1 = (−1)pθ2pS, p ≥ 1 (49)

But these powers of S are exactly what we computed manually and substituted into the power series for exp(S) to produce Equation (40).

3 Rotations Matrices in 3D

The general skew-symmetric matrix in 3D is   0 a b     S = θ  −a 0 c  (50)   −b −c 0 where θ, a, b, and c are any real-valued numbers with a2 + b2 + c2 = 1. The corresponding rotation matrix is R = exp(S). The characteristic equation for S is

0 = p(t) = det(S − tI) = −t3 − θ2t (51)

The Cayley-Hamilton Theorem guarantees that

− S3 − θ2S = 0 (52) so that S3 = −θ2S. Higher powers of S are S4 = −θ2S2, S5 = −θ2S3 = θ4S, S6 = −θ2S2, and so on. Substituting these into the power series for exp(S) and grouping together the terms involving I, S, and S2 produces R = exp(S) S2 S3 S4 S5 S6 = I + S + 2! + 3! + 4! + 5! + 6! + ··· 1 2 θ2 θ2 2 θ4 θ4 2 = I + S + 2! S − 3! S − 4! S + 5! S + 6! S − · · · (53)  θ2 θ4   1 θ2 θ4  2 = I + 1 − 3! + 5! − · · · S + 2! − 4! + 6! − · · · S  sin(θ)   1−cos(θ)  2 = I + θ S + θ2 S

If we define Sˆ = S/θ, then R = I + sin(θ)Sˆ + (1 − cos(θ))Sˆ2 (54) which is Rodrigues’ Formula for a rotation matrix. The angle of rotation is θ and the axis of rotation has unit-length direction (−c, b, −a). The rotation matrix may also be computed using ordinary finite differences. The linear difference equation is 2 Xk+3 + θ Xk = 0, k ≥ 0 (55) 2 X0 = I,X1 = S, X2 = S

11 k We know from the construction that S = Xk. The roots of the characteristic equation are 1 and ±iθ. Equation (33) is k k k k S = Xk = 1 C0 + (iθ) C1 + (−iθ) C2 (56) for unknown coefficient matrices C0, C1, and C2. The initial conditions imply

I = C0 + C1 + C2

S = C0 + iθC1 − iθC2 (57) 2 2 2 S = C0 − θ C1 − θ C2 and have the solution S2+θ2I C0 = 1+θ2 (I−C0)−(i/θ)(S−C0) (58) C1 = 2

(I−C0)+(i/θ)(S−C0) C2 = 2 When k is odd, say, k = 2p + 1, Equation (56) reduces to the following using Equation (58)

S2p+1 = (−1)pθ2pS, p ≥ 1 (59)

When k is even, say, k = 2p + 2, Equation (56) reduces to the following using Equation (58)

S2p+2 = (−1)pθ2pS2, p ≥ 1 (60)

But these powers of S are exactly what we computed manually and substituted into the power series for exp(S) to produce Equation (53).

4 Rotations Matrices in 4D

The general skew-symmetric matrix in 4D is   0 a b d      −a 0 c e  S = θ   (61)    −b −c 0 f    −d −e −f 0

where θ, a, b, c, d, e, and f are any real-valued numbers with a2 +b2 +c2 +d2 +e2 +f 2 = 1. The corresponding rotation matrix is R = exp(S). The characteristic equation for S is

0 = p(t) = det(S − tI) = t4 + θ2t2 + θ4(af − be + cd)2 = t4 + θ2t2 + θ4δ2 (62)

where δ = af − be + cd. The Cayley-Hamilton Theorem guarantees that

S4 + θ2S2 + θ4δ2 = 0 (63)

Computing closed-form expressions for powers of S is simple, much like previous dimensions, when δ = 0. It is more difficult when δ 6= 0. Let us consider the cases separately. In both cases, we assume that θ 6= 0, otherwise S = 0 and the rotation matrix is the identity matrix.

12 4.1 The Case δ = 0

Suppose that δ = af − be + cd = 0; then S4 + θ2S2 = 0. The closed-form expressions for powers of S are easy to generate. Specifically, S2p+2 = (−1)pθ2pS2, p ≥ 1 (64) S2p+3 = (−1)pθ2pS3, p ≥ 1 This leads to the power series reduction

1 − cos(θ) θ − sin(θ) R = exp(S) = I + S + S2 + S3 (65) θ2 θ3

It is possible for further reduction depending on the entries of S. For example, if d = e = f = 0, we effectively have the 3D rotation in (x, y, z) embedded in 4D; the w-component is unchanged by the rotation. The matrix S really satisfies S3 + θ2S = 0, in which case we may replace S3 = −θ2S in Equation (65) and obtain Equation (53). The fact that S is a “root” of a 3rd-degree polynomial when the characteristic polynomial is 4th-degree is related to the linear algebraic concept of minimal polynomial. Another reduction occurs, for example, when b = c = d = e = f = 0. The matrix S satisfies S2 + θ2I = 0 and Equation (65) reduces to Equation (40).

4.2 The Case δ 6= 0

Suppose that δ = af − be + cd 6= 0. The characteristic polynomial t4 + θ2t2 + θ4δ2 is a quadratic polynomial in t2. We may use the quadratic formula to obtain its roots, √ √ ! −θ2 ± θ4 − 4θ4δ2 1 ± 1 − 4δ2 t2 = = −θ2 (66) 2 2

Notice that

1 − 4δ2 = (a2 + b2 + c2 + d2 + e2 + f 2) − 4(af − be + cd)2 = (a − f)2 + (b + e)2 + (c − d)2 (a + f)2 + (b − e)2 + (c + d)2 (67) = ρ2 ≥ 0 where ρ = p[(a − f)2 + (b + e)2 + (c − d)2] [(a + f)2 + (b − e)2 + (c + d)2] (68) Thus, the right-hand side of Equation (66) is real-valued. Moreover, 1 − 4δ2 < 1, so the right-hand side of Equation (66) consists of two negative real numbers. Taking the square roots produces the t-roots for the characteristic equation, all having zero real part,

r1 − ρ r1 + ρ t = ±iθ =: ±iα, ±iθ =: ±iβ (69) 2 2 The values α and β are defined by these equations, both values being positive real numbers with α ≤ β.

13 4.2.1 The Case ρ > 0

Using the linear difference equation approach, the powers of S are k k k k k S = (iα) C0 + (−iα) C1 + (iβ) C2 + (−iβ) C3 (70) where the C-matrices are determined by the initial conditions

I = C0 + C1 + C2 + C3

S = iαC0 − iαC1 + iβC2 − iβC3 (71) 2 2 2 2 2 S = −α C0 − α C1 − β C2 − β C3 3 3 3 3 3 S = −iα C0 + iα C1 − iβ C2 + iβ C3 The solution is αβ3I−iβ3S+αβS2−iβS3 C0 = 2αβ(β2−α2) αβ3I+iβ3S+αβS2+iβS3 C1 = 2 2 2αβ(β −α ) (72) −α3βI+iα3S−αβS2+iαS3 C2 = 2αβ(β2−α2) −α3βI−iα3S−αβS2−iαS3 C3 = 2αβ(β2−α2)

We may use Equation (70) to compute four consecutive powers of S, namely,

4p 2 2 4p 2 2 4p α (S +β I)−β (S +α I) S = β2−α2 4p 3 2 4p 3 2 4p+1 α (S +β S)−β (S +α S) S = β2−α2 4p+2 2 2 4p+2 2 2 (73) 4p+2 −α (S +β I)−β (S +α I) S = β2−α2 4p+2 3 2 4p+2 3 2 4p+3 −α (S +β S)−β (S +α S) S = β2−α2 for p ≥ 0. The exponential of S factors to

P∞ Sk exp(S) = k=0 k! P∞ S4p P∞ S4p+1 P∞ S4p+2 P∞ S4p+3 = p=0 (4p)! + p=0 (4p+1)! + p=0 (4p+2)! + p=0 (4p+3)! 1 hP∞ α4p  2 2 P∞ β4p  2 2 = β2−α2 p=0 (4p)! (S + β I) − p=0 (4p)! (S + α I)+ (74) P∞ α4p  3 2 P∞ β4p  3 2 p=0 (4p+1)! (S + β S) − p=0 (4p+1)! (S + α S)− P∞ α4p+2  2 2 P∞ β4p+2  2 2 p=0 (4p+2)! (S + β I) + p=0 (4p+2)! (S + α I)− P∞ α4p+2  3 2 P∞ β4p+2  3 2 i p=0 (4p+3)! (S + β S) + p=0 (4p+3)! (S + α S)

This expansion appears to be quite complicated, but the following identities allow us to simplify the expres- sion. Each equation below defines the function fj(x),

P∞ x4p 1 x −x 1 f0(x) = p=0 (4p)! = 4 (e + e ) + 2 cos(x) P∞ x4p+1 1 x −x 1 f1(x) = = (e − e ) + sin(x) p=0 (4p+1)! 4 2 (75) P∞ x4p+2 1 x −x 1 f2(x) = p=0 (4p+2)! = 4 (e + e ) − 2 cos(x) P∞ x4p+3 1 x −x 1 f3(x) = p=0 (4p+3)! = 4 (e − e ) − 2 sin(x)

14 0 0 0 0 Notice that f0(x) = f1(x), f1(x) = f2(x), f2(x) = f3(x), and f3(x) = f0(x). It is also easy to verify that exp(x) = f0(x) + f1(x) + f2(x) + f3(x). Equation (74) becomes

1  2 2 2 2 R = exp(S) = β2−α2 f0(α)(S + β I) − f0(β)(S + α I)+ f1(α) 3 2 f1(β) 3 2 α (S + β S) − β (S + α S)− 2 2 2 2 f2(α)(S + β I) + f2(β)(S + α I)− i (76) f3(α) 3 2 f3(β) 3 2 α (S + β S) + β (S + α S)  β2 cos(α)−α2 cos(β)   β2 sin(α)/α−α2 sin(β)/β  = β2−α2 I + β2−α2 S+  cos(α)−cos(β)  2  sin(α)/α−sin(β)/β  3 β2−α2 S + β2−α2 S

4.2.2 The Case ρ = 0

Let ρ = 0. Based on Equation (67), the only time this can happen is√ when (a, b, c) = ±(f, −e, d). This 2 2 2 2 condition implies√ a + b + c = 1/2 and δ = 1/4. The t-roots are ±iθ/ 2, each one having multiplicity 2. Define α = θ/ 2. Using the linear difference equation approach, the powers of S are

k k k k k S = (iα) C0 + (−iα) C1 + k(iα) C2 + k(−iα) C3 (77)

where the C-matrices are determined by the initial conditions

I = C0 + C1

S = iα(C0 − C1 + C2 − C3) (78) 2 2 S = −α (C0 + C1 + 2C2 + 2C3) 3 3 S = −iα (C0 − C1 + 3C2 − 3C3)

The solution is 3 C = I − 3√iS − √iS 0 2 2 2 θ 2 θ3 3 C = I + 3√iS + √iS 1 2 2 2 θ 2 θ3 2 3 (79) C = − I + √iS − S + √iS 2 4 2 2 θ 2θ2 2 θ3 2 3 C = − I − √iS − S − √iS 3 4 2 2 θ 2θ2 2 θ3

We may use Equation (77) to compute four consecutive powers of S, namely,

4p 4p 4p  I S2  S = α I − 4pα 2 + 2α2 4p+1 4p+1 S 4p+1  S S3  S = α − 4pα + 3 α 2α 2α (80) 4p+2 4p+2 S2 4p+2  I S2  S = α α2 + 4pα 2 + 2α2 4p+3 4p+3 S3 4p+3  S S3  S = α α3 + 4pα 2α + 2α3

15 for p ≥ 0. Define Sˆ = S/α. The exponential of S factors to

P∞ Sk exp(S) = k=0 k! P∞ S4p P∞ S4p+1 P∞ S4p+2 P∞ S4p+3 = p=0 (4p)! + p=0 (4p+1)! + p=0 (4p+2)! + p=0 (4p+3)! P∞ α4p  P∞ 4pα4p   I Sˆ2  = p=0 (4p)! I − p=0 (4p)! 2 + 2 +  4p+1   4p+1   3  (81) P∞ α ˆ P∞ 4pα Sˆ Sˆ p=0 (4p+1)! S − p=0 (4p+1)! 2 + 2 +  4p+2   4p+2   2  P∞ α ˆ2 P∞ 4pα I Sˆ p=0 (4p+2)! S + p=0 (4p+2)! 2 + 2 +  4p+3   4p+3   3  P∞ α ˆ3 P∞ 4pα Sˆ Sˆ p=0 (4p+3)! S + p=0 (4p+3)! 2 + 2

We may simplify this using the definitions for fj(x) and the following identities,

P∞ 4px4p g0(x) = p=0 (4p)! = xf3(x) P∞ 4px4p+1 g1(x) = = xf0(x) − f1(x) p=0 (4p+1)! (82) P∞ 4px4p+2 g2(x) = p=0 (4p+2)! = xf1(x) − 2f2(x) P∞ 4px4p+3 g3(x) = p=0 (4p+3)! = xf2(x) − 3f3(x)

0 0 0 Notice that g0(x) = g1(x), g1(x) = g2(x), and g2(x) = g3(x). Equation (81) becomes     g0(α) g2(α) g1(α) g3(α) ˆ R = exp(S) = f0(α) − 2 + 2 I + f1(α) − 2 + 2 S+     g0(α) g2(α) 2 g1(α) g3(α) 3 f2(α) − + Sˆ + f3(α) − + Sˆ 2 2 2 2 (83)  2 cos(α)+α sin(α)   3 sin(α)−α cos(α)  = 2 I + 2α S+  sin(α)  2  sin(α)−α cos(α)  3 2α S + 2α3 S

4.3 Summary of the Formulas

We start with the skew-symmetric matrix   0 a b d      −a 0 c e  S = θ   (84)    −b −c 0 f    −d −e −f 0 where a2 + b2 + c2 + d2 + e2 + f 2 = 1. Define r r p 1 − ρ 1 + ρ δ = af − be + cd, ρ = 1 − 4δ2, α = θ , β = θ (85) 2 2

Equation (76) really is the most general formula for the rotation matrix R corresponding to a skew-symmetric

16 matrix S. To repeat that formula,

 β2 cos(α)−α2 cos(β)   β2 sin(α)/α−α2 sin(β)/β  R = 2 2 I + 2 2 S β −α β −α (86)  cos(α)−cos(β)  2  sin(α)/α−sin(β)/β  3 + β2−α2 S + β2−α2 S

This equation was constructed for when β > α > 0. However, when α = 0, Equation(86) reduces to Equation (65), where it is necessary that β = 1,

 1−cos(β)  2  β−sin(β)  3 R = I + S + β2 S + β3 S (87)

The evaluation of sin(α)/α at α = 0 is done in the limiting sense, limα→0 sin(α)/α = 1. Equation (83) was for the case β = α but may be obtained also from Equation (86) in the limiting sense as β → α. l’Hˆopital’s Rule may be applied to the coefficients of I, S, S2, and S3 to obtain the coefficients in Equation (83),     R = 2 cos(α)+α sin(α) I + 3 sin(α)−α cos(α) S 2 2α (88)  sin(α)  2  sin(α)−α cos(α)  3 + 2α S + 2α3 S

4.4 Source Code for Random Generation

Here is some sample code to illustrate Equation (86) when β 6= α.

v o i d RandomRotation4D BetaNotEqualAlpha () { double a = Mathd:: SymmetricRandom(); // number is in[ −1 ,1) double b = Mathd:: SymmetricRandom(); // number is in[ −1 ,1) double c = Mathd::SymmetricRandom(); // number is in[ −1 ,1) double d = Mathd:: SymmetricRandom(); // number is in[ −1 ,1) double e = Mathd::SymmetricRandom(); // number is in[ −1 ,1) double f = Mathd::SymmetricRandom(); // number is in[ −1 ,1) Matrix4d S ( 0 . 0 , a , b , d , −a , 0 . 0 , c , e , −b , −c , 0 . 0 , f , −d , −e , −f , 0 . 0 );

double theta = Mathd::Sqrt(a∗a + b∗b + c∗c + d∗d + e∗e + f ∗ f ) ; double invLength = 1.0/theta; a ∗= invLength; b ∗= invLength; c ∗= invLength; d ∗= invLength; e ∗= invLength; f ∗= invLength;

double d e l t a = a∗ f − b∗e + c∗d ; double delta2 = delta ∗ d e l t a ; double p = Mathd::Sqrt(1.0 − 4.0∗ d e l t a 2 ) ; double alpha = theta ∗Mathd:: Sqrt(0.5 ∗ ( 1 . 0 − p ) ) ; double beta = t h e t a ∗Mathd:: Sqrt(0.5 ∗ ( 1 . 0 + p ) ) ; double invDenom = 1.0/(beta ∗ beta − a l p h a ∗ a l p h a ) ; double cosa = Mathd::Cos(alpha); double sina = Mathd::Sin(alpha); double cosb = Mathd::Cos(beta); double sinb = Mathd::Sin(beta);

double k0 = ( beta ∗ beta ∗ cosa − a l p h a ∗ a l p h a ∗ cosb )∗ invDenom ;

17 double k1 = ( beta ∗ beta ∗ s i n a / a l p h a − a l p h a ∗ a l p h a ∗ s i n b / beta )∗ invDenom ; double k2 = ( cosa − cosb )∗ invDenom ; double k3 = (sina/alpha − s i n b / beta )∗ invDenom ;

Matrix4d I = Matrix4d ::IDENTITY; Matrix4d S2 = S∗S; Matrix4d S3 = S∗S2 ; Matrix4d R = k0∗ I + k1∗S + k2∗S2 + k3∗S3 ; // The random rotation matrix.

// Sanity checks. Matrix4d S4 = S∗S3 ; double theta2 = theta ∗ t h e t a ; double theta4 = theta2 ∗ t h e t a 2 ; Matrix4d zero0 = S4 + theta2 ∗S2 + ( t h e t a 4 ∗ d e l t a 2 )∗ I; //= the double one = R.Determinant(); //=1 Matrix4d zero1 = R.TransposeTimes(R) − I; //= the zero matrix }

Here is some sample code to illustrate the case when β = α. v o i d RandomRotation4D BetaEqualAlpha () { double a = Mathd:: SymmetricRandom(); // number is in[ −1 ,1) double b = Mathd:: SymmetricRandom(); // number is in[ −1 ,1) double c = Mathd::SymmetricRandom(); // number is in[ −1 ,1) double mult = Mathd:: Sqrt(0.5)/Mathd:: Sqrt(a∗a + b∗b + c∗c ) ; a ∗= mult ; b ∗= mult ; c ∗= mult ; double d = c ; double e = −b ; double f = a ; double theta = Mathd::SymmetricRandom(); Matrix4d S ( 0 . 0 , a , b , d , −a , 0 . 0 , c , e , −b , −c , 0 . 0 , f , −d , −e , −f , 0 . 0 ); S ∗= t h e t a ;

double l e n s q r = a∗a + b∗b + c∗c + d∗d + e∗e + f ∗ f ; //=1 double d e l t a = a∗ f − b∗e + c∗d ; //= 1/2 double delta2 = delta ∗ d e l t a ; //= 1/4 double d i s c r = 1 . 0 − 4.0∗ d e l t a 2 ; //=0 double p = Mathd:: Sqrt(Mathd::FAbs(discr )); //=0 double alpha = theta ∗Mathd:: Sqrt(0.5); double cosa = Mathd::Cos(alpha); double sina = Mathd::Sin(alpha);

double k0 = ( 2 . 0 ∗ cosa + a l p h a ∗ s i n a ) / 2 . 0 ; double k1 = ( 3 . 0 ∗ s i n a − a l p h a ∗ cosa ) / ( 2 . 0 ∗ a l p h a ) ; double k2 = sina/(2.0 ∗ a l p h a ) ; double k3 = ( s i n a − a l p h a ∗ cosa ) / ( 2 . 0 ∗ a l p h a ∗ a l p h a ∗ a l p h a ) ;

Matrix4d I = Matrix4d ::IDENTITY; Matrix4d S2 = S∗S; Matrix4d S3 = S∗S2 ; Matrix4d R = k0∗ I + k1∗S + k2∗S2 + k3∗S3 ; // The random rotation matrix.

// Sanity checks. Matrix4d S4 = S∗S3 ; double theta2 = theta ∗ t h e t a ; double theta4 = theta2 ∗ t h e t a 2 ; Matrix4d zero0 = S4 + theta2 ∗S2 + ( t h e t a 4 ∗ d e l t a 2 )∗ I; //= the zero matrix double one = R.Determinant(); //=1 Matrix4d zero1 = R.TransposeTimes(R) − I; //= the zero matrix }

18 5 Fixed Points, Invariant Spaces, and Direct Sums

A rotation in IR2 is sometimes described as “rotation about a point”, where the point is the origin–a 0- dimensional quantity living in a 2-dimensional space. A rotation in IR3 is sometimes described as “rotation about an axis”, where the axis is a line–a 1-dimensional quantity in a 3-dimensional space. Sometimes you will hear someone attempt to extend these geometric descriptions to a rotation in IR4 by saying that you have a “rotation about a plane”, the plane being a 2-dimensional quantity living in a 4-dimensional space. This description is not meaningful, and in fact is not technically correct. It is better to describe the rotations in terms of fixed points and invariant spaces. In the following discussion, I assume that a rotation (rotation matrix) is not the identity (identity matrix).

5.1 Fixed Points

Let F(X) be a vector field; that is, F : IRn → IRn, which states that F is a function that maps n-tuples to n-tuples. If F is a linear function, then it may be represented by a matrix M; that is, F(X) = MX. This rep- resentation is assumed to be with respect to the standard Euclidean basis for IRn, which is the of n vectors {ek}k=1 for which ek has a 1 in component k of the vector and all other components are 0. In IR2, the basis is {(1, 0), (0, 1)}. In IR3, the basis is {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. In IR4, the basis is {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)}. Other matrix representations are possible when you choose dif- ferent bases. A point P is said to be a fixed point for the function when F(P) = P. For a linear function, MP = P. The only fixed point for rotation in IR2 is the origin 0. If R is the 2 × 2 matrix that represents the rotation, then R0 = 0. For all vectors X 6= 0, RX 6= X. A rotation in IR3 has infinitely many fixed points, all lying on the axis of rotation. If the axis is the line containing the origin and having direction U, then a point on the axis is tU for some scalar t. If R is the 3 × 3 matrix that represents the rotation about the axis, then R(tU) = tU.

5.2 Invariant Spaces

Let V ⊆ IRn be a vector subspace. The trivial subspaces are {0}, the set consisting of only the origin, and IRn, the entire set of n-tuples. In 2D, the nontrivial subspaces are lines containing the origin. In 3D, the nontrivial subspaces are lines and planes containing the origin. Let F(X) be a linear function from IRn to IRn. Define F(V ) to be the set of all n-tuples that are mapped to by n-tuples in V ; that is, Y ∈ F(V ) means that Y = F(X) for some X ∈ V . V is said to be an invariant space relative to F when F(V ) ⊆ V ; that is, any vector in V is transformed by F to a vector in V . It is not required that a nonzero vector in V be mapped to itself, so an invariant space might not have any nonzero fixed points. The trivial subspaces are always invariant under linear transformations. If F(X) = MX for an n × n matrix M, then M0 = 0, in which case 0 is a fixed point and the set {0} is invariant. Every vector X ∈ IRn

19 is mapped to a vector MX ∈ IRn, so IRn is invariant. The more interesting question is whether there are nontrivial invariant subspaces. The only invariant subspaces for a rotation in 2D are the trivial ones. A line containing the origin is a nontrivial subspace, but a rotation causes a line to rotate to some other line, so the original line is not invariant. A rotation in 3D has two nontrivial invariant subspaces, the axis of rotation and the plane perpendicular to it. For example, consider the canonical rotation matrix   cos θ − sin θ 0     R =  sin θ cos θ 0  (89)   0 0 1

The axis of rotation is the z-axis. Points in the xy-plane are rotated to other points in the xy-plane. Thus, the z-axis is an invariant subspace and the xy-plane is an invariant subspace.

5.3 Direct Sums

n n Let V1 through Vm be nontrivial subspaces of IR . If every vector X ∈ IR can be represented uniquely as m X X = Xi, Xi ∈ Vi (90) i=1 then IRn is said to be a of the subspaces, denoted by

m n M IR = V1 ⊕ · · · ⊕ Vm = Vi (91) i=1

For example, if V2 is the set of points in the xy-plane and V2 is the set of points on the z-axis, then 3 IR = V1 ⊕ V2. A vector is represented as (x, y, z) = (x, y, 0) + (0, 0, z), where (x, y, 0) ∈ V1 and (0, 0, z) ∈ V2. n Let V1 through Vm be nontrivial subspaces of IR . Let Fi : Vi → Vi be linear transformations on the n n subspaces; that is, Vi is invariant with respect to Fi. Let F : IR → IR be a linear transformation for which F(X) = Fi(X) whenever X ∈ Vi. F is said to be a direct sum of the linear functions, denoted by m M F = F1 ⊕ · · · ⊕ Fm = Fi (92) i=1

Let the dimension of Vi be denoted by di = dim(Vi). Necessarily, n = d1 + ··· dm. If Fi(X) = MiX for matrices Mi, where Mi is a di × di matrix, then F(X) = MX where M is an n × n matrix and M = Diag(M1,...,Mm).

Using our example of a 3D rotation about the z-axis, V1 is the set of points in the xy-plane, V2 is the set of 3 points on the z-axis, and IR = V1 ⊕ V2. We may choose   cos θ − sin θ h i M1 =   ,M2 = 1 (93) sin θ cos θ

20 in which case the 3D rotation matrix R is   cos θ − sin θ 0     R = M1 ⊕ M2 = Diag(M1,M2) =  sin θ cos θ 0  (94)   0 0 1

More generally, let the axis of rotation have unit-length direction C. Let A and B be vectors that span the plane perpendicular to the axis, but choose them so that {A, B, C} is an orthonormal set (vectors are unit-length and mutually perpendicular). The invariant subspaces are

2 V1 = {xA + yB :(x, y) ∈ IR },V2 = {zC : z ∈ IR} (95) and {A, B, C} is a basis for IR3. The coordinates for this basis are X = [x y z]T and the rotation matrix relative to this basis is R of Equation (94). The rotated vector is X0 = RX. To see the representation in Cartesian coordinates, define the matrix P = [ABC] whose columns are the specified vectors. Any vector Y ∈ IR3 may be represented as Y = P X (96) for some vector X. The inverse of P is just its transpose, so X = P TY. The rotated vector is

Y0 = P X0 = PRX = PRP TY (97)

Thus, the rotation matrix in Cartesian coordinates is

R0 = PRP T (98)

The question now is how do these concepts extend to 4D rotations. The next section describes this.

5.4 Rotations in 4D

We found that the skew-symmetric matrix S of Equation (61) has eigenvalues ±iαθ and ±iβθ, where α and β are real numbers with 0 ≤ α ≤ β. The factorization is   0 αθ 0 0      −αθ 0 0 0  T S = P   P (99)    0 0 0 βθ    0 0 −βθ 0 for some orthogonal matrix P . The corresponding rotation matrix is   cos(αθ) sin(αθ) 0 0      − sin(αθ) cos(αθ) 0 0  T R = P   P (100)    0 0 cos(βθ) sin(βθ)    0 0 − sin(βθ) cos(βθ)

21 The problem is to determine what is P . This requires determining the invariant subspaces of IR4 relative to the rotation R or, equivalently, relative to the skew-symmetric S. There will be two nontrivial invariant subspaces, each of dimension 2. Let V1 = span(U1, U2), which will be the invariant subspace that corresponds to α, and V2 = span(U3, U4), which be the invariant subspace that corresponds to β, where the four spanning vectors form an orthonormal set. P may be constructed as the orthogonal matrix whose columns are these four vectors, h i P = U1 U2 U3 U4 (101)

Recall that α = p(1 − ρ)/2 and β = p(1 + ρ)/2 for some ρ ∈ [0, 1]. Also, δ = af − be + cd, δ2 = α2β2, and 1 − 4δ2 = ρ2. Finally, we have always assumed that a2 + b2 + c2 + d2 + e2 + f 2 = 1.

5.4.1 The Case (d, e, f) = (0, 0, 0)

The condition (d, e, f) = (0, 0, 0) reduces S to the 3D case that we already understand. The skew-symmetric matrix is   0 a b 0      −a 0 c 0  S =   (102)    −b −c 0 0    0 0 0 0 Observe that δ = 0, α = 0, β = 1, and a2 + b2 + c2 = 1. By observation,     c 0          −b   0  U1 =   , U2 =   (103)      a   0      0 1

It is easily verified that SU1 = −0U2 = 0 and SU2 = 0U1 = 0. When (b, c) 6= (0, 0), we may choose   b     1  c  U3 = √   (104) b2 + c2    0    0

This unit-length vector is perpendicular to U1 and U2. Let ek be the vector with a 1 in component k and 0 in all other components. We may then compute     −ac e e e   1 2 3     1  ab  U4 = det  c −b a  + 0e4 = √   (105)   b2 + c2  2 2     b + c  √ b √ c 0   b2+c2 b2+c2 0

22 When (b, c) = (0, 0), it must be that a = 1, in which case we may choose     1 0          0   1  U3 =   , U4 =   (106)      0   0      0 0 so that P is the identity matrix.

In either case for (b, c), it is easily verified that SU3 = −1U4 and SU4 = 1U3.

5.4.2 The Case (a, b, c) = (0, 0, 0)

The condition (a, b, c) = (0, 0, 0) implies that the skew-symmetric matrix is   0 0 0 d      0 0 0 e  S =   (107)    0 0 0 f    −d −e −f 0

Observe that δ = 0, α = 0, β = 1, and d2 + e2 + f 2 = 1. By observation,     d 0          e   0  U3 =   , U4 =   (108)      f   0      0 1

It is easily verified that SU3 = −1U4 and SU4 = 1U3. When (d, e) 6= (0, 0), we may choose   e     1  −d  U1 = √   (109) d2 + e2    0    0

This unit-length vector is perpendicular to U3 and U4. Let ek be the vector with a 1 in component k and 0 in all other components. We may then compute     df e e e   1 2 3     1  ef  U2 = det  d e f  + 0e4 = √   (110)   d2 + e2  2 2     −d − e  √ e √ −d 0   d2+e2 d2+e2 0

23 When (d, e) = (0, 0), it must be that f = 1, in which case we may choose     1 0          0   1  U1 =   , U2 =   (111)      0   0      0 0

so that P is the identity matrix.

5.4.3 The Case (d, e, f) 6= 0, (a, b, c) 6= 0, and δ = 0

Consider the case 0 = δ = af − be + cd = (c, −b, a) · (d, e, f); then α = 0 and β = 1. Let ek be the vector with a 1 in component k and 0 in all other components. We may formally construct an eigenvector for β = 1 using the determinant,   e1 e2 e3 e4      −i a b d  V = det      −a −i c e    (112) −b −c −i f

= [(−d) + (bf + ae)i]e1 + [(−e) + (cf − ad)i]e2+ 2 2 2 [(−f) − (bd + ce)i]e3 + [(a + b + c − 1)i]e4

Using the results of Section 1.3, V = X + iY with     −d bf + ae          −e   cf − ad  X =   , Y =   (113)      −f   −bd − ce      0 a2 + b2 + c2 − 1

The vectors are orthogonal and have the same length. To see this, we know that SX = −βθY and SY = βθX. First, XTSX = −θXTY (114) The left-hand side is a scalar, so its transpose is just the scalar itself,

XTSX = (XTSX)T = XTSTX = −XTSX (115)

The only way a scalar and its negative can be the same is when the scalar is zero. Thus, XTSX = 0, which implies XTY = 0, so the vectors are orthogonal. Second,

θXTX = XTSY = (XTSY)T = YTSTX = −YTSX = θYTY (116)

which implies |X| = |Y|.

24 Two real-valued eigenvectors for ±i that are orthogonal and unit-length are the normalized −X and −Y,     d −ae − bf         1  e  1  ad − cf  U3 =   , U4 =   (117) pd2 + e2 + f 2   pd2 + e2 + f 2    f   bd + ce      0 d2 + e2 + f 2

where we have used a2 + b2 + c2 + d2 + e2 + f 2 = 1. The condition δ = 0 allows us to choose   c     1  −b  U1 = √   (118) a2 + b2 + c2    a    0

It is easily verified that U1 is perpendicular to both U3 and U4. We may choose the final vector using   e1 e2 e3 e4    d e f 0  −√1   U2 = 2 2 2 2 2 2 det   (d +e +f ) a +b +c  2 2 2   −ae − bf ad − cf bd + ce d + e + f    c −b a 0   (119) ae + bf    −ad + cf  √ 1   = 2 2 2   a +b +c    −bd − ce    a2 + b2 + c2

5.4.4 The Case (d, e, f) 6= 0, (a, b, c) 6= 0, and δ 6= 0

In our construction when the rotation is not the identity matrix, we know that β > 0. The eigenvalues iβθ and −iβθ each have 1-dimensionsal eigenspaces. The eigenvectors for iβθ must satisfy the equation (S − iβθI)V = 0. Because the eigenspace is 1-dimensional, the matrix S − iβθI has 3. Equivalently, the matrix S/θ−iβI has rank 3. Let ek be the vector with a 1 in component k and 0 in all other components.

25 We may formally construct an eigenvector using the determinant,   e1 e2 e3 e4      −iβ a b d  V = det      −a −iβ c e    (120) −b −c −iβ f 2 2 = [(cδ − dβ ) + β(bf + ae)i]e1 + [(−bδ − eβ ) + β(cf − ad)i]e2+ 2 2 2 2 2 [(aδ − fβ ) − β(bd + ce)i]e3 + [β(a + b + c − β )i]e4

Using the results of Section 1.3, V = X + iY with     cδ − dβ2 bf + ae      2     −bδ − eβ   cf − ad  X =   , Y = β   (121)  2     aδ − fβ   −bd − ce      0 a2 + b2 + c2 − β2

The vectors are orthogonal and have the same length. To see this, we know that SX = −βθY and SY = βθX. First, XTSX = −βθXTY (122) The left-hand side is a scalar, so its transpose is just the scalar itself,

XTSX = (XTSX)T = XTSTX = −XTSX (123)

The only way a scalar and its negative can be the same is when the scalar is zero. Thus, XTSX = 0, which implies XTY = 0, so the vectors are orthogonal. Second,

βθXTX = XTSY = (XTSY)T = YTSTX = −YTSX = βθYTY (124)

which implies |X| = |Y|. We may choose X Y U = , U = (125) 3 |X| 4 |Y|

We will construct vectors U1 and U2 for α next. The Case α < β. The same construction applies to α as it did to β, producing     cδ − dα2 bf + ae      2    0  −bδ − eα  0  cf − ad  X =   , Y = α   (126)  2     aδ − fα   −bd − ce      0 a2 + b2 + c2 − α2

and X0 Y0 U = , U = (127) 1 |X0| 2 |Y0|

26 √ The Case α = β. When α = β = 1/ 2, it must be that ρ = 0, δ = σ/2 for σ ∈ {−1, +1},(d, e, f) = σ(c, −b, a), and a2 + b2 + c2 = 1/2 = d2 + e2 + f 2. The skew-symmetric matrix is   0 a b σc      −a 0 c −σb  S = θ   (128)    −b −c 0 σa    −σc σb −σa 0 √ 2 2 ˆ Observe√ that S + (θ /2)I = 0. Define S = S/θ. When solving for eigenvectors for iα =√i/ 2, we solve ˆ ˆ (S − (i/ 2)I)V = 0. Because the eigenspace is 2-dimensional, the√ first two rows of S − (i/ 2)I are linearly independent and must be orthogonal to the eigenvectors of i/ √2. However, this means that the first two rows are in the orthogonal√ complement of the eigenspace for i/ 2, so these rows are linearly independent eigenvectors for −i/ 2. √ Consider the first row of Sˆ−(i/ 2)I. The real and imaginary parts are the candidates we want for producing U3 and U4. Specifically, the normalized real and imaginary part are     0 −1         √  a   0  U3 = 2   , U4 =   (129)      b   0      σc 0

√ A similar argument applied to (−b, −c, −i/ 2, σa) leads to     −b 0         √  −c   0  U1 = 2   , U2 =   (130)      0   −1      σa 0

5.4.5 A Summary of the Factorizations

In all cases, we have assumed a2 + b2 + c2 + d2 + e2 + f 2 = 1 so that Sˆ 6= 0. The factorization is   0 α 0 0      −α 0 0 0  T Sˆ = P   P (131)    0 0 0 β    0 0 −β 0

27 The case (d, e, f) = (0, 0, 0) and (b, c) 6= (0, 0):     √ b √−ac 0 a b 0 c 0 2 2 2 2    b +c b +c   −a 0 c 0   −b 0 √ c √ ab     2 2 2 2  Sˆ =   ,P =  b +c √ b +c  (132)    2 2   −b −c 0 0   a 0 0 b + c      0 0 0 0 0 1 0 0

The case (d, e, f) = (0, 0, 0) and (b, c) = (0, 0):     0 1 0 0 0 0 1 0          −1 0 0 0   0 0 0 1  Sˆ =   ,P =   (133)      0 0 0 0   1 0 0 0      0 0 0 0 0 1 0 0

The case (a, b, c) = (0, 0, 0) and (d, e) 6= (0, 0):     √ e √ df 0 0 0 d 2 2 2 2 d 0    d +e d +e   0 0 0 e   √ −d √ ef e 0     2 2 2 2  Sˆ =   ,P =  d +e √ d +e  (134)    2 2   0 0 0 f   0 − d + e f 0      −d −e −f 0 0 0 0 1

The case (a, b, c) = (0, 0, 0) and (d, e) = (0, 0):     0 0 0 0 1 0 0 0          0 0 0 0   0 1 0 0  Sˆ =   ,P =   (135)      0 0 0 1   0 0 1 0      0 0 −1 0 0 0 0 1

The case (a, b, c) 6= (0, 0, 0), (d, e, f) 6= (0, 0, 0), and δ = 0:   √ c √ ae+bf √ d √−ae−bf a2+b2+c2 a2+b2+c2 2 2 2 2 2 2  d +e +f d +e +f   √ −b √−ad+cf √ e √ ad−cf   2 2 2 2 2 2   a +b +c a +b +c d2+e2+f 2 d2+e2+f 2  P =   (136)  √ a √−bd−ce √ f √ bd+ce   a2+b2+c2 a2+b2+c2 d2+e2+f 2 d2+e2+f 2   √  0 a2 + b2 + c2 0 pd2 + e2 + f 2

28 The case (a, b, c) 6= (0, 0, 0), (d, e, f) 6= (0, 0, 0), δ = 0, α < β:

 2  cδ−dα2 α(bf+ae) cδ−dβ β(bf+ae) ` ` m m    −bδ−eα2 α(cf−ad) −bδ−eβ2 β(cf−ad)   ` ` m m  P =   (137)  aδ−fα2 α(−bd−ce) aδ−fβ2 β(−bd−ce)   ` ` m m   α(a2+b2+c2−α2) β(a2+b2+c2−β2)  0 ` 0 m

where ` = p(cδ − dα2)2 + (bδ + eα2)2 + (aδ − fα2)2 and m = p(cδ − dβ2)2 + (bδ + eβ2)2 + (aδ − fβ2)2. The case (a, b, c) 6= (0, 0, 0), (d, e, f) 6= (0, 0, 0), δ = 0, α = β:   0 a b σc      −a 0 c −σb  Sˆ =   (138)    −b −c 0 σa    −σc σb −σa 0

and  √  −b 2 0 0 −1  √ √     −c 2 0 a 2 0  P =  √  (139)    0 −1 b 2 0   √ √  σa 2 0 σc 2 0

References

[Bra83] Martin Braun. Differential Equations and Their Applications. Springer-Verlag, New York, NY, 3rd edition, 1983. [Ebe03] David Eberly. Game Physics. Morgan Kaufmann, San Francisco, CA, 2003.

[HJ85] Roger A. Horn and Charles R. Johnson. . Cambridge University Press, Cambridge, England, 1985. [HS74] Morris W. Hirsch and Stephen Smale. Differential Equations, Dynamical Systems, and . Academic Press, Orlando, FL, 1974.

29