
MATH 235.9 WRITTEN HW 4: APPLICATIONS

ANDREW J. HAVENS

This is a collection of short projects rather than a regular problem set. It will be due on the last day of class. Please choose at least three to work on. It is recommended that you discuss these with your classmates, or work in groups as needed; however, each person must still submit their own solutions, written up in an organized way (keep problems from a given topic together!). Any additional work may be counted as bonus credit towards your written assignment average. The topics are: linear ciphers, matrices and groups, graph theory, finite vector spaces, linear recurrence relations, and quaternions.

Linear Ciphers.

These problems illustrate and give practice with some constructions of linear ciphers, which use linear algebra to obscure plaintext. We will first consider monoalphabetic substitution ciphers, where each letter of plaintext is mapped to a single letter of ciphertext. Affine maps in one dimension can accomplish this, and furnish simple but easy-to-break examples of classic ciphers. It is advised, should you choose to complete this project, that you write programs to perform the computations in this problem, as they are quite repetitive and easily amenable to coding. If you do work on this, you are welcome to come discuss your code with me. There are 26 basic letters in the English alphabet; however, since 26 is not prime, not every element of the integers mod 26 can be inverted multiplicatively (consider: 2(13) ≡ 0 mod 26). Thus, we will work with the field F29 of integers mod 29. We can assign to each letter an integer, and use the remaining elements of F29 for punctuation and the space character. For example, we could use the following scheme:

A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
1   2   3   4   5   6   7   8   9   10  11  12  13  14  15

P   Q   R   S   T   U   V   W   X   Y   Z   .   ?   (space)
16  17  18  19  20  21  22  23  24  25  26  27  28   0

Figure 1. Our alphanumeric correspondence.

Recall that the arithmetic in F29 works as follows: to take the sum of two elements, simply add the integers, then find the remainder modulo 29, e.g. 14 + 18 = 32 in Z, and 32 ≡ 3 mod 29, so in F29, 14 + 18 “=” 3 (I will simply write ≡ for “=” henceforth in these problems, where it is clear and implicit that we are working over F29.) Similarly, for multiplication, one takes the product as usual in Z and then reduces modulo 29, e.g. 2 × 16 = 32 ≡ 3. Division by a ∈ F29* := F29 − {0} corresponds to solving the congruence ax ≡ 1 mod 29 for x ∈ F29. It may be helpful to create a table of multiplicative inverses (you may look one up, or try to write a program to produce one for general Fp; the extended Euclidean algorithm is useful for this).
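If you take the programming route suggested above, the following minimal Python sketch tabulates inverses in Fp with the extended Euclidean algorithm; the function names are my own, not part of the assignment.

# Sketch: a table of multiplicative inverses in F_p (here p = 29),
# built with the extended Euclidean algorithm.

def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b = g."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t

def inverse_mod(a, p):
    """Multiplicative inverse of a modulo the prime p."""
    g, s, _ = extended_gcd(a % p, p)
    assert g == 1, "a must be invertible mod p"
    return s % p

p = 29
table = {a: inverse_mod(a, p) for a in range(1, p)}
print(table)          # e.g. table[2] == 15, since 2 * 15 = 30 ≡ 1 (mod 29)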

(1) An early cipher is known as the Caesar cipher or shift cipher, which simply shifts letters by a fixed value (this works even mod 26). For example, shifting every letter by 3 in the above scheme turns the plaintext message “is the kernel nontrivial?” to “lvcwkhcnhuqhocqrqwulyldob”. Note that if we suspected it was a shift cipher, it would be easy to decode if we knew about the above alphanumeric assignment scheme: at worst we check 28 possible shifts to see if anything produced is not babble. If we are more clever, we use that “e” is the most common letter in written English, and that spaces are also common. Letters “c”, “h”, “l” and “q” all occur with the same high frequency (12%) in the ciphertext, so we might guess that one of these letters is the encipherment of “e” and another of the space character. Observe that “c” ∼ 3 and “h” ∼ 8 differ by 5, as do the space character ∼ 0 and “e” ∼ 5, so we might assume that a shift of 3 is most probable, and try to decrypt using a shift of −3 ≡ 26 mod 29.

Another variation is a linear cipher, where we fix a ∈ F29* and map the numeric code x of each letter of plaintext to ax. For example, using a = 13, one has the ciphertext “ao ?qg .gbhgk huh?bayamkp”. This is also susceptible to letter frequency analysis. An affine monoalphabetic cipher combines these two into one: the key is a pair (a, b) ∈ F29* × F29, and the ciphertext is produced by the rule c(x) ≡ ax + b mod 29. Assume that we fix the scheme of alphanumeric assignment above. (A programmatic sketch of affine enciphering and deciphering appears after part (d) below.)

(a) Use affine encryption with the key (8, 14) to encipher the following plaintext: “What is the fundamental theorem of linear algebra if not the rank nullity theorem?”

(b) Assume you are given some cipher text, and suppose you know that “e” enciphers to “o”. Knowing that an affine cipher was used, can you recover the plaintext from the given ciphertext with this information alone? Justify.

(c) Suppose that you know not only that “e” enciphers to “o”, but also that “t” enciphers to “q”. Determine the key values a and b, and decipher the following ciphertext: “q.oxy?nuoxzsxcxbmxq ovqhxrzioxq.?vxq.oxy?nuoxzsx?x .bg.xbmxszuip”.

(d) Use frequency analysis or other logical assumptions about the usual form of English text to decipher the following affine ciphertext, and also provide the key pair: “voxosoy kqeqlsqi?.k?qukqeq?pils?qkd.avcen.?vbaksok tdbk e.ikimdscsqnfk?l?ok– –eqikobetlukzs.ock ?oci.klsqi?.kdsvbi.ok?qukp?osdkd.avc?q?laosofk.lxosf?”. Note that this text is padded with a few random characters at the beginning and end. The dashes before and after the line break indicate that there is no space in the original ciphertext, and are inserted to ensure that the text fits on the page.
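Here is the programmatic sketch referenced above: affine enciphering and deciphering under the Figure 1 correspondence. It is a minimal sketch, assuming the alphanumeric scheme of this section; the dictionaries and helper names are illustrative choices of mine, not prescribed by the assignment.

# Sketch: affine encipher/decipher over F_29 using the Figure 1 correspondence.

P = 29
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ.?"           # codes 1..28
TO_NUM = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
TO_NUM[" "] = 0                                      # the space character is 0
FROM_NUM = {v: k for k, v in TO_NUM.items()}

def affine_encrypt(plaintext, a, b):
    """c(x) = a*x + b mod 29, applied letter by letter."""
    nums = [TO_NUM[ch] for ch in plaintext.upper() if ch in TO_NUM]
    return "".join(FROM_NUM[(a * x + b) % P] for x in nums)

def affine_decrypt(ciphertext, a, b):
    """Invert c(x) = a*x + b: x = a^(-1) * (c - b) mod 29."""
    a_inv = pow(a, P - 2, P)                         # Fermat: a^(p-2) ≡ a^(-1)
    nums = [TO_NUM[ch] for ch in ciphertext.upper() if ch in TO_NUM]
    return "".join(FROM_NUM[(a_inv * (c - b)) % P] for c in nums)

# Round-trip check with the key (8, 14) from part (a):
msg = "is the kernel nontrivial?"
ct = affine_encrypt(msg, 8, 14)
assert affine_decrypt(ct, 8, 14) == msg.upper()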

It should be clear from the above that a message is not very secure when enciphered by these methods. Among the obvious flaws: choosing to include spaces and punctuation makes frequency analysis much easier. Generally, any monoalphabetic substitution cipher, even if it does not encrypt spaces or punctuation, is highly susceptible to frequency analysis provided a sufficient amount of plaintext is available. We thus will consider using linear algebra to encrypt things polyalphabetically, where a given plaintext character may be enciphered to different ciphertext values across the same message.

(2) One of the most famous and basic ways to encrypt plaintext polyalphabetically is the family of ciphers known as Hill ciphers, introduced by Lester S. Hill in 1929. These are linear ciphers in more than one dimension. For the Hill cipher of size n over F29, one chooses an invertible square matrix A ∈ Matn×n(F29), and wraps the numerically encoded plaintext into an n × m matrix P, where m depends on the length of the message and any padding used, and one computes the cipher matrix C = AP. As a simple example, we could take our key to be the matrix A ∈ Mat4×4(F29) given by

A = [  4  17   5   1 ]
    [ 15  24  12  11 ]
    [ 28   2   7  13 ]
    [  3   6  21   0 ]

Then if the plaintext message we wished to encode was “Wizards pluck joy from the quivering box”1, then we’d have plaintext matrix

P = [ 23  18  16  11  25  15   8  21  18   0 ]
    [  9   4  12   0   0  13   5   9   9   2 ]
    [ 26  19  21  10   6   0   0  22  14  15 ]
    [  1   0   3  15  18  20  17   5   7  24 ]

Note that the text is wrapped here column to column from left to right and top to bottom. Other schemes could be used to add a layer of difficulty in recovering the text (such techniques amount to composing the Hill cipher method with a transposition cipher, which permutes the arrangement of plaintext or ciphertext). The ciphertext would be produced by computing C = AP mod 29 and reading off the ten columns of C, converting numbers using the table in figure 1. (A computational sketch of Hill enciphering and deciphering appears after part (c) below.)

(a) Use the above encryption matrix to encrypt the given plaintext above, and give the output as text using the table in figure 1.

(b) Find the inverse of the above encryption matrix (you can use Gauss-Jordan, mod 29). This is the decryption key.

(c) Using the above key, decipher the following text:

“kxaglfqtbwviaqxmho.o v?jyfoamyhuwcvfkhus”
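As referenced above, here is a hedged Python sketch for problem (2): Hill enciphering and deciphering over F29, with the matrix inverse computed by Gauss-Jordan elimination mod 29 (scalar inverses via Fermat's little theorem). It assumes the Figure 1 correspondence and the key matrix A given in the problem; the helper names and padding convention are my own assumptions.

# Sketch: the size-4 Hill cipher over F_29 from problem (2). Plaintext is
# wrapped column by column into a 4 x m matrix, then C = A*P mod 29.

P = 29
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ.?"
TO_NUM = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
TO_NUM[" "] = 0
FROM_NUM = {v: k for k, v in TO_NUM.items()}

A = [[ 4, 17,  5,  1],
     [15, 24, 12, 11],
     [28,  2,  7, 13],
     [ 3,  6, 21,  0]]

def mat_mul(M, N):
    """Matrix product, reduced mod 29."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N))) % P
             for j in range(len(N[0]))] for i in range(len(M))]

def mat_inv(M):
    """Inverse mod 29 by Gauss-Jordan elimination on an augmented matrix."""
    n = len(M)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % P != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)           # scalar inverse mod 29
        aug[col] = [x * inv % P for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] % P != 0:
                factor = aug[r][col]
                aug[r] = [(aug[r][j] - factor * aug[col][j]) % P
                          for j in range(2 * n)]
    return [row[n:] for row in aug]

def hill_encrypt(text, key):
    nums = [TO_NUM[ch] for ch in text.upper() if ch in TO_NUM]
    n = len(key)
    nums += [0] * (-len(nums) % n)                   # pad with spaces if needed
    cols = [nums[i:i + n] for i in range(0, len(nums), n)]
    Pmat = [[cols[j][i] for j in range(len(cols))] for i in range(n)]
    C = mat_mul(key, Pmat)
    return "".join(FROM_NUM[C[i][j]] for j in range(len(cols)) for i in range(n))

def hill_decrypt(text, key):
    return hill_encrypt(text, mat_inv(key))

ct = hill_encrypt("Wizards pluck joy from the quivering box", A)
assert hill_decrypt(ct, A) == "WIZARDS PLUCK JOY FROM THE QUIVERING BOX"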

Bonus: Hill ciphers are still susceptible to various forms of analysis, and are computationally expensive enough that they did not catch on during the era when they were less vulnerable to machine-assisted analysis. With modern computers, Hill’s cipher and variations thereof can be rapidly implemented, but also more readily analyzed and deciphered. Try to decrypt the following text, which was encrypted using a dynamic variation of a Hill cipher of size 2: “o?bawcxikerlmopk?srgakhiebtojydvvhcxzb.pyyvjcs tos gveunffktbwolyracgfslsjjahrob”

1 This sentence is constructed so as to have letter frequencies which do not match the average: every consonant appears, all but “r” exactly once, and note also that there are only two “e”s and “u”s, and three each of “o”s and “i”s.

Groups of Matrices.

These problems explore the definitions of some groups of matrices. Groups are the natural mathematical objects to consider when studying symmetries. A group is a mathematical set G, together with a binary product, µ: G × G → G, called multiplication, such that (i.) The multiplication is associative: µ(g, µ(h, k)) = µ(µ(g, h), k) for any g, h, k ∈ G. We often don’t write the µ, writing simply gh for µ(g, h), so the associative identity is captured by the more familiar equation g(hk) = (gh)k, for any g, h, k ∈ G.

(ii.) There is a unique identity element e ∈ G such that eg = g = ge for any g ∈ G.

(iii.) The multiplication is invertible: for each g ∈ G, there is a g−1 such that g−1g = e = gg−1.

If the product is commutative, the group G is said to be a commutative or Abelian group. Some simple examples of Abelian groups are C, R, Q, or Z with addition, C − {0} with complex multiplication, R − {0} with real multiplication, any vector space with vector addition, any field with its field addition . . . We want to construct groups important to linear algebra, which are not necessarily Abelian. These groups will be collections of matrices which are closed under matrix products, and whose elements are all invertible.

(1) Though the following construction works over any field, to be user friendly let us work over the reals. Define

GLn(R) := {A ∈ Matn×n(R) | det A ≠ 0} .

Show that GLn(R) is a group with matrix multiplication as the product. In particular, you should check that the matrix product of two elements of GLn(R) is itself in GLn(R), and that matrix multiplication is associative. You should specify the identity element, and explain how elements of GLn(R) meet axiom (iii.) of being a group. This group can be thought of as the group of linear automorphisms of Rn, and is called the general linear group of n × n real matrices. All our other real n × n matrix groups will be subgroups of GLn(R) (i.e. are groups contained in GLn(R), with the inherited operation of matrix multiplication).

(2) Is GLn(R) ⊂ Matn×n(R) a vector subspace? Why or why not?

(3) Define O(n, R) := {A ∈ Matn×n(R) | A^τ A = In}. Show that A ∈ O(n, R) if and only if the columns of A form an orthonormal basis of Rn with respect to the usual inner product structure given by the dot product. Moreover, show that O(n, R) ⊂ GLn(R) is a group. We call O(n, R) the orthogonal group of n × n real matrices. Prove that if A ∈ O(n, R), then det A = ±1.

(4) Define the special linear group SLn(R) = {A ∈ GLn(R) | det A = 1}. Define SO(n, R) := O(n, R) ∩ SLn(R), called the special orthogonal group of n × n real matrices. We saw in class that SO(2, R) can be identified with the circle S1 ⊂ C of unit complex numbers, which describe rotations of C ≅ R2 about the origin. Check that SO(3, R) is precisely the group of spatial rotations of R3.

The above groups are examples of infinite groups which also form spaces over which one can do calculus. For example, the group SO(2, R) can be identified with the unit circle S1 ⊂ R2, which has a well defined notion of tangent space. Such groups are examples of Lie groups, named in honor of Sophus Lie, who became interested in such groups in an attempt to understand smooth symmetries of solutions to differential equations. Next we consider groups describing the discrete set of symmetries of regular polygons. For a group G and some indexing set A, a collection {gα | gα ∈ G, α ∈ A} ⊂ G is said to generate G if every element of G can be written as a product of gαs. In the next exercise, you will use reflections to generate the symmetries of polygons.

(5) Let T0 be reflection across the x-axis of R2 and T1 reflection across the line ℓ1 making an angle of π/6 with the x-axis. Show, using explicit matrices, that T0 and T1 generate a group with 12 elements, consisting of 6 reflections, 5 non-trivial rotations, and the identity matrix (which is sometimes regarded as a “trivial rotation”). Argue with a picture that this is the group of all rigid symmetries of a particular collection of regular hexagons centered at 0. Explain why the group can also be generated by just one rotation and one reflection (specify matrices for each.) The group of rigid symmetries of a regular hexagon, with operation being composition of transformations, is called the dihedral group D6.
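A quick numerical way to explore problem (5) is to generate the closure of {T0, T1} under matrix multiplication. The following is a minimal sketch, assuming numpy is available; the rounding is only there to deduplicate floating point entries, and the helper names are mine.

# Sketch: generate the matrix group produced by the two reflections T0 and T1
# by repeatedly multiplying until no new matrices appear.
import numpy as np

def reflection(theta):
    """Matrix of reflection across the line through 0 at angle theta."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[c, s], [s, -c]])

T0 = reflection(0.0)
T1 = reflection(np.pi / 6)

def key(M):
    return tuple(np.round(M, 6).flatten())           # dedup up to rounding

group = {key(np.eye(2)): np.eye(2)}
frontier = [np.eye(2)]
while frontier:
    M = frontier.pop()
    for G in (T0, T1):
        N = G @ M
        if key(N) not in group:
            group[key(N)] = N
            frontier.append(N)

print(len(group))                 # expect 12 elements
rotations = [M for M in group.values() if np.isclose(np.linalg.det(M), 1)]
print(len(rotations))             # expect 6: the identity and 5 non-trivial rotations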

(6) By using a pair of reflections in lines separated by an angle of π/n, show that one can construct a dihedral group Dn of the 2n symmetries of a regular n-gon. (You need not compute all of the matrices involved, but rather, should argue using a picture and basic facts about reflections and rotations).

(7) Show that the row-swap elementary matrices Eij ∈ GLn(R) generate a group, which gives all permutations of coordinate axes. How many elements are in this group? This group is called the symmetric group or permutation group on n symbols (in this case, the permutations act on coordinate axes). Symmetric groups on n symbols are denoted by Sn. (Hint: try to describe how to cyclically permute some set {ei1, ei2, . . . , eik} ⊂ {e1, . . . , en} using transpositions ei ↔ ej.)

Bonus: Show that the group of permutations of the coordinate axes of R3 has 6 elements, and can be mapped bijectively to the dihedral group D3 of symmetries of an equilateral triangle, in such a way that a product of matrices in the first group is mapped to the product of images in the second group. Maps preserving products in this way are called group homomorphisms, and bijective homomorphisms are called group isomorphisms. In particular, as groups, the symmetric group on three symbols S3 is group-isomorphic to the dihedral group D3, and one can write S3 ≅ D3. There are no isomorphisms between Sn and Dn for n > 3. Can you give a reason why?

Graphs.

These problems begin the exploration of the applications of linear algebra to graph theory. In the mathematical field of study called combinatorics, a directed multigraph consists of the following data: a set V, called the vertex set, whose elements are called vertices, and a multiset2 E ⊂ V × V, called the edge set, whose elements are called edges. The term directed graph adds the restrictions that E is a set (each pair of distinct vertices has at most one edge between them) and no ordered pair of the form (vi, vi) appears in E (there are no loops on a single vertex). A graph ignores the “directed” nature of the edges, so if (vi, vj) ∈ E, then so is (vj, vi), and we identify these pairs as a single edge (except in describing “walks”, see problem (4)). Below are figures illustrating graphs and directed multigraphs. One writes G = (V, E).

Figure 2. Examples of graphs (undirected, and not multi-).

Figure 3. Examples of directed multigraphs.

The adjacency matrix of an n-vertex (possibly directed) graph, AG ∈ Matn×n(F2), is a matrix whose entries are 1 or 0, depending on whether there is or is not an edge between the corresponding vertices. More precisely, suppose G is a (possibly directed) graph with n vertices, and fix a labeling of these vertices V = {v1, . . . , vn}. An adjacency matrix for the graph G with this labeling is an n × n matrix AG whose entries aij satisfy

aij = 1 if (vi, vj) ∈ E, and aij = 0 otherwise.

2 Recall that a set does not acknowledge duplicates: e.g. the set {1, 1, 2} is the same as the set {1, 2}, and one generally simply writes the latter. On the other hand, a multiset keeps track of elements as well as how many times they appear.

Here, if G is an undirected graph, recall that we identify (vi, vj) ∼ (vj, vi), while if it’s directed, we consider these separate edges, and one or the other may not belong to E. For example, for the graph connecting two vertices, G = ({v1, v2}, {(v1, v2)}), the adjacency matrix is

AG = [ 0  1 ]
     [ 1  0 ]

while for the directed graph

G = ({v0, v1, v2, v3}, {(v0, v1), (v0, v2), (v0, v3), (v1, v2), (v1, v3), (v2, v3)})

depicted below, the adjacency matrix is

AG = [ 0  1  1  1 ]
     [ 0  0  1  1 ]
     [ 0  0  0  1 ]
     [ 0  0  0  0 ]

Figure 4. The directed graph described above, with numbering indicating the vertex labeling and arrows indicating the direction of each edge.

For a directed multigraph, the adjacency matrix counts the edges from one vertex to another (paying attention to direction; e.g. if (vi, vj) occurs twice and (vj, vi) never occurs in E, then aij = 2, but aji = 0.) Thus, the adjacency matrix of a directed multigraph contains nonnegative integer entries. For an undirected multigraph, one often counts loops (vi, vi) twice, since the loop contributes two connections to the vertex (see problem (3): this convention allows the degree to be read off easily for multigraphs).
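As an illustration of these conventions, here is a short Python sketch that assembles an adjacency matrix from a list of ordered pairs, counting multiplicities for multigraphs. The function name is mine; the example edge list is the one for figure 4. (The count-twice convention for loops in undirected multigraphs is not handled here.)

# Sketch: build an adjacency matrix from an edge list. For a directed
# (multi)graph, each ordered pair (i, j) bumps a_ij; for an undirected graph
# we also bump a_ji.

def adjacency(n, edges, directed=True):
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] += 1
        if not directed and i != j:
            A[j][i] += 1
    return A

# The directed graph of figure 4 (vertices v0..v3):
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
for row in adjacency(4, edges):
    print(row)        # reproduces the upper-triangular matrix shown above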

(1) Describe the conditions on a matrix for it to be the adjacency matrix of a graph (undirected). How do adjacency matrices of graphs differ from those of directed graphs? What is the set of all matrices which can be the adjacency matrix of some (undirected) multigraph? What is the set of matrices which realize adjacency matrices for directed multigraphs?

(2) For each of the graphs depicted above in figure 2, determine an adjacency matrix for some labeling (illustrate your labeling). Similarly, give an adjacency matrix for each of the directed multigraphs in figure 3. Dropping the arrows, give adjacency matrices for the underlying undirected multigraphs from figure 3, and provide an adjacency matrix for the underlying graph of figure 4.

(3) The (total) degree deg vi of a vertex vi in a (multi-) graph is the number of edges connected to that vertex, i.e. the number of ordered pairs in E containing that vertex. If the graph is directed, then there is an in-degree counting pairs/edges ending at the vertex and an out-degree counting pairs/edges beginning at the vertex. There is also a net degree for multigraphs, given by the difference of out-degrees and in-degrees. Describe how to compute the degree of a vertex from the adjacency matrix, and describe how to generalize this procedure to find in-degree, out-degree and net degree for a directed multigraph. Give the degrees of the vertices (according to your labeling) for each of the graphs in figure 2, and give the in-degree, out-degree, and net degree for each of the vertices in each of the directed multigraphs from figures 3 and 4.

(4) A walk of length k on a graph G is an ordered collection of k edges, such that each successive edge contains a common vertex with the preceding edge (imagine walking from vertex to vertex along the edges, reading the edges off as you go; you may traverse an edge twice, and indeed in an undirected walk, you may go backwards on an edge you just traveled along; to distinguish the direction in which each edge is traversed we keep track of the order of the vertices in each edge pair in the walk). Let A be the adjacency matrix of a graph. Give an argument explaining why A^k (product over Z, rather than F2) has entries aij equal to the number of length k walks from vertex vi to vertex vj.
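For a concrete check of the claim in problem (4), the following minimal sketch (numpy assumed; the example reuses the directed graph of figure 4) counts length-2 walks by squaring the adjacency matrix over Z.

# Sketch: counting length-k walks via integer powers of the adjacency matrix.
import numpy as np

A = np.array([[0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])

A2 = np.linalg.matrix_power(A, 2)
print(A2)            # entry (i, j) = number of length-2 walks from v_i to v_j
print(A2[0, 3])      # e.g. 2: the walks v0 -> v1 -> v3 and v0 -> v2 -> v3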

(5) A graph is called disconnected if there is a splitting of V into two subsets such that there are no edges between the subsets, otherwise it is called connected. How can one detect connectivity using an adjacency matrix? (Hint: think about walks!)

(6) An alternative matrix which can be formed to describe a graph G is its incidence matrix CG. Choose a labeling of vertices and edges for G. If |V| = n and |E| = m, then CG ∈ Matm×n(F2) has entries cij equal to 1 if the edge ei contains vertex vj, and 0 otherwise. Define the oriented incidence matrix of a directed graph G to be the matrix DG whose entries dij are +1 if the edge ei terminates in the vertex vj, −1 if ei initiates in the vertex vj, and 0 otherwise.

(a) Provide incidence matrices for the graphs in figure 2, and provide oriented incidence matrices for the directed (multi-)graphs in figure 3 and the directed graph in figure 4.

(b) Show that for any graph G, AG = CG^τ CG − 2In.

(c) One can make a graph G into a directed graph by choosing an orientation on each edge in E. Consider the linear map Rn → Rm determined by DG. Show that, independent of the choice of orientations on edges, the nullity of DG counts the connected components of G, and that the nullity of DG^τ counts the number of loops in the graph.

(7) We call the multiset of eigenvalues (accounting for multiplicity) of an adjacency matrix A for a graph G the spectrum of the graph G. (a) Argue that the spectrum does not depend on the choice of an adjacency matrix (in particular, it does not depend on a chosen labeling of the vertices of G), and that the spectrum consists of real numbers.

(b) One can define an operator on graphs called a graph Laplacian (which is a discrete analogue of the 2nd order differential operator called the Laplacian in multivariable and vector calculus). Fix a labeling and an orientation on a graph G. Given the oriented incidence matrix DG for this orientation on the graph G, the graph Laplacian is

LG := DG^τ DG .

Show that LG = Λ − A where Λ is a diagonal matrix with entries λii = deg vi, and A is the adjacency matrix of the underlying (undirected) graph. In particular, the graph Laplacian is independent of the chosen orientation. The multiset of eigenvalues of the graph Laplacian is called the Laplacian spectrum of the graph.
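The identity LG = Λ − A can be checked numerically on any small example. Here is a sketch for one orientation of the square graph (a 4-cycle, as in figure 5); numpy is assumed, and the labeling and orientation are arbitrary choices of mine, which is part of the point: the Laplacian comes out the same.

# Sketch: verify D^T D = Lambda - A for an oriented 4-cycle. Each row of D has
# -1 at the edge's tail and +1 at its head.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]             # an oriented square graph
n = 4
D = np.zeros((len(edges), n))
for row, (tail, head) in enumerate(edges):
    D[row, tail] = -1
    D[row, head] = +1

L = D.T @ D
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1                            # underlying undirected graph
Lambda = np.diag(A.sum(axis=1))                      # diagonal of degrees
assert np.array_equal(L, Lambda - A)
print(L)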

(c) For each of the following graphs, fix a labeling and compute the adjacency and Laplacian spectra:

Figure 5. The square graph and the path on four vertices.

It turns out that the spectra of graphs encode information about connectivity, walks, and robustness of a graph network (e.g. whether removing some number of edges randomly is likely to separate the graph into two components which are disconnected from each other). Thus, graph spectra are an area of active research in theoretical computer science, electrical engineering, and the analytic modeling of networks (be they circuits, machines, utility systems, lattice materials/quasicrystals, or social networks). Note that by diagonalizing any diagonalizable adjacency matrix A of a graph G, we can rapidly compute powers A^k to count length k walks for very large k.

Bonus: There are two special kinds of walks through a graph: Euler paths and Hamiltonian paths. An Euler path is a walk which visits each edge exactly once and each vertex at least once; there are also Euler cycles, which are Euler paths starting and ending on the same vertex. A Hamiltonian path visits each vertex exactly once, and a Hamiltonian cycle starts at some vertex, and visits each other vertex exactly once before returning to the original vertex. Determine an algorithm to use an adjacency matrix to produce a list of Euler paths (or cycles) or to indicate that none exists. Try your algorithm out on the graphs in figure 2, and describe how to extend the algorithm to find directed Euler paths or cycles in directed multigraphs before trying the algorithm on the directed multigraphs of figures 3 and 4. Hamiltonian paths and circuits are more difficult to find. See if you can find a Hamiltonian circuit on the following graph:

Figure 6. Finding a Hamiltonian circuit on this graph solves Hamilton’s “Icosian game”, a precursor of the famous traveling salesman problem. The above graph represents the edges and vertices of a regular platonic solid called a dodecahedron, which has 12 pentagonal sides. The game models the earth as a dodecahedron, where each vertex is a city and each edge is a road. Can you find a way for a salesman to start in a home city, visit each other city exactly once traveling along the roads, and return to his home city?

Final remarks: there is a calculus connection. The oriented incidence matrix DG is analogous to defining a kind of gradient on the directed graph G. These matrices see applications in the study of circuits and networks. For example, if we attach a value xi to each vertex vi (e.g. voltage at each node of a circuit), then when we compute DGx, we obtain a vector whose entries are the differences in voltage across each edge. The Laplacian LG := DG^τ DG is analogous to the continuous (vector) Laplacian, Trace(∇∇). It can be used to describe, e.g., how “heat” diffuses through a network. To read more about applications of these matrices to circuits and networks, see section 8.2 of Gilbert Strang’s Introduction to Linear Algebra.

Finite Vector Spaces.

It can be shown that any finite field has size |Fq| = q = p^m, where p is a prime number, called the characteristic of the field. Any element α of a characteristic p field has the property that pα = 0, and p is the least positive integer such that this holds. We’ve so far only seen the examples Fp, but larger finite fields can be created by considering certain polynomials over Fp. In particular, if q = p^m, then Fq is naturally a vector space over Fp, and its elements can be regarded as certain polynomials over Fp. E.g. F4 = F_{2^2} can be identified with the set {0, 1, x, x + 1}, where the arithmetic is usual polynomial addition and multiplication modulo both 2 and x^2 + x + 1. (Another way to realize F_{p^m} is to identify the elements of F_{p^m} as the p^m roots of the polynomial x^{p^m} − x ∈ P(F_{p^m}), much as one identifies the imaginary units ±i as the roots of x^2 + 1.) You will not need to use the exact structure of arithmetic in Fq for these problems, except when q = 2 and m = 1.

For problems (1), (2), and (3), fix some prime p and fix some q = p^m, and consider the vector space Fq^n.
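To make the description of F4 concrete, here is a small Python sketch of its arithmetic, with elements encoded as coefficient pairs (a, b) representing a·x + b over F2; the encoding and names are my own choices.

# Sketch: arithmetic in F_4 = F_2[x]/(x^2 + x + 1).

def add(p, q):
    return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 2)

def mul(p, q):
    # (a x + b)(c x + d) = ac x^2 + (ad + bc) x + bd, then reduce x^2 = x + 1.
    a, b = p
    c, d = q
    x2 = a * c
    return ((a * d + b * c + x2) % 2, (b * d + x2) % 2)

elements = [(0, 0), (0, 1), (1, 0), (1, 1)]          # 0, 1, x, x + 1
names = {(0, 0): "0", (0, 1): "1", (1, 0): "x", (1, 1): "x+1"}
for p in elements:
    print([names[mul(p, q)] for q in elements])      # the multiplication table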

(1) How many elements are there in Fq^n, in terms of p, m, and n?

(2) How many one dimensional subspaces U ⊂ Fq^n are there, in terms of p, m, and n?

(3) How many subspaces are there total in Fq^n (including the trivial subspace 0 ⊂ Fq^n and the improper subspace Fq^n)?

The remaining few problems deal with low dimensional vector spaces over the binary field.

(4) Construct graphical representations of the finite vector spaces F2^2 and F2^3. (Hint: think of the vectors of F2^n as sitting in Rn; alternatively, you may construct a graph, with vertices corresponding to vectors, and edges corresponding to the convention that two vectors are called adjacent in F2^n if and only if they differ by an element of the standard basis.)

(5) Describe all possible F2-linear transformations F2^2 → F2^2, constructing a matrix describing each relative to the standard basis.

(6) How many F2-linear transformations are there of F2^3 → F2^3? How many of these maps are bijective?

(7) Describe all injective linear maps from F2^2 → F2^3, and give a geometric/graphical interpretation of these. (Hint: can you find “copies” of F2^2 sitting inside your picture of F2^3?)

Bonus: A subspace lattice is a directed graph defined for a finite vector space V as follows. The vertex set V is the set of all subspaces of V, including the trivial subspace 0 and the improper subspace V. The edge set E contains a pair (U, W) if and only if U ⊊ W is a proper subspace of W, and there is no proper subspace Y ⊊ W such that U ⊊ Y ⊊ W. For example, the subspace lattice for F2^2 is

F2^2
⟨e1⟩    ⟨e1 + e2⟩    ⟨e2⟩
0

Figure 7. Subspace lattice for F2^2.

Construct a subspace lattice for F2^3, and give its adjacency matrix.

Linear Recurrence Relations.

Let {xk} be a sequence in your favorite field F. A linear recurrence relation for the sequence {xk} of order n is a relation of the form

xn = c · xn−1 , where

xn−1 = (xn−1, xn−2, . . . , x0)^τ , c = (cn−1, cn−2, . . . , c0)^τ ∈ F^n ,

are column vectors, and c is a fixed vector of constants. Note that one can rewrite this condition as

xn = cn−1 xn−1 + . . . + c0 x0 .

To completely determine the sequence from an n-th order linear recurrence relation for a given c, one needs initial data, such as n consecutive initialization terms:

x0 = (x0, x−1, . . . , x−n+1)^τ ∈ F^n .

We begin by investigating a famous example of a sequence arising from a second order recurrence: the Fibonacci sequence. The terms of this sequence are actually integers, but it is useful to regard the vectors and matrices in what follows as being over the reals3. The Fibonacci sequence is defined by the recurrence Fn = Fn−1 + Fn−2 together with the initial data F0 = 1, F−1 = 0.

(1) Let Fn be the vector Fn = (Fn, Fn−1)^τ (written as a column).

Express the relationship between Fn and Fn−1 as a matrix equation

Fn = CFn−1 , for some 2 × 2 matrix C, called the companion matrix of the recursion.

(2) What are the columns of C^k for the Fibonacci sequence?

(3) Using the matrix, compute the first 10 terms of the Fibonacci sequence.

(4) More generally, describe the companion matrix C for any second order linear recurrence xn = c · xn−1, c ∈ R2, with initial data x0 ∈ R2. Describe xn in terms of x0, C, and n.
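As a minimal sketch of the companion-matrix iteration described in problems (1) through (4), here is a short Python routine using the Fibonacci recurrence as the running example; the function name and argument conventions are mine, and of course this is not a substitute for the closed form asked for below.

# Sketch: iterate a second-order recurrence x_n = c1*x_{n-1} + c0*x_{n-2}
# by repeatedly applying its companion matrix C = [[c1, c0], [1, 0]].

def recurrence_terms(c1, c0, x0, x_minus1, count):
    """Return x_0, x_1, ..., x_{count-1}."""
    C = [[c1, c0],
         [1,  0 ]]                                   # the companion matrix
    v = [x0, x_minus1]                               # the vector (x_k, x_{k-1})
    terms = []
    for _ in range(count):
        terms.append(v[0])
        v = [C[0][0] * v[0] + C[0][1] * v[1],        # v <- C v
             C[1][0] * v[0] + C[1][1] * v[1]]
    return terms

print(recurrence_terms(1, 1, 1, 0, 10))              # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]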

3 In actuality, we could choose to work over the field extension Q(√5) = {a + b√5 | a, b ∈ Q}, where the operations are those induced by the arithmetic as a subfield of R; we don’t need all real numbers, just rational linear combinations of 1 and √5, since we will encounter eigenvalues of the form a + b√5.

(5) Suppose C is diagonalizable, with distinct eigenvalues λ1 and λ2 and an eigenbasis v1, v2. Using the description from the previous part, describe the solution vector xn using the eigenvalues, eigenbasis, and initial data x0, x−1, and thus give xn as a linear combination of powers of the eigenvalues, with coefficients determined by the initial values x0, x−1. Further show that the eigenvectors may be chosen to be composed of powers of the corresponding eigenvalues.

(6) Use the previous formula to find an explicit form for the n-th Fibonacci number. Note that the golden ratio φ = (1 + √5)/2 and its reciprocal appear in the formula. Show that lim_{n→∞} Fn+1/Fn = φ.

(7) Generalize the procedure used in the second order case to give an algorithm to solve n-th order linear recurrences. You will need to consider separately the cases where there are repeated eigenvalues. (Hint: if an eigenvalue λi is repeated m times, i.e. (λ − λi)^m divides the characteristic polynomial of C, then each of the sequences λi^n, nλi^n, . . . , n^{m−1}λi^n satisfies the recurrence xn = c · xn−1.)

Quaternions.

In calculus, it is common to see the notation {i, j, k} for the standard basis {e1, e2, e3} of R3. This problem explores the historic reason for this notation. In particular, we will introduce Hamilton’s quaternions, and use the modern language of linear algebra to understand their legitimacy as a method of representing rotations of three dimensional space. Recall that we can use complex numbers e^{iθ} to represent rotation by an angle θ of the plane counterclockwise about zero, and we can represent the similarity transform of the plane consisting of dilation by a factor of λ and rotation about zero by an angle θ by the complex linear map z ↦ λe^{iθ}z. The motivation for the quaternions is the desire to find a system of hypercomplex numbers which can do for three dimensional Euclidean geometry what complex numbers do for two dimensional Euclidean geometry. The story is thus: William Rowan Hamilton desired (obsessively, some might say) to describe a system of hypercomplex numbers which could encode spatial rotations and rigid motions of R3, in much the same way as the algebra of complex numbers can describe planar rigid motions. Alas, no 3D system was ever found (it was later proved that the particular structure Hamilton desired, that of a nontrivial normed division algebra over R, can only exist in dimensions 1, 2, 4, and 8). In 1843 Hamilton developed and described his four dimensional analog of the complex numbers, which he dubbed “the quaternions”. We will initially use matrices to describe his quaternion algebra. Recall that the matrix J ∈ Mat2×2(R) which plays the role of √−1 ∈ C is

J = [ 0  −1 ]
    [ 1   0 ]

One can readily check that J^2 = −I2, J^3 = −J, and J^4 = I2.

(1) Consider the complex matrices i, j, k ∈ Mat2×2(C) given by

i = [  0    √−1 ]     j = [ 0  −1 ]     k = [ √−1     0   ]
    [ √−1    0  ]         [ 1   0 ]         [  0    −√−1  ]

Let 1 denote the complex 2 × 2 identity matrix (its entries are the same as those of the real matrix I2, but the notation will be suggestive of how we identified C ≅ R2.) Show that the following identities hold:

(⋆) i^2 = j^2 = k^2 = ijk = −1 ,

(†) ij = k = −ji, jk = i = −kj, ki = j = −ik .

(2) Demonstrate, without using the explicit matrices chosen, that if some algebraic quantities i, j, and k satisfy the relations (⋆) such that 1 acts as an identity and −1 = (−1)1, with −1 ∈ R acting as a real scalar, then the relations (†) follow. In particular, the matrices given generate a noncommutative group of order 8. Note the similarity of the relations (†) to cross product identities! Historically, Hamilton discovered the relations (⋆), and the corresponding real algebra described below, without considering matrices or our modern language of linear algebra/cross products. Indeed, Hamilton was so excited upon realizing the relations (⋆) gave a system of hypercomplex numbers that he reportedly vandalized the Brougham Bridge across Dublin’s Royal Canal (an inspiration to mathematically inclined graffiti artists everywhere).

(3) Define the space of quaternions H to be the R-linear span of 1, i, j, and k:

H := {a1 + bi + cj + dk | a, b, c, d ∈ R} ⊂ Mat2×2(C) .

This is naturally a four dimensional real vector space. Representing a quaternion p = a1 + bi + cj + dk by the vector ae1 + be2 + ce3 + de4, and using the relations above, find a real 4 × 4 matrix realizing left multiplication of a quaternion by the quaternion q = w1 + xi + yj + zk, i.e. construct a matrix describing the R-linear map p ↦ qp. Do the same for right multiplication p ↦ pq.

(4) Let q = w1 + xi + yj + zk and show that −(1/2)(q + iqi + jqj + kqk) = w1 − xi − yj − zk. We call the quantity −(1/2)(q + iqi + jqj + kqk) the conjugate quaternion to q, and write

q̄ = −(1/2)(q + iqi + jqj + kqk) .

Show that q̄, as a matrix in Mat2×2(C), is obtained by taking the conjugate transpose of the matrix for q (i.e. transpose, and conjugate all complex entries of the complex 2 × 2 matrix for q, and check that this is the matrix of q̄.)

(5) There is a notion of quaternion length just as there is a notion of modulus for a complex number: the length |q| of a quaternion q is defined by |q|^2 = qq̄ = q̄q. Show that this is just the length of the Euclidean four-vector with components equal to the real coefficients of q. Use the quaternion modulus to give a definition of a multiplicative inverse q^{−1} for any nonzero quaternion q. Caution: we have to be sensitive to the non-commutativity: it is ambiguous to write p/q, so instead one must specify whether pq^{−1} or q^{−1}p is meant, as with matrices. Indeed, check that the matrix for left multiplication by an inverse quaternion is precisely the inverse of the left-multiplication matrix found in (3).

(6) Informally, one can express a quaternion as a “sum” of a scalar with a vector:

q = w1 + xi + yj + zk ∼ w + ⟨x, y, z⟩ =: qs + qv .

Exploiting this notational device, show that for generic quaternions p and q that

pq = ps qs − pv · qv + pv qs + ps qv + pv × qv ,

where “×” is the three dimensional vector cross-product, and “·” is the usual dot product. One calls the vector part qv of a quaternion a “pure quaternion”: any quaternion with no scalar part is pure, and in analogy with purely imaginary numbers, any pure quaternion p satisfies p = −p̄. Historically, this formula for quaternion multiplication represents the first appearance of the dot and cross products! Owing to Hamilton’s notation, we have our modern 3D vector algebra notation, which describes the vector subspace of pure quaternions, together with a dot (inner) product structure, and a cross product structure.

(7) Let q be a quaternion of unit modulus (often called a versor) and let p = −p̄ be a pure quaternion, which thus corresponds to some vector p ∈ R3. Show that the map p ↦ qpq^{−1} can be interpreted as an R-linear map of R3, and describe the geometric effect of this map on p. Hints: First try to understand the effect when q is i, j, or k. Then show that a versor can always be written as cos(θ/2)1 + sin(θ/2)u for some angle θ and some pure unit quaternion u = 0 + u. Using the scalar-plus-vector notation of part (6), try to connect the map p ↦ qpq^{−1} to the spatial rotation formula given in class.
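To experiment with the scalar-plus-vector multiplication formula and the map p ↦ qpq^{−1} of problems (6) and (7), here is a hedged Python sketch; numpy is assumed, and the (w, (x, y, z)) encoding and helper names are mine.

# Sketch: quaternion multiplication via (ps, pv)(qs, qv) =
# (ps*qs - pv.qv, pv*qs + ps*qv + pv x qv), and conjugation of a pure quaternion.
import numpy as np

def qmul(p, q):
    ps, pv = p[0], np.array(p[1], dtype=float)
    qs, qv = q[0], np.array(q[1], dtype=float)
    return (ps * qs - pv @ qv, tuple(qs * pv + ps * qv + np.cross(pv, qv)))

def qconj(q):
    return (q[0], tuple(-np.array(q[1])))

def rotate(q, v):
    """Conjugate the pure quaternion (0, v) by the unit quaternion q."""
    out = qmul(qmul(q, (0.0, v)), qconj(q))          # q^{-1} = conjugate when |q| = 1
    return out[1]

# A versor for rotation by pi/2 about the z-axis: cos(pi/4) + sin(pi/4) k
theta = np.pi / 2
q = (np.cos(theta / 2), (0.0, 0.0, np.sin(theta / 2)))
print(rotate(q, (1.0, 0.0, 0.0)))                    # approximately (0, 1, 0)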

Some final notes: though the quaternion algebra is four dimensional (in particular, it’s a four dimensional real vector space, together with a noncommutative product structure, and notions of conjugation and length), as you hopefully found above, quaternions can be used to compute 3 dimensional spatial rotations nonetheless! Moreover, they are more computationally efficient than the Rodrigues rotation formula. They also find use in studying the geometry of the three dimensional sphere S3 = {x ∈ R4 | ‖x‖ = 1}, which can be regarded as the set of unit quaternions. It’s easy to check that the product of unit quaternions is again a unit quaternion, and so S3 has a natural multiplication structure. This structure can explain how to compute 3 or 4 dimensional spatial rotations! Note that for modeling 3D spatial rotations, there are two different choices of unit quaternion which yield the same rotation, and thus correspond to a Rodrigues rotation matrix. In the language of groups: there is a 2 : 1 covering homomorphism of SO(3, R) by the unit quaternion group S3. Apart from the inspired name, the concept of quaternions had already existed prior to Hamilton’s revelation, for in 1840 Olinde Rodrigues published a largely ignored paper on transformation groups which had effectively given a description of these same objects. And as early as 1819 Gauss had described the same principles in his own personal writings, which remained unpublished until 1900. [See Baez’s The Octonions, or Altmann’s article Hamilton, Rodrigues, and the Quaternion Scandal.] It is a triumph of linear algebra that we can take constructions of complex numbers and quaternions and firmly ground them in matrix algebra!