How to Generate Random Matrices from the Classical Compact Groups, Volume 54, Number 5

How to Generate Random Matrices from the Classical Compact Groups Francesco Mezzadri

ince Random Matrix Theory (RMT) was be able to implement in a code of only a few lines. introduced by Wishart [17] in 1928, it has This is achieved in the section “A Correct and Ef- found applications in a variety of areas of ficient Algorithm”. Secondly, we discuss in detail physics, pure and applied mathematics, the main ideas, which turn out be fairly general probability, statistics, and engineering. and quite interesting, behind this algorithm. SA few examples—far from being exhaustive— An N × N unitary matrix U = (u ) is defined jk include: analytic number theory, combinatorics, by the relation U ∗U = UU ∗ = I, which in terms of graph theory, multivariate statistics, nuclear the matrix elements reads physics, quantum chaos, quantum information, N N statistical mechanics, structural dynamics, and ∗ ujkukl = ukj ukl = δjl and wireless telecommunications. The reasons for the k=1 k=1 (1) X X ever growing success of RMT are mainly two. N N ∗ Firstly, in the limit of large matrix dimension the ujkukl = ujkulk = δjl, statistical correlations of the spectra of a family, kX=1 kX=1 or ensemble, of matrices are independent of the ∗ ∗ where U = (ujk) is the conjugate transpose probability distribution that defines the ensemble, ∗ of U, i.e., ujk = ukj . In this article we will use but depend only on the invariant properties of the symbol to denote complex conjugation, in such a distribution. As a consequence random order to distinguish it from ∗, which is reserved matrices turn out to be very accurate models for a for the conjugate transpose of a matrix. The large number of mathematical and physical prob- constraints (1) simply state that the columns lems. Secondly, RMT techniques allow analytical (rows) of a unitary matrix form an orthonormal computations to an extent that is often impossible basis in CN . The set U(N) of unitary matrices to achieve in the contexts that they are modelling. forms a compact Lie group whose real dimension This predictive ability of RMT is particularly is N2; it is then made into a probability space by powerful whenever in the original problem there assigning as a distribution the unique measure are no natural parameters to average over. invariant under group multiplication, known as Although the advantage of using RMT lies in Haar measure. Such a probability space is often the possibility of computing explicit mathematical referred to as Circular Unitary Ensemble (CUE). and physical quantities analytically, it is some- Usually the correct ensemble to model a par- times necessary to resort to numericalsimulations. ticular situation depends on the symmetries of The purpose of this article is twofold. Firstly, we the problem. Ensembles of unitary matrices are provide the reader with a simple method for gener- constructed in two steps: we first identify a subset ating random matrices from the classical compact U ⊂ U(N) by imposing further restrictions on U; groups that most mathematicians—not necessari- then we assign to U a probability measure with the ly familiar with computer programming —should appropriate invariant properties. As well as U(N), Francesco Mezzadri is a Lecturer in Applied Mathemat- we will discuss how to generate random matrices ics at the University of Bristol, United Kingdom. His email from the orthogonal O(N) and unitary symplec- address is [email protected]. tic USp(2N) groups with probability distributions

592 Notices of the AMS Volume 54, Number 5 given by the respective unique invariant mea- not have any other symmetry. From general con- sures. We shall also consider the two remaining siderations (see Mehta [9] p. 36) we know that U Dyson circular ensembles [2], namely the Circular is always conjugate by a unitary transformation Orthogonal Ensemble (COE) and Circular Symplec- to a symmetric matrix. Therefore, we can always tic Ensemble (CSE). Other symmetric spaces appear choose U so that in the applications [18], but we will not concern (3) U = U t , ourselves with them. t Writing an algorithm to generate random uni- where U denotes the transpose of U. Since there tary matrices that is both correct and numerically are no other symmetries in the problem, this is stable presents some pitfalls. The reason is that the only constraint that we can impose. Therefore, the conditions (1) imply that the matrix elements the appropriate matrices that model this physical O are not independent and thus are statistically cor- system should be symmetric. Let us denote by O related. The main ideas discussed in this article the set of unitary symmetric matrices. If U ∈ it are centered around the QR decomposition and go can be proved (see Meta [9] p. 499) that it admits back to Wedderburn [16], Heiberger [5] (corrected the representation by Tanner and Thisted [15]), Stewart [14], and Dia- (4) U = W W t , W ∈ U(N). conis and Shahshahani [1]. However, the technical This factorization is not unique. Let O(N) be the literature may be difficult to access for a reader group of real matrices O such that OOt = Ot O = without a background in numerical analysis or I and set W ′ = W O. By definition we have statistics, while the implementation of such tech- ′ ′t t t t niques is elementary. Another method discussed (5) U = W W = W OO W = W W . in the literature involves an explicit representation This statement is true also in the opposite direc- 2 of the matrix elements of U in terms of N inde- tion: if W W t = W ′W ′t there exists an O ∈ O(N) pendent parameters (Euler angles) [19], but it does such that W ′ = W O. Therefore, O is isomorphic to not seem to be equally efficient or convenient. the left coset space of O(N) in U(N), i.e., (6) O ≅ U(N)/O(N). Some Examples and Motivations Before discussing how to generate random matri- Since a measure space with total mass equal to ces it is helpful to give a few examples that show one is a probability space, in what follows we shall how they appear in the applications. use the two terminologies interchangeably. An en- In quantum mechanics all the information about semble of random matrices is defined by a matrix space and a probability measure on it. We have an isolated physical system at a given time t0 is found the former; we are left to identify the latter. contained in a state vector ψ0 belonging to a Hilbert space H —in general infinite dimensional. Haar measure, which will be discussed in detail in the section “Haar Measure and Invariance”, pro- The time evolution of ψ0, i.e., its dynamics, is determined by a unitary operator U. In other words, vides a natural probability distribution on U(N); “natural” in the sense that it equally weighs differ- at a time t>t0, ψ0 has evolved into ent regions of U(N), thus it behaves like a uniform (2) ψ = Uψ0. distribution. From the factorization (4) the proba- The fact that U is unitary guarantees that kψk = bility distribution on U(N) induces a measure on O. As a consequence, if W is Haar distributed the kψ0k = 1, which is required by the probabilistic interpretation of quantum mechanics. resulting measure on O will be uniform too. In If the dynamics is complicated—as in heavy nu- the section “A Group Theoretical Interpretation”, clei or in quantum systems whose classical limits we shall see that such a measure is the unique are characterized by chaotic dynamics—writing probability distribution induced by Haar measure O down an explicit expression for U may be hope- on . Therefore, it provides a natural choice to less. Therefore, we can attempt to replace U by model a time reversal invariant quantum system. O a random operator and check if the predictions The space together with this measure is the COE that we obtain are consistent with the empirical ensemble. observations. It is also reasonable to simplify the If a quantum system does not have any sym- problem even further and replace U by a random metry, then there are no restrictions on U(N), and unitary matrix of finite, but large, dimension. Then the natural choice of probability distribution is the main question is: What are the matrix space Haar measure. This is the CUE ensemble. If the and the probability distribution that best model system is invariant under time reversal and has our system? a half-integer spin, then the appropriate ensem- In physics the symmetries of a problem are ble is the CSE. The matrix space of the CSE is S often known a priori, even if the details of the the subset ⊂ U(2N) whose elements admit the dynamics remain obscure. Now, suppose that our representation system is invariant under time reversal but does (7) U = −W JW t J, W ∈ U(2N),

May 2007 Notices of the AMS 593 where complex matrices Z = (zjk); the matrix elements are independent identically distributed (i.i.d.) stan- 0 IN (8) J = . dard normal complex random variables. In other −IN 0 ! words, the probability density function (p.d.f.) of From the factorization (7) the probability distri- zjk is bution on U(2N) induces a measure on S. As 1 2 −|zjk| previously, such a measure is ﬁxed by assigning (12) p(zjk) = e . Haar measure to U(2N). π The set S is isomorphic to a coset space too. The By deﬁnition the matrix entries are statistically in- unitary symplectic group USp(2N) is the subgroup dependent, therefore the joint probability density of U(2N) whose elements obey the relation function (j.p.d.f.) for the matrix elements is t N (9) SJS = J. 1 2 P(Z) = e−|zjk| π N2 Therefore, the matrix U in equation (7) does not j,k=1 ′ Y change if we replace W with W = W S, where N ′ (13) 1 2 S ∈ USp(2N). Similarly, if W and W are such that = exp − z π N2  jk  (10) j,k=1 t ′ ′t ′ X U = −W JW J = −W JW J, W,W ∈ U(2N), 1   = exp (− Tr Z∗Z). then W ′W −1 ∈ USp(2N). Therefore, π N2 (11) S ≅ U(2N)/USp(2N). Since P(Z) is a probability density, it is normalized to one, i.e., The probability distribution of the CSE is the unique invariant measure induced on the coset (14) P(Z)dZ = 1, space (11) by Haar measure on U(2N). ZCN2 From equations (4) and (7) all we need to N where dZ = dxjkdyjk and zjk = xjk + iyjk. generate random matrices in the CUE, COE, and j,k=1 The j.p.d.f. P(Z)Q contains all the statistical infor- CSE ensembles is an algorithm whose output is mation on the Ginibre ensemble. Haar distributed unitary matrices. The rest of Since CN×N ≅ CN2 , we will use the two notations this article will concentrate on generating ran- according to what is more appropriate for the dom matrices from all three classical compact context. Thus, we can write groups U(N), O(N), and USp(2N) with probability distributions given by the respective Haar mea- (15) dµG(Z) = P(Z)dZ

sures. These groups are not only functional to and think of dµG as an infinitesimal volume or constructing matrices in the COE and CSE, but are measure in CN2 . If f : CN×N -→ CN×N , we say that also important ensembles in their own right. In- dµG is invariant under f if deed, the work of Montgomery [11], Odlyzko [12], Katz and Sarnak [6], Keating and Snaith [7, 8], (16) dµG f (Z) = dµG(Z). and Rubinstein [13] has shown beyond doubt that Lemma 1. The measure of the Ginibre ensemble is the local statistical properties of the Riemann zeta invariant under left and right multiplication of Z function and other L-functions can be modelled by by arbitrary unitary matrices, i.e., the characteristic polynomials of Haar distributed (17) random matrices. Over the last few years the pre- dµG(UZ) = dµG(ZV) = dµG(Z), U, V ∈ U(N). dictive power of this approach has brought about impressive progress in analytic number theory Proof. First we need to show that P(UZ) = P(Z); that could not have been achieved with traditional then we must prove that the Jacobian of the map techniques. (See [10] for a collection of review (18) Z ֏ UZ articles on the subject.) (seen as a transformation in CN2 ) is one. Since by ∗ Haar Measure and Invariance definition U U = I, we have 1 Since the algorithm we shall discuss is essentially P(UZ) = exp (− Tr Z∗U ∗UZ) π N2 based on the invariant properties of Haar measure, (19) in this section we introduce the main concepts that 1 = exp (− Tr Z∗Z) = P(Z). are needed to understand how it works. We never- π N2 theless begin with another ensemble: the Ginibre Now, the map (18) is isomorphic to ensemble. Besides being a simpler illustration of (20) X = U ⊕···⊕ U. the ideas we need, generating a matrix in the Gini- bre ensemble is the first step toward producing a Ntimes random unitary matrix. It follows immediately| that{z X}is a N2 × N2 uni- The space of matrices for the Ginibre ensemble tary matrix, therefore |det X| = 1. The proof of is GL(N, C), the set of all the invertible N × N right invariance is identical.

594 Notices of the AMS Volume 54, Number 5 Because the elements of a unitary matrix are It turns out that if the entries of Z are i.i.d. stan- not independent, writing an explicit formula for dard complex normal random variables, i.e., if Z the infinitesimal volume element of U(N) is more belongs to the Ginibre ensemble, then Q is dis- complicated than for the Ginibre ensemble. An tributed with Haar measure (see Eaton [3], p. 234, N × N unitary matrix contains 2N2 real numbers for a proof). Unfortunately, the implementation of and the constraints (1) form a system of N2 re- this algorithm is numerically unstable. However, al equations. Therefore, U(N) is isomorphic to a we may observe that N2-dimensional manifold embedded in R2N2 . Such (23) Z = QR, a manifold is compact and has a natural group structure that comes from matrix multiplication. where R is upper-triangular and invertible. In oth- Thus, an infinitesimal volume element on U(N) er words, the Gram-Schmidt algorithm realizes will have the form the QR decomposition. This factorization is widely used in numerical analysis to solve linear least (21) dµ(U) = m(α ,...,α 2 )dα · · · dα 2 , 1 N 1 N squares problems and as the first step of a par- where α1,...,αN2 are local coordinates on U(N). ticular eigenvalue algorithm. Indeed, every linear Every compact Lie group has a unique (up to an algebra package has a routine that implements arbitrary constant) left and right invariant mea- it. In most cases, however, the algorithm adopt- sure, known as Haar measure. In other words, if ed is not the Gram-Schmidt orthonormalization we denote Haar measure on U(N) by dµH(U), we but uses the Householder reflections, which are have numerically stable. (22) Because of this simple observation, at first one

dµH(VU) = dµH(UW ) = dµH(U), V, W ∈ U(N). might be tempted to produce a matrix in the Ginibre ensemble and then to use a black box Although an explicit expression for Haar measure QR decomposition routine. Writing such a code on U(N) in terms of local coordinates can be is straightforward. For example, if we choose the written down (see Zyczkowski˙ and Kus [19] for SciPy library in Python, we may implement the a formula), we will see that in order to generate following function: matrices distributed with Haar measure we only need to know that it is invariant and unique. from scipy import Haar measure normalized to one is a natural * def wrong_distribution(n): choice for a probability measure on a compact “““A Random matrix with the wrong group because, being invariant under group mul- distribution””” tiplication, any region of U(N) carries the same z = (randn(n,n) + 1j*randn(n,n))/ weight in a group average. It is the analogue of the sqrt(2.0) uniform density on a finite interval. In order to q,r = linalg.qr(z) understand this point consider the simplest exam- return q ple: U(1). It is the set {eiθ } of the complex numbers with modulo one, therefore it has the topology Unfortunately, as Edelman and Rao observed [4], of the unit circle S1. Since in this case matrix the output is not distributed with Haar measure. multiplication is simply addition mod 2π,U(1) is It is instructive to give an explicit example of this isomorphic to the group of translations on S1.A phenomenon. probability density function that equally weighs A unitary matrix can always be diagonalized in any part of the unit circle is the constant density U(N). Therefore, its eigenvalues {eiθ1 ,...,eiθN } lie ρ(θ) = 1/(2π). This is the standard Lebesgue on the unit circle. A classical calculation in RMT measure, which is invariant under translations. (see Mehta [9] pp. 203–205) consists of computing Therefore, it is the unique Haar measure on U(1). the statistical correlations among the arguments Note that it is not possible to define an “un- θ . The simplest correlation function to determine biased” measure on a non-compact manifold. For j is the density of the eigenvalues ρ(θ), or—as example, we can provide a finite interval with a sometimes it is called—the one-point correlation. constant p.d.f. ρ(x), but not the whole real line R, ∞ Since Haar measure is the analogue of a uniform since the integral ρ(x)dx would diverge. −∞ distribution, each set of eigenvalues must have the R same weight, therefore the normalized eigenvalue The QR Decomposition and a Numerical density is Experiment 1 By definition the columns of an N × N unitary (24) ρ(θ) = . matrix are orthonormal vectors in CN . Thus, if we 2π take an arbitrary complex N × N matrix Z of full It is important to point out that equation (24) does rank and apply Gram-Schmidt orthonormalization not mean that the eigenvalues are statistically to its columns, the resulting matrix Q is unitary. uncorrelated.

May 2007 Notices of the AMS 595 Testing (24) numerically is very simple. We generated 10, 000 random unitary matrices using wrong_distribution(n). The density of the eigenvalues of such matrices is clearly not constant (Figure 1(a)). Figure 1(b) shows the histogram of the spacing distribution, which deviates from the theoretical prediction too. This statistic is often plotted because it encodes the knowledge of all the spectral correlations and is easy to determine empirically. For unitary matrices it is deﬁned as follows. Take the arguments of the eigenvalues and order them in ascending order:

(25) θ1 ≤ θ2 ≤ ... ≤ θN . The normalized distances, or spacings, between consecutive eigenvalues are (a) Eigenvalue density N (26) s = (θ − θ ), j = 1,...,N. j 2π j+1 j The spacing distribution p(s) is the probability density of s. (For a discussion on the spacing distribution see Mehta [9] p. 118.) It is worth emphasizing that the QR decomposition is a standard routine. The most commonly known mathematical software packages like Mat- lab, Mathematica, Maple, and SciPy for Python essentially use a combination of algorithms found in LAPACK routines. Changing software would not alter the outcome of this numerical experiment.

A Correct and Efficient Algorithm What is wrong with standard QR factorization routines? Where do they differ from the Gram- Schmidt orthonormalization? Why is the proba- (b) Spacing distribution bility distribution of the output matrix not Haar Figure 1. Empirical histograms of the density measure? of the eigenvalues and of the spacing The main problem is that QR decomposition is distributions compared with the theoretical not unique. Indeed, let Z ∈ GL(N, C) and suppose curves for the CUE. The data are computed that Z = QR, where Q is unitary and R is invertible from the eigenvalues of ten thousand 50 x 50 and upper-triangular. If random unitary matrices obtained from the (27) routine wrong_distribution(n). eiθ1 . iθ iθ Λ =  ..  = diag e 1 ,...,e N ,  eiθN      then is dictated only by the performance and stabil- (28) Q′ = QΛ and R′ = Λ−1R ity of the code. For our purposes, however, the subset of U(N) × T(N), in which the output of are still unitary and upper-triangular, respectively. the QR decomposition is chosen, is fundamental Furthermore, and we need to pay particular attention to it. It is ′ ′ (29) Z = QR = Q R . convenient from a mathematical point of view to Therefore, the QR decomposition defines a multi- introduce a variation of the mapping (30), which valued map is not only single-valued but also one-to-one. In (30) QR : GL(N, C) -→ U(N) × T(N), this way we will not have to refer all the time where T(N) denotes the group of invertible upper- to a specific algorithm. Indeed, the idea is that triangular matrices. we should be able to alter the output of a QR In orderto makethe mapping (30) single-valued, decomposition routine without even knowing the we need to specify the algorithm that achieves the algorithm implemented. factorization. In most applications such a choice We first need

596 Notices of the AMS Volume 54, Number 5 Lemma 2. Equation (29) implies (28), where Λ ∈ Theorem 1. Suppose that the map (32) satisﬁes Λ(N) and Λ(N) denotes the group of all unitary the hypothesis (33). Then it decomposes the mea- diagonal matrices (27). sure (15) of the Ginibre ensemble as

Proof. Equation (29) can be rearranged as (36) dµG(Z) = dµH(Q) × dµΓ (N)(γ). (31) Q−1Q′ = RR′−1. Proof. We have Since U(N) and T(N) are groups, both sides of (37a) equation (31) must belong to U(N) ∩ T(N). By dµG(UZ) = dµG(Z) by lemma 1 deﬁnition the inverse of a unitary matrix U is its (37b) = dµ(UQ,γ) = dµ(Q,γ) by equation (33) conjugate transpose and the inverse of an upper- triangular matrix is upper-triangular. Therefore, (37c) = dµH(Q) × dµΓ (N)(γ) by the uniqueness if a matrix is both unitary and upper-triangular it of Haar measure. must be diagonal, i.e., Λ(N) = U(N) ∩ T(N).

This lemma suggests that, more naturally, in- The choice of the class of representatives that stead of the QR factorization (30) we should we made coincides exactly with outcome of the consider a one-to-one map Gram-Schmidt orthonormalization. The output of (32) QR : GL(N, C) -→ U(N) × Γ (N), standard QR decomposition routines are such that if Z ֏ (Q, R) then UZ ֏ (Q′,R′) with Q′ ≠ UQ where Γ (N) = T(N)/Λ(N) is the right coset space and R′ ≠ R. Therefore, the corresponding map (32) of Λ(N) in T(N). We construct (32) as follows: we does not obey (33) and theorem 1 does not hold. first define it on a class of representatives of Γ (N) We can now give a recipe to create a random using the QR factorization; then we extend it to the unitary matrix with distribution given by Haar whole Γ (N). However, since the QR decomposition measure. is not unique, there is a certain degree of arbi- trariness in this definition. We need to find a map (1) Takean N ×N complex matrix Z whose en- under which the measure of the Ginibre ensemble tries are complex standard normal random induces Haar measure on U(N). The main tool to variables. achieve this goal is the invariance under group (2) Feed Z into any QR decomposition routine. multiplication of Haar measure and its unique- Let (Q, R), where Z = QR, be the output. ness. Thus, our choice of the decomposition (32) (3) Create the following diagonal matrix must be such that if r11 |r11| . (33) Z ֏ (Q, γ) then UZ ֏ (UQ, γ) (38) Λ =  ..  , r with the same γ ∈ Γ (N) and for any U ∈ U(N).  NN   |rNN |  This property implies that left multiplication of Z   where the rjj s are the diagonal elements by a unitary matrix reduces, after the decompo- of R. sition, to the left action of U(N) into itself. But (4) The diagonal elements of R′ = Λ−1R are lemma 1 states that always real and strictly positive, therefore ′ Λ (34) dµG(UZ) = dµG(Z) the matrix Q = Q is distributed with Haar measure. for any U ∈ U(N). As a consequence, if the map (32) satisfies (33) the induced measure on The corresponding Python function is: U(N) will be invariant under left multiplication from scipy import * too and therefore must be Haar measure. def haar_measure(n): How do we construct the map (32)? A class of “““A Random matrix distributed with representatives of Γ (N) can be chosen by fixing the Haar measure””” arguments of the elements of the main diagonal z = (randn(n,n) + 1j*randn(n,n))/ of R ∈ T(N). Let us impose that such elements sqrt(2.0) all be real and strictly positive. Using (28) we can q,r = linalg.qr(z) uniquely factorize any Z ∈ GL(N, C) so that the d = diagonal(r) main diagonal of R has this property. It follows ph = d/absolute(d) that if Z = QR, then q = multiply(q,ph,q) return q (35) UZ = UQR, U ∈ U(N). Ifwe repeatthe numericalexperimentdiscussed This QR decomposition of UZ is unique within the in the section “The QR Decomposition and a Nu- chosen class of representatives of Γ (N). Therefore, merical Experiment”, using this routine, we obtain the resulting map (32) obeys (33). Finally, we arrive the histograms in Figure 2, which are consistent at with the theoretical predictions.

May 2007 Notices of the AMS 597 where 1 is the identity and i1, i2, i3 are the quaternion units; they obey the algebra 2 2 2 (40) i1 = i2 = i3 = i1i2i3 = −1. We can also deﬁne the conjugate of q,

(41) q = a · 1 − bi1 − ci2 − di3, as well the norm (42) kqk2 = q q = q q = a2 + b2 + c2 + d2. When c = d = 0, H reduces to C and q is simply the complex conjugate of q. In analogy with RN and CN —provided we are careful with the fact that multiplication in H is not commutative—we can study the space HN . N Elements in H are N-tuples q = (q1,...,qN ). The (a) Eigenvalue density bilinear map N HN (43) hp, qi = pj qj , p, q ∈ , jX=1 is the analogue of the usual Hermitian inner product in CN , and the norm of a quaternion vector is simply N 2 2 (44) kqk =hq, qi = qj . jX=1

Similarly, GL(N, H) is the group of all the N × N invertible matrices with quaternion elements. The quaternion units admit a representation in terms of the 2 × 2 matrices (45a) 1 0 i 0 0 1 I2 = , e1 = , e2 = (b) Spacing distribution 0 1! 0 −i! −1 0! 0 i Figure 2. Empirical histograms of the density and e = , 3 i 0 of the eigenvalues and of the spacing ! distributions compared with the theoretical where curves for the CUE. The data are computed (45b) 1 ֏ I , i ֏ e , i ֏ e and i ֏ e . from the eigenvalues of ten thousand 50x 50 2 1 1 2 2 3 3 random unitary matrices output of the Thus, q = a · 1 + bi1 + ci2 + di3 is mapped into the function haar_measure(n). complex matrix z w (46a) A = −w z ! where z = a + ib and w = c + id. In addition The Unitary Symplectic Group USp(2N) z −w Up to now we have only considered U(N). The dis- (46b) q ֏ A∗ = . w z cussion for O(N) is identical, except that the input ! matrix of the QR decomposition routine must be Equations (46) generalize to an arbitrary N × N real. Unfortunately, however, for USp(2N) there quaternion matrix Q, which can be represented in are no black box routines that we can use, and we terms of a 2N × 2N complex matrix Q using the must put more eﬀort into writing an algorithm. decomposition

The algebra of unitary symplectic matrices (47) Q ֏ Q = Q0 ⊗I2 +Q1 ⊗e1 +Q2 ⊗e2 +Q3 ⊗e3, can be rephrased in terms of Hamilton’s quater- where Q0, Q1, Q2, and Q3 are arbitrary N × N nions; it is convenient for our purposes to use real matrices. Proceeding in the same fashion, if this formalism. A quaternion q ∈ H is a linear Q ∈ GL(N, H) we deﬁne its conjugate transpose ∗ ∗ ∗ combination Q = (qjk) by setting qjk = qkj . The symplectic group Sp(N) is the subset of ∗ (39) q = a · 1 + bi1 + ci2 + di3, a,b,c,d ∈ R, GL(N, H) whose matrices satisfy the identity S S =

598 Notices of the AMS Volume 54, Number 5 SS∗ = I. Because of the analogy between U(N) and The important fact that we need to pay attention Sp(N), the latter is sometimes called the hyper- to is that H unitary group and is denoted by U(N, ). The (54) Q∗ ֏ Q∗, usefulness of the quaternion algebra lies in if and only if the coefficients of the quaternion Theorem 2. The groups Sp(N) and USp(2N) are units are real numbers. This is a straightforward isomorphic, i.e. consequence of the representation (46a). (48) USp(2N) ≅ Sp(N). Let S ∈ USp(2N) be the complex representation of the quaternion matrix S, but assume that Proof. It is convenient to replace the skew- S∗ is not mapped into S∗. It is still true, however, symmetric matrix J in the definition (9) with that 0 1 (55) S∗ ֏ −Ω St Ω, . −1 0 ..  because equation (50) is only a consequence of matrix manipulations. But since S is unitary sym- Ω  . . .  (49) =  ......  = I ⊗ e2. ∗ Ω t Ω   plectic S = − S , which is a contradiction.    ..   . 0 1    −1 0   The algebra of Sp(N) is the generalization to   Hamilton’s quaternions of the algebra of U(N). This substitution is equivalent to a permutation Therefore, it is not surprising that the discussion of the rows and columns of J, therefore it is sim- in the section “A Correct and Efficient Algorithm” ply a conjugation by a unitary matrix. is not affected by replacing GL(N, C) and U(N) We first prove that if S ∈ Sp(N), then its com- with GL(N, H) and Sp(N) respectively. Thus, since plex representation S belongs to USp(2N). By Sp(N) and USp(2N) are isomorphic, USp(2N) and equation (47) S∗ is mapped to Sp(N) have the same Haar measure dµH. In par- t t t t Ω t Ω (50) S0 ⊗ I2 − S1 ⊗ e1 − S2 ⊗ e2 − S3 ⊗ e3 = − S , ticular, we can introduce the quaternion Ginibre ensemble, which is the set GL(N, H) equipped with which follows from the identities the probability density (51) t t t 1 (A⊗B) = A ⊗B and (A⊗B)(C⊗D) = AC⊗BD, (56) P(Z) = exp (− Tr Z∗Z) = π 2N2 and from the algebra (40) of the quaternion units. 1 N As a consequence, exp − z 2 . 2N2 jk ∗ t π   (52) SS ֏ −S Ω S Ω = I. j,kX=1   Therefore, the matrix S is symplectic. Combining Quaternion matrices can be factorized by the H equations (46b) and (50) gives QR decomposition too: for any Z ∈ GL(N, ) we can always write (53) − Ω St Ω = S∗, (57) Z=QR, thus S ∈ USp(2N). where Q ∈ Sp(N) and R is an invertible and We now need to show that if S ∈ USp(2N) then upper-triangular quaternion matrix. Now, let it is the representation of a matrix S ∈ Sp(N). This statement follows if we prove that S admits (58) a decomposition of the form (47), where S0, S1, S2, Λ(N, H) = Λ ∈ T(N, H) Λ = diag (q1,...,qN ), and S must be real N × N matrices. If this is true, 3 n then the argument of the first part of the proof qj = 1, j = 1,...,N , can simply be reversed. o where T(N, H ) is the group of invertible upper- Let us allow the coefficients a, b, c, and d in triangular quaternion matrices. Furthermore, let the definition (39) to be complex numbers. The Γ (N, H) = T(N, H)/Λ(N, H) be the right coset definitions of conjugate quaternion (41) and con- space of Λ(N, H) inT(N, H). We have the following jugate transpose of a quaternion matrix, however, remain the same. The matrices (45a) form a ba- Theorem 3. There exists a one-to-one map 2×2 sis in C . Therefore, any 2 × 2 complex matrix (59) QR : GL(N, H) -→ Sp(N) × Γ (N, H) can be represented as a linear combination of I2, 2N×2N such that e1, e2, and e3. Thus, any matrix Q ∈ C admits a decomposition of the form (47), but now (60) Z ֏ (Q, γ) and UZ ֏ (UQ,γ), the matrices Q , Q , Q , and Q are allowed to be 0 1 2 3 where γ = Λ(N, H)R. Furthermore, it factorizes complex. In other words, Q is always the repre- the measure dµG of the Ginibre ensemble as sentation of a quaternion matrix Q, but in general the quaternion units have complex coefficients. (61) dµG(Z) = dµH(Q) × dµΓ (N,H)(γ).

May 2007 Notices of the AMS 599 We leave proving these generalizations as an exercise for the reader.

Householder Reflections Theorem 3 provides us with the theoretical tools to generate a random matrix in USp(2N). Howev- er, when we implement these results in computer code, we need to devise an algorithm whose output satisfies the condition (60). The first one that comes to one’s mind is Gram-Schmidt orthonormalization. But given that black box routines for quaternion matrices do not exist on the market and that we are forced to write the complete code ourselves, we may as well choose one that is numerically stable and that as it turns out, requires the same effort. The most common algorithm that Figure 3. Householder reflection in R2. achieves the QR decomposition uses the House- holder reflections. For the sake of clarity, we will discuss this method for O(N); the generalizations is by construction an orthogonal matrix. In equa- to U(N) and Sp(N) are straightforward. tions (67) and (68) H denotes the block matrix Given an arbitrary vector v ∈ Rm, the main idea m

of the Householder reﬂections is to construct a IN−m (69) Hm =e , simple orthogonal transformation Hm (dependent Hm! on v) such that where Hm is deﬁnede in equation (62).

(62) Hmv = kvk e1, The matrix Hm is constructed using elementary geometry. Consider a vector in the plane, v = m where e1 = (1, 0,..., 0) ∈ R . For any real matrix (x1, x2), and assume, for simplicity, that x1 > 0. X = (xjk), HN is determined by replacing v in Furthermore, let uˆ denote the unit vector along equation (62) with the first column of X. The the interior bisector of the angle φ that v makes product HN X will have the structure with the x1-axis, i.e., r ∗ ··· ∗ v + kvk e 11 (70) uˆ = 1 , 0 ∗ ··· ∗   v + kvk e1 (63) H X = , N . . .  . . .  where e1 is the unit vector along the x1-axis.    0 ∗ ··· ∗ The reflection of v along the direction of uˆ is     kvk e1 (see Figure 3). Reflections are distance- where preserving linear transformations, therefore their representations in an orthonormal basis are or- (64) r = kvk = N x2 . 11 j=1 j1 thogonal matrices. In this simple example it can q Then, define the matrix P be constructed from elementary linear algebra: t 1 0 (71) H2(v) = −I + 2uûˆ .   Finally, we obtain (65) HN−1 = , 0 HN−1   (72) H (v)v = kvk e .   2 1 e     It is worth noting that H2 depends only on the where direction of v and not on its modulus. Thus we ′ ′ ′ can rewrite equation (72) as (66) HN−1(v )v = kv k e1. (73) H (vˆ)vˆ = e , In this case v′ is the (N − 1)-dimensional vector 2 1 obtained by dropping the first element of the sec- where vˆ = v/ kvk. ond column of the matrix (63). We proceed in this The generalization to arbitrary dimensions is fashion until the matrix straightforward. For any vector v ∈ Rm, the House- holder reflection is defined as (67) R = H1H2 · · · HN−1HN X t (74) Hm(vˆ) = ∓ I − 2uûˆ , is upper-triangular withdiagonalentries r ,...,r . e e e 11 NN where The product v ± kvk e1 t t t t (75) uˆ = . (68) Q = HN HN−1 · · · H2H1 v ± kvk e1

e e e 600 Notices of the AMS Volume 54, Number 5 Furthermore, we have definitions of the Householder reflections. A suit- able choice for U(N) is (76) Hm(vˆ)vˆ = e1. (83) H (vˆ) = −e−iθ (I − 2uûˆ∗). How do we choose the sign in the right-hand m side of equation (75)? From a mathematical point The unit vector uˆ is of view such a choice is irrelevant: in both cases vˆ + eiθ e (84) uˆ = 1 , H (vˆ) maps v into a vector whose only compo- iθ m kvˆ + e e1k nent different from zero is the first one. However, where v = (x ,...,x ) ∈ Cm and x = eiθ |x |. The numerically it can be important. The square of the 1 m 1 1 matrix H (vˆ) is unitary and denominator in (75) is m 2 (85) Hm(vˆ)vˆ = e1. (77) v ± kvk e1 = 2 kvk (kvk ± x1), Note that the introduction of eiθ in equations (83) where x1 is the first component of v. If x1 is compa- and (84) takes into account both the potential can- rable in magnitude to kvk and negative (positive) cellations and the correct values of the arguments and we choose the plus (minus) sign, then the term of the diagonal elements of the upper-triangular

(78) kvk ± x1, matrix R: equation (85) implies that all the rjj s are real and strictly positive. could be very small and cancellations with signif- For Sp(N) we have icant round-off errors may occur. Therefore, the ∗ Householder transformation to be implemented (86) Hm(vˆ) = −q(I − 2uûˆ ), in computer code should be with t (79) H (vˆ) = − sgn(x ) I − 2uûˆ , vˆ + qe1 m 1 (87) uˆ = , where kvˆ + qe1k where v = (x ,...,x ) ∈ Hm and x = q kx k. Also vˆ + sgn(x1)e1 1 m 1 1 (80) uˆ = . in this case vˆ + sgn(x1)e1 The additional factor of sgn(x ) in the right- (88) Hm(vˆ)vˆ = e1. 1 hand side of equation (79) assures that there is no ambiguity in the sign of the right-hand A Group Theoretical Interpretation side of equation (76). In turn, it guarantees that We now know how to generate random matrices all the diagonal elements of the upper-triangular from any of the classical compact groups U(N), matrix R are positive. This is not the definition O(N), and Sp(N). In order to achieve this goal, we of Householder reflection used in standard QR have used little more than linear algebra. However decomposition routines. Usually, simple and convenient this approach is (after all linear algebra plays a major role in writing most ′ ˆ ˆˆt (81) Hm(v) = I − 2uu , numerical algorithms), it hides a natural group with the same uˆ as in (75). Therefore, theoretical structure behind the Householder reflections, which was uncovered by Diaconis and (82) H′ (vˆ)vˆ = ∓e . m 1 Shahshahani [1]. Indeed, generating a random ma- As a consequence, the signs of the diagonal ele- trix as a product of Householder reflections is only ments of R are random. This is the reason why the one example of a more general method that can output of black box QR decomposition routines be applied to any finite or compact Lie group. Our must be modified in order to obtain orthogonal purpose in this section is to give a flavor of this matrices with the correct distribution. perspective. For the sake of clarity, as before, we Besides being numerically stable, this algorithm will discuss the orthogonal group O(N); the treat- has another advantage with respect to Gram- ment of U(N) and Sp(N) is, once again, almost Schmidt orthonormalization. In most applications identical. of numerical analysis Q need not be computed The need for a more general and elegant ap- explicitly, only Qw does, where w is a specific proach arises also if one observes that there is vector. Generating all the Householder reflections one feature of the QR decomposition that may 2 is an O(N ) process and computing HN w requires not be entirely satisfactory to a pure mathemati- O(N) operations—it just evaluates the scalar prod- cian: Why in order to generate a random point uct (uˆ, w). Successively multiplying HN ,...,H1 into on an N(N − 1)/2-dimensional manifold—O(N) w is an O(N2) process. Therefore, it takes in total in this case—do we need to generate N2 random O(N2) operations to compute Qw. Instead, Gram- numbers? It does not look like the most efficient Schmidt orthonormalization is an O(N3) process. option, even if it is a luxury that can be easily However, if Q is explicitly needed, computing the afforded on today’s computers. product (68) requires O(N3) operations too. We will first show how the key ideas that we The generalizations to U(N) and Sp(N) are want to describe apply to finite groups, as in this straightforward. The only differences are in the setting they are more transparent. Suppose that

May 2007 Notices of the AMS 601 we need to generate a random element g in a ﬁnite Proof. Suppose we construct O ∈ O(N) distribut-

group ΓN . In this context, if ΓN has p elements, ed with Haar measure by factorizing a matrix X in uniform distribution simply means that the prob- the Ginibre ensemble as described in the section

ability of extracting any g ∈ ΓN is 1/p. In addition, “Householder Reﬂections”. The random matrix O we assume that there exists a chain of subgroups is the product of Householder reﬂections (68) and

of ΓN : each factor Hm(vˆm) is a function of the unit vec- m−1 tor vˆm ∈ S only. We need to show that such (89) Γ1 ⊂ Γ2 ⊂···⊂ ΓN . vˆms are independent and uniformly distributed in Sm−1 In practical situations it is often easier to generate for m = 1,...,N. a random element g˜ in a smaller subgroup, say At each step in the construction of the upper- triangular matrix (67), the matrix multiplied by Γm−1 ∈ ΓN , than in ΓN itself; we may also know the m-th Householder reﬂection, i.e., how to take a random representative gm in the left coset Cm = Γm/Γm−1. Now, write the decomposition (96) Xm = Hm(vˆm) · · · HN−1(vˆN−1)HN (vˆN )X, is still in the Ginibre ensemble. All its elements (90) Γm ≅ Cm × Γm−1. are, therefore, i.i.d. normal random variables. This Once we have chosen a set of representatives of is a consequence of the invariance Cm, an element g ∈ Γm is uniquely factorized as (97) dµG(OX) = dµG(X), O ∈ O(N), g = gm g˜, where gm ∈ Cm. If both gm and g˜ are uniformly distributed in Cm and Γm−1 respectively, of the measure of the Ginibre ensemble. Now, then g is uniformly distributed in Γm. vˆm = (x1,...,xm) is constructed by taking the We can apply this algorithm iteratively start- m-th dimensional vector vm obtained by drop- − − + ing from Γ1 and eventually generate a random ping the ﬁrst N m elements of the (N m 1)-th column of X . The components of v are i.i.d. element in ΓN . In other words, we are given the m m decomposition normal random variables. It follows that the p.d.f. of vm is (91) ΓN ≅ CN ×···× C2 × Γ1. m 1 2 (98) P(vm) = exp −xj An element g ∈ ΓN has a unique representation as m/2 π j=1 a product Y 1 m 1 (92) g = g · · · g , = exp − x2 = exp − kv k2 . N 1 π m/2  j  π m/2 m jX=1 where gm is a representative in Cm. If the gms are v   v uniformly distributed in C so is g in Γ . This is Since P( m) depends only on the length of m, m N and not on any angular variable, the unit vector known as the subgroup algorithm [1]. vˆ = v / kv k is uniformly distributed in Sm−1 This technique applies to random permutations m m m and is statistically independent of vˆk for k ≠ m. of N letters. The chains of subgroups is (93) {Id} ⊂ S ⊂···⊂ S , 2 N Theorem 4 is more transparent than relying on

where Sm is the m-th symmetric group. Other the QR decomposition, which seems only a clever examples include generating random positions of technical trick. If nothing else, the counting of

Rubik’s cube and random elements in GL(N, Fp), the number of degrees of freedom matches. In Sm−1 where Fp is a finite field with p elements. fact, the dimension of the unit sphere is For O(N) the decompositions (91) and (92) m − 1. Thus, the total number of independent real are hidden behind the factorization (68) in terms parameters is of Householder reflections. Indeed, the subgroup N N(N − 1) algorithm for O(N) is contained in (99) (m − 1) = . 2 mX=1 Theorem 4. Let vˆ1,..., vˆN be uniformly distribut- Why is theorem 4 the subgroup algorithm for S0 SN−1 ed on ,..., respectively, where O(N)? As we shall see in theorem 5, the factoriza- (94) tion (95) is unique—provided that we restrict to Sm−1 Rm m 2 = vˆm = (x1,...,xm) ∈ j=1 xj = 1 the definition (79) of the Householder reflections.

n m P o This means that is the unit sphere in R . Furthermore, let Hm(vˆ) be N−1 1 the m-th Householder reﬂection deﬁned in equa- (100) O(N) ≅ S ×···× S × O(1), tion (79). The product where 0 (95) O = HN (vˆN )HN−1(vˆN−1) · · · H2(vˆ2)H1(vˆ1) (101) O(1) ≅ S = {−1, 1}. is a random orthogonal matrix with distribution If we proceed by induction, we obtain given by Haar measure on O(N). (102) O(N) = SN−1 × O(N − 1).

602 Notices of the AMS Volume 54, Number 5 Therefore, a matrix O ∈ O(N) admits a unique Corollary 1. Let dµO(N) and dµO(N−1) be the Haar representation as measures on O(N) and O(N−1) respectively. Then

(103) O = HN (vˆN )Ω, (112) dµO(N) = dµSN−1 × dµO(N−1), SN−1 where where dµSN−1 is the uniform measure on .

1 0 What is the meaning of dµSN−1 ? Given that we are dealing with uniform random variables, it is (104) Ω =   0 O quite natural that we end up with the uniform     measure. In this case, however, it has a precise    e  group theoretical interpretation. Left multiplica- and O ∈ O(N − 1). A consequence of theorem 4 tion of the right-hand side of equation (103) by N−1 ′ is that if vˆN is uniformly distributed in S and O ∈ O(N) induces a map on the coset space: O is distributede with Haar measure on O(N − 1), ′ ′ ′ ′ ′′ (113) O HN (vˆ)Ω = HN (vˆ )Ω Ω = HN (vˆ )Ω . then O is Haar distributed too. The final link with Since the decomposition (103) is unique the trans- thee subgroup algorithm is given by formation vˆ ֏ vˆ′ is well defined. This map can be Theorem 5. The left coset space of O(N − 1) in easily determined. A coset is specified by where N−1 O(N) is isomorphic to S , i.e., e1 is mapped, therefore N−1 ′ ′ ′ ′ (105) O(N)/O(N − 1) ≅ S . (114) O HN (vˆ)e1 = O vˆ = vˆ = HN (vˆ )e1. A complete class of representatives is provided by If vˆ is uniformly distributed on the unit circle so is ′ 1 N−1 vˆ = Ovˆ. Thus, dµSN−1 is the unique measure on the the map HN : S -→ O(N), coset space O(N)/O(N −1) invariant under the left t − sgn(x1)(I − 2uûˆ ) if vˆ ≠ e1, action of O(N). Its uniqueness follows from that (106) HN (vˆ) = (IN if vˆ = e1, of Haar measure and from the factorization (112). Corollary 1 is a particular case of a theorem where that holds under general hypotheses for topologi- vˆ + sgn(x )e Γ (107) uˆ = 1 1 cal compact groups. Indeed, let be such a group, vˆ + sgn(x1)e1 Ξ a closed subgroup and C = Γ /Ξ. Furthermore, let dµΓ , dµC and dµX be the respective invariant and x1 is the first component of vˆ. measures, then Proof. The group of N × N matrices Ω defined in (115) dµΓ = dµΞ × dµC . equation (104) is isomorphic to O(N − 1). Since

(108) Ωe1 = e1, Acknowledgements O(N − 1) can be identiﬁed with the subgroup of This article stems from a lecture that I gave at the O(N) that leave e invariant, i.e., 1 summer school on Number Theory and Random

(109) O(N − 1) = O ∈ O(N) Oe1 = e1 . Matrix Theory held at the University of Rochester in June 2006. I would like to the thank the organiz- Now, if two matrices O and O′ belong to the same ers David Farmer, Steve Gonek, and Chris Hughes coset, then for inviting me. I am also particularly grateful to ′ (110) Oe1 = O e1 = vˆ Brian Conrey, David Farmer, and Chris Hughes for the encouragement to write up the content of this and vice versa. In other words, cosets are speci- lecture. fied by where e1 is mapped. Furthermore, since kOe1k = 1, we see that they can be identified with the points in the unit sphere. Finally, the References map (106) is one-to-one and is such that [1] P. Diaconis and M. Shahshahani, The sub- (111) H (vˆ)e = vˆ. group algorithm for generating uniform random N 1 variables, Prob. Eng. Inf. Sc. 1 (1987), 15–32. Therefore, HN spans a complete class of represen- [2] F. M. Dyson, The threefold way. Algebraic structure tatives. of symmetry groups and ensembles in quantum mechanics, J. Math. Phys. 3 (1962), 1199–1215. Incidentally, theorem 4 implies [3] M. L. Eaton, Multivariate Statistics: A Vector Space Approach, Wiley and Sons, New York, NY, 1983. [4] A. Edelman and N. R. Rao, Random matrix theory, 1The Householder reflections defined in equation (79) are Acta Num. 14 (2005), 233–297. not continuous at e . Indeed, it can be proved that there 1 [5] R. M. Heiberger, Algorithm AS127. Generation of is no continuous choice of coset representatives. In the random orthogonal matrices, App. Stat. 27 (1978), section “Householder Reflections”, this distinction was su- 199–206. perfluous: if v is randomly generated, the probability that v = αe1 is zero.

May 2007 Notices of the AMS 603 [6] N. M. Katz and P. Sarnak, Random matrices, Frobe- nius eigenvalues, and monodromy, Amer. Math. Soc. Colloquium Publications, 45, Amer. Math. Soc., Providence, RI, 1999. [7] J. P. Keating and N. C. Snaith, Random matrix theory and ζ(1/2 + it), Commun. Math. Phys. 214 (2000), 57–89. [8] , Random matrix theory and L-functions at s = 1/2, Commun. Math. Phys. 214 (2000), 91–110. [9] M. L. Mehta, Random matrices, Elsevier, San Diego, CA, 2004. [10] Recent perspectives in random matrix theory and number theory, LMS Lecture Note Series, 322, (F. Mezzadri and N. C. Snaith, eds.), Cambridge University Press, Cambridge, 2005. [11] H. L. Montgomery, The pair correlation of zeros of the zeta function, Analytic Number Theory: Proc. Symp. Pure Math. (St. Louis, MO, 1972), vol. 24, Amer. Math. Soc., Providence, RI, 1973, pp. 181–93. [12] A. M. Odlyzko, The 1020-th zero of the Riemann zeta function and 70 million of its neighbors, 1989, http://www.dtc.umn.edu/∼odlyzko/ unpublished/index.html. [13] M. Rubinstein, Low-lying zeros of L-functions and random matrix theory, Duke Math. J. 109 (2001), 147–181. [14] G. W. Stewart, The eﬃcient generation of random orthogonal matrices with an application to condition estimators, SIAM J. Num. Anal. 17 (1980), 403–409. [15] M. A. Tanner and R A. Thisted, A remark on AS127. Generation of random orthogonal matrices, App. Stat. 31 (1982), 190–192. [16] R. W. M. Wedderburn, Generating random rota- tions, Research report, Rothamsted Experimental Station (1975). [17] J. Wishart, The generalised product moment distribution in samples from a normal multivariate population, Biometrika 20A (1928), 32–52. [18] M. R. Zirnbauer, Riemannian symmetric super- spaces and their origin in random-matrix theory, J. Math. Phys. 37 (1996), 4986–5018. [19] K. Zyczkowski˙ and M. Kus, Random unitary matrices, J. Phys. A: Math. Gen. 27 (1994), 4235–4245.

604 Notices of the AMS Volume 54, Number 5