Eigenvalue distributions of beta-Wishart matrices


Citation Edelman, Alan, and Plamen Koev. “Eigenvalue Distributions of Beta-Wishart Matrices.” Random Matrices: Theory and Applications 03, no. 02 (April 2014): 1450009.

As Published http://dx.doi.org/10.1142/S2010326314500099

Publisher World Scientific Pub Co Pte Lt

Version Author's final manuscript

Citable link http://hdl.handle.net/1721.1/116006

Terms of Use Creative Commons Attribution-Noncommercial-Share Alike

Detailed Terms http://creativecommons.org/licenses/by-nc-sa/4.0/

SIAM J. MATRIX ANAL. APPL. © 2013 Society for Industrial and Applied Mathematics. Vol. XX, No. X, pp. XX–XX

EIGENVALUE DISTRIBUTIONS OF BETA-WISHART MATRICES∗

ALAN EDELMAN† AND PLAMEN KOEV‡

Abstract. We derive explicit expressions for the distributions of the extreme eigenvalues of the Beta-Wishart random matrices in terms of the hypergeometric function of a matrix argument. These results generalize the classical results for the real (β = 1), complex (β = 2), and quaternion (β = 4) Wishart matrices to any β > 0.

Key words. random matrix, Wishart distribution, eigenvalue, hypergeometric function of a matrix argument

AMS subject classifications. 15A52, 60E05, 62H10, 65F15

DOI. XX.XXXX/SXXXXXXXXXXXXXXXX

1. Introduction. Recently, the classical real (β = 1), complex (β = 2), and quaternion (β = 4) Wishart random matrix ensembles were generalized to any β > 0 by what is now called the Beta–Wishart ensemble [2, 9]. In this paper we derive explicit distributions for the extreme eigenvalues and the trace of this ensemble as series of Jack functions and, in particular, in terms of the hypergeometric function of a matrix argument. These results generalize the classical expressions for the real and complex cases as series of zonal polynomials [17] and Schur functions [18], respectively.

This paper is motivated by the importance of the real and complex Wishart matrices (and in particular their eigenvalue distributions) in multivariate statistical analysis. We refer to the classical text by Muirhead [17], as well as [8, 10, 18].

It has become increasingly apparent that the classical matrix ensembles (and perhaps all of random matrix theory) generalize from Dyson's threefold way (β = 1, 2, 4) [5] to any β > 0 [6]. Recent examples include the Beta–Hermite [3], Beta–Laguerre [3], Beta–Jacobi [4, 7, 12, 14], and Beta–Wishart ensembles [2, 9].

Our new results are particularly convenient for practical evaluation using our algorithms for the hypergeometric function of a matrix argument [13]; see section 5.

2. Definitions and background. Since the eigenvalue distributions will be expressed as series of multivariate symmetric polynomials, we start with the relevant definitions.

For an integer k ≥ 0 we say that κ = (κ1, κ2, . . .) is a partition of k (denoted κ ⊢ k) if κ1 ≥ κ2 ≥ · · · ≥ 0 are integers such that |κ| ≡ κ1 + κ2 + · · · = k. Partitions with t equal parts are denoted as (a)^t = (a, a, . . . , a). For two partitions λ and κ we write λ ⊆ κ to indicate that λi ≤ κi for all i.
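The series below run over all partitions κ ⊢ k, sometimes restricted by a cap κ1 ≤ r on the largest part. A minimal Python sketch of this index set (illustrative only; the function name is ours, not from the original):

```python
def partitions(k, max_part=None):
    """Yield all partitions kappa of k as nonincreasing tuples (kappa |- k).

    The optional max_part caps kappa_1, matching the truncations
    kappa_1 <= r that appear later in the paper."""
    if max_part is None:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest
```

For example, `list(partitions(4))` produces the five partitions (4), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1).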

∗Received by the editors DATE; accepted for publication (in revised form) by EDITOR DATE; published electronically DATE. This research was supported by the National Science Foundation Grant DMS-1016086 and by the Woodward Fund for Applied Mathematics at San Jose State University. The Woodward Fund is a gift from the estate of Mrs. Marie Woodward in memory of her son, Henry Teynham Woodward. He was an alumnus of the Mathematics Department at San Jose State University and worked with research groups at NASA Ames. http://www.siam.org/journals/simax/XX-X/XXXXX.html
†Department of Mathematics, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139 ([email protected]).
‡Department of Mathematics, San Jose State University, 1 Washington Sq., San Jose, CA 95192 ([email protected]).

For a partition κ = (κ1, κ2, . . . , κm) and β > 0, the generalized Pochhammer symbol of parameter β is defined as

(2.1)  $$(a)_\kappa^{(\beta)} \equiv \prod_{i=1}^{m} \prod_{j=1}^{\kappa_i} \left(a - \frac{(i-1)\beta}{2} + j - 1\right).$$

The multivariate Gamma function of parameter β is defined as

(2.2)  $$\Gamma_m^{(\beta)}(c) = \pi^{\frac{m(m-1)\beta}{4}} \prod_{i=1}^{m} \Gamma\!\left(c - \frac{(i-1)\beta}{2}\right) \quad \text{for } \Re(c) > \frac{(m-1)\beta}{2}.$$
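Both definitions are directly computable. A hedged Python transcription of (2.1) and (2.2) (function names are ours):

```python
import math

def gen_pochhammer(a, kappa, beta):
    """Generalized Pochhammer symbol (a)_kappa^{(beta)} from (2.1):
    product over i of (a - (i-1)*beta/2 + j - 1) for j = 1..kappa_i."""
    out = 1.0
    for i, k_i in enumerate(kappa, start=1):
        for j in range(1, k_i + 1):
            out *= a - (i - 1) * beta / 2.0 + j - 1
    return out

def mv_gamma(c, m, beta):
    """Multivariate Gamma function Gamma_m^{(beta)}(c) from (2.2),
    valid for c > (m-1)*beta/2."""
    out = math.pi ** (m * (m - 1) * beta / 4.0)
    for i in range(1, m + 1):
        out *= math.gamma(c - (i - 1) * beta / 2.0)
    return out
```

For m = 1 both reduce to the classical rising factorial and Gamma function, which gives a quick sanity check.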

For an m × m matrix X, the Jack functions $C_\kappa^{(\beta)}(X)$ are symmetric homogeneous polynomials in the eigenvalues $x_1, x_2, \ldots, x_m$ of X. The $C_\kappa^{(\beta)}$'s are indexed by partitions κ and form an orthogonal basis of the space of multivariate symmetric polynomials. We refer to Stanley [19], Macdonald [16], and Forrester [8] for details and assume the reader has some familiarity with these functions. For integers p ≥ 0 and q ≥ 0, and m × m complex symmetric matrices X, Y, the hypergeometric function of a matrix argument is defined as

(2.3)  $${}_pF_q^{(\beta)}(a;\, b;\, X, Y) \equiv \sum_{k=0}^{\infty} \sum_{\kappa \vdash k} \frac{1}{k!} \cdot \frac{(a_1)_\kappa^{(\beta)} \cdots (a_p)_\kappa^{(\beta)}}{(b_1)_\kappa^{(\beta)} \cdots (b_q)_\kappa^{(\beta)}} \cdot \frac{C_\kappa^{(\beta)}(X)\, C_\kappa^{(\beta)}(Y)}{C_\kappa^{(\beta)}(I)},$$

where $a = (a_1, a_2, \ldots, a_p)$ and $b = (b_1, b_2, \ldots, b_q)$. For one matrix argument X,

$${}_pF_q^{(\beta)}(a;\, b;\, X) \equiv {}_pF_q^{(\beta)}(a;\, b;\, X, I).$$

Following [2], an m × m Beta–Wishart matrix with n (n ≥ m) degrees of freedom and covariance matrix Σ (denoted $A \sim W_m^{(\beta)}(n, \Sigma)$) has joint eigenvalue density

(2.4)  $$\frac{|\Sigma|^{-\frac{n\beta}{2}}}{K_m^{(\beta)}\big(\frac{n\beta}{2}\big)} \prod_{i=1}^{m} \lambda_i^{\frac{(n-m+1)\beta}{2} - 1} \prod_{j<k} |\lambda_k - \lambda_j|^{\beta} \cdot {}_0F_0^{(\beta)}\big({-\tfrac{\beta}{2}\Lambda},\, \Sigma^{-1}\big),$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ and the normalization constant satisfies

(2.5)  $$K_m^{(\beta)}\big(\tfrac{n\beta}{2}\big) = \big(\tfrac{2}{\beta}\big)^{\frac{mn\beta}{2}}\, c_m^{(\beta)}\, \Gamma_m^{(\beta)}\big(\tfrac{n\beta}{2}\big),$$

with $c_m^{(\beta)}$ as in (3.1).
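For m = 1 every ingredient of (2.3) collapses: the only partition of k is (k), $C_{(k)}^{(\beta)}(x) = x^k$, $C_{(k)}^{(\beta)}(1) = 1$, and $(a)_{(k)}^{(\beta)}$ is the ordinary rising factorial, so ${}_pF_q^{(\beta)}$ reduces to the classical scalar ${}_pF_q$ for every β. A small Python illustration for ${}_1F_1$ (our own sketch, unrelated to the algorithms of [13]):

```python
import math

def hyp1f1_m1(a, b, x, nterms=300):
    """The series (2.3) specialized to m = 1: the classical confluent
    hypergeometric 1F1(a; b; x), independent of beta."""
    total, term = 0.0, 1.0
    for k in range(nterms):
        total += term
        term *= (a + k) / (b + k) * x / (k + 1)  # ratio of consecutive terms
    return total
```

Two classical identities, 1F1(a; a; x) = e^x and 1F1(1; 2; x) = (e^x − 1)/x, serve as checks.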

(β) 3. Identities involving pFq . We present several identities, including a new one, which we need in the next section. First, we incorporate the Vandermonde determinant into a new measure µ to prevent it from appearing in all the integrals that follow:

$$d\mu(X) \equiv \prod_{i<j} |x_i - x_j|^{\beta}\, dX.$$

(3.1)  $$T_m^{(\beta)}(a, b) \equiv \frac{\Gamma_m^{(\beta)}(b)}{\Gamma_m^{(\beta)}(a)\,\Gamma_m^{(\beta)}(b-a)}; \qquad c_m^{(\beta)} \equiv \frac{m!}{\pi^{\frac{m(m-1)\beta}{4}}} \prod_{i=1}^{m} \frac{\Gamma\big(\frac{i\beta}{2}\big)}{\Gamma\big(\frac{\beta}{2}\big)}; \qquad p \equiv \frac{m-1}{2}\beta + 1.$$

Then, for a nonnegative integer r,

(3.2)  $${}_0F_0^{(\beta)}(X,\, I + Y) = e^{\operatorname{tr} X}\, {}_0F_0^{(\beta)}(X, Y);$$

(3.3)  $${}_1F_0^{(\beta)}(-r;\, X) = |I - X|^{r};$$

(3.4)  $${}_2F_1^{(\beta)}(a, b;\, c;\, X) = {}_2F_1^{(\beta)}\big(c - a,\, b;\, c;\, -X(I - X)^{-1}\big) \cdot |I - X|^{-b};$$

(3.5)  $${}_2F_1^{(\beta)}(a, -r;\, c;\, X) = \frac{\Gamma_m^{(\beta)}(c)\, \Gamma_m^{(\beta)}(c - a + r)}{\Gamma_m^{(\beta)}(c - a)\, \Gamma_m^{(\beta)}(c + r)}\; {}_2F_1^{(\beta)}\big(a,\, -r;\, a - r + 1 + \tfrac{m-1}{2}\beta - c;\, I - X\big);$$

(3.6)  $${}_1F_1^{(\beta)}(a;\, b;\, Y) = \frac{T_m^{(\beta)}(a, b)}{c_m^{(\beta)}} \int_{[0,1]^m} {}_0F_0^{(\beta)}(X, Y)\, |X|^{a-p}\, |I - X|^{b-a-p}\, d\mu(X);$$

(3.7)  $${}_1F_0^{(\beta)}(a;\, Y) = \frac{1}{c_m^{(\beta)}\, \Gamma_m^{(\beta)}(a)} \int_{\mathbb{R}_+^m} e^{-\operatorname{tr}(X)}\, {}_0F_0^{(\beta)}(X, Y)\, |X|^{a-p}\, d\mu(X);$$

(3.8)  $$C_\kappa^{(\beta)}(Y^{-1}) = \frac{|Y|^{a+p}}{c_m^{(\beta)}\, \Gamma_m^{(\beta)}(a + p) \cdot (a + p)_\kappa^{(\beta)}} \int_{\mathbb{R}_+^m} {}_0F_0^{(\beta)}(-X, Y)\, |X|^{a}\, C_\kappa^{(\beta)}(X)\, d\mu(X);$$

(3.9)  $${}_2F_0^{(\beta)}(p, -r;\, X) = \frac{\Gamma_m^{(\beta)}(p + r)}{\Gamma_m^{(\beta)}(p)}\, |{-X}|^{r} \sum_{k=0}^{mr} \sum_{\kappa \vdash k,\; \kappa_1 \le r} \frac{C_\kappa^{(\beta)}(-X^{-1})}{k!}.$$
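In the scalar case m = 1, identity (3.3) is just the binomial theorem: $\sum_k (-r)_k x^k / k! = (1 - x)^r$, a series that terminates at k = r. A quick Python check, using an argument outside [0, 1] to illustrate that the terminating series is valid for any x:

```python
def hyp1f0_scalar(r, x):
    """Scalar (m = 1) instance of (3.3): sum_{k=0}^{r} (-r)_k x^k / k!,
    with (-r)_k the rising factorial; equals (1 - x)^r for integer r >= 0."""
    total, term = 0.0, 1.0
    for k in range(r + 1):
        total += term
        term *= (-r + k) * x / (k + 1)  # (-r)_{k+1}/(-r)_k = (-r + k)
    return total
```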

The identities (3.2) and (3.8) were conjectured by Macdonald [15] and proven by Baker and Forrester [1, section 6]. The identities (3.3), (3.4), and (3.5) are due to Forrester: see (13.4) and Propositions 13.1.6 and 13.1.7 in [8], respectively. Note that (3.3) was established for arguments X whose eigenvalues are in [0, 1]. When r ≥ 0 is an integer, however, (3.3) is true for any X: we have polynomials on both sides,¹ which must be identical everywhere. The integrals (3.6) and (3.7) are due to Macdonald [15, (6.20), (6.21)]. For completeness, we repeat his argument here. The integral (3.6) follows directly from Kadell's extension of Selberg's integral [11, Theorem 1]

(3.10)  $$\int_{[0,1]^m} \frac{C_\kappa^{(\beta)}(X)}{C_\kappa^{(\beta)}(I)}\, |X|^{a-p}\, |I - X|^{b-a-p}\, d\mu(X) = \frac{c_m^{(\beta)}}{T_m^{(\beta)}(a, b)} \cdot \frac{(a)_\kappa^{(\beta)}}{(b)_\kappa^{(\beta)}}$$

by multiplying both sides by $\frac{1}{|\kappa|!}\, C_\kappa^{(\beta)}(Y)$ and summing over all κ. To deduce (3.7), we change variables $y_i = N x_i$, $i = 1, 2, \ldots, m$, in (3.10), where $N \equiv b - p$, to obtain

$$\int_{[0,N]^m} \frac{C_\kappa^{(\beta)}(Y)}{C_\kappa^{(\beta)}(I)}\, |Y|^{a-p} \prod_{i=1}^{m} \Big(1 - \frac{y_i}{N}\Big)^{N-a} d\mu(Y) = N^{ma + |\kappa|}\, \frac{c_m^{(\beta)}}{T_m^{(\beta)}(a,\, N+p)} \cdot \frac{(a)_\kappa^{(\beta)}}{(N+p)_\kappa^{(\beta)}} = \Gamma_m^{(\beta)}(a)\, c_m^{(\beta)}\, (a)_\kappa^{(\beta)}\, \frac{\Gamma_m^{(\beta)}(N + p - a)\, N^{ma + |\kappa|}}{\Gamma_m^{(\beta)}(N + p)\, (N + p)_\kappa^{(\beta)}}.$$

¹In the series for ${}_1F_0^{(\beta)}$ on the left, if κ has any part greater than r or has more than m parts, then $(-r)_\kappa^{(\beta)} = 0$ or $C_\kappa^{(\beta)}(X) = 0$, respectively. Either way the κ term in the series is 0.

We take limits as N → ∞. The identity Γ(z + 1) = zΓ(z), along with the definition (2.2), implies

(3.11)  $$\lim_{N\to\infty} \frac{\Gamma_m^{(\beta)}(N + p - a)}{\Gamma_m^{(\beta)}(N + p)}\, N^{ma} = 1.$$
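The limit (3.11) is easy to probe numerically by working with logarithms of the multivariate Gamma function (2.2). The following Python sketch (helper names are ours) evaluates the quantity in (3.11) for sample values of m, β, a:

```python
import math

def log_mv_gamma(c, m, beta):
    """log of Gamma_m^{(beta)}(c) from (2.2): a pi-power plus a sum of
    ordinary log-Gamma terms, avoiding overflow for large c."""
    s = m * (m - 1) * beta / 4.0 * math.log(math.pi)
    for i in range(1, m + 1):
        s += math.lgamma(c - (i - 1) * beta / 2.0)
    return s

def ratio_311(N, m, beta, a, p):
    """Gamma_m(N + p - a) / Gamma_m(N + p) * N^{m a}, whose
    N -> infinity limit is 1 by (3.11)."""
    return math.exp(log_mv_gamma(N + p - a, m, beta)
                    - log_mv_gamma(N + p, m, beta)
                    + m * a * math.log(N))
```

For m = 3, β = 2, a = 1.5, p = 3 the ratio approaches 1 as N grows, as (3.11) predicts.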

From (2.1), $\lim_{N\to\infty} N^{-|\kappa|}\, (N + p)_\kappa^{(\beta)} = 1$. Also, $\lim_{N\to\infty} \big(1 - \frac{y_i}{N}\big)^{N-a} = e^{-y_i}$ for $i = 1, 2, \ldots, m$, and therefore

$$\int_{[0,\infty)^m} \frac{C_\kappa^{(\beta)}(Y)}{C_\kappa^{(\beta)}(I)}\, |Y|^{a-p}\, e^{-\operatorname{tr} Y}\, d\mu(Y) = \Gamma_m^{(\beta)}(a)\, c_m^{(\beta)} \cdot (a)_\kappa^{(\beta)}.$$

Multiplying both sides by $\frac{1}{|\kappa|!}\, C_\kappa^{(\beta)}(X)$ and summing over all κ, we get (3.7).

Finally, (3.9) is a new result, which we now prove.

Theorem 3.1. The identity (3.9) holds for any nonnegative integer r.

Proof. We combine (3.4), in the form ${}_2F_1^{(\beta)}(p, -r;\, c;\, I - X) = {}_2F_1^{(\beta)}(c - p, -r;\, c;\, I - X^{-1}) \cdot |X|^{r}$, with (3.5) to get

$$\frac{\Gamma_m^{(\beta)}(c - p + r)}{\Gamma_m^{(\beta)}(c - p)}\; {}_2F_1^{(\beta)}\big(p, -r;\; p - r + 1 + \tfrac{(m-1)\beta}{2} - c;\; X\big) = \frac{\Gamma_m^{(\beta)}(p + r)}{\Gamma_m^{(\beta)}(p)}\; {}_2F_1^{(\beta)}\big(c - p, -r;\; -p - r + 1 + \tfrac{(m-1)\beta}{2};\; X^{-1}\big) \cdot |X|^{r}.$$

We replace X by −cX, use the fact that $p = \frac{(m-1)\beta}{2} + 1$, and move a factor of $c^{mr}$ to the left to get:

(3.12)  $$\frac{\Gamma_m^{(\beta)}(c - p + r)}{\Gamma_m^{(\beta)}(c - p) \cdot c^{mr}}\; {}_2F_1^{(\beta)}\big(p, -r;\; 2p - r - c;\; -cX\big) = \frac{\Gamma_m^{(\beta)}(p + r)}{\Gamma_m^{(\beta)}(p)}\; {}_2F_1^{(\beta)}\big(c - p, -r;\; -r;\; -\tfrac{1}{c} X^{-1}\big) \cdot |{-X}|^{r}.$$

We take limits on both sides as c → ∞. From [8, (13.5)]

$$\lim_{c\to\infty} {}_2F_1^{(\beta)}\big(p, -r;\; 2p - r - c;\; -cX\big) = {}_2F_0^{(\beta)}(p, -r;\; X);$$

$$\lim_{c\to\infty} {}_2F_1^{(\beta)}\big(c - p, -r;\; -r;\; -\tfrac{1}{c} X^{-1}\big) = \sum_{k=0}^{mr} \sum_{\kappa \vdash k,\; \kappa_1 \le r} \frac{C_\kappa^{(\beta)}(-X^{-1})}{k!}.$$

Combining this with (3.11) above, we obtain (3.9).

4. The distributions of the extreme eigenvalues and trace. We start with the extreme eigenvalues.

Theorem 4.1. Let $p = \frac{m-1}{2}\beta + 1$ and let $t \equiv \frac{n-m+1}{2}\beta - 1$ be a nonnegative integer. For the extreme eigenvalues of a Wishart matrix $A \sim W_m^{(\beta)}(n, \Sigma)$ we have:

(4.1)  $$P(\lambda_{\max}(A) < x) = \frac{\Gamma_m^{(\beta)}(p)}{\Gamma_m^{(\beta)}\big(\frac{n}{2}\beta + p\big)}\, \Big|\frac{x\beta}{2}\,\Sigma^{-1}\Big|^{\frac{n\beta}{2}}\; {}_1F_1^{(\beta)}\Big(\frac{n\beta}{2};\; \frac{n\beta}{2} + p;\; -\frac{x\beta}{2}\,\Sigma^{-1}\Big);$$

(4.2)  $$P(\lambda_{\min}(A) < x) = 1 - e^{\operatorname{tr}(-\frac{x\beta}{2}\Sigma^{-1})} \sum_{k=0}^{mt} \sum_{\kappa \vdash k,\; \kappa_1 \le t} \frac{C_\kappa^{(\beta)}\big(\frac{x\beta}{2}\Sigma^{-1}\big)}{k!}.$$
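As a sanity check, for m = 1, β = 1, and Σ = σ², formula (4.2) must reduce to the classical chi-square law P(A < x) = P(χ²ₙ < x/σ²) for even n, since then t = n/2 − 1 is a nonnegative integer and the double sum is a truncated exponential series. A Python sketch verifying this against an independent numerical integration (function names are ours):

```python
import math

def cdf_42_m1(x, n, sigma2):
    """Formula (4.2) at m = 1, beta = 1, Sigma = sigma2: the Jack functions
    are C_{(k)}(y) = y^k and t = n/2 - 1 must be a nonnegative integer."""
    t = n // 2 - 1
    y = x / (2.0 * sigma2)  # the eigenvalue of (x beta / 2) Sigma^{-1}
    return 1.0 - math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(t + 1))

def chi2_cdf_quad(z, n, steps=200000):
    """Independent check: midpoint-rule integration of the chi-square_n density."""
    h = z / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += u ** (n / 2 - 1) * math.exp(-u / 2)
    return total * h / (2 ** (n / 2) * math.gamma(n / 2))
```

With n = 6, σ² = 2, x = 7 the two computations agree to quadrature accuracy.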

Proof. We follow the ideas in Muirhead [17]. For the first part, the condition λmax < x is equivalent to λi ∈ [0, x], i = 1, 2, . . . , m. To obtain the desired distribution, we integrate the joint density (2.4) over [0, x]^m, make a change of variables L = xZ, then use dL = x^m dZ and (3.6) to obtain

$$P(\lambda_{\max} < x) = \frac{|\Sigma|^{-\frac{n\beta}{2}}}{K_m^{(\beta)}\big(\frac{n\beta}{2}\big)} \int_{[0,x]^m} \prod_{i=1}^{m} \lambda_i^{\frac{n-m+1}{2}\beta - 1}\; {}_0F_0^{(\beta)}\big({-\tfrac{\beta}{2}L},\, \Sigma^{-1}\big)\, d\mu(L)$$

$$= \frac{|\Sigma|^{-\frac{n\beta}{2}}}{K_m^{(\beta)}\big(\frac{n\beta}{2}\big)}\, x^{\frac{mn\beta}{2}} \int_{[0,1]^m} \prod_{i=1}^{m} z_i^{\frac{n-m+1}{2}\beta - 1}\; {}_0F_0^{(\beta)}\big(Z,\, -\tfrac{x\beta}{2}\Sigma^{-1}\big)\, d\mu(Z)$$

$$= \frac{|x\Sigma^{-1}|^{\frac{n\beta}{2}}}{K_m^{(\beta)}\big(\frac{n\beta}{2}\big)} \cdot c_m^{(\beta)} \cdot \frac{\Gamma_m^{(\beta)}\big(\frac{n\beta}{2}\big)\,\Gamma_m^{(\beta)}(p)}{\Gamma_m^{(\beta)}\big(\frac{n}{2}\beta + p\big)}\; {}_1F_1^{(\beta)}\Big(\frac{n\beta}{2};\; \frac{n+m-1}{2}\beta + 1;\; -\frac{x\beta}{2}\Sigma^{-1}\Big),$$

with $c_m^{(\beta)}$ defined in (3.1). We use (2.2) and (2.5) to simplify

$$\frac{1}{K_m^{(\beta)}\big(\frac{n\beta}{2}\big)}\, c_m^{(\beta)}\, \Gamma_m^{(\beta)}\Big(\frac{n\beta}{2}\Big) = \frac{\pi^{\frac{m(m-1)\beta}{4}}}{\big(\frac{2}{\beta}\big)^{\frac{mn\beta}{2}}\, \Gamma_m^{(\beta)}\big(\frac{m\beta}{2}\big)} \prod_{i=1}^{m} \Gamma\Big(\frac{i\beta}{2}\Big) = \Big(\frac{\beta}{2}\Big)^{\frac{mn\beta}{2}},$$

implying (4.1). For the second part, the change of variables is L = x(I + Z) with dL = x^m dZ:

$$P(\lambda_{\min} > x) = \frac{|\Sigma|^{-\frac{n\beta}{2}}}{K_m^{(\beta)}\big(\frac{n\beta}{2}\big)} \int_{[x,\infty)^m} \prod_{i=1}^{m} \lambda_i^{t}\; {}_0F_0^{(\beta)}\big({-\tfrac{\beta}{2}L},\, \Sigma^{-1}\big)\, d\mu(L)$$

(4.3)  $$= \frac{|\Sigma|^{-\frac{n\beta}{2}}}{K_m^{(\beta)}\big(\frac{n\beta}{2}\big)}\, x^{\frac{mn\beta}{2}} \int_{\mathbb{R}_+^m} |I + Z|^{t}\; {}_0F_0^{(\beta)}\big(I + Z,\, -\tfrac{x\beta}{2}\Sigma^{-1}\big)\, d\mu(Z)$$

(4.4)  $$= e^{\operatorname{tr}(-\frac{x\beta}{2}\Sigma^{-1})}\, \frac{|x\Sigma^{-1}|^{\frac{n\beta}{2}}}{K_m^{(\beta)}\big(\frac{n\beta}{2}\big)} \int_{\mathbb{R}_+^m} \sum_{\kappa \subseteq (t)^m} \frac{(-t)_\kappa^{(\beta)}\, C_\kappa^{(\beta)}(-Z)}{|\kappa|!}\; {}_0F_0^{(\beta)}\big(Z,\, -\tfrac{x\beta}{2}\Sigma^{-1}\big)\, d\mu(Z),$$

where we used (3.2) and (3.3) to go from (4.3) to (4.4). Next, from (3.8),

(4.5)  $$P(\lambda_{\min} > x) = e^{\operatorname{tr}(-\frac{x\beta}{2}\Sigma^{-1})}\, \frac{|x\Sigma^{-1}|^{\frac{n\beta}{2}}}{K_m^{(\beta)}\big(\frac{n\beta}{2}\big)} \sum_{\kappa \subseteq (t)^m} \frac{(-1)^{|\kappa|}\, (-t)_\kappa^{(\beta)}}{|\kappa|!} \int_{\mathbb{R}_+^m} C_\kappa^{(\beta)}(Z)\; {}_0F_0^{(\beta)}\big({-Z},\, \tfrac{x\beta}{2}\Sigma^{-1}\big)\, d\mu(Z)$$

(4.6)  $$= e^{\operatorname{tr}(-\frac{x\beta}{2}\Sigma^{-1})}\, \frac{|x\Sigma^{-1}|^{\frac{n\beta}{2}}}{K_m^{(\beta)}\big(\frac{n\beta}{2}\big)} \sum_{\kappa \subseteq (t)^m} \frac{(-t)_\kappa^{(\beta)}}{|\kappa|!}\, C_\kappa^{(\beta)}\Big({-\frac{2}{x\beta}\Sigma}\Big)\, \Big|\frac{x\beta}{2}\Sigma^{-1}\Big|^{-p}\, c_m^{(\beta)}\, \Gamma_m^{(\beta)}(p)\, (p)_\kappa^{(\beta)}$$

(4.7)  $$= e^{\operatorname{tr}(-\frac{x\beta}{2}\Sigma^{-1})}\, \Big|\frac{x\beta}{2}\Sigma^{-1}\Big|^{t}\, \frac{\Gamma_m^{(\beta)}(p)}{\Gamma_m^{(\beta)}\big(\frac{n\beta}{2}\big)} \sum_{\kappa \subseteq (t)^m} \frac{(-t)_\kappa^{(\beta)}\, (p)_\kappa^{(\beta)}}{|\kappa|!}\, C_\kappa^{(\beta)}\Big({-\frac{2}{x\beta}\Sigma}\Big),$$

which along with (3.9) implies (4.2).

Note that the sum in (4.2) is ${}_0F_0^{(\beta)}\big(\frac{x\beta}{2}\Sigma^{-1}\big)$, truncated to the partitions $\kappa \subseteq (t)^m$, and is thus particularly convenient to evaluate using our algorithms [13].

The following theorem generalizes Theorem 8.3.4 in Muirhead [17] to all β > 0.

Theorem 4.2. If $A \sim W_m^{(\beta)}(n, \Sigma)$, then the density of the trace of A is

(4.8)  $$|\lambda\Sigma^{-1}|^{\frac{n\beta}{2}}\, e^{-\frac{x\beta}{2\lambda}} \sum_{k=0}^{\infty} \frac{1}{\Gamma\big(\frac{nm}{2}\beta + k\big)} \cdot \frac{\beta}{2\lambda} \cdot \Big(\frac{x\beta}{2\lambda}\Big)^{\frac{nm}{2}\beta + k - 1} \sum_{\kappa \vdash k} \big(\tfrac{n\beta}{2}\big)_\kappa^{(\beta)}\, \frac{1}{k!}\; C_\kappa^{(\beta)}\big(I - \lambda\Sigma^{-1}\big),$$

where λ is arbitrary.

Proof. We follow Muirhead's argument, which carries over to any β > 0. From (2.4), the moment generating function of tr A is:

$$\varphi(t) = E[\operatorname{etr}(tL)] = \frac{|\Sigma|^{-\frac{n}{2}\beta}}{K_m^{(\beta)}\big(\frac{n}{2}\beta\big)} \int_{\mathbb{R}_+^m} e^{-\operatorname{tr}(-tL)} \prod_{i=1}^{m} \lambda_i^{\frac{n-m+1}{2}\beta - 1} \prod_{j<k} |\lambda_k - \lambda_j|^{\beta}\; {}_0F_0^{(\beta)}\big({-tL},\, \tfrac{\beta}{2t}\Sigma^{-1}\big)\, dL$$

We make a change of variables L → −tL and use (3.7) to obtain

$$\varphi(t) = (-t)^{-\frac{mn}{2}\beta}\, |\Sigma|^{-\frac{n}{2}\beta}\, \frac{c_m^{(\beta)}\, \Gamma_m^{(\beta)}\big(\frac{n}{2}\beta\big)}{K_m^{(\beta)}\big(\frac{n}{2}\beta\big)}\; {}_1F_0^{(\beta)}\Big(\frac{n}{2}\beta;\; \frac{\beta}{2t}\Sigma^{-1}\Big) = \Big|{-\frac{2t}{\beta}\Sigma}\Big|^{-\frac{n}{2}\beta} \cdot \Big|I - \frac{\beta}{2t}\Sigma^{-1}\Big|^{-\frac{n}{2}\beta} = \Big|I - \frac{2t}{\beta}\Sigma\Big|^{-\frac{n}{2}\beta}.$$

For 0 < λ < ∞ write

$$\varphi(t) = \Big|I - \frac{2t}{\beta}\Sigma\Big|^{-\frac{n}{2}\beta} = \Big(1 - \frac{2t\lambda}{\beta}\Big)^{-\frac{mn}{2}\beta}\, |\lambda^{-1}\Sigma|^{-\frac{n}{2}\beta}\, \Big|I - \frac{1}{1 - \frac{2t\lambda}{\beta}}\big(I - \lambda\Sigma^{-1}\big)\Big|^{-\frac{n}{2}\beta}$$

$$= \Big(1 - \frac{2t\lambda}{\beta}\Big)^{-\frac{mn}{2}\beta}\, |\lambda^{-1}\Sigma|^{-\frac{n}{2}\beta}\; {}_1F_0^{(\beta)}\Big(\frac{n}{2}\beta;\; \frac{1}{1 - \frac{2t\lambda}{\beta}}\big(I - \lambda\Sigma^{-1}\big)\Big)$$

(4.9)  $$= \Big(1 - \frac{2t\lambda}{\beta}\Big)^{-\frac{mn}{2}\beta}\, |\lambda^{-1}\Sigma|^{-\frac{n}{2}\beta} \sum_{k=0}^{\infty} \Big(1 - \frac{2t\lambda}{\beta}\Big)^{-k} \sum_{\kappa \vdash k} \frac{1}{k!}\, \big(\tfrac{n}{2}\beta\big)_\kappa^{(\beta)}\, C_\kappa^{(\beta)}\big(I - \lambda\Sigma^{-1}\big),$$

where t and λ are such that $\|I - \lambda\Sigma^{-1}\| < 1 - \frac{2t\lambda}{\beta}$, so that the ${}_1F_0^{(\beta)}$ function above converges. Since $\big(1 - \frac{2t\lambda}{\beta}\big)^{-r}$ is the moment-generating function of the gamma distribution with parameters r and $\frac{2\lambda}{\beta}$ and density function

$$g_{r,\, \frac{2\lambda}{\beta}}(u) = \frac{e^{-\frac{u\beta}{2\lambda}}\, u^{r-1}}{\big(\frac{2\lambda}{\beta}\big)^{r}\, \Gamma(r)} \qquad (u > 0),$$

the moment-generating function (4.9) can be inverted term by term to obtain (4.8), the density function of the trace of A, where λ is arbitrary and, as in Muirhead [17, p. 341], can be chosen as $\lambda = \frac{2\lambda'\lambda''}{\lambda' + \lambda''}$, where λ′ and λ″ are the largest and the smallest eigenvalues of Σ, respectively.
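When Σ = λI, every term of (4.8) with k > 0 vanishes, since $C_\kappa^{(\beta)}(I - \lambda\Sigma^{-1}) = C_\kappa^{(\beta)}(0) = 0$, and the trace density collapses to a Gamma(nmβ/2, 2λ/β) density. A Python sketch of this special case (function names are ours):

```python
import math

def trace_density_sigma_scalar(x, n, m, beta, lam):
    """The k = 0 term of (4.8) with Sigma = lam * I; all k > 0 terms
    vanish because C_kappa^{(beta)}(I - lam * Sigma^{-1}) = 0."""
    shape = n * m * beta / 2.0
    y = x * beta / (2.0 * lam)
    return (beta / (2.0 * lam)) * y ** (shape - 1.0) * math.exp(-y) / math.gamma(shape)

def gamma_pdf(x, shape, scale):
    """Reference Gamma(shape, scale) density."""
    return x ** (shape - 1.0) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)
```

For example, n = 5, m = 2, β = 1, λ = 1.3 gives the Gamma(5, 2.6) density, as a real Wishart trace with Σ = 1.3 I should.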

[Figure 5.1: two panels, "CDF of λmax" (left) and "CDF of λmin" (right), plotting F(x) against x; in each panel the theoretical prediction is overlaid on the empirical CDF.]

Fig. 5.1. The empirical distributions from 20,000 replications of the largest and smallest eigenvalues, respectively, of a 3 × 3 Beta–Wishart matrix with 5 degrees of freedom, β = 4/3, Σ = diag(1.3, 1, 0.7), plotted against the theoretical predictions (4.1) and (4.2).

5. Numerical experiments. The theoretical distributions of the extreme eigenvalues of the Beta–Wishart ensemble are particularly convenient to evaluate using our algorithm for computing the hypergeometric function of a matrix argument [13]. We performed extensive numerical tests comparing the theoretical eigenvalue distributions in this paper against empirical distributions generated by the Beta–Wishart matrix model from our recent paper [2]. In every case the two agreed. In Figure 5.1 we report one such experiment, plotting the distributions of the largest and the smallest eigenvalues of the same Beta–Wishart ensemble. More experiments are reported in [2].

Acknowledgments. We acknowledge the great insights into hypergeometric functions of a matrix argument that we have gathered from Ian Macdonald's unpublished manuscript on the topic [15]. We also thank Iain Johnstone for many useful discussions in developing this work.
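Experiments like the one in Figure 5.1 can be sketched for the special case Σ = I, where the Beta–Wishart ensemble coincides with the Beta–Laguerre ensemble and eigenvalues can be sampled from a bidiagonal model in the spirit of Dumitriu and Edelman [3]. The sketch below is ours, not the code of [2]; in particular, the division by β is our guess at matching the $e^{-\beta \operatorname{tr}\Lambda/2}$ weight in (2.4), and the only property we check is that E[tr A] = mn:

```python
import math
import random
import numpy as np

def beta_wishart_eigs_identity(m, n, beta, rng):
    """Sample eigenvalues of an m x m Beta-Wishart matrix with Sigma = I
    (the Beta-Laguerre case) via a bidiagonal model: diagonal entries
    chi_{(n-i+1)beta}, subdiagonal entries chi_{(m-i)beta}, i = 1..m.
    Dividing B B^T by beta matches the e^{-beta tr(Lambda)/2} weight of
    (2.4) under our normalization assumption."""
    def chi(df):
        # chi_df variate via the gamma distribution; works for fractional df
        return math.sqrt(rng.gammavariate(df / 2.0, 2.0))

    B = np.zeros((m, m))
    for i in range(m):                    # 0-based: chi_{(n-i)beta}
        B[i, i] = chi((n - i) * beta)
    for i in range(1, m):                 # 0-based row i: chi_{(m-i)beta}
        B[i, i - 1] = chi((m - i) * beta)
    return np.linalg.eigvalsh(B @ B.T) / beta
```

With m = 3, n = 5, β = 4/3 (the setting of Figure 5.1, but with Σ = I), the sample mean of tr A over a few hundred draws should be close to mn = 15.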

REFERENCES

[1] T. H. Baker and P. J. Forrester, The Calogero–Sutherland model and generalized classical polynomials, Comm. Math. Phys., 188 (1997), pp. 175–216.
[2] Alexander Dubbs, Alan Edelman, Plamen Koev, and Praveen Venkataramana, The Beta–Wishart ensemble. Submitted to J. Math. Phys.
[3] Ioana Dumitriu and Alan Edelman, Matrix models for beta-ensembles, J. Math. Phys., 43 (2002), pp. 5830–5847.
[4] Ioana Dumitriu and Plamen Koev, The distributions of the extreme eigenvalues of beta–Jacobi random matrices, SIAM J. Matrix Anal. Appl., 1 (2008), pp. 1–6.
[5] Freeman J. Dyson, The threefold way. Algebraic structures of symmetry groups and ensembles in quantum mechanics, J. Math. Phys., 3 (1963), pp. 1199–1215.
[6] Alan Edelman, The random matrix technique of ghosts and shadows, Markov Processes and Related Fields, 16 (2010), pp. 783–790.
[7] Alan Edelman and Brian D. Sutton, The beta-Jacobi matrix model, the CS decomposition, and generalized singular value problems, Found. Comput. Math., 8 (2008), pp. 259–285.
[8] Peter J. Forrester, Log-gases and random matrices, vol. 34 of London Mathematical Society Monographs Series, Princeton University Press, Princeton, NJ, 2010.
[9] Peter J. Forrester, Probability densities and distributions for spiked Wishart β-ensembles, arXiv.org/abs/1101.2261, 2011.

[10] Iain M. Johnstone, Multivariate analysis and Jacobi ensembles: largest eigenvalue, Tracy–Widom limits and rates of convergence, Ann. Statist., 36 (2008), pp. 2638–2716.
[11] Kevin W. J. Kadell, The Selberg–Jack symmetric functions, Adv. Math., 130 (1997), pp. 33–102.
[12] Rowan Killip and Irina Nenciu, Matrix models for circular ensembles, Int. Math. Res. Not., 50 (2004), pp. 2665–2701.
[13] Plamen Koev and Alan Edelman, The efficient evaluation of the hypergeometric function of a matrix argument, Math. Comp., 75 (2006), pp. 833–846.
[14] Ross A. Lippert, A matrix model for the β-Jacobi ensemble, J. Math. Phys., 44 (2003), pp. 4807–4816.
[15] Ian G. Macdonald, Hypergeometric functions I. Handwritten notes, 1987–1988.
[16] Ian G. Macdonald, Symmetric functions and Hall polynomials, Oxford University Press, New York, second ed., 1995.
[17] R. J. Muirhead, Aspects of multivariate statistical theory, John Wiley & Sons Inc., New York, 1982.
[18] T. Ratnarajah, R. Vaillancourt, and M. Alvo, Eigenvalues and condition numbers of complex random matrices, SIAM J. Matrix Anal. Appl., 26 (2004/05), pp. 441–456.
[19] Richard P. Stanley, Some combinatorial properties of Jack symmetric functions, Adv. Math., 77 (1989), pp. 76–115.