Randomized Linear Algebra Approaches to Estimate the Von Neumann Entropy of Density Matrices

Eugenia-Maria Kontopoulou, Ananth Grama, Wojciech Szpankowski, and Petros Drineas
Computer Science, Purdue University, West Lafayette, IN, USA
Email: [email protected]

Abstract—The von Neumann entropy, named after John von Neumann, is the extension of classical entropy concepts to the field of quantum mechanics and, from a numerical perspective, can be computed simply by computing all the eigenvalues of a density matrix, an operation that could be prohibitively expensive for large-scale density matrices. We present and analyze two randomized algorithms to approximate the von Neumann entropy of density matrices: our algorithms leverage recent developments in the Randomized Numerical Linear Algebra (RandNLA) literature, such as randomized trace estimators, provable bounds for the power method, and the use of Taylor series and Chebyshev polynomials to approximate matrix functions. Both algorithms come with provable accuracy guarantees, and our experimental evaluations support our theoretical findings, showing considerable speedup with small loss in accuracy.

I. INTRODUCTION

Entropy is a fundamental quantity in many areas of science and engineering. The von Neumann entropy, named after John von Neumann, is the extension of classical entropy concepts to the field of quantum mechanics, and its foundations can be traced to von Neumann's work on Mathematische Grundlagen der Quantenmechanik (originally published in German in 1932; published in English under the title Mathematical Foundations of Quantum Mechanics in 1955). In his work, von Neumann introduced the notion of the density matrix, which facilitated the extension of the tools of classical statistical mechanics to the quantum domain in order to develop a theory of quantum measurements.

From a mathematical perspective (see Section I-A for details) the density matrix R is a symmetric positive semidefinite matrix in ℝ^{n×n} with unit trace. Let p_i, i = 1...n, be the eigenvalues of R in decreasing order; then, the entropy of R is defined as

  H(R) = − Σ_{i=1}^n p_i ln p_i.   (1)

The above definition is a proper extension of both the Gibbs entropy and the Shannon entropy to the quantum case, and it implies an obvious algorithm to compute H(R) by first computing the eigendecomposition of R; known algorithms for this task necessitate O(n^3) time [1]. Clearly, as n grows, such running times are impractical. For example, [2] describes an entangled two-photon state generated by spontaneous parametric down-conversion, which can result in a density matrix with n ≈ 10^8.

Motivated by the above discussion, we seek numerical algorithms that approximate the von Neumann entropy of large density matrices, i.e., symmetric positive definite matrices with unit trace, faster than the trivial O(n^3) approach. Our algorithms build upon recent developments in the field of Randomized Numerical Linear Algebra (RandNLA), an interdisciplinary research area that exploits randomization as a computational resource to develop improved algorithms for large-scale linear algebra problems. Indeed, our work here lies at the intersection of RandNLA and information theory, delivering novel randomized linear algebra algorithms and related quality-of-approximation results for a fundamental information-theoretic metric.

A. Background

We will focus on finite-dimensional function (state) spaces. In this setting, the density matrix R represents the statistical mixture of k pure states, and from a linear algebraic perspective it can be written as

  R = Ψ Σ_p Ψ^T ∈ ℝ^{n×n},   (2)

where Ψ ∈ ℝ^{n×k} is the orthonormal matrix whose columns are the k pure states and Σ_p ∈ ℝ^{k×k} is a diagonal matrix whose entries are the (positive) p_i's. Let h(x) = x ln x for any x > 0 and let h(0) = 0. Then, using the cyclical property of the trace and the definition of h(x), eqn. (1) becomes

  − Σ_{i: p_i>0} p_i ln p_i = −tr(Ψ h(Σ_p) Ψ^T) = −tr(h(R)) = −tr(R ln R).   (3)

The second equality follows from the definition of matrix functions [3].

We conclude the section by noting that our algorithms will use two tools that appeared in prior work. The first tool is the power method, with a provable analysis that first appeared in [4]. The second tool is a provably accurate trace estimation algorithm for symmetric positive semidefinite matrices that appeared in [5].
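To make the identity in eqn. (3) concrete, the following minimal Python/NumPy sketch builds a small full-rank density matrix and checks that eqns. (1) and (3) agree. The construction (a random orthonormal Ψ with k = n pure states) is our own illustrative choice, not part of the paper's experiments:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(0)
n = 40

# Random orthonormal pure states and probabilities summing to one.
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))
p = rng.random(n)
p /= p.sum()

R = Psi @ np.diag(p) @ Psi.T          # density matrix, eqn. (2): PSD, unit trace

H_eig = -np.sum(p * np.log(p))        # eqn. (1), via the eigenvalues
H_fun = -np.trace(R @ logm(R)).real   # eqn. (3), via the matrix function R ln R

print(H_eig, H_fun)                   # agree up to floating-point error
```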
B. Our contributions

We present and analyze two randomized algorithms to approximate the von Neumann entropy of density matrices, leveraging two different polynomial approximations of the matrix function H(R) = −tr(R ln R): the first approximation uses a Taylor series expansion, while the second uses Chebyshev polynomials. Both algorithms return, with high probability, relative-error approximations to the true entropy of the input density matrix, under certain assumptions. More specifically, in both cases, we need to assume that the input density matrix has n non-zero eigenvalues, or, equivalently, that the probabilities p_i, i = 1...n, corresponding to the underlying n pure states are non-zero. The running time of both algorithms is proportional to the sparsity of the input density matrix and depends (see Theorems 1 and 3 for precise statements) on, roughly, the ratio of the largest to the smallest probability p_1/p_n, as well as the desired accuracy.

From a technical perspective, the theoretical analysis of both algorithms combines polynomial approximations to matrix functions, either via Taylor series or via Chebyshev polynomials, with randomized trace estimators. A provably accurate variant of the power method is used to estimate the largest probability p_1. If this estimate is significantly smaller than one, it can improve the running times of the proposed algorithms (see the discussion after Theorem 1). We note that while the power method introduces an additional nnz(R) log(n) term in the running time, in practice its computational overhead is negligible.

Finally, in Section IV, we present a highlight of our experimental evaluations of our algorithms on large-scale synthetic density matrices, generated using the QETLAB Matlab toolbox [6]. For a 30,000-by-30,000 matrix that was used in our evaluations, the exact computation of the entropy takes hours, whereas our algorithms return approximations with relative errors well below 0.5% in only a few minutes.

We conclude by noting that a longer version of this short abstract is available in [7]. Indeed, we will often reference [7] for proofs, experimental evaluations, and detailed discussions that are omitted from this version due to space considerations.

C. Prior work

To the best of our knowledge, prior to this work, the only non-trivial algorithm to approximate the von Neumann entropy of a density matrix appeared in [2]. Their approach is based on the Chebyshev polynomial expansion that we also analyze in Section III. However, our analysis is somewhat different and, overall, much tighter, leveraging a provably accurate variant of the power method as well as provably accurate trace estimators to derive a relative error approximation to the entropy of a density matrix, under appropriate assumptions. A brief technical comparison between our results in Section III and the work of [2] appears in Section III-A, while a detailed comparison can be found in Section 3.3 of [7].

Independently of and in parallel with our work, [8] presented a multipoint interpolation algorithm (building upon [9]) to compute a relative error approximation for the entropy of a real matrix with bounded condition number. The proposed running time of Theorem 35 of [8] does not depend on the condition number of the input matrix (i.e., the ratio of the largest to the smallest probability), which is a clear advantage in the case of ill-conditioned matrices. However, the dependency of the algorithm of Theorem 35 of [8] on terms like (log n/ε)^6 or n^{1/3} nnz(A) + n √nnz(A) could blow up the running time of the proposed algorithm for reasonably conditioned matrices.

We also note the recent work in [10], which used Taylor approximations to matrix functions to estimate the log determinant of symmetric positive definite matrices (see also Section 1.2 of [10] for an overview of prior work on approximating matrix functions via Taylor series). Finally, the work of [11] used a Chebyshev polynomial approximation to estimate the log determinant of a matrix and is reminiscent of our approach in Section III and, of course, the work of [2].

II. AN APPROACH VIA TAYLOR SERIES

Our first approach to approximate the von Neumann entropy of a density matrix uses a Taylor series expansion to approximate the logarithm of a matrix, combined with a relative-error trace estimator for symmetric positive semidefinite matrices and the power method to upper bound the largest singular value of a matrix.

A. Algorithm and Main Theorem

Our main result is an analysis of Algorithm 1 (see below) that guarantees a relative error approximation to the entropy of the density matrix R, under the assumption that R = Σ_{i=1}^n p_i ψ_i ψ_i^T ∈ ℝ^{n×n} has n pure states with 0 < ℓ ≤ p_i for all i = 1...n. The following theorem is our main quality-of-approximation result for Algorithm 1.

Algorithm 1 A Taylor series approach to estimate the entropy.
1: INPUT: R ∈ ℝ^{n×n}, accuracy parameter ε > 0, failure probability δ, and integer m > 0.
2: Estimate p̃_1 using the power method of [4] with t = O(ln n) and q = O(ln(1/δ)).
3: Set u = min{1, 6p̃_1}.
4: Set s = ⌈20 ln(2/δ)/ε^2⌉.
5: Let g_1, g_2, ..., g_s ∈ ℝ^n be i.i.d. random Gaussian vectors.
6: OUTPUT: return

  Ĥ(R) = ln u^{-1} + (1/s) Σ_{i=1}^s Σ_{k=1}^m g_i^T R (I_n − u^{-1}R)^k g_i / k.

Theorem 1. Let R be a density matrix such that all probabilities p_i, i = 1...n, satisfy 0 < ℓ ≤ p_i. Let u be computed as in Algorithm 1 and let Ĥ(R) be the output of Algorithm 1 on inputs R, m, and ε < 1. Then, with probability at least 1 − 2δ,

  |Ĥ(R) − H(R)| ≤ 2ε H(R),

by setting m = ⌈(u/ℓ) ln(1/ε)⌉. The algorithm runs in time

  O( (u ln(1/ε))/ℓ · (ln(1/δ)/ε^2) · nnz(R) + ln(n) · ln(1/δ) · nnz(R) ).

A few remarks are necessary to better understand the above theorem. First, ℓ could be set to p_n, the smallest of the probabilities corresponding to the n pure states of the density matrix R. Second, it should be obvious that u in Algorithm 1 could simply be set to one, in which case we could avoid calling the power method to estimate p_1 by p̃_1 and thus compute u. However, if p_1 is small, then u could be significantly smaller than one, thus reducing the running time of Algorithm 1, which depends on the ratio u/ℓ. Third, ideally, if both p_1 and p_n were used instead of u and ℓ, respectively, the running time of the algorithm would scale with the ratio p_1/p_n. Fourth, even though the overhead of the power method (the second term in the running time expression) appears to dominate, at least asymptotically, it is negligible in practice.
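The sum in step 6 of Algorithm 1 can be evaluated with one matrix-vector product per Taylor term, without ever forming (I_n − u^{-1}R)^k explicitly. The following Python/NumPy sketch is our own minimal rendering of Algorithm 1 (the function name and interface are ours; it assumes u is already computed, e.g., by the power method of [4], and omits the ceiling constants of the analysis):

```python
import numpy as np

def taylor_entropy(R, u, m, s, rng=None):
    """Sketch of Algorithm 1: estimate H(R) = -tr(R ln R) via a truncated
    Taylor series and a Gaussian trace estimator.

    R : (n, n) symmetric PSD array with unit trace (dense here; any object
        supporting fast matvecs works the same way)
    u : upper bound on the largest eigenvalue of R, with u <= 1
    m : number of Taylor terms; s : number of Gaussian probe vectors
    """
    rng = rng or np.random.default_rng()
    n = R.shape[0]
    total = 0.0
    for _ in range(s):
        g = rng.standard_normal(n)
        z = g.copy()          # z = C^0 g, where C = I - R/u
        w = R @ z             # w = R C^0 g
        for k in range(1, m + 1):
            z = z - w / u     # z = C^k g
            w = R @ z         # w = R C^k g
            total += (g @ w) / k
    return np.log(1.0 / u) + total / s
```

Each probe vector costs one matrix-vector product with R per Taylor term, matching the O(s · m · nnz(R)) cost accounted for in the proof of Theorem 1.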

B. Proof of Theorem 1

We now prove Theorem 1, which analyzes the performance of Algorithm 1. Our first lemma presents a simple expression for H(R) using a Taylor series expansion.

Lemma 2. Let R ∈ ℝ^{n×n} be a symmetric positive definite matrix with unit trace and whose eigenvalues lie in the interval [ℓ, u], for some 0 < ℓ ≤ u ≤ 1. Then,

  H(R) = ln u^{-1} + Σ_{k=1}^∞ tr(R (I_n − u^{-1}R)^k) / k.

Proof. The proof follows by further manipulating eqn. (3) and then applying the Taylor expansion ln(u^{-1}R) = ln(I_n − (I_n − u^{-1}R)) = − Σ_{k=1}^∞ (I_n − u^{-1}R)^k / k. See [7] for the full proof.
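For completeness, the chain of equalities behind Lemma 2 (our own rendering of the argument sketched above; the expansion converges because the eigenvalues of I_n − u^{-1}R lie in [0, 1 − ℓ/u]) is:

```latex
\begin{aligned}
H(R) &= -\operatorname{tr}(R \ln R)
      = -\operatorname{tr}\!\big(R \ln(u \cdot u^{-1}R)\big) \\
     &= -\ln(u)\,\operatorname{tr}(R) - \operatorname{tr}\!\big(R \ln(u^{-1}R)\big) \\
     &= \ln u^{-1} + \sum_{k=1}^{\infty}
        \frac{\operatorname{tr}\!\big(R\,(I_n - u^{-1}R)^k\big)}{k},
\end{aligned}
```

where the last step uses tr(R) = 1 and the Taylor expansion of ln(u^{-1}R) stated in the proof.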

We now proceed to prove Theorem 1. We will condition our analysis on the power method being successful, which happens with probability at least 1 − δ. In this case, u = min{1, 6p̃_1} is an upper bound for all probabilities p_i. For notational convenience, set C = I_n − u^{-1}R. We start by manipulating Δ = |Ĥ(R) − H(R)| as follows:

  Δ = | Σ_{k=1}^m (1/k) · (1/s) Σ_{i=1}^s g_i^T R C^k g_i − Σ_{k=1}^∞ (1/k) tr(R C^k) |
    ≤ | Σ_{k=1}^m (1/k) · (1/s) Σ_{i=1}^s g_i^T R C^k g_i − Σ_{k=1}^m (1/k) tr(R C^k) | + Σ_{k=m+1}^∞ (1/k) tr(R C^k)
    = | (1/s) Σ_{i=1}^s g_i^T ( Σ_{k=1}^m R C^k / k ) g_i − tr( Σ_{k=1}^m R C^k / k ) | + Σ_{k=m+1}^∞ tr(R C^k) / k
    =: Δ_1 + Δ_2.

We now bound the two terms Δ_1 and Δ_2 separately. We start with Δ_1: the idea is to apply Theorem 5.2 from [5] to the matrix Σ_{k=1}^m R C^k / k with s = ⌈20 ln(2/δ)/ε^2⌉. Hence, with probability at least 1 − δ:

  Δ_1 ≤ ε · tr( Σ_{k=1}^m R C^k / k ) ≤ ε · tr( Σ_{k=1}^∞ R C^k / k ).   (4)

A subtle point in applying Theorem 5.2 from [5] is that the matrix Σ_{k=1}^m R C^k / k must be symmetric positive semidefinite. To prove this, let the SVD of R be R = Ψ Σ_p Ψ^T, where all three matrices are in ℝ^{n×n} and the diagonal entries of Σ_p are in the interval [ℓ, u]. Then, it is easy to see that C = I_n − u^{-1}R = Ψ(I_n − u^{-1}Σ_p)Ψ^T and R C^k = Ψ Σ_p (I_n − u^{-1}Σ_p)^k Ψ^T, where the diagonal entries of I_n − u^{-1}Σ_p are non-negative, since the largest entry in Σ_p is upper bounded by u. This proves that R C^k is symmetric positive semidefinite for any k, a fact which will be useful throughout the proof. Now,

  Σ_{k=1}^m R C^k / k = Ψ Σ_p ( Σ_{k=1}^m (I_n − u^{-1}Σ_p)^k / k ) Ψ^T,

which shows that the matrix of interest is symmetric positive semidefinite. Additionally, since R C^k is symmetric positive semidefinite, its trace is non-negative, which proves the second inequality in eqn. (4) as well.
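The estimator behind eqn. (4) is the Gaussian trace estimator analyzed in [5]: for a symmetric positive semidefinite matrix A, the average of g^T A g over i.i.d. Gaussian probes concentrates around tr(A). A minimal self-contained sketch (our own illustration, not the paper's code) is:

```python
import numpy as np

def gaussian_trace_estimator(matvec, n, s, rng=None):
    """Estimate tr(A) for an implicit symmetric PSD matrix A, given only a
    function computing x -> A x, using s Gaussian probe vectors (cf. [5])."""
    rng = rng or np.random.default_rng()
    total = 0.0
    for _ in range(s):
        g = rng.standard_normal(n)
        total += g @ matvec(g)   # quadratic form g^T A g
    return total / s

# Example: estimate the trace of a random PSD matrix.
rng = np.random.default_rng(1)
B = rng.standard_normal((200, 200))
A = B @ B.T                      # PSD by construction
print(gaussian_trace_estimator(lambda x: A @ x, 200, s=500), np.trace(A))
```

Theorem 5.2 of [5] shows that, for symmetric positive semidefinite inputs, s = ⌈20 ln(2/δ)/ε^2⌉ probes suffice for a (1 ± ε) relative-error estimate with probability at least 1 − δ, which is exactly how the estimator is invoked above.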

We proceed to bound Δ_2 as follows:

  Δ_2 = Σ_{k=m+1}^∞ tr(R C^k)/k = Σ_{k=m+1}^∞ tr(R C^m C^{k−m})/k
      = Σ_{k=m+1}^∞ tr(C^m C^{k−m} R)/k ≤ ‖C^m‖_2 · Σ_{k=m+1}^∞ tr(C^{k−m} R)/k   (5)
      = ‖C^m‖_2 · Σ_{k=m+1}^∞ tr(R C^{k−m})/k ≤ ‖C^m‖_2 · Σ_{k=1}^∞ tr(R C^k)/k   (6)
      ≤ (1 − ℓ/u)^m · Σ_{k=1}^∞ tr(R C^k)/k.   (7)

To prove eqn. (5), we used von Neumann's trace inequality: for any two matrices A and B, tr(AB) ≤ Σ_i σ_i(A) σ_i(B), where σ_i(A) (respectively σ_i(B)) denotes the i-th singular value of A (respectively B); since ‖A‖_2 = σ_1(A) (its largest singular value), this implies tr(AB) ≤ ‖A‖_2 Σ_i σ_i(B), and if B is symmetric positive semidefinite, tr(B) = Σ_i σ_i(B). Eqn. (5) now follows since C^{k−m}R is symmetric positive semidefinite, which can be proven using an argument similar to the one used to prove eqn. (4). To prove eqn. (6), we used the fact that tr(R C^k)/k ≥ 0 for any k ≥ 1. Finally, to prove eqn. (7), we used the fact that ‖C‖_2 = ‖I_n − u^{-1}Σ_p‖_2 ≤ 1 − ℓ/u, since the smallest entry in Σ_p is at least ℓ by our assumptions. We also removed unnecessary absolute values, since tr(R C^k)/k is non-negative for any positive integer k.

Combining the bounds for Δ_1 and Δ_2 gives

  |Ĥ(R) − H(R)| ≤ ( ε + (1 − ℓ/u)^m ) · Σ_{k=1}^∞ tr(R C^k)/k.

We have already proven in Lemma 2 that

  Σ_{k=1}^∞ tr(R C^k)/k = H(R) − ln u^{-1} ≤ H(R),

where the last inequality follows since u ≤ 1. Collecting our results, we get

  |Ĥ(R) − H(R)| ≤ ( ε + (1 − ℓ/u)^m ) H(R).

Setting m = ⌈(u/ℓ) ln(1/ε)⌉ and using (1 − x^{-1})^x ≤ e^{-1} (for x > 0) guarantees that (1 − ℓ/u)^m ≤ ε and concludes the proof of the theorem. We note that the failure probability of the algorithm is at most 2δ (the sum of the failure probabilities of the power method and the trace estimation algorithm). Finally, we discuss the running time of Algorithm 1, which is equal to O(s · m · nnz(R)). Since s = O(ln(1/δ)/ε^2) and m = O((u/ℓ) ln(1/ε)), the running time becomes (after accounting for the running time of the power method)

  O( (u ln(1/ε))/ℓ · (ln(1/δ)/ε^2) · nnz(R) + ln(n) · ln(1/δ) · nnz(R) ).
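Both of our algorithms obtain u from the power method of [4]. A minimal sketch of that step follows; it is our own simplified rendering, and the interpretation of the constants is an assumption on our part: we take q = O(ln(1/δ)) independent runs of t = O(ln n) iterations each, keep the largest Rayleigh quotient, and use the factor 6 from the guarantee invoked in Theorem 1 that the estimate p̃_1 is within a factor of six of p_1:

```python
import numpy as np

def estimate_u(R, t, q, rng=None):
    """Sketch of step 2-3 of Algorithms 1 and 2: estimate the largest
    eigenvalue p_1 of the PSD matrix R and return u = min(1, 6 * p1_tilde)."""
    rng = rng or np.random.default_rng()
    n = R.shape[0]
    best = 0.0
    for _ in range(q):                   # q = O(ln(1/delta)) independent trials
        x = rng.standard_normal(n)
        for _ in range(t):               # t = O(ln n) power iterations
            x = R @ x
            x /= np.linalg.norm(x)
        best = max(best, x @ (R @ x))    # Rayleigh quotient estimate of p_1
    return min(1.0, 6.0 * best)
```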

III. AN APPROACH VIA CHEBYSHEV POLYNOMIALS

Our second approach is to use a Chebyshev polynomial-based approximation scheme to estimate the entropy of a density matrix. Our approach follows the work of [2], but our analysis uses the trace estimators of [5] and the power method of [4] and its analysis. Importantly, we present conditions under which the proposed approach is competitive with the approach of Section II.

A. Algorithm and Main Theorem

The proposed algorithm leverages the fact that the von Neumann entropy of a density matrix R is equal to the (negative) trace of the matrix function R ln R, and it approximates the function R ln R by a sum of Chebyshev polynomials; then, the trace of the resulting matrix is estimated using the trace estimator of [5].

Let f_m(x) = Σ_{w=0}^m α_w T_w(x), with

  α_0 = (u/2)(ln(u/4) + 1),  α_1 = (u/4)(2 ln(u/4) + 3),  and  α_w = (−1)^w u / (w^3 − w) for w ≥ 2.

Let T_w(x) = cos(w · arccos((2/u)x − 1)), x ∈ [0, u], be the Chebyshev polynomials of the first kind for any integer w > 0. Algorithm 2 computes u (an upper bound estimate for the largest probability p_1 of the density matrix R) and then computes f_m(R) and estimates its trace. We note that the computation of g_i^T f_m(R) g_i can be done efficiently using Clenshaw's algorithm; see Appendix C of [7] for this well-known approach.

Algorithm 2 A Chebyshev polynomial-based approach to estimate the entropy.
1: INPUT: R ∈ ℝ^{n×n}, accuracy parameter ε > 0, failure probability δ, and integer m > 0.
2: Estimate p̃_1 using the power method of [4] with t = O(ln n) and q = O(ln(1/δ)).
3: Set u = min{1, 6p̃_1}.
4: Set s = ⌈20 ln(2/δ)/ε^2⌉.
5: Let g_1, g_2, ..., g_s ∈ ℝ^n be i.i.d. random Gaussian vectors.
6: OUTPUT: Ĥ(R) = −(1/s) Σ_{i=1}^s g_i^T f_m(R) g_i.
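Step 6 never needs f_m(R) explicitly: for each probe g_i, the vector f_m(R) g_i can be built with Clenshaw's backward recurrence, using one matrix-vector product with R per polynomial degree. The following Python/NumPy sketch is our own rendering of this step (the paper's implementation is in Matlab and is detailed in Appendix C of [7]); the coefficients α_w and the shift of R to [−1, 1] follow the definitions above:

```python
import numpy as np

def chebyshev_coeffs(m, u):
    """Coefficients alpha_w of f_m, which approximates x ln x on [0, u]."""
    a = np.zeros(m + 1)
    a[0] = (u / 2.0) * (np.log(u / 4.0) + 1.0)
    if m >= 1:
        a[1] = (u / 4.0) * (2.0 * np.log(u / 4.0) + 3.0)
    for w in range(2, m + 1):
        a[w] = (-1.0) ** w * u / (w**3 - w)
    return a

def quad_form_fm(R, g, m, u):
    """Compute g^T f_m(R) g via Clenshaw's recurrence: one matvec per degree."""
    a = chebyshev_coeffs(m, u)
    Y = lambda v: (2.0 / u) * (R @ v) - v   # Y = (2/u) R - I, spectrum in [-1, 1]
    b1 = np.zeros_like(g)                    # b_{w+1}
    b2 = np.zeros_like(g)                    # b_{w+2}
    for w in range(m, 0, -1):
        b1, b2 = 2.0 * Y(b1) - b2 + a[w] * g, b1
    fg = Y(b1) - b2 + a[0] * g               # f_m(R) g
    return g @ fg

# Algorithm 2's estimate with s probes (n = R.shape[0]):
# H_hat = -np.mean([quad_form_fm(R, rng.standard_normal(n), m, u)
#                   for _ in range(s)])
```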
Our main result is an analysis of Algorithm 2 that guarantees a relative error approximation to the entropy of the density matrix R, under the assumption that R = Σ_{i=1}^n p_i ψ_i ψ_i^T ∈ ℝ^{n×n} has n pure states with 0 < ℓ ≤ p_i for all i = 1...n. The following theorem is our main quality-of-approximation result for Algorithm 2; a detailed proof of Theorem 3 may be found in [7].

Theorem 3. Let R be a density matrix such that all probabilities p_i, i = 1...n, satisfy 0 < ℓ ≤ p_i. Let u be computed as in Algorithm 2 and let Ĥ(R) be the output of Algorithm 2 on inputs R, m, and ε < 1. Then, with probability at least 1 − 2δ,

  |Ĥ(R) − H(R)| ≤ 3ε H(R),

by setting m = ⌈ √( u / (2εℓ ln(1/(1−ℓ))) ) ⌉. The algorithm runs in time

  O( √( u / (ℓ ln(1/(1−ℓ))) ) · (ln(1/δ)/ε^{2.5}) · nnz(R) + ln(n) · ln(1/δ) · nnz(R) ).

The similarities between Theorems 1 and 3 are obvious: same assumptions and directly comparable accuracy guarantees. The only difference is in the running times: the Taylor series approach has a milder dependency on ε, while the Chebyshev-based approximation has a milder dependency on the ratio u/ℓ, which controls the behavior of the probabilities p_i. We also note that the discussion following Theorem 1 is applicable here as well.

We conclude the section with a brief comparison of our bound in Theorem 3 with the results of [2]; a detailed comparison of the two bounds may be found in Section 3.3 of [7]. The work of [2] culminates in the error bounds described in its Theorem 4.3 (and the ensuing discussion). Unfortunately, without imposing a lower bound assumption on the p_i's, it is difficult to get a meaningful error bound and an efficient algorithm. Indeed, [2] needs an efficient trace estimation procedure for the matrix −f_m(R). However, while this matrix is always symmetric, it is not necessarily positive or negative semidefinite (unless additional assumptions are imposed on the p_i, as we did in Theorem 3). Unfortunately, we are not aware of any provably accurate, relative error approximation algorithms for the trace of general symmetric matrices: the results of [5], [12] only apply to symmetric positive (or negative) semidefinite matrices. The work of [2] does provide an analysis of a trace estimator for general symmetric matrices (pioneered by Hutchinson in [13]). However, in our notation, in order to achieve a relative error bound, the final error bound of [2] (see eqns. (19) and (20) in [2]) could necessitate setting s (the number of random vectors in Algorithm 2) to a prohibitively large value, thus leading to running times that blow up to O(n^2 nnz(R)), which could easily exceed the trivial O(n^3) running time to exactly compute H(R). See Section 3.3 of [7] for details.

IV. EXPERIMENTS

A detailed experimental evaluation of the proposed algorithms may be found in Section 4 of [7]. Here, we only highlight one of our observations on a random density matrix of size 30,000 × 30,000 in order to demonstrate the practical efficiency of our algorithms. Our algorithms were implemented in Matlab; to be precise, we used MATLAB R2016a on a (dedicated) node with two 10-core Intel Xeon E5 processors (2.60 GHz) and 512 GBs of RAM. We generated our random density matrices using the QETLAB Matlab toolbox [6]. We computed the exact singular values of the matrix (and thus the entropy) using the svd function of Matlab. The accuracy of our algorithms was evaluated by measuring the relative error. We set the parameters m and s to take values in the sets {5, 10, 15, 20} and {50, 100, 200}, respectively. The parameter u was set to λ̃_max, which was computed using the power method (see [7] for details) in approximately 3.6 minutes. We report the relative error (out of 100%) in Figure 1.

We observe that the relative error is always less than 1% for both methods, with the Chebyshev approximation almost always yielding slightly better results. Note that our Chebyshev-polynomial-based approximation algorithm significantly outperformed the exact computation: e.g., for m = 5 and s = 50, our estimate was computed in less than ten minutes and achieved less than 0.2% relative error, whereas the exact computation of the von Neumann entropy took approximately 5.6 hours. Finally, we report the wall-clock times (in minutes) in Figure 2. We note that for both algorithms and all combinations of the parameters, the approximation of the von Neumann entropy was computed in less than one hour.

[Fig. 1. Relative error for the 30,000 × 30,000 density matrix using the Taylor and the Chebyshev approximation algorithms with u = λ̃_max; one panel per m ∈ {5, 10, 15, 20}, with s on the horizontal axis and relative error (0%-1%) on the vertical axis.]

[Fig. 2. Wall-clock times (in minutes) of the Taylor approximation (blue) and the Chebyshev approximation (red) for u = λ̃_max; one panel per m ∈ {5, 10, 15, 20}. Exact computation needed approximately 5.6 hours.]

V. CONCLUSIONS AND OPEN PROBLEMS

We presented and analyzed two randomized algorithms to approximate the von Neumann entropy of density matrices. Our algorithms leverage recent developments in the RandNLA literature: randomized trace estimators, provable bounds for the power method, and the use of Taylor series and Chebyshev polynomials to approximate matrix functions. Both algorithms come with provable accuracy guarantees under assumptions on the spectrum of the density matrix. Empirical evaluations on 30,000 × 30,000 synthetic density matrices support our theoretical findings and demonstrate that we can efficiently approximate the von Neumann entropy in a few minutes with minimal loss in accuracy, whereas an exact computation takes over 5.5 hours. An important open problem is to relax (or eliminate) the assumptions associated with our key technical results. It would be critical to understand whether our assumptions are, for example, necessary to achieve relative error approximations, and to either provide algorithmic results that relax or eliminate our assumptions or
provide matching lower bounds and counterexamples.

ACKNOWLEDGMENT

We would like to thank the anonymous reviewers for their suggestions and for bringing Theorem 35 of [8] to our attention.

REFERENCES

[1] G. H. Golub and C. F. Van Loan, Matrix Computations (3rd ed.). Baltimore, MD, USA: Johns Hopkins University Press, 1996.
[2] T. P. Wihler, B. Bessire, and A. Stefanov, "Computing the Entropy of a Large Matrix," Journal of Physics A: Mathematical and Theoretical, vol. 47, no. 24, p. 245201, 2014.
[3] N. J. Higham, Functions of Matrices: Theory and Computation. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2008.
[4] L. Trevisan, "Graph Partitioning and Expanders," 2011, handout 7.
[5] H. Avron and S. Toledo, "Randomized Algorithms for Estimating the Trace of an Implicit Symmetric Positive Semi-definite Matrix," Journal of the ACM, vol. 58, no. 2, p. 8, 2011.
[6] N. Johnston, "QETLAB: A Matlab toolbox for quantum entanglement, version 0.9," http://qetlab.com, 2016.
[7] E.-M. Kontopoulou, A. Grama, W. Szpankowski, and P. Drineas, "Randomized Linear Algebra Approaches to Estimate the Von Neumann Entropy of Density Matrices," 2018. [Online]. Available: http://arxiv.org/abs/1801.01072
[8] C. Musco, P. Netrapalli, A. Sidford, S. Ubaru, and D. P. Woodruff, "Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and Hardness," 2018. [Online]. Available: http://arxiv.org/abs/1704.04163
[9] N. J. Harvey, J. Nelson, and K. Onak, "Sketching and Streaming Entropy via Approximation Theory," in IEEE Annual Symposium on Foundations of Computer Science, 2008, pp. 489–498.
[10] C. Boutsidis, P. Drineas, P. Kambadur, E.-M. Kontopoulou, and A. Zouzias, "A Randomized Algorithm for Approximating the Log Determinant of a Symmetric Positive Definite Matrix," Linear Algebra and its Applications, vol. 533, pp. 95–117, 2017.
[11] I. Han, D. Malioutov, and J. Shin, "Large-scale Log-determinant Computation through Stochastic Chebyshev Expansions," in Proceedings of the 32nd International Conference on Machine Learning, vol. 37, pp. 908–917, 2015.
[12] F. Roosta-Khorasani and U. Ascher, "Improved Bounds on Sample Size for Implicit Matrix Trace Estimators," Foundations of Computational Mathematics, vol. 15, pp. 1187–1212, 2015.
[13] M. F. Hutchinson, "A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines," Communications in Statistics: Simulation and Computation, vol. 19, no. 2, pp. 433–450, 1990.