Quantum principal component analysis only achieves an exponential speedup because of its state preparation assumptions

Ewin Tang∗
University of Washington
(Dated: August 10, 2021)

A central roadblock to analyzing quantum algorithms on quantum states is the lack of a comparable input model for classical algorithms. Inspired by recent work of the author [E. Tang, STOC’19], we introduce such a model, where we assume we can efficiently perform $\ell^2$-norm samples of input data, a natural analogue to quantum algorithms that assume efficient state preparation of classical data. Though this model produces less practical algorithms than the (stronger) standard model of classical computation, it captures versions of many of the features and nuances of quantum linear algebra algorithms. With this model, we describe classical analogues to Lloyd, Mohseni, and Rebentrost’s quantum algorithms for principal component analysis [Nat. Phys. 10, 631 (2014)] and nearest-centroid clustering [arXiv:1307.0411]. Since they are only polynomially slower, these algorithms suggest that the exponential speedups of their quantum counterparts are simply an artifact of state preparation assumptions.

arXiv:1811.00414v3 [cs.DS] 6 Aug 2021

INTRODUCTION

Quantum machine learning (QML) has shown great promise toward yielding new exponential quantum speedups in machine learning, ever since the pioneering linear systems algorithm of Harrow, Hassidim, and Lloyd [1]. Since machine learning (ML) routines often push real-world limits of computing power, an exponential improvement to algorithm speed would allow for ML systems with vastly greater capabilities. While we have found many fast QML subroutines for ML problems since HHL [2–6], researchers have not been able to prove that these subroutines can be used to achieve an exponentially faster algorithm for a classical ML problem, even in the strongest input and output models [7, 8]. A recent work of the author [9] suggests a surprising reason why: even our best QML algorithms, with issues with input and output models resolved, fail to achieve exponential speedups. This previous work constructs a classical algorithm matching, up to polynomial slowdown, a corresponding quantum algorithm for recommendation systems [10], which was previously believed to be one of the best candidates for an exponential speedup in machine learning [11]. In light of this result, we need to question our intuitions and reconsider one of the guiding questions of the field: when is quantum linear algebra exponentially faster than classical linear algebra?

The main challenge in answering this question is not in finding fast classical algorithms, as one might expect. Rather, it is that most QML algorithms are incomparable to classical algorithms, since they take quantum states as input and output quantum states: we don’t even know an analogous classical model of computation where we can search for similar classical algorithms [8]. The quantum recommendation system is unique in that it has a classical input, a data structure implementing QRAM, and classical output, a sample from a vector in the computational basis, allowing for rigorous comparisons with classical algorithms.

In our previous work we suggest an idea for developing classical analogues to QML algorithms beyond this exceptional case [9]:

    When QML algorithms are compared to classical ML algorithms in the context of finding speedups, any state preparation assumptions in the QML model should be matched with $\ell^2$-norm sampling assumptions in the classical ML model.

In this work, we implement this idea by introducing a new input model, sample and query access (SQ access), which is an $\ell^2$-norm sampling assumption. We can get SQ access to data under typical state preparation assumptions, so fast classical algorithms in this model are strong barriers to their QML counterparts admitting exponential speedups. To support our contention that the resulting model is the right notion to consider, we use it to dequantize two seminal and well-known QML algorithms, quantum principal component analysis [12] and quantum supervised clustering [13]. That is, we give classical algorithms that, with classical SQ access assumptions replacing quantum state preparation assumptions, match the bounds and runtime of the corresponding quantum algorithms up to polynomial slowdown. Surprisingly, we do so using only the classical toolkit originally applied to the recommendation systems problem, demonstrating the power of this model in analyzing QML algorithms.

From this work, we conclude that the exponential speedups of the quantum algorithms that we consider arise from strong input assumptions, rather than from the quantumness of the algorithms, since the speedups vanish when classical algorithms are given analogous assumptions. In other words, in a wide swathe of settings, on classical data, these algorithms do not give exponential speedups. QML algorithms can still be useful for quantum data (say, states generated from a quantum system),

though a priori it’s not clear if they give a speedup in that case, since the analogous “classical algorithm on quantum data” isn’t well-defined.

Our dequantized algorithms in the SQ access model provide the first formal evidence supporting the crucial concern about strong input and output assumptions in QML. Based on these results, we recommend exercising care when analyzing quantum linear algebra algorithms, since some algorithms with poly-logarithmic runtimes only admit polynomial speedups. BQP-complete QML problems, such as sparse matrix inversion [1] and quantum Boltzmann machine training [14], still cannot be dequantized in full unless BQP=BPP. However, many QML problems that are not BQP-complete have strong input model assumptions (like QRAM) and low-rank-type assumptions (which makes sense for machine learning, where high-dimensional data often exhibits low-dimensional trends). This regime is precisely when the classical approaches we outline here work, so such problems are highly susceptible to dequantization. We believe continuing to explore the capabilities and limitations of this model is a fruitful direction for QML research.

Notation. $[n] := \{1, \ldots, n\}$. Consider a vector $x \in \mathbb{C}^n$ and matrix $A \in \mathbb{C}^{m \times n}$. $A_{i,*}$ and $A_{*,i}$ will refer to $A$’s $i$th row and column, respectively. $\|x\|$, $\|A\|_F$, and $\|A\|$ will refer to $\ell^2$, Frobenius, and spectral norm, respectively. $|x\rangle := \frac{1}{\|x\|}\sum_{i=1}^{n} x_i |i\rangle$ and $|A\rangle := \frac{1}{\|A\|_F}\sum_{i=1}^{m} \|A_{i,*}\|\, |i\rangle |A_{i,*}\rangle$ (where, by the previous definition, $|A_{i,*}\rangle = \frac{1}{\|A_{i,*}\|}\sum_{j=1}^{n} A_{i,j} |j\rangle$). $A = \sum_{i=1}^{\min(m,n)} \sigma_i u_i v_i^\dagger$ is $A$’s singular value decomposition, where $u_i \in \mathbb{C}^m$, $v_i \in \mathbb{C}^n$, $\sigma_i \in \mathbb{R}$, $\{u_i\}$ and $\{v_i\}$ are sets of orthonormal vectors, and $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(m,n)} \geq 0$. $A_\sigma := \sum_{\sigma_i \geq \sigma} \sigma_i u_i v_i^\dagger$ and $A_k := \sum_{i=1}^{k} \sigma_i u_i v_i^\dagger$ denote low-rank approximations to $A$. We assume basic arithmetic operations take unit time, and $\tilde{O}(f) := O(f \log f)$.

THE DEQUANTIZATION MODEL

A typical QML algorithm works in the model where state preparation of input is efficient and a quantum state is output for measurement and post-processing. (Here, we assume an ideal and fault-tolerant quantum computer.) In particular, given a data point $x \in \mathbb{C}^n$ as input, we assume we can prepare copies of $|x\rangle$. For $m$ input data points as a matrix $A \in \mathbb{C}^{m \times n}$, we additionally assume efficient preparation of $|A\rangle$, to preserve relative scale. We wish to compare QML and classical ML on classical data, so state preparation usually requires access to this data and its normalization factors. This informs the classical input model for our quantum-inspired algorithms, where we assume such access, and instead of preparing states, we can prepare measurements of these states.

Definition. We have $O(T)$-time sample and query access to $x \in \mathbb{C}^n$ (notated $\mathrm{SQ}(x)$) if, in $O(T)$ time, we can query an index $i \in [n]$ for its entry $x_i$; produce an independent measurement of $|x\rangle$ in the computational basis; and query for $\|x\|$. If we can only query for an estimate of the squared norm $\bar{x} \in (1 \pm \nu)\|x\|^2$, then we denote this by $\mathrm{SQ}^\nu(x)$. For $A \in \mathbb{C}^{m \times n}$, sample and query access to $A$ (notated $\mathrm{SQ}(A)$) is $\mathrm{SQ}(A_{1,*}, \ldots, A_{m,*})$ along with $\mathrm{SQ}(\tilde{A})$ where $\tilde{A}$ is the vector of row norms, i.e. $\tilde{A}_i := \|A_{i,*}\|$.

Sample and query (SQ) access will be our classical analogue to quantum state preparation. As we noted previously [9], we should be able to assume that classical analogues can efficiently measure input states: QML algorithms shouldn’t rely on fast state preparation as the “source” of an exponential speedup. The algorithm itself should create the speedup.

For typical instantiations of state preparation oracles on classical input, we can get efficient SQ access to input. For example, given input in QRAM [15], a strong proposed generalization of classical RAM that supports state preparation, we can get log-dimension-time SQ access to input [9, Proposition 3.2]. Similarly, sparse and close-to-uniform vectors can be prepared efficiently, and correspondingly admit efficient SQ access [16]. So, in usual QML settings, SQ assumptions are easier to satisfy than state preparation assumptions.

This leads to a model based on SQ access that we codify with the informal definition of “dequantization”. We say we dequantize a quantum protocol $\mathcal{S}$: $O(T)$-time state preparation of $|\phi_1\rangle, \ldots, |\phi_c\rangle \to |\psi\rangle$ if we describe a classical algorithm of the form $\mathcal{C}_{\mathcal{S}}$: $O(T)$-time $\mathrm{SQ}(\phi_1, \ldots, \phi_c) \to \mathrm{SQ}^\nu(\psi)$ with similar guarantees to $\mathcal{S}$ up to polynomial slowdown. This is the sense in which we dequantized the quantum recommendation system in prior work [10]. In the rest of this article, we will dequantize two quantum algorithms, giving detailed sketches of the classical algorithms and leaving proofs of correctness to the supplemental material [16]. These algorithms are applications of three protocols from our previous work [9] rephrased in our access model.

NEAREST-CENTROID CLASSIFICATION

Lloyd, Mohseni, and Rebentrost’s quantum algorithm for clustering estimates the distance of a data point to the centroid of a cluster of points [13]. The paper claims [17] that this quantum algorithm gives an exponential speedup over classical algorithms. We dequantize Lloyd et al.’s quantum supervised clustering algorithm [13] with only quadratic slowdown. Though classical algorithms by Aaronson [8] and Wiebe et al. [18, Section 7] dequantize this algorithm for close-to-uniform and sparse input data, respectively, we are the first to give a general classical algorithm for this problem.

Problem 1 (Centroid distance). Suppose we are given access to $V \in \mathbb{C}^{n \times d}$ and $u \in \mathbb{C}^{d}$. Estimate $\|u - \frac{1}{n}\vec{1}V\|^2$ to $\varepsilon$ additive error with probability $\geq 1 - \delta$.

Note that we are treating vectors as rows, with $\vec{1}$ the vector of ones. Let $\bar{u} := \frac{u}{\|u\|}$ and let $\bar{V}$ be $V$, normalized so all rows have unit norm. Both classical and quantum algorithms argue about $M \in \mathbb{R}^{(n+1) \times d}$ and $w \in \mathbb{R}^{n+1}$ instead of $u$ and $V$, where

\[ M := \begin{bmatrix} \bar{u} \\ \frac{1}{\sqrt{n}} \bar{V} \end{bmatrix} \quad \text{and} \quad w := \begin{bmatrix} \|u\| & -\frac{1}{\sqrt{n}} \tilde{V} \end{bmatrix}. \]

Because $wM = u - \frac{1}{n}\vec{1}V$, we wish to estimate $\|wM\|^2 = wMM^\dagger w^\dagger$. Let $Z := \|w\|^2 = \|u\|^2 + \frac{1}{n}\|V\|_F^2$ be an “average norm” parameter appearing in our algorithms.

Theorem 2 (Quantum Nearest-Centroid [13]). Suppose that, in $O(T)$ time, we can (1) determine $\|u\|$ and $\|V\|_F$; or (2) prepare a state $|u\rangle$, $|V_1\rangle, \ldots, |V_n\rangle$, or $|\tilde{V}\rangle$. Then we can solve Problem 1 in $O(T \frac{Z}{\varepsilon} \log\frac{1}{\delta})$ time.

The quantum algorithm proceeds by constructing the states $|M\rangle$ and $|w\rangle$, then performing a swap test to get $|wM\rangle$. The swap test succeeds with probability $\frac{1}{Z} wMM^\dagger w^\dagger$, so we can run amplitude amplification to get an estimate up to $\varepsilon$ error with $O(\frac{1}{\varepsilon} \log\frac{1}{\delta})$ overhead. Dequantizing this algorithm is simply a matter of dequantizing the swap test, which is done in Algorithm 1. Here, $\mathrm{Q}(y)$ is query access to $y$, which supports querying $y$’s entries in $O(1)$ time, but not querying samples or norms.

Algorithm 1 Inner product estimation
Input: $O(T)$-time $\mathrm{SQ}^\nu(x) \in \mathbb{C}^n$, $\mathrm{Q}(y) \in \mathbb{C}^n$
Output: an estimate of $\langle x | y \rangle$
1: Let $s = 54 \frac{1}{\varepsilon^2} \log\frac{2}{\delta}$
2: Collect measurements $i_1, \ldots, i_s$ from $|x\rangle$
3: Let $z_j = x_{i_j}^\dagger y_{i_j} \frac{\|x\|^2}{|x_{i_j}|^2}$ for all $j \in [s]$   ▷ $\mathbb{E}[z_j] = \langle x | y \rangle$
4: Separate the $z_j$’s into $6 \log\frac{2}{\delta}$ buckets of size $\frac{9}{\varepsilon^2}$, and take the mean of each bucket
5: Output the (component-wise) median of the means

From a simple analysis of the random variables $z_j$, we get the following result.

Proposition 3 ([9, Proposition 4.2]). For $x, y \in \mathbb{C}^n$, given $\mathrm{SQ}^\nu(x)$ and $\mathrm{Q}(y)$, Algorithm 1 outputs an estimate of $\langle x | y \rangle$ to $(\varepsilon + \nu + \varepsilon\nu)\|x\|\|y\|$ error with probability $\geq 1 - \delta$ in time $O(\frac{T}{\varepsilon^2} \log\frac{1}{\delta})$.

For this protocol, quantum algorithms can achieve a quadratic speedup via amplitude estimation (but no more, by unstructured search lower bounds [19]). To apply this to nearest-centroid, we write $wMM^\dagger w^\dagger$ as an inner product of tensors $\langle a | b \rangle$, where

\[ |a\rangle := \sum_{i=1}^{d} \sum_{j=1}^{n+1} \sum_{k=1}^{n+1} M_{ji} \|M_{k,*}\| \, |i\rangle |j\rangle |k\rangle = M \otimes \tilde{M}; \]
\[ |b\rangle := \sum_{i=1}^{d} \sum_{j=1}^{n+1} \sum_{k=1}^{n+1} \frac{w_j^\dagger w_k M_{ki}}{\|M_{k,*}\|} \, |i\rangle |j\rangle |k\rangle. \]

Then, we show we have SQ access to one of the tensors ($a$). With this, we see that the quadratic speedup from amplitude amplification is the only speedup that quantum nearest-centroid achieves:

Theorem 4 (Classical Nearest-Centroid). Suppose we are given $O(T)$-time $\mathrm{SQ}(V) \in \mathbb{C}^{n \times d}$ and $\mathrm{SQ}(u) \in \mathbb{C}^{d}$. Then one can output a solution to Problem 1 in $O(T \frac{Z^2}{\varepsilon^2} \log\frac{1}{\delta})$ time.

PRINCIPAL COMPONENT ANALYSIS

We now dequantize Lloyd, Mohseni, and Rebentrost’s quantum principal component analysis (QPCA) algorithm [12], an influential early example of QML [20, 21]. While the paper describes a more general strategy for Hamiltonian simulation of density matrices, their central claim is an exponential speedup in an immediate application: producing quantum states corresponding to the top principal components of a low-rank dataset [12].

The setup for the problem is as follows: suppose we are given a matrix $A \in \mathbb{R}^{n \times d}$ whose rows correspond to data in a dataset. We will find the principal eigenvectors and eigenvalues of $A^\dagger A$; when $A$ is a mean zero dataset, this corresponds to the top principal components.

Problem 5 (Principal component analysis). Suppose we are given access to $A \in \mathbb{C}^{n \times d}$ with singular values $\sigma_i$ and right singular vectors $v_i$. Further suppose we are given $\sigma$, $k$, and $\eta$ with the guarantee that, for all $i \in [k]$, $\sigma_i \geq \sigma$ and $\sigma_i^2 - \sigma_{i+1}^2 \geq \eta \|A\|_F^2$. With probability $\geq 1 - \delta$, output estimates $\hat{\sigma}_1^2, \ldots, \hat{\sigma}_k^2$ and $\hat{v}_1, \ldots, \hat{v}_k$ satisfying $|\hat{\sigma}_i^2 - \sigma_i^2| \leq \varepsilon_\sigma \|A\|_F^2$ and $\|\hat{v}_i - v_i\| \leq \varepsilon_v$ for all $i \in [k]$.

Denote $\|A\|_F^2 / \sigma^2$ by $K$. Lloyd et al. get the following:

Theorem 6. Given $\|A\|_F$ and the ability to prepare copies of $|A\rangle$ in $O(T)$ time, a quantum algorithm can output the desired estimates for Problem 5, $\hat{\sigma}_1^2, \ldots, \hat{\sigma}_k^2$ and $|\hat{v}_1\rangle, \ldots, |\hat{v}_k\rangle$, in $\tilde{O}(T K \min(\varepsilon_\sigma, \delta)^{-3})$ time.

Later results [10, Theorem 5.2; 22, Theorem 27] improve the runtime here to $\tilde{O}(T K \varepsilon_\sigma^{-1} \operatorname{polylog}(nd/\delta))$ when $A$ is given in QRAM. We will compare to the original QPCA result.

To dequantize QPCA, we use a similar high-level idea to that of the quantum-inspired recommendation system [9]. We begin by using a low-rank approximation algorithm, Algorithm 2, to output a description of approximate top singular values and vectors.
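Both dequantized algorithms in this article rest on sampling-based estimators, so before the PCA construction it may help to see Algorithm 1 (inner product estimation) in executable form. The following is a minimal NumPy sketch under stated assumptions, not the implementation from [9]: the function name `estimate_inner_product` is hypothetical, `x` and `y` are held as dense arrays, and drawing indices with probability $|x_i|^2/\|x\|^2$ stands in for $\mathrm{SQ}(x)$ (with exact norm, i.e. $\nu = 0$), while reading `y[i]` stands in for $\mathrm{Q}(y)$.

```python
import numpy as np

def estimate_inner_product(x, y, eps, delta, rng):
    """Median-of-means estimate of <x|y> = sum_i conj(x_i) * y_i (Algorithm 1).

    Assumption: dense arrays simulate the access model. Sampling an index
    with probability |x_i|^2 / ||x||^2 plays the role of SQ(x); reading
    y[i] plays the role of Q(y).
    """
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    norm_sq = np.vdot(x, x).real                      # norm query of SQ(x)
    probs = np.abs(x) ** 2 / norm_sq                  # measurement distribution of |x>
    n_buckets = int(np.ceil(6 * np.log(2 / delta)))   # step 4: 6 log(2/delta) buckets
    bucket_size = int(np.ceil(9 / eps ** 2))          # step 4: each of size 9/eps^2
    # Step 2: draw all measurements at once, one row per bucket.
    idx = rng.choice(x.size, size=(n_buckets, bucket_size), p=probs)
    # Step 3: z_j = conj(x_i) * y_i * ||x||^2 / |x_i|^2, so E[z_j] = <x|y>.
    z = np.conj(x[idx]) * y[idx] * norm_sq / np.abs(x[idx]) ** 2
    means = z.mean(axis=1)
    # Step 5: component-wise median over real and imaginary parts.
    return np.median(means.real) + 1j * np.median(means.imag)
```

The number of samples drawn depends only on $\varepsilon$ and $\delta$, not on the dimension $n$, which is the point of Proposition 3; the dense arrays here are only a convenience for simulating the access model.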

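The sampling steps of Algorithm 2 can likewise be sketched in a few lines. This is an illustrative NumPy sketch under stated assumptions, not the authors' code: the function name `low_rank_sketch` is hypothetical, a dense real matrix `A` stands in for $\mathrm{SQ}(A)$, and the sample count `q` is passed in directly rather than set to $\Theta(\frac{K^4}{\varepsilon^2}\log\frac{1}{\delta})$.

```python
import numpy as np

def low_rank_sketch(A, sigma, q, rng):
    """Sampling-based low-rank approximation in the style of Algorithm 2.

    Assumption: A is a dense real array simulating SQ(A). Returns S, the
    left singular vectors of W whose singular values exceed sigma, and
    those singular values.
    """
    A = np.asarray(A, dtype=float)
    fro_sq = np.sum(A ** 2)
    row_norms_sq = np.sum(A ** 2, axis=1)
    # Step 2: sample q rows with probability ||A_i||^2 / ||A||_F^2 and
    # rescale each to norm ||A||_F / sqrt(q).
    rows = rng.choice(A.shape[0], size=q, p=row_norms_sq / fro_sq)
    S = A[rows] * (np.sqrt(fro_sq) / np.sqrt(q * row_norms_sq[rows]))[:, None]
    # Step 3: F picks a uniform row r of S, then a column from that row.
    # All rows of S have squared norm ||A||_F^2 / q, so F(j) = ||S_{*,j}||^2 / ||A||_F^2.
    F = np.sum(S ** 2, axis=0) / fro_sq
    F = F / F.sum()                      # guard against floating-point drift
    cols = rng.choice(A.shape[1], size=q, p=F)
    # Step 4: normalized q-by-q submatrix W.
    W = S[:, cols] / np.sqrt(q * F[cols])
    # Step 5: keep left singular vectors with singular value larger than sigma.
    U, svals, _ = np.linalg.svd(W, full_matrices=False)
    keep = svals > sigma
    return S, U[:, keep], svals[keep]
```

In this sketch the approximate right singular vectors are never formed inside the function; they are available implicitly as `S.T @ U_hat[:, i] / sigma_hat[i]`, mirroring the implicit description $\hat{v}_i = S^\dagger \hat{U}_{*,i}/\hat{\sigma}_i$ used in the text.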
Algorithm 2 finds the large singular vectors of $A$ by reducing its dimension down to $W$, whose singular value decomposition we can compute quickly. Then, $S, \hat{U}, \hat{\Sigma}$ define approximate large singular vectors $\hat{V} := S^\dagger \hat{U} \hat{\Sigma}^{-1}$. The full set of guarantees on the output of Algorithm 2 are in the supplemental material [16], but in brief, for the right setting of parameters, the columns of $\hat{V}$ and the diagonal entries of $\hat{\Sigma}$ satisfy the desired constraints for our $\hat{v}_i$’s and $\hat{\sigma}_i$’s in Problem 5. The $\hat{\sigma}_i$’s are output explicitly, but the $\hat{v}_i$’s are described implicitly: $\hat{v}_i = S^\dagger \hat{U}_{*,i} / \hat{\sigma}_i$. We have $O(T)$-time $\mathrm{SQ}(S)$ because all rows are normalized, and rows of $S$ are simply rows of $A$. Thus, sampling from $\tilde{S}$ is a uniform sample from $[q]$ and sampling from $S_{i,*}$ is sampling from a row of $A$. $\hat{U}_{*,i}$ is an explicit vector, so in essence, we need SQ access to a linear combination of vectors, each of which we have access to.

Algorithm 2 Low-rank approximation [23]
Input: $O(T)$-time $\mathrm{SQ}(A) \in \mathbb{R}^{m \times n}$, $\sigma$, $\varepsilon$, $\delta$
Output: $\mathrm{SQ}(S) \in \mathbb{C}^{q \times n}$, $\mathrm{Q}(\hat{U}) \in \mathbb{C}^{q \times \ell}$, $\mathrm{Q}(\hat{\Sigma}) \in \mathbb{C}^{\ell \times \ell}$
1: Set $K = \|A\|_F^2 / \sigma^2$ and $q = \Theta(\frac{K^4}{\varepsilon^2} \log\frac{1}{\delta})$
2: Sample rows $i_1, \ldots, i_q$ from $\tilde{A}$ and define $S \in \mathbb{R}^{q \times n}$ such that $S_{r,*} := A_{i_r,*} \frac{\|A\|_F}{\sqrt{q}\,\|A_{i_r,*}\|}$
3: Sample columns $j_1, \ldots, j_q$ from $\mathcal{F}$, where $\mathcal{F}$ denotes the distribution given by sampling a uniform $r \sim [q]$, then sampling $c$ from $S_{r,*}$
4: Let $W \in \mathbb{C}^{q \times q}$ be the normalized submatrix $W_{*,c} := \frac{S_{*,j_c}}{\sqrt{q \mathcal{F}(j_c)}}$
5: Compute the left singular vectors of $W$, $\hat{u}^{(1)}, \ldots, \hat{u}^{(\ell)}$, that correspond to singular values $\hat{\sigma}^{(1)}, \ldots, \hat{\sigma}^{(\ell)}$ larger than $\sigma$
6: Output $\mathrm{SQ}(S)$, $\hat{U} \in \mathbb{R}^{q \times \ell}$ the matrix with columns $\hat{u}^{(i)}$, and $\hat{\Sigma} \in \mathbb{R}^{\ell \times \ell}$ the diagonal matrix with entries $\hat{\sigma}^{(i)}$

Algorithm 3 does exactly this: it uses rejection sampling to dequantize the swap test over a subset of qubits (getting $|Vw\rangle$ via $|V\rangle(\langle w| \otimes I)$).

Algorithm 3 Matrix-vector SQ access
Input: $O(T)$-time $\mathrm{SQ}(V^\dagger) \in \mathbb{C}^{k \times n}$, $\mathrm{Q}(w) \in \mathbb{C}^{k}$
Output: $\mathrm{SQ}^\nu(Vw)$
1: function RejectionSample($\mathrm{SQ}(V^\dagger)$, $\mathrm{Q}(w)$)
2:   Sample $i \in [k]$ proportional to $|w_i|^2 \|V_{*,i}\|^2$ by manually calculating all $k$ probabilities
3:   Sample $s \in [n]$ from $V_{*,i}$ using $\mathrm{SQ}(V^\dagger)$
4:   Compute $r_s = (Vw)_s^2 / (k \sum_{j=1}^{k} (V_{sj} w_j)^2)$ (after querying for $w_j$ and $V_{sj}$ for all $j \in [k]$)
5:   Output $s$ with probability $r_s$ (success); otherwise, output $\emptyset$ (failure)
6: end function
7: Query: output $(Vw)_s$
8: Sample: run RejectionSample until success (outputting $s$) or $k\,C(V, w) \log\frac{1}{\delta}$ failures (outputting $\emptyset$)
9: Norm($\nu$): Let $p$ be the fraction of successes from running RejectionSample $\frac{k}{\nu^2} C(V, w) \log\frac{1}{\delta}$ times; output $p k \sum_{i=1}^{k} |w_i|^2 \|V_{*,i}\|^2$

Proposition 7 ([9, Proposition 4.3]). For $V \in \mathbb{C}^{n \times k}$, $w \in \mathbb{C}^{k}$, given $\mathrm{SQ}(V^\dagger)$ and $\mathrm{Q}(w)$, Algorithm 3 simulates $\mathrm{SQ}^\nu(Vw)$ where the time to query is $O(Tk)$, sample is $O(T k^2 C(V, w) \log\frac{1}{\delta})$, and query norm is $O(T k^2 C(V, w) \frac{1}{\nu^2} \log\frac{1}{\delta})$. Here, $\delta$ is the desired failure probability and $C(V, w) = \sum_i \|w_i V_{*,i}\|^2 / \|Vw\|^2$.

In general, $C(V, w)$ may be arbitrarily large, but in this application it is $O(K)$. Quantum algorithms achieve a speedup here when $k$ is large and $C(V, w)$ is small, such as when $V$ is a high-dimensional unitary, confirming our intuition that unitary operations are hard to simulate classically.

Altogether, we get our desired result.

Theorem 8. Given $O(T)$-time $\mathrm{SQ}(A) \in \mathbb{C}^{n \times d}$, with $\varepsilon_\sigma, \varepsilon_v, \delta \in (0, 0.01)$, there is an algorithm that outputs the desired estimates for Problem 5, $\hat{\sigma}_1, \ldots, \hat{\sigma}_k$ and $O(T \frac{K^9}{\varepsilon^4} \log^3(\frac{k}{\delta}))$-time $\mathrm{SQ}^{0.01}(\hat{v}_1, \ldots, \hat{v}_k)$, in $O(\frac{K^{12}}{\varepsilon^6} \log^3(\frac{k}{\delta}) + T \frac{K^8}{\varepsilon^4} \log^2(\frac{k}{\delta}))$ time, where $\varepsilon = \min(0.1\,\varepsilon_\sigma K^{1.5},\ \varepsilon_v^2 \eta,\ \frac{1}{4} K^{-1/2})$.

Under the non-degeneracy condition $\eta \leq \frac{1}{4} K^{-1/2}$, this runtime is $\tilde{O}(T \frac{K^{12}}{\varepsilon_\sigma^6 \varepsilon_v^{12}} \log^3(\frac{1}{\delta}))$. While the classical runtime depends on $\varepsilon_v$, note that a quantum algorithm must also incur this error term to learn about $v_i$ from copies of $|v_i\rangle$. For example, computing entries or expectations of observables of $v_i$ given copies of $|v_i\rangle$ requires $\operatorname{poly}(\frac{1}{\varepsilon_v})$ or $\operatorname{poly}(n)$ time.

DISCUSSION

We have introduced the SQ access assumption as a classical analogue to the QML state preparation assumption and demonstrated two examples where, in this classical model, we can dequantize QML algorithms with ease. We now discuss the implications of this work with respect to related literature.

A natural question is of this work’s relation to classical literature: does this work improve on classical algorithms for linear algebra in any regime? The answer may be no, for a subtle but fundamental reason: recall that our main idea is to introduce an input model strong enough to give classical versions of QML while being weak enough to extend to settings like QRAM, where classical computers can only access the input in very limited ways. In particular, the SQ access model that we study is weaker than the typical input model used for classical sketching algorithms [24–26]. $O(T)$-time algorithms in the quantum-inspired access model are $\tilde{O}(\mathrm{nnz} + T)$-time algorithms in the usual RAM model (where nnz is the number of nonzero entries of the input), but not vice versa: typical sketching algorithms can exploit better data structures provided they only take $O(\mathrm{nnz})$ time (e.g. oblivious sketches), whereas the quantum-inspired model can only use the QRAM data structure. The crucial insight of this work is that some algorithms (such as Algorithm 2 of Frieze et al. [23]) generalize to the weaker quantum-inspired model. Our algorithms give exponential speedups in the quantum-inspired setting, but since the model is weaker, one might expect that they perform worse in typical settings for classical computation (see [27]). These model considerations also explain why we use Frieze et al. [23]: to our knowledge, this algorithm is the only one from the classical literature that naturally generalizes to the SQ input model.

The closest analogue to these results and techniques is a work by Van den Nest on probabilistic quantum simulation [28], which describes a notion of “computationally tractable” (CT) states that corresponds to our notion of SQ access for vectors. With this notion, the author describes special types of circuits on CT states where weak simulation is possible, using variants of Propositions 3 and 7. However, Van den Nest’s work does not have a version of Algorithm 2, since this technique only runs quickly on low-rank matrices, making it ineffective on generic quantum circuits. We exploit this low-rank structure for efficient quantum simulation of a small-but-practically-relevant class of circuits: quantum linear algebra on data with low-rank structure. So, our techniques used for supervised clustering are within the scope of Van den Nest’s work, whereas our techniques for PCA are new to this line of work.

These techniques are not new to quantum simulation in general. Others have considered applying randomized numerical linear algebra to quantum simulation [29], but have not made the connection towards dequantizing quantum algorithms, especially in large generality. Low-rank approximation is crucial for tensor network simulations of quantum systems [30, 31], where simulation can be done efficiently provided the input is, say, a matrix product state with low tensor rank. In this context, low-rank approximation is often performed exactly and only on a subset of the space, instead of approximately done on the full state, as is done here. This reflects the fact that tensor network algorithms assume that the system is reasonably approximated by a tensor network and aim to work well in practice, whereas our “dequantized” algorithms must work on a broader class of input and prioritize provable guarantees in an abstract computational model over real-world performance. Nevertheless, some of these dequantized algorithms might be able to be matched by tensor network contraction techniques, when the input has low tensor rank. See the supplemental material for further discussion of this comparison [16].

Since this work, numerous follow-ups have cemented the significance of the SQ access model introduced here [32–35]. In particular, a recent work [34] essentially dequantizes the singular value transformation framework of Gilyén et al. [36] when input is given in QRAM. These works use fundamentally the same techniques to dequantize a wide swathe of low-rank quantum machine learning, an exciting step forward in understanding QML.

Thanks to Ronald de Wolf for giving the initial idea to look at QPCA. Thanks to Nathan Wiebe for helpful comments on this document. Thanks to Daniel Liang and Patrick Rall for their help fleshing out these ideas and reviewing a draft of this document. Thanks to Scott Aaronson for helpful discussions. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1762114.

∗ [email protected]; ewintang.com

[1] A. W. Harrow, A. Hassidim, and S. Lloyd, Quantum algorithm for linear systems of equations, Physical Review Letters 103, 150502 (2009).
[2] P. Rebentrost, M. Mohseni, and S. Lloyd, Quantum support vector machine for big data classification, Physical Review Letters 113, 130503 (2014).
[3] N. Wiebe, D. Braun, and S. Lloyd, Quantum algorithm for data fitting, Physical Review Letters 109, 050505 (2012).
[4] S. Lloyd, S. Garnerone, and P. Zanardi, Quantum algorithms for topological and geometric analysis of data, Nature Communications 7, 10138 (2016).
[5] Z. Zhao, J. K. Fitzsimons, and J. F. Fitzsimons, Quantum-assisted Gaussian process regression, Physical Review A 99, 052331 (2019), arXiv:1512.03929 [quant-ph].
[6] F. G. Brandao and K. M. Svore, Quantum speed-ups for solving semidefinite programs, in 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) (IEEE, 2017).
[7] A. M. Childs, Equation solving by simulation, Nature Physics 5, 861 (2009).
[8] S. Aaronson, Read the fine print, Nature Physics 11, 291 (2015).
[9] E. Tang, A quantum-inspired classical algorithm for recommendation systems, in Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing - STOC 2019 (ACM Press, 2019) arXiv:1807.04271 [cs.IR].
[10] I. Kerenidis and A. Prakash, Quantum recommendation systems, in 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), LIPIcs, Vol. 67 (Schloss Dagstuhl, 2017) pp. 49:1–49:21.
[11] J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018).
[12] S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum principal component analysis, Nature Physics 10, 631 (2014).
[13] S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum algorithms for supervised and unsupervised machine learning, arXiv (2013), arXiv:1307.0411 [quant-ph].
[14] M. Kieferová and N. Wiebe, Tomography and generative training with quantum boltzmann machines, Phys. Rev. A 96, 062327 (2017).
[15] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum random access memory, Physical Review Letters 100, 160501 (2008).
[16] See supplemental material for details on when sample and query access is possible; discussion on the relation of this work to the tensor networks literature; and full proofs for the results stated here. It includes Refs. [37–47].
[17] We were not able to verify the quantum algorithm (namely, the Hamiltonian simulation for preparing $|\phi\rangle$) as stated. For our purposes, we can make the minor additional assumption of efficient state preparation access to $|\phi\rangle$, which makes correctness obvious. When we refer to the quantum algorithm in this letter, we mean this version of it.
[18] N. Wiebe, A. Kapoor, and K. M. Svore, Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning, Quantum Information and Computation 15, 316–356 (2015).
[19] C. H. Bennett, E. Bernstein, G. Brassard, and U. Vazirani, Strengths and weaknesses of quantum computing, SIAM Journal on Computing 26, 1510 (1997).
[20] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).
[21] C. Ciliberto, M. Herbster, A. D. Ialongo, M. Pontil, A. Rocchetto, S. Severini, and L. Wossnig, Quantum machine learning: a classical perspective, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 474, 20170551 (2018).
[22] S. Chakraborty, A. Gilyén, and S. Jeffery, The power of block-encoded matrix powers: improved regression techniques via faster Hamiltonian simulation, in 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), LIPIcs (Schloss Dagstuhl, 2019) arXiv:1804.01973 [quant-ph].
[23] A. Frieze, R. Kannan, and S. Vempala, Fast Monte-Carlo algorithms for finding low-rank approximations, Journal of the ACM (JACM) 51, 1025 (2004).
[24] M. W. Mahoney, Randomized algorithms for matrices and data, Foundations and Trends® in Machine Learning 3, 123 (2011).
[25] D. P. Woodruff, Sketching as a tool for numerical linear algebra, Foundations and Trends® in Theoretical Computer Science 10, 10.1561/0400000060 (2014).
[26] R. Kannan and S. Vempala, Randomized algorithms in numerical linear algebra, Acta Numerica 26, 95 (2017).
[27] J. M. Arrazola, A. Delgado, B. R. Bardhan, and S. Lloyd, Quantum-inspired algorithms in practice, Quantum 10.22331/q-2020-08-13-307 (2020), arXiv:1905.10415 [quant-ph].
[28] M. Van Den Nest, Simulating quantum computers with probabilistic methods, Quantum Info. Comput. 11 (2011).
[29] A. Rudi, L. Wossnig, C. Ciliberto, A. Rocchetto, M. Pontil, and S. Severini, Approximating Hamiltonian dynamics with the Nyström method, Quantum 4, 234 (2020), arXiv:1804.02484 [quant-ph].
[30] U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Annals of Physics 326, 96 (2011).
[31] R. Orús, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Annals of Physics 349, 117 (2014).
[32] N.-H. Chia, A. Gilyén, H.-H. Lin, S. Lloyd, E. Tang, and C. Wang, Quantum-Inspired Algorithms for Solving Low-Rank Linear Equation Systems with Logarithmic Dependence on the Dimension, in 31st International Symposium on Algorithms and Computation (ISAAC 2020), LIPIcs (Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2020).
[33] N.-H. Chia, T. Li, H.-H. Lin, and C. Wang, Quantum-Inspired Sublinear Algorithm for Solving Low-Rank Semidefinite Programming, in 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020), LIPIcs (Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2020) arXiv:1901.03254 [cs.DS].
[34] N.-H. Chia, A. Gilyén, T. Li, H.-H. Lin, E. Tang, and C. Wang, Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning, in Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing - STOC 2020 (ACM Press, 2020) arXiv:1910.06151 [cs.DS].
[35] D. Jethwani, F. L. Gall, and S. K. Singh, Quantum-Inspired Classical Algorithms for Singular Value Transformation, in 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020), LIPIcs (Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2020) arXiv:1910.05699 [cs.DS].
[36] A. Gilyén, Y. Su, G. H. Low, and N. Wiebe, Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics, in Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing - STOC 2019 (ACM Press, 2019) arXiv:1806.01838 [quant-ph].
[37] A. Prakash, Quantum algorithms for linear algebra and machine learning, Ph.D. thesis, UC Berkeley (2014).
[38] L. Grover and T. Rudolph, Creating superpositions that correspond to efficiently integrable probability distributions, arXiv (2002), arXiv:0208112 [quant-ph].
[39] F. Verstraete and J. I. Cirac, Matrix product states represent ground states faithfully, Phys. Rev. B 73, 094423 (2006).
[40] S. R. White, Density matrix formulation for quantum renormalization groups, Phys. Rev. Lett. 69, 2863 (1992).
[41] G. Vidal, Efficient classical simulation of slightly entangled quantum computations, Phys. Rev. Lett. 91, 147902 (2003).
[42] G. Vidal, Efficient simulation of one-dimensional quantum many-body systems, Phys. Rev. Lett. 93, 040502 (2004).
[43] A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, and D. P. Mandic, Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions, Found. Trends Mach. Learn. 9, 249–429 (2016).
[44] J. C. Bridgeman and C. T. Chubb, Hand-waving and interpretive dance: an introductory course on tensor networks, Journal of Physics A: Mathematical and Theoretical 50, 223001 (2017).
[45] J. Eisert, Computational difficulty of global variations in the density matrix renormalization group, Phys. Rev. Lett. 97, 260501 (2006).
[46] Z. Landau, U. Vazirani, and T. Vidick, A polynomial time algorithm for the ground state of one-dimensional gapped local hamiltonians, Nature Physics 11, 566 (2015).
[47] W. Hoeffding, Probability inequalities for sums of bounded random variables, in The Collected Works of Wassily Hoeffding, edited by N. I. Fisher and P. K. Sen (Springer New York, New York, NY, 1994) pp. 409–426.

Supplemental Material for “Quantum principal component analysis only achieves an exponential speedup because of its state preparation assumptions”

Ewin Tang∗ University of Washington (Dated: August 10, 2021)

arXiv:1811.00414v3 [cs.DS] 6 Aug 2021

MOTIVATING SAMPLE AND QUERY ACCESS

In this section, we demonstrate how typical input models where efficient state preparation oracles are possible also admit efficient sample and query access to input. Throughout, we will try to be precise about time complexities, but note that for classical algorithms we use the word RAM model of classical computation, which suppresses some factors (e.g. $\log(n)$ for reading indices) that are not conventionally suppressed for quantum circuit complexity.

For all these settings, the input vector $x \in \mathbb{C}^n$ is given classically, meaning that there is some way to efficiently compute $x_i$ for all $i \in [n]$. This is typical for the QML literature, and makes sense practically in the context of including QML into a classical ML pipeline. However, we note that this assumption is crucial for our techniques, and when it is no longer satisfied (say, the input comes from a quantum system), we cannot get SQ access to input.

QRAM. Quantum random access memory is a proposal to implement state preparation of an $n$-dimensional quantum state in time polynomial in $\log n$ through the use of a clever data structure and parallelization [1–4]. If the same input is reused or dynamically updated [5], using QRAM for QML can amortize the cost of data loading, removing the input bottleneck of these algorithms in certain cost models. In our previous work [6, 3.1, 3.2], we noted that this gives sample and query access to input as well.

Grover-Rudolph state preparation. Suppose we have an efficient way to apply an integration function on the state $x \in \mathbb{C}^n$ we wish to prepare: $I(s, t) = \sum_{i=s}^{t} |x_i|^2$. Then, Grover-Rudolph state preparation [7] can perform the rotations done in a QRAM protocol without issues with parallelization, only assuming the ability to query entries of $x$ in superposition. So, one can prepare $|x\rangle$ with $O(\log(n))$ applications of the integration oracle.

Given an efficient integration oracle, we can also gain sample and query access to $x$: because all of the nodes of a QRAM can be expressed as an integral of $x$ over some interval, we can simply perform that sample and query access protocol, replacing calls to the data structure with calls to the integration oracle. If integrating takes $O(T)$ time, this gives $O(T \log n)$-time $\mathrm{SQ}(x)$, the equivalent result to the quantum setting.

Sparsity. When $x \in \mathbb{C}^n$ has only $s$ nonzero entries, and we know their locations, we can prepare $|x\rangle$ in $O(s \log n)$ gates. We can also get $\mathrm{SQ}(x)$ by querying all of the nonzero entries to compute the corresponding distribution for sampling and norm in $O(s)$ time.

Uniformity. When entries of $x$ are close to uniform, the corresponding state can be prepared given the ability to query $x$ in superposition. Classically, samples and norm estimates can be found quickly using rejection sampling: sample $i \in [n]$ uniformly at random, and choose it with probability $\frac{n|x_i|^2}{C\|x\|^2}$, where $C$ is chosen so that $C \geq \max_{i \in [n]} \frac{n|x_i|^2}{\|x\|^2}$; otherwise, restart.

Generic state preparation. Finally, we can generically convert a state preparation procedure to sample and query access: given input vectors $x \in \mathbb{C}^n$ classically, norms $\|x\|$, and a quantum procedure to create copies of $|x\rangle$, we can get $\mathrm{SQ}(x)$ by measuring $|x\rangle$ in the computational basis. This could apply to cases where preparing $|x\rangle$ is feasible, but a full QRAM algorithm is impractical due to the space and time overhead inherent to current QRAM proposals (e.g. from storing magic states or error-correcting).

RELATION TO TENSOR NETWORKS

Since this work can be seen as a form of quantum simulation, it makes sense to compare and contrast the algorithms presented here to tensor network algorithms. Although the most direct analogue of our work, van den Nest's paper [8], does not use low-rank approximation, it is a typical subroutine in the tensor networks context.

As a reminder, tensor network algorithms work by using tensor network representations of quantum states, such as matrix product states (MPS) or projected entangled pair states (PEPS). For a small-but-practically-relevant set of quantum states, such as the ground states of one-dimensional quantum spin systems, this tensor network representation is highly space-efficient [9, 10]. Since this representation is also fairly time-efficient, particularly excelling for 1D tensor networks like MPS, finding and analyzing these ground states through techniques like the density-matrix renormalization group method (DMRG) [11, 12] and time-evolving block decimation (TEBD) [13, 14] is surprisingly tractable. Specifically, in this 1D case, many tensor network manipulations can be done in time polynomial in the bond dimension of the tensor network (sometimes called the tensor rank) [15]. These tensor network algorithms still work even when the states being studied cannot be exactly represented with a small tensor network, provided they can still be closely approximated by a tensor network of the desired shape. This approximation is typically done through iterative low-rank approximation on local pieces of the state, as is done in the aforementioned DMRG and TEBD algorithms [12, 16]. This technique is heuristic, though: since NP-hard problems can be encoded into the ground states of local Hamiltonians, algorithms that find MPS approximations to ground states must perform poorly on some inputs [17].

We can observe that tensor network algorithms, at least naively, do not suffice to recover the results described here. First, our algorithms require far weaker assumptions: instead of assuming the input state can be represented as an MPS with low tensor rank, we only assume that the input mixed state has a density matrix with low linear algebraic rank. For example, the results presented here support simulating QPCA applied to arbitrary pure states (though this is true for trivial reasons, since the only principal component of a pure state is the pure state itself) and probability distributions over a constant number of pure states. A tensor network algorithm might run into issues representing this pure state, since without any other restrictions, one needs an exponential number of parameters to specify it. Of course, one could perform repeated low-rank approximation to an input state to force it into, say, an MPS, but because this is done without considering the structure of the input state, this would likely incur large error. Our model bypasses these issues of representation by assuming an efficient representation of the input already exists in the form of SQ access: our algorithm works by observing that a representation for the output of QPCA can be found efficiently classically if it's allowed to depend on the input's representation. With our model abstracting away representation details, these classical analogues can run in time independent of any notion of bond dimension.

In summary: our algorithms can be run efficiently on the set of low-rank states, and this set is much larger than the set of states that are efficiently represented by tensor networks. This is not because the classical algorithms are particularly good compared to state-of-the-art tensor network methods, but because the SQ model abstracts away the difficulties that arise from representing states. One might be able to use the tensor network literature to prove that, say, QPCA on matrix product states can be simulated classically, but these results work in a far more general regime. This is the main advantage of the SQ model: the wide generality in which it applies.

Though tensor network algorithms and the dequantized algorithms presented here have drastically different goals, we can still compare how the differences in their desiderata affect their application and analysis of low-rank approximation. Because tensor network algorithms perform repeated low-rank approximation on different parts of the space, the iterative algorithm that results generally has no provable guarantees, especially on general inputs, but performs exceptionally well in practice for the settings where this low-rank approximation makes sense physically [18]. The algorithms presented here are not intended to be used in practice, since (as discussed in the main text) we developed them with the intention of being barriers for asymptotic speedup for quantum machine learning algorithms. Instead, we want provable guarantees, which our algorithm for QPCA achieves by performing low-rank approximation only once, which produces low error for precisely the class of input density matrices under consideration.

DETAILS FOR NEAREST-CENTROID CLASSIFICATION

Now, we give proofs to propositions and theorems stated in the main text, beginning with the dequantization of nearest-centroid classification. Several of these results are versions of protocols from prior work [6]; we provide proofs for completeness, but encourage readers to look there for more details.

Proof of Proposition 3

Algorithm 1 Inner product estimation
Input: $O(T)$-time $\mathrm{SQ}^{\nu}(x) \in \mathbb{C}^n$, $\mathrm{Q}(y) \in \mathbb{C}^n$
Output: an estimate of $\langle x|y\rangle$
1: Let $s = 54 \frac{1}{\varepsilon^2} \log\frac{2}{\delta}$
2: Collect measurements $i_1, \ldots, i_s$ from $|x\rangle$
3: Let $z_j = x_{i_j}^{\dagger} y_{i_j} \frac{\|x\|^2}{|x_{i_j}|^2}$ for all $j \in [s]$ ($\mathbb{E}[z_j] = \langle x|y\rangle$)
4: Separate the $z_j$'s into $6 \log\frac{2}{\delta}$ buckets of size $\frac{9}{\varepsilon^2}$, and take the mean of each bucket
5: Output the (component-wise) median of the means

Proposition 3 ([6, Proposition 4.2]). For $x, y \in \mathbb{C}^n$, given $\mathrm{SQ}^{\nu}(x)$ and $\mathrm{Q}(y)$, Algorithm 1 outputs an estimate of $\langle x|y\rangle$ to $(\varepsilon + \nu + \varepsilon\nu)\|x\|\|y\|$ error with probability $\geq 1 - \delta$ in time $O(\frac{T}{\varepsilon^2} \log\frac{1}{\delta})$.

Proof. We analyze Algorithm 1. Computing the $z_i$'s, means, and medians all take linear time, so the runtime is dominated by the collection of samples, Line 2, as desired.

As for correctness, first consider the case that $\nu = 0$. Then the $z_j$'s satisfy that $\mathbb{E}[z_j] = \langle x|y\rangle$ and $\mathrm{Var}[z_j] \leq \|x\|^2\|y\|^2$. Taking the mean of $\frac{9}{\varepsilon^2}$ copies of $z_j$ reduces the variance to $\varepsilon^2\|x\|^2\|y\|^2/9$, so by Chebyshev's inequality, the probability that the mean is $\frac{\varepsilon}{\sqrt{2}}\|x\|\|y\|$-far from $\langle x|y\rangle$ is $\leq \frac{2}{9}$.

Next, we treat the real and imaginary components of the means separately, and take the median of the means component-wise. To show that this median of means satisfies the desired error bound, notice that if half of

the $6 \log\frac{2}{\delta}$ means are within $\frac{\varepsilon}{\sqrt{2}}\|x\|\|y\|$ of $\langle x|y\rangle$ in the real axis, then the median of these $6 \log\frac{2}{\delta}$ means is also within $\frac{\varepsilon}{\sqrt{2}}\|x\|\|y\|$ of $\langle x|y\rangle$ in the real axis. Let $E_i$, for $i \in [6 \log\frac{2}{\delta}]$, be the random variable that is 0 if the $i$th mean is within $\frac{\varepsilon}{\sqrt{2}}\|x\|\|y\|$ of $\langle x|y\rangle$ in the real axis, and 1 otherwise. Further, let $Z_i$, for $i \in [6 \log\frac{2}{\delta}]$, be independent random Bernoulli variables that are 1 with probability $\frac{2}{9}$ and 0 otherwise. We previously established that $\Pr[E_i = 1] \leq \frac{2}{9}$, so by a Hoeffding-Chernoff bound [19],

$$\Pr\Big[\sum_{i=1}^{6 \log\frac{2}{\delta}} E_i \geq 3 \log\tfrac{2}{\delta}\Big] \leq \Pr\Big[\sum_{i=1}^{6 \log\frac{2}{\delta}} Z_i \geq 3 \log\tfrac{2}{\delta}\Big] \leq \Big(\Big(\frac{2/9}{1/2}\Big)^{1/2}\Big(\frac{7/9}{1/2}\Big)^{1/2}\Big)^{6 \log(2/\delta)} \leq \frac{\delta}{2}.$$

We get the same bound for the imaginary component, and combine the two to get the desired correctness property.

When $\nu > 0$, we cannot query for $\|x\|^2$ exactly, only for an estimate $\bar{x} \in (1 \pm \nu)\|x\|^2$. Consider the output of Algorithm 1, where we replace the $\|x\|^2$ with $\bar{x}$, and call this output $\alpha$. The output satisfies $|\frac{\|x\|^2}{\bar{x}}\alpha - \langle x|y\rangle| \leq \varepsilon\|x\|\|y\|$ with probability $\geq 1 - \delta$. Re-arranging, we can achieve the bound

$$|\alpha - \langle x|y\rangle| \leq \Big|\alpha - \frac{\bar{x}}{\|x\|^2}\langle x|y\rangle\Big| + \Big|\Big(\frac{\bar{x}}{\|x\|^2} - 1\Big)\langle x|y\rangle\Big| \leq (1 + \nu)\varepsilon\|x\|\|y\| + \nu|\langle x|y\rangle| \leq ((1 + \nu)\varepsilon + \nu)\|x\|\|y\|.$$

Note that having $\mathrm{SQ}(y)$ instead of $\mathrm{Q}(y)$ improves the scaling by no more than constant factors.

We asserted in the main text that, for this protocol, quantum algorithms can achieve a quadratic speedup via amplitude estimation but no more. This holds because a quantum algorithm for inner product estimation scaling better than $\frac{1}{\varepsilon}$ would imply the ability to detect the existence of a marked item with faster than $\sqrt{N}$ oracle queries, contradicting unstructured search lower bounds [20].

Proof of Theorem 4

We first recall the problem we wish to solve and present the algorithm that solves it.

Problem 1 (Centroid distance). Suppose we are given access to $V \in \mathbb{C}^{n \times d}$ and $u \in \mathbb{C}^d$. Estimate $\|u - \frac{1}{n}\vec{1}V\|^2$ to $\varepsilon$ additive error with probability $\geq 1 - \delta$.

Algorithm 4 Classical supervised clustering
Input: $O(T)$-time $\mathrm{SQ}(V_1, \ldots, V_n, u)$
Output: An estimate of $\lambda = \|u - \frac{1}{n}\vec{1}V\|^2$
1: Achieve $O(T)$-time $\mathrm{SQ}(a)$, $\mathrm{Q}(b)$ for $a, b$ as in (1), (2)
2: Run Algorithm 1 to estimate $\langle a|b\rangle$ to $\varepsilon$ error and output the result

Theorem 4 (Classical Nearest-Centroid). Suppose we are given $O(T)$-time $\mathrm{SQ}(V) \in \mathbb{C}^{n \times d}$ and $\mathrm{SQ}(u) \in \mathbb{C}^d$. Then Algorithm 4 outputs a solution to Problem 1 in $O(T \frac{Z^2}{\varepsilon^2} \log\frac{1}{\delta})$ time.

Proof. Recall from the main text that the quantity we wish to estimate can be written as $wMM^{\dagger}w^{\dagger}$, where

$$M := \begin{bmatrix} \bar{u} \\ \frac{1}{\sqrt{n}}\bar{V} \end{bmatrix} \quad \text{and} \quad w := \begin{bmatrix} \|u\| & -\frac{1}{\sqrt{n}}\tilde{V} \end{bmatrix}.$$

First, notice that we have $O(T)$-time $\mathrm{SQ}(M)$ and $\mathrm{Q}(w)$: $M$'s rows are rescaled vectors that we have SQ access to; $\tilde{M} := [1\ \frac{1}{\sqrt{n}}\ \cdots\ \frac{1}{\sqrt{n}}]$, so we have $\mathrm{SQ}(\tilde{M})$ trivially; and $w$'s entries can be queried using our given access. Next, use that $wMM^{\dagger}w^{\dagger} = \langle a|b\rangle$ for $a, b$ the flattened tensors

$$a := \sum_{i=1}^{d}\sum_{j=1}^{n+1}\sum_{k=1}^{n+1} M_{ji}\|M_{k,*}\|\,|i\rangle|j\rangle|k\rangle = M \otimes \tilde{M}; \quad (1)$$

$$b := \sum_{i=1}^{d}\sum_{j=1}^{n+1}\sum_{k=1}^{n+1} \frac{w_j^{\dagger}w_k M_{ki}}{\|M_{k,*}\|}\,|i\rangle|j\rangle|k\rangle. \quad (2)$$

Given $O(T)$-time $\mathrm{SQ}(M)$ and $\mathrm{Q}(w)$, we have $O(T)$-time $\mathrm{SQ}(a)$ and $\mathrm{Q}(b)$: namely, we can sample from $a$ by sampling $j$ and $k$ from $\tilde{M}$, and then sampling $i$ from $M_{j,*}$. Thus, we can apply Proposition 3 to estimate $wMM^{\dagger}w^{\dagger}$ to $\varepsilon(4Z)$ error with probability $\geq 1 - \delta$ in $O(T \frac{1}{\varepsilon^2} \log\frac{1}{\delta})$ time, using that $\|a\| = \|M\|_F^2 = 4$ and $\|b\| = \|w\|^2 = Z$. Rescaling $\varepsilon$ by $4Z$ gives the desired result.

DETAILS FOR PRINCIPAL COMPONENT ANALYSIS

Proof of Theorem 6

We recall the problem that QPCA solves.

Problem 5 (Principal component analysis). Suppose we are given access to $A \in \mathbb{C}^{n \times d}$ with singular values $\sigma_i$ and right singular vectors $v_i$. Further suppose we are given $\sigma$, $k$, and $\eta$ with the guarantee that, for all $i \in [k]$, $\sigma_i \geq \sigma$ and $\sigma_i^2 - \sigma_{i+1}^2 \geq \eta\|A\|_F^2$. With probability $\geq 1 - \delta$, output estimates $\hat{\sigma}_1^2, \ldots, \hat{\sigma}_k^2$ and $\hat{v}_1, \ldots, \hat{v}_k$ satisfying $|\hat{\sigma}_i^2 - \sigma_i^2| \leq \varepsilon_{\sigma}\|A\|_F^2$ and $\|\hat{v}_i - v_i\| \leq \varepsilon_v$ for all $i \in [k]$.

Theorem 6. Given $\|A\|_F$ and the ability to prepare copies of $|A\rangle$ in $O(T)$ time, a quantum algorithm can output the desired estimates for Problem 5 $\hat{\sigma}_1^2, \ldots, \hat{\sigma}_k^2$ and $|\hat{v}_1\rangle, \ldots, |\hat{v}_k\rangle$ in $\tilde{O}(TK \min(\varepsilon_{\sigma}, \delta)^{-3})$ time.

The quantum algorithm we use is QPCA [21], and in particular, we use Prakash's analysis of Lloyd et al's QPCA:

Proposition (3.2.1 [3]). Let $\rho$ be a density matrix with eigenvalues $\lambda_i$ and eigenvectors $v_i$. Given the ability to prepare states with density matrix $\rho$ in $T$ time, Lloyd et al's PCA algorithm [21] runs in time $\tilde{O}(T \min(\varepsilon_{\sigma}, \delta)^{-3})$ and returns a sample $(|v\rangle, \bar{\lambda})$ where $\Pr[|v\rangle = |v_j\rangle] = \lambda_j$ and $\bar{\lambda} \in \lambda_j \pm \varepsilon_{\sigma}/4$ with probability at least $1 - \delta$.

In other words, QPCA run on states with density matrix $\frac{1}{\|A\|_F^2}A^{\dagger}A$ outputs a random singular vector $|v_i\rangle$ with probability proportional to its squared singular value $\sigma_i^2$. As we assume our eigenvalues have an $\eta\|A\|_F^2$ gap, the precise eigenvector $|v_j\rangle$ sampled can be identified by the eigenvalue estimate. Then, by computing enough samples, we can learn all of the eigenvalues of at least $\sigma^2$ and get the corresponding states with only $\tilde{O}(\|A\|_F^2/\sigma^2)$ overhead, giving Theorem 6.

Proof of Theorem 6. Because we can prepare $|A\rangle$ in $O(T)$ time, we can also prepare $\rho$ with density matrix $\frac{1}{\|A\|_F^2}A^{\dagger}A$ in $O(T)$ time, since this is the state of the second set of qubits of $|A\rangle$. So, we can get a sample $(|v\rangle, \bar{\lambda})$ in $O(T \min(\varepsilon_{\sigma}, \delta)^{-3})$ time. The eigenvectors and eigenvalues of $\rho$ are $v_i$ and $\sigma_i^2$, respectively, from the SVD of $A$.

By assumption, we know that the first $k$ eigenvectors have size $\geq \sigma^2$, and because we estimate our $\lambda_i$'s to $\varepsilon_{\sigma}/4$ error, we can identify an eigenvector by its eigenvalue estimate. For each $i \in [k]$, $|v_i\rangle$ appears with probability $\sigma_i^2/\|A\|_F^2$, so we see it in $\frac{\|A\|_F^2}{\sigma^2} \log\frac{1}{\delta}$ samples with probability $\geq 1 - \delta$. By union bound, we can find $|v_i\rangle$ and their corresponding eigenvalue estimates for all $k$ with $O(\frac{\|A\|_F^2}{\sigma^2} \log\frac{k}{\delta})$ samples. The eigenvectors are exactly correct and the eigenvalues have $\varepsilon_{\sigma}\|A\|_F^2$ error, as desired: so, this is the desired output. The runtime is the time to output $O(\frac{\|A\|_F^2}{\sigma^2} \log\frac{k}{\delta})$ samples.

Note that the quantum algorithm crucially uses the gap assumption in Problem 5. Without a gap assumption, it's still possible to efficiently learn the subspace spanned by the top $k$ singular vectors. In this work, we restrict our scope to the task of finding individual singular vectors as stated [21].

Also note that the QPCA result isn't particularly useful if $A$ is not low-rank: if $\rho$ is high-dimensional, we would expect that $\lambda_j$'s are around $O(1/n)$, and if we wanted to distinguish individual eigenvectors, we would also have that $\varepsilon_{\sigma} = O(1/n)$, ruining any exponential speedup.

Analysis of Low-Rank Approximation Algorithm

Algorithm 2 Low-rank approximation [22]
Input: $O(T)$-time $\mathrm{SQ}(A) \in \mathbb{R}^{m \times n}$, $\sigma$, $\varepsilon$, $\delta$
Output: $\mathrm{SQ}(S) \in \mathbb{C}^{q \times n}$, $\mathrm{Q}(\hat{U}) \in \mathbb{C}^{q \times \ell}$, $\mathrm{Q}(\hat{\Sigma}) \in \mathbb{C}^{\ell \times \ell}$
1: Set $K = \|A\|_F^2/\sigma^2$ and $q = \Theta(\frac{K^4}{\varepsilon^2} \log(\frac{1}{\delta}))$
2: Sample rows $i_1, \ldots, i_q$ from $\tilde{A}$ and define $S \in \mathbb{R}^{q \times n}$ such that $S_{r,*} := A_{i_r,*}\frac{\|A\|_F}{\sqrt{q}\|A_{i_r,*}\|}$
3: Sample columns $j_1, \ldots, j_q$ from $\mathcal{F}$, where $\mathcal{F}$ denotes the distribution given by sampling a uniform $r \sim [q]$, then sampling $c$ from $S_r$
4: Let $W \in \mathbb{C}^{q \times q}$ be the normalized submatrix $W_{*,c} := \frac{S_{*,j_c}}{\sqrt{q\mathcal{F}(j_c)}}$
5: Compute the left singular vectors of $W$, $\hat{u}^{(1)}, \ldots, \hat{u}^{(\ell)}$, that correspond to singular values $\hat{\sigma}^{(1)}, \ldots, \hat{\sigma}^{(\ell)}$ larger than $\sigma$
6: Output $\mathrm{SQ}(S)$, $\hat{U} \in \mathbb{R}^{q \times \ell}$ the matrix with columns $\hat{u}^{(i)}$, and $\hat{\Sigma} \in \mathbb{R}^{\ell \times \ell}$ the diagonal matrix with entries $\hat{\sigma}^{(i)}$

Proposition 9 ([6, Theorem 4.4], adapted from Frieze et al. [22]). Suppose we are given $O(T)$-time $\mathrm{SQ}(A) \in \mathbb{C}^{n \times d}$, a singular value threshold $\sigma$, and an error parameter $\varepsilon \in (0, \sqrt{\sigma/\|A\|_F}/4]$. Denote $K := \|A\|_F^2/\sigma^2$. Then in

$$O\Big(\frac{K^{12}}{\varepsilon^6} \log^3\frac{1}{\delta} + T\frac{K^8}{\varepsilon^4} \log^2\frac{1}{\delta}\Big)$$

time, Algorithm 2 outputs $O(T)$-time $\mathrm{SQ}(S) \in \mathbb{C}^{q \times n}$ and $\hat{U} \in \mathbb{C}^{q \times \ell}$, $\hat{\Sigma} \in \mathbb{R}^{\ell \times \ell}$ (for $\ell = \Theta(\frac{K^4}{\varepsilon^2} \log\frac{1}{\delta})$) implicitly describing a low-rank approximation to $A$, $D := A\hat{V}\hat{V}^{\dagger}$ with $\hat{V} := S^{\dagger}\hat{U}\hat{\Sigma}^{-1}$ (notice $\mathrm{rank}\, D \leq \ell$). This description satisfies the following with probability $\geq 1 - \delta$:
(a) $\sigma_{\ell} \geq \sigma - \varepsilon\|A\|_F$ and $\sigma_{\ell+1} < \sigma + \varepsilon\|A\|_F$;
(b) $\|A - D\|_F^2 \leq \|A - A_{\ell}\|_F^2 + \varepsilon^2\|A\|_F^2$;
(c) $\sum_{i=1}^{\ell} |\hat{\sigma}_i^2 - \sigma_i^2| \leq \varepsilon\|A\|_F^2/K^{1.5} = \varepsilon\sigma^3/\|A\|_F$;
(d) $\|\hat{V} - \Lambda\|_F \leq \varepsilon$ for some $\Lambda$ with orthonormal columns and the same image as $\hat{V}$.

Intuitively, (a) shows that $\ell$ is the right place to truncate up to $\varepsilon$ error, (b) is a low-rank approximation guarantee, (c) shows that our singular values are approximately correct, and (d) shows that our singular vectors are approximately orthonormal. These additive-error bounds are weaker than the relative-error bounds common in the classical ML literature. This seems inherent, since our SQ access assumes less about input compared to the classical literature (see discussion). Quantum algorithms also only achieve additive-error bounds.

Proof. Because the only difference between Algorithm 2 and ModFKV from previous work [6] is the different choice of $\varepsilon$, this result mostly follows directly from previous work. The algorithm and runtime are given in Section 4; (a) follows from Lemma 4.5(♥); (b) follows from Lemma 4.5(♦); (d) follows from Proposition 4.6. For (c), we know that

$$\|A^{\dagger}A - S^{\dagger}S\|_F,\ \|SS^{\dagger} - CC^{\dagger}\|_F \leq \|A\|_F^2/\sqrt{q} = \varepsilon\sigma^2/K.$$

This is stated in the proof of [6, Proposition 4.6]; no factor of $\log(1/\delta)$ appears because it is used to amplify the probability of this equation being true. Let $\sigma_{M,i}$ denote the singular values of $M$. By applying the Hoffman-Wielandt inequality to both, we get that

$$\sqrt{\sum_i |\sigma_{A,i}^2 - \sigma_{S,i}^2|^2},\ \sqrt{\sum_i |\sigma_{S,i}^2 - \sigma_{C,i}^2|^2} \leq \varepsilon\|A\|_F^2/K^2,$$

so

$$\sum_{i=1}^{k} |\sigma_{A,i}^2 - \sigma_{C,i}^2| \leq \sqrt{k}\Big(\sum_{i=1}^{k} |\sigma_{A,i}^2 - \sigma_{C,i}^2|^2\Big)^{1/2} \leq \sqrt{k}\Big(\sum_i |\sigma_{A,i}^2 - \sigma_{C,i}^2|^2\Big)^{1/2} \leq \sqrt{k}\,\varepsilon\|A\|_F^2/K^2 \leq \varepsilon\|A\|_F^2/K^{1.5}.$$

Proof of Proposition 7

Algorithm 3 Matrix-vector SQ access
Input: $O(T)$-time $\mathrm{SQ}(V^{\dagger}) \in \mathbb{C}^{k \times n}$, $\mathrm{Q}(w) \in \mathbb{C}^k$
Output: $\mathrm{SQ}^{\nu}(Vw)$
1: function RejectionSample($\mathrm{SQ}(V^{\dagger})$, $\mathrm{Q}(w)$)
2:   Sample $i \in [k]$ proportional to $|w_i|^2\|V_{*,i}\|^2$ by manually calculating all $k$ probabilities
3:   Sample $s \in [n]$ from $V_{*,i}$ using $\mathrm{SQ}(V^{\dagger})$
4:   Compute $r_s = (Vw)_s^2/(k\sum_{j=1}^{k}(V_{sj}w_j)^2)$ (after querying for $w_j$ and $V_{sj}$ for all $j \in [k]$)
5:   Output $s$ with probability $r_s$ (success); otherwise, output $\varnothing$ (failure)
6: end function
7: Query: output $(Vw)_s$
8: Sample: run RejectionSample until success (outputting $s$) or $kC(V, w) \log\frac{1}{\delta}$ failures (outputting $\varnothing$)
9: Norm($\nu$): Let $p$ be the fraction of successes from running RejectionSample $\frac{k}{\nu^2}C(V, w) \log\frac{1}{\delta}$ times; output $pk\sum_{i=1}^{k}|w_i|^2\|V_{*,i}\|^2$

Proposition 7 ([6, Proposition 4.3]). For $V \in \mathbb{C}^{n \times k}$, $w \in \mathbb{C}^k$, given $\mathrm{SQ}(V^{\dagger})$ and $\mathrm{Q}(w)$, Algorithm 3 simulates $\mathrm{SQ}^{\nu}(Vw)$ where the time to query is $O(Tk)$, sample is $O(Tk^2C(V, w) \log\frac{1}{\delta})$, and query norm is $O(Tk^2C(V, w)\frac{1}{\nu^2} \log\frac{1}{\delta})$. Here, $\delta$ is the desired failure probability and $C(V, w) = \sum_i \|w_iV_{*,i}\|^2/\|Vw\|^2$.

Proof. We analyze Algorithm 3. We use the following easy-to-verify facts about the function RejectionSample: (1) $r_s \leq 1$ by Cauchy-Schwarz, so the protocol is well-defined; (2) conditioned on success, $s$ is a measurement from $|Vw\rangle$, because the denominator of $r_s$ is the probability $s$ is selected in Line 3, up to normalization; (3) the function succeeds with probability $(kC(V, w))^{-1}$; (4) running it takes $O(Tk)$ time.

Query clearly has the stated properties. (2) implies that Sample is correct if it succeeds, (3) implies that it has the stated failure probability, and (4) implies the stated runtime. (3) implies that Norm is correct (by a Chernoff bound), and (4) implies the stated runtime.

Proof of Theorem 8

Algorithm 5 Principal component analysis
Input: $\mathrm{SQ}(A) \in \mathbb{C}^{n \times d}$, $\sigma$, $k$, $\eta$ as in Problem 5
Output: $\hat{\sigma}_i$ and $\hat{v}_i$ for $i \in [k]$ as in Problem 5
1: Set $\varepsilon \leftarrow \min(\varepsilon_{\sigma}K^{1.5}, \varepsilon_v^2\eta, \frac{1}{4}K^{-1/2})$ and $\sigma' \leftarrow \sigma - \varepsilon\|A\|_F$
2: Run Algorithm 2 with parameters $\sigma'$, $\varepsilon$, and $\delta/k$ to find a low-rank approximation described by $\mathrm{SQ}(S) \in \mathbb{C}^{q \times n}$, $\hat{U} \in \mathbb{C}^{q \times \ell}$, $\hat{\Sigma} \in \mathbb{C}^{\ell \times \ell}$
3: For $i \in [k]$, let $\hat{\sigma}_i := \hat{\Sigma}_{ii}$, $\hat{v}_i := S^{\dagger}\hat{U}_{*,i}/\hat{\Sigma}_{ii}$
4: Output $\hat{\sigma}_i^2$ and $\mathrm{SQ}^{\nu}(\hat{v}_i)$; SQ access follows from Algorithm 3 with matrix $S^{\dagger}$ and vector $\hat{U}^{(i)}/\hat{\Sigma}_{ii}$

Theorem 8. Given $O(T)$-time $\mathrm{SQ}(A) \in \mathbb{C}^{n \times d}$, with $\varepsilon_{\sigma}, \varepsilon_v, \delta \in (0, 0.01)$, Algorithm 5 outputs the desired estimates for Problem 5 $\hat{\sigma}_1, \ldots, \hat{\sigma}_k$ and $O(T\frac{K^9}{\varepsilon^4} \log^3(\frac{k}{\delta}))$-time $\mathrm{SQ}^{0.01}(\hat{v}_1, \ldots, \hat{v}_k)$ in $O(\frac{K^{12}}{\varepsilon^6} \log^3(\frac{k}{\delta}) + T\frac{K^8}{\varepsilon^4} \log^2(\frac{k}{\delta}))$ time, where $\varepsilon = \min(0.1\varepsilon_{\sigma}K^{1.5}, \varepsilon_v^2\eta, \frac{1}{4}K^{-1/2})$.

Most of this theorem simply follows by combining the guarantees from Proposition 9 and Proposition 7 to get the PCA guarantees. The only nontrivial piece of the proof is showing that $\|v_i - \hat{v}_i\| \leq \varepsilon_v$, since the low-rank approximation algorithm gives no guarantee about controlling individual $\hat{v}_i$'s. The 9(b) bound, even with $\varepsilon = 0$, only implies that $\{\hat{v}_i\}$ span the same subspace as $\{v_i\}$. So, for $r \in [k]$, we consider the rank-$r$ approximation formed by using only $\hat{v}_1, \ldots, \hat{v}_r$. The 9(b) bounds for all of these low-rank approximations give the desired bounds: using the gap assumption, the first $\hat{v}_r$ to deviate from $v_r$ will violate the bound for that rank approximation. So, all $\hat{v}_r$ must be close to their corresponding $v_r$.

Proof. First, consider runtime. From Proposition 9, using that $\sigma' \geq 3\sigma/4$, we have that the runtime of Algorithm 5 is correct. Further, for our $\hat{v}_i$'s, applying Algorithm 3 gives the desired runtimes, since

$$C(S^{\dagger}, \hat{U}_{*,i}/\hat{\Sigma}_{ii}) = \frac{\sum_j \|S_{j,*}\|^2(\hat{U}_{ji}/\hat{\Sigma}_{ii})^2}{\|\hat{v}_i\|^2} \leq \frac{\|S\|_F^2\|\hat{U}_{*,i}\|^2}{\hat{\Sigma}_{ii}^2\|\hat{v}_i\|^2} = \frac{\|A\|_F^2}{\hat{\Sigma}_{ii}^2\|\hat{v}_i\|^2} \leq \frac{\|A\|_F^2}{\sigma^2(1 - \varepsilon)^2} = O(K),$$

where we used Cauchy-Schwarz, that $\|S\|_F^2 = \|A\|_F^2$, $\|\hat{U}_{*,i}\| = 1$, $\hat{\Sigma}_{ii} \geq \sigma$, and Proposition 9(d).

For correctness: by 9(a), $\sigma_{\ell+1} < \sigma$, so $\ell \geq k$, and Line 3 of Algorithm 5 is well-defined. Further, by 9(c), we can bound $\hat{\sigma}_i$ error: $|\hat{\sigma}_i^2 - \sigma_i^2| \leq \varepsilon\frac{\sigma'^3}{\|A\|_F} \leq \varepsilon_{\sigma}\|A\|_F^2$.
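Since the runtime bound above relies on Algorithm 3, a small Python sketch of its RejectionSample routine may help make the acceptance ratio $r_s$ concrete. This is only an illustration under stated assumptions: the dense lists `V` and `w` stand in for $\mathrm{SQ}(V^{\dagger})$ and $\mathrm{Q}(w)$ (so sampling here costs linear time rather than $O(T)$), the function names `rejection_sample` and `sample_from_product` are hypothetical, and absolute values are used so that complex entries are handled.

```python
import random


def rejection_sample(V, w, rng):
    """One attempt of RejectionSample: return a row index s, or None on failure.

    V is an n x k matrix given as a list of rows; w is a length-k vector.
    """
    n, k = len(V), len(w)
    # Line 2: pick column i with probability proportional to |w_i|^2 ||V_{*,i}||^2.
    col_norm_sq = [sum(abs(V[r][i]) ** 2 for r in range(n)) for i in range(k)]
    col_weights = [abs(w[i]) ** 2 * col_norm_sq[i] for i in range(k)]
    i = rng.choices(range(k), weights=col_weights, k=1)[0]
    # Line 3: pick row s with probability |V_{s,i}|^2 / ||V_{*,i}||^2.
    s = rng.choices(range(n), weights=[abs(V[r][i]) ** 2 for r in range(n)], k=1)[0]
    # Line 4: r_s = |(Vw)_s|^2 / (k * sum_j |V_{sj} w_j|^2), at most 1 by Cauchy-Schwarz.
    vw_s = sum(V[s][j] * w[j] for j in range(k))
    r_s = abs(vw_s) ** 2 / (k * sum(abs(V[s][j] * w[j]) ** 2 for j in range(k)))
    # Line 5: accept s with probability r_s.
    return s if rng.random() < r_s else None


def sample_from_product(V, w, rng, max_tries=10_000):
    """Repeat RejectionSample until success; None if every attempt fails."""
    for _ in range(max_tries):
        s = rejection_sample(V, w, rng)
        if s is not None:
            return s
    return None
```

With `V = [[1, 0], [1, 1], [0, 2]]` and `w = [1, -1]`, the product is $Vw = (1, 0, -2)$, so accepted indices should follow the distribution $(0.2, 0, 0.8)$, and each attempt succeeds with probability $(kC(V, w))^{-1} = 5/14$.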

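Now, we show that $\|v_i - \hat{v}_i\| \leq \varepsilon_v$. Let $\hat{V}_r$ denote $\hat{V}$ truncated to its first $r$ columns. Observe that for all $r \leq \ell$, $\|A - A\hat{V}_r\hat{V}_r^{\dagger}\|_F^2 \leq \|A - A_r\|_F^2 + \varepsilon\|A\|_F^2$ holds with probability $\geq 1 - \delta$. This holds for $r = \ell$ by Proposition 9(b), so suppose $r < \ell$. We can imagine that our run of Algorithm 2 is equivalent to simultaneous runs with different singular value thresholds $\gamma_r := \frac{1}{2}(\sigma_r + \sigma_{r+1})$, the only differences being that we truncate to threshold $\sigma$ instead of $\gamma_r$, and $q$ is larger. Truncating to $\gamma_r$ is equivalent to truncating at the $r$th singular value, since $\hat{\sigma}_r > \gamma_r > \hat{\sigma}_{r+1}$:

$$|\hat{\sigma}_i - \sigma_i| \leq \frac{1}{\sigma}|\hat{\sigma}_i^2 - \sigma_i^2| \leq \frac{1}{\sigma}\sum_{i=1}^{\ell}|\hat{\sigma}_i^2 - \sigma_i^2| \leq \frac{\eta}{10}\|A\|_F;$$

$$|\sigma_i - \gamma_i| = \frac{1}{2}|\sigma_i - \sigma_{i+1}| \geq \frac{1}{4\|A\|_F}|\sigma_i^2 - \sigma_{i+1}^2| \geq \frac{\eta}{4}\|A\|_F.$$

So, $\hat{\sigma}_r \geq \sigma_r - \frac{\eta}{10}\|A\|_F > \sigma_r - \frac{\eta}{4}\|A\|_F \geq \gamma_r$, and similarly for the other side. Increasing $q$ is equivalent to decreasing $\varepsilon$, so for our truncated $\hat{V}_{[r]}$'s, the guarantees from Proposition 9 hold. Namely, 9(b) holds as desired. By choosing failure probability $\delta/k$, we can guarantee, with probability $\geq 1 - \delta$, that $\|A - A\hat{V}_r\hat{V}_r^{\dagger}\|_F^2 \leq \|A - A_r\|_F^2 + \varepsilon\|A\|_F^2$ holds for all $r \in [k]$.

Fix $\Lambda$ as the isometry from Proposition 9(d) satisfying $\|\hat{V} - \Lambda\|_F \leq \varepsilon$. Now, consider a truncation $\hat{V}_r$ for any $r \in [k]$, and notice that $\|\hat{V}_r - \Lambda_r\|_F \leq \varepsilon$.

$$\|A - A\Lambda_r\Lambda_r^{\dagger}\|_F^2 \leq (\|A - A\hat{V}_r\hat{V}_r^{\dagger}\|_F + O(\varepsilon)\|A\|_F)^2 \leq \|A - A\hat{V}_r\hat{V}_r^{\dagger}\|_F^2 + O(\varepsilon)\|A\|_F^2 \leq \|A - A_r\|_F^2 + O(\varepsilon)\|A\|_F^2$$

Now, we rewrite the expression, heavily using that $\Lambda_r$ and $V_r$ (the matrix with columns $v_1, \ldots, v_r$ from $A$'s SVD) are isometries.

$$\begin{aligned}
O(\varepsilon)\|A\|_F^2 &\geq \|A - A\Lambda_r\Lambda_r^{\dagger}\|_F^2 - \|A - A_r\|_F^2 \\
&= \|A\|_F^2 - \|A\Lambda_r\Lambda_r^{\dagger}\|_F^2 - (\|A\|_F^2 - \|A_r\|_F^2) \\
&= \|A_r\|_F^2 - \|A\Lambda_r\Lambda_r^{\dagger}\|_F^2 \\
&= \sum_{i=1}^{r}\sigma_i^2 - \sum_{i=1}^{\min(m,n)}\sigma_i^2\|\Lambda_r\Lambda_r^{\dagger}v_i\|^2 \\
&= \sum_{i=1}^{r}\sigma_i^2(1 - \|\Lambda_r\Lambda_r^{\dagger}v_i\|^2) - \sum_{i=r+1}^{\min(m,n)}\sigma_i^2\|\Lambda_r\Lambda_r^{\dagger}v_i\|^2 \\
&\geq \sigma_r^2\sum_{i=1}^{r}(1 - \|\Lambda_r\Lambda_r^{\dagger}v_i\|^2) - \sigma_{r+1}^2\sum_{i=r+1}^{\min(m,n)}\|\Lambda_r\Lambda_r^{\dagger}v_i\|^2 \\
&= \sigma_r^2\|(I - \Lambda_r\Lambda_r^{\dagger})V_rV_r^{\dagger}\|_F^2 - \sigma_{r+1}^2\|\Lambda_r\Lambda_r^{\dagger}(I - V_rV_r^{\dagger})\|_F^2 \\
&= \sigma_r^2(r - \|\Lambda_r\Lambda_r^{\dagger}V_rV_r^{\dagger}\|_F^2) - \sigma_{r+1}^2(r - \|\Lambda_r\Lambda_r^{\dagger}V_rV_r^{\dagger}\|_F^2) \\
&\geq \eta\|A\|_F^2(r - \|\Lambda_r\Lambda_r^{\dagger}V_rV_r^{\dagger}\|_F^2)
\end{aligned}$$

Here, we used the Pythagorean theorem ($\|A\Pi\|_F^2 + \|A(I - \Pi)\|_F^2 = \|A\|_F^2$ for $\Pi$ an orthogonal projector), that $\|\Lambda\Lambda^{\dagger}v_i\|^2 \in [0, 1]$, that $\|VM\|_F = \|M\|_F$ if $V$ is an isometry, that the Frobenius norm decomposes as $\|M\|_F^2 = \sum_i \|M_{*,i}\|^2$, and the gap assumption from the PCA problem statement.

So, $\|\Lambda_r\Lambda_r^{\dagger}V_rV_r^{\dagger}\|_F^2 = \|\Lambda_r^{\dagger}V_r\|_F^2 \geq r - O(\varepsilon/\eta)$, and equivalently, $\|(V_rV_r^{\dagger} - I)\Lambda_r\Lambda_r^{\dagger}\|_F^2 = O(\varepsilon/\eta)$.

$$\begin{aligned}
\|v_r - \hat{v}_r\| &\leq \|v_r - \Lambda_{*,r}\| + \|\Lambda_{*,r} - \hat{v}_r\| \\
&\leq \|v_r - \Lambda_{*,r}\| + \varepsilon \\
&\leq \|v_r - V_kV_k^{\dagger}\Lambda_{*,r}\| + \|V_kV_k^{\dagger}\Lambda_{*,r} - \Lambda_{*,r}\| + \varepsilon \\
&\leq \|v_r - V_kV_k^{\dagger}\Lambda_{*,r}\| + \|(V_kV_k^{\dagger} - I)\Lambda_k\Lambda_k^{\dagger}\|_F + \varepsilon \\
&\leq \|v_r - V_kV_k^{\dagger}\Lambda_{*,r}\| + O(\sqrt{\varepsilon/\eta}) \\
&= \sqrt{(v_r - V_kV_k^{\dagger}\Lambda_{*,r})^{\dagger}(v_r - V_kV_k^{\dagger}\Lambda_{*,r})} + O(\sqrt{\varepsilon/\eta}) \\
&\leq \sqrt{2 - 2\langle v_r, \Lambda_{*,r}\rangle} + O(\sqrt{\varepsilon/\eta}) \\
&= O(\sqrt{\varepsilon/\eta}) = O(\varepsilon_v)
\end{aligned}$$

The last line follows from combining previously-seen inequalities: let $x_{i,j} = \langle v_i, \Lambda_{*,j}\rangle^2$. Then we can rewrite

$$\|\Lambda_s^{\dagger}V_s\|_F^2 \geq s - O(\varepsilon/\eta) \iff \sum_{i,j=1}^{s} x_{i,j} \geq s - O(\varepsilon/\eta);$$

$$\|\Lambda^{\dagger}v_i\|^2 \leq 1 \iff \sum_{j=1}^{k} x_{i,j} \leq 1;$$

$$\|(\Lambda_{*,j})^{\dagger}V_k\|^2 \leq 1 \iff \sum_{i=1}^{k} x_{i,j} \leq 1.$$

Now, consider taking a linear combination of these inequalities: add together the first inequality for $s = r - 1, r$ and subtract the second and third inequalities for $i, j \in [r - 1]$.

$$\begin{aligned}
\sum_{i,j=1}^{r-1} x_{i,j} + \sum_{i,j=1}^{r} x_{i,j} &\geq 2r - 1 - O(\varepsilon/\eta) \\
-\sum_{i=1}^{r-1}\sum_{j=1}^{k} x_{i,j} &\geq 1 - r \\
-\sum_{j=1}^{r-1}\sum_{i=1}^{k} x_{i,j} &\geq 1 - r \\
\implies x_{rr} - \sum_{i=1}^{r-1}\sum_{j=r+1}^{k}(x_{i,j} + x_{j,i}) &\geq 1 - O(\varepsilon/\eta)
\end{aligned}$$

So, $x_{rr} \geq 1 - O(\varepsilon/\eta)$, implying that $\langle v_r, \Lambda_{*,r}\rangle \geq 1 - O(\varepsilon/\eta)$.

∗ [email protected]; ewintang.com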
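As a concrete illustration of Algorithm 1 (inner product estimation) from the nearest-centroid section, the following is a minimal Python sketch of its median-of-means estimator for the exact case $\nu = 0$. It is a sketch under assumptions rather than the paper's implementation: the dense list `x` stands in for $\mathrm{SQ}(x)$ (sampling and the norm are computed directly from the entries), natural logarithms are used for $\log$, and the function name `estimate_inner_product` is hypothetical.

```python
import itertools
import math
import random
import statistics


def estimate_inner_product(x, y, eps, delta, rng=None):
    """Median-of-means estimate of <x|y> = sum_i conj(x_i) * y_i."""
    rng = rng or random.Random()
    weights = [abs(v) ** 2 for v in x]
    cum_weights = list(itertools.accumulate(weights))
    norm_sq = cum_weights[-1]                      # ||x||^2, the norm query
    buckets = math.ceil(6 * math.log(2 / delta))   # number of bucket means
    size = math.ceil(9 / eps ** 2)                 # samples per bucket
    means = []
    for _ in range(buckets):
        # Sample indices i with probability |x_i|^2 / ||x||^2.
        idx = rng.choices(range(len(x)), cum_weights=cum_weights, k=size)
        # z_j = conj(x_i) * y_i * ||x||^2 / |x_i|^2, so that E[z_j] = <x|y>.
        total = sum(x[i].conjugate() * y[i] * norm_sq / weights[i] for i in idx)
        means.append(complex(total) / size)
    # Component-wise median of the bucket means.
    return complex(statistics.median(m.real for m in means),
                   statistics.median(m.imag for m in means))
```

For example, with `x = [1, 2j, -1 + 1j, 0.5]` and `y = [2, 1 - 1j, 3j, 4]` the exact value is $\langle x|y\rangle = 5 - 5i$, and by Proposition 3 the estimate should fall within $\varepsilon\|x\|\|y\|$ of it with probability at least $1 - \delta$.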

[1] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum random access memory, Phys. Rev. Lett. 100, 160501 (2008).
[2] J. Preskill, Quantum computing in the NISQ era and beyond, Quantum 2, 79 (2018).
[3] A. Prakash, Quantum algorithms for linear algebra and machine learning, Ph.D. thesis, UC Berkeley (2014).
[4] S. Aaronson, Read the fine print, Nature Physics 11, 291 (2015).
[5] I. Kerenidis and A. Prakash, Quantum recommendation systems, in 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), LIPIcs, Vol. 67 (Schloss Dagstuhl, 2017) pp. 49:1–49:21.
[6] E. Tang, A quantum-inspired classical algorithm for recommendation systems, in Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing - STOC 2019 (ACM Press, 2019) arXiv:1807.04271 [cs.IR].
[7] L. Grover and T. Rudolph, Creating superpositions that correspond to efficiently integrable probability distributions, arXiv (2002), arXiv:0208112 [quant-ph].
[8] M. Van Den Nest, Simulating quantum computers with probabilistic methods, Quantum Info. Comput. 11 (2011).
[9] F. Verstraete and J. I. Cirac, Matrix product states represent ground states faithfully, Phys. Rev. B 73, 094423 (2006).
[10] R. Orús, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Annals of Physics 349, 117 (2014).
[11] S. R. White, Density matrix formulation for quantum renormalization groups, Phys. Rev. Lett. 69, 2863 (1992).
[12] U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Annals of Physics 326, 96 (2011).
[13] G. Vidal, Efficient classical simulation of slightly entangled quantum computations, Phys. Rev. Lett. 91, 147902 (2003).
[14] G. Vidal, Efficient simulation of one-dimensional quantum many-body systems, Phys. Rev. Lett. 93, 040502 (2004).
[15] A. Cichocki, N. Lee, I. Oseledets, A.-H. Phan, Q. Zhao, and D. P. Mandic, Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions, Found. Trends Mach. Learn. 9, 249–429 (2016).
[16] J. C. Bridgeman and C. T. Chubb, Hand-waving and interpretive dance: an introductory course on tensor networks, Journal of Physics A: Mathematical and Theoretical 50, 223001 (2017).
[17] J. Eisert, Computational difficulty of global variations in the density matrix renormalization group, Phys. Rev. Lett. 97, 260501 (2006).
[18] Z. Landau, U. Vazirani, and T. Vidick, A polynomial time algorithm for the ground state of one-dimensional gapped local Hamiltonians, Nature Physics 11, 566 (2015).
[19] W. Hoeffding, Probability inequalities for sums of bounded random variables, in The Collected Works of Wassily Hoeffding, edited by N. I. Fisher and P. K. Sen (Springer New York, New York, NY, 1994) pp. 409–426.
[20] C. H. Bennett, E. Bernstein, G. Brassard, and U. Vazirani, Strengths and weaknesses of quantum computing, SIAM Journal on Computing 26, 1510 (1997).
[21] S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum principal component analysis, Nature Physics 10, 631 (2014).
[22] A. Frieze, R. Kannan, and S. Vempala, Fast Monte-Carlo algorithms for finding low-rank approximations, Journal of the ACM (JACM) 51, 1025 (2004).