The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
Revisiting Online Quantum State Learning
Feidiao Yang, Jiaqing Jiang, Jialin Zhang, Xiaoming Sun
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
University of Chinese Academy of Sciences, Beijing, China
{yangfeidiao, jiangjiaqing, zhangjialin, sunxiaoming}@ict.ac.cn
Abstract

In this paper, we study the online quantum state learning problem, which was recently proposed by Aaronson et al. (2018). In this problem, the learning algorithm sequentially predicts quantum states based on observed measurements and losses, and the goal is to minimize the regret. In the previous work, the existing algorithms may output mixed quantum states. However, in many scenarios, the prediction of a pure quantum state is required. In this paper, we first propose a Follow-the-Perturbed-Leader (FTPL) algorithm that is guaranteed to predict pure quantum states. Theoretical analysis shows that our algorithm can achieve an $O(\sqrt{T})$ expected regret under some reasonable settings. In the case that pure state prediction is not mandatory, we propose another deterministic learning algorithm which is simpler and more efficient. The algorithm is based on the online gradient descent (OGD) method and can also achieve an $O(\sqrt{T})$ regret bound. The main technical contribution of this result is an algorithm for projecting an arbitrary Hermitian matrix onto the set of density matrices with respect to the Frobenius norm. We think this subroutine is of independent interest and can be widely used in many other problems in the quantum computing area. In addition to the theoretical analysis, we evaluate the algorithms with a series of simulation experiments. The experimental results show that our FTPL method and OGD method outperform the existing RFTL approach proposed by Aaronson et al. (2018) in almost all settings. In our implementation of the RFTL approach, we give a closed-form solution to the algorithm. This provides an efficient, accurate, and completely executable solution to the RFTL method.

Introduction

The interdisciplinary research between quantum computing and machine learning has become an attractive area in recent years (Biamonte et al. 2017; Lloyd, Mohseni, and Rebentrost 2014). On the one hand, people expect to take advantage of the great power of quantum computers to improve the efficiency of algorithms in big data processing and machine learning; one representative example following this idea is the HHL algorithm (Harrow, Hassidim, and Lloyd 2009). On the other hand, machine learning algorithms and theories may also help solve some interesting problems in quantum computation and quantum information (Carleo and Troyer 2017). In this paper, we apply online learning theory to an interesting problem: learning an unknown quantum state.

Learning an unknown quantum state is a fundamental problem in quantum computation and quantum information. The basic version is the quantum state tomography problem (Vogel and Risken 1989), which aims to fully recover the classical description of an unknown quantum state. Although quantum state tomography gives a complete characterization of the target state, it is quite costly: recent advances showed that fully reconstructing an unknown quantum state in the worst case requires exponentially many copies of the state (Haah et al. 2016; Odonnell and Wright 2016).

However, in some applications it is unnecessary to fully reconstruct an unknown quantum state; some side information is sufficient. Therefore, some learning tasks instead aim to learn the success probabilities of applying a collection of two-outcome measurements to an unknown state, with respect to some metrics. Among these, the shadow tomography problem (Aaronson 2018) requires estimating the success probabilities uniformly over all measurements in the collection. Aaronson (2018) showed that the number of copies of the unknown state required for shadow tomography is nearly linear in the number of qubits and poly-logarithmic in the number of measurements.

More generally, one may not need to estimate the success probabilities within an error uniformly over all two-outcome measurements. Following the idea of statistical learning theory, we may assume that there is a distribution over some possible two-outcome measurements, and the goal is to learn a quantum state such that the expected difference between the success probabilities of applying a measurement sampled from this distribution to the learned state and to the target state, respectively, is within a specific error. This is called the statistical learning model or the PAC-learning model of quantum states. Aaronson (2007) proved that the number of samples for PAC-learning of quantum states grows only linearly with the number of qubits of the state, which is, surprisingly, an exponential reduction compared with full quantum state tomography.
However, the assumption that there is a distribution over some two-outcome measurements and that the data are i.i.d. samples from this distribution does not always hold. The environment may change over time, or it may even be adversarial. Complementary to statistical learning theory, online learning theory is good at coping with arbitrary or even adversarial sequential data. Therefore, Aaronson et al. (2018) further proposed the model of online quantum state learning. In this model, the data, such as measurements and losses, are provided sequentially, and the learning algorithm predicts a series of quantum states interactively. Its goal is to minimize the regret, which is the difference in the total loss between the learning algorithm and the best fixed quantum state in hindsight.

Although the existing theory has provided helpful ideas for this problem, it is still a challenge to design and analyze algorithms for online quantum state learning. For example, a feasible solution in conventional online learning is often a vector in a real Euclidean space, whereas a feasible solution in quantum state learning is a complex matrix with special constraints. Besides, a direct adaptation of the existing techniques cannot exploit the properties of the quantum setting. If we can take advantage of these unique features or leverage techniques from quantum computing, we may get better results or different solutions.

In (Aaronson et al. 2018), the authors proposed three very different approaches and evaluated them with two metrics, the regret in online learning and the number of errors. First, they adapted the Regularized Follow-the-Leader (RFTL) algorithm (Abernethy, Hazan, and Rakhlin 2008; Shalev-Shwartz and Singer 2007) to online quantum state learning; in particular, they employed the negative von Neumann entropy as the regularization. The RFTL method achieves an $O(L\sqrt{nT})$ regret, where $L$ is the Lipschitz coefficient of the loss functions, $n$ is the number of qubits of a state, and $T$ is the time horizon of the learning process. Its number of errors is $O(n/\varepsilon^2)$ under some assumptions. The RFTL method has the best theoretical guarantee among the three methods. Their second method employs the technique of the postselection-based learning procedure (Aaronson 2005). Its number of errors is bounded as $O((n/\varepsilon^3)\log(n/\varepsilon))$, and no regret bound is available. Their third method is based on an upper bound on the sequential fat-shattering dimension of quantum states (Nayak 1999; Ambainis et al. 2002; Rakhlin, Sridharan, and Tewari 2015) and achieves a regret bound of $O(L\sqrt{nT}\log^{3/2} T)$. What is strange about this result is that it is non-explicit and non-constructive; that is, we actually do not have any specific algorithm corresponding to it. As this discussion shows, although the latter two methods heavily utilize techniques from quantum computing and do not rely on the theory of online learning, they do not outperform the typical RFTL method. This demonstrates the great power of machine learning and optimization theories in solving interesting problems in quantum computation and quantum information.

In this paper, we revisit the problem of online quantum state learning from the perspective of online convex optimization.

First, predicting a pure quantum state is of special interest in quantum state learning (Lee, Lee, and Bang 2018; Benedetti et al. 2019) since a pure state has its unique value in theory and practice. However, the existing RFTL method cannot make such a guarantee since its prediction is always a full-rank matrix, which corresponds to a mixed state. It is still a challenge to predict pure states in online quantum state learning. In this paper, we propose a Follow-the-Perturbed-Leader (FTPL) algorithm (Kalai and Vempala 2005) that is guaranteed to predict a pure quantum state in every round. The key idea is to formulate the optimization objective with a stochastic linear regularization as a special semi-definite program (SDP), and we show that this SDP always has a rank-1 solution, which corresponds to a pure quantum state. Our analysis shows that the regret with respect to the expected prediction is bounded as $O(\sqrt{T})$. We further adapt the FTPL method to a typical and reasonable setting with $L_1$ loss; in this case, our FTPL method can achieve an $O(\sqrt{T})$ expected regret.

Second, if pure states are not mandatory, the online gradient descent (OGD) method (Zinkevich 2003) is a simple and efficient approach for this problem; indeed, it is widely used in practice for online learning. However, the OGD method relies on a projection subroutine when the feasible solutions are constrained. In this paper, we propose an algorithm for projecting an arbitrary Hermitian matrix onto the set of quantum states with respect to the Frobenius norm. The key idea is to reduce the problem to projecting a vector onto the probability simplex. Our algorithm is exact and efficient, and it could be widely used as a subroutine in many other problems in quantum computing. We apply our method to the projected online gradient descent algorithm for quantum state learning and show that this method can achieve an $O(\sqrt{T})$ regret.

Third, the RFTL method due to Aaronson et al. (2018) relies on an offline oracle for solving a linear optimization with the negative von Neumann entropy regularization, which is not fully discussed in the original work. In this paper, we give a closed-form solution to this offline convex optimization problem. Our result provides an efficient, accurate, and completely executable solution to the RFTL method. We also implement this solution in our experiments.

Last but not least, we conduct a series of simulation experiments to evaluate the algorithms. In our experiments, the FTPL method and the OGD method outperform the RFTL approach in almost all settings.

Preliminaries

To make this paper self-contained for readers unfamiliar with the field of quantum computing, in this section we introduce some basic concepts in quantum computation and quantum information that are necessary for this paper. Interested readers are referred to the celebrated textbook by Nielsen and Chuang (2002).
Quantum state. Unlike a bit in a classical computer, which is in a deterministic state of either 0 or 1, a quantum bit (qubit henceforth) can be in a superposition of 0 and 1. The state of a qubit can be described by a unit vector $(\alpha_0, \alpha_1)^T \in \mathbb{C}^2$, where $|\alpha_0|^2 + |\alpha_1|^2 = 1$, meaning that when measuring the qubit, we observe the result 0 with probability $|\alpha_0|^2$ and the result 1 with probability $|\alpha_1|^2$. Similarly, the state of an $n$-qubit system can be described by a unit vector $(\alpha_0, \ldots, \alpha_{2^n-1})^T \in \mathbb{C}^{2^n}$. When measuring this system, we observe the result $(i)_2$ with probability $|\alpha_i|^2$ for $i = 0, \ldots, 2^n - 1$, where $(i)_2$ is the $n$-bit binary representation of the integer $i$.

Pure state, mixed state, and density matrix. An $n$-qubit system can be in a more complicated state than those introduced above: it can be in a distribution over such "simple" states. More precisely, it may be in a "simple" state $\Psi_i \in \mathbb{C}^{2^n}$ with probability $p_i$ for each $i$. To distinguish these cases, we call a state corresponding to a unit vector a pure state and a state that is a distribution over multiple pure states a mixed state.

The representation of pure states and mixed states can be unified with a mathematical tool called the density matrix (or density operator). Specifically, if a quantum state is in state $\Psi_i \in \mathbb{C}^{2^n}$ with probability $p_i$, its density matrix is a $d \times d$ complex matrix $\rho$, where $d = 2^n$, defined as
$$\rho = \sum_i p_i \Psi_i \Psi_i^\dagger,$$
in which $\Psi_i^\dagger$ is the adjoint conjugate of $\Psi_i$.

It can be shown that a $2^n \times 2^n$ complex matrix $\rho$ is a density matrix (quantum state) if and only if it is Hermitian, positive semi-definite (PSD), and has trace 1. Further, a density matrix $\rho$ represents a pure state if and only if $\mathrm{rank}(\rho) = 1$. The set of density matrices of $n$-qubit states is denoted as
$$\mathcal{K}_n = \left\{ \rho \in \mathbb{C}^{2^n \times 2^n} \mid \rho = \rho^\dagger,\ \rho \succeq 0,\ \mathrm{Tr}(\rho) = 1 \right\}.$$
We abbreviate $\mathcal{K}_n$ as $\mathcal{K}$ when $n$ is clear from the context. It is straightforward to verify that $\mathcal{K}_n$ is a convex set.

Two-outcome measurement. Quantum measurement is a means of extracting classical (observable) information from a quantum state. While quantum measurement is a broad topic, in this paper we only consider a special case called the two-outcome measurement (Aaronson 2018; Aaronson et al. 2018). When applied to a quantum state, a two-outcome measurement succeeds with a specific probability. Mathematically, a two-outcome measurement can be described by a Hermitian PSD matrix $E \in \mathbb{C}^{d \times d}$ with eigenvalues in $[0, 1]$. When applying $E$ to a quantum state $\rho$, it gets a successful result (Yes or 1) with probability $\mathrm{Tr}(E\rho)$ and a failed result (No or 0) with probability $1 - \mathrm{Tr}(E\rho)$.

Problem Description and Settings

Online quantum state learning is a sequential prediction process with interactions between a player (algorithm) and an adversary over $T$ rounds. In round $t$, the player predicts a quantum state $\omega_t \in \mathcal{K}$. Simultaneously, the adversary chooses and reveals a two-outcome measurement $E_t$ and a loss function $\ell_t : \mathbb{R} \to \mathbb{R}$. At the end of this round, the player suffers a loss $\ell_t(\mathrm{Tr}(E_t \omega_t))$.

As a convention in online learning, we assume the loss function $\ell_t$ is convex and $L$-Lipschitz.

We assume that the adversary can access the player's strategy, but not its random numbers if it is a randomized algorithm. Generally, the adversary can select $E_t$ and $\ell_t$ in an arbitrary way. We call it the oblivious setting if $E_t$ and $\ell_t$ are chosen before the game is played, while in the adaptive setting $E_t$ and $\ell_t$ may depend on the player's predictions in previous rounds.

It is called the realizable case if there is an underlying unknown quantum state $\rho$ to be learned and the loss function $\ell_t$ is relevant to $\rho$, while in the non-realizable case the loss functions need not be consistent with any quantum state (Aaronson et al. 2018).

Common loss functions are the $L_1$ loss (absolute loss), defined as
$$\ell_t(x) := |x - b_t|,$$
and the $L_2$ loss (square loss), given by
$$\ell_t(x) := (x - b_t)^2.$$
In the realizable case, $b_t$ may be an approximation of the success probability of applying $E_t$ to $\rho$, or a Bernoulli random variable corresponding to the measurement result.

While we discuss the generic settings in this paper, we pay particular attention to the case with $L_1$ loss and $b_t \in \{0, 1\}$. We argue that this is a typical, realistic, and reasonable setting, since in physics we can only observe the measurement results of the unknown state.

In online learning, we use the regret to evaluate the performance of an algorithm. In our problem, it is defined as
$$\mathrm{regret}_T = \sum_{t=1}^{T} \ell_t(\mathrm{Tr}(E_t \omega_t)) - \min_{\omega \in \mathcal{K}} \sum_{t=1}^{T} \ell_t(\mathrm{Tr}(E_t \omega)).$$
The regret is the difference in the total loss between the learning algorithm and the best fixed prediction in hindsight. A good algorithm should have a sublinear regret in terms of the time horizon $T$; that is, the average regret per round should approach 0 as $T$ tends to infinity.

Many online learning algorithms utilize the gradients of the loss functions. In our setting, the gradient of the loss $\ell_t$ with respect to the point $\omega_t$ is
$$\nabla_t = \frac{d}{d\omega_t} \ell_t(\mathrm{Tr}(E_t \omega_t)) = \ell_t'(\mathrm{Tr}(E_t \omega_t))\, E_t.$$
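To make these definitions concrete, the following NumPy sketch (our illustration, not part of the original paper; the function names are hypothetical) evaluates the success probability $\mathrm{Tr}(E_t \omega_t)$, the $L_1$ loss against an observed outcome $b_t \in \{0, 1\}$, and the corresponding (sub)gradient $\nabla_t = \ell_t'(\mathrm{Tr}(E_t \omega_t)) E_t$:

```python
import numpy as np

def success_probability(E, omega):
    # Tr(E omega) is real for a Hermitian E and a density matrix omega;
    # np.real discards the numerically tiny imaginary residue.
    return float(np.real(np.trace(E @ omega)))

def l1_loss_and_gradient(E, omega, b):
    """L1 loss ell_t(x) = |x - b_t| at x = Tr(E_t omega_t), together with
    the (sub)gradient nabla_t = ell_t'(x) * E_t."""
    x = success_probability(E, omega)
    loss = abs(x - b)
    grad = np.sign(x - b) * E  # ell_t'(x) is in {-1, 0, +1} for the L1 loss
    return loss, grad
```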
Our Results

We present our work in this section. In the first subsection, we discuss the FTPL method for predicting pure quantum states. The second subsection proposes a projection algorithm onto the set of density matrices and applies it to the OGD method. In the third subsection, we provide a closed-form solution to the RFTL method.

Follow-the-Perturbed-Leader Algorithm for Pure Quantum State Prediction

Predicting a pure quantum state is of special interest in quantum state learning (Lee, Lee, and Bang 2018; Benedetti et al. 2019), since a pure quantum state has its unique value in theory and practice. For example, if we want to utilize the learning results by preparing or transporting them, the devices or resources for pure states are more economical than those for mixed states.

However, although the RFTL method has a good regret guarantee, it always predicts mixed quantum states, since we can show that its $\omega_t$ is full-rank; this can also be inferred from Claim 14 in (Aaronson et al. 2018). Similarly, the OGD method, which will be introduced shortly, cannot make such a promise either. Therefore, it is still a challenge to predict $\omega_t$ as a pure quantum state.

We realize that by adding a stochastic linear regularization instead of the deterministic convex regularization used in the RFTL method, we can always guarantee pure state prediction, leading to the Follow-the-Perturbed-Leader algorithm depicted in Algorithm 1. The perturbed objective is linear over the density matrices and is hence a special SDP: diagonalizing the perturbed cumulative gradient as $U \mathrm{diag}(\lambda) U^\dagger$ reduces the SDP to minimizing $\lambda^T x$ over the probability simplex, whose optimum is the vector $x$ with a single element of value 1, at the position corresponding to the smallest element of $\lambda$, and all other elements 0. Finally, the solution to the original SDP, $\omega_t = U \mathrm{diag}(x) U^\dagger$, is a rank-1 density matrix, corresponding to a pure quantum state.

Before discussing the regret of the FTPL method, we turn to the construction of the random Hermitian matrix $Z_t$. While the exponential distribution is commonly used in conventional FTPL methods (Kalai and Vempala 2005; Neu and Bartók 2016), we employ the normal distribution in our algorithm, since the former depends on the assumption that the elements of the gradients are all non-negative, which does not hold in our setting.

A random Hermitian matrix $Z \in \mathbb{C}^{d \times d}$ can be sampled as follows. First, its diagonal elements are drawn as $Z_{i,i} \sim \mathcal{N}(0, 1)$ for $i = 1, \ldots, d$. Then its upper triangle is filled as $Z_{i,j} \sim \mathcal{N}(0, \frac{1}{2}) + \mathrm{i}\,\mathcal{N}(0, \frac{1}{2})$ for $1 \le i < j \le d$, and the lower triangle is determined by the Hermitian condition $Z_{j,i} = \overline{Z_{i,j}}$.
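As an illustration, the following NumPy sketch samples such a random Hermitian matrix and forms one FTPL prediction under our reading of the construction above; the function names are ours, and any scaling of the perturbation relative to the gradients, as required by the regret analysis, is omitted here:

```python
import numpy as np

def sample_hermitian_gaussian(d, rng):
    """Sample Z as above: Z_ii ~ N(0,1); Z_ij ~ N(0,1/2) + i*N(0,1/2)
    for i < j; the lower triangle mirrors the upper one so Z = Z^dagger."""
    upper = np.sqrt(0.5) * (rng.standard_normal((d, d))
                            + 1j * rng.standard_normal((d, d)))
    U = np.triu(upper, k=1)
    return np.diag(rng.standard_normal(d)) + U + U.conj().T

def ftpl_predict(grad_sum, rng):
    """One FTPL prediction: minimize Tr((sum of past gradients + Z) w)
    over density matrices w. This linear objective is minimized by the
    rank-1 projector onto an eigenvector of the smallest eigenvalue,
    i.e. a pure state."""
    Z = sample_hermitian_gaussian(grad_sum.shape[0], rng)
    evals, evecs = np.linalg.eigh(grad_sum + Z)  # eigenvalues ascending
    v = evecs[:, 0]
    return np.outer(v, v.conj())                 # omega_t = v v^dagger
```

For example, `ftpl_predict(np.zeros((4, 4), dtype=complex), np.random.default_rng(0))` returns a rank-1 density matrix of a random pure state in the first round.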
Since the $L_1$ loss with $b_t \in \{0, 1\}$ is affine on $[0, 1]$, by linearity we have
$$\ell_t(\mathrm{Tr}(E_t \mathbb{E}[\omega_t])) = \mathbb{E}\left[\ell_t(\mathrm{Tr}(E_t \omega_t))\right].$$
Applying this observation to Algorithm 1 and Theorem 2, we consequently have the following expected regret bound.

Theorem 3. For the setting with $L_1$ loss and $b_t \in \{0, 1\}$, the FTPL algorithm (Algorithm 1), running with the gradients defined in (2) and $Z_1 = \cdots = Z_T \sim \mathcal{D}_N$, achieves an expected regret bound of
$$\mathbb{E}[\mathrm{regret}_T] = \mathbb{E}\left[\sum_{t=1}^{T} \ell_t(\mathrm{Tr}(E_t \omega_t))\right] - \min_{\omega \in \mathcal{K}} \sum_{t=1}^{T} \ell_t(\mathrm{Tr}(E_t \omega)) = O(Ld\sqrt{T}).$$

This is a more direct result than Theorem 2, since it is completely consistent with the algorithm.

Projection onto Density Matrices and Its Application to Online Gradient Descent

Online gradient descent is a simple and efficient method in online learning, and it is therefore widely used in practice. Besides, the OGD method has other advantages; for example, it can achieve a logarithmic regret for strongly convex loss functions (Hazan, Agarwal, and Kale 2007), which is a significant improvement over the square-root regret in the general case. However, the OGD method relies on a projection subroutine for constrained optimization, since one update step may lead to a solution outside the feasible region, and we have to pull it back.

Projection is the operation of finding the nearest point in a set $S$ to a given point $y$. Formally, it is defined as
$$\Pi_S(y) = \arg\min_{x \in S} \|x - y\|.$$

It is well known that projection is an important subroutine in many optimization and machine learning algorithms. However, projection is a tricky operation, since there is no generic approach, and the algorithm, as well as its computational cost, heavily depends on the properties of the feasible set. For example, it is quite easy to project a point onto a ball, but projecting a point onto a general polytope is a linear programming problem and is quite costly.

In this section, we devise an algorithm that projects an arbitrary Hermitian matrix onto the set of density matrices with respect to the Frobenius norm. This operation is an analogue of the projection in vector spaces with respect to the Euclidean norm.²

² We are grateful to the reviewer for pointing out that a similar result was independently developed recently; see (Gonçalves, Gomes-Ruggiero, and Lavor 2016). In addition, some relevant projection methods were developed under special assumptions such as sparsity; see (Bolduc et al. 2017; Kyrillidis et al. 2013).

Our approach is described in Algorithm 2. The main idea is to project the spectrum vector of the input matrix onto the probability simplex with respect to the Euclidean norm, which is a well-studied and efficient process (Chen and Ye 2011; Wang and Carreiraperpinan 2013).

Algorithm 2: Projection onto the set of density matrices
1: Input: a $d \times d$ Hermitian matrix $A$ (no matter PSD or not)
2: Decompose $A$ into its diagonal form $A = UQU^\dagger$
3: Let vector $q = (Q_{1,1}, \ldots, Q_{d,d})^T$
4: Project vector $q$ onto the $d$-dimensional probability simplex $\Delta_d$ and get $\lambda = (\lambda_1, \ldots, \lambda_d)^T$, that is, $\lambda = \arg\min_{x \in \Delta_d} \|x - q\|_2$
5: Let diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$
6: return $A' = U\Lambda U^\dagger$

The soundness of Algorithm 2 is stated in the following theorem.

Theorem 4. Suppose $A$ is an arbitrary $d \times d$ Hermitian matrix. Then the matrix $A'$ produced by Algorithm 2 is a density matrix, and it satisfies
$$\|A - A'\|_F \le \|A - \rho\|_F$$
for any $d \times d$ density matrix $\rho$.

Our algorithm is computationally efficient. Since the process in step 4 of projecting a vector onto the probability simplex takes $O(d \log d)$ time, the major part of the computational cost comes from the spectral decomposition in step 2 and the matrix multiplication in step 6. We think this problem is of independent interest, and our method could be widely used as a subroutine in many other problems in quantum computing.

With the projection method we propose above, we can design a projected online gradient descent algorithm for online quantum state learning, which is depicted in Algorithm 3.

Algorithm 3: OGD method for online quantum state learning
1: Input: $T$, $\omega_1 = I_d / d$, step sizes $\{\eta_t\}$
2: for $t = 1$ to $T$ do
3:   Predict $\omega_t$ and observe the measurement $E_t$ as well as the loss function $\ell_t$.
4:   Update and project: $\omega_{t+1} = \Pi_{\mathcal{K}}\left(\omega_t - \eta_t \ell_t'(\mathrm{Tr}(E_t \omega_t)) E_t\right)$.
5: end for

By adapting the standard technique for analyzing the OGD method (Hazan 2016) to our setting of learning quantum states, Algorithm 3 achieves an $O(\sqrt{T})$ regret bound, which is stated formally in Theorem 5.

Theorem 5. By setting $\eta_t = \eta = \frac{1}{Ld}\sqrt{\frac{d-1}{T}}$, the regret of Algorithm 3 is bounded as
$$\mathrm{regret}_T \le L\sqrt{(d-1)T}.$$
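A minimal NumPy sketch of Algorithm 2 and the OGD update of Algorithm 3 might look as follows; the simplex projection uses the sort-and-threshold scheme of (Wang and Carreiraperpinan 2013), and the function names are our own:

```python
import numpy as np

def project_to_simplex(q):
    """Euclidean projection of a real vector onto the probability simplex
    via sort-and-threshold (Wang and Carreiraperpinan 2013)."""
    u = np.sort(q)[::-1]                      # sort descending
    css = np.cumsum(u)
    j = np.arange(1, len(q) + 1)
    rho = np.nonzero(u + (1.0 - css) / j > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)      # threshold to subtract
    return np.maximum(q - tau, 0.0)

def project_to_density_matrices(A):
    """Algorithm 2: diagonalize A, project its spectrum onto the simplex,
    and reassemble in the same eigenbasis."""
    q, U = np.linalg.eigh(A)                  # A = U diag(q) U^dagger
    lam = project_to_simplex(q)
    return (U * lam) @ U.conj().T             # U diag(lam) U^dagger

def ogd_step(omega, E, b, eta):
    """One update of Algorithm 3 with the L1 loss."""
    x = float(np.real(np.trace(E @ omega)))
    grad = np.sign(x - b) * E                 # ell_t'(x) * E_t
    return project_to_density_matrices(omega - eta * grad)
```

The returned matrix of `project_to_density_matrices` is Hermitian, PSD, and has trace 1 by construction, since the projected spectrum lies on the probability simplex.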
Closed-form Solution to the RFTL Algorithm

The basic idea of the RFTL method due to Aaronson et al. (2018) is to predict a density matrix that minimizes a linear loss with an additional negative von Neumann entropy regularization. Specifically, in round $t$ the algorithm predicts
$$\omega_t := \arg\min_{\omega \in \mathcal{K}} \left( \eta \sum_{s=1}^{t-1} \mathrm{Tr}(\nabla_s \omega) + \sum_{i=1}^{2^n} \lambda_i(\omega) \log \lambda_i(\omega) \right). \quad (3)$$
Aaronson et al. (2018) showed that the regret of the RFTL algorithm can be upper-bounded by $2L\sqrt{(2\log 2)nT}$.

Although Aaronson et al. (2018) stated that the mathematical program in (3) is convex and can be solved efficiently, there is still work to do on this offline convex optimization problem to provide an end-to-end solution to the RFTL method. First, it is a constrained optimization problem, and the feasible set is the set of density matrices; to use typical convex optimization algorithms such as projected gradient descent, we would have to design a subroutine for projecting a matrix onto the set of density matrices, as we just did. Second, the gradient of the objective function in (3) at a density matrix $\omega$ is $\eta \sum_{s=1}^{t-1} \nabla_s + I + \log \omega$, whose norm is unbounded, since $\omega$ is an arbitrary density matrix and its eigenvalues can be arbitrarily close to 0. This is inconsistent with the bounded-gradient assumption (Lipschitz condition) of some convex optimization algorithms and their analysis.³ Third, generic convex optimization algorithms are often iterative processes: only an approximate solution is found after finitely many steps, and the optimal solution cannot be guaranteed. There is a theoretical gap between an approximate solution and the optimality assumption in the analysis of the RFTL method, not to mention that it may take much time to find an acceptable solution.

³ Some recent work pointed out that the bounded-gradient assumption is unnecessary in some special cases in stochastic programming (Nguyen et al. 2018). We thank the reviewer for providing this reference.

Fortunately, we can circumvent all these issues by providing a closed-form solution to the convex optimization in (3), namely
$$\omega_t := \frac{\exp\left(-\eta \sum_{s=1}^{t-1} \nabla_s\right)}{\mathrm{Tr}\left(\exp\left(-\eta \sum_{s=1}^{t-1} \nabla_s\right)\right)}. \quad (4)$$
We formally state the result in the following theorem.

Theorem 6. Suppose $\nabla_s$ are Hermitian matrices for $s = 1, \ldots, t-1$, and $\mathcal{K}$ is the set of density matrices. Then $\omega_t$ given in equation (4) is the optimal solution of the mathematical programming problem in (3).

Our result provides an efficient, accurate, and completely executable solution to the RFTL algorithm. Our solution has several advantages over using a generic offline convex optimization oracle to solve (3). First, it is not iterative, and it produces an exact optimal solution, while generic iterative optimization algorithms give no such guarantee. Second, it is computationally efficient, since it only takes one step of diagonal decomposition, while every step of a generic optimization algorithm may require such an operation. Besides, our solution does not depend on a projection subroutine.

Experiments

In addition to the theoretical analysis discussed above, in this section we evaluate the algorithms for online quantum state learning with a series of simulation experiments.

Experimental Settings

We implement and compare the three algorithms we discussed: the FTPL method, the OGD method, and the RFTL method with our closed-form solution.

In these experiments, we consider a typical and reasonable setting described as follows. First, it is a realizable setting; that is, there is an underlying unknown quantum state $\rho$, pure or mixed, to be learned. The loss functions $\ell_t$ are determined by the measurements applied to $\rho$, although the measurements may be chosen randomly or adversarially. Specifically, we consider the absolute loss $\ell_t(z) = |z - b_t|$, in which $b_t$ is a Bernoulli random variable with expectation $\mathbb{E}[b_t] = \mathrm{Tr}(E_t \rho)$, corresponding to the result of measuring $\rho$ with $E_t$.

Evaluating online learning algorithms without live data is a challenge, especially in adversarial settings, since it is quite difficult to find the worst case that fools an algorithm. For this purpose, we propose an adaptively adversarial data generation policy to select $E_t$. Specifically, for a given algorithm and a time step $t$, the adversary can predict the output of the algorithm, denoted as $\hat{\omega}_t$, since the adversary can access the code of the algorithm. The adversary then chooses a two-outcome measurement $E_t$ that maximizes the difference between $\mathrm{Tr}(E_t \hat{\omega}_t)$ and $\mathrm{Tr}(E_t \rho)$, that is, $E_t = \arg\max_E |\mathrm{Tr}(E(\hat{\omega}_t - \rho))|$. We call such a data generation policy an immediate-punishment adversary. In addition, we also consider the stochastic data generation policy, in which $E_t$ is generated randomly; this is the most natural way to evaluate the algorithms. In short, we implement four different data generation policies: the stochastic policy and the immediate-punishment adversarial policies against the FTPL method, the OGD method, and the RFTL method, respectively.

Regarding other experimental parameters, we take the number of qubits $n = 4$, so the dimension of the density matrices is $16 \times 16$. For each experiment, we run 100 trials with randomly generated target states $\rho$ and report the average curves.
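The following sketch shows how one trial of such a simulation might be wired up, combining the closed-form RFTL prediction from equation (4) with an immediate-punishment adversary. For the adversary we use the fact that, over measurements $0 \preceq E \preceq I$, $|\mathrm{Tr}(E(\hat{\omega}_t - \rho))|$ is maximized by a projector onto the positive (or negative) eigenspace of $\hat{\omega}_t - \rho$. The harness, its names, and the SciPy dependency for the matrix exponential are our assumptions, not the paper's code:

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

def rftl_predict(grad_sum, eta):
    """Closed-form RFTL prediction from equation (4)."""
    M = expm(-eta * grad_sum)
    return M / np.real(np.trace(M))

def punishing_measurement(omega_hat, rho):
    """Immediate-punishment adversary: over 0 <= E <= I, the gap
    |Tr(E (omega_hat - rho))| is maximized by a projector onto the
    positive or the negative eigenspace; return whichever is larger."""
    gap = omega_hat - rho
    evals, evecs = np.linalg.eigh(gap)
    pos = evecs[:, evals > 0] @ evecs[:, evals > 0].conj().T
    neg = evecs[:, evals < 0] @ evecs[:, evals < 0].conj().T
    if abs(np.trace(pos @ gap)) >= abs(np.trace(neg @ gap)):
        return pos
    return neg

def run_trial(rho, T, eta, rng):
    """One simulated trial of the RFTL method against its own adversary."""
    d = rho.shape[0]
    grad_sum = np.zeros((d, d), dtype=complex)
    total_loss = 0.0
    for _ in range(T):
        omega = rftl_predict(grad_sum, eta)
        E = punishing_measurement(omega, rho)
        b = float(rng.random() < np.real(np.trace(E @ rho)))  # Bernoulli b_t
        x = float(np.real(np.trace(E @ omega)))
        total_loss += abs(x - b)
        grad_sum = grad_sum + np.sign(x - b) * E  # gradient of the L1 loss
    return total_loss
```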
Experimental Results

The experimental results are illustrated in Figure 1. Each sub-figure corresponds to a specific experimental setting, labeled with its sub-caption. In each sub-caption, the first term indicates the type of the underlying quantum state, "pure" for a pure state and "mixed" for a mixed state. The second term denotes the data generation policy: "stochastic" for the stochastic data policy, "FTPL" for the adversarial data policy against the FTPL method, and so on.

[Figure 1 appears here: eight panels plotting the average regret (vertical axis) against the time horizon T (horizontal axis) for the RFTL, OGD, and FTPL methods. Panels: (a) pure, stochastic; (b) pure, FTPL; (c) pure, OGD; (d) pure, RFTL; (e) mixed, stochastic; (f) mixed, FTPL; (g) mixed, OGD; (h) mixed, RFTL.]

Figure 1: The experimental results with the metric of average regret. Each sub-figure corresponds to an experimental setting: the first term is the type of the target state, and the second term is the data generation policy (which algorithm the adversary targets).
We report the experimental results in terms of the average regret, that is, $\mathrm{regret}_T / T$, which should converge to 0 for algorithms with a sublinear regret. In each sub-figure, the horizontal axis corresponds to the time horizon $T$ and the vertical axis denotes the average regret. A red curve is for the FTPL method, a blue curve is for the OGD method, and a green curve is for the RFTL method.

From the experimental results, we can observe that our FTPL method outperforms the RFTL approach in almost all settings, not to mention its unique merit of predicting pure states. Specifically, when the target state is pure, the FTPL method performs much better than the RFTL method, even when the data policy is adversarial against it. What is surprising is that the FTPL method performs best even when the target state is mixed, as long as the adversary is not against it.

Besides, the performance of the OGD method is quite good and stable, even when the adversary is against it. It also outperforms the RFTL method in almost all settings, which demonstrates the value of the simple and efficient OGD method in practice.

Summary and Future Work

In this paper, we revisit the problem of online quantum state learning from the perspective of online convex optimization.

First, we propose a Follow-the-Perturbed-Leader algorithm that is guaranteed to predict pure states, which is of special value in quantum state learning. Our analysis shows that the regret with respect to the expected prediction is bounded as $O(\sqrt{T})$. We further adapt the FTPL method to a typical and reasonable setting with $L_1$ loss; in this case, the FTPL method can achieve an $O(\sqrt{T})$ expected regret.

Second, we propose a simple and efficient online gradient descent algorithm for online quantum state learning, which can achieve an $O(\sqrt{T})$ regret bound. The OGD method is based on an algorithm for projecting an arbitrary Hermitian matrix onto the set of density matrices, which could be widely used as a subroutine in many other problems in quantum computing.

Third, we give a closed-form solution to the existing RFTL method. Our result provides an efficient, accurate, and completely executable solution to the RFTL method.

In addition to the theoretical analysis, we also conduct a series of simulation experiments to evaluate the algorithms. The experimental results show that our FTPL and OGD methods outperform the RFTL method in almost all settings, not to mention the unique merit of the FTPL method of predicting pure states. These results demonstrate the value of our methods in practice.

As for future work, there are several challenges and interesting problems in this topic. Compared with the RFTL method, although our methods have some merits such as pure state prediction, the regret bounds in our results are worse in terms of the number of qubits $n$, since $d = 2^n$. We think this is due to loose analysis; a tighter analysis of the FTPL method and the OGD method in terms of the number of qubits $n$ remains a challenge. Besides, finding lower bounds in different settings would also help in understanding the problem of online quantum state learning.
Acknowledgments

This work was supported in part by the National Natural Science Foundation of China Grants No. 61433014, 61832003, 61872334, 61761136014, the K.C. Wong Education Foundation, and the Strategic Priority Research Program of Chinese Academy of Sciences Grant No. XDB28000000.

References

Aaronson, S.; Chen, X.; Hazan, E.; Kale, S.; and Nayak, A. 2018. Online learning of quantum states. In Advances in Neural Information Processing Systems, 8962–8972.

Aaronson, S. 2005. Limitations of quantum advice and one-way communication. Theory of Computing 1(1):1–28.

Aaronson, S. 2007. The learnability of quantum states. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 463(2088):3089–3114.

Aaronson, S. 2018. Shadow tomography of quantum states. In Symposium on the Theory of Computing, 325–338.

Abernethy, J. D.; Hazan, E.; and Rakhlin, A. 2008. Competing in the dark: An efficient algorithm for bandit linear optimization. In 21st Annual Conference on Learning Theory - COLT 2008, Helsinki, Finland, July 9-12, 2008, 263–274.

Ambainis, A.; Nayak, A.; Ta-Shma, A.; and Vazirani, U. V. 2002. Dense quantum coding and quantum finite automata. J. ACM 49(4):496–511.

Benedetti, M.; Grant, E.; Wossnig, L.; and Severini, S. 2019. Adversarial quantum circuit learning for pure state approximation. New Journal of Physics 21(4):043023.

Biamonte, J.; Wittek, P.; Pancotti, N.; Rebentrost, P.; Wiebe, N.; and Lloyd, S. 2017. Quantum machine learning. Nature 549(7671):195.

Bolduc, E.; Knee, G. C.; Gauger, E. M.; and Leach, J. 2017. Projected gradient descent algorithms for quantum state tomography. npj Quantum Information 3(1):44.

Carleo, G., and Troyer, M. 2017. Solving the quantum many-body problem with artificial neural networks. Science 355(6325):602–606.

Chen, Y., and Ye, X. 2011. Projection onto a simplex. arXiv: Optimization and Control.

Gonçalves, D. S.; Gomes-Ruggiero, M. A.; and Lavor, C. 2016. A projected gradient method for optimization over density matrices. Optimization Methods and Software 31(2):328–341.

Haah, J.; Harrow, A. W.; Ji, Z.; Wu, X.; and Yu, N. 2016. Sample-optimal tomography of quantum states. In Symposium on the Theory of Computing, 913–925.

Harrow, A. W.; Hassidim, A.; and Lloyd, S. 2009. Quantum algorithm for linear systems of equations. Physical Review Letters 103(15):150502.

Hazan, E.; Agarwal, A.; and Kale, S. 2007. Logarithmic regret algorithms for online convex optimization. Machine Learning 69(2-3):169–192.

Hazan, E. 2016. Introduction to online convex optimization. Foundations and Trends in Optimization 2(3-4):157–325.

Kalai, A., and Vempala, S. 2005. Efficient algorithms for online decision problems. Journal of Computer and System Sciences 71(3):291–307.

Kyrillidis, A.; Becker, S.; Cevher, V.; and Koch, C. 2013. Sparse projections onto the simplex. In International Conference on Machine Learning, 235–243.

Lee, S. M.; Lee, J.; and Bang, J. 2018. Learning unknown pure quantum states. Physical Review A 98(5).

Lloyd, S.; Mohseni, M.; and Rebentrost, P. 2014. Quantum principal component analysis. Nature Physics 10(9):631.

Nayak, A. 1999. Optimal lower bounds for quantum automata and random access codes. In 40th Annual Symposium on Foundations of Computer Science, FOCS '99, 17-18 October, 1999, New York, NY, USA, 369–377.

Neu, G., and Bartók, G. 2016. Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits. The Journal of Machine Learning Research 17(1):5355–5375.

Nguyen, L. M.; Nguyen, P. H.; van Dijk, M.; Richtárik, P.; Scheinberg, K.; and Takáč, M. 2018. SGD and Hogwild! convergence without the bounded gradients assumption. arXiv preprint arXiv:1802.03801.

Nielsen, M. A., and Chuang, I. 2002. Quantum Computation and Quantum Information.

Odonnell, R., and Wright, J. 2016. Efficient quantum tomography II. In Symposium on the Theory of Computing, 899–912.

Rakhlin, A.; Sridharan, K.; and Tewari, A. 2015. Online learning via sequential complexities. J. Mach. Learn. Res. 16:155–186.

Shalev-Shwartz, S., and Singer, Y. 2007. A primal-dual perspective of online learning algorithms. Machine Learning 69(2-3):115–142.

Vogel, K., and Risken, H. 1989. Determination of quasiprobability distributions in terms of probability distributions for the rotated quadrature phase. Physical Review A 40(5):2847.

Wang, W., and Carreiraperpinan, M. A. 2013. Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application. arXiv: Learning.

Zinkevich, M. 2003. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), 928–936.