arXiv:2005.10189v2 [quant-ph] 22 Feb 2021

Error mitigation with Clifford quantum-circuit data

Piotr Czarnik, Andrew Arrasmith, Patrick J. Coles, and Lukasz Cincio Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.

Achieving near-term quantum advantage will require accurate estimation of quantum observables despite significant hardware noise. For this purpose, we propose a novel, scalable error-mitigation method that applies to gate-based quantum computers. The method generates training data $\{X_i^{\rm noisy}, X_i^{\rm exact}\}$ via quantum circuits composed largely of Clifford gates, which can be efficiently simulated classically, where $X_i^{\rm noisy}$ and $X_i^{\rm exact}$ are noisy and noiseless observables, respectively. Fitting a linear ansatz to this data then allows for the prediction of noise-free observables for arbitrary circuits. We analyze the performance of our method versus the number of qubits, circuit depth, and number of non-Clifford gates. We obtain an order-of-magnitude error reduction for a ground-state energy problem on 16 qubits in an IBMQ quantum computer and on a 64-qubit noisy simulator.

I. INTRODUCTION

Currently, one of the great unsolved technological questions is whether near-term quantum computers will be useful for practical applications. These noisy intermediate-scale quantum (NISQ) devices do not have enough qubits or high enough gate fidelities for fault-tolerant quantum computing [1]. Consequently, any observable measured on a NISQ device will have limited accuracy. However, candidate applications such as quantum chemistry require chemical accuracy to beat classical methods [2, 3]. Similarly, quantum approximate optimization has the potential to beat classical optimization when high accuracy is achieved [4–6].

Hence, it is widely regarded that near-term quantum advantage will only be achieved through error mitigation. Error mitigation (EM) is broadly defined as methods that reduce the impact of noise, rather than directly correct it. EM includes efforts to optimize quantum circuits with compiling and machine learning [7–9]. It also includes variational quantum algorithms [10–14], some of which can be used to remove the effects of incoherent noise [15–19].

A prominent EM approach is to perform classical post-processing of observable expectation values. Perhaps the most widely used example is state-of-the-art zero-noise extrapolation (ZNE), which has shown great promise [20, 21]. ZNE involves collecting data at various levels of noise, achieved by stretching gate times or inserting identities, and using this noisy data to extrapolate an observable's expectation value to the zero-noise limit [22–26]. It has been successfully employed to correct ground-state energies for problem sizes up to 4 qubits [20, 21, 27]. Among other approaches are quasi-probabilistic error mitigation [22] and numerous application-specific approaches leveraging symmetries and/or post-selection techniques to mitigate errors [28–31].

FIG. 1. Our proposed error mitigation method. For a set of states that are classically simulable, one generates noisy and corresponding noise-free data on a quantum computer and classical computer, respectively. One learns to correct this training data by fitting the parameters of an ansatz. Finally, one uses this ansatz with the fitted parameters to predict noise-free observables for arbitrary quantum states.

A crucial requirement of any EM method is scalability. While it is relatively easy to develop EM methods for small qubit systems, EM methods that work effectively at the quantum supremacy scale (> 50 qubits) are much more challenging to construct. Even methods that are in principle scalable may not actually scale well in practice.

This work aims to address this issue by proposing a novel, scalable EM method that is applicable to all gate-based quantum computers. The basic idea is shown in Fig. 1. First we generate training data, of the form $\{X_i^{\rm noisy}, X_i^{\rm exact}\}$, where $X_i^{\rm noisy}$ and $X_i^{\rm exact}$ are the noisy and noiseless versions of an observable's expectation value of interest. The noisy values are obtained directly from the quantum computer, while the noiseless values are simulated on a classical computer. Scalability is achieved by generating the training data from quantum circuits composed largely of Clifford gates (gates that map Pauli operators to Pauli operators), and hence these circuits are efficiently classically simulable. Next we fit the training data with a model, and finally we use the fitted model to predict the noise-free observable.

Our method is conceptually simple and could be refined with sophisticated model fitting methods offered by modern machine learning [32]. Nevertheless, even with simple linear-regression-based fitting, our method performs extremely well in practice. We consider the task of estimating the ground-state energy of an Ising spin chain by variationally training the Quantum Alternating Operator Ansatz (QAOA) [4, 5]. For this task, our method reduces the error by an order of magnitude for both a 16-qubit problem solved on an IBMQ quantum computer and a 64-qubit problem solved on a noisy simulator, though we expect that this improvement will generically vary depending upon the task considered. We also demonstrate the utility of our method for non-variational algorithms such as quantum phase estimation. Finally, we find that our method appears to perform better than ZNE for the larger scale problems we consider.

FIG. 2. Correcting the ground-state energy of the 16-qubit Ising model with our CDR method. The ground states were prepared by optimizing a $p = 2$ QAOA circuit on IBM's Almaden quantum processor. (a) The relative energy error is plotted for the noisy (red) and corrected (blue) results for several optimization instances. (b) The inferred energy per qubit, $E/Q$, is plotted, along with the exact values (black). The error bars are explained in the text. Here the mean relative energy error is 0.409 for the noisy estimates and 0.028 for the corrected ones.

II. OUR METHOD

We refer to our method as Clifford Data Regression (CDR). Let $X$ be the observable of interest whose expectation value $X_\psi^{\rm exact} = \langle\psi|X|\psi\rangle$ one wishes to estimate for a given state $|\psi\rangle$. Let $X_\psi^{\rm noisy}$ be the noisy version of this expectation value obtained from the quantum computer. To remove the noise from this expectation value, the CDR method involves the following steps:

1. One chooses a set of states $S_\psi = \{|\phi_i\rangle\}$ that will be used to construct the training data $T_\psi$. Each $|\phi_i\rangle$ state must satisfy the property that it is efficient to classically compute the expectation value of $X$ for this state. The CDR method ensures this property by constructing each $|\phi_i\rangle$ state from a quantum circuit composed largely of Clifford gates. We denote the number of non-Clifford gates used to prepare each $|\phi_i\rangle$ as $N$, which (as shown below) plays the role of a refinement parameter.

2. For each $|\phi_i\rangle \in S_\psi$, one evaluates $X_{\phi_i}^{\rm exact} = \langle\phi_i|X|\phi_i\rangle$ using a classical computer. One also evaluates the noisy version of this expectation value, $X_{\phi_i}^{\rm noisy}$, using the quantum computer of interest. These two quantities are incorporated into the training data set $T_\psi = \{X_{\phi_i}^{\rm noisy}, X_{\phi_i}^{\rm exact}\}$.

3. One constructs an ansatz or model for the noise-free value of the observable in the vicinity of $|\psi\rangle$,

$$X_\psi^{\rm exact} = f(X_\psi^{\rm noisy}, a), \tag{1}$$

where $a$ are free parameters. The parameters can be found either by regression or machine-learning methods. In this article we use least-squares regression, with a linear ansatz (see Appendix A for discussion):

$$f(X_\psi^{\rm noisy}, a) = a_1 X_\psi^{\rm noisy} + a_2, \tag{2}$$

obtaining parameters $a_1$ and $a_2$ by minimizing

$$C = \sum_{\phi_i \in S_\psi} \left[ X_{\phi_i}^{\rm exact} - \left(a_1 X_{\phi_i}^{\rm noisy} + a_2\right) \right]^2. \tag{3}$$

4. One uses the ansatz $f(X_\psi^{\rm noisy}, a)$ with the fitted parameters to correct $X_\psi^{\rm noisy}$.

We now discuss several strategies for how to choose $S_\psi$ in Step 1. In general, we have found that it is advantageous to tailor the set $S_\psi$ to the specific state $|\psi\rangle$; in other words, to bias the training data towards the target state of interest. A simple strategy for this purpose is to generate the state-preparation circuits for the $|\phi_i\rangle$ states by replacing a subset of the gates in the circuit that prepares $|\psi\rangle$ with Clifford gates that are close in distance to the original gates. An alternative strategy, which is used in our implementations, is to employ Markov Chain Monte Carlo (MCMC) to generate classically-simulable states $|\phi_i\rangle$ based on the values of their observables. (See Appendix B for details.)

In general, one may base the MCMC sampling on the closeness of $X_{\phi_i}^{\rm noisy}$ to $X_\psi^{\rm noisy}$. However, a potentially more efficient strategy exists for the specific application of variational quantum algorithms [10, 11, 33]. When $|\psi\rangle$ is a state that is meant to optimize a variational cost function (e.g., the energy of a given Hamiltonian), then one can instead base the sampling on minimizing this cost function. In this way, one can employ MCMC to obtain classically-simulable states that are close to extremizing the variational cost function.
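Steps 3 and 4 of the method can be illustrated with a minimal NumPy sketch. It fits the linear ansatz of Eq. (2) by least squares and applies it to a target value. The function names `fit_cdr_linear` and `correct`, and the synthetic training data (an attenuation of 0.6 plus an offset of 0.1), are assumptions made for illustration only and are not taken from the paper's implementation.

```python
import numpy as np

def fit_cdr_linear(x_noisy, x_exact):
    """Least-squares fit of x_exact ~ a1 * x_noisy + a2, cf. Eqs. (2)-(3)."""
    x_noisy = np.asarray(x_noisy, dtype=float)
    x_exact = np.asarray(x_exact, dtype=float)
    # Design matrix: one column for the slope a1, one column of ones for a2.
    design = np.column_stack([x_noisy, np.ones_like(x_noisy)])
    (a1, a2), *_ = np.linalg.lstsq(design, x_exact, rcond=None)
    residuals = x_exact - (a1 * x_noisy + a2)
    cost = float(np.sum(residuals**2))  # value of the cost function C
    return float(a1), float(a2), cost

def correct(x_noisy_target, a1, a2):
    """Apply the fitted ansatz to the noisy value of the target state."""
    return a1 * x_noisy_target + a2

# Synthetic training data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
exact_train = rng.uniform(-2.2, -1.8, size=50)
noisy_train = 0.6 * exact_train + 0.1  # toy noise: attenuation plus offset

a1, a2, cost = fit_cdr_linear(noisy_train, exact_train)
mitigated = correct(0.6 * (-2.1) + 0.1, a1, a2)
```

Because the toy noise here is exactly linear, the fit recovers the inverse map and the cost vanishes up to rounding. On real data the residual cost is nonzero.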

III. NUMERICAL IMPLEMENTATIONS

A. Implementation for QAOA

A central application of error mitigation is to correct the energies of Hamiltonian eigenstates prepared on a quantum computer. Here we illustrate this application with the Quantum Alternating Operator Ansatz (QAOA) [4, 5], which can serve as an ansatz for Hamiltonian ground states. We consider the transverse Ising model, given by

$$H = -g \sum_j \sigma_X^j - \sum_{\langle j,j' \rangle} \sigma_Z^j \sigma_Z^{j'}, \tag{4}$$

where $\sigma_X$, $\sigma_Z$ are Pauli operators and $\langle j, j' \rangle$ denotes a sum over nearest neighbors. We study the case of $g = 2$ (belonging to a paramagnetic phase) with open boundary conditions for different numbers of qubits $Q$. To apply the QAOA, we write $H = H_1 + H_2$ with $H_2 = -g \sum_j \sigma_X^j$ and $H_1 = -\sum_{\langle j,j' \rangle} \sigma_Z^j \sigma_Z^{j'}$. Then the QAOA is

$$\prod_{j=p,p-1,\ldots,1} e^{i\beta_j H_2} e^{i\gamma_j H_1} \left(|+\rangle\right)^{\otimes Q}, \tag{5}$$

where $\beta_j$, $\gamma_j$ are variational parameters, $p$ is the number of ansatz layers, and $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. The exponentials of $H_2$ and $H_1$ can be easily decomposed into quantum circuits (see Appendix D for a circuit diagram).

For this implementation, we perform the optimization of the $\beta_j$, $\gamma_j$ parameters, and then correct the energies of low-energy local minima of this optimization. This correction involves correcting all $\langle \sigma_X^j \rangle$ and $\langle \sigma_Z^j \sigma_Z^{j'} \rangle$ terms associated with (4). (Note however that the same training set $S_\psi$, which is generated based on the total energy, is used to correct each term.) Figure 2 shows that our CDR method reduces the relative energy error by an order of magnitude on IBM's Almaden quantum processor [34]. We show error bars that reflect our confidence in our ability to correct the training data. Specifically, we take the error bars as three standard deviations, where the standard deviation is obtained directly from the cost function in (3) by dividing $C$ by the training size and then taking the square root (see Appendix C for more details).

To study the scaling behavior, we also implement this problem using a classical matrix-product-state [35] simulator that incorporates a noise model obtained from gate set tomography of IBM's Ourense quantum computer. Figure 3 presents the relative energy error, uncorrected (red) and corrected (blue), for different values of $N$, $p$, and $Q$. One can see that our CDR method results in between one and two orders of magnitude reduction in the error. Increasing the number of non-Clifford gates $N$ in the training data monotonically reduces the error, as shown in Fig. 3(a). This is expected because increasing $N$ allows the training set $S_\psi$ to become closer to the target state $|\psi\rangle$ of interest. Hence, our results show that $N$ is a refinement parameter, allowing one to obtain better error mitigation with the increased computational difficulty of simulating more non-Clifford gates. $N$ is limited to $\lesssim 50$ for state-of-the-art classical simulators, but our results show that $N \lesssim 30$ already leads to orders-of-magnitude reduction in the error.

FIG. 3. Correcting the ground-state energy of the Ising model with our CDR method. The ground states were prepared by optimizing a QAOA circuit on a noisy simulator. The mean (circles) and maximal (squares) relative energy error is plotted for the noisy (red) and corrected (blue) results. (a) The results for $Q = 32$, $p = 2$ plotted versus $N$. (b) The results for $Q = 16$, $N = 28$ plotted versus $p$. (c) The results for $p = 2$, $N = 28$ plotted versus $Q$. (d) The results for $p = 3$, $N = 28$ plotted versus $Q$.

Figures 3(b), (c) and (d) show that correcting errors with CDR becomes more challenging with deeper circuits and larger qubit counts, respectively. However, the rate of error growth with $p$ and $Q$ is not very sharp, and we still obtain large error reductions for either $p = 4$ layers or 64 qubits. It is worth noting that 64 qubits is considered to be in the regime where quantum advantage might be demonstrated. Furthermore, the $p = 4$ ansatz has more two-qubit entangling gates per pair of neighboring qubits than the ansatzes used to demonstrate quantum supremacy [36].

To obtain substantial improvement for deeper $p > 4$ we consider the case of noise rates modestly reduced in comparison to current devices. To investigate such a scenario we construct a noise model by taking gate process matrices $\mathcal{E}$ as convex combinations of process matrices $\mathcal{E}^{\rm noisy}$ of the realistic IBM's Ourense noise model [9] and process matrices of the noiseless case $\mathcal{E}^{\rm exact}$:

$$\mathcal{E} = \alpha\, \mathcal{E}^{\rm noisy} + (1 - \alpha)\, \mathcal{E}^{\rm exact}, \qquad \alpha = \frac{4}{p}. \tag{6}$$

For this noise model we optimize the QAOA for $p = 4$–$24$ and $Q = 8$ using a full density-matrix noisy simulator. Therefore, for $p = 4$ we recover the realistic noise model, and for the deepest simulated $p = 24$ we reduce the noise rates by a factor of 6, which is insufficient to enable fault-tolerant quantum computing [37]. Again, we mitigate energies of the optimization local minima using $N = 28$ near-Clifford circuits. We gather results in Fig. 4. The mitigated energies show an order of magnitude improvement even for the deepest $p = 24$. Furthermore, we observe that the improvement remains approximately constant for the numbers of ansatz layers $p$ investigated here.

FIG. 4. Correcting energies of deep QAOA circuits. Here we show mean and maximal relative energy errors for 30 local minima of the transverse Ising model energy minimization plotted versus $p$. The optimization was performed with $Q = 8$ using a full density-matrix noisy simulator. The red and blue curves correspond respectively to unmitigated and CDR mitigated ($N = 28$) results. The noise model used here has noise rates inversely proportional to $p$ (6) and is equivalent to the realistic IBM's Ourense noise model (used in Fig. 5) for $p = 4$.

To give further insight into the scaling with $Q$, we show results for individual optimization instances in Fig. 5. Interestingly, Fig. 5 shows that the CDR method is capable of removing noise-induced fluctuations in the energy, i.e., very different noisy energy values are correctly mapped to the same corrected ground-state energy values. For large $Q$, some remnant of these fluctuations still lingers in the corrected energies, leading to the worse performance in Fig. 3(c), although it suggests that employing a more detailed model in (1) accounting for additional features in the training data could further improve the accuracy of the corrected energy.

FIG. 5. Inferred energy per qubit, $E/Q$, for minima obtained from various optimization instances of the QAOA circuit for the Ising model on a noisy simulator. The noisy, exact, and corrected results correspond to the red, blue, and black curves, respectively. The results were obtained with $p = 2$ and $N = 28$ for various qubit numbers: (a) $Q = 8$, (b) $Q = 16$, (c) $Q = 32$, and (d) $Q = 64$. The relative errors for these minima were analyzed in Fig. 3(c). The error bars are explained in the text.

B. ZNE Implementation

In addition to applying our CDR method for QAOA, we also examined the performance of the ZNE method under the same conditions as those in Fig. 2, i.e., for the $Q = 16$, $p = 2$ case with the same circuits on IBM's Almaden quantum processor. Unfortunately, for this problem instance we were unable to attain a meaningful correction with ZNE. This difficulty was likely largely due to the size and depth of the circuits we considered, as ZNE depends on the base circuit not being too noisy to start with. For more details, see Appendix F.

C. Implementation for phase estimation

Let us now illustrate how our method applies to quantum phase estimation. We consider a near-term version of phase estimation that only requires a single ancilla [38]. For an input state $|\chi\rangle$, this algorithm estimates $\langle\chi|e^{-iHt}|\chi\rangle = \langle e^{-iHt}\rangle$ on a quantum computer for a series of times $t$ and then classically Fourier transforms the time series to estimate eigenvalues of $H$. The expectation value $\langle e^{-iHt}\rangle$ is obtained by measuring $\langle \sigma_X + i\sigma_Y \rangle$ on an ancilla qubit after applying the controlled $e^{-iHt}$ gate to the state $|\chi\rangle$ [38].

We performed error mitigation of $\langle e^{-i\tilde{H}t}\rangle$ for a three-qubit Max-Cut Hamiltonian $\tilde{H} = \frac{1}{6}\left(\sigma_Z^1\sigma_Z^2 + \sigma_Z^1\sigma_Z^3 + \sigma_Z^2\sigma_Z^3\right)$, $|\chi\rangle$ being a random pure state, and times $t \in \{1, 2, \ldots, 136\}$. We employed the same noisy simulator as that used for the QAOA implementation in Figs. 3-5. Figure 6 shows the results. One can see in Fig. 6(a) that the decomposition of $|\chi\rangle$ in the energy eigenbasis is significantly improved after correction with the CDR method. Indeed, Fig. 6(b) shows that the relative error is reduced by at least a factor of three by CDR.

FIG. 6. Correcting the results of quantum phase estimation on a noisy simulator with CDR. (a) Decomposition of a random pure state in the binned eigenbasis of $\tilde{H}$ defined in the text, without (red) and with (blue) correction. The probability distribution $q$ is shown as a function of the binned energy eigenvalue $\tilde{\lambda}$. The inset shows its error. The random state whose decomposition is shown is the one for which CDR performed most poorly. (b) Relative error of this decomposition over 20 instances (different random pure states), ordered by increasing corrected error. For all instances, we employed training data constructed from quantum circuits with 6 of the total 12 non-Clifford gates replaced by Clifford gates (see Appendix E for a circuit diagram).

IV. CONCLUSIONS

With quantum supremacy recently demonstrated [36], the next milestone in quantum computing may be to achieve quantum advantage for practical applications such as chemistry or optimization. These applications will require accurate estimation of observables on noisy quantum hardware, and hence large-scale error mitigation will be necessary. In this work, we proposed a scalable method to significantly reduce the errors (potentially by orders of magnitude) of quantum observables. The method, called Clifford Data Regression (CDR), learns how to correct errors on a training data set. Constructing this training set exploits the classical simulability of quantum circuits composed largely of Clifford gates. This allows our method to scale to large problem sizes. We obtained meaningful corrections with CDR for a 64-qubit ground-state-energy problem. Additionally, we found that the circuit depth our method can mitigate will be notably increased with modest reductions in hardware noise rates, reaching 24 rounds of 8-qubit QAOA.

Further testing our method on real quantum hardware will be important, as will further study of the scaling of our method's performance with circuit depth and problem size. In addition, refining our method with ansatzes that are more sophisticated than our linear ansatz (e.g., using neural networks [39] or other machine-learning approaches) could be fruitful. Finally, it would be interesting to extend the applicability of our approach to analog quantum simulators. Furthermore, we note that a related, but distinct, approach using Clifford circuits to learn quasi-probabilistic error mitigation was recently proposed [40].

V. ACKNOWLEDGEMENTS

We thank Kipton Barros, Kenneth Rudinger, and Mohan Sarovar for helpful conversations. We thank IBM for providing access to their quantum computers. The views expressed are those of the authors and do not reflect those of IBM. All authors acknowledge support from LANL's Laboratory Directed Research and Development (LDRD) program. PJC acknowledges support from the LANL ASC Beyond Moore's Law project. This work was also supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research, under the Quantum Computing Application Teams program. This research used quantum computing resources provided by the LANL Institutional Computing Program, which is supported by the U.S. Department of Energy National Nuclear Security Administration under Contract No. 89233218CNA000001.

Appendix A: Motivation for linear ansatz

In this work, we assumed essentially the simplest ansatz possible for correcting noisy observables, given by the linear relationship in Eq. (2). While more complicated ansatzes may improve the performance of CDR, we note here that there does exist theoretical motivation for a linear ansatz.

Consider a depolarizing noise channel that acts globally on all $Q$ qubits. The action of this channel takes the form:

$$\rho \to (1 - p)\rho + p\,\mathbb{1}/d, \tag{A1}$$

where $d$ is the Hilbert-space dimension and $0 \leq p \leq 1$. Suppose that this channel acts repeatedly at $m$ different times throughout the quantum circuit of interest. Then, for the circuit that prepares the noise-free state $|\psi\rangle$, the actual final state will be:

$$\rho_\psi = (1 - p)^m |\psi\rangle\langle\psi| + \left(1 - (1 - p)^m\right)\mathbb{1}/d. \tag{A2}$$

Similarly, for a circuit that prepares a state $|\phi_i\rangle$ from the training set $S_\psi$, the actual final state will be:

$$\rho_{\phi_i} = (1 - p)^m |\phi_i\rangle\langle\phi_i| + \left(1 - (1 - p)^m\right)\mathbb{1}/d. \tag{A3}$$

For the observable of interest, $X$, the actual relationship between the noisy and noise-free values is given by

$$X_\psi^{\rm noisy} = {\rm Tr}(\rho_\psi X) = (1 - p)^m X_\psi^{\rm exact} + \left(1 - (1 - p)^m\right){\rm Tr}(X)/d, \tag{A4}$$

which is equivalent to

$$X_\psi^{\rm exact} = a_1 X_\psi^{\rm noisy} + a_2 \quad \text{with} \quad a_1 = \frac{1}{(1 - p)^m}, \qquad a_2 = -\frac{\left(1 - (1 - p)^m\right){\rm Tr}(X)}{d\,(1 - p)^m}. \tag{A5}$$

Hence we see that global depolarizing noise leads precisely to a linear relationship between noisy and noise-free expectation values.

Similarly the training data will take the form of:

$$\{X_{\phi_i}^{\rm noisy}, X_{\phi_i}^{\rm exact}\} = \{X_{\phi_i}^{\rm noisy}, a_1 X_{\phi_i}^{\rm noisy} + a_2\}, \tag{A6}$$

where $a_1$ and $a_2$ are independent of $i$ and are given by the formulas in (A5). Fitting a linear regression to the data in the training set will lead to a perfect fit, and the fitted parameters will have the values of $a_1$ and $a_2$ given in (A5). Hence, using our linear regression approach in this case will lead to exactly the right correction for $X_\psi^{\rm exact}$.

The above analysis shows that our linear ansatz will perfectly correct for global depolarizing noise. This analysis can be extended to a global version of the quantum erasure channel. More generally, this analysis applies to any channel that maps the input state to a convex combination of the input state and a fixed state, of the form:

$$\rho \to (1 - p)\rho + p\rho_0, \tag{A7}$$

where $\rho_0$ is a fixed state.

In addition, we remark that the linear ansatz will perfectly correct for certain types of measurement noise. This is because measurement noise typically can be thought of as noise acting on the local state just prior to a noise-free measurement. If this noise corresponds to depolarizing noise (e.g., for symmetric measurement noise) or more generally to the noise in Eq. (A7) (which allows for asymmetric measurement noise), then one can apply the analysis above to show that the linear ansatz will perfectly correct the noise.

While we do not expect the linear ansatz to correct for all types of noise, one can nevertheless replace the linear ansatz with a more complicated one, and further work is needed to explore the benefits of such (more complicated) ansatzes.

Appendix B: Generating the set $S_\psi$

We use a Markov Chain Monte Carlo (MCMC) [41] sampling technique in order to generate a set of classically simulable training states, $S_\psi$, for use in CDR. Starting from some initial simulable circuit, we build $S_\psi$ by making small update steps to our circuit. At each update step, we choose to either accept or reject the change. On accepting a new circuit it is added to $S_\psi$, and we look for updates starting from this new circuit. This process is repeated until we have generated as many circuits as desired for $S_\psi$.

In our applications, the initial point is chosen by finding a near-Clifford circuit with $N$ non-Clifford gates that is close to the circuit we wish to correct. To generate update steps, we randomly pick $n_p$ pairs of the circuit's $\sigma_Z$ rotations. Here, each pair consists of one rotation that has been replaced by a Clifford gate (specifically, $S^n$ for some integer $n$, where $S = R_Z(\pi/2) = e^{-i\sigma_Z \pi/4}$ is the $\pi/2$ rotation gate) and one that has not. We chose $n_p = 5$ in our implementations. For each pair we then replace the non-Clifford rotation by a power of $S$ and the Clifford gate by the original rotation in that part of the circuit. When replacing a rotation $R_Z(\alpha)$ by $S^n$, the power $n$ is randomly sampled with weight $w(n)$, which is given by

$$w(n) = e^{-d^2/\sigma^2}, \qquad d = \| R_Z(\alpha) - S^n \|, \tag{B1}$$

with $\sigma = 1/2$. Note that such an update preserves the number of non-Clifford gates $N$.

For our implementations, the new state $|\phi\rangle$ proposed by this update step is then either accepted or rejected according to a Metropolis-Hastings rule with a likelihood function $L$. This likelihood was defined differently for our QAOA and phase estimation examples. For QAOA we used

$$L(X_\phi^{\rm exact}) \propto e^{-(X_\phi^{\rm exact} - X_0)^2 / X_\sigma^2}. \tag{B2}$$

Here, $X$ is the energy per qubit, and $X_0 = -2.1$ is roughly $X_\psi^{\rm exact}$. We used $X_\sigma = 0.05$, though we found that instead using $X_\sigma = 0.1$ or $0.2$ produced similar results. Larger values ($X_\sigma \sim 1$) degraded the quality of the results.

We emphasize that though this method was used for this proof-of-principle demonstration, the special knowledge required to choose $X_0$ will often not be available. When correcting more general minimization problems one can use a different form that assigns higher likelihoods to lower values of the observable instead. For example, one might use something like

$$L(X_\phi^{\rm exact}) \propto e^{-X_\phi^{\rm exact} / X_\sigma} \tag{B3}$$

instead of (B2).

For the phase estimation example, we used a similar expression to what we used for QAOA:

$$L(X_\phi^{\rm noisy}) \propto e^{-(X_\phi^{\rm noisy} - X_\psi^{\rm noisy})^2 / X_\sigma^2}, \tag{B4}$$

with $X_\sigma = 0.1$. This version is appropriate when correcting observables that we do not have special information about (e.g., that $X$ is not supposed to be minimized).

To give insight into the cost of the CDR method, we analyze in detail the construction of the training sets for the deep QAOA circuits from Fig. 4. The MCMC procedure described above generates a chain of near-Clifford circuits indexed by an index $i$ for each mitigated circuit. We build the training set by choosing the chain elements with $i = i_0 + k\,\xi_{\rm MCMC}$, $k = 0, 1, \ldots, 74$, where $\xi_{\rm MCMC}$ measures auto-correlation in the chain and $i_0$ determines how many initial elements of the chain are rejected. We define $\xi_{\rm MCMC}$ as

$$\frac{\sum_{i=i_0}^{L-\xi_{\rm MCMC}} (\theta_i - \bar{\theta}) \cdot (\theta_{i+\xi_{\rm MCMC}} - \bar{\theta})}{\sum_{i=i_0}^{L-\xi_{\rm MCMC}} (\theta_i - \bar{\theta}) \cdot (\theta_i - \bar{\theta})} = \frac{1}{10}, \tag{B5}$$

where $\theta_i$ is a vector of the rotation angles in the circuit, $\bar{\theta} = \frac{1}{L}\sum_{i=i_0}^{L} \theta_i$, and $L$ is the chain's length. The required classical resources are determined in our case by $\xi_{\rm MCMC}$ and $i_0$. We observe that the mean value of $\xi_{\rm MCMC}$ obtained by averaging over the minima scales polynomially with increasing $p$, see Fig. 7. The behavior of $i_0$ is somewhat more complicated as it depends on the starting point of the MCMC chain. If the initial point has an energy close to the chain's stationary energy distribution, one can choose $i_0 = 1$ without decreasing the quality of the energy correction. Otherwise it is necessary to reject some initial elements of the chain. For $p \leq 16$ we were able to find good initial points starting from a set of 200 near-Clifford circuits obtained by our chain initialization algorithm. That is no longer possible for $p = 20, 24$, for which we choose $i_0 = 70\,\xi_{\rm MCMC} + 1$ and $i_0 = 150\,\xi_{\rm MCMC} + 1$, respectively. We remark that to reduce classical computation time for these $p$ values one can choose the initial points using MCMC chains obtained for smaller $p$.

FIG. 7. Classical computational cost of the training set construction. The mean value of the MCMC auto-correlation length $\xi_{\rm MCMC}$ (B5) for deep QAOA circuits from Fig. 4 is plotted versus $p$ using a log-log scale. $\xi_{\rm MCMC}$ is one of the factors determining the cost of the classical MCMC simulations, see text for a detailed explanation. It appears to scale polynomially with increasing $p$.

Appendix C: Error bars

In the main text, we display error bars on the corrected observables obtained by the CDR method. These error bars are meant to convey one's confidence in the predicted noise-free observable. Conceptually speaking, there are two main sources of error to consider: (1) imperfect training on the training data such that the cost function $C$ does not go to zero, and (2) inability of the training data to capture the noise processes that affect the target state $|\psi\rangle$. The latter source of error is difficult to quantify in practice, although it can be systematically removed by increasing the number of non-Clifford gates $N$ as discussed in the main text.

Therefore we focus our error bars on the former source of error, i.e., imperfect training. For the most part, we find (see Fig. 4) that the error bars associated with imperfect training are sufficient to encompass the discrepancy between the predicted and exact observable values. However, for our largest implementation, $Q = 64$ in Fig. 5(d), one can see that our error bars sometimes underestimate the true error. This suggests that our error bars are useful as lower bounds on the true error.

As mentioned in the main text, we calculate our error bars using the value of the cost function $C$ obtained after training. Specifically, the magnitude of the error bar is given by three standard deviations, where the standard deviation is given by $\sqrt{C/(L - 1)}$, where $L$ is the number of states in the training set $S_\psi$.

Appendix D: QAOA circuit structure

FIG. 8. A circuit implementing a layer of the QAOA ansatz for $Q = 4$. $\beta$, $\gamma$ are QAOA parameters. Here $P = R_X(\pi/2) = e^{-i\sigma_X \pi/4}$ and is a Clifford gate. The only non-Clifford gates in this circuit are the $\sigma_Z$ rotations: $R_Z(\alpha) = e^{-i\sigma_Z \alpha/2}$. We note that we have made use of the decomposition of $e^{i\gamma \sigma_Z^j \sigma_Z^{j'}}$ from Ref. [42].

Figure 8 shows the structure of our QAOA circuit for the Ising spin chain considered in the main text. We note that all gates except for the $2(Q - 1)p$ $\sigma_Z$ rotations are Clifford gates. The only changes made to this circuit for generating the training data $S_\psi$ are therefore replacements of these $\sigma_Z$ rotations by $S^n$, as discussed in Appendix B.
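The replacement of a $\sigma_Z$ rotation by a power of $S$, weighted as in Eq. (B1), can be sketched as follows. This is an illustration only. The text does not specify which matrix norm is used, so the Frobenius norm is an assumption here, as are the restriction to powers $n \in \{0, 1, 2, 3\}$, the normalization of the weights to probabilities, and the helper names `rz`, `clifford_weights`, and `sample_power`.

```python
import numpy as np

SIGMA = 0.5  # sigma = 1/2, as quoted with Eq. (B1)

def rz(alpha):
    """R_Z(alpha) = exp(-i sigma_Z alpha / 2) as a 2x2 diagonal matrix."""
    return np.diag([np.exp(-0.5j * alpha), np.exp(0.5j * alpha)])

def clifford_weights(alpha, powers=(0, 1, 2, 3), sigma=SIGMA):
    """Normalized weights w(n) ~ exp(-d^2 / sigma^2), d = ||R_Z(alpha) - S^n||."""
    # S^n = R_Z(n * pi / 2). np.linalg.norm on a matrix defaults to the
    # Frobenius norm, which is an assumption of this sketch.
    d = np.array([np.linalg.norm(rz(alpha) - rz(n * np.pi / 2)) for n in powers])
    w = np.exp(-(d**2) / sigma**2)
    return w / w.sum()

def sample_power(alpha, rng, powers=(0, 1, 2, 3)):
    """Draw the power n of S that replaces R_Z(alpha)."""
    return int(rng.choice(powers, p=clifford_weights(alpha, powers)))

rng = np.random.default_rng(1)
probs = clifford_weights(0.45 * np.pi)  # rotation angle close to pi/2
n = sample_power(0.45 * np.pi, rng)
```

For an angle close to $\pi/2$ the weight concentrates on $n = 1$, so most replacements pick the nearest Clifford gate while occasionally sampling a more distant one. Note that a matrix norm of this kind is sensitive to global phase, which a full implementation might want to factor out.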

FIG. 9. Quantum circuit used to estimate ${\rm Re}(\langle\chi|e^{-i\tilde{H}t}|\chi\rangle)$ for $\tilde{H}$ and a random product state $|\chi\rangle$ given by random angles $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6$. Parameters $f_1(t), f_2(t), f_3(t), f_4(t), f_5(t), f_6(t)$ were obtained by compiling the circuit with an algorithm from Ref. [7]. The compilation was accurate up to numerical precision. We assume here all-to-all device connectivity.

Appendix E: Quantum Phase Estimation circuit structure

Figure 9 shows the quantum phase estimation circuit from Ref. [38] for $\tilde{H} = \frac{1}{6}(\sigma_Z^1\sigma_Z^2 + \sigma_Z^1\sigma_Z^3 + \sigma_Z^2\sigma_Z^3)$, compiled by an algorithm from Ref. [7] and applied to a randomly chosen input product state. As in the QAOA ansatz, we note that all gates except for the $\sigma_Z$ rotations are Clifford gates. Again, our classically simulable training data set $\mathcal{S}_\psi$ is generated by replacing some of these $\sigma_Z$ rotations with $S^n$, as discussed in Appendix B.

Appendix F: Zero Noise Extrapolation

For comparison with our CDR method, we have performed zero noise extrapolation (ZNE) for the QAOA implementations discussed in the main text on IBM's 20-qubit Almaden quantum processor [34]. Specifically, we used Qiskit's Pulse package [43] to systematically stretch the microwave pulse sequences used to physically implement our $Q = 16$, $p = 2$ optimized QAOA circuits, as done in Ref. [20].

1. Background

Following Ref. [22], the evolution of the quantum computer can be modeled in terms of the drive Hamiltonian described by the quantum circuit, $K(t)$, and a Lindblad operator $\mathcal{L}(\rho)$ representing the physical noise channels:

$$\frac{\partial}{\partial t}\rho(t) = -i\,[K(t), \rho(t)] + \lambda\,\mathcal{L}(\rho(t)). \tag{F1}$$

Here $\lambda$ is a (hopefully) small parameter that represents the strength of the action of the noise channels, so the limit $\lambda \to 0$ would represent noiseless quantum computation. Attempting to approximate this limit is the heart of ZNE. While an experimenter cannot in general directly adjust $\lambda$, under the assumption that the form of $\mathcal{L}$ is invariant under time re-scaling and independent of $K(t)$, one can in effect increase $\lambda$.

Increasing $\lambda$ is accomplished by increasing (stretching) the time for the circuit evolution ($T$) by a factor of $c$ while decreasing the magnitude of the drive Hamiltonian:

$$T \to T' = cT, \qquad K(t) \to K'(t) = c^{-1}K(c^{-1}t). \tag{F2}$$

To see how this works, let us integrate (F1) with respect to time from $t = 0$ to $t = T'$, using our modified drive Hamiltonian from (F2). Calling the state evolved this way $\rho'(t) = \rho(c^{-1}t)$, we have:

$$\begin{aligned}
\rho'(T') &= \rho(0) - i\int_0^{cT} [K'(t), \rho'(t)]\,dt + \lambda\int_0^{cT} \mathcal{L}(\rho'(t))\,dt \\
&= \rho(0) - i\int_0^{T} [K(t'), \rho(t')]\,dt' + c\lambda\int_0^{T} \mathcal{L}(\rho(t'))\,dt',
\end{aligned} \tag{F3}$$

where the second line follows from the substitution $t' = c^{-1}t$.

We therefore have that, under these assumptions, the final state driven over a longer time with the stretched drive Hamiltonian is equivalent to one evolved with the original drive Hamiltonian with $\lambda \to c\lambda$. For a more detailed derivation of this formalism, see Ref. [22].

2. Implementation

For our implementation, we follow Ref. [20] in choosing the stretch factors $c \in \{1, 1.1, 1.25, 1.5\}$. We then stretched the pulse sequences generated from the QAOA circuit as shown in Fig. 10. In addition to running the circuit at different stretch factors, we also made use of Qiskit's built-in measurement error mitigation functions [43], as ZNE does not directly handle read-out error. Finally, we used 212992 shots to measure each operator for each value of $c$.

3. Results

The data points we measured as well as our extrapolated energies are shown in Fig. 11 for the three instances of low energy QAOA studied in the main text. For the sake of completeness, we show the extrapolation with a linear fit, a quadratic fit, and finally the cubic polynomial that is the standard ZNE approach for four values of $c$ [20, 22]. As shown in Fig. 11, it appears that for our particular use case the ZNE method did not provide an accurate correction for the energy expectation values.
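The extrapolation step can be illustrated with a short sketch. Because stretching by $c$ acts as $\lambda \to c\lambda$, the noiseless limit corresponds to evaluating a fit of the measured values at $c = 0$. The code below is an assumption-laden illustration rather than the procedure used for Fig. 11: the function name `zne_extrapolate` is hypothetical, the fit is an unweighted least-squares polynomial via NumPy, and the "noisy" values are synthetic numbers made up for the example, not measured data.

```python
import numpy as np

def zne_extrapolate(stretch_factors, values, degree):
    """Fit a polynomial of the given degree in c and evaluate it at c = 0."""
    coeffs = np.polyfit(stretch_factors, values, degree)
    return float(np.polyval(coeffs, 0.0))

# Stretch factors quoted in the text.
c = np.array([1.0, 1.1, 1.25, 1.5])

# Synthetic (made-up) data: a hypothetical noiseless value e0 plus a
# polynomial response to the effective noise strength.
e0 = -2.0
noisy = e0 + 0.5 * c - 0.1 * c**2 + 0.02 * c**3

# Linear, quadratic, and cubic extrapolations to the zero-noise limit.
estimates = {degree: zne_extrapolate(c, noisy, degree) for degree in (1, 2, 3)}
```

With four stretch factors the cubic fit interpolates the data exactly, so it recovers `e0` here only because the synthetic response is itself a cubic polynomial. On real data with statistical noise and a non-polynomial noise response, the extrapolated value can be inaccurate.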

FIG. 10. Stretching the QAOA circuit pulse sequence. The first t0 = 2.22 µs of the pulse sequences for a Q = 16, p = 2 QAOA circuit with c = 1 and c = 1.5 are shown. In the stretched case the pulse envelopes have lower amplitude but longer duration. For convenience, the amplitudes of the pulse envelopes shown have been normalized between the channels (shown as horizontal lines), but the normalizations are the same between the c = 1 and c = 1.5 cases.

FIG. 11. Extrapolating the stretched circuit data. Note that the error bars displayed here represent the uncertainties of the fit for c = 0 and the uncertainty from finite statistics for c ∈ {1, 1.1, 1.25, 1.5}. Note also that the third-order fit is equivalent to using equation (3) in [22]. For reference, the true values of E/Q for instances 1, 2, and 3 are roughly −2.1153, −2.1045, and −2.0990, respectively. These instances correspond to the first three instances shown in Fig. 2.

[1] John Preskill, “Quantum computing in the NISQ era and beyond,” Quantum 2, 79 (2018).
[2] Yudong Cao, Jonathan Romero, Jonathan P Olson, Matthias Degroote, Peter D Johnson, Mária Kieferová, Ian D Kivlichan, Tim Menke, Borja Peropadre, Nicolas PD Sawaya, et al., “Quantum chemistry in the age of quantum computing,” Chemical reviews 119, 10856–10915 (2019).
[3] Sam McArdle, Suguru Endo, Alan Aspuru-Guzik, Simon C Benjamin, and Xiao Yuan, “Quantum computational chemistry,” Reviews of Modern Physics 92, 015003 (2020).
[4] E. Farhi, J. Goldstone, and S. Gutmann, “A quantum approximate optimization algorithm,” arXiv preprint arXiv:1411.4028 (2014).
[5] Stuart Hadfield, Zhihui Wang, Bryan O’Gorman, Eleanor G Rieffel, Davide Venturelli, and Rupak Biswas, “From the quantum approximate optimization algorithm to a quantum alternating operator ansatz,” Algorithms 12, 34 (2019).
[6] G. E. Crooks, “Performance of the quantum approximate optimization algorithm on the maximum cut problem,” arXiv preprint arXiv:1811.08419 (2018).
[7] L. Cincio, Y. Subaşı, A. T. Sornborger, and P. J. Coles, “Learning the quantum algorithm for state overlap,” New Journal of Physics 20, 113022 (2018).
[8] Prakash Murali, Jonathan M Baker, Ali Javadi-Abhari, Frederic T Chong, and Margaret Martonosi, “Noise-adaptive compiler mappings for noisy intermediate-scale quantum computers,” in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (2019) pp. 1015–1029.
[9] Lukasz Cincio, Kenneth Rudinger, Mohan Sarovar, and Patrick J Coles, “Machine learning of noise-resilient quantum circuits,” arXiv preprint arXiv:2007.01210 (2020).
[10] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, “A variational eigenvalue solver on a photonic quantum processor,” Nature Communications 5, 4213 (2014).
[11] Jarrod R McClean, Jonathan Romero, Ryan Babbush, and Alán Aspuru-Guzik, “The theory of variational hybrid quantum-classical algorithms,” New Journal of Physics 18, 023023 (2016).
[12] Marco Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, et al., “Variational quantum algorithms,” arXiv preprint arXiv:2012.09265 (2020).
[13] Suguru Endo, Zhenyu Cai, Simon C Benjamin, and Xiao Yuan, “Hybrid quantum-classical algorithms and quantum error mitigation,” Journal of the Physical Society of Japan 90, 032001 (2021).
[14] Kishor Bharti, Alba Cervera-Lierta, Thi Ha Kyaw, Tobias Haug, Sumner Alperin-Lea, Abhinav Anand, Matthias Degroote, Hermanni Heimonen, Jakob S. Kottmann, Tim Menke, Wai-Keong Mok, Sukin Sim, Leong-Chuan Kwek, and Alán Aspuru-Guzik, “Noisy intermediate-scale quantum (nisq) algorithms,” arXiv preprint arXiv:2101.08448 (2021).
[15] Y. Li and S. C. Benjamin, “Efficient variational quantum simulator incorporating active error minimization,” Phys. Rev. X 7, 021050 (2017).
[16] S. Khatri, R. LaRose, A. Poremba, L. Cincio, A. T. Sornborger, and P. J. Coles, “Quantum-assisted quantum compiling,” Quantum 3, 140 (2019).
[17] Kunal Sharma, Sumeet Khatri, Marco Cerezo, and Patrick J Coles, “Noise resilience of variational quantum compiling,” New Journal of Physics 22, 043006 (2020).
[18] Ryan LaRose, Arkin Tikku, Étude O’Neel-Judy, Lukasz Cincio, and Patrick J Coles, “Variational quantum state diagonalization,” npj Quantum Information 5, 1–10 (2019).
[19] M Cerezo, Kunal Sharma, Andrew Arrasmith, and Patrick J Coles, “Variational quantum state eigensolver,” arXiv preprint arXiv:2004.01372 (2020).
[20] Abhinav Kandala, Kristan Temme, Antonio D Córcoles, Antonio Mezzacapo, Jerry M Chow, and Jay M Gambetta, “Error mitigation extends the computational reach of a noisy quantum processor,” Nature 567, 491–495 (2019).
[21] Eugene F Dumitrescu, Alex J McCaskey, Gaute Hagen, Gustav R Jansen, Titus D Morris, T Papenbrock, Raphael C Pooser, David Jarvis Dean, and Pavel Lougovski, “Cloud quantum computing of an atomic nucleus,” Physical review letters 120, 210501 (2018).
[22] Kristan Temme, Sergey Bravyi, and Jay M Gambetta, “Error mitigation for short-depth quantum circuits,” Physical review letters 119, 180509 (2017).
[23] Matthew Otten and Stephen K Gray, “Recovering noise-free quantum observables,” Physical Review A 99, 012338 (2019).
[24] Suguru Endo, Simon C Benjamin, and Ying Li, “Practical quantum error mitigation for near-future applications,” Physical Review X 8, 031027 (2018).
[25] Tudor Giurgica-Tiron, Yousef Hindy, Ryan LaRose, Andrea Mari, and William J Zeng, “Digital zero noise extrapolation for quantum error mitigation,” arXiv preprint arXiv:2005.10921 (2020).
[26] Zhenyu Cai, “Multi-exponential error extrapolation and combining error mitigation techniques for nisq applications,” arXiv preprint arXiv:2007.01265 (2020).
[27] Andre He, Benjamin Nachman, Wibe A de Jong, and Christian W Bauer, “Resource efficient zero noise extrapolation with identity insertions,” arXiv preprint arXiv:2003.04941 (2020).
[28] Sam McArdle, Xiao Yuan, and Simon Benjamin, “Error-mitigated digital quantum simulation,” Phys. Rev. Lett. 122, 180501 (2019).
[29] Xavi Bonet-Monroig, Ramiro Sagastizabal, M Singh, and TE O’Brien, “Low-cost error mitigation by symmetry verification,” Physical Review A 98, 062339 (2018).
[30] Matthew Otten, Cristian L Cortes, and Stephen K Gray, “Noise-resilient quantum dynamics using symmetry-preserving ansatzes,” arXiv preprint arXiv:1910.06284 (2019).
[31] Zhenyu Cai, “Quantum error mitigation using symmetry expansion,” arXiv preprint arXiv:2101.03151 (2021).
[32] Michael A Nielsen, Neural networks and deep learning, Vol. 2018 (Determination press San Francisco, CA, USA:, 2015).
[33] Xiao Yuan, Suguru Endo, Qi Zhao, Ying Li, and Simon C Benjamin, “Theory of variational quantum simulation,” Quantum 3, 191 (2019).
[34] “IBM Q: Quantum devices and simulators,” https://www.research.ibm.com/ibm-q/technology/devices/, [Online; accessed May 18, 2020].
[35] M. Fannes, B. Nachtergaele, and R. F. Werner, “Finitely correlated states on quantum spin chains,” Communications in Mathematical Physics 144, 443–490 (1992).
[36] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando GSL Brandao, David A Buell, et al., “Quantum supremacy using a programmable superconducting processor,” Nature 574, 505–510 (2019).
[37] Daniel Gottesman, “An introduction to quantum error correction and fault-tolerant quantum computation,” arXiv preprint arXiv:0904.2557 (2009).
[38] Rolando D Somma, “Quantum eigenvalue estimation via time series analysis,” New Journal of Physics 21, 123025 (2019).
[39] Giacomo Torlai, Guglielmo Mazzola, Giuseppe Carleo, and Antonio Mezzacapo, “Precise measurement of quantum observables with neural-network estimators,” arXiv preprint arXiv:1910.07596 (2019).
[40] Armands Strikis, Dayue Qin, Yanzhu Chen, Simon C. Benjamin, and Ying Li, “Learning-based quantum error mitigation,” arXiv preprint arXiv:2005.07601 (2020).
[41] Don Van Ravenzwaaij, Pete Cassey, and Scott D Brown, “A simple introduction to markov chain monte–carlo sampling,” Psychonomic bulletin & review 25, 143–154 (2018).
[42] Abhijith J., Adetokunbo Adedoyin, John Ambrosiano, Petr Anisimov, Andreas Bärtschi, William Casper, Gopinath Chennupati, Carleton Coffrin, Hristo Djidjev, David Gunter, Satish Karra, Nathan Lemons, Shizeng Lin, Alexander Malyzhenkov, David Mascarenas, Susan Mniszewski, Balu Nadiga, Daniel O’Malley, Diane Oyen, Scott Pakin, Lakshman Prasad, Randy Roberts, Phillip Romero, Nandakishore Santhi, Nikolai Sinitsyn, Pieter J. Swart, James G. Wendelberger, Boram Yoon, Richard Zamora, Wei Zhu, Stephan Eidenbenz, Patrick J. Coles, Marc Vuffray, and Andrey Y. Lokhov, “Quantum algorithm implementations for beginners,” (2018), arXiv:1804.03719 [cs.ET].
[43] Héctor Abraham et. al., “Qiskit: An open-source framework for quantum computing,” (2019).