Training variational quantum algorithms is NP-hard — even for logarithmically many qubits and free fermionic systems

Lennart Bittel∗ and Martin Kliesch† Heinrich Heine University Düsseldorf, Germany

Variational quantum algorithms (VQAs) are proposed to solve relevant computational problems on near term quantum devices. Popular versions are variational quantum eigensolvers (VQEs) and quantum approximate optimization algorithms (QAOAs) that solve ground state problems from quantum chemistry and binary optimization problems, respectively. They are based on the idea of using a classical computer to train a parameterized quantum circuit. We show that the corresponding classical optimization problems are NP-hard. Moreover, the hardness is robust in the sense that for every polynomial time algorithm, there exist instances for which the relative error resulting from the classical optimization problem can be arbitrarily large, assuming P ≠ NP. Even for classically tractable systems, composed of only logarithmically many qubits or free fermions, we show that the optimization is NP-hard. This elucidates that the classical optimization is intrinsically hard and does not merely inherit the hardness from the ground state problem. Our analysis shows that the training landscape can have many far from optimal persistent local minima. This means gradient and higher order descent algorithms will generally converge to far from optimal solutions.

arXiv:2101.07267v1 [quant-ph] 18 Jan 2021

∗ [email protected]
† [email protected]

I. INTRODUCTION

The last years have seen enormous progress towards large-scale quantum computation. A central goal of this effort is the implementation of a quantum computation that solves computational problems of practical relevance faster than any classical computer. However, the noisy nature of quantum gates and the high overhead cost of noise reduction and error correction limit near term devices to shallow circuits [1].

Variational quantum algorithms (VQAs) have been proposed to bring us a step closer towards this goal. Here, an optimization problem is captured by a loss function given by expectation values of observables w.r.t. states generated from a parameterized quantum circuit. Then a classical computer trains the quantum circuit by optimizing the expectation value over the circuit's parameters. Figure 1 illustrates a possible VQA routine. Popular candidates to be used on near term devices are quantum approximate optimization algorithms (QAOAs) [2] and variational quantum eigensolvers (VQEs) [3], see [4] for a review.

Figure 1. Sketch of a VQA optimization routine. This work addresses the complexity of the classical optimization part (red).

VQEs are proposed, for instance, to solve electronic structure problems, which are central in quantum chemistry and material science. Proposals of QAOAs include improved algorithms for quadratic optimization problems over binary variables such as the problem of finding the maximum cut of a graph (MaxCut). For hybrid classical/quantum computation to be successful, two challenges need to be overcome. First, one needs to find parameterized quantum circuits that have the expressive power to yield a sufficiently good approximation to the optimal solution of relevant optimization problems (i.e. the model mismatch is small). Second, the classical optimization over the parameters of the quantum circuit needs to be solved fast enough and with sufficient accuracy. We will focus on this second challenge.

For the classical optimization several heuristic approaches are known, which are mostly based on gradient descent ideas and higher order methods. This is convenient, as with the parameter shift rule [5], the gradient can be calculated efficiently. Methods include standard BFGS optimization and extensions [6] and natural gradient descent [7], which has a favorable performance at least for certain easy instances [8]. Second order methods require a significant overhead in the number of measurements but can yield a better accuracy [9]. Quantum analytic descent [10] uses certain classical approximations of the objective function in order to reduce the number of quantum circuit evaluations at the cost of a higher classical computation effort.

However, it has also been shown recently that there are certain obstacles that need to be overcome to render the classical optimization successful. The training landscape can have so-called barren plateaus where the loss function is effectively constant and hence yields a vanishing gradient, which prevents efficient training. This phenomenon can be caused, for example, by random initializations [11] and non-locality of the observable defining the loss function [12]. Also sources of randomness given by noise in the gate implementations can cause similar effects [13]. Moreover, the problem of barren plateaus cannot be fully resolved by higher order methods [14].

In this work, we show that the existence of persistent local minima can also render the training of variational quantum algorithms infeasible. For this, we encode the NP-hard MaxCut problem into the corresponding classical optimization for several versions of VQAs, which have many far from optimal local minima.

Specifically, we obtain hardness results concerning the optimization in four different settings: (i) We use an oracle description of a quantum computer and show that the classical optimization of VQA is an NP-hard problem, even if it only needs to be solved within constant relative precision. Next, we remove the oracle from the problem formulation by focusing on classically tractable systems where the underlying ground state problem is efficiently solvable. Here, we consider quantum systems (ii) whose Hilbert space dimension scales polynomially in the number of parameters (i.e. logarithmically many qubits) or (iii) that are composed of free fermions. (iv) If the setup is restricted to be of QAOA type, we show that our hardness results also hold.

A. Connection to complexity theory

The decision version of VQA optimization is in the complexity class QCMA, problems that can be verified with a classical proof on a quantum computer. The class QMA, which allows for the proof to be a quantum state, contains QCMA. Much about the relationship between classical MA, QCMA and QMA is yet unknown. Prominently, finding the ground state energy of a local Hamiltonian is QMA-hard [15, 16]. This means that if QCMA ≠ QMA then VQAs will not be able to solve the local Hamiltonian problem, but possibly other problems contained in QCMA. What our results also show is that even if the relevant energy eigenstates are contained in the VQA ansatz, the classical optimization may still be at least as difficult as solving NP problems (Section III A).

B. Notation

We use the notation $[n] := \{1, \dots, n\}$. The Pauli matrices are denoted by $\sigma_x$, $\sigma_y$, and $\sigma_z$. An operator $X$ acting on subsystem $j$ of a larger quantum system is denoted by $X^{(j)}$, e.g., $\sigma_x^{(1)}$ is the Pauli-x-matrix acting on subsystem 1. $\|X\|$ refers to the operator norm of the operator $X$.

The number of edges of the graph with the adjacency matrix $A$ is denoted by $|E(A)|$. By $\mathrm{MaxCut}(A)$ we denote the solution of MaxCut for an adjacency matrix $A$, see Problem 1.

Throughout, we only consider adjacency matrices $A$ of undirected unweighted graphs with at least one edge, i.e., $A \in \{0,1\}^{d\times d}$ is a non-zero symmetric binary matrix with vanishing diagonal.

II. A CONTINUOUS MaxCut OPTIMIZATION

We will introduce a continuous, trigonometric problem which we show to be NP-hard to optimize and approximate. This is related to earlier work about optimization of trigonometric functions [17] for which NP-hardness is known. For the specific class of functions, we also show the existence of an approximation ratio explicitly. We use this problem later to obtain hardness results for various VQA versions.

Problem 1 (MaxCut).
Instance: The adjacency matrix $A \in \{0,1\}^{d\times d}$ of an unweighted undirected graph.
Task: Find $S \subset [d]$ that maximizes $\sum_{i\in S,\, j\in[d]\setminus S} A_{i,j}$.

MaxCut is famously known to be NP-hard. Additionally MaxCut is APX-hard, meaning that every polynomial time approximation algorithm has an approximation ratio $\frac{\text{Algorithm Solution}}{\text{Optimal Solution}} \le \alpha_{\max} < 1$ for at least some instances, assuming P ≠ NP. It was shown that if the unique games conjecture is true, then the best approximation ratio of a polynomial algorithm is $\alpha_{\max} = \min_{0<\theta<\pi} \frac{\theta/\pi}{(1-\cos\theta)/2} \approx 0.8786$ [18], which is also what the best known algorithms can guarantee [19]. Without using this conjecture, it is proven that $\alpha_{\max} \le \frac{16}{17} \approx 0.941$ [20].

For our purpose we define a continuous, trigonometric version of MaxCut. Minima of real valued functions are given by real numbers that may not have an efficient numerical representation. However, it is common to say that a minimization problem is solved if it is solved to exponential precision, which is the convention we will also be using throughout this paper. The intuitive notion is that the hardness does not come from the difficulty of representing the minimum.

Problem 2 (Continuous MaxCut).
Instance: The adjacency matrix $A \in \{0,1\}^{d\times d}$ of an unweighted graph.

Task: Find $\phi \in [0, 2\pi)^d$ that minimizes

$$\mu(\phi) := \frac{1}{4}\sum_{i,j=1}^{d} A_{i,j}\bigl(\cos(\phi_i)\cos(\phi_j) - 1\bigr). \tag{1}$$

Lemma 1. Problem 2 is NP-hard. Moreover, if P ≠ NP, for every polynomial time algorithm there exists an approximation ratio, which is equal to that of MaxCut.

Proof. We first show that the discrete problem at the extremal points $\cos(\phi) = \pm 1$ solves Problem 2 and then show equivalence of the discrete problem and MaxCut, which implies the same is true for Problem 2 since MaxCut has an approximation ratio.

We now argue that the relevant local minima of $\mu$ are discrete and given by $\phi \in \{0,\pi\}^d$. For this, we observe that

$$\frac{\partial\mu}{\partial\phi_i} = -\frac{1}{2}\sin(\phi_i)\sum_{j:\, j\neq i} A_{i,j}\cos(\phi_j), \tag{2}$$

where we have used that $A_{i,i} = 0$ for all $i$ and $A = A^T$. Local extrema $\phi$ satisfy $\frac{\partial\mu}{\partial\phi_i} = 0$ for all $i$. Therefore, all candidates for local minima are given by $\phi \in \{0,\pi\}^d$. As

$$\left.\frac{\partial^2\mu}{\partial\phi_i^2}\right|_{\phi_i=0} = -\left.\frac{\partial^2\mu}{\partial\phi_i^2}\right|_{\phi_i=\pi}, \tag{3}$$

there always exists a minimum/maximum pair for $\phi_i \mapsto \mu(\phi)$, unless $\frac{\partial^2\mu}{\partial\phi_i^2} = 0$, where one can show that changing $\phi_i$ does not affect the overall value of the objective function $\mu$. It follows that for any $\phi \in [0,2\pi)^d$

$$\mu(\phi) \ge \min\bigl\{\mu(\phi|_{\phi_i=0}),\, \mu(\phi|_{\phi_i=\pi})\bigr\}, \tag{4}$$

where $\phi|_{\phi_i=x}$ denotes the vector obtained from $\phi$ by replacing $\phi_i$ by $x$. This means we can iteratively choose $\phi_i$ to be in $\{0,\pi\}$ without increasing $\mu$. Therefore, an algorithm returning continuous values of $\phi_i$ can be turned discrete in polynomial time without reducing approximating power. The discrete problem can be written as

$$\begin{aligned}
\min_{\phi\in\{0,\pi\}^d} \mu(\phi) &= \frac{1}{4}\min_{v\in\{-1,1\}^d}\sum_{i,j=1}^{d} A_{i,j}(v_i v_j - 1) \\
&= \frac{1}{4}\min_{v\in\{-1,1\}^d}\sum_{i,j=1}^{d} A_{i,j}\bigl(-2\,\delta_{v_i\neq v_j}\bigr) \\
&= -\max_{S\subset[d]}\sum_{i\in S,\, j\in[d]\setminus S} A_{i,j},
\end{aligned} \tag{5}$$

where $\delta_{v_i\neq v_j} = 1 - \delta_{v_i,v_j}$ and we used that $A$ is symmetric. Therefore, we obtain a (many-one) reduction of MaxCut to Problem 2, implying Problem 2 is NP-hard.

We want to note that, since the Hessian is diagonal at $\phi \in \{0,\pi\}^d$, a local extremum is a local minimum in the continuous problem whenever changing any single $\phi_i$ does not decrease the objective function. The same minima (in the MaxCut formulation) are also achieved by a greedy algorithm: start with a random bipartition of the vertex set, then repeatedly change a single vertex assignment if it increases the cut until the cut cannot be increased any further by this update rule. This algorithm has an approximation ratio of $\alpha = \frac{1}{2}$, meaning it only guarantees to approximate MaxCut to half its optimal solution. If Problem 2 is solved with gradient based methods, any local minimum can be the final result, therefore gradient based algorithms also only have an approximation ratio of $\alpha = \frac{1}{2}$, which is significantly worse than what modern MaxCut solvers can achieve [19].

III. VARIATIONAL QUANTUM ALGORITHMS

VQA is a general framework of hybrid quantum computers, where classically tunable parameters $\phi$ of a unitary circuit are used to minimize the expectation value of an observable. First, in Section III A, we consider such computing schemes, where the quantum part is composed of qubits. In order to show that the classical simulation is hard, we assume oracle access to an idealized quantum device. Next, in Section III B, we show that the problem is also hard for VQA settings with small Hilbert space dimensions (or logarithmically many qubits) so that the oracle can be replaced by efficient classical simulation. In Section III C we use the same small setting, but consider QAOA instances instead. Last, in Section III D, we again consider Hilbert space dimensions that are exponentially large in the parameter size $\phi$ but state the optimization problem so that the oracle can be replaced by efficient free fermionic calculations.

A. VQA optimization with quantum computer access

The common application of VQAs is within quantum computing, where a quantum computer is used to estimate the expectation value and a classical algorithm chooses the circuit parameters of the quantum computer. For the classical optimization, we describe the information obtained from the quantum computer with oracle calls made by the classical algorithm.

Problem 3 (VQA minimization, oracular formulation).
Instance: A set of generators $\{H_i\}_{i\in\{1,\dots,L\}}$ and an observable $O$ acting on $\mathcal{H} = (\mathbb{C}^2)^{\otimes N}$, given in terms of their Pauli basis representation.
Oracle access: We set $|\Psi(\phi)\rangle := U_L(\phi_L)\cdots U_1(\phi_1)|0\rangle$ with $U_i(\phi) = e^{-iH_i\phi}$. The oracle $\mathcal{O}$ returns $\langle O(\phi)\rangle := \langle\Psi(\phi)|O|\Psi(\phi)\rangle$, given $\phi$, up to any desired polynomial additive error.
Task: Find $\phi \in \mathbb{R}^L$ that minimizes $\langle O(\phi)\rangle$ provided access to $\mathcal{O}$.

We use the oracle to outsource difficult computations, similarly to how a quantum computer would in a physical implementation. The motivation of our oracle is that the complexity of Problem 3 captures the complexity of only the classical optimization effort in hybrid quantum computations. The oracle can be seen as post-selecting on the successful runs only, therefore making the return deterministic.

Proposition 1 (Hardness of VQA optimization, oracular formulation). Assuming P ≠ NP there is no deterministic classical algorithm that solves Problem 3 in polynomial time.

It is straightforward to show that Problem 3 is NP-hard to solve. Essentially, we use a diagonal observable for which the ground state problem is NP-hard and use unitaries to reach every computational basis state.

Proof. We prove the proposition via a reduction of Problem 2 to Problem 3. For this, let $N = d$ and let $O$ be the usual Ising Hamiltonian encoding of MaxCut

$$O := \frac{1}{4}\sum_{i,j=1}^{d} A_{i,j}\bigl(\sigma_z^{(i)}\sigma_z^{(j)} - 1\bigr). \tag{6}$$

We use $L = d$ layers with

$$H_i := \frac{\sigma_y^{(i)}}{2}, \quad i\in[d], \tag{7}$$

as generators. By direct calculation we find that

$$\langle O(\phi)\rangle = \langle\Psi(\phi)|O|\Psi(\phi)\rangle = \frac{1}{4}\sum_{i,j=1}^{d} A_{i,j}\bigl(\cos(\phi_i)\cos(\phi_j) - 1\bigr) = \mu(\phi), \tag{8}$$

which is the objective function of Problem 2.

To analyze the overall approximation power of an algorithm we define the approximation error for an instance to be

$$\delta := \frac{|\langle O\rangle_a - \lambda_{\min}(O)|}{\|O\|}, \tag{9}$$

where $\lambda_{\min}(O)$ is the smallest eigenvalue of the observable $O$ and $\langle O\rangle_a$ is the final value returned by the algorithm. Here, the quantity is normalized by the operator norm of $O$. There are two error contributions: (i) the model mismatch $\delta_m$ is the approximation error resulting from the ansatz class being unable to represent the ground state and (ii) the optimization error $\delta_o$ is the error due to the classical algorithm not converging to the optimal solution within the class. That is,

$$\delta = \frac{|\langle O\rangle_{\min} - \lambda_{\min}(O)|}{\|O\|} + \frac{|\langle O\rangle_a - \langle O\rangle_{\min}|}{\|O\|} \tag{10}$$

$$\phantom{\delta} = \delta_m + \delta_o, \tag{11}$$

where $\langle O\rangle_{\min}$ refers to the smallest expectation value that can be achieved through classical optimization. Since we are interested in classical algorithms we define an optimization error, similarly to how approximation ratios are defined for APX-problems, over all considered instances.

Definition 1 (Optimization error). The optimization error of an optimization algorithm is the smallest number $\Delta \ge 0$ such that

$$\Delta \ge \frac{|\langle O\rangle_a - \langle O\rangle_{\min}|}{\|O\|} \tag{12}$$

for all considered VQA instances, where $\langle O\rangle_{\min}$ is the optimal solution of the classical optimization and $\langle O\rangle_a$ the approximation result from the algorithm.

Corollary 1. If P ≠ NP, then there exists no polynomial time algorithm which can guarantee any optimization error $\Delta < 1$ for all VQAs as defined by Problem 3.

Proof. We prove this statement by relating the optimization error of a VQA to the approximation ratio of MaxCut and by introducing a boosting technique to amplify errors in the setting of Problem 3.

From the proof of Proposition 1 we obtain $\|O\| = \mathrm{MaxCut}(A)$ and the optimal solution is also $|\langle O\rangle_{\min}| = \mathrm{MaxCut}(A)$, as there is no model mismatch. From the algorithm we get $\langle O\rangle_a = \mu_a(A)$, where $\mu_a(A)$ is the approximation of the continuous MaxCut problem (Problem 2) and, therefore, an approximation to MaxCut itself. With this argument it follows that

$$\Delta \ge 1 - \alpha. \tag{13}$$

To boost this result we introduce a variable $k$ and choose operators for $k\times d$ qubits

$$\tilde{O} = (-1)^{k-1} O^{\otimes k}, \tag{14}$$

$$\tilde{U}(\phi) = U(\phi)^{\otimes k}. \tag{15}$$

We can verify that the generators and $\tilde{O}$ only have $\mathrm{poly}(d)$ many terms for constant $k$. For the expectation value this gives

$$\langle\tilde{O}(\phi)\rangle = (-1)^{k-1}\,\langle\Psi(\phi)|^{\otimes k}\, O^{\otimes k}\, |\Psi(\phi)\rangle^{\otimes k} = -|\langle O(\phi)\rangle|^{k} = -|\mu(\phi)|^{k}, \tag{16}$$

where the introduced sign ensures that the problem remains a minimization for all $k$. We obtain $\|\tilde{O}\| = |\langle\tilde{O}\rangle_{\min}| = \mathrm{MaxCut}(A)^k$. This yields

$$\Delta \ge \sup_A \frac{\mathrm{MaxCut}(A)^k - |\langle\tilde{O}\rangle_a|}{\mathrm{MaxCut}(A)^k} = \sup_A\left(1 - \frac{|\mu_a(A)|^k}{\mathrm{MaxCut}(A)^k}\right) = 1 - \alpha^k, \tag{17}$$

where $\alpha$ is the approximation ratio of the algorithm related to Problem 2. Therefore, no optimization error $\Delta < 1$ can exist for all instances (if P ≠ NP), as this would mean, in return, that the algorithm could solve Problem 2 to arbitrary precision.
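The identity of Eq. (8) and the rounding argument of Eq. (4) can be checked numerically on a small graph. The following is a minimal sketch, assuming a brute-force state vector simulation that is only feasible for a handful of qubits; the function names `mu`, `round_to_vertices` and `vqa_expectation` and the 4-cycle test graph are illustrative choices, not part of the original construction.

```python
import numpy as np

def mu(A, phi):
    """Continuous MaxCut objective of Eq. (1)."""
    c = np.cos(phi)
    return 0.25 * (c @ A @ c - A.sum())

def round_to_vertices(A, phi):
    """Coordinate-wise rounding following Eq. (4): each phi_i is set to
    0 or pi, whichever gives the smaller value of mu."""
    phi = np.array(phi, dtype=float)
    for i in range(len(phi)):
        best_x, best_val = 0.0, None
        for x in (0.0, np.pi):
            trial = phi.copy()
            trial[i] = x
            val = mu(A, trial)
            if best_val is None or val < best_val:
                best_x, best_val = x, val
        phi[i] = best_x
    return phi

def vqa_expectation(A, phi):
    """<Psi(phi)|O|Psi(phi)> for O from Eq. (6) and generators sigma_y/2
    from Eq. (7), computed from the full 2^d-dimensional state vector."""
    d = len(phi)
    psi = np.array([1.0])
    for p in phi:
        # exp(-i sigma_y p / 2)|0> = cos(p/2)|0> + sin(p/2)|1>
        psi = np.kron(psi, np.array([np.cos(p / 2), np.sin(p / 2)]))
    # O is diagonal in the computational basis; qubit 0 is the most significant bit.
    bits = (np.arange(2 ** d)[:, None] >> np.arange(d)[::-1]) & 1
    z = 1 - 2 * bits  # sigma_z eigenvalues +1 / -1
    diag = 0.25 * (np.einsum("bi,ij,bj->b", z, A, z) - A.sum())
    return float(psi ** 2 @ diag)

# Hypothetical test instance: the 4-cycle.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
rng = np.random.default_rng(0)
phi = rng.uniform(0, 2 * np.pi, size=4)
rounded = round_to_vertices(A, phi)
print("mu(phi)      =", mu(A, phi))
print("<O(phi)>     =", vqa_expectation(A, phi))
print("mu(rounded)  =", mu(A, rounded))
```

The rounded point is only guaranteed to be no worse than the starting point; like the greedy algorithm discussed after Lemma 1, it may correspond to a cut that is not maximal.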

B. Logarithmic number of qubits: polynomial Hilbert space dimension

We can improve on the previous result by allowing only $N \in O(\log(d))$ many qubits, where $d$ is the input length of the MaxCut instance. This drastically reduces the system's size and complexity. Notably, since the Hilbert space is now only of polynomial dimension, both the calculation of expectation values and the ground state problem can be computed efficiently. Yet, we show that VQA optimization is still NP-hard. This means that the classical optimization does not merely inherit the hardness of the ground state problem, but is intrinsically difficult. Since the expectation value is efficiently numerically simulatable, we do not require oracle access to a quantum computer to analyze the problem. Also for convenience, instead of the Pauli-basis, we use the computational basis of the Hilbert space $\mathcal{H}$ of dimension $\dim(\mathcal{H}) = 2^N =: n$. This gives the following problem description.

Problem 4 (VQA minimization problem).
Instance: An initial state $|\Psi_0\rangle \in \mathbb{C}^n$, a set of generators $\{H_i\}_{i\in\{1,\dots,L\}} \subset \mathrm{Herm}(\mathbb{C}^n)$, where $L$ is the number of layers, and an observable $O \in \mathrm{Herm}(\mathbb{C}^n)$.
Task: For $|\Psi(\phi)\rangle := U_L(\phi_L)\cdots U_1(\phi_1)|\Psi_0\rangle$ with $U_i(\phi) = e^{-iH_i\phi}$, find $\phi \in \mathbb{R}^L$ that minimizes $\langle O(\phi)\rangle := \langle\Psi(\phi)|O|\Psi(\phi)\rangle$.

Theorem 1 (Hardness of VQA optimization). VQA optimization (Problem 4) is NP-hard.

Proof. Let $A \in \{0,1\}^{d\times d}$ be the adjacency matrix of an unweighted graph. On the Hilbert space $\mathcal{H} = \mathbb{C}^{2d}$ we first define

$$O' := \frac{d}{8}\cdot A\otimes\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}. \tag{18}$$

To obtain the observable in the computational basis, we modify the diagonal,

$$O_{i,j} = \begin{cases} O'_{i,j} & i\neq j\\ -\sum_{\alpha=1}^{2d} O'_{\alpha,j} & i = j\end{cases}. \tag{19}$$

The initial state and generators are chosen as

$$|\Psi_0\rangle := \frac{1}{\sqrt{2d}}\sum_{j=1}^{2d}|j\rangle, \tag{20}$$

$$H_i := |2i-1\rangle\langle 2i-1| - |2i\rangle\langle 2i|, \tag{21}$$

where we take $L = d$ layers. As parameterized state we obtain

$$|\Psi(\phi)\rangle := U_d(\phi_d)\cdots U_1(\phi_1)|\Psi_0\rangle = \frac{1}{\sqrt{2d}}\sum_{j=1}^{d}\bigl(e^{-i\phi_j}|2j-1\rangle + e^{i\phi_j}|2j\rangle\bigr) \tag{22}$$

and

$$\begin{aligned}
\langle O(\phi)\rangle &= \langle\Psi(\phi)|O|\Psi(\phi)\rangle \\
&= \frac{1}{16}\sum_{s,p\in\{+,-\}}\sum_{i,j=1}^{d} e^{is\phi_i}A_{i,j}e^{-ip\phi_j} - \frac{1}{4}\sum_{i,j=1}^{d}A_{i,j} \\
&= \frac{1}{8}\sum_{i,j=1}^{d}A_{i,j}\bigl(\cos(\phi_i-\phi_j) + \cos(\phi_i+\phi_j) - 2\bigr) \\
&= \frac{1}{4}\sum_{i,j=1}^{d}A_{i,j}\bigl(\cos(\phi_i)\cos(\phi_j) - 1\bigr) \\
&= \mu(\phi)
\end{aligned} \tag{23}$$

as corresponding expectation value. This completes the reduction of Problem 2 to Problem 4.

From this result, NP-completeness follows for the decision version.

Problem 5 (VQA minimization, decision version).
Instance: An initial state $|\Psi_0\rangle \in \mathbb{C}^n$, a set of generators $\{H_i\}_{i\in\{1,\dots,L\}} \subset \mathrm{Herm}(\mathbb{C}^n)$, where $L$ is the number of layers, an observable $O \in \mathrm{Herm}(\mathbb{C}^n)$ and a threshold $a \in \mathbb{R}$.
Task: For $|\Psi(\phi)\rangle := U_L(\phi_L)\cdots U_1(\phi_1)|\Psi_0\rangle$ with $U_i(\phi) = e^{-iH_i\phi}$, determine if there exists $\phi \in \mathbb{R}^L$ for which $\langle\Psi(\phi)|O|\Psi(\phi)\rangle \le a$.

Corollary 2. Problem 5 is NP-complete.

Proof. As calculating the expectation value of observables on polynomial dimensional Hilbert spaces is in P, $\phi$ is a valid proof for the yes instances, which can be verified in polynomial time; the problem is therefore in NP. Together with the hardness of Problem 4, this means Problem 5 is NP-complete.

We now want to show that $L = 1$ layer is sufficient to show hardness. For this we will use certain properties of Hamiltonian spectra.

Definition 2 (Approximate ergodic energy spectrum). Let $\epsilon > 0$. We call a set $\{E_i\}_{i\in[n]} \subset \mathbb{R}$ an $\epsilon$-approximate ergodic energy spectrum if for all $\phi \in [0,2\pi)^n$ there exists $t \in \mathbb{R}^+$ such that

$$|\phi_i - E_i t|_{\mathrm{mod}\,2\pi} \le \epsilon \tag{24}$$

for all $i \in [n]$, where $|x|_{\mathrm{mod}\,2\pi} := \inf_{k\in\mathbb{Z}}|x - 2\pi k| \in [0,\pi]$.

Generic energy spectra are exactly ($\epsilon = 0$) ergodic. For our purpose we want to show that there are also efficiently expressible approximate ergodic energy spectra.

Lemma 2 (Approximate ergodic energy spectra). Let $m \in \mathbb{N}$. Then

$$E_i := \frac{2\pi}{m^i} \tag{25}$$

with $i \in [n]$ defines an $\epsilon$-approximate ergodic energy spectrum with

$$\epsilon = \frac{4\pi}{m}. \tag{26}$$

We provide a proof in Appendix A. The chosen energies can be expressed with $n\times\lceil\log_2(m)\rceil$ bits of precision. With this property we can show the following theorem.

Theorem 2. VQA optimization (Problem 4) is NP-hard for $L = 1$ layer.

Proof. We modify the unitaries generated by Eq. (21) as $U(\phi) \to U(E\phi)$. This can be achieved with the generator

$$H = \sum_{j=1}^{d} E_j\bigl(|2j-1\rangle\langle 2j-1| - |2j\rangle\langle 2j|\bigr) \tag{27}$$

and $U(\phi) = \exp(-i\phi H)$. The initial state and $O$ remain identical. This leads to the expectation value

$$\langle O(\phi)\rangle = \frac{1}{4}\sum_{i,j=1}^{d} A_{i,j}\bigl(\cos(E_i\phi)\cos(E_j\phi) - 1\bigr) = \mu(E\phi). \tag{28}$$

If $\{E_i\}_i$ are chosen as in Lemma 2 then $\langle O(\phi)\rangle$ approximates $\mu(\phi')$ with $\phi' = E\phi$ to arbitrary precision, which we have shown to be NP-hard to optimize in Lemma 1.

By viewing the VQA in Theorem 2 as a continuous time evolution for logarithmically many qubits, we obtain the following result (we are unaware of this statement being explicitly proven before).

Corollary 3. For a system with logarithmically many qubits, we consider the expectation value of a (unitarily) time evolved observable $\langle O(t)\rangle$, starting from some initial state. Then minimizing the expectation value over $t \in \mathbb{R}^+_0$ is NP-hard.

C. Quantum approximate optimization algorithms for a logarithmic number of qubits

QAOAs can be seen as certain types of VQAs that use the structure from adiabatic computation, where a slow enough transition between two Hamiltonians $H_b$ and $H_c$ guarantees remaining in the ground state as long as the Hamiltonians are gapped and level crossings are avoided [21]. QAOAs capture a time-discrete version of this approach by alternatingly applying the time evolutions of the Hamiltonians. Accordingly, parameter vectors $\beta, \gamma \in \mathbb{R}^L$ need to be chosen, which define how long each Hamiltonian is applied.

We demonstrate that the hardness of VQA optimization for logarithmically many qubits also translates to QAOA problems. Formally, the problem is as follows.

Problem 6 (QAOA minimization problem).
Instance: Two Hamiltonians $H_b, H_c \in \mathrm{Herm}(\mathbb{C}^n)$ and the number of layers $L$ in unary notation [footnote 1].
Task: For a tunable state $|\Psi(\beta,\gamma)\rangle := U_b(\beta_L)U_c(\gamma_L)\cdots U_b(\beta_1)U_c(\gamma_1)|\Psi_0\rangle$, where $|\Psi_0\rangle$ is the ground state of $H_b$, $U_b(\beta) = e^{-iH_b\beta}$ and $U_c(\gamma) = e^{-iH_c\gamma}$, find $\beta, \gamma \in \mathbb{R}^L$ which minimize $\langle O(\beta,\gamma)\rangle := \langle\Psi(\beta,\gamma)|H_c|\Psi(\beta,\gamma)\rangle$.

[Footnote 1] This means that the length of the input scales linearly with $L$.

Theorem 3 (Hardness of optimization in QAOAs). Problem 6 is NP-hard for $L = 1$ layer.

Proof. To show this, we show a reduction from single layer VQA to QAOA, which implies that Problem 6 is NP-hard.

We consider the Hilbert space $\mathcal{H} = \mathbb{C}^{2d+1}$. For $H_b$ we take

$$H_b = \mathrm{diag}(E_1, -E_1, E_2, -E_2, \dots, E_d, -E_d, -1), \tag{29}$$

where $|E_i| < 1$ for all $i \in [d]$ and $\tau$ is some real constant that we adjust later. For $H_c$ we take

$$H_c = O\oplus 0 + \tau\bigl(|{+_{2d}}\rangle\langle 2d+1| + |2d+1\rangle\langle{+_{2d}}|\bigr), \tag{30}$$

where $|{+_{2d}}\rangle = \sum_{j=1}^{2d}|j\rangle/\sqrt{2d}$ and the observable $O$ is defined as in Eq. (19). $O\oplus 0$ refers to $O$ being embedded in the first $2d$ computational states in the Hilbert space. By design, $\lambda_{\min}(H_b) = -1$ is the ground state energy with ground state $|2d+1\rangle$.

For the state we obtain

$$|\Psi(\gamma)\rangle := U_c(\gamma)|2d+1\rangle = \cos(\tau\gamma)|2d+1\rangle - i\sin(\tau\gamma)|{+_{2d}}\rangle \tag{31}$$

after applying the first Hamiltonian, where we used that $|{+_{2d}}\rangle$ is an eigenstate of $O\oplus 0$ with eigenvalue 0. This gives the final state

$$|\Psi(\beta,\gamma)\rangle = U_b(\beta)U_c(\gamma)|2d+1\rangle = \cos(\tau\gamma)e^{i\beta}|2d+1\rangle - i\sin(\tau\gamma)\frac{1}{\sqrt{2d}}\sum_{j=1}^{d}\bigl(e^{-iE_j\beta}|2j-1\rangle + e^{iE_j\beta}|2j\rangle\bigr).$$

From this variational state we derive the expectation value

$$\langle O(\beta,\gamma)\rangle = \langle\Psi(\beta,\gamma)|H_c|\Psi(\beta,\gamma)\rangle = \sin^2(\tau\gamma)f(\beta) + 2\tau\cos(\tau\gamma)\sin(\tau\gamma)g(\beta)$$

with

$$f(\beta) = \frac{1}{4}\sum_{i,j} A_{i,j}\bigl(\cos(E_i\beta)\cos(E_j\beta) - 1\bigr), \tag{32}$$

$$g(\beta) = -\frac{\sin(\beta)}{d}\sum_{j=1}^{d}\cos(E_j\beta). \tag{33}$$
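As a sanity check, the single-layer expectation value in terms of $f$ and $g$ from Eqs. (32) and (33) can be compared against a direct matrix computation. The following is a minimal sketch, assuming the convention $U_c(\gamma) = e^{-iH_c\gamma}$, $U_b(\beta) = e^{-iH_b\beta}$ from Problem 6 and arbitrary energies with $|E_i| < 1$; the helper names (`observable`, `evolve`, `qaoa_expectation`, `closed_form`) and the test values are hypothetical.

```python
import numpy as np

def observable(A):
    """Observable O of Eqs. (18) and (19) on C^{2d}."""
    d = A.shape[0]
    O = (d / 8) * np.kron(A, np.ones((2, 2)))
    # The diagonal of O' vanishes because A has zero diagonal, so Eq. (19)
    # amounts to subtracting the column sums on the diagonal.
    return O - np.diag(O.sum(axis=0))

def evolve(H, t, psi):
    """Apply exp(-i H t) to psi for Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ (np.exp(-1j * w * t) * (V.conj().T @ psi))

def qaoa_expectation(A, E, tau, beta, gamma):
    """<Psi(beta,gamma)|H_c|Psi(beta,gamma)> for H_b, H_c of Eqs. (29), (30)."""
    d = A.shape[0]
    n = 2 * d + 1
    plus = np.zeros(n)
    plus[:2 * d] = 1 / np.sqrt(2 * d)          # |+_{2d}>
    last = np.zeros(n)
    last[-1] = 1.0                             # |2d+1>, ground state of H_b
    Hb = np.diag(np.concatenate([np.repeat(E, 2) * np.tile([1, -1], d), [-1.0]]))
    Hc = np.zeros((n, n))
    Hc[:2 * d, :2 * d] = observable(A)         # O embedded as O (+) 0
    Hc += tau * (np.outer(plus, last) + np.outer(last, plus))
    psi = evolve(Hb, beta, evolve(Hc, gamma, last.astype(complex)))
    return float(np.real(psi.conj() @ Hc @ psi))

def closed_form(A, E, tau, beta, gamma):
    """sin^2(tau*gamma) f(beta) + 2 tau cos(tau*gamma) sin(tau*gamma) g(beta)."""
    d = A.shape[0]
    c = np.cos(E * beta)
    f = 0.25 * (c @ A @ c - A.sum())           # Eq. (32)
    g = -np.sin(beta) / d * c.sum()            # Eq. (33)
    s, co = np.sin(tau * gamma), np.cos(tau * gamma)
    return float(s ** 2 * f + 2 * tau * co * s * g)

# Hypothetical test instance: 4-cycle with arbitrary energies |E_i| < 1.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
E = np.array([0.3, -0.5, 0.7, 0.11])
tau, beta, gamma = 0.2, 1.3, 2.1
print(qaoa_expectation(A, E, tau, beta, gamma), closed_form(A, E, tau, beta, gamma))
```

The comparison holds for any choice of energies, since the closed form does not rely on the ergodicity property of Lemma 2; that property is only needed to relate $f(\beta)$ to $\mu(\phi)$.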

For $\tau \ll 1$ the contribution of $g$ becomes insignificant and $\gamma = \frac{\pi}{2\tau}$ minimizes the objective function as $f(\beta) \le 0$, meaning the problem is equivalent to minimizing $f(\beta) = \mu(E\beta)$, which approximates $\mu(\phi)$ to arbitrary precision if $E$ is chosen as in Lemma 2 and therefore gives a reduction from Problem 2.

In the proof of Theorem 3, the energies of $H_b$ span many orders of magnitude. This means a potential quantum computer needs to be incredibly precise to implement such a QAOA. We will show that this is not required, and that hardness results can also be obtained for very simple spectra.

Theorem 4. QAOA optimization (Problem 6) is NP-hard for periodic optimization $\beta, \gamma \in [0,2\pi)^L$, even if we restrict $\|H_b\| \le 3$ and $\|H_c\| \le 3$ (i.e. $E_i \in \{-3,-2,\dots,3\}$).

Proof outline. In Appendix B, we construct explicit Hamiltonians $H_b$ and $H_c$ from an adjacency matrix $A$, where the solution is $\langle H_c\rangle_{\min} = 1 - \frac{2\,\mathrm{MaxCut}(A)}{|E(A)|}$ and $\|H_c\| = 1$. We do this by embedding a modified version of Problem 2 into the QAOA circuit in a way that penalizes deviations from it by increasing the expectation value.

We will now show bounds on the optimization errors for VQA in this restricted setting. Here, we are unable to use the same boosting technique to increase the hardness result further.

Corollary 4. All polynomial time algorithms for QAOA and therefore VQA optimization (Problems 6 and 4) have an optimization error $\Delta \ge 1 - \alpha_{\max}$, where $\alpha_{\max}$ is the approximation ratio of MaxCut.

Proof. For the Hamiltonians in the proof of Theorem 4, we have $\|H_c\| = 1$ and the lowest achievable expectation value is $\langle H_c\rangle_{\min} = 1 - \frac{2\,\mathrm{MaxCut}(A)}{|E(A)|}$, where $|E(A)|$ is the number of edges of the graph. From this we can calculate an upper limit on the possible guaranteed precision of an optimization algorithm for all instances

$$\begin{aligned}
\Delta &\ge \sup_A \frac{|\langle H_c\rangle_{\min} - \langle H_c\rangle_a|}{\|H_c\|} \\
&= \sup_A \left|\left(1 - \frac{2\,\mathrm{MaxCut}(A)}{|E(A)|}\right) - \left(1 - \frac{2|\mu_a(A)|}{|E(A)|}\right)\right| \\
&= \sup_A \left(\frac{2\,\mathrm{MaxCut}(A)}{|E(A)|} - \frac{2|\mu_a(A)|}{|E(A)|}\right) \\
&= \sup_A \frac{2\,\mathrm{MaxCut}(A)}{|E(A)|}(1-\alpha) \\
&\ge 1 - \alpha,
\end{aligned} \tag{34}$$

where the supremum goes over all adjacency matrices, $|\mu_a(A)|$ is the approximation of MaxCut from the algorithm and $\alpha$ is the approximation ratio of the algorithm; in the last step we used that $\mathrm{MaxCut}(A) \ge \frac{|E(A)|}{2}$. This means that if P ≠ NP, any polynomial time algorithm is only able to guarantee QAOA and therefore general VQA minimizations to an error $\Delta \ge 1 - \alpha_{\max}$.

For gradient based methods, in contrast, this means $\Delta \ge \frac{1}{2}$ for logarithmically many qubits, as $\alpha = 1/2$ was shown.

D. Free fermionic models

Free fermionic models are a certain class of fermionic many-body systems, without actual particle-particle interactions. They are especially interesting for us, as they can be simulated efficiently for so-called Gaussian input states and observables.

Fermionic creation and annihilation operators are denoted by $c_j^\dagger$ and $c_j$. They satisfy the anti-commutation relations $\{c_i^\dagger, c_j\} = \delta_{i,j}$ and $\{c_i, c_j\} = 0$ for all $i, j$. We call an operator quadratic or Gaussian if it is a quadratic polynomial in the creation and annihilation operators. We will consider (balanced) quadratic observables of the form

$$H = \sum_{i,j} h_{i,j}\, c_i^\dagger c_j \tag{35}$$

and call $h$ the coefficient matrix of $H$, which is Hermitian. Also in the following, we denote operators by capital letters and their respective coefficient matrices by lower case letters.

A quantum state is Gaussian if it can be arbitrarily well approximated by a thermal state of a quadratic Hamiltonian. For a Hamiltonian $H$ we denote its ground state by

$$\rho[H] = \lim_{\beta\to\infty}\frac{e^{-\beta H}}{\mathrm{Tr}[e^{-\beta H}]}. \tag{36}$$

From this we can define the VQA problem in the free fermionic setting.

Problem 7 (VQA minimization problem, free fermions).
Instance: Coefficient matrices $h_0, h_1, \dots, h_L, o \in \mathrm{Herm}(\mathbb{C}^n)$.
Task: The coefficient matrices define quadratic observables $H_0, H_1, \dots, H_L$ and $O$ via (35) and $\rho_0 = \rho[H_0]$. For the evolved state

$$\rho(\phi) := U_L(\phi_L)\cdots U_1(\phi_1)\,\rho_0\, U_1^\dagger(\phi_1)\cdots U_L^\dagger(\phi_L),$$

with $U_i(\phi) = e^{-iH_i\phi}$, find $\phi \in \mathbb{R}^L$ that minimizes $\langle O(\phi)\rangle := \mathrm{Tr}[O\rho(\phi)]$.

Theorem 5. Problem 7 is NP-hard, even if the initial state $\rho_0$ is pure.

Proof. We prove the theorem via a reduction of Problem 2 to Problem 7. Therefore, we consider a Hermitian adjacency matrix $A \in \{0,1\}^{d\times d}$.

For the VQA setup, we use $n = 2d$ fermionic modes $c_i$ with $i \in [2d]$ and $L = d$ layers. To encode Problem 2 we define $h_0, \{h_i\}_{i \in [L]}, o \in \mathrm{Herm}(\mathbb{C}^{2d \times 2d})$ as

$$ h_0 = \mathbb{1} - \frac{2}{n}\,\mathbf{1} , \tag{37} $$

$$ h_i = E_i \otimes \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} , \quad i \in [d] , \tag{38} $$

where $\mathbf{1}_{a,b} = 1$ and $E_{i;a,b} = \delta_{i,a,b}$ (Kronecker delta) for all $i, a, b$. The coefficient matrix $o$ is given by the matrix $O$ as defined in Eqs. (18) and (19), which is used for the encoding of the adjacency matrix $A$.

We define $\Gamma_{i,j} := \mathrm{Tr}(c_j^\dagger c_i \rho_0)$ to be the covariance matrix of $\rho_0$, which evaluates to $\Gamma = \mathbf{1}/(2d)$ by the identity (C13). As the eigenvalues of $h_0$ are $\lambda = (-1, 1, \dots, 1)$, $\rho_0$ describes a pure state, cf. Appendix C. From Eq. (C3) we obtain the coefficient matrix of $O(\phi)$ in the Heisenberg picture as

$$ o(\phi) = e^{i h_d \phi_d} \cdots e^{i h_1 \phi_1} \, o \, e^{-i h_1 \phi_1} \cdots e^{-i h_d \phi_d} . \tag{39} $$

With these prerequisites we can derive the expectation value

$$ \begin{aligned} \langle O(\phi) \rangle &= \mathrm{Tr}(O(\phi)\rho_0) \\ &= \mathrm{Tr}\Big( \sum_{i,j=1}^{2d} o(\phi)_{i,j}\, c_i^\dagger c_j\, \rho_0 \Big) \\ &= \sum_{i,j} o(\phi)_{i,j}\, \Gamma_{j,i} \\ &= \frac{1}{2d} \sum_{i,j} o(\phi)_{i,j} \\ &= \frac{1}{4} \sum_{i,j} A_{i,j} \big( \cos(\phi_i)\cos(\phi_j) - 1 \big) = \mu(\phi) , \end{aligned} \tag{40} $$

where the last step follows analogously to Eq. (23). As this is the objective function from Problem 2, this shows the desired reduction.

IV. CONCLUSION AND OUTLOOK

Our results show that the classical training is a crucial challenge in VQA-based hybrid quantum computations. We have not only shown that optimizing VQAs is NP-hard, but also that no polynomial-time algorithm can have an optimization error $\Delta < 1$ for all instances (assuming P $\neq$ NP). Moreover, the hardness results already hold for significantly simpler systems, such as those composed of logarithmically many qubits or free fermions. This also shows that the hardness does not merely derive from the ground state problem. Finally, we have extended these results to optimization on a single layer of gates, to continuous unitary time evolution and to QAOA problems.

We encoded NP-hard problems into local extrema of the optimization landscape of VQA problems. Gradient-descent-type optimization and also higher-order methods can converge to any local minimum, determined mostly by the initialization. From this we could explicitly show that, even for logarithmically many qubits, these methods have an approximation error of $\Delta \geq \tfrac{1}{2}$. For our particular VQA, this is significantly worse than what modern efficient MaxCut solvers can guarantee. This emphasizes the need for effective initialization procedures for VQAs and poses the challenge of finding non-local heuristics for VQA optimization that overcome these persistent local minima and reach smaller optimization errors.

ACKNOWLEDGMENTS

We thank David Wierichs, Sevag Gharibian, Raphael Brieger and Thomas Wagner for helpful comments on our manuscript and Jens Watty, Christian Gogolin and David Gross for fruitful discussions on the nature of VQEs and QAOAs.

APPENDICES

Appendix A: Proof of Lemma 2 on ergodic energy spectra $E$

Starting from the definition of

$$ E_i := \frac{2\pi}{m^i} , \tag{A1} $$

let $\phi \in [0, 2\pi)^n$ be the desired phase vector. For this we define

$$ s_i := \left\lfloor \frac{\phi_i m}{2\pi} \right\rfloor \in \{0, \dots, m-1\} \tag{A2} $$

and

$$ t(s) := \sum_{j=1}^{n} s_j m^{j-1} \in \{0, \dots, m^n - 1\} . \tag{A3} $$

Then

$$ \begin{aligned} \phi_i - E_i t &= \Big( \phi_i - \frac{2\pi s_i}{m} \Big) + \Big( \frac{2\pi s_i}{m} - 2\pi \sum_{j=1}^{n} s_j m^{j-1-i} \Big) \\ &= \Big( \phi_i - \frac{2\pi s_i}{m} \Big) - \underbrace{2\pi \sum_{j=1}^{i-1} s_j m^{j-1-i}}_{|\,\cdot\,| \leq 2\pi/m} - \underbrace{2\pi \sum_{j=i+1}^{n} s_j m^{j-1-i}}_{\in 2\pi\mathbb{Z}} . \end{aligned} \tag{A4} $$

By the definition (A2) of $s_i$, the first difference satisfies $0 \leq \phi_i - 2\pi s_i/m < 2\pi/m$.
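The bound that results from the decomposition (A4) can be sanity-checked numerically. The following is a minimal sketch, not part of the original proof: the helper names `phase_time` and `max_phase_error` are hypothetical, and the sample phases and the base $m = 16$ are arbitrary choices. It builds $s_i$ and $t(s)$ as in (A2) and (A3) and measures the circular distance between $\phi_i$ and $E_i t$.

```python
import math


def phase_time(phi, m):
    """Return (s, t) with s_i = floor(phi_i * m / (2*pi)) and t = sum_j s_j * m**(j-1)."""
    s = [math.floor(p * m / (2 * math.pi)) for p in phi]
    # enumerate() is 0-based, so m**j here equals m**(j-1) for the 1-based index of (A3).
    t = sum(s_j * m**j for j, s_j in enumerate(s))
    return s, t


def max_phase_error(phi, m):
    """Largest circular distance between phi_i and E_i * t, where E_i = 2*pi / m**i."""
    _, t = phase_time(phi, m)
    worst = 0.0
    for i, p in enumerate(phi, start=1):
        # t is an integer, so E_i * t mod 2*pi can be reduced exactly before using floats.
        angle = 2 * math.pi * (t % m**i) / m**i
        diff = abs(p - angle) % (2 * math.pi)
        worst = max(worst, min(diff, 2 * math.pi - diff))
    return worst


phi = [0.3, 2.0, 5.9]   # arbitrary target phases in [0, 2*pi)
m = 16                  # arbitrary base
s, t = phase_time(phi, m)
print(s, t, max_phase_error(phi, m), 4 * math.pi / m)
```

For these sample values the digits are $s = (0, 5, 15)$, and the observed error stays below $4\pi/m$, in line with the two contributions of size at most $2\pi/m$ each.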

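Hence,

$$ |\phi_i - E_i t| \bmod 2\pi \leq \frac{4\pi}{m} , \tag{A5} $$

which is what we wanted to show.

Appendix B: Proof of Theorem 4 on multilayer QAOAs

Now we construct a many-one reduction from Problem 2 to a multilayer QAOA optimization. For this purpose, we first define some useful objects.

Let $\mathcal{K} := \mathbb{C}^d \otimes \mathbb{C}^d \otimes \mathbb{C}^2 \otimes \mathbb{C}^2$, where $d$ will be the size of an adjacency matrix to encode MaxCut and the number of layers ($L = d$). We define a larger Hilbert space as a direct sum $\mathcal{H} = \mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_{2d+1}$ with $\mathcal{H}_\ell \cong \mathcal{K}$ for $\ell \in [2d+1]$. We canonically identify each $\mathcal{H}_\ell$ with the corresponding subspace $\mathcal{H}_\ell \subset \mathcal{H}$ and denote the canonical basis states by $\{ |i, j, a, b\rangle_\ell \}$, where $i, j \in [d]$, $a, b \in \{0, 1\}$ and $\ell$ indicates the subspace $\mathcal{H}_\ell$. Next, we define four two-level unitary evolutions by

$$ \begin{aligned} |\psi_0(\phi)\rangle &= e^{-i\phi/2} \cos(\phi/2)\, |1\rangle + e^{-i\phi/2} \sin(\phi/2)\, |2\rangle , \\ |\psi_1(\phi)\rangle &= \cos(\phi)\, |1\rangle + \sin(\phi)\, |2\rangle , \\ |\psi_2(\phi)\rangle &= e^{i\phi} \cos(\phi)\, |1\rangle + i\, e^{i\phi} \sin(\phi)\, |2\rangle , \\ |\psi_3(\phi)\rangle &= e^{-i\phi} \cos(\phi)\, |1\rangle - i\, e^{-i\phi} \sin(\phi)\, |2\rangle , \end{aligned} $$

which are generated as $|\psi_i(\phi)\rangle = e^{-i H_i \phi} |1\rangle$ by

$$ H_0 = \frac{1}{2} \begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix} , \quad H_1 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} , \quad H_2 = \begin{pmatrix} -1 & -1 \\ -1 & -1 \end{pmatrix} , \quad H_3 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} , $$

with eigenvalues $\{0, 1\}$, $\{-1, 1\}$, $\{-2, 0\}$ and $\{0, 2\}$, respectively. Based on these evolutions, we define transfer Hamiltonians $H_T^{(\kappa)}, H_T' \in \mathrm{Herm}(\mathcal{K} \oplus \mathcal{K})$ as

$$ H_T^{(\kappa)} := \sum_{i,j,a,b,x,y} \begin{cases} H_1|_{x,y} & \text{if } i = j \text{ or } a = 0, \\ H_2|_{x,y} & \text{if } i = \kappa \text{ or } j = \kappa,\ b = 0, \\ H_3|_{x,y} & \text{if } j = \kappa,\ b = 1, \\ H_1|_{x,y} & \text{otherwise} \end{cases} \times\, |i, j, a, b\rangle_x \langle i, j, a, b|_y \tag{B1} $$

and

$$ H_T' := \sum_{i,j,a,b,x,y} H_0|_{x,y}\; |i, j, a, b\rangle_x \langle i, j, a, b|_y \tag{B2} $$

with $x, y \in \{1, 2\}$.

Let $A \in \{0, 1\}^{d \times d}$ be the adjacency matrix of an unweighted graph with at least one edge. We will construct $H_b$ such that it has the ground state

$$ |gs_b\rangle := \frac{1}{2\sqrt{\sum_{i,j} A_{i,j}}} \sum_{i \neq j, a, b} A_{i,j}\, |i, j, a, b\rangle \in \mathcal{K} . \tag{B3} $$

For this construction it will be helpful to denote

$$ H_{gs} := -3\, |gs_b\rangle\langle gs_b| . \tag{B4} $$

The solution of MaxCut will be captured by the last subspace $\mathcal{H}_{2d+1} \subset \mathcal{H}$. For this we define $H_p \in \mathrm{Herm}(\mathcal{K})$ as

$$ H_p = \frac{1}{2} \sum_{i,j,a,b,\tilde a,\tilde b} \delta_{a \neq \tilde a}\; |i, j, a, b\rangle \langle i, j, \tilde a, \tilde b| , \tag{B5} $$

where $\delta_{a \neq \tilde a} := 1 - \delta_{a,\tilde a}$. Finally, we define $H_b, H_c \in \mathrm{Herm}(\mathcal{H})$ as

$$ \begin{aligned} H_b &= H_{gs} \oplus H_T^{(1)} \oplus \cdots \oplus H_T^{(d)} , \\ H_c &= H_T' \oplus \cdots \oplus H_T' \oplus H_p , \end{aligned} \tag{B6} $$

where the ground state of $H_b$ is $|gs_b\rangle_1$, given as the embedded state $|gs_b\rangle_1 = |gs_b\rangle \oplus 0 \in \mathcal{H}_1 \subset \mathcal{H}$. Similarly, $|gs_b\rangle_\ell \in \mathcal{H}_\ell \subset \mathcal{H}$ is defined.

For the first layer, this gives the state

$$ \begin{aligned} |\Psi_0\rangle &= |gs_b\rangle_1 = \frac{1}{2\sqrt{\sum A_{i,j}}} \sum_{i \neq j, a, b} A_{i,j}\, |i, j, a, b\rangle_1 , \\ U_c(\gamma_1) |\Psi_0\rangle &= \sin(\gamma_1/2)\, e^{-i\gamma_1/2}\, |gs_b\rangle_2 + e^{-i\gamma_1/2} \cos(\gamma_1/2)\, |gs_b\rangle_1 , \\ U_b(\beta_1) U_c(\gamma_1) |\Psi_0\rangle &= \frac{\sin(\beta_1) \sin(\gamma_1/2)\, e^{-i\gamma_1/2}}{2\sqrt{\sum A_{i,j}}} \sum_{i \neq j, a, b} A_{i,j}\, e^{i a \left( \delta_{i,1} (\beta_1 + \frac{\pi}{2}) + (-1)^b \delta_{j,1} (\beta_1 + \frac{\pi}{2}) \right)}\, |i, j, a, b\rangle_3 + \dots ; \end{aligned} \tag{B7} $$

only the highest $\mathcal{H}_i$ subspace is shown, as this is the relevant one. Applying all $d$ layers of the QAOA gives

$$ \begin{aligned} |\Psi(\beta, \gamma)\rangle &= U_b(\beta_d) U_c(\gamma_d) \cdots U_b(\beta_1) U_c(\gamma_1) |\Psi_0\rangle \\ &= \frac{\prod_k \sin(\beta_k) \sin(\gamma_k/2)\, e^{-i\gamma_k/2}}{2\sqrt{\sum A_{i,j}}} \sum_{i \neq j, a, b} A_{i,j}\, e^{i a \left( \beta_i + \frac{\pi}{2} + (-1)^b (\beta_j + \frac{\pi}{2}) \right)}\, |i, j, a, b\rangle_{2d+1} + \dots \end{aligned} \tag{B8} $$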
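The amplitudes in Eqs. (B7) and (B8) rest on the four two-level evolutions $|\psi_k(\phi)\rangle = e^{-i H_k \phi} |1\rangle$. The sketch below is an illustrative check rather than part of the proof: it assumes the generators $H_0 = \tfrac12(\mathbb{1} + \sigma_y)$, $H_1 = \sigma_y$, $H_2 = -(\mathbb{1} + \sigma_x)$ and $H_3 = \mathbb{1} + \sigma_x$, exponentiates them with a truncated Taylor series (a choice made here only to avoid external libraries), and compares the result with the closed forms. The function names are hypothetical.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def expm(a, terms=40):
    """Truncated Taylor series for the exponential of a 2x2 complex matrix."""
    result = [[1, 0], [0, 1]]
    term = [[1, 0], [0, 1]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in matmul(term, a)]  # a**n / n!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


# Assumed generators H_0, ..., H_3 of the two-level evolutions.
GENERATORS = [
    [[0.5, -0.5j], [0.5j, 0.5]],
    [[0, -1j], [1j, 0]],
    [[-1, -1], [-1, -1]],
    [[1, 1], [1, 1]],
]


def psi_closed_form(k, phi):
    """Amplitudes (on |1>, on |2>) of psi_k(phi) as stated in the text."""
    c, s = math.cos(phi), math.sin(phi)
    if k == 0:
        phase = cmath.exp(-0.5j * phi)
        return [phase * math.cos(phi / 2), phase * math.sin(phi / 2)]
    if k == 1:
        return [c, s]
    if k == 2:
        return [cmath.exp(1j * phi) * c, 1j * cmath.exp(1j * phi) * s]
    return [cmath.exp(-1j * phi) * c, -1j * cmath.exp(-1j * phi) * s]


def psi_from_generator(k, phi):
    """First column of exp(-i H_k phi), i.e. the image of |1>."""
    u = expm([[-1j * phi * x for x in row] for row in GENERATORS[k]])
    return [u[0][0], u[1][0]]


def max_deviation(phi):
    """Largest entrywise difference between both constructions over k = 0..3."""
    return max(
        abs(x - y)
        for k in range(4)
        for x, y in zip(psi_from_generator(k, phi), psi_closed_form(k, phi))
    )


print(max_deviation(0.7))
```

In particular, the amplitude transferred to $|2\rangle$ is $\sin(\phi)$ times a phase of $0$, $\phi + \pi/2$ or $-(\phi + \pi/2)$ for $\psi_1$, $\psi_2$ and $\psi_3$, which is the structure of the exponent in (B8).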

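And thus the expectation value of $H_c$ becomes

$$ \begin{aligned} \langle \Psi(\beta, \gamma) | H_c | \Psi(\beta, \gamma) \rangle &= \frac{1}{2}\, \frac{\prod_k \sin^2(\beta_k) \sin^2(\gamma_k/2)}{4 \sum A_{i,j}} \\ &\quad \times \sum_{i,j=1}^{d} 2 A_{i,j} \big( \cos(\beta_i + \beta_j + \pi) + \cos(\beta_i - \beta_j) + \cos(-\beta_i - \beta_j - \pi) + \cos(-\beta_i + \beta_j) \big) + \langle O_{\mathrm{rest}} \rangle \\ &= \frac{1}{2}\, \frac{\prod_k \sin^2(\beta_k) \sin^2(\gamma_k/2)}{4 \sum A_{i,j}} \sum_{i,j=1}^{d} 8 A_{i,j} \sin(\beta_i) \sin(\beta_j) + \langle O_{\mathrm{rest}} \rangle \\ &\geq \frac{\prod_k \sin^2(\beta_k) \sin^2(\gamma_k/2)}{\sum A_{i,j}} \sum_{i,j=1}^{d} A_{i,j} \sin(\beta_i) \sin(\beta_j) , \end{aligned} \tag{B9} $$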

where we used that $A_{i,j}^2 = A_{i,j}$ and $\langle O_{\mathrm{rest}} \rangle \geq 0$ denotes the expectation value of $H_c$ within $\mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_{2d}$. The expression

$$ f(\beta) := \sum_{i,j} A_{i,j} \sin(\beta_i) \sin(\beta_j) = 4\mu(\beta - \pi/2) + 2 |E(A)| $$

is minimized for a shifted solution of Problem 2 with local extrema $\beta_i \in \{\pi/2, 3\pi/2\}$. Its minimum value is non-positive, as $\mathrm{MaxCut}(A) \geq |E(A)|/2$. This means that

$$ g(\beta, \gamma) := \prod_k \sin^2(\beta_k) \sin^2(\gamma_k/2) $$

needs to be maximized. This can be achieved trivially by setting $\gamma_i = \pi$ for all $i$ and choosing $\beta$ to be a local extremum, where the function evaluates to 1. This choice also attains the minimum $\langle O_{\mathrm{rest}} \rangle = 0$. Hence, the problem is equivalent to minimizing $\mu(\beta - \pi/2)$, which completes the reduction from Problem 2. Similarly, it follows that an algorithm approximating this QAOA also returns a lower bound to $\mathrm{MaxCut}(A)$.

Finally, we show the claimed norm bounds on $H_b$ and $H_c$ from Eq. (B6). Direct calculations reveal that $\|H_{gs}\| = 3$, $\|H_T^{(\kappa)}\| = 2$, $\|H_T'\| = 1$ and $\|H_p\| = 1$. Hence, $\|H_c\| = 1$ and $\|H_b\| = 3$.

Appendix C: Free fermions

In this section, we provide some basics on free fermions for the special case of particle-number-preserving Hamiltonians. Throughout, we consider $n$ fermionic modes with annihilation operators $c_1, \dots, c_n$.

First, we explain how time evolution can be simulated efficiently. With the commutation relation $[c_i^\dagger c_j, c_k^\dagger c_l] = \delta_{j,k} c_i^\dagger c_l - \delta_{i,l} c_k^\dagger c_j$ the time evolution in the Heisenberg picture becomes

$$ \dot O = i[H, O] = i \sum_{i,j=1}^{n} [h, o]_{i,j}\, c_i^\dagger c_j , \tag{C1} $$

where $o$ and $h$ are again the coefficient matrices of $O$ and $H$, as in (35). With $\dot O = \sum_{i,j=1}^{n} \dot o_{i,j}\, c_i^\dagger c_j$ we obtain

$$ \dot o = i[h, o] . \tag{C2} $$

For $O(t) = e^{iHt} O\, e^{-iHt}$ this gives

$$ o(t) = e^{iht}\, o\, e^{-iht} , \tag{C3} $$

meaning that the Hilbert space unitary $e^{iHt}$ is represented by the unitary $n \times n$ matrix $e^{iht}$ on the level of second moments.

Secondly, we derive an expression for the covariance matrix $\Gamma$ for thermal states. Quadratic observables (35) can be written in a normal form. This form can be obtained by observing that unitary mode transformations leave the commutation relations invariant: for

$$ \tilde c_i = \sum_{j=1}^{n} u_{i,j} c_j \tag{C4} $$

with $u \in U(n)$ a unitary matrix,

$$ \{ \tilde c_i, \tilde c_j^\dagger \} = \sum_{k,l=1}^{n} u_{i,k} u_{j,l}^{*} \{ c_k, c_l^\dagger \} = \delta_{i,j} . \tag{C5} $$

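Hence, with basic linear algebra one can find a transformation $u \in U(n)$ such that

$$ H = \sum_{i,j=1}^{n} h_{i,j}\, c_i^\dagger c_j = \sum_{i,j=1}^{n} \tilde h_{i,j}\, \tilde c_i^\dagger \tilde c_j = \sum_{i=1}^{n} \lambda_i\, \tilde c_i^\dagger \tilde c_i , \tag{C6} $$

where $h = u^\dagger \tilde h u$ and $\tilde h = \mathrm{diag}(\lambda)$. This describes $n$ decoupled modes, each with eigenenergies $E_i \in \{0, \lambda_i\}$. The total energy of an eigenstate is therefore $E = \sum_{i=1}^{n} E_i$. We note that the ground state energy is non-degenerate if $\lambda_i \neq 0$ for all $i \in [n]$. The normal form (C6) allows us to write the partition function of a thermal state at inverse temperature $\beta$ as

$$ Z = \mathrm{Tr}[\exp(-\beta H)] = \prod_i \left( e^{-\beta\lambda_i} + 1 \right) . \tag{C7} $$

Hence, the covariance matrix of the corresponding thermal state w.r.t. $\{\tilde c_i\}$ is given by

$$ \tilde\Gamma(\beta)_{i,j} = \langle \tilde c_j^\dagger \tilde c_i \rangle_\beta \tag{C8} $$

$$ = -\delta_{i,j} \frac{\partial}{\partial(\beta\lambda_i)} \ln(Z) \tag{C9} $$

$$ = \frac{\delta_{i,j}}{e^{\beta\lambda_i} + 1} , \tag{C10} $$

where we have denoted the expectation value of the thermal state by $\langle \cdot \rangle_\beta$, used the normal form (C6) in the second step and (C7) in the last step. In compact notation,

$$ \tilde\Gamma(\beta) = \left( e^{\beta\tilde h} + 1 \right)^{-1} . \tag{C11} $$

Therefore, the covariance matrix w.r.t. $\{c_i\}$ is

$$ \Gamma(\beta)_{i,j} = \langle c_j^\dagger c_i \rangle_\beta = \sum_{k,l=1}^{n} u_{k,i}^{*} u_{l,j} \langle \tilde c_l^\dagger \tilde c_k \rangle_\beta , \tag{C12} $$

that is,

$$ \Gamma(\beta) = u^\dagger \tilde\Gamma u = \left( e^{\beta h} + 1 \right)^{-1} . \tag{C13} $$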
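The step from the partition function (C7) to the mode occupation can be checked on a single mode, where $H = \lambda\, \tilde c^\dagger \tilde c$ has only the two eigenstates $n = 0$ and $n = 1$. The following minimal sketch uses hypothetical function names and an arbitrary finite-difference step. It compares $-\partial \ln Z / \partial(\beta\lambda)$ with the direct Boltzmann average of the occupation number, and also looks at the large-$|\beta\lambda|$ limit that is relevant for pure states.

```python
import math


def log_partition(x):
    """ln Z for one fermionic mode as a function of x = beta * lambda."""
    return math.log(math.exp(-x) + 1.0)


def occupation_from_derivative(x, step=1e-6):
    """-d ln Z / dx, approximated by a central finite difference."""
    return -(log_partition(x + step) - log_partition(x - step)) / (2 * step)


def occupation_direct(x):
    """Boltzmann average of n over n = 0 (weight 1) and n = 1 (weight exp(-x))."""
    weight = math.exp(-x)
    return weight / (1.0 + weight)


x = 0.8  # arbitrary value of beta * lambda
print(occupation_from_derivative(x), occupation_direct(x), 1.0 / (math.exp(x) + 1.0))

# Low-temperature limit: modes with negative energy fill up, positive ones empty out.
print(occupation_direct(-40.0), occupation_direct(40.0))
```

Both evaluations agree with the Fermi-Dirac form $1/(e^{\beta\lambda} + 1)$. For $\beta \to \infty$ the occupation tends to 1 for $\lambda < 0$ and to 0 for $\lambda > 0$, so the covariance matrix becomes the projector onto the negative-energy modes.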

[1] J. Preskill, Quantum computing in the NISQ era and beyond, Quantum 2 (2018), 10.22331/q-2018-08-06-79, arXiv:1801.00862 [quant-ph].
[2] E. Farhi, J. Goldstone, and S. Gutmann, A quantum approximate optimization algorithm, arXiv:1411.4028 [quant-ph].
[3] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, A variational eigenvalue solver on a photonic quantum processor, Nat. Commun. 5, 4213 (2014), arXiv:1304.3061 [quant-ph].
[4] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, Variational quantum algorithms, arXiv:2012.09265 [quant-ph].
[5] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, Evaluating analytic gradients on quantum hardware, Phys. Rev. A 99, 032331 (2019), arXiv:1811.11184 [quant-ph].
[6] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput. 16, 1190 (1995).
[7] J. Stokes, J. Izaac, N. Killoran, and G. Carleo, Quantum natural gradient, Quantum 4, 269 (2020), arXiv:1909.02108 [quant-ph].
[8] D. Wierichs, C. Gogolin, and M. Kastoryano, Avoiding local minima in variational quantum eigensolvers with the natural gradient optimizer, arXiv:2004.14666 [quant-ph].
[9] A. Mari, T. R. Bromley, and N. Killoran, Estimating the gradient and higher-order derivatives on quantum hardware, arXiv:2008.06517 [quant-ph].
[10] B. Koczor and S. C. Benjamin, Quantum analytic descent, arXiv:2008.13774 [quant-ph].
[11] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nat. Commun. 9, 4812 (2018), arXiv:1803.11173 [quant-ph].
[12] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Cost-function-dependent barren plateaus in shallow quantum neural networks, arXiv:2001.00550 [quant-ph].
[13] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, Noise-induced barren plateaus in variational quantum algorithms, arXiv:2007.14384 [quant-ph].
[14] M. Cerezo and P. J. Coles, Impact of barren plateaus on the Hessian and higher order derivatives, arXiv:2008.07454 [quant-ph].
[15] A. Y. Kitaev, A. Shen, and M. N. Vyalyi, Classical and quantum computation, Vol. 47 (American Mathematical Society, 2002).
[16] J. Kempe, A. Kitaev, and O. Regev, The complexity of the local Hamiltonian problem, SIAM Journal of Computing 35, 1070 (2006), arXiv:quant-ph/0406180 [quant-ph].
[17] L. Pfister and Y. Bresler, Bounding multivariate trigonometric polynomials with applications to filter bank design, arXiv:1802.09588 [eess.SP].
[18] S. Khot, G. Kindler, E. Mossel, and R. O'Donnell, Optimal Inapproximability Results for MAX-CUT and Other 2-Variable CSPs?, SIAM J. Comput. 37, 319 (2007).
[19] M. X. Goemans and D. P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, J. ACM 42, 1115 (1995).
[20] J. Håstad, Some optimal inapproximability results, J. ACM 48, 798 (2001).
[21] T. Albash and D. A. Lidar, Adiabatic quantum computation, Rev. Mod. Phys. 90, 015002 (2018), arXiv:1611.04471 [quant-ph].