QFold: Quantum Walks and Deep Learning to Solve Folding

P. A. M. Casares,1, ∗ Roberto Campos,1, 2, † and M. A. Martin-Delgado1, 3, ‡ 1Departamento de F´ısica Te´orica, Universidad Complutense de Madrid. 2Quasar Science Resources, SL. 3CCS-Center for Computational Simulation, Universidad Polit´ecnica de Madrid. (Dated: January 26, 2021) We develop quantum computational tools to predict how fold in 3D, one of the most important problems in current biochemical research. We explain how to combine recent deep learning advances with the well known technique of quantum walks applied to a Metropolis algorithm. The result, QFold, is a fully scalable hybrid quantum algorithm that in contrast to previous quantum approaches does not require a lattice model simplification and instead relies on the much more realistic assumption of parameterization in terms of torsion angles of the amino acids. We compare it with its classical analog for different annealing schedules and find a polynomial quantum advantage, and validate a proof-of-concept realization of the quantum Metropolis in IBMQ Casablanca quantum processor.

I. INTRODUCTION therefore represent a large boost for biochemical re- search. Proteins are complex biomolecules, made up of Until recently, one of the most popular approaches one or several chains of amino acids, and with a to fold proteins was to apply a Metropolis algorithm large variety of functions in organisms. Amino acids parameterised in terms of the torsion angles, as is are 20 compounds made of amine (−NH2) and car- done for example in the popular library package boxyl (−COOH) groups, with a that dif- Rosetta [8] and the distributed computing project ferences them. However, the function of the pro- Rosetta@Home [9, 10]. The main problem with this tein is not only determined by the chain, approach, though, is that the problem is combina- which is relatively simple to figure out experimen- torial in nature, and NP complete even for simple tally, but from its spatial folding, which is much models [11, 12]. For this reason, other approaches more challenging and expensive to obtain in a labo- are also worth exploring. In the 2018 edition of the ratory. In fact it is so complicated that the gap be- Critical Assessment of Techniques for Protein Struc- tween proteins whose sequence is known and those ture Prediction (CASP) competition [13], for exam- for which the folding structure has been addition- ple, the winner was DeepMind’s AlphaFold model ally analyzed is three orders of magnitude: there are [14], that was able to show that Deep Learning tech- over 200 million sequences available at the UniProt niques allow to obtain much better results. Deep- database [1], but just over 172 thousand whose struc- Mind approach consisted on training a neural net- ture is known, as given in the Protein Data Bank work to produce a mean field potential, dependent [2]. Furthermore, experimental techniques cannot on the distance between amino acids and the tor- always analyse the tridimensional configuration of sion angles, that can be later minimized by gradient the proteins, giving rise to what is called the dark descent. proteome [3] that represents a significant fraction of In this article we study how Quantum Comput- the organisms including humans [4, 5]. There are ing could help improve the state of the art in this even proteins with several stable foldings [6], and problem when large error-corrected quantum com- others that have no stable folding called Intrinsically puters become available. We propose using the pre- arXiv:2101.10279v1 [quant-ph] 25 Jan 2021 Disordered [7]. diction of AlphaFold as a starting point for a quan- Since proteins are such cornerstone biomolecules, tum Metropolis-Hastings algorithm. The Metropolis and retrieving their folding so complicated, the prob- algorithm is a Markov-chain Monte Carlo algorithm, lem of is widely regarded as one of that is, an algorithm that performs a random walk the most important and hard problems in compu- W over a given graph. The Metropolis algorithm is tational biochemistry, and has motivated research specially designed to quickly reach the equilibrium for decades. Having an efficient and reliable com- state, the state πβ such that Wπβ = πβ. Slowly putational procedure to guess their structure would modifying the inverse temperature parameter β such that the states with smaller energy become increas- ingly favoured by the random walk, we should end in ∗ [email protected] the ground state of the system with high probability. † [email protected] Several modifications of the Metropolis-Hastings ‡ [email protected] algorithm to adapt it to a quantum algorithm have 2

FIG. 1. Example of the smallest dipeptide: the glycylglycine. We can see that each amino acid has the chain (Nitrogen-Cα-Carboxy). Different amino acids would have a different side chain attached to the Cα instead of Hydrogen as it is the case for the Glycyne. In each figure we depict either angle φ or ψ. Angle ψ is defined as the torsion angle between two planes: the first one defined by the three atoms in the backbone of the amino acid (N1, Cα,1, C1), and the second by the same atoms except substituting the Nitrogen in that amino acid by the Nitrogen of the subsequent one: (Cα,1, C1, N2). For the φ angle the first plane is made out of the three atoms in the amino acid (N2, Cα,2, C2) whereas the second plane is defined substituting the Carboxy atom in the amino acid by the Carboxy from the preceding amino acid: (C1, N2, Cα,2). These graphics were generated using [15] and Inkscape.

been proposed [16–21], mostly based on substitut- the use of quantum techniques to speedup or im- ing the classical random walk by a Szegedy quantum prove the process of protein folding. The reason walk [22]. On the contrary, our work takes advan- for this is because even simplified models of pro- tage of the application of a quantum Metropolis al- tein folding are NP hard combinatorial optimization gorithm under out-of-equilibrium conditions similar problems, so polynomial speedups could in principle to what is usually done classically, and has been done be expected from the use of quantum computing in- on Ising models[21]. Specifically, we aim to simulate stead of their classical counterparts. The literature this procedure for several small peptides, the small- on this problem [23–29] and related ones [30, 31] fo- est proteins with only a few amino acids; and com- cuses on such simplified lattice models that are still pare the expected running time with the classical very hard, and mostly on adiabatic computation. In simulated annealing, and also check whether start- contrast, our work presents much more realistic fully ing from the initial state proposed by an algorithm scalable model, parametrized in terms of the torsion similar to AlphaFold may speed up the simulated angles. The torsion angles, also called dihedral, are annealing process. angles between the atoms in the backbone structure Our work benefits from two different lines of of the protein, that determine its folding. An ex- research. The first one makes use of quantum ample with the smallest of the dipeptides, the gly- walks to obtain polynomial quantum advantages, in- cylglycine, can be found in figure 1. These angles spired mainly by Szegedy work [22], and by the- are usually three per amino acid, φ, ψ and ω, but oretical quantum Metropolis algorithms indicated the latter is almost always fixed at value π and for in the above. In contrast with [21], our work fo- that reason, not commonly taken into account in the cuses only on the unitary heuristic implementation models [14]. of the Metropolis algorithm, but studies what hap- These considerations, and the fact that we use a pens with a different system (peptides) and with dif- distilled version of AlphaFold [14] as initialization, ferent annealing schedules instead of only testing a makes our work different from the usual approach in single linear schedule for the inverse temperature β. quantum protein folding: commonly adiabatic ap- Lastly, we also validate a proof of concept using IBM proaches have been used so far, whereas our algo- Casablanca processor, experimentally realizing the rithm is digital. The downside of this more precise quantum Metropolis algorithm in actual quantum approach is that the number of amino acids that we hardware. are able to simulate is more restricted, but we are The second line of research related to our work is nevertheless able to perform experiments in actual 3

FIG. 2. Scheme of the QFold algorithm. Starting from the amino acid sequence, we use Psi4 to extract the atoms conforming the protein, and a Minifold module, in substitution of AlphaFold, as initializer. The algorithm then uses the guessed angles by Minifold as a starting point (or rather, as the means of the starting von Mises distributions with κ = 1), and the energy of all possible positions calculated by Psi4, to perform a quantum Metropolis algorithm that finally outputs the torsion angles. In the scheme of the algorithm, the backbone builder represents a subroutine that recovers the links between atoms of the protein, and in particular the backbone chain, using the atom positions obtained from PubChem using Psi4. The initializer, instantiated in our case by Minifold, is a second subroutine that gives a first estimate of the torsion angles, before passing it to the quantum Metropolis. The energy calculator uses Psi4 to calculate the energy of all possible rotation angles that we want to explore, and these energies are used in the quantum Metropolis algorithm, which outputs the expected folding. For a more detailed flowchart, we refer to the figure 3. hardware. Finally, it is worth mentioning that in the II. QFOLD ALGORITHM 2020 CASP competition, DeepMind’s team tested their AlphaFold v2 algorithm which significantly im- The algorithm we introduce is called QFold and proves the results from their previous version. has three main components that we will introduce in this section (see figure 2 for a scheme of QFold): In summary, the main contributions of our work an initialization routine to find a good initial guess are threefold: firstly, we design a quantum algorithm of the dihedral angles that characterise the pro- that is scalable and realistic, and provided with tein folding, a quantum Metropolis to find an even a fault-tolerant quantum computer could become lower energy state from the initial guess, and a clas- competitive with current state of the art techniques. sical metropolis to compare against the quantum Secondly, we analyse the use of different cooling Metropolis to assess possible speedups. The aim schedules in out-of-equilibrium quantum walks, and of this section is to introduce the theoretical back- perform ideal quantum simulations of QFold and ground we have used for our results. compare its performance with the equivalent classi- cal Metropolis algorithm, pointing towards a quan- tum speedup. This quantum advantage is enough A. Initializer to make the quantum Metropolis more convenient than its classical counterpart in average-length pro- QFold makes use of quantum walks as a resource teins even after taking into account slowdowns due to accelerate the exploration of protein configura- to error correction protocols. Thirdly, we implement tions (see figures 2 and 3). However, in nature pro- a proof-of-concept of the quantum Metropolis algo- teins do not explore the whole exponentially large rithm in actual quantum hardware, validating our space of possible configurations in order to fold. In work. a similar fashion, QFold does not aim to explore all 4

Initialization module Protein Random Randomisation

Rotation bits Energy files? No Initialization Minifold Mi ni f ol d

Yes

Original

Energy files

Simulation/ Experiment experiment? Simulation Initialization Simulation module

Experiment module

Annealing Quantum schedule Classical metropolis metropolis

Number of steps

Probabilities Quantum TTS Classical TTS

FIG. 3. Flow chart of the QFold algorithm. This figure has to be viewed with the help of figure 2. QFold has several functionalities integrated altogether, that could be summarized in an initialization module, a simulation module, and an experiment module. We denote by diamonds each of the decisions one has to make. The top part constitutes the initialization module, where Minifold can be used to get a guess of the correct folding, and Psi4 uses PubChem to calculate the energies of rotations. The bottom half represents the experiment or simulation algorithms, that output either Probabilities or Quantum/Classical TTS and makes use of Qiskit. Different options of the algorithm are represented by diamonds, and more information on them can be found in section III B. possible configurations, but rather uses a good ini- rithm. Therefore, we expect that improved versions tialization guess state based on Deep Learning tech- of Rosetta, using in our case quantum walks, could niques such as AlphaFold. Since such initial point is be of help to find an even better solution to the pro- in principle closer to the actual solution in the real tein folding problem than the one provided by only space, we expect it to be most helpful the larger the using AlphaFold. protein being modelled. In fact, one of the motiva- The AlphaFold initializer starts from the amino tions for our work was the fact that adding a Rosetta acid sequence (S), and performs the following pro- relaxation at the end of the AlphaFold algorithm was cedures: able to slightly improve the results of the AlphaFold algorithm [14]. Notice that a Rosetta relaxation is 1. First perform a Multiple Sequence Alignment the way Rosetta calls its classical Metropolis algo- (MSA) procedure to extract features of the protein already observed in other proteins 5

whose folding is known. library, the function that calculates an approxima- tion to such energy is called scoring function. 2. Then, parametrizing the proteins in terms of Starting from a state i, the Metropolis algorithm their backbone torsion angles (φ, ψ) (see figure proposes a change uniformly at random to one of the 1), train a residual convolutional neural net- configurations, j, connected to i. We will call Tij to work to predict distances between amino acids, the probability of such proposal. Then this change or as they call them, residues. is accepted with probability 3. Train also a separate model that gives   −β(Ej −Ei) a probability distribution for the torsion Aij = min 1, e , (1) angle conditional on the protein sequence and its previously analysed MSA features, resulting in an overall probability of change i → j P (φ, ψ|S, MSA(S)). This is done using a 1- at a given step Wij = TijAij. dimensional pooling layer that takes the pre- Slowly varying β one decreases the probability dicted distances between amino acids and out- that steps that increase the energy of the state are puts different secondary structure such as the accepted, and as a consequence when β is sufficiently α-helix or the β-sheet [32]. To make the pre- large, the end state is a local minima. If this anneal- diction, the algorithm makes use of bins of size ing procedure is done sufficiently slowly, one can also 10º, effectively discretising its prediction. ensure that the minima found is the global minima. However, in practice one does not perform this an- 4. All this information, plus some additional fac- nealing as slowly as required, resorting instead to tors extracted from Rosetta, is used to train heuristic restarts of the classical walk, and selecting an effective potential that aims to give smaller the best result found by the several different trajec- energies to the configurations that the model tories. believes to be more likely to happen in nature. In our implementation we emulate having ora- Finally, at inference time one starts from a rough cle access to the energies of different configurations. guess using the MSA sequence, and performs gradi- Such oracle in practice is a subroutine that calls the ent descent on the effective potential. One can also Psi4 package [34] to calculate the energies of all pos- perform several attempts with noisy restarts and re- sible configurations for the torsion angles that we turn the best option. Interestingly enough, the neu- want to explore. We give more detail of our partic- ral network is also able to return an estimation of ular implementation in section III B. its uncertainty. Such uncertainty is measured by the parameter κ in the von Mises distribution, and plays the role of the inverse of the variance. The von Mises C. Quantum Metropolis distribution is the circular analog of the normal dis- tribution, and its use is justified because angles are A natural generalisation of the Metropolis algo- periodic variables [33]. rithm explained in the previous section is the use of quantum walks instead of random walks. The most popular quantum walk for this purpose is B. Classical Metropolis Szegedy’s [22], that consists of two rotations simi- lar to the rotations performed in Grover’s algorithm As we have mentioned in the introduction, a rel- [35]. Szegedy’s quantum walk is defined on a bi- atively popular approach to perform protein fold- partite graph. Given the acceptance probabilities ing has been to use the Metropolis algorithm. The Wij = TijAij, Aij defined in (1), for the transition Metropolis algorithm is an algorithm that performs from state i to state j, one defines the unitary a random walk over the configuration space Ω. The X p configuration space is the abstract space of possible U |ji |0i := |ji Wji |ii = |ji |pji . (2) values the torsion angles that a given protein can i∈Ω take. As such, a given state i is a list of values for Taking such torsion angles. In particular, for computational purposes, we will set that angles can take values from R0 := 1 − 2Π0 = 1 − 2(1 ⊗ |0i h0|) (3) a given set, that is, it will not be a continuous but a discrete distribution. Over such space we can define the reflection over the state |0i in the second sub- probability distributions. Furthermore, since those space, and S the swap gate that swaps both sub- angles will dictate the position of the atoms in the spaces, we define the quantum walk step as protein, the state i will also imply an energy level Ei, † † due to the interaction of the atoms. In the Rosetta W := U SUR0U SUR0. (4) 6

We refer to Appendix A and Fig. 9 for a detailed we would like such procedure to be fast. For exam- account on the use of these quantum walks in a sim- ple going through all configuration solutions would ilar way to the Grover rotations. For completeness, be quite accurate albeit extremely expensive if not in Appendix A we review in more detail the theoret- directly impossible. ical basis of Szegedy quantum walks that leads us to A natural metric to use in this context is then the believe that a quantum advantage is possible in our Total Time to Solution (TTS) [21] defined as the problem. average expected time it would take the procedure It is well known that if δ is the eigenvalue gap of to find the solution if we can repeat the procedure the classical walk, and ∆ the phase gap of the quan- in case of failure: tum walk, then the complexity of the classical walk −1 log(1 − δ) is O(δ ), the complexity of the quantum algorithm TTS(t) := t . (7) O(∆−1), and the relation between the phase and log(1 − p(t)) eigenvalue gap is given by ∆ = Ω(δ1/2) [36], offering a potential quantum advantage. Our algorithm aims where t ∈ N is the number of quantum/random steps to explore what is the corresponding efficiency gain performed in an attempt of the quantum/classical in practice. Metropolis algorithm, p(t) the probability of hitting The quantum Metropolis algorithm that we em- the right state after those steps in each attempt, and ploy [21] uses a small modification of the Szegedy δ a target success probability of the algorithm taking quantum walk, substituting the bipartite graph by a into account restarts, that we set to the usual value of 0.9. In any case, since it is a constant, the value coin. That is, we will have 3 quantum registers: |·iS indicating the current state of the system, |·i that of TTS(t) with other value of δ is straightforward M to recover. Although one should not expect to be indexes the possible moves one may take, and |·iC the coin register. We may also have ancilla registers able to calculate p(t) in the average protein because |·i . The quantum walk step is then finding the ground state is already very challeng- A ing, for smaller instances it is possible to calculate W˜ = RV †B†FBV. (5) the TTS for example executing the algorithm many times and calculating the percentage of them that Here V prepares a superposition over all possible correctly identifies the lowest energy state. Using quantum resources, the corresponding definition is steps one may take in register |·iM , B rotates the (B7) from appendix B. coin qubit |·iC to have amplitude of |1iC correspond- ing to the acceptance probability indicated by (1), We can see that this metric represents the com- promise between longer walks and the corresponding F changes the |·iS register to the new configuration expected increase in probability of success. Using (conditioned on the value of |·iM and |·iC = |1iC ), this figure the way we have to compare classical and and R is a reflection over the state |0iMCA. Although other clever options are available [21], quantum walks is to compare the minimum values here we implement the simplest heuristic algorithm, achieved for the mint TTS(t). Similar metrics have which consists of implementing L steps of the quan- also been defined previously in the literature [37]. tum walk On the other hand we would also like mention that there is a small modification of the classical algo- |ψ(L)i := W˜ L...W˜ 1 |π0i , (6) rithm that could improve its TTS, because we only output the last configuration of the random walk in- where t = 1, ..., L also defines an annealing sched- stead of the state with minimum energy found so far, ule, for chosen values of β(t) at each step. More a common choice for example in the Rosetta@Home detailed explanation of algorithm [21] can be found project. The reason for not having included this in appendix B. modification is because the length of the classical path, 2 to 50 steps, represents a sizable portion of the total space that ranges from 64 to 4096 avail- III. SIMULATIONS, EXPERIMENTS AND able positions, whereas that will not be the case for RESULTS large proteins. We believe that had we run the clas- sical experiments with that modification, we would A. Figures of merit have introduced a relatively large bias in the results, favouring the classical random walks in the smallest When looking for a metric to assess the goodness instances of the problem, and therefore likely over- of given solutions to protein folding we have to strike estimating the quantum advantage. a balance between two important aspects: on the For the experiment run in IBM Quantum systems one hand, we want a model that with high proba- and whose results can be found in section III C 4, bility finds the correct solution. On the other hand, the metric we use instead of the TTS is the proba- 7

bility of measuring the correct answer, and in par- tion in π/2 radians. In general, the precision of the ticular whether we are able to detect small changes angles will be 21−bπ, for b the number of rotation in the probability corresponding to the correct solu- bits. Notice that the precision of 10º of AlphaFold tion. Measuring the TTS here would not be inter- when reporting their angles, that we indicated in esting due to the high level of noise of the circuit. section II A, is intermediate between b = 5 and b = 6. The main idea here is that when we increase b or the number of angles, the size of the search B. Simulation and experimental setup space becomes larger, and evaluating how the classi- cal and quantum mint TTS(t) grow we may be able 1. Simulations to check whether a polynomial quantum advantage exists. The TTS can be directly calculated from in- ferred classical probabilities, if one is executing the For the assessment of our algorithm we have built classical metropolis or the quantum Metropolis in a simulation pipeline that allows to perform a variety quantum hardware, or from the amplitudes, if one is of options. The main software libraries used are Psi4 running a simulation of the latter. Its specific defi- for the calculation of energies of different configura- nition can be seen in equation (7). tions in peptides [34], a distilled unofficial version of Finally, we implemented and used the experiment AlphaFold dubbed Minifold [38], and Qiskit [39] for mode, that in contrast to the the simulation mode the implementation of quantum walks. Simulations explained in previous paragraphs, allows us to run were run on personal laptops and small clusters for the smallest instances of our problem in actual IBM prototyping, and due to its large computational cost, Q hardware, dipeptides with a single rotation bit. Amazon Web Services [40] for deploying the com- Other choices we have to make involve the value plete algorithm and obtaining the results. A scheme of the parameter β in (1), whether it is fixed or fol- of the pipeline of the simulation algorithm can be lows some annealing schedule, the number of steps seen in figure (3). for which the system computes their TTS, or the The initialization procedure takes as input the κ parameter from the von Mises distribution if the name of the peptide we want to simulate and initialization is given by Minifold. In fact, if the uses Psi4 to download the file of the correspond- original AlphaFold algorithm were to be used, the κ ing molecule from PubChem [41], an online repos- values returned by AlphaFold could actually be used itory containing abundant information over many instead of our default value κ = 1, and furthermore molecules, including atomic positions. the preparation of the amplitudes corresponding to After that, and before executing the quantum and this probability distribution could be made efficient classical walks, the system checks whether an energy using the Grover-Rudolph state preparation proce- file is available, and if not uses Psi4 to calculate and dure [44]. store all possible energy values of the different rota- Additionally, while Qiskit allows to recover the tions of the torsion angles. For the calculation of probabilities from the amplitudes, to evaluate the that energy we choose the relatively common 6-31G classical walks we have to repeat a certain num- atomic orbital basis functions [42], and the proce- ber of times the procedure to infer the probabilities. dure of Moller-Plesset to second order [43] as a more This number of repetitions is controlled by a vari- accurate and not too expensive alternative to the able named number iterations, that we have set Hartree-Fock procedure. However, it is also possible to 500 × (2b)2n−2, where b is the number of bits to choose any other basis or energy method. Fi- and n the number of amino acids, to reflect that nally, the system performs the corresponding quan- larger spaces require more statistics. tum and random walks and returns the minimum If a non-fixed β is attempted, we have imple- TTS found. mented and tested several options for the annealing In order to evaluate the impact of adding a ma- schedule. The implemented schedules are: chine learning module such as AlphaFold at the beginning of the algorithm, we have implemented • Boltzmann or logarithmic implements the fa- the initialization option to start from random values mous logarithmic schedule [45] of the dihedral angles, from the ones returned by β(t) = β(1) log(te) = β(1) log(t) + β(1). (8a) minifold, or the actual original angles that the molecule has, as returned by the PubChem library. Notice that the multiplication of t times e is On the other hand, to evaluate the potential quan- necessary in order to make a fair comparison tum advantage, we also allow to select the number of with the rest of the schedules, so that they bits that specify the discretization of the torsion an- all start in β(1). As a consequence, this is gles. For example, 1 bit means that angles can take not truly the theoretical schedule required to values in {0, π}, whereas 2 bits indicate discretiza- achieve a quadratic speedup. 8

FIG. 4. In section III C 4 we explain a proof of concept experimental realization of the quantum Metropolis algorithm, and this figure represents the implementation of the coin flip subroutine of this hardware-adapted algorithm, whose operator is represented by B in (5). The circuit has two key simplifications that reduce the depth. The first one is that we first rotate the coin register to |1i and then we rotate it back to |0i if the acceptance probability is not 1. This halves the cost, since otherwise one would have to perform 8 multi-controlled rotations (all possible combinations of the control values for registers move, φ and ψ), and in this case we only perform 4 of them. The second simplification again halves the cost, grouping together rotations with similar values. We empirically see that the rotation values controlled on |000i and |010i are very similar, so we group them in rotation R0, that implicitly depends on β. Similarly, we group the rotations controlled on |001i and |101i in R1, also dependent on β. This also has the nice effect of only requiring to control on two out of the three bottom registers, (φ, ψ and move), transforming CCC-RX gates in CC-RX . Separated by the two barriers, from left to right and from top to bottom we can see the implementation of such CC-RX gates.

• Cauchy or linear implements a schedule given Our current system has two main limitations de- by pending on the mode it is used. If the aim is to perform a simulation, then the amount of Random β(t) = β(1)t. (8b) Access Memory of the simulator is the main concern. This is why we have simulated, for a fixed value of • geometric defines β:

β(t) = β(1)α−t+1, (8c) • Dipeptides with 3 to 5 rotation bits.

where α < 1 is a parameter that we have • Tripeptides with 2 rotation bits. heuristically set to 0.9. • Tetrapeptides with a single rotation bit. And finally exponential uses • Additionally, dipeptides with 6 bits, tripeptides with β(t) = β(1) exp(α(t − 1)1/N ), (8d) 3 bits and tetrapeptides with 2 bits can be simulated for a few steps, but not enough of them to calculate where α is again set to 0.9 and N is the space our figure of merit with confidence. If the β value dimension, which in this case is equal to the is not fixed but follows some annealing schedule, the number of torsion angles. requirements are larger, but we can still simulate the same peptides. Further than that, the Random For comparison purposes, the value of β(1) chosen Access Memory requirements for an ideal (classi- has been heuristically optimized to 50. cal) simulation of the quantum algorithm become 9

FIG. 5. Implementation of the full quantum Metropolis circuit from section III C 4 implemented in actual quantum hardware, using the coin flip rotation described in figure 4. The steps of the circuit are separated by barriers for a better identification (from left to right, and from top to bottom): first put φ and ψ in superposition. Then, for each of the two steps W˜ from (5): put the move register (that controls which angle to move) in a superposition, operator V , and prepare the coin, B. Then, controlled on the coin being in state |1i and the move register the corresponding angle, change the value of ψ (if move = |1i) or φ (if move = |0i), denoted by F . Then we uncompute the coin, B† and the move preparation, V †, and perform the phase flip on |movei |coini = |00i represented by R. The second quantum walk step proceeds equally (with different value of β and the rotations), but now we do not have to uncompute the move and coin registers before measuring φ and ψ because it is the last step. quite large. Notice that Qiskit simulator supports then perform 2 quantum walk steps W˜ with values 32 qubits at the moment, but our system is more of β empirically chosen 0.1 and 1 to have large prob- constrained by the depth of the circuit, which can abilities of measuring |0iφ |0iψ, where we encode the run into millions of gates. state of minimum energy, the correct state of the dipeptide. The figures corresponding to the circuit are 4 and 2. Experiments 5, the former depicting the coin flip procedure and the latter using it as a subroutine in the circuit W have also performed experiments in the IBMQ as a whole. The coin flip subroutine is the most Casablanca processor. In contrast with the previous costly part of the quantum circuit both in this hard- experiments, due to the low signal to noise ratio in ware implementation and in the simulations of pre- the available quantum hardware, it does not make vious sections too, since it includes multiple multi- much sense to directly compare the values of the controlled rotations of the coin qubit. Perhaps an TTS figure of merit. Instead, the objective here is to important remark to make is that this hardware- be able to show that we can implement a two-step adapted circuit contains some simplifications in or- quantum walk (the minimum required to produce der to minimize the length of the circuit as much as interference) and still be able to see an increase in possible, since it will be one of the most important probability associated with the correct state. Since quantities determining the amount of error in the we are heavily constrained in the depth of the quan- circuit, our limiting factor. tum circuit we can implement, we experiment only There is one more important precision to be made: with dipeptides, and with 1 bit of precision in the since our implementation of the quantum circuit in rotation angles: that is φ and ψ can be either 0 or the IBMQ Casablanca processor has 176 basic gates π. of depth even after being heavily optimized by Qiskit In this quantum circuit, depicted in figures 4 and transpiler, we need a way to tell whether what we 5, we will have 4 qubits, namely |φi, |ψi, a coin are measuring is only noise or relevant information qubit, and another indicating what the angle register survives the noise. Our first attempt to distinguish to update in the next step. Additionally, we always these two cases was to use the natural technique of start from the uniform superposition |+iφ |+iψ. We zero-noise extrapolation, where additional gates are 10

Peptides Precision random Precision minifold b quantum min(TTS) random quantum min(TTS) minifold 3 136.25 270.75 Dipeptides 0.53 0.53 4 547.95 1137.45 5 1426.28 1458.02 Tripeptides 0.46 0.71 2 499.93 394.49 Tetrapeptides 0.51 0.79 1 149.80 26.30

TABLE I. Table of average precisions defined in equation (9), and corresponding quantum minimum TTS, defined in equation (7) as the expected number of steps it would take to find the solution using the quantum algorithm, with different initializations. b denotes the rotation bits, and in bold we have indicated which of minifold or random values are best. The aim of this table is understanding the impact of minifold initialization in the quantum min TTS, our figure of merit. The two main aspects to notice from the table are that Minifold precision grows with the size of the peptide, and that when it is the case that the minifold precision is higher, the corresponding quantum min TTS values are lower than their random counterparts. This supports the idea that using a smart initial state helps to find the native folding of the protein faster. added that do not change the theoretical expected C. Experimental and simulation results value of the circuit, but introduce additional noise [46]. By measuring how the measured probabilities 1. Initialization change, one can extrapolate backwards to the theo- retical ‘zero noise’ case. Unfortunately, the depth of In this section we analyse the impact of different the circuit is already so large that it does not work: it initialization methods for the posterior use of quan- does not converge or else returns unrealistic results, tum/classical walks. Although we know that Al- at least when attempted with the software library phaFold is capable of making a relatively good guess Mitiq [47]. for the correct folding, and therefore it is reasonable to expect AlphaFold’s guess to be close to the opti- mal folding solution in the conformation space, our For this reason we need to find a way out that aim is to give some additional support to this idea. is only valid because our circuit is parameterised in As an initializer, we decided to use (and minorly terms of the angles and the values of β. Additionally contributed to) Minifold because even though it does we know that if we were to set the value of β to 0, the not achieve the state of the art in the prediction theoretical result would be 1/4 as there are 4 possible of the angles, it is quite simple and sufficient to states. As a consequence, our strategy consists of illustrate our point. Minifold uses a residual net- trying to detect changes in the probability when we work implementation given in Tensorflow and Keras use β(t) = (0, 0) or β(t) = (0.1, 1). The notation [48, 49]. Perhaps the most important detail of us- β(t) denotes the value of β chosen at each of the ing this model is that because we are trying to pre- two steps. dict small peptides, and Minifold uses a window of 34 amino acids for its predictions, we had to use padding. For the experiment, we reserved 3 hours of usage The metrics that we analyse in this case are of the IBMQ Casablanca processor, with quantum twofold: in the first place, we would like to see volume 32. During that time we were able to run 25 whether Minifold achieves a better precision on the so-called ‘jobs’ with β(t) = (0, 0) and 20 ‘jobs’ for angles than random guessing. This is a necessary 8 arbitrarily chosen dipeptides. Each run consisted condition for our use of Minifold, or more generally, of 8192 repetitions of the circuit (which can be seen any smart initialization module, to make sense. We in figures 4 and 5) and an equal number of measure- measure the precision as 1 minus the normalized an- ments, what means that for each dipeptide we run a gular distance between the returned value by the ini- total of 163840 circuits, and 204800 for β(t) = (0, 0) tialization module and the actual value we get from as a baseline. PubChem (see figures 1 and 2): d(α, α˜) p = 1 − , (9) π As we will see from the results, the main limi- tation of our experiment is the noise of the system whereα ˜ is the estimated angle (either φ or ψ) given and therefore the depth of the circuit. For this rea- by Minifold or chosen at random, α is the true value son we restrict ourselves to a single rotation bit in calculated from the output of Psi4 and PubChem, dipeptides. and d denotes the angular distance. Since we have 11

104 ing to the minifold initialization in the fit. While classical min(TTS) < quantum min(TTS) this may seem to indicate that our initialization is in fact harmful to the convergence of the quantum 103 algorithm, in fact the explanation is quite the oppo- site: for the smaller instances of the problem, and very specially in the case of random initialization,

102 the minimum TTS value is achieved for t = 2 as can be seen from the two horizontal structures formed

Quantum min(TTS) by the blue points in the figure, meaning that in

0.89 1 qTTS = 6.9 × cTTS such cases only using the minifold initialization the 10 minifold initialization qTTS = 44.3 × cTTS0.53 random initialization quantum algorithm is able to profit from the incip- dipeptides tripeptides ient quantum advantage. This effect disappears for quantum min(TTS) < classical min(TTS) tetrapeptides rotation bits b larger instances of the problem, but while for random 100 100 101 102 103 104 initialization there is a penalisation of the TTS in Classical min(TTS) the smallest problems (thus lowering the exponent), the minifold initialization is capable of correcting FIG. 6. Comparison of the Classical and Quantum min- for this effect, lowering the TTS of smaller instances imum TTS achieved for the simulation of the quantum Metropolis algorithm with β = 103, for 10 dipeptides of the problem and, as a bonus, rising the exponent. (with rotation bits b = 3, 4, 5), 10 tripeptides (b = 2) We are therefore inclined to believe that the mini- and 4 tetrapeptides (b = 1), also showing the differ- fold exponent more accurately represents the true ent initialization options (random or minifold), and the asymptotic exponent of the algorithm for this prob- best fit lines. In dashed grey line we separate the space lem. In conclusion, while the small size of our exper- where the Quantum TTS is smaller than the Classical iments does not allow us to see the benefits of using TTS. The key aspect to notice in this graph is that al- a smart initialization, they have been important to though for smaller instances the quantum algorithm does get a calibrated estimate of the actual quantum ad- not seem to match or beat the times achieved by the vantage, and we can also see that it helps reduce classical Metropolis, due to the exponent being smaller the TTS cost both in the classical and quantum al- than one (either 0.89 or 0.53 for or re- minifold random gorithm, which nicely fits the intuition that being spectively) for average size proteins we can expect the quantum advantage to be dominant and make the quan- closer-than-random to the solution helps to find the tum Metropolis more useful than its classical counter- solution faster. part. In section III C 2 we discuss further details and ex- plain why the random initialization exponent seems more favourable than the minifold exponent. 2. Fixed β

We now discuss whether we are able to observe a normalized it, the random initialization gets a the- quantum advantage in the TTS, our figure of merit. oretical average precision of 0.5. In table I there We will again be discussing the results given in fig- is a summary comparing the average precision re- ure 6, for it represents the best fit to the classical sults of minifold and random initialization broken vs quantum TTS, and therefore accurately depicts down by the protein and bits. The dipeptide re- the expected quantum advantage: we can see that sults show that due to the small size of the pep- the slopes separated for the different initialization tide, minifold has barely a better precision than options are 0.89 for the minifold initialization and just random. However, this situation improves for 0.53 for the random initialization. As a consequence, tripeptides and tetrapeptides, getting a better pre- we can see that if these trends are sustained with cision, and as a consequence, lower TTS values. larger proteins, there is a polynomial advantage. If Minifold having a greater precision in the an- As we have seen, figure 6 points towards a quan- gles than random guessing was a precondition for tum advantage. The final question we would there- our analysis to make sense, the actual metric we are fore like to answer, is what does this advantage mean interested in is whether it has some impact reduc- for the modelling of large enough proteins. For that, ing the TTS metric. Otherwise we could avoid us- we only need one additional ingredient, which is how ing an initialization module altogether. First of all the expected classical min TTS scales with the size we were not expecting an actual reduction of the of the configuration of the problem. Our data in exponent due to the use of minifold initialization this respect is even more restricted because we only as much as multiplicative TTS reduction prefactors. have access to configuration spaces of 64, 256 and However, in figure 6 the exponent of random initial- 1024 positions. Therefore we are making a regres- ization model is smaller than the one correspond- sion to only three points, but could give us nev- 12

Logarithmic schedule Linear schedule 104 104 classical min(TTS) < quantum min(TTS) classical min(TTS) < quantum min(TTS)

103 103

102 102 Quantum min(TTS) 0.88 minifold initialization 0.86 minifold initialization 101 qTTS = 5.0 × cTTS random initialization 101 qTTS = 7.0 × cTTS random initialization qTTS = 81.0 × cTTS0.29 dipeptides qTTS = 74.2 × cTTS0.34 dipeptides tripeptides tripeptides quantum min(TTS) < classical min(TTS) tetrapeptides quantum min(TTS) < classical min(TTS) tetrapeptides rotation bits b rotation bits b 100 100 100 101 102 103 104 100 101 102 103 104

Geometric schedule Exponential schedule 104 104 classical min(TTS) < quantum min(TTS) classical min(TTS) < quantum min(TTS)

103 103

102 102 Quantum min(TTS) 0.85 minifold initialization 1.0 minifold initialization 101 qTTS = 6.5 × cTTS random initialization 101 qTTS = 4.0 × cTTS random initialization qTTS = 82.5 × cTTS0.29 dipeptides qTTS = 69.7 × cTTS0.37 dipeptides tripeptides tripeptides quantum min(TTS) < classical min(TTS) tetrapeptides quantum min(TTS) < classical min(TTS) tetrapeptides rotation bits b rotation bits b 100 100 100 101 102 103 104 100 101 102 103 104 Classical min(TTS) Classical min(TTS)

FIG. 7. Comparison of the Classical and Quantum minimum TTS achieved for the same peptides as those in figure 6, except that due to computational cost we do not include dipeptides with 5 rotation bits. This figure corresponds to section III C 3, and shows the different initialization options (random or minifold) and annealing schedules (Boltzmann/logarithmic, Cauchy/linear, geometric and exponential), and the best fit lines. In dashed grey line we depict the diagonal. The corresponding fit exponents are given in table II, where we can see that in three out of the four cases using an annealing schedule increases the quantum advantage. On the other hand, using an exponential schedule does not seem to give but a tiny advantage when used with a minifold initialization. ertheless some hints of whether our technique, the puting, the use of the quantum Metropolis seems use of quantum Metropolis algorithm, will be help- very powerful and competitive. And this conclusion ful to solve the tridimensional structure of proteins is robust: if we took the quantum advantage expo- given a large enough and fault tolerant quantum nent to be just e = 0.95 and the growth of the TTS computer. The regression exponent of a log(size) with the size of the space an extremely conservative vs log(classical min TTS) fit using both random and r = 0.5, the quantum speedup before factoring in minifold initializations is r = 0.88, that should not the operating frequency of the computer will still be be confused with those in figure 6. a factor of ≈ 1022. Larger proteins, which exist in nature, will exhibit even larger speedups in the TTS, Let us take, for the sake of giving an example, the expected time it would take to find the native an average 250 amino acids protein, which has ap- structure. proximately 500 angles to fix. If we use b = 6 bits to specify such angles as might be done in a realistic setting, the classical mint TTS would be ≈ (2b)2×250×r = (26)500×0.88. The quantum 3. Annealing schedules and variable β mint TTS, on the other hand, will be such num- ber to the corresponding exponent of figure 6, that In the previous section we analysed what quan- we will call em = 0.89 and er = 0.53 for minifold tum advantage could be expected when using a fixed and random. This will translate to a speedup fac- β schedule, resulting in a pure quantum walk with tor of between ≈ 1087 and 10373, although the latter several steps. However, Metropolis algorithms are represents an upper bound and the former is prob- rarely used in practice with a fixed β, since at the ably closer to the actual value. This reveals that beginning of the algorithm one would like to favour even if we take into account the slowdown due to exploration, requiring a low β value, and at the end error correction and other factors in quantum com- one would like to focus mostly on exploitation, that 13

Fit exponents Schedule Random init. Minifold init. Fixed β 0.53 0.89 Logarithmic 0.29 0.88 Linear 0.34 0.86 Exponential 0.37 1.00 Geometric 0.29 0.85

TABLE II. Table of scaling exponents for different an- nealing schedules and initialization options. The pep- tides are the same, except that for fixed β we have also in- cluded dipeptides with 5 bits of precision, what is costly for the rest of schedules. For fixed β, the value heuris- tically chosen was β = 1000, while the initial β value in each of the schedules, defined in (8), is β(1) = 50. is achieved with a high β value. It can be seen that FIG. 8. Results from hardware measurements corre- this necessity is linked to the well known exploration- sponding to the experimental realization of the quantum exploitation trade-off in Machine Learning and more Metropolis described in section III C 4 and the circuit de- generally in Optimization [50]. picted in figures 4 and 5. For each dipeptide we perform The annealing schedule with the strongest theo- a student t-test to check whether the average success retical background is usually called the inhomoge- probabilities for β(t) = (0, 0) and β(t) = (0.1, 1) are ac- tually different [52]. The largest p-value measured in all neous algorithm, and it is known because one can 8 cases is 3.94 · 10−18, indicating that in all cases the dif- prove that the algorithm will converge to the global ference is significant. For each of the eight dipeptides we minimum value of the energy with probability 1, al- run 163840 times the circuit, and for the baseline 204800 beit generally too slowly (section 3.2 [51]). Its imple- times. mentation is conceptually similar to the Boltzmann or logarithmic schedule that we use, but with a dif- ferent prefactor [51]. 4. Experiments in IBMQ Casablanca Therefore, the question we would like to answer in this section is what happens when we use our quan- tum Metropolis algorithm outside of the equilibrium, After having analysed the potential quantum ad- that is, when we use a schedule that changes faster vantage in simulation, we would also like to imple- than the theoretical inhomogeneous algorithm. For ment the quantum Metropolis in IBM Q Casablanca this task, several optimization schedules have been processor. The circuit we have implemented is made proposed, of which we have implemented and tested of two quantum walk operators, the minimum re- four different options whose mathematical formula- quired to cause interference, in the smallest instance tion can be seen in (8). of our problem: dipeptides with a single rotation bit, Our conclusions from figure 7 and table II are that thus allowing for angles of 0 and π. using a variable schedule, the quantum advantage The corresponding implementation of the circuit, can be made larger than that of the fixed the tem- which was run in IBMQ Casablanca processor, is perature algorithm. However, not all cases give the depicted in figures 4 and 5. An important detail same advantage, with the exponential schedule giv- of our implementation of this quantum Metropolis ing practically none, and the geometric schedule ex- algorithm is that in order to be able to see more than plained in section 5.2 of [51] being the most promis- noise we have needed to perform simplifications that ing, with an exponent of 0.85 if minifold initializa- benefit from the particular structure of the problem. tion is used. Linear and logarithmic schedules lie This allows us to minimize the depth of the circuit, in between. that strongly influences the noise, the limiting figure Lastly, large differences in the exponents exist de- of our experiment. pending on the initialization criteria used, although In fact, both due to the high level of noise of the the same argument on why this is the case given in circuit, and the minimal size of the problem we are section III C 2 applies here too. In consequence, a trying to solve, which only has 4 possible states to more detailed analysis should be carried out in fu- search from, in this experiment it does not make ture work. However, from figures 6 and 7 we can see sense to use the Total Time to Solution as a figure of that the decrease in the expected TTS value when merit. Rather, we are only interested in seeing if the using a minifold initialization is reflected in the cor- quantum circuit is sensitive to the underlying prob- responding prefactors, smaller and more favourable. abilities, which depend on the chosen β value. As we 14 discussed in section III B 2, for β = 0 all states are quantum computing algorithmic proposals for pro- equiprobable whereas for higher β values the prob- tein folding. Although in our current simulations ability of the target state is higher, and our exper- presented in this work the range of the precision is iment aims to see this difference of probability in limited by the resources of the classical simulation, practice. nothing prevents QFold from reaching a realistic ac- The depicted circuit, with values of β(t) = (0, 0) curate precision once a fully fledged quantum com- and β(t) = (0.1, 1) was run as explained in the sec- puter is at our disposal, since our formulation is fully tion III B 2 for several hundred thousand times for scalable within a fault-tolerant scenario. each dipeptide to be able to certify that our measure- The quantum Metropolis algorithm itself relies on ments do not correspond to pure noise. The results the construction of a coined version of Szegedy quan- of such probability differences are depicted in figure tum walk [21], and we use the Total Time to Solution 8, and they are highly significant as indicated by the defined in (7), as a figure of merit. Our construction low p-value achieved in all cases, smaller or equal to of this quantum Metropolis algorithm represents the 3.94 · 10−18, in the corresponding student t-test of first main contribution of our work, as it represents the underlying binomial distribution. Such binomial a scalable and realistic algorithm for protein folding distribution corresponds to returning value 1 when that in contrast to previous proposals, does not rely the measurement correctly identifies the minimum- on simplified lattice models. energy state which is encoded in |00i, and 0 when it The second main contribution is an analysis of the does not. expected quantum advantage in protein folding, that It can also be seen that in seven out of the although moderate in the scaling exponent with the eight cases tested, the quantum Metropolis algo- Minifold initialization, could represent a large dif- rithm points in the right direction of increased suc- ference in the expected physical time needed to find cess probability. This gives us confidence that we the minimal energy configuration in proteins of av- are measuring a small amount of quantum effects. erage size due to the exponential nature of this com- The outlier, glycylvaline, is surprising because is the binatorial problem. This quantum advantage anal- dipeptide that in the simulation shows the great- ysis is also performed for different realistic anneal- est theoretical probability of measuring |00i, and at ing schedules, indicating that the out-of-equilibrium the same time is the dipeptide with the largest p- quantum Metropolis algorithms can show a similar value, although still very significant. We can only quantum advantage, and can even improve the ad- hypothesise that this is due to some experimental vantage shown for the fixed beta case, as can be seen imperfection. from table II. The third contribution is a proof-of- concept small implementation of our algorithm in actual quantum software. IV. CONCLUSIONS AND OUTLOOK Our results for the computation of protein fold- ing provide further support to the development of We have studied how quantum computing might classical simulations of quantum hardware. A clear complement modern machine learning techniques to message from our simulations is that it is worthwhile predict the folding of the proteins. For that, we have developing quantum software and classical simula- introduced QFold, an algorithm that implements a tions of ideal quantum computers in order to con- quantum Metropolis algorithm using as a starting firm that certain quantum algorithms provide a re- guess the output of a machine learning algorithm, alistic quantum advantage. Some of the quantum that in our case is a simplified implementation of Al- algorithms that must be assessed in realistic condi- phaFold algorithm named Minifold, but that could tions are those based on quantum walks like quan- in fact be substituted by any initialization module tum Metropolis variants with fast annealing sched- that uses future improvement of such deep learning ule. For them, the complexity scales as O(δ−a), δ techniques. the eigenvalue gap, and a > 0 dependent on the An important feature of QFold is that it is a real- specific application and realization of the quantum istic description of the protein folding, meaning that Metropolis. This is in contrast with pure quan- the description of the folded structure relies on the tum walks, where the classical complexity scales as actual torsion angles that describe the final confor- O(δ−1) and the quantum complexity as O(δ−1/2), mation of the protein. This is realised by the number as can be seen in appendix B. However, it would be of bits b used to fix the precision of those angles, for na¨ıve to consider this quadratic quantum advantage which a moderate value of b = 5 or b = 6 would as an achievable and useful one in problems similar be as accurate as the original AlphaFold. This is to ours. Instead, one should aim to compare quan- in sharp contrast with the rigid lattice models used tum algorithms with the classical ones used in prac- to approximate protein folding in the majority of tice, where heuristic annealing schedules are used 15 instead. As a consequence, finding out the precise Similar quantum Metropolis algorithms could be values of the corresponding exponent is of great im- used in a variety of domains, and as such a detailed portance since it is a measure of the quantum ad- analysis, both theoretical and experimental, of the vantage. For this reason, not all efforts should be expected quantum advantage in each case seems de- devoted to finding a quantum advantage solely with sirable. noisy ‘baby quantum computers’, like NISQ devices, but also to continue investigating the real possibili- ties of universal quantum computers when they be- V. ACKNOWLEDGEMENTS come available. An unknown that is worth addressing is whether P.A.M.C and R.C contributed equally to this the new software for protein folding coming out of work. We would like to thank kind advice from the CASP competition [13] will remain in the pub- Jaime Sevilla on von Mises distributions and sta- lic domain or else, will go proprietary. This is spe- tistical t-tests, Alvaro Mart´ınezdel Pozo and An- cially important since some the most powerful tools tonio Rey on protein folding, Andrew W. Senior used by that modern software rely on deep learning on minor details of his AlphaFold article, Carmen that has to be trained. It so happens that the pro- Recio, Juan G´omez,Juan Cruz Benito, Kevin Kr- tein databases used for that training are of public sulich and Maddy Todd on the usage of Qiskit, and domain. They are the result of many years of col- Jessica Lemieux and the late David Poulin on as- laborative work among many research institutions pects of the quantum Metropolis algorithm. We that are funded publicly. Our results point towards also want to thank IBM Quantum Research for al- the possibility of using an open public software like lowing us to use their quantum processors under Qiskit [39], Psi4 [34] and ‘community’ implementa- the Research program. We also thank Quasar Sci- tions of the AlphaFold algorithm [14] of which Mini- ence for facilitating the access to the AWS resources. fold [38] is an example, to compensate the power of We acknowledge financial support from the Spanish commercial software for protein folding. MINECO grants MINECO/FEDER Projects FIS We would also like to point out that despite the 2017-91460-EXP, PGC2018-099169-B-I00 FIS-2018 great advances achieved by the new classical meth- and from CAM/FEDER Project No. S2018/TCS- ods in the latest editions of the CASP competition 4342 (QUITEMAD-CM). The research of M.A.M.- [13], there is still huge room for improvement and D. has been partially supported by the U.S. Army there are many gaps in the problem of protein folding Research Office through Grant No. W911NF-14-1- that await to be filled. These include understanding 0103. P. A. M. C. thanks the support of a MECD protein-protein and protein-ligand interactions, the grant FPU17/03620, and R.C. the support of a CAM real time folding evolution to the equilibrium config- grant IND2019/TIC17146. uration, the dark proteome, proteins with dynamical configurations like the intrinsically disordered pro- teins (IDP) and so on and so forth. Crucially, we Appendix A: Szegedy quantum walks believe that the current limitations of the training data sets, which are biased towards easily to be crys- In order to explain what are Quantum Walks, we tallized proteins, puts constraints on what can be need to introduce Markov Chain. Given a configura- achieved using only deep learning techniques. Our tion space Ω, a Markov Chain is a stochastic model present research is an attempt to explore techniques over Ω, with transition matrix Wij, that specifies that address these limitations. the probability of transition that does not depend on Future improvements of our work include primar- previous states but only on the present one. Random ily a refinement of our experimentally found quan- walks are the process of moving across Ω according tum advantage, with peptides of larger size, and a to Wij. more accurate comparative analysis of the precise Quantum walks are the quantum equivalent of quantum advantage that can be found with each an- random walks [53]. The most famous and used quan- nealing schedule. Such research would be valuable tum walks are those defined by Ambainis [54] and because it is an important decision to be made when Szegedy [22], although several posterior generalisa- deploying these optimization algorithms in practice, tions and improvements have been developed such and no research of this question has been carried out as those in [36]. Quantum walks often achieve a to the best of our knowledge. quadratic advantage in the hitting time of a target Additionally, we also believe further work should state with respect to the spectral gap, defined be- be conducted in clarifying whether asymptotically low, and are widely used in several other algorithms one should expect either of the initialization modes [55–57], as we shall see. to be polynomially faster finding the ground state. Szegedy quantum walks are not usually defined 16

using a coin, but rather on a bipartite walk. This Using the previous expressions we can state, using implies duplicating the Hilbert space, and defining ΠA |φji |0i = |φji |0i the unitary Π U †VS |φ i |0i = cos φ |φ i |0i , (A6a) X p A j j j U |ji |0i := |ji Wji |ii = |ji |pji (A1a) i∈Ω and and also the closely related † ΠB |φji |0i = U VS cos φj |φji |0i . (A6b) X p V |0i |ii := Wij |ji |ii = |pii |ii . (A1b) (A6b) is true because (supplementary material [20]), j∈Ω † † ΠAU VSΠA = ΠASV UΠA (A7) Wij must be understood as the probability that state |ii transitions to state |ji. We can check that due to these operators fulfil that SU = VS, for S the Swap † operation between the first and second Hilbert sub- hφj, 0|ΠAU VSΠA|φj, 0i = λj (A8) space. The cost of applying U is usually called † † (quantum) update cost. Define also the subspaces = λj = hφj, 0|ΠASV UΠA|φj, 0i .

Thus W will preserve the subspace spanned by † A := span{|ji |0i : j ∈ Ω} (A2a) {|φji |0i ,U VS |φji |0i}, which is invariant under ΠA and ΠB; mirroring the situation in the Grover and algorithm [35]. Also, as a consequence of the previ- ous, and of the fact that in A + B operator W has † † B := U SUA = U VSA. (A2b) eigenvalues e2iϕj [22, 36], in such subspace the oper- ator W can be written as a block diagonal operator Having defined U and V , we can define M := U †VS, with matrices a matrix with entries valued hi, 0|U †VS|j, 0i = p p p   Wji Wij = πi/πjWij, the last equality thanks cos(2ϕj) − sin(2ϕj) wj = . (A9) to the detailed balance equation [58]. In fact, in ma- sin(2ϕj) cos(2ϕj) −1/2 1/2 trix terms it is usually written M = Dπ WDπ Finally, notice that the eigenvalue gap of W is where we have written Dπ to indicate the diago- nal matrix with the entries of the equilibrium state- defined as δ = 1−λ1, and in general the hitting time −1 vector of probabilities, π. This implies that W and of a classical walk will grow like O(δ ) (Proposition 1 of [36]). On the other hand, the hitting time of the M have the same spectrum λ0 = 1 ≥ ... ≥ λd−1 ≥ 0, −1 as the matrix W is positive definite, pT Wp ∈ [0, 1], Quantum Walk will scale like O(∆ ) (Theorem 6 p 2 and of size d. The corresponding eigenstates are of√ [36]), where ∆ := 2ϕ1. But ∆ ≥ 2 1 − |λ1| ≥ 1/2 |φji |0i, and phases ϕj = arccos λj. In particular 2 δ, so ∆ = Ω(δ ). In fact, writing δ = 1 − λ1 = P √ |φ0i = i πi |ii, is the equilibrium distribution. 1 − cos ϕ1, and expanding in Taylor series cos ϕ1 = 2 4 We can also define the projectors around A and ϕ1 ϕ1 6 1 − 2 + 24 + O(ϕ1), we can see that B as ΠA and ΠB 2 2 4 ϕ1 ϕ1 ϕ1 ΠA := (1 ⊗ |0i h0|), (A3a) ≥ 1 − cos ϕ ≥ − . (A10) 2 1 2 24

† † Using the definitions of δ and ∆ and the fact that ΠB := U VS(1 ⊗ |0i h0|)SV U (A3b) ϕ1 ∈ (0, π/2), it is immediate with their corresponding rotations ∆2 ∆2 ∆4 ∆2  ∆2  ≥ δ ≥ − 4 = 1 − RA = 2ΠA − 1, (A4a) 8 8 2 · 24 8 2 · 24 ∆2  π2  ≥ 1 − . R = 2Π − 1. (A4b) 8 2 · 24 B B (A11) Using this rotation we further define a quantum walk step as we did in the main text equation (4), Consequently, ∆ = Θ(δ1/2), and this is the reason why Quantum Walks are quadratically faster than † † W = RBRA = U SURAU SURA. (A5) their classical counterparts. 17

FIG. 9. Geometrical visualization of a quantum walk operator W of Szegedy type. W performs a series of rotations that in the subspace A + B, defined in (A2) with their corresponding rotation operators (A4), may be written as a block diagonal matrix, where each block is a 2-dimensional rotation ωj = R(2ϕj ) given by (A9). This figure represents the direct sum of Grover-like rotations in the subspace spanned by A + B, and therefore W . This quantum walk operator represents equation (4) from section II C.

Appendix B: Mathematical description of the will appear in when sampling from that state, with out-of-equilibrium quantum Metropolis high probability. Thus, we wish to prepare one such algorithm state to be able find the configuration with the low- est energy, in our case the folded state of the protein In the appendix A we have reviewed the Szegedy in nature. quantum walk. In this appendix, we present a quan- One way to construct such distribution πβ is to tum Metropolis-Hasting algorithm based on the use create a rapidly mixing Markov Chain that has as of Szegedy walks. equilibrium distribution πβ. Such Markov Chain is The objective of the Metropolis-Hastings algo- characterized, at a given β, by a transition matrix W rithm is sampling from the Gibbs distribution ρβ = that induces a random walk over the possible states. |πβi hπβ| = Z−1(β) P e−βE(φ) |φi hφ|, where That is, W maps a given distribution p to another φ∈Ω p0 = Wp. Let us introduce some useful definitions: E(φ) is the energy of a given configuration of angles an aperiodic walk is called irreducible if any state of the molecule, β plays the role of the inverse of a in Ω can be accessed from any other state in Ω, al- temperature that will be lowered during the process, though not necessarily in a single step. Additionally, and Z(β) = P e−βE(φ) a normalization factor. φ∈Ω we will say that a walk is reversible when it fulfills Ω represents the configuration space, in our case the the detailed balance condition possible values the torsion angles may take. One can immediately notice that if β is sufficiently large only β β β β the configurations with the lowest possible energy Wj,iπi = Wi,jπj . (B1) 18

Metropolis-Hastings algorithm uses the following The second proposal is to perform the unitary transition matrix heuristic procedure [21] ( T A , if i 6= j |ψ(L)i = WL...W1 |π0i . (B5) W = ij ij (B2) ij 1 − P T A , if i = j, k6=j kj kj This is in some ways the simplest way one would think of quantizing the Metropolis-Hastings algo- where, as given in the main text equation (1), rithm, implementing a quantum walk instead of a   random walk. It is very similar to the classical way of −β(Ei−Ej ) Aij = min 1, e , (B3) performing many steps of the classical walk, slowly increasing β until one arrives to the aim temper- and ature. In addition, this procedure is significantly simpler than the previously explained ones because ( 1 N there is a move connecting j to i it does not require to perform phase estimation on Tij = 0, else, Wt. (B4) Two more innovations are introduced by [21]. In for N the number of possible outgoing movements the first place, a heuristic Total Time to Solution from state j, which we assume to be independent (TTS) is defined. Assuming some start distribution of the current state, as is the case for our particu- if operators W are applied t times, the probability lar problem. In the case of the Metropolis-Hastings of success is given by p(t). In order to be successful algorithm the detailed balance condition is fulfilled with constant probability 1−δ that means repeating with the above definitions. the procedure log(1 − δ)/ log(1 − p(t)) times. The Having defined the Metropolis algorithm, we now total expected time to success is then want to quantize it using the previous section. log(1 − δ) There have been several proposals to quantize the TTS(t) := t , (B6) Metropolis algorithm, as can be seen in the Ap- log(1 − p(t)) pendix B. They often rely on slow evolution from as also indicated in the main text equation (7). For |π i → |π i using variations of Amplitude Ampli- t t+1 the unitary procedure fication, until a large β has been achieved. However, most often this Metropolis algorithms log(1 − δ) are used outside of equilibrium, something that is TTS(L) = L g 2 (B7) log(1 − | hπ |WL...W1|π0i | ) not often worked out in the previous approaches. There are at least two ways to perform the quanti- with πg the ground state of the Hamiltonian. zation of the out-of-equilibrium Metropolis-Hastings Additionally [21] constructed an alternative to algorithm [21]. The first one, called ‘Zeno with Szegedy operator W˜ using a Boltzmann coin, such rewind’ [19] uses a simpler version of previous quan- that the quantum walk operator operates on three tum walks, where instead of amplitude amplifica- registers |xiS |ziM |biC . S stands for the codifica- tion, quantum phase estimation is used on operators tion of the state, M for the codification of possible Wj, so that measuring the first eigenvalue means movements, and C the Boltzmann coin. Their oper- that we have prepared the corresponding eigenvec- ator W˜ is equivalent to the Szegedy operator under 0 tor |ψt i = |πti [17]. Since the eigenvalue gap is ∆t, a conjugation operator Y that maps moves to states −1 the cost of Quantum Phase Estimation is O(∆t ). and viceversa: ⊥ Define measurements Qt = |πti hπt| and Qt = ˜ † † 1 − |πti hπt|, for t an index indicating the step of W = RV B FBV, (B8) the cooling schedule. Performing these measure- ments consists, as we have mentioned, in performing with,

phase estimation of the corresponding operator Wt, X −1/2 −1 ⊥ V : |0iM → N |ji (B9a) at cost ∆t . If a measurement of type Qt indicates a restart of the whole algorithm it is called ‘with- j ⊥ out rewind’. However, if measurement Qt is ob- tained, one can perform phase estimation of Wt−1. B : |xi |ji |0i → 2 2 S M C If | hπt|πt−1i | = Ft , then the transition probability q q  ⊥ ⊥ between Qj ↔ Qt−1 and between Qt ↔ Qt−1 is |xiS |jiM 1 − Ax·zj ,x |0i + Ax·zj ,x |1i 2 given by Ft ; and the transition probability between (B9b) ⊥ ⊥ Qt ↔ Qt−1 and between Qt ↔ Qt−1 is given by 2 1 − Ft , so in a logarithmic number of steps one can F : |xi |ji |bi → |x · zbi |ji |bi (B9c) recover state |πt−1i or |πti. S M C j S M C 19

R : |0i |0i → − |0i |0i M C M C (B9d) coin being in state 1, and R is a reflection operator |jiM |biC → |jiM |biC , (j, b) 6= (0, 0) on state (0, 0) for the coin and movement registers. Although our encoding of the operators and states Here V proposes different moves, B prepares the is slightly different, the algorithm that we have used Boltzmann coin, F flips the bits necessary to pre- is this one, mainly due to its simplicity. pare the new state, conditional on the Boltzmann

[1] U. Consortium, “Uniprot: a worldwide hub of pro- [13] A. Kryshtafovych, T. Schwede, M. Topf, K. Fi- tein knowledge,” Nucleic Acids Research, vol. 47, delis, and J. Moult, “Critical assessment of methods no. D1, pp. D506–D515, 2019. of prediction (casp)—round xiii,” [2] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, Proteins: Structure, Function, and Bioinformatics, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. vol. 87, no. 12, pp. 1011–1020, 2019. Bourne, “The protein data bank,” Nucleic Acids Re- [14] A. Senior, R. Evans, J. Jumper, J. Kirkpatrick, search, vol. 28, no. 1, pp. 235–242, 2000. L. Sifre, T. Green, C. Qin, A. Zidek, A. Nelson, [3] N. Perdig˜ao, J. Heinrich, C. Stolte, K. S. Sabir, A. Bridgland, et al., “Improved protein structure M. J. Buckley, B. Tabor, B. Signal, B. S. Gloss, C. J. prediction using potentials from deep learning,” Na- Hammang, B. Rost, et al., “Unexpected features ture, 2020. of the dark proteome,” Proceedings of the National [15] M. D. Hanwell, D. E. Curtis, D. C. Lonie, T. Van- Academy of Sciences, vol. 112, no. 52, pp. 15898– dermeersch, E. Zurek, and G. R. Hutchison, “Avo- 15903, 2015. gadro: an advanced semantic chemical editor, visu- [4] A. Bhowmick, D. H. Brookes, S. R. Yost, H. J. alization, and analysis platform,” Journal of Chem- Dyson, J. D. Forman-Kay, D. Gunter, M. Head- informatics, vol. 4, no. 1, p. 17, 2012. Gordon, G. L. Hura, V. S. Pande, D. E. Wemmer, [16] P. Wocjan and A. Abeyesinghe, “Speedup via quan- et al., “Finding our way in the dark proteome,” tum sampling,” Physical Review A, vol. 78, no. 4, Journal of the American Chemical Society, vol. 138, p. 042336, 2008. no. 31, pp. 9730–9742, 2016. [17] R. Somma, S. Boixo, and H. Barnum, “Quan- [5] N. Perdig˜aoand A. Rosa, “Dark proteome database: tum simulated annealing,” arXiv preprint studies on dark proteins,” High-throughput, vol. 8, arXiv:0712.1008, 2007. no. 2, p. 8, 2019. [18] R. D. Somma, S. Boixo, H. Barnum, and E. Knill, [6] P. N. Bryan and J. Orban, “Proteins that switch “Quantum simulations of classical annealing pro- folds,” Current Opinion in Structural Biology, cesses,” Physical Review Letters, vol. 101, no. 13, vol. 20, no. 4, pp. 482–488, 2010. p. 130504, 2008. [7] A. K. Dunker, J. D. Lawson, C. J. Brown, R. M. [19] K. Temme, T. J. Osborne, K. G. Vollbrecht, Williams, P. Romero, J. S. Oh, C. J. Oldfield, A. M. D. Poulin, and F. Verstraete, “Quantum metropolis Campen, C. M. Ratliff, K. W. Hipps, et al., “In- sampling,” Nature, vol. 471, no. 7336, p. 87, 2011. trinsically disordered protein,” Journal of Molecular [20] M.-H. Yung and A. Aspuru-Guzik, “A quantum– Graphics and Modelling, vol. 19, no. 1, pp. 26–59, quantum metropolis algorithm,” Proceedings of the 2001. National Academy of Sciences, vol. 109, no. 3, [8] R. Das and D. Baker, “Macromolecular model- pp. 754–759, 2012. ing with rosetta,” Annu. Rev. Biochem., vol. 77, [21] J. Lemieux, B. Heim, D. Poulin, K. Svore, and pp. 363–382, 2008. M. Troyer, “Efficient Quantum Walk Circuits for [9] University of Washington , “Rosetta@home.” Metropolis-Hastings Algorithm,” Quantum, vol. 4, boinc.bakerlab.org, 2021. p. 287, June 2020. [10] R. Das, B. Qian, S. Raman, R. Vernon, J. Thomp- [22] M. Szegedy, “Quantum speed-up of markov chain son, P. Bradley, S. Khare, M. D. Tyka, D. Bhat, based algorithms,” in 45th Annual IEEE Sympo- D. Chivian, et al., “Structure prediction for casp7 sium on Foundations of Computer Science, pp. 32– targets using extensive all-atom refinement with 41, IEEE, 2004. rosetta@ home,” Proteins: Structure, Function, and [23] R. Babbush, A. Perdomo-Ortiz, B. O’Gorman, Bioinformatics, vol. 69, no. S8, pp. 118–128, 2007. W. Macready, and A. Aspuru-Guzik, “Construction [11] W. E. Hart and S. Istrail, “Robust proofs of np- of energy functions for lattice heteropolymer mod- hardness for protein folding: general lattices and els: a case study in constraint satisfaction program- energy potentials,” Journal of Computational Biol- ming and adiabatic quantum optimization,” arXiv ogy, vol. 4, no. 1, pp. 1–22, 1997. preprint arXiv:1211.3422, 2012. [12] B. Berger and T. Leighton, “Protein folding in [24] A. Robert, P. K. Barkoutsos, S. Woerner, the hydrophobic-hydrophilic (hp) is np-complete,” and I. Tavernelli, “Resource-efficient quantum in Proceedings of the Second Annual International algorithm for protein folding,” arXiv preprint Conference on Computational Molecular Biology, arXiv:1908.02163, 2019. pp. 30–39, 1998. [25] A. Perdomo-Ortiz, N. Dickson, M. Drew-Brook, 20

G. Rose, and A. Aspuru-Guzik, “Finding low-energy work for quantum computing,” 2019. conformations of lattice protein models by quantum [40] Amazon.com, Inc., “Amazon web services.” aws. annealing,” Scientific Reports, vol. 2, p. 571, 2012. amazon.com, 2021. [26] M. Fingerhuth, T. Babej, et al., “A quantum al- [41] S. Kim, J. Chen, T. Cheng, A. Gindulyte, J. He, ternating operator ansatz with hard and soft con- S. He, Q. Li, B. A. Shoemaker, P. A. Thiessen, straints for lattice protein folding,” arXiv preprint B. Yu, et al., “Pubchem 2019 update: improved arXiv:1810.13411, 2018. access to chemical data,” Nucleic Acids Research, [27] T. Babej, M. Fingerhuth, et al., “Coarse-grained vol. 47, no. D1, pp. D1102–D1109, 2019. lattice protein folding on a quantum annealer,” [42] F. Jensen, “Atomic orbital basis sets,” Wiley Inter- arXiv preprint arXiv:1811.00713, 2018. disciplinary Reviews: Computational Molecular Sci- [28] A. Perdomo, C. Truncik, I. Tubert-Brohman, ence, vol. 3, no. 3, pp. 273–295, 2013. G. Rose, and A. Aspuru-Guzik, “Construction of [43] T. Helgaker, P. Jorgensen, and J. Olsen, Molecu- model hamiltonians for adiabatic quantum compu- lar electronic-structure theory. John Wiley & Sons, tation and its application to finding low-energy con- 2014. formations of lattice protein models,” Physical Re- [44] L. Grover and T. Rudolph, “Creating superpositions view A, vol. 78, no. 1, p. 012320, 2008. that correspond to efficiently integrable probability [29] C. Outeiral, G. M. Morris, J. Shi, M. Strahm, S. C. distributions,” arXiv preprint quant-ph/0208112, Benjamin, and C. M. Deane, “Investigating the po- 2002. tential for a limited quantum speedup on protein [45] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, lattice problems,” arXiv preprint arXiv:2004.01118, “Optimization by simulated annealing,” Science, 2020. vol. 220, no. 4598, pp. 671–680, 1983. [30] V. K. Mulligan, H. Melo, H. I. Merritt, S. Slocum, [46] K. Temme, S. Bravyi, and J. M. Gambetta, “Error B. D. Weitzner, A. M. Watkins, P. D. Renfrew, mitigation for short-depth quantum circuits,” Phys- C. Pelissier, P. S. Arora, and R. Bonneau, “De- ical Review Letters, vol. 119, no. 18, p. 180509, 2017. signing peptides on a quantum computer,” bioRxiv, [47] R. LaRose, A. Mari, P. J. Karalekas, N. Shammah, p. 752485, 2020. and W. J. Zeng, “Mitiq: A software package for er- [31] L. Banchi, M. Fingerhuth, T. Babej, C. Ing, and ror mitigation on noisy quantum computers,” 2020. J. M. Arrazola, “Molecular docking with gaussian [48] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, boson sampling,” Science Advances, vol. 6, no. 23, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Is- p. eaax1950, 2020. ard, et al., “Tensorflow: A system for large-scale [32] The α-helix and the β-sheet correspond to two com- machine learning,” in 12th {USENIX} Symposium mon structures found in protein folding. Such struc- on Operating Systems Design and Implementation tures constitute what is called the secondary struc- ({OSDI} 16), pp. 265–283, 2016. ture of the protein, and are characterised because [49] F. Chollet et al., “Keras.” https://github.com/ (φ, ψ) = (−π/3, −π/4) in the α-helix, and (φ, ψ) = fchollet/keras, 2015. (−3π/4, −3π/4) in the β-sheet, due to the hydrogen [50] R. S. Sutton and A. G. Barto, Reinforcement learn- bonds that happen between backbone amino groups ing: An introduction. MIT press, 2018. NH and backbone carboxy groups CO. [51] P. J. Van Laarhoven and E. H. Aarts, “Simulated [33] R. Von Mises, Mathematical Theory of Probability annealing,” in Simulated annealing: Theory and ap- and Statistics. Academic Press, 2014. plications, pp. 7–15, Springer, 1987. [34] J. M. Turney, A. C. Simmonett, R. M. Parrish, [52] Student, “The probable error of a mean,” E. G. Hohenstein, F. A. Evangelista, J. T. Fer- Biometrika, pp. 1–25, 1908. mann, B. J. Mintz, L. A. Burns, J. J. Wilke, M. L. [53] R. Portugal, Quantum walks and search algorithms. Abrams, et al., “Psi4: an open-source ab initio elec- Springer, 2013. tronic structure program,” Wiley Interdisciplinary [54] A. Ambainis, “Quantum walk algorithm for element Reviews: Computational Molecular Science, vol. 2, distinctness,” SIAM Journal on Computing, vol. 37, no. 4, pp. 556–565, 2012. no. 1, pp. 210–239, 2007. [35] L. K. Grover, “Quantum mechanics helps in search- [55] G. D. Paparo and M. Martin-Delgado, “Google in a ing for a needle in a haystack,” Physical Review Let- quantum network,” Scientific Reports, vol. 2, p. 444, ters, vol. 79, no. 2, p. 325, 1997. 2012. [36] F. Magniez, A. Nayak, J. Roland, and M. Santha, [56] G. D. Paparo, M. M¨uller,F. Comellas, and M. A. “Search via quantum walk,” SIAM Journal on Com- Martin-Delgado, “Quantum google in a complex puting, vol. 40, no. 1, pp. 142–164, 2011. network,” Scientific Reports, vol. 3, p. 2773, 2013. [37] T. Albash and D. A. Lidar, “Demonstration of a [57] G. D. Paparo, V. Dunjko, A. Makmal, M. A. scaling advantage for a quantum annealer over sim- Martin-Delgado, and H. J. Briegel, “Quantum ulated annealing,” Physical Review X, vol. 8, no. 3, speedup for active learning agents,” Physical Review p. 031016, 2018. X, vol. 4, no. 3, p. 031002, 2014. [38] E. Alcaide, “Minifold: a deeplearning-based [58] Notice that in most texts the definition of M does mini protein folding engine.” https://github.com/ not explicitly include S. It is assumed implicitly EricAlcaide/MiniFold/, 2019. though. [39] H. Abraham et al., “Qiskit: An open-source frame-