New encoding for implementing lattice folding problems on a quantum annealer

Ryuji Takagi1, 2 1Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA 2Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA (Dated: May 18, 2018) We introduce a new encoding scheme that maps a lattice protein model to a problem Hamiltonian for the quantum annealing protocol. Our encoding scheme is a space-based encoding approach inspired by the diamond encoding method, which allocates the qubits to the locations that ith can possibly reach. Compared to the diamond encoding, our method uses the exponentially smaller number of qubits to encode the position of each amino acid at the trade-off with a higher level of non-locality. We apply our scheme to a simple example—a chain with four amino acids on the 2D lattice— and numerically obtain the energy spectrum during the annealing process. We confirm that the ground state indeed corresponds to the native configuration, and illegal configurations are appropriately suppressed by large energy penalty.

I. INTRODUCTION present atoms driven by the empirically obtained force field in the system (open libraries for the force field in- clude AMBER [17] and CHARMM [18]). However, it The problem has been one of the is computationally demanding to fully reconstruct the most important topics in biology. A protein consists whole dynamics of the protein folding until it reaches of a chain of amino acid residues, which are connected the native structure. to each other by the covalent bonds. Many of the The lattice protein model is something that sits at form three-dimensional structures, and vast the opposite edge to the all-atom model. It is a sim- of investigations indicate that the structure of a pro- plified model where all the amino acids are to be situ- tein, called the native configuration, is encoded into ated on lattice points. Energy function is determined the sequence of amino acids of the chain. It is the by the interaction due to the non-covalent bonds that configuration that minimizes the free energy, which are exerted between the acids that are positioned next takes into account the energy of the system, which is to each other. prediction is turned dominated by non-covalent interactions [1], and the into a computational optimization problem where the entropy, which is the contribution from thermal fluc- energy function is to be minimized. Despite the sim- tuation [2–8]. Functionality of the protein is closely plicity of the model, it has proven to be helpful to related to the three-dimensional structure, so clearly extract a statistical description of the folding [7]. folding process plays an essential role in biological sys- tems. For instance, misfolding can lead to a reduced Even inside the lattice model, one still needs to functionality of the protein [9], which causes some dis- decide the model for interaction and the dimension- eases including the cancer [10–12]. ality of the lattice. The simplest interaction model is the hydrophobic-polar (HP) model where amino There are a couple of approaches to investigate the acids are only labeled as either hydrophobic (H) or protein folding problem. One of the important as- polar (P), and we assume negative energy only be- pects to understand is the dynamics of the folding tween two H type acids [19]. It turns out that even process. The question is how proteins realize their in this simplest model, it is hard to find the global native configurations in such a short amount of time minimum of the energy function; it has been found to (on the order of microseconds [13]) among an astro- be NP-complete [20, 21]. Due to this intrinsic hard- nomical number of possible configurations [14]. An- ness, tractable size of the system is somewhat limited; other important problem is the protein structure pre- what has been reported is 36 acids for 3D with an diction, where one is to identify the native configura- exact algorithm [22] and 136 acids with a heuristic al- tion given some sequence of amino acids. It bridges a gorithm [23]. Also, it has turned out that HP model microscopic description of an amino-acid sequence to is not good enough to capture some of the key prop- a complex structure of the protein and gives insights erties of a real problem, such as cooperativity [24, 25]. into the functionality of the protein. We will focus on To remedy this, more realistic interaction model, such the protein structure problem in this report. as Miyazawa-Jernigan (MJ) model [26], needs to be Simulation has been a powerful tool to investigate considered, but it puts additional complexity to the the protein folding, and various models have been em- problem. The challenge is ultimately reduced to the ployed. For instance, simulation with the all-atom hardness of the computational optimization problem. model has been an active field of research [13, 15, 16]. Perdomo et al. tackled this difficulty by having The idea is to simulate the molecular dynamics of the a quantum device solve the optimization problem 2 by quantum annealing [27, 28]. Quantum anneal- consists of Pauli Z operators, such as Ising model P ing [29, 30] is an optimization protocol inspired by the Hp = ij JijZiZj. Clearly, the eigenstates of Hp simulated annealing [31] where it uses the quantum are computational basis states, but finding the ground fluctuation instead of the thermal fluctuation. It has state is hard in general. Now, let us consider the fol- been argued that the quantum annealing may show a lowing time-dependent Hamiltonian speedup over classical algorithms because of the quan- tum effect such as quantum tunneling, which may al- H(s) = sHp + (1 − s)Hi, 0 ≤ s ≤ 1 (1) low the temporary solution stuck at a local minimum to tunnel through the energy gap to reach the global where Hi is some initial Hamiltonian, which does not minimum [32]. It has been still unclear whether such commute with H . We choose H such that the ground a speedup is possible, and it has been under active in- p i state is easy to prepare. For instance, Hi can be cho- vestigation. Although it is unlikely that it realizes an P sen as Hi = − i Xi, whose ground state is clearly exponential speedup on a NP-complete problem, there ⊗n |+i where |+i = √1 (|0i + |1i). The protocol works is still a hope that it could give a polynomial or even 2 a constant factor speedup, which makes a practical as follows. We prepare the ground state of H(0) = Hi. difference. We then gradually increase s from 0 to 1. According The crucial part of this approach is to come up with to the quantum adiabatic theorem [36], if the change an encoding scheme from the lattice protein folding in the Hamiltonian is slow enough, the state always problem to a Hamiltonian that can be implemented on stays in the ground state of the Hamiltonian. Thus, the currently-available quantum device [33, 34]. Sev- at the end of the protocol for s = 1, the state should eral schemes have been proposed [27, 28, 35], each of be the ground state of Hp, which is the answer we which has its own advantage and disadvantage. For wanted. How slow it should be depends on the min- instance, the “turn encoding” used in [28] is efficient imum gap of the Hamiltonian. Specifically, let T be in required number of qubits, but it comes with high the running time of the algorithm. Then, to ensure level of complication in the Hamiltonian. The “dia- the final state remains the ground state, it is suffi- ˙ 2 mond encoding” proposed in [35], on the other hand, cient to take T  maxs ||H(s)||/∆ where ∆ is the allows for a simple construction of the Hamiltonian, minimum gap of H(s) [36]. It means that if the gap is but it is inefficient in the required number of qubits. only polynomially small, the algorithm runs in poly- In this report, we provide a new encoding scheme nomial time whereas it takes exponential time for the that sits in the middle of these two. Our scheme al- Hamiltonian with an exponentially small gap. lows for a straightforward construction of the Hamil- tonian while greatly reducing the required number of qubits compared to the diamond encoding. It is in- III. LOG DIAMOND ENCODING spired by the diamond encoding but uses exponen- tially less number of qubits to encode the location of amino acids. Because of this property, we call our en- Our construction is inspired by the diamond encod- coding scheme log-diamond encoding. We first briefly ing method proposed in [35]. The idea of the diamond review the quantum annealing and provide a detailed encoding is to assign qubits for ith acid only to the explanation on how the encoding works. We then ap- lattice points that are possibly reached by the ith acid. ply it to a simple example and numerically check that The diamond encoding allocates one qubit to each 2 it gives the right configuration as its solution. possible lattice point, so it requires i qubits when i is even and i2 − 1 qubits when i is odd for encoding the location of the ith acid. Our encoding solves this inefficiency by encoding it into exponentially smaller II. LATTICE PROTEIN FOLDING ON A numbers of qubits in the trade-off for locality of the in- QUANTUM ANNEALER teraction. Although we explain for the case of 2D, ex- tension to higher dimensions is straightforward. Imag- Here, we briefly review the quantum annealing pro- ine the 2D lattice specified by the x, y coordinate tocol from the perspective of the computational opti- (x, y), and suppose that the first acid is positioned mization problem. at (0, 0). The second acid may be positioned either We use Xi and Zi to denote Pauli X and Z oper- at (1, 0), (0, 1), (−1, 0), (0, −1). In general, the i + 1th ators acting on the ith qubit, and we call {|0i , |1i} acid can be only positioned at the point whose dis- computational basis where they are the eigenstates of tance from the point at which the ith acid is 1. To see Z, i.e. Z |0i = |0i ,Z |1i = − |1i. The space of n- the structure of the reachable lattice points, it is con- qubit states is spanned by the tensor product of the venient to introduce the ‘layer’, which is a set of lat- single-qubit computational basis states. There are 2n tice points positioned on the edges of a diamond. Let of those basis states. Suppose we would like to ob- Lk = {(x, y)| x, y ∈ Z, |x+y| = k −1∨|x−y| = k −1}, tain the ground state of the Hamiltonian Hp that only then the ith acid can be only positioned at the point 3 in 1100 ( L ∪ L ∪ ...L ∪ L (i : even) 0010 010 0100 i i−2 4 2 (2) Li ∪ Li−2 ∪ ...L3 ∪ L1 (i : odd) 1010 110 10 100 1000

The location of an acid is completely determined by 0110 001 01 specifying a layer and a point in the layer, each of 00 000 0000 which we allocate qubits to. Specifically, we express L2 the position of the ith acid by siqi where si = 1110 101 11 111 1101 L3 si1si2 . . . siTi and qi = qi1qi2 . . . qiRi are the qubits labeling the layer and the position in the layer respec- 0001 011 0101 L tively. For the ith acid, there are di/2e possible layers 4 1001 so Ti = logdi/2e. The required number of qubits to specify the point in a layer depends on the layer we are looking at since layer L includes 4k lattice points. Figure 1. Schematics of the log diamond encoding. Differ- k ent colors represent different layers, and points in a layer However, since we need to use the same number of are labeled in the binary form (the left-most bit being the qubits to specify the ith acid independent of the layer least significant bit) counted counter-clockwise from the specified by si, we will take Ri = logd4(i − 1)e, the right most point. maximum possible number of qubits required, and use from the left-most qubits qi1 to encode the position in the layer. We label the points in the layer counter- will not take the ground state of the total Hamilto- clockwise from the right-most point in the layer as nian. Let us consider the Hamiltonian of the form shown in Fig.1. There are some unused qubits in qi for the encoding depending on the value of si. We will Hp = Hoverlap + Hconnect + Hpair. (4) see later that the values of those qubits do not mat- ter because they do not contribute to energy function. where first two are terms penalizing the illegal config- The number of qubits required to express the position urations, and the third term takes care of the interac- of ith qubit is then Ti + Ri = logdi/2e + logd4(i − 1)e, tion between acids. which is exponentially smaller than the number of Before going into details about the construction of qubits required in the diamond encoding. For in- each term, let us introduce some functions that will be stance, for i = 4, we get Ti = 1 and Ri = 4. To specify useful in the later discussions. As we saw, the lattice the point (1, 0), the qubit expression takes s4 = 0 (be- point for the ith acid is completely determined by si cause (1, 0) ∈ L2 and L2 is the first possible layer for and qi, so a map from the qubit expression to the x, y i = 4) and q4 = 00 ∗ ∗ where ∗ can be either 0 or coordinate is well-defined. We write such a map as 1, each of which gives the same energy. Also, note ri(si, qi). For instance, r4(1, 1000) = (2, 1) because that we do not need to allocate any qubit for i = 1 s4 = 1 specifies the layer L4 (for i = 4, there are two or si for i = 2, 3 because we assume the first acid is possible layers, L2 and L4, so s4 = 0 refers to L2 and always positioned at (0, 0), and there is only one layer s4 = 1 refers to L4) and q4 = 1000 specifies the second to consider for i = 2, 3. Furthermore, we can take point in L4 reached by going counter-clockwise start- q2 = 00 and q32q33 = 00 without loss of generality ing from the right-most point (3, 0). We also define because of the symmetry. The quantum state for the the projector chain consisting of N acids then takes the form x˜ik x I + (−1) Zik Pik(˜xik) = (5) |00i |q3100i |s4q4i ... |sN qN i . (3) 2 where x = s, q and Zx denotes the Pauli Z acting on which uses PN (logdi/2e + logd4(i − 1)e) + 1 qubits. ik i=4 the qubit corresponding to x . H gives energy Next, we will consider the Hamiltonian such that ik overlap penalty to the configurations where more than one the configuration with minimum energy realizes the acids occupy the same lattice point. We realize it by ground state of the Hamiltonian. Note that taking taking into account the interaction between neighboring in- teractions is not sufficient for constructing an appro- N X X ˆ priate Hamiltonian, because it may take the ground Hoverlap = λoverlap fij (6) state that represents illegal configurations. For in- i=2 j>i stance, we do not allow for configurations where more than one acids occupy the same lattice point or two where λoverlap > 0 is a constant setting the amount ˆ acids that are connected by covalent bond are not po- of the energy penalty, and fij is the operator acting sitioned next to each other. We need to penalize these on the qubits corresponding to ith and jth acids that configurations by giving extra energy penalty so they gives 1 when acting on the state where the lattice point 4 for these two acids coincide and 0 otherwise. Explic- so the ground state energy takes zero with the ground ˆ itly, fij is written by state being the tensor product of |+i.

0     ˆ X Y Y fij =  Pik(˜sik)Pil(˜qil)  Pjk(˜sjk)Pjl(˜qjl) IV. EXAMPLE k,l k,l (7) Let us apply our construction to a simple example. We consider the HP model and the chain consisting of P0 where means the sum over s˜iq˜i and s˜jq˜j such that four acids with the type H-P-P-H. (It does not mat- ri(s˜i, q˜i) = rj(s˜j, q˜j). ter if we consider HP model or MJ model for 4-acid Hconnect gives energy penalty to the configurations chain because they have the same native configura- where two acids connected by covalent bond are not tion). We use this simple example with an obvious positioned next to each other on the lattice. It can be native configuration in order to check that our encod- constructed as ing properly outputs the desired native configuration N−1 as a ground state. The basis states are composed by X Hconnect = λconnect(N − 3) − λconnect gˆii+1 (8) 10 qubits |00i |q3100i |s4q41q42q43q44i and it spans the 6 i=3 space with dimension 2 . Hoverlap only involves the second acid and the where λconnect > 0 is a constant, and gˆii+1 is the operator acting on the qubits corresponding to ith and fourth acid due to the construction. The expression i + 1th acids that gives 1 when acting on the state for the second acid is already fixed as q2 = 00. The where the lattice point for these neighboring acids are fourth acid would take the same position if and only next to each other on the lattice and 0 otherwise. The it takes s4 = 0 and q4 = 00 ∗ ∗ where ∗ denotes any first term is added so the energy becomes zero for valid value either 0 or 1. Thus, we get  s   q   q  configurations. Importantly, we should set λoverlap  1 + Z41 1 + Z41 1 + Z42 Hoverlap = λoverlap . λconnect so illegal configurations will not be allowed. 2 2 2 gˆii+1 is explicitly written by (11)   where we take λ = 20. P00 Y overlap gˆii+1 =  Pik(˜sik)Pil(˜qil) Hconnect takes care of the bonds between the third k,l and the fourth acid. TableI shows the combinations   that contribute to Hconnect, which composes gˆii+1 in Y (8). Here, we take λconnet = 10. ×  Pi+1k(˜si+1k)Pi+1l(˜qi+1l) (9) k,l q31 s41 q41 q42 q43 q44 P00 where means the sum over s˜iq˜i and s˜i+1q˜i+1 such 0 0 0 0 ∗ ∗ that ||ri(s˜i, q˜i) − ri+1(s˜i+1, q˜i+1)|| = 1. 1 0 0 0 0 Finally, Hpair takes into account the interaction en- 1 1 0 0 0 ergy between acids that are not bonded by covalent 1 0 0 1 1 bond but sit next to each other on the lattice. It is written by the form 1 0 0 0 ∗ ∗ 0 1 0 ∗ ∗ N−3 X X 1 1 0 0 0 Hpair = cijgˆij (10) 1 0 1 0 0 i=1 j>i+2 where cij ≤ 0 is the matrix determined by the inter- Table I. Combinations of q31 and s4q4 that contribute to action model. For instance, in HP model, cij = −1 Hconnect. if ith acid and jth acid are hydrophobic and cij = 0 otherwise. In MJ model, cij takes the value in the Finally, Hpair only involves the first and the fourth interaction table in [26]. To exclude illegal configura- acids because it is the only pair that could have the tions, we need to set λconnect  maxij |cij|. interaction. The fourth acid can interact with the first Now that we construct the problem Hamiltonian acid if and only if s4 = 0. Thus, H , the annealing process is ready to be run. For  s  p 1 + Z41  x  H = c (12) P000 I−Xik pair 14 the initial Hamiltonian, we choose Hi = 2 2 x where x = s, q and Xik denotes the Pauli X acting where we take c14 = −1. P000 on the qubit corresponding to xik, and denotes Fig.2 shows the obtained ground state and the 1st the summation over all the qubits except four prede- excited state and corresponding configurations and en- termined qubits in (3). We choose this projector form ergy. The ground state appropriately corresponds to 5

q s q q q q configuration energy 31 41 41 42 43 44 of the protocol. Interestingly, this degeneracy may 1 0 1 0 * * -1 help to reduce the error due to the thermal excitation because even if the initial ground state gets excited 0 1 0 0 0 0 0 into an excited state at some point where the spectral 0 1 0 0 0 0 gap becomes small, it will eventually go to the desired ground state as long as it jumps to the third excited 0 1 0 0 0 0 state or less.

0 1 0 0 0 0 V. CONCLUSIONS

0 1 0 0 0 0 We have introduced a new encoding scheme that maps a lattice protein folding problem to a Hamilto- Figure 2. Obtained solutions for the configurations with nian that can be implemented on a quantum annealer. the ground energy and the first excited energy and their It uses exponentially smaller number of qubits than corresponding configurations. ∗ denotes either 0 or 1. Four the original diamond encoding method while keeping states take the energy −1 with the same configuration, and the simplicity of the construction of the Hamiltonian. five states take the energy 0 with different legal configura- The smaller use of the qubits is made possible at the tions. cost of a higher level of non-locality comparing to the diamond encoding, which is inherently 2-local. Our construction is applicable to any interaction model 20.0 from HP model to MJ model, and it can be also ex- 17.5 tended to 3D lattice in a straightforward manner. We 15.0 applied our encoding scheme to a simple example and 12.5 confirmed that the ground states indeed correspond 10.0 to the native configuration that minimizes the energy.

energy 7.5 The degeneracy of the ground states inherent from the

5.0 construction may make the process robust against the

2.5 thermal noise.

0.0 Because of the space-based nature of the encoding, one could employ heuristic arguments saying that the 0.0 0.2 0.4 0.6 0.8 1.0 s folding tends to take place in rather a restricted region of the lattice [37–39]. It is a physically reasonable sim- Figure 3. Computed eigenenergies during the annealing plification but it cannot be directly applied to another process with respect to the parameter s. qubit efficient encoding such as turn encoding [35]. It would be an interesting future work to investigate how our construction can be combined with such heuristic the expected native configuration, and it is irrelevant approaches. to the values of the last qubits which do not contribute It would be also interesting to try implementing to the energy function. The first excited states corre- our encoding on the most recent quantum device with spond to five possible legal states that do not minimize 2048 qubits [40]. When Perdomo et al. performed the energy. The other states with higher energies cor- their experiment, the quantum device back then only respond to the illegal states such as the ones where had 128 qubits (with 115 qubits being confirmed func- either overlap or violation of the covalent bonds oc- tional). Now that much more qubits are available, curs. Fig.3 shows the energy spectrum during the an- more complex proteins could be simulated, and our nealing process. It can be seen that some number of encoding may be suitable at such a large N region configurations having the same energy end up making because of the possible heuristic approach available bundles and each bundle gets separated into the en- that becomes more effective as the number of acids ergy corresponding to thier configurations at the end becomes large.

[1] G. D. Rose, P. J. Fleming, J. R. Banavar, and A. Mar- [5] V. S. Pande, Physical review letters 105, 198101 itan, Proceedings of the National Academy of Sciences (2010). 103, 16623 (2006). [6] K. A. Dill, S. B. Ozkan, M. S. Shell, and T. R. Weikl, [2] C. B. Anfinsen, Science 181, 223 (1973). Annu. Rev. Biophys. 37, 289 (2008). [3] E. Shakhnovich, M. Karplus, et al., nature 369, 248 [7] L. Mirny and E. Shakhnovich, Annual review of bio- (1994). physics and biomolecular structure 30, 361 (2001). [4] G. M. Crippen, Biochemistry 30, 4232 (1991). [8] E. I. Shakhnovich, Physical Review Letters 72, 3907 6

(1994). formatics 40, 543 (2000). [9] A. Smith, Nature 426, 883 (2003). [25] H. Kaya and H. S. Chan, Proteins: Structure, Func- [10] J. Laurén, D. A. Gimbel, H. B. Nygaard, J. W. tion, and Bioinformatics 40, 637 (2000). Gilbert, and S. M. Strittmatter, Nature 457, 1128 [26] S. Miyazawa and R. L. Jernigan, Journal of molecular (2009). biology 256, 623 (1996). [11] S. B. Prusiner, Proceedings of the National Academy [27] A. Perdomo, C. Truncik, I. Tubert-Brohman, G. Rose, of Sciences 95, 13363 (1998). and A. Aspuru-Guzik, Physical Review A 78, 012320 [12] M. D. Scott and J. Frydman, in Protein misfolding (2008). and disease (Springer, 2003) pp. 67–76. [28] A. Perdomo-Ortiz, N. Dickson, M. Drew-Brook, [13] P. L. Freddolino, C. B. Harrison, Y. Liu, and G. Rose, and A. Aspuru-Guzik, Scientific reports 2, K. Schulten, Nature physics 6, 751 (2010). 571 (2012). [14] C. Levinthal, Mossbauer spectroscopy in biological [29] T. Kadowaki and H. Nishimori, Physical Review E systems 67, 22 (1969). 58, 5355 (1998). [15] D. A. Beck and V. Daggett, Methods 34, 112 (2004). [30] E. Farhi, J. Goldstone, S. Gutmann, and M. Sipser, [16] M. Karplus and J. Kuriyan, Proceedings of the Na- arXiv preprint quant-ph/0001106 (2000). tional Academy of Sciences of the United States of [31] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, sci- America 102, 6679 (2005). ence 220, 671 (1983). [17] D. A. Case, T. A. Darden, T. r. Cheatham, C. L. Sim- [32] T. Albash and D. A. Lidar, Reviews of Modern merling, J. Wang, R. E. Duke, R. Luo, M. Crowley, Physics 90, 015002 (2018). R. C. Walker, W. Zhang, et al., Amber 10, Tech. Rep. [33] M. W. Johnson, M. H. Amin, S. Gildert, T. Lanting, (University of California, 2008). F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Jo- [18] A. D. MacKerell Jr, D. Bashford, M. Bellott, R. L. hansson, P. Bunyk, et al., Nature 473, 194 (2011). Dunbrack Jr, J. D. Evanseck, M. J. Field, S. Fischer, [34] R. Harris, M. Johnson, T. Lanting, A. Berkley, J. Jo- J. Gao, H. Guo, S. Ha, et al., The journal of physical hansson, P. Bunyk, E. Tolkacheva, E. Ladizinsky, chemistry B 102, 3586 (1998). N. Ladizinsky, T. Oh, et al., Physical Review B 82, [19] K. F. Lau and K. A. Dill, Macromolecules 22, 3986 024511 (2010). (1989). [35] R. Babbush, A. Perdomo-Ortiz, B. O’Gorman, [20] B. Berger and T. Leighton, Journal of Computational W. Macready, and A. Aspuru-Guzik, arXiv preprint Biology 5, 27 (1998). arXiv:1211.3422 (2012). [21] P. Crescenzi, D. Goldman, C. Papadimitriou, A. Pic- [36] A. Messiah, Quantum Mechanics [Vol 1-2]. (1964). colboni, and M. Yannakakis, Journal of computa- [37] D. Baker, Nature 405, 39 (2000). tional biology 5, 423 (1998). [38] M. T. Oakley, D. J. Wales, and R. L. Johnston, The [22] R. D. Schram, G. T. Barkema, and R. H. Bisseling, Journal of Physical Chemistry B 115, 11525 (2011). Journal of Statistical Mechanics: Theory and Exper- [39] E. I. Shakhnovich, Folding and Design 1, R50 (1996). iment 2011, P06019 (2011). [40] E. Gibney, Nature News 541, 447 (2017). [23] L. Toma and S. Toma, Protein Science 5, 147 (1996). [24] H. S. Chan, Proteins: Structure, Function, and Bioin-