Protein Folding Optimization Using Differential Evolution
Total Page:16
File Type:pdf, Size:1020Kb
Protein Folding Optimization using Differential Evolution Extended with Local Search and Component Reinitialization Borko Boˇskovi´c · Janez Brest This version, created on May 8, 2018, is an electronic reprint of the original article published in the Information Sciences Journal, 2018, Vol. 454-455, pp. 178{199, doi: 10.1016/j.ins.2018.04.072. This reprint differs from the original in pagination and typographic detail. Abstract This paper presents a novel Differential Evo- 1 Introduction lution algorithm for protein folding optimization that is applied to a three-dimensional AB off-lattice model. The protein structure prediction represents the prob- The proposed algorithm includes two new mechanisms. lem of how to predict the native structure of a pro- A local search is used to improve convergence speed and tein from its amino acid sequence. This problem is one to reduce the runtime complexity of the energy calcula- of the more important challenges of this century [17] tion. For this purpose, a local movement is introduced and, because of its nature, it attracts scientists from within the local search. The designed evolutionary algo- different fields, such as Physics, Chemistry, Biology, rithm has fast convergence speed and, therefore, when Mathematics, and Computer Science. Within the pro- it is trapped into the local optimum or a relatively good tein structure prediction, the Protein Folding Optimiza- solution is located, it is hard to locate a better similar tion (PFO) represents a computational problem for sim- solution. The similar solution is different from the good ulating the protein folding process and finding a native solution in only a few components. A component reini- structure. Most proteins must fold into a unique three- tialization method is designed to mitigate this problem. dimensional structure, known as a native structure, to Both the new mechanisms and the proposed algorithm perform their biological function [2]. A protein's func- were analyzed on well-known amino acid sequences that tion is determined by its structure. The inability of a are used frequently in the literature. Experimental re- protein to form its native structure prevents a protein sults show that the employed new mechanisms improve from fulfilling its function correctly, and this may be the efficiency of our algorithm and that the proposed al- the basis of various human diseases [24]. gorithm is superior to other state-of-the-art algorithms. The PFO belongs to the class of NP-hard prob- It obtained a hit ratio of 100% for sequences up to 18 lems [11] and, with current algorithms and computa- monomers, within a budget of 1011 solution evaluations. tional resources, it is possible to predict the native struc- New best-known solutions were obtained for most of the tures of relatively small proteins. The reason for that is arXiv:1710.07031v2 [cs.AI] 6 May 2018 sequences. The existence of the symmetric best-known the huge and multimodal search space. For example, a solutions is also demonstrated in the paper. polypeptide that has only 18 amino acids, will have 31 angles within a simplified AB model (see Section 3). Us- ing uniform discretization with only 10 values for each Keywords Protein folding optimization · Three- angle, there would be 1031 possible configurations. To dimensional AB off-lattice model · Differential evaluate and select the correctly folded conformation evolution · Local search · Component reinitialization among all these conformations in the time elapsed since the Big Bang, we need the huge computational speed of 1031=(4:32 · 1017) = 2:31 · 1013 conformation evalu- B. Boˇskovi´c · J. Brest ations per second. This is much faster than the speed Faculty of Electrical Engineering and Computer Science, obtained within our experiment, where we can evaluate University of Maribor, SI-2000 Maribor, Slovenia only 5:73 · 105 conformations per second. From these E-mail: [email protected], [email protected] numbers, we can see that the search space is huge, even 2 Borko Boˇskovi´c,Janez Brest in the simplified model, which makes this problem very to the ab-initio PFO methods, which optimize struc- hard. However, in reality, the proteins fold into their tures from scratch, and do not require any information native conformation on a time scale of seconds, and about related sequences. It showed a very fast conver- this contradiction is known as Levinthal's paradox [7]. gence speed, and it was capable of obtaining signifi- An optimization algorithm can give good results of a cantly better results than other state-of-the-art algo- PFO problem only if it can locate good solutions and rithms. evaluate solutions efficiently. Here, the approximation Taking into account the finding of the previous para- techniques, such as heuristic and metaheuristic, with graph, we propose two new mechanisms, that, addition- efficient data structures, become the only viable alter- ally, improve the efficiency of our algorithm. A new lo- natives as the problem size increases. cal search mechanism was designed in order to improve Some simplified protein models exist, such as HP convergence speed and to reduce the runtime complex- models within different lattices [5] and the AB off-lattice ity of the algorithm. A similar idea was already used model [28]. Simplified protein models were designed for within the HP model [5], where it is applied to the cubic development, testing, and comparison of different ap- lattice. Using a simple local search mechanism, where proaches. The AB off-lattice model was used in the pa- only one solution's component is changed, can produce per for demonstrating the efficiency of the proposed al- a structure whereby a lot of monomers are moved. This gorithm. This model takes into account the hydropho- means their positions must be recalculated and effi- bic interactions which represent the main driving forces cient energy calculation is not possible. In contrast to of a protein structure formation and, as such, still im- simple local search, our mechanism improves the qual- itates its main features realistically [14]. Although this ity of conformations using the local movements within model is incomplete, it allows the development, test- the three-dimensional AB off-lattice model. We define ing, and comparison of various search algorithms, and a local movement as a transformation of conformation, offers a global perspective of protein structures. It can whereby only two consecutive monomers are moved lo- be helpful in confirming or questioning important the- cally in such a way that the remaining monomers re- ories [3]. main in their positions. The described local movement Our algorithm is based on the Differential Evolu- allows efficient evaluation of neighborhood solutions and tion (DE) algorithm that was proposed by Storn and faster convergence speed. Price [29]. It is a powerful stochastic population-based With the fast convergence speed the algorithm can algorithm. Three simple operators, mutation, crossover, locate good solutions quickly, but it has a problem lo- and selection, were used inside the DE algorithm to cating good similar solutions. For example, if an al- transform real-coded individuals with the purpose to gorithm locates a good solution that is different from locate optimal or sub-optimal solutions. Because of its the global best solution in only one or few components, simplicity and efficiency, it was used in various numer- then the random restart, that was used in our previ- ical optimization problems, such as an animated trees ous work, is not an efficient solution. For that purpose, reconstruction [36], an intrusion detection [1], and an a component reinitialization was designed and incor- image thresholding [25]. An advanced DE variant, such porated within our algorithm. This mechanism is em- as L-SHADE [30] was also the winner of the recent CEC ployed when the local best solution is detected. Instead (IEEE Congress on Evolutionary Computation) compe- of the random restart, it produces similar solutions that titions. For more details about DE, we refer the reader are different from the local best solution in only a few to [27] and to survey [10]. components. It has been shown that the PFO has a highly rugged We called the proposed algorithm DElscr and it was landscape structure containing many local optima and tested on two sets of amino acid sequences that were needle-like funnels [16], and, therefore, the algorithms used frequently in the literature. The first set included that follow more attractors simultaneously are ineffec- 18 real peptide sequences, and the second set included tive. In our recent work [4], to overcome this weakness, 4 well-known artificial Fibonacci sequences with differ- we proposed a Differential Evolution (DE) algorithm ent lengths. Experimental results show that the pro- that uses the DE=best=1 =bin strategy. With this strat- posed mechanisms improve the efficiency of the algo- egy, our algorithm follows only one attractor. The tem- rithm, and the algorithm is superior to other state- poral locality mechanism [35] and self-adaptive mecha- of-the-art algorithms. Its superiority is especially ev- nism [6] of the main control parameters were used addi- ident for longer sequences. With the proposed algo- tionally to speed up the convergence speed. When the rithm, that is stochastic, we cannot prove the optimal- algorithm was trapped in a local optimum, then ran- ity of the obtained conformations. However, we can in- dom reinitialization was used. This algorithm belongs fer about them according to the observed hit ratio. The Protein Folding Optimization using DE with Local Search and Component Reinitialization 3 experimental results show that our algorithm obtained the well-known gradient method. The statistical tem- a hit ratio of 100% for sequences that contain up to perature molecular dynamics based algorithm \statisti- 18 monomers. For all longer sequences, we can only cal temperature annealing" was applied to an AB model report the best-known conformations that are almost in [18].