Protein Secondary Structure Prediction by Fuzzy Min Max Neural Network with Compensatory Neurons
Thesis submitted in partial fulfillment of the requirements for the degree of Master of Technology in Computer Science & Engineering

By Sudipta Saha (Roll No. 06CS6023)

Under the supervision of Prof. Jayanta Mukhopadhyay

Department of Computer Science & Engineering
Indian Institute of Technology, Kharagpur-721302
West Bengal, India
May, 2008

Department of Computer Science & Engineering
Indian Institute of Technology
Kharagpur-721302, India

Certificate

This is to certify that the thesis titled “Protein Secondary Structure Prediction by Fuzzy Min-Max Neural Network with Compensatory Neurons”, submitted by Sudipta Saha to the Department of Computer Science and Engineering in partial fulfillment for the award of the degree of Master of Technology, is a bona fide record of work carried out by him under my supervision and guidance. The thesis has fulfilled all the requirements as per the regulations of the institute and, in my opinion, has reached the standard needed for submission.

Prof. Jayanta Mukhopadhyay
Dept. of Computer Science and Engineering
Indian Institute of Technology
Kharagpur-721302, India

Dedicated to my parents and wife

Acknowledgement

I take this opportunity to express my deep sense of gratitude to my guide, Dr. Jayanta Mukhopadhyay, for his guidance, support and inspiration throughout the duration of the work. I would like to specially acknowledge the help and encouragement I received from Dr. P. K. Biswas of the Department of Electronics and Electrical Communication Engineering and Dr. A. K. Majumdar of the Department of Computer Science & Engineering, IIT Kharagpur.
In addition, I am thankful to all the faculty members, staff and research scholars of the Department of Computer Science and Engineering, and to my friends, for providing help whenever it was required. I am also grateful to my parents for their constant encouragement and financial support during my years of study. Lastly, I would like to thank my wife, Sipra Saha, for all the love.

Sudipta Saha
Dept. of Computer Science and Engineering
Indian Institute of Technology
Kharagpur-721302, India
Date: May 05, 2008

Abstract

Neural networks are now used extensively for predicting the three-dimensional structures of proteins, and several types of neural network are employed for this task. Protein three-dimensional structure is described at several levels. In our work, a special kind of neural network is employed to predict the secondary structure of a protein from its primary structure. This network combines the neural network concept with fuzzy logic. The prediction method also uses the algorithm described by Chou-Fasman [4] to break ties between different classes of prediction. The basic algorithm for the fuzzy min-max neural network is taken from [3]. Some small drawbacks of the training algorithm have been identified and removed as part of our work, and prediction is carried out with the improved neural network. In addition, some domain knowledge about the nature of protein secondary structures is used to post-process the output of the basic neural network and improve prediction accuracy.

So far, more than 25,000 proteins have been sequenced and their three-dimensional structures determined. In our work, we try to extract as much information as possible from these already sequenced proteins. To achieve this, we employ multiple instances of the neural network and train them with different sets of data.
The Protein Data Bank is used as the primary source of protein data for both training and testing our prediction system. The overall accuracy (Q3) achieved is around 70%. This is better than existing statistics-based prediction systems such as Chou-Fasman [1] and GOR I [2], and it is comparable to some of the neural-network-based systems.

Contents

List of Figures
List of Tables
1. Introduction
1.1 Motivation of the Work
1.2 Objective of the Work
1.3 Organization of the Thesis
2. Protein Structure
2.1 Introduction
2.2 Molecular Structure of Protein
2.3 3D Structures of Protein
2.4 Ramachandran Diagram
2.5 Different Secondary Structures of Protein
2.6 Levinthal’s Paradox
2.7 Summary
3. Literature Survey
3.1 Introduction
3.2 Chou-Fasman Method
3.3 GOR Method
3.4 PhD Method
3.5 PSI-Pred Method
3.6 JPred Method
3.7 Summary
4. Neural Network Architecture
4.1 Introduction
4.2 FMNN
4.3 FMCN
4.4 Improvements on FMCN
4.5 Summary
5. Secondary Structure Prediction with Improved FMCN
5.1 Introduction
5.2 Application of FMCN
5.3 Accuracy Measurement Techniques
5.4 Complexity in Using FMCN
5.5 Experimental Results
5.6 Summary
6. Multiple Instantiations of FMCN-units
6.1 Introduction
6.2 System Architecture
6.3 Experimental Results
6.4 Summary
7. Conclusion and Future Works
7.1 Conclusion
7.2 Future Work
Appendix A
Appendix B
Appendix C
Appendix D
Bibliography

List of Figures

Fig 2.1
Fig 2.2
Fig 2.3
Fig 2.4
Fig 2.5
Fig 2.6
Fig 2.7
Fig 2.8