Using Machine Learning to Determine Fold Class and Secondary Structure Content from Raman Optical Activity and Raman Vibrational Spectroscopy

A thesis submitted to the University of Manchester for the degree of MPhil in the Faculty of Life Sciences

2012

Myra Kinalwa-Nalule

Table of Contents

Table of Contents
List of Appendices
List of Figures
List of Tables
List of Abbreviation
Abstract
Declaration
Copyright Statement
Acknowledgements
1. Protein Structure
  1.1 Introduction
  1.2 Protein structure
    1.2.1 Primary, secondary and tertiary protein structure
    1.2.2 The helix structure
    1.2.3 The β-sheet structure
    1.2.4 Disordered structure
    1.2.5 Protein Motifs
  1.3 Classification of proteins and protein databases
  1.4 Determination of protein structure
  1.5 References
2. Vibrational Spectroscopy
  2.1 Introduction
  2.2 Vibrational Energies
  2.3 Raman Spectroscopy
    2.3.1 The Raman Effect
  2.4 Raman Optical Activity (ROA)
    2.3.1 Basic ROA Theory
  2.5 Spectroscopic Structural Band Assignments
  2.6 Circular Dichroism
  3. Chemometrics analysis of vibrational spectroscopy
  4. References
3. Machine Learning
  3.1 Introduction
  3.2 Support Vector Machines (SVM) Classification
    3.2.1 Soft Margin
    3.2.2 Kernel Methods
  3.3 SVM Regression
  3.4 Random Forests
    3.4.1 Introduction
    3.4.2 Gini Index
    3.4.3 Random Forest prediction
    3.4.5 Out-of-Bag (OOB) data
    3.4.5 Distance Metric
  3.5 Partial Least Squares (PLS) Regression
    3.5.1 Introduction
    3.5.2 The PLS Model
    3.5.3 PLS number of components
  3.6 Chemometric studies of protein vibrational spectroscopy
  3.7 Related Chemometrics Methods
  3.8 References
4 Data and Methods
  4.1 Datasets
  4.2 Dataset representation
  4.3 Data Processing
    4.3.1 Binning
    4.3.2 Range Selection
    4.3.3 Scaling
  4.4 Training the Models
    4.4.1 SVM Models
    4.4.2 PLS Models
    4.4.3 Random Forest Models
  4.5 References
5. Support Vector Machine (SVM) Classification and Regression analyses of Raman and ROA spectra
  SVM Analyses of Raman and ROA
  5.1 Data Pre-processing
    5.1.1 Choosing the Bin Factor
  5.2 Results and Discussion
    5.2.1 SVM Classification Analyses Results of ROA and Raman spectra
  5.3 SVM Regression Analyses Results of Raman and ROA spectra
    5.3.1 SVM Regression Analyses of Raman and ROA
    5.3.2 Discussion
  5.4 References

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    282 pages
