Investigating Alternative Informatics Approaches for Protein Identification and Quantification Via SWATH-MS
A thesis submitted to the University of Manchester for the degree of Master of Philosophy in the Faculty of Biology, Medicine and Health

2020

Paul Brack

School of Biological Sciences / Division of Evolution & Genomic Sciences

Contents

Index of figures and tables
Abstract
Copyright statement
Declaration
Chapter 1: The path towards rapid, high resolution acquisition and analysis of the human proteome for biomarker discovery
Introduction
Background: from genes to protein biomarkers
    Lessons from genomics
    Moving forward from genomics: how does proteomics present a distinct challenge?
    Clinical applications for proteomics: ovarian cancer as an example
Technologies: acquisition of the human proteome for biomarker discovery
    From gel electrophoresis to mass spectrometry
    A brief aside: smaller molecules are easier to identify
    The evolution of soft ionisation techniques for protein mass spectrometry
    Tandem mass spectrometry and beyond: piecing the puzzle together
    SWATH-MS: a new frontier in proteome acquisition?
    Analysing SWATH-MS data
Summary
Which library to use? Typical pitfalls
    Sample specific libraries
    Reference species libraries
    Predicted libraries
A typical DIA library generation protocol for OpenSWATH
Determining “real” identifications
    FDR calculation
    Quantitative fold change
    An entrapment approach using Pyrococcus furiosus as a standard
Size or relevance?
Materials and Methods
    Samples
Library modification
    Sourcing and modifying externally generated libraries
    Entrapment library creation
    Protein detection and quantification
Results
    Library composition
    Number of proteins identified
    Effect of altering library size on protein identification
    Consistency of identification
    Quantitative variation
    Entrapment results
    TraMLMunger
Discussion
    Key findings
    Recommendations and suggested further research
    Applicability of findings
    Final thoughts

Word count: 21,467

Index of figures and tables

Figure 1: Cost Per Genome (National Human Genome Research Institute, 2016b)
Figure 2: Separation of Escherichia coli proteins (O'Farrell, 1975)
Figure 4: An illustration of a SWATH window and the resulting SWATH map (Gillet et al., 2012)
Figure 5: In silico SWATH procession using OpenSWATH (Rost et al., 2014)
Figure 6: Protein ID consensus from multi-centre benchmarking study (Navarro et al., 2016)
Figure 7: A typical protocol to generate a SWATH-MS assay library (Schubert et al., 2015)
Figure 8: Decoys and targets are presented as distribution curves above. (Elias and Gygi, 2010)
Figure 9: Scatter plots comparing the amount of truncation of a spectral library and the number of proteins identified.
Figure 10: Venn diagrams of the identification overlap of three essentially randomly selected technical replicates per sample set.
Figure 11: A scatter plot indicating the relationship between the coefficient of variation in identification and coefficient of variation in quantification.
Figure 12: Replicate:replicate scatter plots for two HYE124 replicates examined using the LFQH_100, LFQH_50 and LFQH_10 libraries.
Figure 13: Replicate:replicate scatter plots for two K562 replicates examined using the PHL_100 library
Figure 14: Box plots to demonstrate quantitative variance for HYE124:LFQH_010 and CS:PHL_100.
Figure 15: Box plots to demonstrate quantitative variance for HYE124:PHL-LFQH010, HYE124:PHL_010 and K562:PHL_010.
Figure 16: An unscaled Venn diagram to indicate overlapping quantification between the PHL, PHL-LFQH and LFQH libraries across a single replicate of the HYE124 sample set.
Figure