
A quantitative proteomics investigation of cold adaptation in the marine bacterium, Sphingopyxis alaskensis

Thesis submitted in partial fulfilment of the requirements for the Degree of Doctor of Philosophy (Ph.D.)

Lily L. J. Ting

School of Biotechnology and Biomolecular Sciences
University of New South Wales
January 2010

COPYRIGHT STATEMENT

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'

Signed ……………………………………………......

21st April, 2010 Date ……………………………………………......

AUTHENTICITY STATEMENT

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’

Signed ……………………………………………......

21st April, 2010 Date ……………………………………………......

Originality statement

I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Signed…………………………………………………

Dated………………………………………………….21st April, 2010

L. Ting, UNSW. i Abstract

The marine bacterium Sphingopyxis alaskensis was isolated as one of the most numerically abundant bacteria from cold (4–10°C), nutrient-depleted waters in the North Pacific Ocean. The objective of this study was to examine cold adaptation of S. alaskensis by using proteomics to measure changes in global protein levels caused by growth at low (10°C) and high (30°C) temperatures. Stable isotope labelling-based quantitative proteomics was used, and a rigorous post-experimental data processing workflow adapted from microarray-based methods was developed. The approach included metabolic labelling with 14N/15N, and normalisation and statistical testing of quantitative proteomics data. Approximately 400,000 tandem mass spectra were generated, resulting in the confident identification of 2,135 proteins (66% genome coverage) and the quantitation of 1,172 proteins (37% genome coverage). Normalisation approaches were evaluated using cultures grown at 30°C and labelled with 14N and 15N. For 10°C vs. 30°C experiments, protein quantities were normalised within each experiment using a multivariate lowess approach. Statistical significance was assessed by combining data from all experiments and applying a moderated t-test using the empirical Bayes method with the limma package in R. Proteins were ranked after calculating the B-statistic and the Storey-Tibshirani false discovery rate. In total, 217 proteins (6% genome coverage) were determined to have significant quantitative differences. In achieving these outcomes, a range of factors that impact on quantitative proteomics data quality were broadly assessed, resulting in the development of a robust approach that is generally applicable to quantitative proteomics of biological systems. The significantly differentially abundant proteins from the proteomics data provided insight into molecular mechanisms of cold adaptation in S. alaskensis.
Important aspects of cold adaptation included cell membrane restructuring, exopolysaccharide biosynthesis, degradation, carbohydrate and , and increased capacity of transcriptional and translational processes. A number of cold adaptive responses in S. alaskensis were novel, including a specific cold-active protein folding pathway, a possible thermally-controlled stringent response, and biosynthesis of intracellular polyhydroxyalkanoate reserve material. The overall study provided important new insight into the evolution of growth strategies necessary for the effective competition of S. alaskensis in cold, oligotrophic environments.
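
The false discovery rate step named in the abstract can be illustrated with a minimal sketch. It is not the implementation used in this work, which relied on R packages; it is a simplified Python illustration of Storey-Tibshirani q-values in which the proportion of true null hypotheses (pi0) is estimated from a single tuning value rather than the smoothed estimate of the original method. The function name `storey_qvalues`, the parameter `lam` and the example p-values are assumptions made for illustration only.

```python
import numpy as np


def storey_qvalues(pvalues, lam=0.5):
    """Simplified Storey-Tibshirani q-values (hypothetical helper).

    Assumption: pi0 is estimated at one tuning value `lam` instead of
    the spline-smoothed estimate described by Storey and Tibshirani.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size

    # p-values above `lam` are assumed to come mostly from true nulls,
    # which are uniformly distributed; the estimate is capped at 1.
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))

    # Rank the p-values from smallest to largest.
    order = np.argsort(p)
    ranked = p[order]

    # Estimated FDR when rejecting the i smallest p-values.
    raw = pi0 * m * ranked / np.arange(1, m + 1)

    # Enforce monotonicity: q(i) is the minimum estimate over all j >= i.
    q_sorted = np.minimum.accumulate(raw[::-1])[::-1]
    q_sorted = np.minimum(q_sorted, 1.0)

    # Return q-values in the original input order.
    q = np.empty(m)
    q[order] = q_sorted
    return q, pi0


# Illustrative p-values only; these are not data from this study.
example_q, example_pi0 = storey_qvalues([0.001, 0.01, 0.02, 0.6, 0.9])
```

Proteins would then be reported as differentially abundant when their q-value falls below a chosen threshold. If no p-value exceeds `lam`, this simplified estimate gives pi0 = 0 and all q-values collapse to zero, which is one reason the original method smooths over a range of tuning values.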

Acknowledgements

Thank you to everyone who has helped me along the way. Special thanks go to my parents for their understanding, generosity and constant support. To Rick Cavicchioli and Mark Raftery, thank you for your guidance and supervision; this PhD would have been impossible without your support. Thanks also go to the past and present members of the RC lab; especially to Maz and Maine for listening and sharing, and to Haluk and Tim for your inputs of wisdom. Thanks to Linda from BMSF for your friendship, advice and proofreading! A big thanks to Mark Cowley for a fantastic collaboration. And, a special thank you to Prof. Bill O’Sullivan for your reading of this thesis and expert opinion. This work was supported by the Australian Postgraduate Award and the Australian Research Council. Mass spectrometric analysis for the work was performed at the Bioanalytical Mass Spectrometry Facility UNSW, and was supported in part by grants from the Australian Government Systemic Infrastructure Initiative and Major National Research Facilities Program (UNSW node of the Australian Proteome Analysis Facility) and by the UNSW Capital Grants Scheme. Finally, to Lachlan, thank you for your unending belief, patience, encouragement and confidence in me. I’m looking forwards to the future and the adventures we’ll have.

Lily Ting January 2010, UNSW

In memory of the late Prof. Michael Guilhaus

Table of contents

Originality statement
Abstract
Acknowledgements
Table of contents
List of figures
List of tables
Abbreviations
Publications

Chapter 1. General introduction
1.1. Sphingopyxis alaskensis
1.2. Living in a cold environment
1.2.1. The Arrhenius equation
1.2.2. A definition of cold shock and cold adaptation
1.2.3. Cellular responses in overcoming the challenges of the cold
1.2.3.1. Membrane integrity and transport systems
1.2.3.2. Nucleic acid replication, transcription and turnover
1.2.3.3. Translation and protein folding
1.2.3.4. Compatible solutes and other cryoprotectants
1.2.3.5. Protein flexibility
1.3. Mass spectrometry and proteomics
1.3.1. Protein and peptide separation
1.3.2. Tandem mass spectrometry and protein identification
1.3.3. Quantitative proteomics
1.3.3.1. 2DE-based quantitative proteomics
1.3.3.2. Label-free quantitative proteomics
1.3.3.3. Stable isotope labelling quantitative proteomics
1.3.4. Post-experimental bioinformatics: Data processing, normalisation and statistical testing and validation
1.3.4.1. Publicly available computational tools for quantitative proteomics
1.3.4.2. Experimental design
1.3.4.3. Normalisation: Learning from transcriptomics
1.3.4.4. Significance testing: Learning from transcriptomics
1.3.5. A definition of protein “regulation” vs. abundance
1.4. Project aims

Chapter 2. Method development for a metabolic labelling-based quantitative proteomics platform
2.1. Summary
2.2. Materials and methods
2.2.1. Cell culture and harvest
2.2.2. Metabolic labelling and determining 15N incorporation
2.2.2.1. Theoretical calculation of 15N APE
2.2.2.2. Experimental measurement of 15N APE
2.2.3. Sample preparation optimisation
2.2.3.1. Optimising sonication and evaluating Tris vs. urea buffers in protein extraction
2.2.3.2. LC-MS/MS: Evaluating RapiGest, starting protein amount and dilution of peptides
2.2.3.3. LC-MS/MS: Evaluating starting protein amounts, trypsin:protein ratio, inhibitor and chelating agent
2.2.3.4. LC-MS/MS: Evaluation of Tris vs. urea, injection volume, and LC gradient length
2.2.3.5. GeLC-MS/MS: A comparison to LC-MS/MS
2.2.3.6. GeLC-MS/MS: Tris PE vs. urea PE
2.2.4. An in silico analysis of amino acid composition and trypsin digestion in S. alaskensis
2.2.5. Mass spectrometry optimisation: LTQ-FT Ultra
2.2.5.1. Optimisation of LC and MS parameters
2.2.5.2. MS analysis
2.2.6. Data processing optimisation
2.2.6.1. 1% FDR for protein identification: Optimisation of DTA Select parameters
2.2.6.2. Optimisation of RelEx quantitation parameters
2.3. Results
2.3.1. 15N APE
2.3.1.1. Theoretical incorporation
2.3.1.2. Experimental incorporation
2.3.2. Sample preparation optimisation
2.3.2.1. Optimising sonication and evaluating Tris vs. urea buffers in protein extraction
2.3.2.2. LC-MS/MS: Evaluating RapiGest, starting protein amount and dilution
2.3.2.3. LC-MS/MS: Evaluating starting protein amounts, trypsin:protein ratio, protease inhibitor and chelating agent
2.3.2.4. LC-MS/MS: Evaluation of Tris vs. urea, injection volume, and LC gradient length
2.3.2.5. GeLC-MS/MS: A comparison to LC-MS/MS
2.3.2.6. GeLC-MS/MS: Tris PE vs. urea PE
2.3.3. An in silico analysis of amino acid composition and trypsin digestion in S. alaskensis
2.3.4. Mass spectrometry optimisation: LTQ-FT Ultra
2.3.4.1. LTQ-FT Ultra: Tryptic digest dilution in formic acid
2.3.4.2. LTQ-FT Ultra: Sample injection volume
2.3.4.3. LTQ-FT Ultra: LC gradient length
2.3.4.4. LTQ-FT Ultra: Precursor scanning
2.3.4.5. LTQ-FT Ultra: Type and number of MSn scans
2.3.5. Data processing optimisation
2.3.5.1. 1% FDR for protein identification: Optimisation of DTA Select parameters
2.3.5.2. Optimisation of RelEx quantitation parameters
2.4. Discussion
2.4.1. Metabolic labelling and 15N APE measurement
2.4.2. Optimisation of protein extraction
2.4.3. Assessment of LC-MS/MS for application in quantitative proteomics of S. alaskensis
2.4.3.1. LC-MS/MS: Evaluating RapiGest, starting protein amount and dilution
2.4.3.2. LC-MS/MS: Evaluating starting protein amounts, trypsin:protein ratio, protease inhibitor and chelating agent
2.4.3.3. LC-MS/MS: Evaluation of Tris vs. urea, injection volume, and LC gradient length
2.4.3.4. An in silico investigation of poor LC-MS/MS of S. alaskensis proteins
2.4.3.5. Outcome of optimising LC-MS/MS for application in a quantitative proteomics analysis of S. alaskensis
2.4.3.6. GeLC-MS/MS: A comparison to LC-MS/MS
2.4.3.7. GeLC-MS/MS: Tris PE vs. urea PE
2.4.3.8. Outcome of optimising GeLC-MS/MS for application in a quantitative proteomics analysis of S. alaskensis
2.4.4. Mass spectrometry optimisation
2.4.4.1. Comparison of parameters
2.4.4.2. LTQ-FT Ultra: LC parameter optimisation
2.4.4.3. LTQ-FT Ultra: MS method optimisation
2.4.4.4. Optimisation of RelEx quantitation parameters
2.4.5. Final protocol for quantitative proteomics analysis of S. alaskensis proteins using a GeLC-MS/MS platform
2.4.5.1. Cell culture and metabolic labelling
2.4.5.2. Cell disruption
2.4.5.3. GeLC-MS/MS
2.4.5.4. Protein identification and quantitation
2.4.6. Conclusion

Chapter 3. Normalisation and statistical analysis of quantitative proteomics data generated by metabolic labelling
3.1. Summary
3.2. Materials and methods
3.2.1. Microbial growth and physiological characterisation
3.2.2. Metabolic labelling
3.2.3. Quantitative proteomics using GeLC-MS/MS
3.2.4. Data processing
3.2.5. Normalisation
3.2.6. Linear modelling
3.2.7. Statistical analysis of differential abundance
3.3. Results
3.3.1. Normalisation of an artificial dataset
3.3.2. A naturally skewed biological dataset
3.3.3. Evaluating normalisation for effectively removing skew while preserving differential protein abundance
3.3.4. Linear modelling as a framework for statistical analysis
3.3.5. Evaluating the utility of linear modelling using an unmoderated Student’s t-test
3.3.6. The empirical Bayes moderated t-test
3.3.7. Comparison of the moderated t-test to an unmoderated t-test
3.3.8. Comparison of correction approaches for multiple hypothesis testing
3.3.9. Fold change approach
3.4. Discussion
3.4.1. A naturally skewed biological dataset
3.4.2. Normalisation
3.4.3. Linear modelling as a framework for statistical analysis
3.4.4. Significance testing
3.4.4.1. Fold change
3.4.4.2. Comparing the empirical Bayes moderated t-test to an unmoderated t-test
3.4.4.3. False discovery rates and correcting for multiple testing
3.4.5. Statistical vs. biological significance
3.5. Conclusion

Chapter 4. Unravelling the molecular mechanisms of cold adaptation in Sphingopyxis alaskensis
4.1. Summary
4.2. Materials and Methods
4.2.1. Manual functional annotation
4.3. Results
4.3.1. Protein identification and quantitation
4.4. Discussion
4.4.1. Lipid transport and metabolism and the cell membrane
4.4.1.1. , possible energy generation and storage materials
4.4.1.2. Lipid metabolism and cold adaptation of the cell membrane
4.4.2. Carbohydrate metabolism and energy generation
4.4.2.1. Glycolysis using the Entner-Doudoroff pathway
4.4.2.2. Energy generation via the TCA cycle and the electron transport chain
4.4.3. Synthesis of storage materials
4.4.3.1. Carbohydrate storage: Synthesis of EPS
4.4.3.2. PHA: a carbon and energy source
4.4.3.3. PHA and cold adaptation in S. alaskensis
4.4.4. Amino acid metabolism
4.4.4.1. Nitrogen assimilation and the glutamate and pathways
4.4.4.2. Amino acid metabolism at 10°C
4.4.4.3. Amino acid metabolism at 30°C
4.4.4.4. An incomplete catabolic pathway and a connection to β-oxidation and energy generation at 10°C
4.4.4.5. A lack of variation in biosynthesis
4.4.4.6. Secreted amino acid scavenging
4.4.5. Transcription in the cold
4.4.6. Translation, ribosomal structure and biogenesis
4.4.6.1. Increased RNA degradosome enzymes in the cold
4.4.6.2. Compensating for reduced translation efficiency in the cold
4.4.7. Protein folding in the cold
4.4.7.1. Peptidyl-prolyl cis-trans
4.4.7.2. A cold-induced and constitutive GroESL protein folding system
4.4.7.3. Cold-induced and heat-induced DnaK-DnaJ-GrpE protein folding
4.4.7.4. Clp ATPases
4.4.8. Inorganic ion and coenzyme transport
4.4.8.1. Phosphate import
4.4.8.2. Cobalamin uptake
4.4.8.3. Copper resistance
4.4.8.4. Iron import and homeostasis
4.4.8.5. Is there TonB-dependent specificity beyond iron and cobalamin?
4.4.9. Effect of cold growth temperature on the cell envelope
4.4.10. Detoxification: Posttranslational modifications and defense
4.4.10.1. Detoxification by posttranslational events
4.4.10.2. Detoxification as a defense mechanism
4.4.11. biosynthesis and a possible thermally-controlled stringent response
4.4.12. Thermally important proteins with general function prediction only
4.4.13. Unchanged proteins and the Sec-dependent secretion pathway
4.4.14. The fast and the furious: Growth at 30°C is fast and stressful
4.4.14.1. The fast
4.4.14.2. The furious: Stress indicators at high temperature growth
4.4.15. Conclusion

Chapter 5. General Discussion
5.1. Summary
5.2. Method development for a quantitative proteomics analysis of S. alaskensis
5.2.1. Developing a GeLC-MS/MS analytical platform for S. alaskensis
5.2.2. Improving data processing and analysis of quantitative proteomics data: Normalisation and statistics
5.2.3. Maintaining high data quality
5.3. A continuing story of S. alaskensis
5.3.1. S. alaskensis and unravelling the mechanisms of cold adaptation
5.3.1.1. Expected thermal adaptive responses
5.3.1.2. Novel cold adaptive responses
5.4. Significance of this research
5.4.1. Contribution to biotechnology and biomedical applications
5.4.2. Ecological significance
5.4.3. Searching for extra-terrestrial life: An insight into astrobiology
5.5. Conclusion

6. References

Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Appendix G
Appendix H
Appendix I
Appendix J

List of figures

Figure 1.1. Quantitative proteomics approaches
Figure 1.2. In vivo and in vitro stable isotope labelling approaches
Figure 2.1. Workflow for LC-MS/MS: Evaluation of RapiGest, starting protein amount and sample dilution
Figure 2.2. Workflow for LC-MS/MS: Evaluation of starting amount of protein for analysis, trypsin:protein ratio and a protein inhibitor with chelating agent
Figure 2.3. The 14N dilution effect of exponentially doubling growth
Figure 2.4. Experimental and theoretical isotopic profiles of Sala_1830 VLAENVAGNAAVDFANIDKAPEER from experiment 1
Figure 2.5. Experimental and theoretical isotopic profiles of Sala_1830 VLAENVAGNAAVDFANIDKAPEER from experiment 2
Figure 2.6. Experimental and theoretical isotopic profiles of Sala_1830 VGEEVEIVGIKDTK from experiment 3
Figure 2.7. Experimental and theoretical isotopic profiles of Sala_1830 VGEEVEIVGIKDTK from experiment 1
Figure 2.8. Experimental and theoretical isotopic profiles of Sala_1830 TGETMTIAASNQPK from experiment 2
Figure 2.9. Experimental and theoretical isotopic profiles of Sala_1830 TGETMTIAASNQPK from experiment 1
Figure 2.10. Experimental and theoretical isotopic profiles of Sala_1830 VTIDKDNTTIVDGAGDAEAIK from experiment 2
Figure 2.11. Experimental and theoretical isotopic profiles of Sala_1830 VTIDKDNTTIVDGAGDAEAIK from experiment 1
Figure 2.12. Number of proteins identified in a urea vs. Tris buffer using an LC-MS/MS platform
Figure 2.13. SDS-PAGE of ~35 µg of protein from samples A and B
Figure 2.14. SDS-PAGE of 1 mg of protein from Tris PE vs. urea PE extraction samples
Figure 2.15. Number of proteins identified in a Tris PE vs. urea PE buffer using a GeLC-MS/MS platform
Figure 2.16. Number of peptides with increasing number of residues resulting from an in silico trypsin digestion
Figure 2.17. Gel slice excision for GeLC-MS/MS parameter optimisation using an LTQ-FT Ultra
Figure 2.18. Number of confident protein identifications in optimising tryptic digest dilution in formic acid for LTQ-FT analysis
Figure 2.19. Number of confident protein identifications in optimising sample injection volume for LTQ-FT analysis
Figure 2.20. Number of confident peptide and protein identifications in optimising LC gradient length for LTQ-FT analysis
Figure 2.21. Number of Mascot queries in optimising LC gradient length for LTQ-FT analysis
Figure 2.22. Number of Mascot queries generated in ion trap vs. FTICR survey scans in LTQ-FT optimisation
Figure 2.23. Number of protein identifications in optimising precursor scan parameters for LTQ-FT analysis
Figure 2.24. Number of confidently identified peptides in optimising number and types of MSn scans in LTQ-FT analysis
Figure 2.25. Number of Mascot queries generated by varying number and types of MSn scans in LTQ-FT optimisation
Figure 2.26. SDS-PAGE of 14N 10°C combined with 15N 30°C proteins for DTA Select and RelEx parameter optimisation
Figure 3.1. Inverse metabolic labelling workflow to produce an artificially skewed data set
Figure 3.2. Inverse metabolic labelling workflow of experiments A-D
Figure 3.3. Inverse metabolic labelling workflow of experiments E-F for 10°C vs. 30°C experiments
Figure 3.4. The frequency of detection of new proteins
Figure 3.5. Normalised and unnormalised density distribution of artificially skewed data
Figure 3.6. Q-Q plots of the artificially skewed dataset
Figure 3.7. MA plots of peptides and proteins pre- and post-normalisation
Figure 3.8. Box and whisker plots of pre- and post-intra-experiment normalisation of artificially skewed data
Figure 3.9. Scanning electron microscope images of S. alaskensis
Figure 3.10. Histograms of raw unadjusted p-values arising from the testing of each protein for differential abundance in 10°C vs. 30°C experiments
Figure 3.11. Number of differentially abundant proteins passing q-value thresholds
Figure 3.12. Assessing statistical vs. biological relevance
Figure 4.1. Significantly differentially abundant proteins sorted by COG categories
Figure 4.2. Cellular processes important for cold adaptation in S. alaskensis
Figure 4.3. Lipid metabolism energy generation and PHA storage in S. alaskensis
Figure 4.4. Glycolysis in S. alaskensis
Figure 4.5. Carbon and nitrogen metabolism in S. alaskensis
Figure 4.6. Shared enzymes in branched chain amino acid degradation and β-oxidation
Figure 4.7. S. alaskensis transcriptional machinery and three transcriptional factors are increased at 10°C
Figure 4.8. Alignment of two S. alaskensis GroEL proteins
Figure 4.9. Alignment of the two S. alaskensis GroES proteins
Figure 4.10. A complete cold-active protein folding cycle in S. alaskensis

List of tables

Table 2.1. Combination of Tris vs. urea, injection volume and LC gradient parameters for LC-MS/MS analysis
Table 2.2. Experimental 15N APE
Table 2.3. Evaluating sonication times with a Tris or urea buffer
Table 2.4. Evaluation of LC-MS/MS with and without RapiGest
Table 2.5. Optimising starting protein amounts and trypsin for LC-MS/MS
Table 2.6. LC-MS/MS analysis of proteins extracted by urea vs. Tris buffers
Table 2.7. Number of confident protein identifications from GeLC-MS/MS of A and B
Table 2.8. Tris PE vs. urea PE GeLC-MS/MS results
Table 2.9. Amino acid composition of S. alaskensis, E. coli, S. cerevisiae, P. angustum and NCBI
Table 2.10. LTQ parameter optimisation
Table 2.11. Evaluation of FDR using different XCorr cut-off thresholds in DTA Select
Table 2.12. Optimisation of RelEx parameters
Table 3.1. Linear modelling design matrix
Table 3.2. Protein quantitation for artificially skewed data
Table 3.3. Comparing post-normalisation protein quantities for experiments that were combined by optical density (A-B) or protein (E-F)
Table 3.4. Comparing methods of significance testing with the 10°C vs. 30°C dataset
Table 3.5. Fold Change approach outcomes
Table 4.1. Evidence Rating system used for functional annotation of S. alaskensis proteins
Table 4.2. MS results and number of proteins from identification, quantitation and statistical testing
Table 4.3. Proteins with significant differential abundance at 10°C vs. 30°C sorted by COG categories
Table 4.4. Proteins identified in all 20 MS experiments sorted into COG categories
Table 4.5. Proteins associated with polyhydroxyalkanoate storage material in S. alaskensis
Table 4.6. Sequence similarities of the three S. alaskensis GroEL proteins
Table 4.7. Sequence similarity of the two S. alaskensis GroES proteins
Table 4.8. The Clp ATPases in S. alaskensis
Table 4.9. Predicted carbohydrate uptake TonB-dependent receptors in S. alaskensis compared to Xanthomonas campestris spp. campestris

Abbreviations

°C    Degrees Celsius
1D    One dimensional
2D    Two dimensional
2DE    Two dimensional electrophoresis
2D-DIGE    Two dimensional difference gel electrophoresis
3D    Three dimensional
ADH    Alanine dehydrogenase
CH3CN    Acetonitrile
CID    Collision induced dissociation
Da    Dalton
df    Degrees of freedom
DTT    1,4-Dithiothreitol
EC    Enzyme commission number
ED    Entner-Doudoroff glycolytic pathway
EDTA    Ethylenediaminetetraacetic acid
EPS    Exopolysaccharide
ESI    Electrospray ionisation
FC    Fold change
FDR    False discovery rate
FTICR    Fourier transform ion cyclotron resonance
FTMS    Fourier transform ion cyclotron resonance mass spectrometer
g    Gram
GDH    Glutamate dehydrogenase
GeLC-MS    1D-SDS-PAGE coupled with RPLC coupled to mass spectrometry
GeLC-MS/MS    1D-SDS-PAGE coupled with RPLC coupled to two dimensions of mass spectrometry
GOGAT    Glutamate synthetase
GS    Glutamine synthetase
HCOOH    Formic acid
HFBA    Heptafluorobutyric acid
HPLC    High performance liquid chromatography
HUPO    Human proteome organisation
ICAT    Isotope-coded affinity tags
IDA    Iodoacetamide
IF1    Translation initiation factor 1
IF2    Translation initiation factor 2
IMG    Integrated microbial genomes
IT    Ion trap
iTRAQ    Isobaric tag for relative and absolute quantitation
K    Lysine
kDa    Kilodalton
LC    Liquid chromatography
LC/LC    Tandem liquid chromatography
LC/LC-MS/MS    Tandem liquid chromatography tandem mass spectrometry
LC-MS    Liquid chromatography mass spectrometry
LC-MS/MS    Liquid chromatography tandem mass spectrometry
µg    Microgram
µL    Microlitre
µM    Micromole
mg    Milligram
mL    Millilitre
mM    Millimole
MALDI    Matrix-assisted laser desorption ionization
MS    Mass spectrometry
MS/MS    Tandem mass spectrometry
MS1    First dimension of mass spectrometry
MS2    Second dimension of mass spectrometry
MSn    nth dimension of mass spectrometry
m/z    Mass to charge ratio
OB    Oligomer binding
OD    Optical density
OD433    Optical density at λ = 433 nm
PHA    Polyhydroxyalkanoate
pI    Isoelectric point
Pi    Inorganic phosphate
PMSF    Phenylmethanesulphonyl fluoride
PRPP    Phosphoribosyl pyrophosphate
pY    Ribosome associated stress response Protein Y
QTOF    Quadrupole time of flight
R    Arginine
RNAP    DNA-directed RNA polymerase
RP    Reverse phase
RPLC    Reverse phase liquid chromatography
SAM    Significance analysis of microarrays
SCX    Strong cation exchange
SDS    Sodium dodecyl sulfate
SDS-PAGE    Sodium dodecyl sulfate polyacrylamide gel electrophoresis
SILAC    Stable isotope labelling with amino acids in cell culture
SN    Signal to noise ratio
SRM    Single-reaction monitoring
TCA    Tricarboxylic acid cycle or the cycle
TOF    Time of flight
Topt    Temperature at which maximal growth rate occurs
Tris PE    10 mM Tris-HCl, 1 mM PMSF, 1 mM EDTA
TPP    Trans-proteomic pipeline
Urea PE    8 M urea, 1 mM PMSF, 1 mM EDTA
v/v    Volume per volume
w/v    Weight per volume
XIC    Extracted ion chromatogram

Publications

Papers

Integration of genomics and proteomics into marine microbial ecology. Thomas, T., Egan, S., Burg, D., Ng, C., Ting, L., and Cavicchioli, R. 2006. Marine Ecology Progress Series, 333:247-248. (Chapter 1)

Normalization and statistical validation of quantitative proteomics data generated by 14N/15N metabolic labelling of Sphingopyxis alaskensis. Ting, L., Cowley, M.J., Guilhaus, M., Raftery, M.J., Cavicchioli, R. Molecular and Cellular Proteomics, 8:2227-2242. (Chapter 3)

Unravelling the molecular mechanisms of cold adaptation in Sphingopyxis alaskensis. Ting, L., Raftery, M., Guilhaus, M., Cavicchioli, R. Environmental Microbiology, Accepted. EMI-2009-1238.R1. (Chapter 4)

Carbon and nitrogen substrate utilization in Sphingopyxis alaskensis. Williams, T.J., Ertan, H., Ting, L., Cavicchioli, R. International Society for Microbial Ecology Journal, 3:1036-1052. (Chapter 4)

The genomic basis of trophic strategy in marine bacteria. Lauro, F.M., McDougald, D., Thomas, T., Egan, S., Rice, S., DeMaere, M.Z., Ting, L., Williams, T., Ertan, H., Johnson, J., Gerriera, S., Lapidus, A., Anderson, I., Kyrpides, N., Munk, A.C., Detter, C., Han, C.S., Brown, M.V., Robb, F.T., Kjelleberg, S., Cavicchioli, R. PNAS, 106:15527-15533.

The genome sequence of the psychrophilic archaeon, Methanococcoides burtonii: the role of genome evolution in cold adaptation. Allen, M.A., Thomas, T., Burg, D., Williams, T., Siddiqui, K.S., De Francisci, Chong, K.W.Y., Pilak, O., Chew, H.H., De Maere, M.Z., Lauro, F.M., Ting, L., Katrib, M., Ng, C., Sowers, K.R., Anderson, I.J., Ivanova, N., Dalin, E., Martinez, M., Lapidus, A., Hauser, L., Land, M., Cavicchioli, R. 2009. International Society for Microbial Ecology Journal, 3:1012-1035.

Global proteomics strategies for extremophiles. Ting, L., Burg, D., Ng, C., Cavicchioli, R. Manuscript in preparation.

Conference abstracts

56th American Society for Mass Spectrometry Conference, Denver CO, USA, June 2008. Poster presentation: A quantitative proteomics investigation of cold adaptation in the marine bacterium, Sphingopyxis alaskensis.

3rd International Polar and Alpine Microbiology Conference, Banff ALB, Canada, May 2008. Oral presentation: A quantitative proteomics investigation of the cold adaptation of the oligotrophic marine ultramicrobacterium, Sphingopyxis alaskensis.

13th Annual Lorne Proteomics Symposium, Lorne VIC, Australia, 2008. Poster presentation: A quantitative proteomics investigation of the cold adaptation of the marine bacterium, Sphingopyxis alaskensis.

Thompson Prize oral competition finalist, Macquarie University, NSW, Australia, 2007. Oral presentation: A quantitative proteomics investigation of the cold adaptation of the marine bacterium, Sphingopyxis alaskensis.

Australian National Science Graduate Conference, Manly NSW, Australia 2006. Oral presentation: A proteomic investigation of the cold adaptation of the marine ultramicrobacterium, Sphingopyxis alaskensis.


Chapter 1. General introduction

A major biosphere of the Earth is the cold marine environment, where ~90% of the oceans have a temperature of 5°C or less (Russell, 1990). In addition, the ocean is colonised by 60-90% of all of Earth’s microorganisms (Whitman et al., 1998), making it a significant source of cold adapted microorganisms. Investigating the mechanisms of cold adaptation is valuable for expanding the understanding of different types of microbial physiology, which has broader implications in a variety of other areas such as biotechnology, ecology and astrobiology.

1.1. Sphingopyxis alaskensis

Sphingopyxis alaskensis is a cold adapted ultramicrobacterium (UMB) isolated from Resurrection Bay in Alaska, the North Sea in Europe, and the Japanese North Pacific (Schut, 1994; Eguchi et al., 1996; Schut et al., 1997; Eguchi et al., 2001). The in situ ocean temperature was determined to range from 4-10°C for the Alaskan strain (RB2256) (Schut et al., 1993). S. alaskensis RB2256 was isolated by extinction dilution from Resurrection Bay, where seawater samples were diluted in sterilised natural seawater without any nutritional supplementation, until only a few organisms remained in each dilution tube (Schut et al., 1993). These cells could not be cultured on complex media until they had been stored for 6-12 months at 5°C (Schut et al., 1993). Prolonged storage or incubation of the cells at low temperature initiated an unknown mechanism that switched the cells from a viable but unculturable state to a culturable state (Schut et al., 1993; Cavicchioli et al., 2003). The genome of S. alaskensis was sequenced in 2004 by the Department of Energy Joint Genome Institute. S. alaskensis was identified as a numerically abundant species of bacteria, with cells proliferating in situ to ~6 × 10⁵ cells/mL (Schut et al., 1993). It possesses a range of noteworthy features responsible for the observed numerical dominance (Schut et al., 1993; Eguchi et al., 2001). These features include its adaptation to heterotrophic growth under nutrient depleted (oligotrophic) conditions (Schut et al., 1997; Lauro et al., 2009; Williams et al., 2009), and its ultramicro-size (< 0.1 µm³) and constant cell size regardless of growth phase (Schut et al., 1993; Eguchi et al., 1996; Eguchi et al., 2001). There is only a single rRNA operon in the genome, resulting in a relatively low ribosome content in the cell, although when cell volume was taken into account, ribosomal concentration
was 2-fold that of ; and protein synthesis is, at times, uncoupled from ribosome synthesis (Fegatella et al., 1998). The uptake of substrates is achieved by using a high-affinity, low-specificity uptake system, which allows for efficient nutrient scavenging (Schut et al., 1995). The starvation response does not induce cross-protection against hydrogen peroxide, ethanol, heat, or UV-B (Eguchi et al., 1996; Joux et al., 1999; Ostrowski et al., 2001); S. alaskensis is inherently resistant to these stresses. The cold adaptation physiology of S. alaskensis, however, was not a focus of previous studies, so the cold adaptive strategies of this organism have not been characterised.

1.2. Living in a cold environment

The ability of bacteria, eukarya or archaea to thrive at low temperatures requires many adaptations to maintain metabolic rates (Feller & Gerday, 2003). Low temperature growth presents many challenges, including decreased enzyme activity (as described by the Arrhenius equation), the increased viscosity of liquid water, reduced fluidity of lipid membranes and changes in protein conformation.

1.2.1. The Arrhenius equation

The temperature dependence of reaction rates is described by the Arrhenius equation (reviewed in Siddiqui & Cavicchioli, 2006), which dictates that any decrease in temperature results in an exponential decrease in reaction rate. Generally, for every 10°C reduction in temperature, the reaction rates of most biological systems fall 2-3 fold (Georlette et al., 2004; Margesin et al., 2007). Cold adapted organisms have therefore developed a range of adaptive strategies to overcome the strong inhibition of reaction rates by low temperature (Section 1.2.3).
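As an illustrative sketch only, the Arrhenius relationship, k = A exp(-Ea/RT), can be evaluated numerically to show how a 10°C change produces a fold change of this magnitude. The activation energy of 50 kJ/mol and the function names below are assumptions chosen for illustration; they are not values or methods taken from the studies cited above.

```python
import math

R = 8.314  # universal gas constant, J mol^-1 K^-1


def arrhenius_rate(pre_exponential, activation_energy, temperature_k):
    """Rate constant k = A * exp(-Ea / (R * T)), with Ea in J/mol and T in kelvin."""
    return pre_exponential * math.exp(-activation_energy / (R * temperature_k))


def fold_change(activation_energy, t_cold_k, t_warm_k):
    """Ratio k(warm) / k(cold); the pre-exponential factor cancels out."""
    k_warm = arrhenius_rate(1.0, activation_energy, t_warm_k)
    k_cold = arrhenius_rate(1.0, activation_energy, t_cold_k)
    return k_warm / k_cold


# Assumed activation energy (50 kJ/mol), comparing 10 degrees C with 20 degrees C.
ratio = fold_change(50_000.0, 283.15, 293.15)
print(f"Rate at 20C relative to 10C: {ratio:.2f}-fold")
```

With this assumed activation energy the ratio is a little above 2, within the 2-3 fold range quoted above; larger activation energies give larger ratios.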

1.2.2. A definition of cold shock and cold adaptation

It is important to define the difference between cold shock and cold adaptation. Cold shock is a form of cold adaptation; however, cold shock involves a sudden and transient decrease in temperature, while cold adaptation, as used here, refers to growth in permanently cold conditions. Different cellular responses are launched in response to cold shock or cold adaptation, and these responses have different roles and outcomes in maintaining the cellular integrity needed to survive and grow. The growth strategy and in situ environmental context of an organism are also important in its response to the cold. For
example, stenopsychrophiles (formerly true psychrophiles) have a narrow tolerance to temperature fluctuations, and can only grow in cold conditions. Eurypsychrophiles (formerly psychrotrophs or psychrotolerant) prefer growth at cold temperatures, but can tolerate a wide range of temperatures that extend into the mesophilic range (Cavicchioli, 2006). In addition, it is important to take into account the thermal growth strategy of an organism (i.e. psychrophile, mesophile, or thermophile) in order to gain valuable insight into its cold adaptation strategy. Is it possible that the response of a mesophile exposed to cold shock is similar to that of a thermophile or psychrophile? Do all organisms show defining characteristics of cold adaptation regardless of their thermal growth strategy? These questions can only be answered by purposeful investigations into the cold shock and cold adaptation of a variety of organisms adapted to different temperature ranges. Many studies have explored the cold shock and adaptation of cold adapted bacteria (Potier et al., 1990; Araki, 1991; Roberts & Inniss, 1992; Whyte & Inniss, 1992; Berger et al., 1996; Michel et al., 1997; Hebraud & Potier, 1999; Medigue et al., 2005; Methe et al., 2005; Kawamoto et al., 2007), thermophilic bacteria (Wouters et al., 1999), mesophilic bacteria (Jones et al., 1987; Graumann & Marahiel, 1999; Horton et al., 2000; Beckering et al., 2002; Gualerzi et al., 2003; Phadtare & Inouye, 2004), cold adapted archaea (Cavicchioli et al., 2000; Goodchild et al., 2004; Goodchild et al., 2005; Giaquinto et al., 2007), thermophilic archaea (Boonyaratanakornkit et al., 2005; Weinberg et al., 2005), cold adapted eukarya (Thatje et al., 2005), and mesophilic eukarya (Drobnis et al., 1993; Yu et al., 2002; Grabelnych et al., 2004; Weber & Bosworth, 2005).
Although there are inherent differences between cold shock and cold adaptation, there are many dominating characteristics that can be attributed to growth in the cold (Section 1.2.3).

1.2.3. Cellular responses in overcoming the challenges of the cold

Some well-documented physiological responses to cold temperatures include changes to membrane integrity and transport systems, nucleic acid replication, transcription and transcript turnover, translation and protein folding, compatible solutes and cryoprotectants, and protein flexibility. Although many physiological adaptations to the cold have been documented, much remains unknown.

1.2.3.1. Membrane integrity and transport systems

The cell membrane structure of a microorganism must be able to function effectively for it to persist in its environment. The cell membrane is composed of a lipid bilayer with associated proteins that are important in active and passive transport (Russell, 1990; Russell, 1997). Membranes have two distinct states: a liquid crystalline phase that allows for proper membrane function, and a gel phase in which the lipid bilayer becomes rigid and function is impaired as a result of a decrease in temperature (Russell, 1997; Russell & Nichols, 1999). Generally, a reduction in temperature is accompanied by structural or compositional changes in the membrane, including an increase in desaturation, a decrease in average chain length and an increase in methyl branching (Russell, 1997; Russell & Nichols, 1999; Weber et al., 2001; Nichols et al., 2004). These changes act to preserve membrane integrity, maintaining an optimal degree of fluidity by reducing hydrophobic interactions between lipid chains (Russell, 1997). Associated with the membrane are exopolysaccharides (EPS), high molecular weight carbohydrates commonly produced by marine bacteria and other organisms (Decho, 2000). In marine bacteria, EPS can either be closely associated with the cell (capsular) or in soluble form in the environment (slime) (Mancuso Nichols et al., 2005a). EPS have been implicated in a range of functions such as adhesion, biofilm formation, and the trapping of extracellular enzymes or substrates. Most significantly, EPS appear to have a role in cryoprotection, having been found in brine channels of sea ice (Decho, 2000; Krembs et al., 2002), and identified in many cold adapted marine bacteria such as Pseudoalteromonas antarctica (Nevot et al., 2006), Photobacterium profundum (Lauro et al., 2008), Pseudoalteromonas haloplanktis (Corsaro et al., 2004), and a range of Antarctic marine Gammaproteobacteria (Mancuso Nichols et al., 2005b).
Transport systems for substrate uptake are affected by membrane integrity. Substrate uptake decreases with decreasing temperature and is temperature-limited due to the gradual loss of membrane function, regardless of variations in membrane structure that adapt it to different temperature ranges (Nedwell, 1999). The decrease in affinity for specific substrates is gradual, rather than a sudden decrease near the minimum temperature at which growth occurs (Nedwell, 1999). A common strategy to overcome lowered affinity is to increase the number and types of transporters in the cell membrane. This strategy has been documented in cold adapted microorganisms
through the proteomic identification of a number of cell membrane transporters during low temperature growth in P. antarctica (Nevot et al., 2006) and Methanococcoides burtonii (Goodchild et al., 2004); the detection of a differential low temperature increase in a range of transporters using transposon mutagenesis in P. profundum (Lauro et al., 2008); and the detection of a differential increase of broad specificity transporters at low temperature in Shewanella livingstonensis, using quantitative proteomics (Kawamoto et al., 2007).

1.2.3.2. Nucleic acid replication, transcription and turnover

At low temperature, DNA becomes more negatively supercoiled, which retards unwinding and RNA polymerase access (Mizushima et al., 1997). The cold induction of DNA-modulating proteins may assist in maintaining correct or functional DNA topology. Gyrase A, an ATP-dependent nucleoid-associated protein that relaxes DNA supercoils, has been extensively studied in E. coli (Jones et al., 1992), Bacillus subtilis (Grau et al., 1994; Graumann & Marahiel, 1999), and a hyperthermophilic archaeon, Sulfolobus (Lopez-Garcia & Forterre, 1997; Lopez-Garcia & Forterre, 1999). Other DNA-modulating proteins such as HU- (Giangrossi et al., 2002) and H-NS (Dersch et al., 1994) have been demonstrated to have a role in the maintenance of DNA topology in cold adaptation. The cold shock domain (CSD) superfamily contains proteins that interact with nucleic acids. A protein in the CSD superfamily is ~70 amino acids long, and possesses the nucleic acid binding motifs RNP1 and RNP2. This superfamily is broken down into 5 subgroups: the bacterial cold shock protein (CSP) family, eukaryotic Y-box proteins, the plant -rich , LIN-28 proteins from the Caenorhabditis nematode family, and the mammalian Unr protein. An example of a CSD superfamily protein involved in the turnover, stabilisation and degradation of mRNA at cold temperatures is the DEAD-box RNA helicase. In E. coli, the CsdA DEAD-box RNA helicase, which unwinds double-stranded RNA, is induced upon cold shock (Jones et al., 1996). Once cells have acclimatised to low temperature, CsdA may also be involved in the selective degradation of cold shock transcripts (Zangrossi et al., 2000; Yamanaka & Inouye, 2001) as part of its role in the RNA degradosome (Py et al., 1996; Prud'homme-Genereux et al., 2004). CsdA has also been shown to have a role in 50S ribosomal assembly in E. coli (Charollais et al., 2003). The two B. subtilis DEAD-box RNA helicases, CshA and CshB, were found
to co-localise with cold shock protein B and to be induced upon cold shock. Similarly, the respective DEAD-box RNA helicases in the cyanobacterium Anabaena (Chamot et al., 1999; Chamot & Owttrim, 2000), and Saccharomyces cerevisiae (Schade et al., 2004) were found to be induced upon cold shock. In the cold adapted M. burtonii, the mRNA of a putative DEAD-box helicase was only expressed during cold growth (Lim et al., 2000). Transcription is adversely affected by cold temperatures, and many transcription factors have been found to increase in abundance in response to low temperature. They include the NusA factor, which has a role in termination and anti-termination of transcripts, in the mesophilic E. coli (Jones et al., 1987) and B. subtilis (Yakhnin & Babitzke, 2002), and the cold adapted Shewanella oneidensis (Gao et al., 2006). In plants, a C-repeat/dehydration-responsive element binding transcription factor is induced by cold stress in Arabidopsis thaliana (Qin et al., 2004), canola, wheat, rye and tomato (Jaglo et al., 2001) and rice (Dubouzet et al., 2003). This transcription factor not only controls the expression of stress-inducible genes, but also regulates gene expression and signal transduction (Xiong et al., 2002; Shinozaki et al., 2003).

1.2.3.3. Translation and protein folding

The rate of translation is determined by the number of working ribosomes, the rate at which they are working and the rate of protein degradation (Farewell & Neidhardt, 1998). In general, translation is highly susceptible to the adverse effects of low temperature, and ribosome assembly is key to rescuing stalled or slowed translation (Guthrie et al., 1969). A number of ribosome-associated proteins are induced by cold temperature, including the cold shock protein A (CspA) family of proteins, or CSPs, which can reach up to 10⁶ molecules per cell (Thieringer et al., 1998). The first CSP identified was the E. coli CspA, which is induced by cold shock and acts as an RNA chaperone to block the formation of unwanted secondary structures caused by low temperature, allowing translation to occur without obstruction (Jones et al., 1987). Subsequently, CSPs have been identified in more than 50 species of bacteria (Graumann & Marahiel, 1998), including B. subtilis, which has three CSPs that are cold inducible (Budde et al., 2006; Hunger et al., 2006). CspA homologues are also induced upon cold shock in S. oneidensis (Gao et al., 2006), Salmonella typhimurium (Horton et al., 2000), Listeria monocytogenes (Bayles et al., 1996), and Streptococcus thermophilus (Wouters et al.,
1999). CSP homologues have also been identified in eukarya including yeast (Julseth & Inniss, 1990), slime moulds (Maniak & Nellen, 1988), plants (Nakaminami et al., 2006) and animals (Tiku et al., 1996). Most significantly, using quantitative proteomics, CspA homologues or CSPs were found to be differentially abundant during consistent low temperature growth in cold adapted organisms such as Pseudomonas fragi (Hebraud et al., 1994; Michel et al., 1997; Hebraud & Potier, 1999), Arthrobacter globiformis (Berger et al., 1996), Exiguobacterium sibiricum (Qiu et al., 2006), Colwellia sp. NJ341 (Wang et al., 2006), and Shewanella livingstonensis (Kawamoto et al., 2007). There are limited examples of CSPs in archaea, and in particular, there are no examples of CSPs in thermophiles. A complementation study found that a number of cold adapted archaeal CSP and cold shock domain (CSD) proteins from Methanogenium frigidum, M. burtonii, and Crenarchaeota symbiosum could rescue a cold sensitive E. coli CspA mutant (Giaquinto et al., 2007). Initiation factors (IFs), which are ribosome-associated proteins, are also induced by cold temperature. After cold shock in E. coli, the significant and transient increase of IFs results in cold-shock translational bias (Shiba et al., 1986; Goldenberg et al., 1996; Giuliodori et al., 2004; Giuliodori et al., 2007; Phadtare et al., 2007). Translation of cold shock mRNAs is favoured due to cis elements in the cold shock mRNAs that make them more prone to translation at low temperatures, and to trans elements associated with the translational apparatus of the cold-shocked cells (Giuliodori et al., 2007). A secondary cold adaptive role of the E. coli IF1 lies in its transcriptional antiterminator activity, where it mediates melting of cold-induced mRNA secondary structures, and allows translation to occur without obstruction (Phadtare et al., 2007), similar to the role of CspA.
Both proteins are in the oligonucleotide/oligosaccharide-binding (OB) fold protein family and are structurally very similar. In E. coli, mutations in IF2 result in cold sensitive mutants (Shiba et al., 1986; Laursen et al., 2003); and IF2 in S. oneidensis was induced upon cold shock (Gao et al., 2006). Correct protein folding in the cold can be a challenge for cells, and there has been some evidence of an increase in specific proteins that assist folding. For example, peptidyl-prolyl cis-trans isomerase (PPIase) proteins catalyse the rate-limiting cis-trans isomerisation of imidic peptide bonds in oligopeptides and accelerate the folding of proteins. Quantitative proteomics studies determined that PPIase proteins were
differentially increased at low growth temperature in a range of cold adapted bacteria such as Shewanella sp. SIB1 (Suzuki et al., 2004), S. livingstonensis (Kawamoto et al., 2007), and Psychrobacter arcticus (Zheng et al., 2007). Similarly, a quantitative proteomics investigation of the cold adapted M. burtonii (Goodchild et al., 2004; Goodchild et al., 2005) found PPIase abundance increased during cold growth, and a transcriptomic study of the hyperthermophilic Methanococcus jannaschii (Boonyaratanakornkit et al., 2005) detected increased PPIase transcript expression during cold shock. FK506-binding proteins (FKBs) also catalyse proline isomerisation, and in the nematode Caenorhabditis elegans, three out of eight FKBs also have dual PPIase domains. Deletion of these three genes resulted in larval death during cold growth (Winter et al., 2007).

1.2.3.4. Compatible solutes and other cryoprotectants

Compatible solutes, or osmolytes, protect macromolecules and cells against osmotic stress. They also confer tolerance to low growth temperatures in the food spoilage bacterium L. monocytogenes (Ko et al., 1994; Panoff et al., 2000). For example, glycine betaine is a well-documented compatible solute known to confer osmoprotection and adaptation to cold growth temperatures by stabilising proteins and assisting in protein folding in bacteria (Bourot et al., 2000; Angelidis & Smith, 2003), and eukarya (Naidu et al., 1991; Xing & Rajashekar, 2001). EPS and trehalose also act as cryoprotectants (Section 1.2.3.1). Trehalose has been implicated in the prevention of protein denaturation and aggregation, scavenging of free radicals and stabilisation of cell membranes in E. coli (Kandror et al., 2002). In yeast, the induction of trehalose production has been linked with energy preservation by increasing carbohydrate reserves (Schade et al., 2004; Murata et al., 2006).

1.2.3.5. Protein flexibility

Cold adapted microorganisms produce cold adapted proteins that can cope with the reduction in reaction rate that accompanies decreasing temperature (Gerday et al., 2000; Russell, 2000; D'Amico et al., 2002; Feller & Gerday, 2003; D'Amico et al., 2006; Siddiqui & Cavicchioli, 2006; Marx et al., 2007). Low temperature increases the stability of a protein and thus adversely affects the mobility required for catalytic activity. However, the properties of cold adapted proteins are unique in that they possess increased flexibility in order to be functional at low temperatures (Siddiqui &
Cavicchioli, 2006). Increased flexibility translates into low activation enthalpy, low substrate affinity, and high specific activity at low temperatures (Siddiqui & Cavicchioli, 2006). However, high flexibility is accompanied by a compromise in stability, as cold adapted proteins are generally heat labile (Zavodszky et al., 1998; Siddiqui & Cavicchioli, 2006).

1.3. Mass spectrometry and proteomics

The proteome consists of all proteins expressed by an organism under a given set of conditions and therefore represents the functional complement of the genome (Wasinger et al., 1995; Wilkins et al., 1996; Goodlett & Yi, 2002). Proteomics includes the large-scale study of protein properties such as abundance levels, posttranslational modifications and interactions with other molecules to obtain a global view of cellular processes at the protein level. Mass spectrometry (MS) is a major tool in achieving these objectives (Pandey & Mann, 2000; Rabilloud, 2002; Wittke et al., 2004). Protein identification by MS is based on the principle that the accurate masses of a group of peptides derived from a protein by sequence-specific proteolysis are sufficient to identify that protein. If the mass spectrum of the peptides is compared to a sequence database containing the protein sequence, the protein is expected to be correctly identified within the database (Aebersold & Goodlett, 2001). Many different MS approaches have been developed to achieve peptide and protein identification. Most commonly, proteins are digested with trypsin, which cleaves on the C-terminal side of arginine (R) and lysine (K) residues. Prior to mass spectrometric analysis, proteins and/or peptides are usually separated to reduce sample complexity. Samples are then introduced into a mass spectrometer to generate mass spectral data for protein identification and quantitation.
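The principle of sequence-specific proteolysis followed by mass measurement can be sketched as a simple in silico digest. The example below is a minimal illustration under stated assumptions: it cleaves after every K or R (ignoring missed cleavages and the reduced cleavage before proline), uses standard monoisotopic residue masses, and the protein sequence is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water is added per peptide for the free termini


def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


# Hypothetical sequence used only to demonstrate the digest.
for peptide in tryptic_digest("MAKRGLSKEFW"):
    print(peptide, round(peptide_mass(peptide), 4))
```

The resulting list of peptide masses is the kind of theoretical fingerprint that is compared with measured masses during database searching.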

1.3.1. Protein and peptide separation

Prior to mass spectrometric analysis, proteins can be separated using gel-based or gel-free separation. An example of a gel-based protein separation approach is two-dimensional gel electrophoresis (2DE), which allows for the generation of a differential protein profile based on the separation of proteins by isoelectric point and molecular weight (Gorg et al., 2000; Gygi et al., 2000) (Section 1.3.3.1). Individual proteins are visualised by staining techniques, most commonly silver staining and Coomassie blue
dyes, due to their ease of use, sensitivity (silver) and cost effectiveness (Patton et al., 2002). The proteins can subsequently be excised from the gel and identified using mass spectrometric methods. The separation of proteins and peptides using a gel-free approach is most commonly achieved by liquid chromatography (LC) (Patterson, 1994; Aebersold & Goodlett, 2001; Griffin et al., 2001; Goodlett & Yi, 2002; Rabilloud, 2002; Yates III & Snyder, 2004). Pre-fractionation of intact proteins by high performance LC (HPLC) is a standard method for simplifying a complex sample. Approaches include protein separation on the basis of polarity, size and charge (Issaq, 2001). Fractionation of digested proteins (i.e. peptides) is often achieved using liquid chromatography coupled online with tandem mass spectrometry (LC-MS/MS). LC-MS/MS is a fast, sensitive, largely automated and highly reproducible method for identifying proteins (Link et al., 1999). Peptides are separated according to hydrophobicity on a reversed phase (RP) C18 chromatography column, and eluted online into a mass spectrometer for MS analysis. Highly complex samples often merit a second dimension of LC prior to RPLC; this is two-dimensional liquid chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS), where peptides are first applied to a strong cation exchange (SCX) chromatography column and displaced into fractions according to the ionic strength of the eluent (Ye et al., 2000; Shen et al., 2004). A combination of gel-based protein separation with RPLC separation of digested peptides (GeLC) is similar to the 2DE approach. Proteins are separated in a single dimension according to molecular weight, using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). Bands of interest are excised and digested in-gel, and peptides are subjected to RPLC separation prior to MS analysis (Wilm et al., 1996; Lasonder et al., 2002; Everley et al., 2004).
This process is often termed GeLC-MS or GeLC-MS/MS.

1.3.2. Tandem mass spectrometry and protein identification

After LC, separated peptides can be introduced into a mass spectrometer by electrospray ionization (ESI) for analysis. It should be noted that matrix-assisted laser desorption ionization (MALDI) is another approach for ionization of peptides; however, this is a non-LC approach and will not be discussed here (reviewed in Hillenkamp & Peter-Katalinic, 2007). Using ESI, the peptides are usually eluted online from RPLC
separation into the mass spectrometer for MS/MS analysis, where the first dimension of MS measures peptide intensity. The second dimension of MS follows, where a user-specified number of the most intense peptide ions are fragmented by collision induced dissociation (CID) to generate peptide fragment information (Aebersold & Mann, 2003; Yates III, 2004). There are many different mass spectrometers available for large-scale proteomics that achieve MS and MS/MS analysis in different ways. The most commonly used mass analysers include quadrupoles, ion traps (IT), time of flight (TOF), TOF-TOF, quadrupole-TOF (QTOF) hybrids, quadrupole-IT hybrids, IT-orbitrap hybrids and IT-Fourier transform ion-cyclotron resonance mass spectrometer (FTMS) hybrids (Aebersold & Mann, 2003; Yates III, 2004). Protein identification is achieved by comparing uninterpreted tandem mass spectra against theoretical spectra predicted for each peptide contained in a protein sequence database (Eng et al., 1994). The two most popular search tools are Mascot (Perkins et al., 1999) and SEQUEST (Eng et al., 1994), which compare experimental to theoretical peptide spectra using probability-based matching (Mascot) or cross-correlation (SEQUEST). This comparison results in a score that reflects the quality of the match (Aebersold & Mann, 2003).
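To make the idea of matching an experimental spectrum against theoretical spectra concrete, the following sketch predicts singly charged b- and y-ion m/z values for candidate peptides and counts how many are observed. This shared peak count is a deliberately simple stand-in; it is not the scoring scheme used by Mascot or SEQUEST, and the peptides, tolerance and function names are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Theoretical singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[residue] for residue in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def shared_peak_count(observed_mz, peptide, tolerance=0.5):
    """Number of theoretical fragments with an observed peak within tolerance (m/z units)."""
    return sum(
        any(abs(observed - theoretical) <= tolerance for observed in observed_mz)
        for theoretical in fragment_ions(peptide)
    )


def best_match(observed_mz, candidates, tolerance=0.5):
    """Candidate peptide with the highest shared peak count."""
    return max(candidates, key=lambda p: shared_peak_count(observed_mz, p, tolerance))


# Simulated "observed" spectrum: the ideal fragments of the hypothetical peptide GLSK.
observed = fragment_ions("GLSK")
print(best_match(observed, ["MAFK", "GLSK"]))
```

Real search engines additionally account for peak intensities, multiple charge states, modifications and the chance of random matches, which is why they report probability-based or correlation-based scores rather than a raw count.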

1.3.3. Quantitative proteomics

To understand cellular processes and answer biological questions, it is important not only to identify proteins, but also to measure their abundance changes in different situations (Krijgsveld & Heck, 2004; MacCoss & Matthews, 2005). Quantitative proteomics is the global analysis of protein abundance in a cell. It is a valuable approach for a systems-based understanding of the cell, because protein abundance data provide insight into the molecular mechanisms of biological processes and systems (Yan & Chen, 2005). Proteins can be quantified on a relative or an absolute scale. Relative quantitation is most commonly performed, where two or more conditions are compared (e.g. disease vs. healthy, or treatment vs. control). Several approaches are used for the relative quantitation of protein abundance: visual protein intensity, MS peak intensity and spectral counting (Figure 1.1). Firstly, quantitation by visual protein intensity is represented by two-dimensional electrophoresis (2DE), where corresponding protein spot intensities are compared to achieve quantitation (Section 1.3.3.1). Secondly, proteins can also be
quantified from MS experiments by examining the peptide peaks from a survey scan (i.e. the first dimension of MS, or MS1). The relative peak intensity (peak area or peak height) is correlated with the abundance of a peptide in the sample; this is because MS is an intensity-dependent technique, where highly abundant peptide species are measured more often than less abundant species (Washburn et al., 2002a; Liu et al., 2004). Peak intensity quantitation is used in label-free quantitation (Section 1.3.3.2) and stable isotope labelling quantitation (Section 1.3.3.3). Finally, quantitation by spectral counting uses the number of confidently identified peptides per protein as a proxy for protein abundance. This approach is only used in label-free quantitation (Section 1.3.3.2). In absolute quantitation, a known absolute amount of a reference protein or peptide (a spiked standard) is introduced into each sample. The MS peak intensity approach is most commonly used to compare MS peptide peaks of interest against the known concentration of the spiked standard. The spiked standard is also used to normalise peak intensities across experiments (Gerber et al., 2003; Kirkpatrick et al., 2005). Absolute quantitation will not be discussed in further detail in this report.

Figure 1.1. Quantitative proteomics approaches. XIC, extracted ion chromatogram. MS1, first dimension of mass spectrometry. 2DE, two-dimensional electrophoresis. 2D-DIGE, two-dimensional difference gel electrophoresis. 2D gel from Ostrowski et al. (2004).

1.3.3.1. 2DE-based quantitative proteomics

Quantitation of proteins using 2DE is based on the comparison of spot intensities between 2D polyacrylamide gels. It is a powerful technique that separates proteins in a sample in two dimensions of electrophoresis: isoelectric point (pI) followed by molecular weight (Gorg et al., 2000; Rabilloud, 2002). After protein separation by 2DE, protein spots can be excised, enzymatically digested and identified using mass spectrometric methods such as MALDI-TOF or ESI-LC-MS/MS. The main advantage of this technique is its ability to separate and visualise thousands of proteins on a single polyacrylamide gel (Fey et al., 2000; Gorg et al., 2000). However, complex mixtures such as whole cell lysates can be challenging to analyse using 2DE: they have large dynamic ranges and may contain proteins that fall beyond the pI and mass thresholds of the technique; isoforms or co-migrating proteins can confound results; and membrane proteins, hydrophobic proteins, very large or small proteins and very acidic or basic proteins are difficult to analyse (Peng & Gygi, 2001; Wittke et al., 2004; Yan & Chen, 2005). The expensive and sophisticated software required for gel image comparison and spot quantitation may also be seen as a technical disadvantage. Also, since quantitation of proteins is based on spot intensity differences between gels, samples must be run in separate experiments, which can be a source of inaccuracy (Gorg et al., 2000; Wittke et al., 2004). The advancement of 2DE to 2D difference gel electrophoresis (2D-DIGE), which involves pre-labelling protein samples with two different fluorescent dyes prior to electrophoresis, allows two samples to be analysed on a single gel, thus alleviating the problems associated with separate experiments (Unlu et al., 1997; Marouga et al., 2005). In addition, the 2D-DIGE system commercialised by Amersham (now GE Healthcare) includes a third fluorescent label for internal standards, which facilitates improved mapping of spots between gels, and within- and between-gel normalisation (Section 1.3.4.3). 2DE-MS has been widely used in quantitative proteomics investigations in bacteria (Budde et al., 2006; Graham et al., 2006; Kawamoto et al., 2007), eukarya (Miller et al., 2003; Lam et al., 2006; Lee et al., 2007) and archaea (Lim et al., 2003; Barry et al., 2006); however, its popularity is decreasing with the maturation of gel-free quantitative proteomics approaches.

1.3.3.2. Label-free quantitative proteomics

The label-free approach is a recent development in quantitative proteomics methodology that is also implicitly a gel-free approach. The two general approaches for quantifying protein abundance without the use of labels (or gels) involve either the comparison of spectral features or spectral counting. The spectral features approach quantifies proteins between two or more independent experiments by aligning peptide peaks from MS1 scans and comparing relative peak intensities of the same peptides across the samples (Bondarenko et al., 2002; Chelius & Bondarenko, 2002; Wang et al., 2003; Nesvizhskii et al., 2007). In order to ensure highly detailed, accurate and reproducible data, a high resolution mass spectrometer is required, and the same data acquisition protocol should be used for each sample (i.e. same columns, gradient, and preferably temperature controlled) (America et al., 2006). A challenge in spectral feature quantitation lies in matching each detected peptide peak from one dataset to the same peptide peak in another dataset. The exact mass to charge ratio (m/z) and retention time of the peak may differ, usually due to technical drift of LC or MS instrumentation; these factors complicate the comparison of datasets, particularly if retention time drift is non-linear (America et al., 2006). Chromatographic standards are often spiked into samples to allow for more accurate alignments and comparisons. Also, many replicate samples need to be analysed for a comprehensive insight into the abundance changes of proteins, otherwise protein ratio estimations are limited to abundant proteins with high sequence coverage (Old et al., 2005). Finally, any systematic and non-systematic variations between experiments will be reflected in the data; therefore, minimal sample handling is recommended (Bantscheff et al., 2007).
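The peak matching problem described above can be illustrated with a minimal sketch that pairs MS1 features from two runs using fixed m/z and retention time tolerances, and then reports log2 intensity ratios. The tolerances, feature values and function names are assumptions for illustration; this greedy matching does not correct for non-linear retention time drift, which dedicated alignment software is designed to handle.

```python
import math


def match_features(run_a, run_b, ppm_tol=10.0, rt_tol=1.0):
    """Pair features (mz, rt_min, intensity) from two runs.

    A feature in run_b is a candidate if it lies within ppm_tol of the m/z and
    within rt_tol minutes of the retention time; the candidate closest in
    retention time is chosen, and each run_b feature is used at most once.
    """
    pairs, used = [], set()
    for feat_a in run_a:
        best_index, best_drt = None, None
        for j, feat_b in enumerate(run_b):
            if j in used:
                continue
            ppm = abs(feat_a[0] - feat_b[0]) / feat_a[0] * 1e6
            drt = abs(feat_a[1] - feat_b[1])
            if ppm <= ppm_tol and drt <= rt_tol and (best_drt is None or drt < best_drt):
                best_index, best_drt = j, drt
        if best_index is not None:
            used.add(best_index)
            pairs.append((feat_a, run_b[best_index]))
    return pairs


def log2_ratios(pairs):
    """log2(intensity in run B / intensity in run A) for each matched pair."""
    return [math.log2(feat_b[2] / feat_a[2]) for feat_a, feat_b in pairs]


# Hypothetical features: (m/z, retention time in minutes, peak intensity).
run_a = [(500.2500, 20.0, 1.0e6), (650.3000, 35.0, 4.0e5)]
run_b = [(500.2502, 20.4, 2.0e6), (650.3001, 35.3, 1.0e5), (720.4000, 40.0, 5.0e5)]
pairs = match_features(run_a, run_b)
ratios = log2_ratios(pairs)
print(ratios)
```

The unmatched feature at m/z 720.4 is simply dropped here, which mirrors why missing values and incomplete matching limit ratio estimates to well-sampled peptides.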
Spectral counting can quantify proteins from two or more independent experiments, whereby the number of acquired spectra that can be confidently matched to peptides correlates with the abundance of a protein. The samples are analysed in the mass spectrometer separately, using the same data acquisition protocol. Protein identification is also performed separately, to create a distinct list of proteins for each sample. These lists are then compared between experiments in order to identify proteins with differential changes in abundance, where abundance is estimated by the number of confident peptide identifications normalised by protein length or number of expected tryptic peptides (Blondeau et al., 2004; Old et al., 2005; Zybailov et al., 2005; Hendrickson et al., 2006; Paoletti et al., 2006; Xia et al., 2007). Spectral counting has been demonstrated to be a more sensitive method for detecting proteins that undergo changes in abundance when compared to label-free peak area intensity measurements for quantitation (Old et al., 2005). Similar to spectral feature quantitation, spectral counting requires many replicate analyses to be performed, or else the quantitative data are limited to only highly abundant protein species (Old et al., 2005). In addition, any systematic or non-systematic variation will also be represented in the data. Spectral counting should be considered a semi-quantitative method because the physicochemical properties of each peptide are not considered during quantitation. Peptide characteristics such as size, charge, and hydrophobicity influence peptide ionisation efficiency, which affects the success of downstream protein identification and quantitation. Also, spectral counting assumes that all peptides will respond similarly and that the linearity of response will be the same (Bantscheff et al., 2007). Furthermore, more starting material was required for comparable quantitation using spectral counting than using stable isotope labelling (Hendrickson et al., 2006) (Section 1.3.3.3). Regardless of these limitations, advances in spectral counting, such as improving the correlation of spectral counts with expected abundance ratios (Rappsilber et al., 2002; Ishihama et al., 2005), and developments in computational tools are improving the accuracy of the method (Craig et al., 2005; Tang et al., 2006).
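The length normalisation of spectral counts described above can be sketched as follows. This is a simplified illustration, assuming that each protein has a single spectral count and a known length in residues; the function name and the example values are hypothetical, and the rescaling so that values sum to one within a sample is an added assumption in the spirit of normalised spectral abundance factors rather than the exact procedure of any cited study.

```python
def length_normalised_abundance(spectral_counts, protein_lengths):
    """Estimate relative abundance from spectral counts.

    Each count is divided by the protein length (longer proteins yield more
    peptides and therefore more spectra), then the values are rescaled so
    that they sum to 1 within the sample.
    """
    per_residue = {}
    for protein, count in spectral_counts.items():
        length = protein_lengths[protein]
        if length <= 0:
            raise ValueError(f"invalid length for {protein}: {length}")
        per_residue[protein] = count / length
    total = sum(per_residue.values())
    if total == 0:
        raise ValueError("no spectra were counted in this sample")
    return {protein: value / total for protein, value in per_residue.items()}


# Invented example: equal counts, but protB is twice as long as protA.
counts = {"protA": 30, "protB": 30}
lengths = {"protA": 300, "protB": 600}
abundance = length_normalised_abundance(counts, lengths)
print(abundance)
```

In this example the two proteins have identical raw counts, yet the shorter protein receives twice the estimated abundance, which is the intended effect of the length correction.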
Although spectral feature and spectral counting label-free approaches are the least accurate of the quantitative proteomics approaches, their advantages lie in the omission of expensive reagents or labels, a greater dynamic range than stable isotope labelling and no limit on the number of conditions to be compared (Zybailov et al., 2005; Bantscheff et al., 2007; Nesvizhskii et al., 2007).

1.3.3.3. Stable isotope labelling quantitative proteomics

The fundamental concept of stable isotope labelling is the creation of heavy and light isotopic protein derivatives resulting in detectable peptide mass shifts in the MS1 spectrum. Substitution of 13C for 12C, 15N for 14N, or 2H for 1H in proteins can generate characteristic mass shifts in protein isotopic distribution patterns without affecting their chemical or structural properties (Zhong et al., 2004; Yan & Chen, 2005). Corresponding heavy and light peptides from the same MS analysis can be quantified separately, and their ratio represents the relative abundance of the corresponding peptide (Nesvizhskii et al., 2007). Stable isotope labelling for quantitative proteomics was introduced in 1999 by three separate groups (Gygi et al., 1999a; Oda et al., 1999; Pasa-Tolic et al., 1999). The isotopes can be introduced into the proteins or peptides in two ways: in vitro or in vivo (Figure 1.2). Several in vitro isotope labelling approaches are commonly used, including isotope-coded affinity tags (ICAT) (Section 1.3.3.3.1), isobaric tag for relative and absolute quantitation (iTRAQ) (Section 1.3.3.3.2), and 16O/18O digestion labelling (Section 1.3.3.3.3); the commonality between these methods is the introduction of a label by a chemical reaction. The in vivo labelling approaches include metabolic labelling (Section 1.3.3.3.4) and stable isotope labelling with amino acids in cell culture (SILAC) (Section 1.3.3.3.5), where the stable isotope label is biologically introduced during growth.

Figure 1.2. In vivo and in vitro stable isotope labelling approaches. SILAC, stable isotope labelling with amino acids in cell culture. ICAT, isotope-coded affinity tags. iTRAQ, isobaric tag for relative and absolute quantitation.

1.3.3.3.1 In vitro labelling: ICAT

In ICAT labelling, proteins are labelled at cysteine residues, via the thiol-reactive group of the tag, prior to trypsin digestion (Gygi et al., 1999a). After digestion, the peptide mixture is separated by SCX chromatography, followed by biotin-affinity purification in an avidin column to selectively isolate ICAT-labelled peptides. Cleavage of the linker releases the peptide containing the thiol-reactive group and isotopic tag from the biotin affinity tag to reduce the overall size of the label. The peptides are further separated by RP chromatography prior to analysis by MS/MS (Gygi et al., 2002). Incorporation of the heavy (13C) vs. light (12C) ICAT tag confers a consistent 8 Da mass shift between the light and heavy peptide derivatives, and comparison of the intensities of heavy vs. light isotopic peptide peaks from the survey scan allows for relative quantitation (Gygi et al., 2002). An advantage of ICAT labelling is that the alkylation reaction is highly specific and highly tolerant of the presence of salts, detergents and stabilisers (Gygi et al., 1999a). Another advantage of ICAT is that the complexity of the whole cell extract is reduced by isolating peptides that contain cysteine residues on the avidin column (Gygi et al., 1999a). Complex protein mixtures may overwhelm the resolution capacity of mass spectrometric analysis, so simplifying the mixture results in a more sensitive identification process (Eng et al., 1994; Link et al., 1999). The primary disadvantage of using ICAT is that only proteins that contain a cysteine will be detected. Since ICAT is a chemical labelling approach, there is also a chance that artifacts from side reactions may occur (Bantscheff et al., 2007). Also, the size of the ICAT label results in a relatively large modification that remains on each peptide through the MS analysis (Aebersold & Goodlett, 2001). Though the large label size is a potential problem, this has been overcome by advances in the automated computation of MS data, where programs such as Mascot and ProICAT take into account the ~ 0.5 kDa label when searching the sequence database. The ICAT labelling system is no longer widely used due to the development of the superior iTRAQ system.

1.3.3.3.2 In vitro labelling: iTRAQ

The iTRAQ system is an amine group-based labelling methodology, where the tag consists of a reporter group, a balance group and a peptide-reactive group. Samples are labelled after enzymatic digestion, where the peptide-reactive group binds to the primary amine group of a peptide (Ross et al., 2004).
The 4-plex iTRAQ system can label up to four different samples for comparison since there are four reporter groups with masses of 114, 115, 116 or 117 Da, depending on the different combinations of 12C and 13C or 16O and 18O. The balance groups range in mass from 28 to 31 Da, to constitute a combined mass of 145 Da for the reporter and balance groups (Ross et al., 2004). Similarly, an 8-plex iTRAQ system has eight reporter groups with masses between 113 and 121 Da (Choe et al., 2007). The balance groups range from 192 to 184 Da, to give a total mass increase of 305 Da (Choe et al., 2007). In CID, the reporter group ions fragment from the peptide backbone, thus displaying the 114 to 117 Da masses, for 4-plex iTRAQ; or 113 to 121 Da masses, for 8-plex iTRAQ, in the MS/MS scan. The relative intensities of these reporter groups are used for quantitation of the peptides (Ross et al., 2004; Choe et al., 2007). An advantage of using the iTRAQ system is the ability to examine four or eight different samples at the same time in the same experiment. Once the peptides are labelled, they are isobarically and chromatographically alike, which means that the same peptide, from up to eight different samples, appears as a single peak in the survey scan. This has clear advantages for increasing signal intensity, and thus increasing the likelihood of confident protein identification from high quality data. An additional advantage of iTRAQ is the complete labelling of the sample, since the tag binds to the primary amine group of all peptides (Zieske, 2006; Chen et al., 2007; Fenselau, 2007). A disadvantage of using iTRAQ is the late stage at which the label is introduced into the samples. There are many sample handling steps prior to trypsin digestion (i.e. cell culture and harvest, protein extraction, clean up, separation and digestion) where errors may be inadvertently introduced, resulting in inaccurate quantitation. Additionally, the software for the analysis of iTRAQ experiments is still in its infancy with regard to comprehensive inter-experimental comparison and statistical hypothesis testing (Aggarwal et al., 2006; Bantscheff et al., 2007; Fenselau, 2007).

1.3.3.3.3 In vitro labelling: 16O/18O digestion labelling

Labelling of peptides with 16O or 18O is achieved either during protein digestion with a protease (most commonly trypsin) or immediately after proteolysis in a second incubation step in the presence of either H2(16)O or H2(18)O (Yao et al., 2003). Enzymatic incorporation of 18O into the C-terminus of peptides results in a 2 Da mass shift per 18O atom.
The use of trypsin and Glu-C introduces two 18O atoms, resulting in a detectable 4 Da mass shift per peptide (Miyagi & Rao, 2007). An advantage of the 16O/18O digestion labelling system is that the label is introduced enzymatically and not chemically. Thus, the possibility of unintentionally introducing experimental artifacts is removed. A perceived advantage of digestion labelling is that theoretically, every peptide should be labelled; but practically, this may not be possible since different peptides incorporate the label at different rates, which complicates data analysis (Johnson & Muddiman, 2004; Julka & Regnier, 2004; Miyagi & Rao, 2007; Ramos-Fernandez et al., 2007). Since high-throughput quantitative proteomics experiments are inherently complicated, it is not desirable to add an extra layer of complexity into the post-experimental data analysis stage.

1.3.3.3.4 In vivo labelling: Metabolic labelling

The in vivo approach of stable isotope labelling, termed metabolic labelling, involves the incorporation of isotopes into proteins during cell growth. Proteins are quantified by measuring the relative isotope ratios of light and heavy peptide pairs, using the peptide peak intensities or peak areas in the MS survey scan. The most commonly used isotopic substitution for metabolic labelling is 14N/15N, where incorporation of 15N into the peptide results in a 1 Da mass shift per nitrogen atom in the peptide. Metabolic labelling has been successfully employed in quantitative proteomics investigations of bacteria (Oda et al., 1999; Conrads et al., 2001; Wang et al., 2002; Zhong et al., 2004), archaea (Andreev et al., 2006; Xia et al., 2006), yeast (Washburn et al., 2002a; MacCoss et al., 2003; Zybailov et al., 2005; Zybailov et al., 2006; Usaite et al., 2008), plants (Engelsberger et al., 2006; Nelson et al., 2007), mammalian cell cultures (Conrads et al., 2001), flies and worms (Krijgsveld et al., 2003), and rats (Wu et al., 2004). An advantage of metabolic labelling is the complete biological incorporation of the isotopic labels into the sample without the need for chemical labelling. Also, since samples are labelled during growth, they can be combined during the early stages of the experiment (just before or just after protein extraction). Mixing samples early decreases the chance of an experimental error being inadvertently introduced to one sample and not the other (Washburn et al., 2002b; Krijgsveld et al., 2003). If any sample handling errors do occur, they are expected to affect heavy and light labelled proteins in parallel.
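Because the 14N/15N mass shift depends on the number of nitrogen atoms in each peptide, the expected shift can be predicted from the peptide sequence. The following Python sketch illustrates this under stated assumptions: complete 15N incorporation, unmodified peptides made of the 20 standard amino acids, and a 15N minus 14N mass difference of approximately 0.997 Da (the nominal 1 Da quoted above). The function and constant names are hypothetical.

```python
# Approximate mass difference between 15N and 14N, in Da.
N15_MINUS_N14 = 0.997035

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

# Side chain nitrogen atoms per residue; every residue also contributes
# one backbone nitrogen.
SIDE_CHAIN_NITROGENS = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}


def nitrogen_count(peptide):
    """Count backbone and side chain nitrogen atoms in a peptide sequence."""
    total = 0
    for residue in peptide:
        if residue not in STANDARD_RESIDUES:
            raise ValueError(f"unsupported residue: {residue!r}")
        total += 1 + SIDE_CHAIN_NITROGENS.get(residue, 0)
    return total


def n15_mz_shift(peptide, charge=1):
    """Expected m/z shift between fully 15N-labelled and unlabelled peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return nitrogen_count(peptide) * N15_MINUS_N14 / charge


# Invented example peptide: seven residues with one nitrogen each, plus
# lysine with two, gives nine nitrogen atoms in total.
print(nitrogen_count("PEPTIDEK"), n15_mz_shift("PEPTIDEK", charge=2))
```

The sketch makes explicit why the heavy and light peak spacing varies from peptide to peptide in 15N experiments, in contrast to the fixed spacing produced by a chemical tag.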
A disadvantage of metabolic labelling is the impracticality of labelling samples that cannot be cultured in the laboratory. These samples include clinical specimens, body fluids or environmental samples. Also, since complete labelling requires many generations of growth, complex multicellular organisms with long culture times are difficult to label completely. However, two separate groups have successfully metabolically labelled complex eukaryotes: in one study, E. coli and yeast were labelled with 15N and used to feed and label worms that, in turn, were fed to flies to achieve full labelling (Krijgsveld et al., 2003); in the other, 15N-labelled algal cells were fed to rats to achieve long-term labelling (Wu et al., 2004). It has been noted that a disadvantage exists with 15N metabolic labelling if the peptide sequence is not known a priori (Ong et al., 2003a). That is, if the genome of the organism to be metabolically labelled is not sequenced, the varying mass differentials in unlabelled vs. labelled peptide species cannot be predicted because both backbone nitrogen atoms and all side chain nitrogen atoms are labelled (Ong et al., 2003a). However, this is not the biggest problem with performing proteomics analyses on an unsequenced organism; the key challenge is in successfully identifying peptides by cross-species matching (Ostrowski et al., 2004). Cross-species matching will usually result in far fewer significant protein identifications (Molloy et al., 2001; Habermann et al., 2004).

1.3.3.3.5 In vivo labelling: SILAC

Similar to metabolic labelling, the SILAC approach involves growing cells in heavy or light isotopic conditions, where the label is biologically incorporated into proteins. Cells representing two different conditions are grown in media where heavy (e.g. 13C, 15N or 2H) or light (e.g. 12C, 14N or 1H) labelled amino acids are provided as the only source of supplementation, thereby labelling newly synthesised peptides with either heavy or light isotopes (Ong et al., 2002). This approach is particularly amenable to growing cells in culture (e.g. immortalised cell lines) when the option of 14N/15N metabolic labelling is logistically impractical. Many of the advantages and disadvantages of SILAC are very similar to those for metabolic labelling. However, a disadvantage of using SILAC, as distinct from 14N/15N metabolic labelling, is that quantitation errors can occur when using some isotopically labelled amino acids due to their interconversion in vivo. For example, in some cell lines, arginine is converted into proline (Engen et al., 2002; Ong et al., 2003a; Ong et al., 2003b); and glycine is reported to scramble to a range of other amino acids (Engen et al., 2002).
These interconversions result in unpredictable isotope dilution, partial loss of labelling and unpredictable mass shifts. Thus, the introduced isotopically labelled amino acids must be at the end of a metabolic pathway to avoid undesirable amino acid conversion. Amino acids without reported interconversions, which therefore do not result in isotope scrambling, include lysine and leucine, among others (Engen et al., 2002).

1.3.4. Post-experimental bioinformatics: Data processing, normalisation and statistical testing and validation

A major analytical challenge in quantitative proteomics is the bioinformatics component of the experiment involving the identification, quantitation and statistical validation of proteins and their expression levels in complex biological systems (Venable et al., 2004). The “back-end” post-experimental component of high-throughput quantitative proteomics studies, namely data processing and validation, is a bottleneck. There is a large choice of open source analytical software packages and an increasingly strict requirement for appropriate data processing that accounts for experimental design, normalisation and statistical testing.

1.3.4.1. Publicly available computational tools for quantitative proteomics

Many open source software packages to quantify proteins, according to the labelling strategy used, have been released (Table 1.1). However, most are specific to instrument types, file types or labelling strategy. At the commencement of the study in 2004, very few completed and released software packages were available for the analysis of quantitative proteomics data (Table 1.1). At that time, the available packages were MSQuant, ASAPRatio, XPRESS, RelEx and ProQuant. Subsequently, many more programs have been released with protein quantitation functionalities (Table 1.1). Since there is no generally accepted standard for data processing, many of the software tools specifically written for protein identification and quantitation vary greatly and may be unable to account for the nuances in each experiment. The heterogeneity of data formats caused by unique raw data file outputs from mass spectrometers, different strategies for protein identification and quantitation by analytical programs, and the different file formats of the programs makes data processing and data analysis challenging (Keller et al., 2005).
Additionally, the different combinations of labelling and separation techniques with mass spectrometer configurations result in different spectra types that require different approaches for processing and interpretation (Cannataro, 2008). As a result, much time is often spent inefficiently in converting data files to readable formats. Currently, there are only three complete software platforms available to the general scientific community that allow for integrated data processing and analysis: the Trans-Proteomic Pipeline (TPP) (Keller et al., 2005), MaxQuant (Cox & Mann, 2008) and Mascot v 2.2 (Matrix Science, UK) released in 2008 (Table 1.1). The advantage in the development of these integrated analytical platforms is that data analysis time is spent efficiently, and consistent data processing and analysis facilitates data comparison between experiments, as well as data exchange between researchers and research groups (Keller et al., 2005). A potential difficulty with using these platforms is that the approaches for protein identification, quantitation and statistics are limited to those offered in the software package. For example, the normalisation of data offered by Mascot v 2.2 is limited to mean or median adjustment. Finally, commercial packages may not be completely transparent in data processing methods.

Table 1.1. Publicly available quantitation software for quantitative proteomics experiments. Columns: Software; MS instrumentation; Input data format; Labelling strategy; Quantitation strategy; Reference. Software listed: MSQuant, XPRESS, ASAPRatio, MFPaQ, MaxQuant, RelEx, Census, QN, MSight, APEX, MapQuant, PEPPeR, Masic, msInspect/AMT, SpecArray, ZoomQuant, ProQuant, ProteinPilot, Multi-Q, Libra, i-Tracker and Mascot. Entries shaded in grey represent software packages available at the commencement of the current study in 2004. Bold entries represent software able to process 14N/15N metabolic labelling experiments.

Currently, there is no generally accepted standard for the experimental design, data processing and determination of statistical significance for global quantitative proteomics data; although, increasingly, stricter analyses and statistical validation of quantitative data are required for publication of mass spectrometric studies. Journals such as Molecular and Cellular Proteomics (Carr et al., 2004; Celis, 2004) and Proteomics (Wilkins et al., 2006) have published guidelines for MS data publication and minimum information requirements; and many other journals that regularly accept proteomics studies for publication have similar requirements on their online author information guidelines. The Human Proteome Organisation (HUPO) developed a Proteomics Standards Initiative in 2002 to define standards in sample processing, analysis, informatics, reporting and exchange of proteomics data (Kaiser, 2002; Taylor et al., 2006) and has published its required standards in Nature Biotechnology (Taylor et al., 2007; Gibson et al., 2008; Taylor et al., 2008).
The challenge of providing adequate experimental information for publication and performing appropriate data analyses includes accounting for experimental design, normalisation within and between experiments, accounting for “the missing data problem”, and deciding on which statistical testing approaches are most suitable for dealing with false positives, false negatives and/or false discovery rates (FDR) in protein identification and quantitation.

1.3.4.2. Experimental design

The experimental design of a quantitative proteomics study is important in the post-experimental data processing stage because all experimental factors influence the types of analyses that can be employed and the robustness of biological interpretation (Rocke, 2004; Hu et al., 2005; Karp et al., 2005b). The overarching purpose of appropriately designing experimental parameters, and accounting for them in post-experimental data processing, is to avoid drawing false conclusions (Boguski & McIntosh, 2003; Karp et al., 2007). Many different criteria must be considered when designing a quantitative proteomics experiment. These include the biological compartment from which the proteins are derived (e.g. cytoplasm, membrane, secreted, nucleus or organelles); the labelling strategy (if any), which should be selected with knowledge of the limitations and bioinformatics requirements of the method; the effect of sample pooling (if any); methods of protein extraction (particularly with respect to downstream experimental procedures that may be incompatible with certain compounds); separation of proteins (e.g. gel-based or gel-free); separation of peptides (if any); number of LC dimensions; choice of mass spectrometers and knowledge of their strengths and limitations; and experimental and instrumental consistency and calibration (Hunt et al., 2005; Biron et al., 2006; Chich et al., 2007; Stead et al., 2008). Finally, it is important to consider the experimental requirements for statistical robustness while designing a quantitative proteomics experiment in order to ensure that the data collected will be able to confidently answer the question of interest (Hu et al., 2005).

Replication is essential in any experiment because it allows for robust data analysis. The more replications that can be performed, the more precise the subsequent computational analysis can be (Chich et al., 2007). There are two types of replicates that can be utilised in proteomics: biological replicates, which are different samples from the same condition or treatment group; and technical replicates, which are repeated experiments from the same biological sample (Karp et al., 2005b).
Measurements between technical replicates can be highly correlated because they are not independent samples (Molloy et al., 2003; Karp et al., 2005b); thus it is important to differentiate between technical and biological replicates in post-experimental data processing, otherwise significance may be over-estimated, resulting in a large number of false positives (Karp et al., 2005b; Chich et al., 2007).

1.3.4.3. Normalisation: Learning from transcriptomics

Global quantitative proteomics studies strive to make meaningful biological interpretations by comparing protein abundance across many experiments. Therefore, it is important to account for random and systematic variations that occur in experiments by using appropriate normalisation approaches, if required, in order to accurately minimise non-biological differences in the dataset. The sources of variation in proteomics experiments include variations in sample preparation, LC separation, ionisation efficiency, and mass spectrometer performance. Transcriptomics has many similarities to quantitative proteomics. Transcript detection using gene specific oligonucleotide probes is analogous to protein detection using unique peptide fragments. Similarly, the red and green fluorescent labels used for transcriptomics have parallels with 14N and 15N metabolic labelling of proteins. In transcriptomic quantitation, the “large p, small n” paradigm, which describes large numbers of observations from small numbers of samples (West, 2003), presents similar statistical and analytical challenges to the assessment of a proteome. While there are clear parallels between datasets from transcriptomics and proteomics, there are also inherent aspects of proteomics analyses that distinguish them from transcriptomics. Quantitative proteomics of complex samples (e.g. whole cell extracts) typically suffers from incomplete proteome coverage (“the missing data problem”), whereas microarrays can cover an entire genome complement of genes in a single experiment. Therefore, there is scope to apply post-experimental data processing methods that have been extensively developed over the last decade in the field of microarray analysis, and tailor them for use in quantitative proteomics (Pavelka et al., 2008). By convention, microarray data are normalised within experiments and the appropriate method is determined on a case-by-case basis, by visualising the data, typically using an averaged ratio vs. averaged intensity (MA) plot (Yang et al., 2002). This approach for data normalisation can be adapted for the visual representation of quantitative proteomics data.
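As an illustration of how the MA representation could be carried over to proteomics data, the following Python sketch computes M and A values from paired light and heavy peptide intensities. It assumes the usual microarray definitions, M = log2(heavy/light) and A = 0.5 x log2(heavy x light), with the heavy and light channels standing in for the two fluorescent channels; the function name and the example intensities are hypothetical.

```python
import math


def ma_values(light, heavy):
    """Return (M, A) lists for paired light and heavy intensities.

    M is the log2 ratio of heavy to light; A is the mean log2 intensity.
    All intensities must be strictly positive.
    """
    if len(light) != len(heavy):
        raise ValueError("light and heavy must have the same length")
    m_values, a_values = [], []
    for l, h in zip(light, heavy):
        if l <= 0 or h <= 0:
            raise ValueError("intensities must be positive")
        m_values.append(math.log2(h / l))
        a_values.append(0.5 * math.log2(h * l))
    return m_values, a_values


# Invented intensities for three peptide pairs.
light = [4.0, 8.0, 16.0]
heavy = [16.0, 8.0, 4.0]
m, a = ma_values(light, heavy)
print(m, a)
```

Plotting M against A for all peptide pairs would then show whether the log ratios are centred on zero and whether any bias depends on intensity, which is the information used to choose a normalisation method.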
Normalisation was first applied to quantitative proteomics 2DE-MS experiments (Chang et al., 2004; Kreil et al., 2004; Karp et al., 2005a; Meunier et al., 2005; Corzett et al., 2006; Kultima et al., 2006), and has more recently been applied to LC-MS-based quantitative data (Listgarten & Emili, 2005; Callister et al., 2006; Paoletti et al., 2006; Cairns et al., 2008; Oberg et al., 2008). The normalisation strategy used has tended to be dictated by the proteomics approach adopted and the experimental design. Normalisation methods vary from simple approaches, such as dividing individual expression values by the mean or median of spot intensities from each gel, to complex model-based estimators produced by application of non-linear robust regression (Lilley et al., 2002; Chang et al., 2004). Global normalisation refers to cases where all features are simultaneously used to determine a single normalisation factor between experiments, while local normalisation refers to cases where a subset of features is used at a time (i.e. different subsets for different parts of the data) (Listgarten & Emili, 2005). In the quantitative comparison of 2DE or 2D-DIGE gels, normalisation of both gel background and spot intensity variation between gels by a global fixed value is common (reviewed in Lilley et al., 2002). The adaptation of microarray-based normalisation approaches to 2D-DIGE data was first presented by the Lilley research group (Karp et al., 2004; Kreil et al., 2004). Normalisation by a global mean adjustment using DeCyder software (GE Healthcare) was found to be inadequate for comprehensively accounting for experimental variation, whereas microarray-based normalisation, where different background fluorescence activity and spot intensity variation were adjusted for, was found to be superior (Karp et al., 2004; Kreil et al., 2004). Also in 2004, Chang et al. compared global fixed value normalisation, using either protein spot volumes or a median value, with quantile normalisation. Quantile normalisation was found to minimise the loss of information and non-biological differences better than a global fixed value. Fodor et al. (2005) used local intensity-dependent regression normalisation (also called lowess normalisation), which can reveal and remove protein spot intensity variation in 2D-DIGE data. They reached similar conclusions to the Lilley group and Chang et al. (2004) regarding fixed value normalisation. Kultima et al. (2006) compared eight different normalisation approaches including using the global fixed value function in the DeCyder software, removal of intensity bias as proposed by Kreil et al. (2004) and Fodor et al.
(2005), quantile normalisation and lowess regression that both accounted for protein spot intensity spatial variation in 2D-DIGE data. Kultima et al. (2006) concluded that using lowess or quantile normalisation, which accounted for both intensity and spatial variation, was required to successfully and sensitively remove non-biological experimental differences. The first efforts at LC-MS data normalisation involved the use of “housekeeping” genes as internal standards. Wang et al. (1999) computed constant intensity ratios between pairs of experiments based on housekeeping protein reference


peaks. With this approach, however, it is a challenge to find peak intensities that remain stable across all experiments (Baggerly et al., 2003). A similar approach was used by Anderle et al. (2004), where the data from multiple LC-MS experiments were normalised to a reference dataset using global median normalisation. The use of global normalisation approaches to remove experimental variation is common for LC-MS data (Wagner et al., 2003; Roy et al., 2004; Haqqani et al., 2005; Fang et al., 2006; Pavelka et al., 2008); however, a study comparing global versus local normalisation found that local normalisation performed better at removing non-biological variance (Listgarten & Emili, 2005; Listgarten et al., 2005), especially when LC can produce irregular fluctuations in signals (Listgarten & Emili, 2005). Microarray normalisation methods have also been applied to gel-free quantitative LC-MS data. Callister et al. (2006) compared fixed value, linear regression, lowess regression and quantile normalisation. Merit was found in each of the approaches, under the condition that the appropriate normalisation approach was selected on the basis of identifying the cause of any systematic biases (Callister et al., 2006). Xia et al. (2007) used a lowess approach to normalise spectral counting data; and Oberg et al. (2008) and Hill et al. (2008) used ANOVA modelling to remove intra- and inter-experimental variations (including tagging efficiency and total protein variation) in order to normalise the data across several iTRAQ experiments. Normalisation of LC-MS data has also been applied to raw spectra prior to data processing (Sauve & Speed, 2004; Cairns et al., 2008). Baseline correction of raw spectra using a global fixed value was required in order to successfully align spectral peaks for downstream analysis (Sauve & Speed, 2004). In summary, it is essential to identify the source and type of experimental variation, in order to select an appropriate normalisation approach.
A global fixed value adjustment factor seems to be inaccurate in performing a sensitive normalisation. Local intensity-dependent regression normalisation (i.e. lowess normalisation), commonly used in microarray data processing, might be the way forward for successful and sensitive normalisation of quantitative proteomics data (Chapter 3). However, the challenge with normalisation lies in the selection of a suitable approach that is sensitive to maintaining biological differences, while removing non-biological variation.
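
As a rough illustration of the difference between a global fixed-value adjustment and a distribution-based method, the following Python sketch applies global median normalisation and quantile normalisation to a small set of log2 intensities. It is a minimal sketch under stated assumptions: the function names and the toy values are hypothetical, every run is assumed to contain the same features with no missing values or ties, and it is not the workflow developed in Chapter 3.

```python
from statistics import mean, median


def global_median_normalise(runs):
    """Subtract each run's own median so that all runs are centred on zero.

    `runs` is a list of runs; each run is a list of log2 intensities.
    """
    normalised = []
    for run in runs:
        centre = median(run)
        normalised.append([x - centre for x in run])
    return normalised


def quantile_normalise(runs):
    """Give every run the same distribution (mean of the sorted values).

    Assumes equal-length runs with no missing values (illustrative only).
    """
    n = len(runs[0])
    sorted_runs = [sorted(run) for run in runs]
    # Reference distribution: mean of the k-th smallest value across runs
    reference = [mean(s[k] for s in sorted_runs) for k in range(n)]
    normalised = []
    for run in runs:
        order = sorted(range(n), key=lambda i: run[i])  # indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalised.append(out)
    return normalised


# Hypothetical log2 intensities for three features in two runs
toy_runs = [[14.0, 10.0, 12.0], [11.5, 16.0, 13.0]]
print(global_median_normalise(toy_runs))
print(quantile_normalise(toy_runs))
```

The median adjustment shifts each run by a single constant, so it cannot correct intensity-dependent bias; quantile normalisation changes values rank by rank, which is closer in spirit to the local methods discussed above.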


1.3.4.4. Significance testing: Learning from transcriptomics

The usefulness of quantitative proteomics is predicated on its ability to determine if observed differences in protein abundance are statistically significant. A range of approaches are applicable to quantitative proteomics data including fold-change (FC) thresholding or the application of a t-test. However, since quantitative proteomics is a high-throughput technique, the “multiple hypothesis testing problem” must be considered. A common approach for assigning significance is to apply a threshold to the data, so that any protein ratio exceeding this, often arbitrary, FC threshold is considered significantly changed (Levy et al., 2004; Chen et al., 2005; Cho et al., 2006; Li et al., 2007; Meng et al., 2007). A 1.5-fold or 2-fold threshold may reflect a two-standard-deviation-from-the-mean cut off. This approach does not include a significance-testing component, thus the estimation of error cannot extend past a simplistic standard deviation value. A univariate Student’s t-test can be applied to the data to determine significance. Typically, a change is considered significant if the calculated p-value falls below a prescribed significance threshold, for example p < 0.05. Two types of errors can occur when using the Student’s t-test: a false positive, when a protein is incorrectly declared as significantly changed; or a false negative, when the significance test fails to detect a significantly changed protein (Karp et al., 2007). A considerable number of false positives are likely to accumulate in all high throughput biological experiments due to the “multiple hypothesis testing problem”; applied to proteomics, this describes the repeated application of a statistical test to a set of protein (identification or abundance) measurements that result in many measurements having p-values less than a defined threshold by chance alone (Benjamini & Hochberg, 1995; Storey & Tibshirani, 2003; Manly et al., 2004).
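
The scale of the multiple hypothesis testing problem can be made concrete with a short calculation. The Python sketch below assumes, for illustration only, that every protein tested is truly unchanged and that the tests are independent; the function names are hypothetical. Under those assumptions the expected number of false positives is the number of tests multiplied by the significance threshold, and the chance of at least one false positive approaches certainty as the number of tests grows.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha):
    """Probability of one or more false positives for independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 10, 100, 1000):
    print(
        n_tests,
        expected_false_positives(n_tests, 0.05),
        round(prob_at_least_one_false_positive(n_tests, 0.05), 4),
    )
```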
The methods for addressing multiple hypothesis testing in transcriptomics are particularly valuable in the assessment and selection of appropriate significance tests for proteomics data, because of the analogous nature of both techniques. Currently, the most popular methods in microarray studies are those that control the false discovery rate (FDR), which is estimated by a q-value. Using a q-value provides a more direct way of interpreting significance than a p-value, which measures the false positive rate. In the context of


quantitative proteomics, p-values control the rate at which proteins with no change in abundance are deemed significant, while q-values control the rate of significantly changed proteins being false. For example, if the FDR threshold has been set at 5% (q < 0.05), then from a list of 100 proteins with significant differential abundance, there will be a tolerated error of 5 false positive proteins. The same cannot be said for a set of 100 significant proteins that have a maximum p-value of 0.05; this is because the number of false positives is calculated from the entire dataset of proteins tested. Therefore, if the dataset contains 1,000 proteins, then data for 50 proteins (5% of 1,000) are likely to be false positives. Clearly, if significance testing is used without correcting for the multiple testing problem, then there is a risk that the tests are incomplete and violate statistical assumptions. The importance of FDRs for protein identification has recently been addressed (Peng et al., 2003; Higgs et al., 2007; Nesvizhskii et al., 2007; Choi et al., 2008; Choi & Nesvizhskii, 2008; Käll et al., 2008a; Käll et al., 2008b; Tabb, 2008). However, correcting for multiple testing has rarely been considered in quantitative proteomics studies. There are several recent examples, however, where multiple testing has been accounted for in 2DE-based quantitative proteomics. Chang et al. (2004) compared three common approaches for multiple testing correction: the classical Bonferroni method of p-value adjustment, and the FDR approaches of Benjamini-Hochberg (1995) and Storey-Tibshirani (2003) for 2DE data. The Bonferroni correction involves the adjustment of the confidence level (e.g. α = 0.05) by the number of tests n (i.e. number of proteins tested; α/n). Chang et al. (2004) concluded that the Bonferroni correction was too conservative and inconsistent, while both FDR methods were better at controlling false negatives and positives. Fodor et al.
(2005) found the Benjamini-Hochberg approach suitable for removing false positives in 2DE data. Meunier et al. (2005) used a significance analysis of microarrays (SAM) method that applied a modified t-test using a data permutation technique (that enabled FDR to be estimated) on 2DE data, and found that it controlled false negative and positive occurrence better than a Student’s t-test. Biron et al. (2006) implemented t-testing and one-way ANOVA modelling of 2DE data. It was concluded that 5 replicates per treatment were required for a statistically sound 2DE experiment, because experiments with <5 replicates were predicted to result in poor estimates of variability (Biron et al.,


2006). With the increase of data points, robust statistical methods can be utilised in order to interpret changes in protein abundance with confidence (Karp et al., 2005b). Kultima et al. (2006) devised their own FDR-analogous approach, named Differential Expression in Predefined Protein Sets, for 2D-DIGE data. Karp et al. (2007) used a Student’s t-test coupled with the Storey-Tibshirani FDR to examine for significant differences in 2D-DIGE data, and concluded that this approach was important for decreasing the risk of false biological effects while maintaining power in detecting real biological changes. There are limited examples of statistical analyses accounting for multiple testing in gel-free LC-MS-based quantitative proteomics. Fang et al. (2006) used the SAM approach for FDR estimation of label-free quantitative LC-MS/MS data and similarly, Roxas & Li (2008) demonstrated that SAM analysis of 14N/15N labelled samples was superior to a FC threshold or conventional t-test without correcting for multiple testing. The Hackett group used a two-sample t-test to estimate significance of stable isotope labelled data, and a G-test for spectral counting data; false positives were controlled by the use of the Storey-Tibshirani q-value (Hendrickson et al., 2006; Xia et al., 2007). Zhang et al. (2006) compared five different statistical tests for evaluating the significance of spectral counting data, with the inclusion of the Benjamini-Hochberg method to correct for multiple testing. They concluded that the Student’s t-test was best when three or more replicate experiments were available for analysis, while the G-test was found to be the superior significance test when no replicates were available (Zhang et al., 2006).
It is clear that the statistical analysis of quantitative proteomics data varies widely, ranging from simplistic fold-change thresholds and Student’s t-tests to statistical modelling that accounts for multiple testing. This is because different techniques create different experimental artifacts that need to be considered in the statistical model. In addition, there is no standard for determining statistically significant differential protein abundance; and clearly, a benchmark needs to be established for the minimum requirements of a well-designed and statistically robust experiment.
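
To make the contrast between the Bonferroni and Benjamini-Hochberg corrections concrete, the following Python sketch computes adjusted p-values for both methods. It is a minimal illustration: the function names and example p-values are hypothetical, and the Storey-Tibshirani q-value, which additionally estimates the proportion of unchanged proteins, is not implemented here.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four proteins
p_values = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

With these example values, all four proteins remain below 0.05 after the Benjamini-Hochberg adjustment, whereas only two survive the Bonferroni adjustment, which is consistent with the description of Bonferroni as the more conservative method.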


1.3.5. A definition of protein “regulation” vs. abundance

Increases or decreases in protein abundance measured in quantitative proteomics experiments are often referred to as “up-regulation” or “down-regulation”, in the same way that DNA microarray experiments measure gene expression by mRNA transcript levels. Although an increase in protein abundance is most often linked with the up-regulation of gene expression (and vice versa), other events in the cell may uncouple the abundance of proteins from “expression”, such as posttranslational modifications (Mann & Jensen, 2003), the varying lifetime of transcripts (Atwater et al., 1990), protein turnover (Bachmair et al., 1986; Belle et al., 2006) and translational control (Lange & Hengge-Aronis, 1994; Hengst & Reed, 1996). Furthermore, mRNA transcript levels are not always positively correlated with relative protein abundance levels (Gygi et al., 1999b; Chen et al., 2002). For these reasons, in this study, the increased or decreased detection of proteins is referred to as an abundance change, and not a change in gene expression.

1.4. Project aims

The aim of the work described in this thesis was to elucidate molecular mechanisms of cold adaptation in S. alaskensis. In order to achieve this objective, an analytical platform for accurate and reliable measurements of the protein profiles of cells cultured at 10ºC vs. 30ºC was established. 10ºC was selected as the low culturing temperature to reflect in situ temperature. Furthermore, growth of S. alaskensis at temperatures below 10ºC was deemed logistically unfeasible (>6 months culture time) for this project. 30ºC was selected as the high culturing temperature because it is the temperature at which maximum growth rate occurs, and the temperature at which S. alaskensis is cultured in the majority of available published literature, thus serving as an opportunity to compare the current study to previous studies.


The specific project aims were:
1. To develop a 14N/15N metabolic labelling-based quantitative proteomics platform for analysis of S. alaskensis proteins; including the empirical optimisation of parameters for sample preparation, MS sample analysis, and protein identification and quantitation (Chapter 2).
2. To develop a rigorous post-experimental data processing workflow for the normalisation and statistical testing of quantitative proteomics data (Chapter 3).
3. To infer the biology of cold adaptation from the S. alaskensis quantitative proteomics data and genome data (Chapter 4).



Chapter 2. Method development for a metabolic labelling-based quantitative proteomics platform

2.1. Summary

Quantitative proteomics is a powerful tool to study changes in protein abundance as a response to changing growth conditions. Metabolic labelling involves the incorporation of stable isotope labels into proteins during cell growth. The replacement of light with heavy isotopes results in characteristic mass shifts in the mass spectra of peptides, which can be used to quantify the relative abundance changes between the light vs. heavy protein derivatives. The aim of the work described in this section was to empirically optimise the experimental parameters to identify and quantify S. alaskensis proteins. For cell cultures, the number of growth generations in metabolic labelling was evaluated for complete incorporation of the 15N label. For sample preparation, the protein extraction method and buffers used were evaluated and optimised. An automated nanoLC-MS/MS protocol was compared to a GeLC-MS/MS protocol for protein analysis, and the post-experimental data processing steps of protein identification and quantitation were examined for optimum parameters to yield confident and high quality data.

2.2. Materials and methods

2.2.1. Cell culture and harvest

For all experiments, S. alaskensis was grown on artificial sea water (ASW) medium (Eguchi et al., 1996) in batch culture at 10ºC or 30ºC with 100rpm rotary shaking (Fegatella et al., 1998). Cell growth was monitored spectrophotometrically (Libra S11; Biochrom, UK). The ultramicrosize of S. alaskensis (<0.1μm3) confounds spectrophotometric measurements at the standard 600nm wavelength. Thus, 1mL samples, aseptically withdrawn from growing cultures, were measured as optical density at 433nm (OD433) (Eguchi et al., 1996; Fegatella et al., 1998; Ostrowski et al.,

2001). All cell cultures were harvested at mid-logarithmic growth phase at OD433 0.3. Cells were harvested by centrifugation at 4,470×g for 20min at 4ºC (Rotina 38R; Hettich, Germany). Cell pellets were either immediately used for MS analyses, or snap frozen in a dry ice-ethanol bath and stored at –80ºC.


2.2.2. Metabolic labelling and determining 15N incorporation

For global temperature comparison experiments, cells were inversely metabolically labelled during growth in unlabelled (14NH4Cl) and labelled (99% enriched 15NH4Cl; Cambridge Isotope Laboratories, USA) ASW media, where all other sources of N were eliminated. In all experiments, heavy and light NH4Cl was provided at 0.5g/L as described by Eguchi et al. (1996).

2.2.2.1. Theoretical calculation of 15N APE

The atom percent excess (APE) is the commonly used unit for reporting levels of isotopic abundance. To determine the minimum number of generations required to achieve maximal or ‘complete’ 15N incorporation in S. alaskensis, the 14N isotope dilution effect of exponentially doubling growth was examined. With each generation of 15N labelled growth (i.e. increasing 15N APE), each original 14N cell is diluted out. The following equation describes the theoretical dilution of 14N cells, where n represents the number of generations and the 99% purity of the 15NH4Cl was taken into account [1].
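
As a worked illustration of the dilution argument, the Python sketch below assumes that the inoculum is entirely 14N, that biomass doubles each generation, and that all newly synthesised material draws nitrogen at the purity of the labelled NH4Cl (99%). Under these assumptions the fraction of original 14N material after n generations is (1/2)^n. This is an assumed form offered for illustration and is not necessarily identical to equation [1]; the function name is hypothetical and natural 15N abundance is ignored.

```python
def theoretical_15n_enrichment(generations, label_purity=0.99):
    """Fraction of nitrogen that is 15N after a number of doublings.

    Assumes a fully 14N inoculum and that all new biomass is built from
    label of the stated purity (illustrative model only).
    """
    original_fraction = 0.5 ** generations  # share of biomass from the inoculum
    return label_purity * (1.0 - original_fraction)


for n in range(0, 13):
    print(n, round(100.0 * theoretical_15n_enrichment(n), 3))
```

In this model the enrichment approaches the 99% ceiling set by the label purity, and the gain per additional generation halves each time, which is the behaviour used to judge when further generations no longer improve incorporation.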

[1]

The minimum number of growth generations was estimated to be when no further improvement of 15N enrichment could be achieved with increasing n.

2.2.2.2. Experimental measurement of 15N APE

To confirm the estimated minimum number of growth generations, a range of theoretical 15N APE isotopic profiles (90% - 100% APE) of selected S. alaskensis tryptic peptides were modelled. These profiles were compared to the experimentally derived isotopic profiles of the same peptides grown at the estimated minimum number of growth generations. The 15N APE was estimated from the closest matching theoretical and experimental isotopic profiles.

Three biological replicates of S. alaskensis were cultured in 15NH4Cl ASW at 30ºC in 100mL volumes. The cultures were inoculated with 6 × 10^5 CFU/mL of 14N S. alaskensis cells and allowed to grow for 10 generations to yield 3 × 10^8 CFU/mL of cells, which constitutes mid-logarithmic phase (OD433 0.3). Cells were harvested as described above (Section 2.2.1) and disrupted by sonication on ice in a 10mM Tris-HCl pH 8.0 extraction buffer (Merck, USA) using a digital sonifier unit (S-250D; Branson, USA). The amplitude of sonication was 30% with 0.5s pulse on and 0.5s pulse off


exposure for 2-5min. Cell debris was pelleted by centrifugation at 30,670×g at 4ºC for 25min. To examine the isotopic profiles of peptides, 100μg of whole cell lysate protein was dried in vacuo, resuspended in 10mM NH4HCO3 (Sigma-Aldrich, USA) and digested overnight at 37ºC in a 1:100 trypsin:protein ratio (Promega, USA). For MS analysis, the peptides were diluted 1:80 in a 1% (v/v) formic acid (HCOOH; Sigma-Aldrich, USA):0.05% (v/v) heptafluorobutyric acid (HFBA; Pierce, USA) solution. Samples (2.5μL) were loaded on a C18 RP-trap column (Peptide CapTrap; Michrom

Bioresources, USA) with H2O/acetonitrile (CH3CN; Univar, USA) (98:2, 0.1% (v/v) HCOOH; Sigma-Aldrich, USA). After a 4min wash, the pre-column was switched into line with a fritless nano column (75μm × ~10cm) containing C18 media (5μm, 200Å Magic; Michrom Bioresources, USA) manufactured according to Gatlin et al.

(1998), and peptides were eluted using a linear gradient of H2O/CH3CN (64:36, 0.1% (v/v) HCOOH) at ~300nL/min over 30min. The trap was connected via a fused silica capillary to a low volume tee (Upchurch Scientific, USA) where high voltage (2300V) was applied, and the column tip was positioned ~1cm from the orifice of a hybrid quadrupole time-of-flight (QTOF) mass spectrometer (QSTAR Pulsar i; Applied Biosystems, USA). Positive ions were generated by electrospray and the QSTAR operated in information dependent acquisition mode. A TOF-MS survey scan was acquired (350-1700m/z, 0.75s) and the 2 largest multiply charged ions (counts >20, charge state 2 and 4) were sequentially selected by Q1 for MS/MS analysis. Nitrogen was used as a collision gas and an optimum collision energy chosen (based on charge state and mass). MS/MS were accumulated for 2s (65-2000m/z). Processing scripts generated data suitable for submission to the SEQUEST search algorithm (Eng et al., 1994). Extracted spectra were searched against the S. alaskensis database containing 3208 proteins, using the following parameters: trypsin was the selected enzyme with the allowance for 1 missed cleavage; carbamidomethyl and methionine oxidation were selected for variable modifications; MS tolerance was ±0.25Da; and MS/MS tolerance was ± 0.2Da. DTA Select was used to filter identifications (Tabb et al., 2002), where accepted peptides were required to have dCn 0.08, Xcorr 1.9, 2.0 and 3.3 for +1, +2 and +3 charged peptide species. The isotopic profiles of highly abundant peptides identified in two of three experiments were modelled using IsoPro 3.0 (http://members.aol.com/msmssoft) and IDCalc (http://proteome.gs.washington.edu/software/IDCalc/). Highly abundant


peptides identified in Mascot were selected for APE estimation. The peptide sequence was entered into IsoPro, and a range of 15N APE values (70% to 100%) were iteratively modelled and compared against the experimental isotopic distribution. The closest matching experimental to theoretical isotopic profile, based on peak height distribution, was determined to be the APE value.
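
The profile-matching step described above can be sketched in a few lines of Python. This is a deliberately simplified model and not the algorithm used by IsoPro or IDCalc: it considers only nitrogen atoms (ignoring 13C and other isotopes), treats each nitrogen as independently 15N with probability equal to the enrichment, and scores candidates by the sum of squared differences between normalised peak heights. The function names, the peptide nitrogen count and the synthetic "observed" profile are all hypothetical.

```python
from math import comb


def nitrogen_profile(n_nitrogen, enrichment, n_peaks):
    """Relative abundance of species carrying k residual 14N atoms, k = 0..n_peaks-1.

    Binomial model over nitrogen atoms only; peak k sits k mass units below
    the fully 15N-labelled peptide. Normalised so the returned peaks sum to 1.
    """
    raw = [
        comb(n_nitrogen, k) * (1.0 - enrichment) ** k * enrichment ** (n_nitrogen - k)
        for k in range(n_peaks)
    ]
    total = sum(raw)
    return [x / total for x in raw]


def estimate_ape(observed, n_nitrogen, candidates):
    """Return the candidate enrichment (in percent) whose modelled profile
    is closest to the observed peak heights (least squares)."""
    observed_total = sum(observed)
    observed_norm = [x / observed_total for x in observed]
    best_ape, best_error = None, None
    for ape in candidates:
        model = nitrogen_profile(n_nitrogen, ape / 100.0, len(observed))
        error = sum((o - m) ** 2 for o, m in zip(observed_norm, model))
        if best_error is None or error < best_error:
            best_ape, best_error = ape, error
    return best_ape


# Synthetic example: a peptide with 20 nitrogen atoms labelled at 98%
observed_peaks = nitrogen_profile(20, 0.98, 5)
print(estimate_ape(observed_peaks, 20, range(70, 101)))
```

Scanning candidate values in 1% steps from 70% to 100% mirrors the iterative comparison described in the text; a real isotopic envelope would also need the carbon, hydrogen, oxygen and sulphur contributions that dedicated tools model.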

2.2.3. Sample preparation optimisation

The buffers for protein extraction, sonication and the best method for extraction were optimised. A gel-free LC-MS/MS platform was optimised for the analysis of extracted proteins, where the starting protein quantity for analysis, tryptic peptide dilution, the use of RapiGest SF (Waters, USA), trypsin:protein ratio, protease inhibitors and chelators, injection volume and LC gradient length were evaluated. A GeLC-MS/MS platform, where a gel-based protein separation step was employed prior to LC-MS/MS, was also evaluated by direct comparison to LC-MS/MS.

2.2.3.1. Optimising sonication and evaluating Tris vs. urea buffers in protein extraction

Cell cultures were grown at 30ºC in 14NH4Cl ASW (50mL), and harvested at mid-log phase as outlined above (Section 2.2.2.2). Cell pellets were resuspended in 10mM Tris-HCl pH 8.0 (hereafter referred to as the Tris buffer or Tris experiment) or in 8M urea (Sigma-Aldrich, USA) (hereafter referred to as the urea buffer or urea experiment) in a 1.5mL tube. Cell pellets were disrupted by sonication on ice as outlined in Section 2.2.2.2 for 2, 3, 4 and 5min, in triplicate. The supernatant, containing soluble whole cell lysate proteins, was collected as described above (Section 2.2.2.2). Protein yields were measured using a Bradford protein assay (Bradford, 1976) using BSA (Sigma-Aldrich, USA) as a standard. The performance of the Tris vs. urea protein extraction buffer at each time point was evaluated by the protein yield from sonication.

2.2.3.2. LC-MS/MS: Evaluating RapiGest, starting protein amount and dilution of peptides

A gel-free LC-MS/MS approach was the preferred platform for protein identification and quantitation. To evaluate the success of an LC-MS/MS approach, two test samples (A and B) were processed: with and without RapiGest during trypsin digestion; with two different starting protein amounts for digestion; and in three different ratios of digested peptide sample:formic acid dilutions prior to LC-MS/MS analysis (Figure 2.1).


RapiGest is a commercially available reagent used to enhance enzymatic digestion of proteins. RapiGest helps solubilise proteins, making them more susceptible to enzymatic cleavage without inhibiting enzyme activity.

Figure 2.1. Workflow for LC-MS/MS: Evaluation of RapiGest, starting protein amount and sample dilution. Samples A and B were cultured in 50mL volumes; two starting amounts of protein (35μg and 50μg) and three tryptic peptide:formic acid (HCOOH) dilutions (1:8, 1:10, and 4:1) were evaluated. Samples were analysed by LC-MS/MS and proteins identified using Mascot.

Cell pellets (50mL) were prepared according to Section 2.2.2.2, with a 4min sonication period in the Tris buffer. For trypsin digestion, 35μg or 50μg of protein was dried in vacuo and prepared as described in Section 2.2.2.2. After an overnight digestion, the resulting peptides were solubilised and diluted 1:8, 1:10 or 4:1 in 1% HCOOH/0.05% HFBA. Peptidic digests were separated by online nano-LC using an Applied Biosystems Microgradient system (USA). Samples (2.5μL) were concentrated and desalted on a micro C18 pre-column with H2O/CH3CN (98:2, 0.1% (v/v) HCOOH) at 20μL/min. After a 4min wash, the pre-column was switched into line with a fritless nano column (75μm × ~10cm) containing C18 media (5μm, 200Å Magic; Michrom, USA) manufactured according to Gatlin et al. (1998). Peptides were eluted using a


linear gradient of H2O/CH3CN (98:2, 0.1% (v/v) HCOOH) to H2O/CH3CN (38:62, 0.1% (v/v) HCOOH) at ~300nL/min over 45min and electrosprayed directly using high voltage (1.8kV) into a 3D ion trap mass spectrometer (LCQ Deca XP+, Thermo Electron, Germany). A survey scan (350-1800m/z) was collected, followed by data dependent acquisition of MS/MS spectra at 35% normalized collision energy of the most intense parent ion from the MS scan. Activation was set at q = 0.25 with an activation time of 30ms and a minimum of 5 × 10^6 counts for MS was required. Dynamic exclusion was enabled, where after a maximum of 3 repeated MS/MS, the parent ion was excluded for 1.5min. Highly abundant singly charged ions of 391.25, 445.6 and 463.50 ± 1.5m/z were excluded. Samples A1, A2, B1 and B2 were also analysed by LC-MS/MS using the QSTAR Pulsar i. Processing of samples was as described above (Section 2.2.2.2). For protein identification, MS/MS were interrogated against the completed S. alaskensis genome database using the Mascot search algorithm (v 2.0; Matrix Science, UK). For both LCQ and QSTAR data files, the search parameters used were trypsin as the selected enzyme with the allowance for 1 missed cleavage, and variable carbamidomethyl and methionine oxidation modifications. For LCQ data files, ± 1.2Da MS tolerance and ± 0.6Da MS/MS fragment ion tolerance was used and ESI-trap was selected as the instrument type. For QSTAR data files, MS tolerance was set at ± 0.25Da; MS/MS tolerance was ± 0.2Da; and ESI-QSTAR was selected as the instrument. All peptide identifications were manually verified and accepted as a confident identification according to the following requirements: a minimum of 4 consecutive y or b ions; expect value < 0.05; peptide score MOWSE value for “identity” (p < 0.05). The number of Mascot queries and number of confident protein identifications were used to compare the tested parameters in an LC-MS/MS analytical platform.

2.2.3.3. LC-MS/MS: Evaluating starting protein amounts, trypsin:protein ratio, protease inhibitor and chelating agent

The starting protein amount for LC-MS/MS was further evaluated with different trypsin:protein ratios, and the absence or presence of a protease inhibitor with a chelating agent. Two 50mL 14N 30ºC cultures (C and D) were used, where proteins from C were extracted in the Tris buffer, and proteins from D were extracted in the Tris buffer containing 1mM phenylmethanesulphonyl fluoride (PMSF) (Sigma-Aldrich,


USA) and 1mM ethylenediaminetetraacetic acid (EDTA) (Sigma-Aldrich, USA), hereafter referred to as Tris PE. The cell disruption method was sonication on ice for 4min, as described above (Section 2.2.2.2). For LC-MS/MS analysis, two starting amounts of protein (50μg and 500μg) and two trypsin:protein ratios (1:100 and 1:20) were evaluated (Figure 2.2). LC-MS/MS analysis was performed with the QSTAR Pulsar i hybrid tandem mass spectrometer, where samples were processed according to Section 2.2.2.2. Proteins were identified using Mascot with search parameters as described in Section 2.2.3.2. The number of Mascot queries and number of confident protein identifications were used to compare the tested parameters.
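
Two of the acceptance criteria applied to peptide identifications (Section 2.2.3.2), namely a minimum of 4 consecutive y or b ions and an expect value < 0.05, can be expressed as a simple check. The Python sketch below is illustrative only: verification in this work was manual, the function names and input format (lists of matched ion numbers) are hypothetical, the run of consecutive ions is assumed to lie within a single ion series, and the peptide score criterion is not modelled.

```python
def longest_run(ion_numbers):
    """Length of the longest run of consecutive integers in ion_numbers."""
    present = set(ion_numbers)
    best = 0
    for n in present:
        if n - 1 not in present:  # n is the start of a run
            length = 1
            while n + length in present:
                length += 1
            best = max(best, length)
    return best


def accept_peptide(expect_value, b_ions, y_ions, min_run=4, max_expect=0.05):
    """Apply the consecutive-ion and expect-value criteria to one peptide match."""
    has_series = longest_run(b_ions) >= min_run or longest_run(y_ions) >= min_run
    return has_series and expect_value < max_expect


# Hypothetical matches: (expect value, matched b ion numbers, matched y ion numbers)
print(accept_peptide(0.01, [2, 3, 4, 5], [1]))
print(accept_peptide(0.01, [1, 3, 5, 7], [2, 4, 6]))
print(accept_peptide(0.20, [1, 2, 3, 4, 5], []))
```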

Figure 2.2. Workflow for LC-MS/MS: Evaluation of starting amount of protein for analysis, trypsin:protein ratio and a protease inhibitor with chelating agent. Samples C and D were cultured in 50mL volumes. Proteins from sample C were extracted in 10mM Tris-HCl pH 8.0. Proteins from sample D were extracted in 10mM Tris-HCl pH 8.0 with 1mM PMSF and 1mM EDTA (Tris PE). Two starting amounts of protein (50μg and 500μg) and two trypsin:protein ratios (1:100 and 1:20) were evaluated. Samples were analysed by LC-MS/MS and proteins identified using Mascot.

2.2.3.4. LC-MS/MS: Evaluation of Tris vs. urea, injection volume, and LC gradient length

The Tris- and urea-based protein extraction buffers were further evaluated with varying injection volumes and LC gradients. Tris PE was compared to the urea buffer with the


inclusion of 1mM PMSF and 1mM EDTA, hereafter referred to as urea PE. Two 50mL 30ºC 14N S. alaskensis cultures (T and U) were harvested at mid-logarithmic phase and proteins were extracted by sonication on ice as described above (Section 2.2.2.2). Proteins from culture T were extracted in the Tris PE buffer and sonicated for 4min. Proteins from culture U were extracted in the urea PE buffer and sonicated for 2min. Each sample was reduced with 10mM 1,4-dithiothreitol (DTT; Roche, Switzerland) at 37ºC for 30min and alkylated with 25mM iodoacetamide (IDA; Sigma-Aldrich, USA) at 37ºC for 30min. For digestion, 100μg of each sample was diluted with 500μL of

20mM NH4HCO3 and digested with 1:20 trypsin:protein overnight at 37ºC. Prior to MS analysis, 25μL of each sample was diluted with 10μL of 1% HCOOH/0.05% HFBA. Samples (2.5, 5 and 10μL; Table 2.1) were concentrated and desalted on a micro

C18 precolumn (2mm × 500μm, Michrom Bioresources, USA) with H2O/CH3CN (98:2, 0.05% HFBA) at 15μL/min. After a 4min wash, the pre-column was automatically switched (10 port valve; Valco, USA) into line with a fritless nano column (Gatlin et al.,

1998). Peptides were eluted using a linear gradient of H2O/CH3CN (98:2, 0.1% formic acid) to H2O/CH3CN (64:36, 0.1% formic acid) at ~300nL/min over 30, 60 and 90min (Table 2.1). The precolumn was connected via a fused silica capillary (25μm × 10cm) to a low volume tee (Upchurch Scientific, USA) where HV (2400V) was applied and the column tip positioned ~1cm from the Z-spray inlet of a hybrid QTOF tandem mass spectrometer (QTof Ultima API hybrid; Micromass, UK). Positive ions were generated by electrospray and the QTof operated in data dependent acquisition mode. A TOF MS survey scan was acquired (350-1700m/z, 1s) and the 2 largest multiply charged ions (counts > 20) were sequentially selected by Q1 for MS/MS analysis. Argon was used as collision gas and an optimum collision energy chosen (based on charge state and mass). Tandem mass spectra were accumulated for up to 2s (50-2000m/z). Peak lists were generated by MassLynx (v 4.0 SP4; Micromass, UK) using the Mass Measure program and submitted to a Mascot search of the S. alaskensis database as described above (Section 2.2.3.2), where search parameters were: precursor and product ion tolerance ± 0.25 and ± 0.2Da, respectively. The number of Mascot queries and confident protein identifications were used to compare the experiments evaluating Tris PE vs. urea PE, injection volume and LC gradients. In order to compare the Tris PE vs. urea PE buffers, all confidently identified


Table 2.1. Combination of Tris vs. urea, injection volume and LC gradient parameters for LC-MS/MS analysis

Experiment   Injection volume (μL)   LC gradient (min)
T1           2.5                     30
T2a          5                       60
T2b          5                       60
U1a          2.5                     30
U1b          2.5                     30
U2           5                       60
U3a          10                      90
U3b          10                      90

T, Tris PE buffer; U, urea PE buffer. Experiments T2, U1 and U3 were performed in duplicate: a and b.

proteins from the Tris PE extraction buffer were pooled, as were all confidently identified proteins from the urea PE extraction buffer. The numbers of unique protein identifications from both extraction buffers were compared. Similarly, the number of Mascot queries and confident protein identifications were used as a measure of comparison for the injection volumes and LC gradients.

2.2.3.5. GeLC-MS/MS: A comparison to LC-MS/MS

The same samples (A and B) were used to directly compare a GeLC-MS/MS analytical platform to the LC-MS/MS experiments performed in Section 2.2.3.2. The same amount of protein (~35μg) was separated on a 10 lane 0.75mm sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) gel cast at 12% total acrylamide-bisacrylamide monomer concentration (T), 5% crosslinker concentration (C). The 12% resolving gel consisted of 12% polyacrylamide (37.5 bis:acrylamide; BioRad, USA),

25% (v/v) 1.5M Tris-HCl pH 8.8, 43.4% (v/v) deionised H2O (Millipore, USA), 1% (w/v) SDS (Sigma-Aldrich, USA), 1% (v/v) N, N, N’, N’-Tetra-methyl-ethylenediamine (TEMED; BioRad, USA), and 5% (w/v) ammonium persulfate (Amersham, UK). The 5% stacking gel consisted of 5% polyacrylamide, 6.3% (v/v) Tris-HCl pH 6.8, 35.3% (v/v) MilliQ, 0.5% SDS, 1% TEMED, and 5% (w/v) ammonium persulfate. The SDS- PAGE sample buffer consisted of 50mM Tris-HCl pH 6.8, 100mM DTT, 2% (w/v) SDS, 0.1% (w/v) bromophenol blue (Millipore, USA), and 10% (v/v) glycerol (Univar, USA). SDS-PAGE was performed in a running buffer that consisted of 0.302% (w/v) Tris base (Merck, USA), 1.88% (w/v) glycine (ICN, USA), and 0.1% (w/v) SDS in a

L. Ting, UNSW. 43 Chapter 2

Mini PROTEAN 3 Cell unit (BioRad, USA). Running conditions were 50V for 15min, followed by 150V for 1h, or until the bromophenol blue dye front reached the bottom of the gel. The gels were stained using Coomassie blue R250 (Sigma-Aldrich, USA) in a staining solution consisting of 45% (v/v) methanol (Ajax Finechem, Australia), 45% (v/v) deionised H2O, 10% (v/v) acetic acid (Ajax Finechem, Australia), and 0.0025% (w/v) Coomassie blue R250, for 30min. Gels were destained twice with a solution consisting of 30% (v/v) methanol, 60% (v/v) deionised H2O, and 10% (v/v) acetic acid. Entire lanes containing separated proteins from sample A or B were excised into 14 horizontal slices with a clean razor blade (Figure 2.13). The slices were further diced into ~1mm³ cubes and transferred into clean 1.5mL centrifuge tubes. The gel pieces were washed with deionised H2O and destained in 1:1.3 100mM NH4HCO3/CH3CN until the gel pieces were clear. The gel pieces were dehydrated with 2 changes of CH3CN and dried in vacuo.

Proteins in the gel pieces were reduced with 10mM DTT at 37ºC for 1h, after which all liquid was removed. Proteins were immediately alkylated with 25mM IDA at 37ºC for 1h in the absence of light. After alkylation, all liquid was removed from the tubes and the gel pieces were washed in 2 changes of deionised H2O and once with 10mM NH4HCO3. Gel pieces were dehydrated with 2 changes of CH3CN and then dried in vacuo. Proteins were digested with a 1:25 trypsin:protein ratio. Gel pieces were rehydrated with a 20ng/μL trypsin solution and incubated at 4ºC for 1h to allow for full rehydration. Tubes were transferred to 37ºC and incubated for 14h. The tryptic peptide solution was removed from each tube and transferred to fresh 1.5mL centrifuge tubes.

The remaining gel pieces were dehydrated with 2 changes of CH3CN, and the liquid was combined with the tryptic peptides. The tryptic peptide samples were dried in vacuo and resuspended in 20μL of 1% HCOOH/0.05% HFBA. The peptides from each gel fraction were analysed by LC-MS/MS with an LCQ Deca XP+ (Section 2.2.3.2), with the inclusion of a variable acrylamide modification in Mascot and SEQUEST database interrogations.

2.2.3.6. GeLC-MS/MS: Tris PE vs. urea PE

The GeLC-MS/MS analytical platform was further evaluated with the Tris PE vs. urea PE protein extraction buffers. The GeLC-MS/MS experiments were also scaled up, with 800μg of protein separated using a 5 lane 1.5mm SDS-PAGE gel.

Two biological samples of S. alaskensis were cultured at 30ºC in 250mL of 14N ASW. Both cultures were prepared as described in Section 2.2.2.2. Proteins from one culture were extracted using the Tris PE buffer and sonicated for 4min, and proteins from the second culture were extracted in the urea PE buffer and sonicated for 2min. Approximately 800μg of Tris PE proteins and urea PE proteins were separated by SDS-PAGE (Section 2.2.3.5). The Tris PE gel was sliced into 18 slices and the urea PE gel was sliced into 16 slices. Sample processing and LC-MS/MS were performed as described above (Section 2.2.3.5). To evaluate the extraction efficiency of using the Tris PE or urea PE buffers with GeLC-MS/MS, all raw data from each gel slice were combined into a single SEQUEST search (Section 2.2.3.2), with the inclusion of a variable acrylamide modification in SEQUEST database interrogations. Protein identifications were filtered using DTA Select (Section 2.2.2.2), and the numbers of unique proteins and non-redundant peptides were used as a measure for comparison of Tris PE vs. urea PE.

2.2.4. An in silico analysis of amino acid composition and trypsin digestion in S. alaskensis

Due to weak mass spectra and subsequent poor identification data using LC-MS/MS, an in silico analysis of the amino acid composition of the S. alaskensis proteome was performed, where the number and percentage of all amino acids were calculated and compared to the amino acid compositions of E. coli, S. cerevisiae, P. angustum S14 and all organisms in the NCBI database. The aim of determining amino acid composition was to examine the proportion of R and K residues in the S. alaskensis proteome, due to the proteolytic specificity of trypsin to these residues. Further, the theoretical peptides produced by full trypsin digestion were generated in silico for S. alaskensis in order to determine the range of tryptic peptide lengths, the average peptide length and the proportion of peptides between 7-25 amino acids in length. The theoretical tryptic peptides were also generated for E. coli and S. cerevisiae, and compared to S. alaskensis. Bioinformatic analysis was performed courtesy of Neil F. W. Saunders in the School of Biotechnology and Biomolecular Sciences, UNSW.

2.2.5. Mass spectrometry optimisation: LTQ-FT Ultra

A large component of GeLC-MS/MS work was performed using a 3D ion trap mass spectrometer (LCQ Deca XP+; Chapter 3). Partway into the current study (2006), a hybrid linear ion trap/Fourier transform ion cyclotron resonance (FTICR) mass spectrometer (LTQ-FT Ultra; ThermoFisher Scientific, USA), interfaced with an autosampler nano-LC system (Ultimate 3000; Dionex, Netherlands), was commissioned at the Bioanalytical Mass Spectrometry Facility at UNSW. The improved scan rate, resolution and sensitivity of this instrument allowed for greater numbers of MS/MS spectra, and therefore identifications, to be obtained. The MS analysis of S. alaskensis samples was therefore transferred to the LTQ-FT Ultra, which required some additional optimisation of LC and instrument methods.

S. alaskensis was grown at 10ºC in 14N ASW (250mL) and at 30ºC in 15N ASW (250mL). Both cultures were harvested at mid-logarithmic phase and cell pellets were combined 1:1 (v/v) prior to protein extraction by sonication on ice in Tris PE for 4min (Section 2.2.2.2). Approximately 1mg of protein was subjected to separation using a 5 lane 1.5mm SDS-PAGE gel (Section 2.2.3.6). A ~5mm gel slice was excised from the ~55kDa region and diced, washed, reduced, alkylated and digested as described above (Section 2.2.3.5). A range of LC and MS parameters using the LTQ-FT Ultra were evaluated (Section 2.2.5.1) using the ~55kDa gel slice. The resulting MS data were searched against the S. alaskensis database using Mascot as described in Section 2.2.3.2. The search parameters were: peptide tolerance ± 0.8Da; fragment ion tolerance ± 0.2Da; ESI-TRAP instrument type. For SEQUEST searches, protein identifications were filtered using DTA Select (Section 2.2.2.2). The number of unique proteins, queries, score and percent coverage for the top protein hit (for Mascot searches); and the number of unique proteins, non-redundant peptides and percent coverage for the top DTA Select hit (for SEQUEST searches) were used as a measure for comparison of the tested parameters.

2.2.5.1. Optimisation of LC and MS parameters

LC gradient lengths of 30, 45, 60 and 75min; tryptic digest dilution ratios of 1:100, 1:80, 1:60 and 1:20 in 1% HCOOH/0.05% HFBA; and 2.5μL and 5μL sample injection volumes were evaluated. The MS instrumentation parameters evaluated were: precursor scanning using the LTQ ion trap vs. the FTICR cell, the number of MS2 scans, and MS2-5 scans. The combinations of these parameters are shown in Table 2.10.

2.2.5.2. MS analysis

Peptides were separated using nano-LC on an Ultimate 3000 HPLC and autosampler system (Dionex, Netherlands). Samples were concentrated and desalted on a micro C18 precolumn (500μm × 2mm, Michrom Bioresources, USA) with H2O/ACN (98:2, 0.05% (v/v) HFBA) at 20μL/min. After a 4min wash, the precolumn was switched (10 port valve, Valco, USA) into line with a fritless nano column (75μm × ~10cm) containing C18 media (5μm, 200Å Magic; Michrom, USA) (Gatlin et al., 1998). Peptides were eluted using a linear gradient of H2O/CH3CN (98:2, 0.1% (v/v) formic acid) to H2O/CH3CN (55:45, 0.1% (v/v) formic acid) at 250nL/min over the tested gradient lengths. High voltage (1.8kV) was applied to a low volume tee (Upchurch Scientific, USA) and the column tip positioned ~0.5cm from the heated capillary (T = 200°C) of the LTQ-FT Ultra mass spectrometer. Positive ions were generated by electrospray and the mass spectrometer operated in data dependent acquisition mode. A survey scan was collected by either the LTQ analyser (350-1750 m/z) or FT analyser (350-1750 m/z). This was followed by 2 MS2 scans, where the 1st and 2nd most intense precursor ions from the MS trace were sequentially isolated and fragmented; or 4 MS2 scans, where the 1st to 4th most intense precursor ions from the MS trace were sequentially isolated and fragmented; or MS2-5 scans, where the 1st most intense precursor ion from the MS trace was isolated and fragmented in MS2, followed by the isolation and fragmentation of the 1st most intense MS2 fragment ion in MS3 and so on. Fragmentation was achieved using CID at 35% normalized collision energy with an activation q = 0.25 and activation time of 30ms, with a minimum signal required at 2,000 counts. Dynamic exclusion was enabled, where after a maximum of 2 repeated MS2, the parent ion was excluded for 3min.
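The data dependent precursor selection described above (the most intense ions above a signal threshold, with dynamic exclusion after 2 repeats for 3min) can be illustrated with a minimal Python sketch. This is not the instrument control software: the class and function names are hypothetical, and matching precursors by m/z rounded to one decimal place is an assumed tolerance that is not stated in the method.

```python
class DynamicExclusion:
    """Track repeat counts and exclusion windows per precursor m/z."""

    def __init__(self, max_repeats=2, exclusion_s=180.0):
        self.max_repeats = max_repeats      # 2 repeated MS2 scans, as in the text
        self.exclusion_s = exclusion_s      # 3 min exclusion, as in the text
        self._repeats = {}
        self._excluded_until = {}

    @staticmethod
    def _key(mz):
        # Assumption: precursors within 0.1 m/z are treated as the same ion.
        return round(mz, 1)

    def is_excluded(self, mz, now_s):
        return now_s < self._excluded_until.get(self._key(mz), float("-inf"))

    def record(self, mz, now_s):
        key = self._key(mz)
        self._repeats[key] = self._repeats.get(key, 0) + 1
        if self._repeats[key] >= self.max_repeats:
            self._excluded_until[key] = now_s + self.exclusion_s
            self._repeats[key] = 0


def pick_precursors(peaks, now_s, exclusion, top_n=2, min_counts=2000):
    """Return the m/z of the top_n most intense eligible survey-scan peaks.

    peaks is a list of (mz, intensity) tuples from one survey scan.
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in peaks
        if intensity >= min_counts and not exclusion.is_excluded(mz, now_s)
    ]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    chosen = [mz for mz, _ in candidates[:top_n]]
    for mz in chosen:
        exclusion.record(mz, now_s)
    return chosen


# Toy survey scan (values are illustrative, not measured data).
survey = [(500.2, 5000), (600.3, 3000), (700.4, 1000), (800.5, 9000)]
exclusion = DynamicExclusion()
first = pick_precursors(survey, 0.0, exclusion)
second = pick_precursors(survey, 1.0, exclusion)
third = pick_precursors(survey, 2.0, exclusion)
later = pick_precursors(survey, 200.0, exclusion)
```

In this toy run the two most intense ions are fragmented twice, are then skipped while excluded so that a weaker ion is sampled, and become eligible again once the 3min window has passed.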

2.2.6. Data processing optimisation

The post-experimental component of MS analysis involved adjusting a range of parameters associated with protein identification or quantitation. Using an example dataset, DTA Select parameters were evaluated for confident protein identifications, and RelEx parameters were evaluated for high quality protein quantitation. S. alaskensis was cultured as described in Section 2.2.5, where a 10ºC culture and 30ºC culture were labelled with 14N and 15N, respectively, harvested, combined and ~1mg of extracted protein was processed by GeLC-MS/MS (Section 2.2.3.6) for optimisation of DTA Select and RelEx parameters. In addition, ~100μg of protein was processed by GeLC-MS/MS in 1 lane of a 5 lane 1.5mm SDS-PAGE gel for RelEx parameter optimisation to assess performance with 10-fold less protein. The samples were analysed using the LTQ mass spectrometer (Section 2.2.5.2), with the optimised running conditions determined from Section 2.2.5.1 (for results, see Section 2.3.4). Briefly, 5μL of tryptically digested peptides were diluted 1:20 in 1% HCOOH/0.05% HFBA, and separated by RPLC using a 75min gradient. Precursor scanning was performed by the LTQ, followed by 2 MS2 scans. Gel slice data from the ~1mg protein SDS-PAGE were combined into a single SEQUEST search against the S. alaskensis database with search parameters outlined in Section 2.2.5. The gel slice data from the ~100μg protein SDS-PAGE were processed in an identical fashion.

2.2.6.1. 1% FDR for protein identification: Optimisation of DTA Select parameters

The optimisation of DTA Select filtering parameters aimed to achieve 1% FDR. To calculate FDRs, a randomised decoy S. alaskensis database, the same size as the actual S. alaskensis database, was created, against which the combined ~1mg protein gel slice data were also searched using SEQUEST. Both datasets were filtered by DTA Select, where the dCN cut-off was 0.08 with a minimum of 1 peptide per locus, and XCorr values were incrementally increased from 1.9 to 3.3 for +1, +2 and +3 peptides. The numbers of singly, doubly and triply charged peptides that satisfied the thresholds were tallied using a Perl script written by Matt Z. DeMaere in the School of Biotechnology and Biomolecular Sciences, UNSW. FDRs were calculated using equation [2] (Peng et al., 2003).

[2]

Where % FDR is the estimated false discovery rate; ndecoy is the number of peptides identified using SEQUEST and filtered by DTA Select from the decoy database; and nreal is the number of peptides identified using SEQUEST and filtered by DTA Select from the actual S. alaskensis database.

2.2.6.2. Optimisation of RelEx quantitation parameters

The aim of optimising RelEx quantitation parameters was to establish stringent parameters in order to confidently measure differential abundance of proteins in S. alaskensis at 10ºC vs. 30ºC. The data from the ~1mg protein and ~100μg protein SDS-PAGE were both used to optimise RelEx quantitation parameters. After searching the spectra from both datasets against the S. alaskensis database using SEQUEST, DTA Select was used to filter the data using the optimised parameters determined in Section 2.2.6.1.
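The decoy-based threshold scan of Section 2.2.6.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the FDR is taken here as the simple ratio of decoy to real peptide counts, a common estimator when the real and decoy databases are searched separately, which may differ from the exact form of equation [2]; the 0.1 XCorr step size and the function names are hypothetical.

```python
def percent_fdr(n_decoy, n_real):
    """Estimate % FDR from decoy and real peptide counts.

    Assumption: simple ratio estimator, not necessarily equation [2].
    """
    if n_real == 0:
        return 0.0
    return 100.0 * n_decoy / n_real


def lowest_cutoff_for_fdr(real_xcorr, decoy_xcorr, target=1.0,
                          start=1.9, stop=3.3, step=0.1):
    """Scan XCorr cut-offs upwards and return the first that meets the target.

    Returns (cutoff, n_real, n_decoy), or None if no cut-off qualifies.
    """
    n_steps = int(round((stop - start) / step))
    for i in range(n_steps + 1):
        cutoff = round(start + i * step, 10)   # avoid float drift in the grid
        n_real = sum(score >= cutoff for score in real_xcorr)
        n_decoy = sum(score >= cutoff for score in decoy_xcorr)
        if n_real > 0 and percent_fdr(n_decoy, n_real) <= target:
            return cutoff, n_real, n_decoy
    return None


# Toy XCorr values for one charge state (illustrative, not measured data).
real_scores = [2.0, 2.5, 3.0, 3.1, 3.2]
decoy_scores = [2.0, 2.35]
result = lowest_cutoff_for_fdr(real_scores, decoy_scores)
```

In practice the scan would be run separately for +1, +2 and +3 peptides, since each charge state has its own XCorr threshold.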

2.3. Results

2.3.1. 15N APE

2.3.1.1. Theoretical incorporation

15N APE increases as 14N cells are diluted out in each generation of growth. By the 10th generation (2^10), the theoretical 15N APE is 98.9% and further doublings provide minimal improvement of the 15N label (Figure 2.3).

2.3.1.2. Experimental incorporation

Four highly abundant peptides from three proteins were examined from three biological replicates to experimentally determine 15N APE. The selected peptides were VGEEVEIVGIKDTK and VLAENVAGNAAVDFANIDKAPEER from Sala_1830; TGETMTIAASNQPK from Sala_1117; and VTIDKDNTTIVDGAGDAEAIK from Sala_0966 (Table 2.2). The 15N isotopic pattern for each peptide was modelled from 70% APE to 100% APE using IsoPro 3.0, with only the best matching APE profiles shown (Figure 2.4 to Figure 2.11). The average 15N APE of inspected peptides was 99.7% ± 0.1% after 10 generations of growth in 15NH4Cl ASW. An extra 11 proteins (3 peptides per protein) from 4 independent experiments were used to measure 15N APE using IDCalc (Appendix A) and the overall APE was determined as 99.5% ± 0.2%.

Figure 2.3. The 14N dilution effect of exponentially doubling growth. Pink coloured cells are 15N labelled and blue coloured cells are 14N. 2^n represents the number of growth generations, where n represents the generation number.
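The dilution of 14N material with each doubling can be illustrated with a short Python sketch. It assumes the simplest possible model, in which the unlabelled fraction halves every generation, so that the labelled fraction after n generations is 1 - 2^-n. The source_enrichment parameter is a hypothetical addition: the isotopic purity of the 15N source is not stated in this section, and the value of 0.99 used below is chosen only because it gives a figure close to the 98.9% quoted for the 10th generation.

```python
def labelled_fraction(generations):
    """Fraction of material synthesised in 15N medium after n doublings."""
    return 1.0 - 0.5 ** generations


def theoretical_15n_percent(generations, source_enrichment=1.0):
    """Theoretical % 15N, scaled by an assumed enrichment of the 15N source."""
    return 100.0 * source_enrichment * labelled_fraction(generations)


# Print the theoretical label for the first 12 generations under both
# a perfectly enriched source and an assumed 99% enriched source.
for n in range(1, 13):
    pure = theoretical_15n_percent(n)
    assumed = theoretical_15n_percent(n, source_enrichment=0.99)
    print(f"generation {n:2d}: {pure:7.3f}% (pure source), {assumed:7.3f}% (assumed 99% source)")
```

Under this model the gain per generation falls below 0.1 percentage points by the 10th doubling, which is consistent with the statement that further doublings provide minimal improvement.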

Table 2.2. Experimental 15N APE.

Peptide                    Locus tag   15N APE   Experiment   Figure
VLAENVAGNAAVDFANIDKAPEER   Sala_1830   99.6      1            Figure 2.4
                                       99.7      2            Figure 2.5
VGEEVEIVGIKDTK             Sala_1830   99.8      3            Figure 2.6
                                       99.6      1            Figure 2.7
TGETMTIAASNQPK             Sala_1117   99.6      2            Figure 2.8
                                       99.5      1            Figure 2.9
VTIDKDNTTIVDGAGDAEAIK      Sala_0966   99.5      2            Figure 2.10
                                       99.75     1            Figure 2.11

APE, atomic percent excess. Experimental isotopic 15N peptide profiles were compared to theoretically generated 15N profiles using IsoPro 3.0. 15N APE was determined by closest matches between experimental and theoretical profiles.

Figure 2.4. Experimental and theoretical isotopic profiles of 15N Sala_1830 VLAENVAGNAAVDFANIDKAPEER from experiment 1 (Ions Score: 123; Expect: 1.7e-12). A, Experimental QSTAR TOF-MS survey scan; B, Theoretical isotopic profile of 14N species; C, Theoretical isotopic profile of 99.8% 15N APE; D, 99.6% 15N APE; E, 99.7% 15N APE. Text in red represents the closest 15N APE match between the experimental vs. theoretical isotopic profiles. IsoPro 3.0 was used for isotopic modelling.

Figure 2.5. Experimental and theoretical isotopic profiles of 15N Sala_1830 VLAENVAGNAAVDFANIDKAPEER from experiment 2 (Ions Score: 124; Expect: 1.2e-012). A, Experimental QSTAR TOF-MS survey scan; B, Theoretical isotopic profile of 14N species; C, Theoretical isotopic profile of 99.8% 15N APE; D, 99.7% 15N APE. Text in red represents the closest 15N APE match between the experimental vs. theoretical isotopic profiles. IsoPro 3.0 was used for isotopic modelling.

Figure 2.6. Experimental and theoretical isotopic profiles of 15N Sala_1830 VGEEVEIVGIKDTK from experiment 3 (Ions Score: 67; Expect: 7.8e-007). A, Experimental QSTAR TOF-MS survey scan; B, Theoretical isotopic profile of 14N species; C, Theoretical isotopic profile of 99% 15N APE; D, 99.8% 15N APE; E, 99.5% 15N APE. Text in red represents the closest 15N APE match between the experimental vs. theoretical isotopic profiles. IsoPro 3.0 was used for isotopic modelling.

Figure 2.7. Experimental and theoretical isotopic profiles of 15N Sala_1830 VGEEVEIVGIKDTK from experiment 1 (Ions Score: 71; Expect: 3.1e-007). A, Experimental QSTAR TOF-MS survey scan; B, Theoretical isotopic profile of 14N species; C, Theoretical isotopic profile of 99.8% 15N APE; D, 99.6% 15N APE; E, 99.5% 15N APE. Text in red represents the closest 15N APE match between the experimental vs. theoretical isotopic profiles. IsoPro 3.0 was used for isotopic modelling.

Figure 2.8. Experimental and theoretical isotopic profiles of 15N Sala_1117 TGETMTIAASNQPK from experiment 2 (Ions Score: 83; Expect: 2.4e-008). A, Experimental QSTAR TOF-MS survey scan; B, Theoretical isotopic profile of 14N species; C, Theoretical isotopic profile of 99.8% 15N APE; D, 99.6% 15N APE; E, 99.7% 15N APE. Text in red represents the closest 15N APE match between the experimental vs. theoretical isotopic profiles. IsoPro 3.0 was used for isotopic modelling.

Figure 2.9. Experimental and theoretical isotopic profiles of 15N Sala_1117 TGETMTIAASNQPK from experiment 1 (Ions Score: 70; Expect: 4.9e-007). A, Experimental QSTAR TOF-MS survey scan; B, Theoretical isotopic profile of 14N species; C, Theoretical isotopic profile of 99.8% 15N APE; D, 99.5% 15N APE; E, 99.6% 15N APE. Text in red represents the closest 15N APE match between the experimental vs. theoretical isotopic profiles. IsoPro 3.0 was used for isotopic modelling.

Figure 2.10. Experimental and theoretical isotopic profiles of 15N Sala_0966 VTIDKDNTTIVDGAGDAEAIK from experiment 2 (Ions Score: 40; Expect: 0.00032). A, Experimental QSTAR TOF-MS survey scan; B, Theoretical isotopic profile of 14N species; C, Theoretical isotopic profile of 99.8% 15N APE; D, 99.5% 15N APE; E, 99.6% 15N APE. Text in red represents the closest 15N APE match between the experimental vs. theoretical isotopic profiles. IsoPro 3.0 was used for isotopic modelling.

Figure 2.11. Experimental and theoretical isotopic profiles of 15N Sala_0966 VTIDKDNTTIVDGAGDAEAIK from experiment 1 (Ions Score: 56; Expect: 8.2e-006). A, Experimental QSTAR TOF-MS survey scan; B, Theoretical isotopic profile of 14N species; C, Theoretical isotopic profile of 99.8% 15N APE; D, 99.75% 15N APE; E, 99.7% 15N APE. Text in red represents the closest 15N APE match between the experimental vs. theoretical isotopic profiles. IsoPro 3.0 was used for isotopic modelling.

analysis, ~35μg of protein was subjected to separation using SDS-PAGE, sliced into 14 slices and analysed by LC-MS/MS (Figure 2.13).

Figure 2.13. SDS-PAGE of ~35μg of protein from samples A and B. Protein was subjected to SDS-PAGE separation. The entire lane was excised into 14 fractions and analysed by GeLC-MS/MS. Both samples had very similar protein profiles. MW, Molecular weight.

The SDS-PAGE clearly shows that there was a significant amount of protein in the sample across all size ranges of the gel for both samples. Proteins were identified from each gel slice, using Mascot, and combined into a single list of unique identifications (Table 2.7).

Table 2.7. Number of confident protein identifications from GeLC-MS/MS of A and B.

Experiment   Starting amount of protein (μg)   No. gel slices   No. confident protein ID
A            33                                14               98
B            34                                14               72

2.3.2.6. GeLC-MS/MS: Tris PE vs. urea PE

To further evaluate the GeLC-MS/MS approach, the experiments were scaled up by increasing the culture size; increasing the starting amount of protein for SDS-PAGE separation; increasing SDS-PAGE gel size; and protein extraction using either a Tris PE or urea PE buffer. The SDS-PAGE gels from the Tris PE and urea PE extraction were visually similar in intensity and banding patterns (Table 2.8 and Figure 2.14).

Figure 2.14. SDS-PAGE of 1mg of protein from Tris PE vs. urea PE extraction samples. Proteins from Tris PE or urea PE extraction conditions were subjected to SDS-PAGE separation and GeLC-MS/MS analysis. Both extraction conditions generated proteins across all size ranges of the gel and the protein profiles of Tris PE and urea PE were very similar. MW, molecular weight.

The numbers of confidently and uniquely identified proteins and non-redundant peptides were used as a measure for comparison of Tris PE vs. urea PE. The data from all gel slices from the Tris PE extraction were combined and interrogated against the S. alaskensis database as a single SEQUEST job. Similarly, all gel slice data from the urea PE extraction were combined and searched using SEQUEST. After filtering by DTA Select, similar numbers of proteins and non-redundant peptides were identified in both Tris PE and urea PE extractions (Table 2.8). The numbers of unique proteins were very similar between the two extraction buffers (Figure 2.15).

Table 2.8. Tris PE vs. urea PE GeLC-MS/MS results.

                                    Tris PE   Urea PE
mg protein separated by SDS-PAGE    1         1
No. gel slices                      18        16
No. proteins from DTA Select        235       230
No. NR peptides from DTA Select     1351      1420

Figure 2.15. Number of proteins identified in a Tris PE vs. urea PE buffer using a GeLC-MS/MS platform.

2.3.3. An in silico analysis of amino acid composition and trypsin digestion in S. alaskensis

An in silico analysis of the theoretical S. alaskensis proteome was performed to elucidate the reason for poor protein identification using LC-MS/MS. The number and percentage of all S. alaskensis amino acids were calculated and compared to the amino acid compositions of E. coli, S. cerevisiae, P. angustum S14 and all organisms in the NCBI database (accessed October, 2005) (Table 2.9). The proportion of R residues in S. alaskensis was 7.8%, which was almost double that of S. cerevisiae and P. angustum. In contrast, the proportion of K residues in S. alaskensis was 2.9%, which was approximately half that of S. cerevisiae, P. angustum and all organisms in the NCBI (Table 2.9). These trends disappeared when the proportions of R and K residues were summed, so that 9.8-11.8% of the entire theoretical proteomes of the tested organisms were R and K residues.

Table 2.9. Amino acid composition of S. alaskensis, E. coli, S. cerevisiae, P. angustum and NCBI.

Amino acid          % Sala   % Ecoli   % Yeast   % Pang   NCBI
Alanine (A)         14.0     9.5       5.5       8.3      7.5
Arginine (R)        7.8      5.5       4.4       4.2      5.2
Asparagine (N)      2.4      3.9       6.2       4.7      4.5
Aspartic acid (D)   6.2      5.1       5.9       5.6      5.2
Cysteine (C)        0.8      1.2       1.3       1.1      1.8
Glutamine (Q)       2.9      4.4       4.0       4.5      4.1
Glutamic acid (E)   5.3      5.8       6.5       5.8      6.3
Glycine (G)         8.9      7.4       5.0       6.6      7.1
Histidine (H)       2.0      2.3       2.2       2.3      2.2
Isoleucine (I)      5.1      6.0       6.6       6.9      5.5
Leucine (L)         9.7      10.7      9.5       10.3     9.1
Lysine (K)          2.9      4.4       7.4       5.6      5.8
Methionine (M)      2.4      2.9       2.1       2.7      2.3
Phenylalanine (F)   3.5      3.9       4.4       4.2      3.9
Proline (P)         5.4      4.4       4.4       3.9      5.1
Serine (S)          5.0      5.8       9.0       6.6      7.3
Threonine (T)       5.0      5.4       5.9       5.6      6.0
Tryptophan (W)      1.5      1.5       1.0       1.2      1.3
Tyrosine (Y)        2.1      2.9       3.4       3.2      3.3
Valine (V)          7.0      7.1       5.6       6.8      6.5
R + K residues      10.7     9.9       11.8      9.8      11

Sala, S. alaskensis; Ecoli, E. coli; Yeast, S. cerevisiae; Pang, P. angustum; NCBI, NCBI non redundant database accessed on October, 2005. R + K residues represent the summed proportion of arginine and lysine residues for each organism. All values are expressed as a percentage. Entries in bold are of interest.
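The kind of calculation summarised in Table 2.9 can be sketched in a few lines of Python. This is a minimal illustration and not the analysis that was actually run: the function names are hypothetical and the two short sequences stand in for a full proteome, which would normally be read from a FASTA file.

```python
from collections import Counter


def composition_percent(sequences):
    """Return the percentage of each amino acid across all sequences."""
    counts = Counter()
    for sequence in sequences:
        counts.update(sequence.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {residue: 100.0 * n / total for residue, n in counts.items()}


def r_plus_k(percentages):
    """Summed percentage of arginine (R) and lysine (K) residues."""
    return percentages.get("R", 0.0) + percentages.get("K", 0.0)


# Toy two-protein "proteome" (illustrative only).
toy_proteome = ["MKRA", "AAKR"]
toy_composition = composition_percent(toy_proteome)
toy_rk = r_plus_k(toy_composition)
```

Summing R and K separately from the per-residue percentages mirrors the last row of the table, where the individual R and K trends cancel out.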

In addition, the proteome of S. alaskensis was theoretically digested with trypsin, cleaving at R and K, and compared to E. coli and S. cerevisiae (Figure 2.16). For S. alaskensis, the theoretical digest resulted in 71,786 peptides, where 36,146 (50%) were between 7-25 amino acid residues long. In E. coli, 89,762 peptides were generated in silico, where 45,295 (51%) were between 7-25 residues long. In S. cerevisiae, the in silico digest created 213,632 peptides, where 107,469 (50%) were between 7-25 residues long.
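A theoretical digest of this kind can be sketched as follows. The sketch assumes the simplest cleavage rule, a cut after every R or K with no missed cleavages and no exception before proline; whether the original analysis applied a proline rule is not stated. The function names and the test sequence are hypothetical.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (full digestion)."""
    peptides = []
    start = 0
    for index, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        # C-terminal peptide that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def fraction_in_range(peptides, min_len=7, max_len=25):
    """Fraction of peptides whose length falls within min_len..max_len."""
    if not peptides:
        return 0.0
    hits = sum(min_len <= len(peptide) <= max_len for peptide in peptides)
    return hits / len(peptides)


# Toy sequence (illustrative only, not an S. alaskensis protein).
toy_protein = "MAAAAAAK" + "GGR" + "TTTTTTTTR" + "PEPTIDE"
toy_peptides = tryptic_digest(toy_protein)
toy_fraction = fraction_in_range(toy_peptides)
```

Applied to every protein in a proteome, the pooled peptide lengths would give the distribution plotted in Figure 2.16 and the proportion within the 7-25 residue window.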

Figure 2.16. Number of peptides with increasing number of residues resulting from an in silico trypsin digestion. A, S. alaskensis; B, E. coli; C, S. cerevisiae.

2.3.4. Mass spectrometry optimisation: LTQ-FT Ultra

S. alaskensis was grown at 10ºC and 30ºC in 14N and 15N ASW, respectively. The cultures were harvested and combined, followed by protein extraction and separation of ~1mg of protein using SDS-PAGE (Figure 2.17). One gel slice at ~55kDa was excised and processed by GeLC-MS/MS using the LTQ-FT Ultra mass spectrometer, where a range of LC and MS parameters were evaluated for the maximal yield in protein and peptide identifications (Table 2.10). The LC and MS parameters evaluated were tryptic digest dilution in formic acid; injection volume of sample; LC gradient length; precursor scanning using the ion trap or FTICR cell; number of MS2 scans; and MS2-5 scans. Each tested parameter was compared according to the number of queries and number of confident protein identifications from Mascot and SEQUEST searches against the S. alaskensis database.

Figure 2.17. Gel slice excision for GeLC-MS/MS parameter optimisation using an LTQ-FT Ultra. MW, molecular weight.

All of the tested parameters were evaluated in combination with each other due to limitations in availability of machine time and the expense of sample analysis. Therefore, the resulting numbers of confident peptide and protein identifications reflect the combined effect of the tested parameters. However, examining the number of Mascot queries and the number of unique SEQUEST peptides and proteins was still of merit when assessing the trends of varying parameter values and their success in maximising the number of peptide and protein identifications.

2.3.2. Sample preparation optimisation

The performance of the Tris and urea protein extraction buffers and a range of sonication times were evaluated by examining protein yield. The analysis of S. alaskensis proteins using an LC-MS/MS vs. GeLC-MS/MS platform was compared. The addition of RapiGest, starting protein amount for analysis, tryptic peptide dilution, trypsin:protein ratio, protease inhibitors and chelators, injection volume and LC gradient length were trialled in LC-MS/MS. The Tris PE and urea PE protein extraction buffers were evaluated in both platforms.

2.3.2.1. Optimising sonication and evaluating Tris vs. urea buffers in protein extraction

Sonication time and a Tris vs. urea protein extraction buffer were evaluated for an LC-MS/MS method. Sonication at 2, 3, 4 and 5min in either the Tris buffer or the urea buffer was compared in triplicate, and protein yields were measured using a Bradford protein assay to assess optimal sonication time (Table 2.3).

Table 2.3. Evaluating sonication times with a Tris or urea buffer.

Extraction buffer   Sonication time (min)   Mean [protein] (mg/mL)   SD
T                   1                       0.7                      0.1
T                   2                       1.6                      0.2
T                   3                       1.9                      0.3
T                   4                       2.3                      0.7
T                   5                       2.5                      0.7
U                   0.5                     2.2                      0.1
U                   2                       2.6                      1.1
U                   3                       2.6                      0.9
U                   4                       2.7                      1.0
U                   5                       2.7                      0.9

T and U represent the Tris-based and urea-based protein extraction buffers, respectively; mean [protein] represents the mean protein concentration from triplicate experiments; SD, standard deviation.

Using a Tris protein extraction buffer, sonication for 2 and 3min gave similar protein yields, while sonication at 4 and 5min gave similar protein yields. There was a gradual increase of protein yield as sonication time increased. When proteins were extracted in the urea buffer, sonication at 2-5min gave similar yields, despite an increase in sonication time. In addition, the variance between urea protein yields was much larger than the Tris protein yield variation. Overall, the extraction of proteins in the urea buffer gave a ~2-fold increase in protein concentration compared to the Tris buffer.

2.3.2.2. LC-MS/MS: Evaluating RapiGest, starting protein amount and dilution

Two biological replicates, A and B, were used to investigate LC-MS/MS with two starting protein amounts (35μg and 50μg) for analysis, with and without RapiGest during trypsin digestion, and three dilution ratios (1:8, 1:10 or 4:1) of tryptic peptides in 1% HCOOH/0.05% HFBA (Figure 2.1). The number of Mascot queries and confident protein identifications were used to compare the performance of these parameters. The inclusion of RapiGest improved the number of protein identifications from both LCQ and QSTAR spectra (Table 2.4). The QSTAR gave more queries and protein identifications for samples A1 and B1 in comparison to the LCQ; however, there was no improvement of data with samples A2 and B2 (Table 2.4).

Table 2.4. Evaluation of LC-MS/MS with and without RapiGest.

                                                                              LCQ Deca XP+ LC-MS/MS (Mascot)   QSTAR LC-MS/MS (Mascot)
Experiment   Starting protein amount   RapiGest   Sample:formic acid   Injection volume   No. queries   No. IDs   No. queries   No. IDs
A1           35μg                      yes        1:8                  2.5                230           12        317           35
B1           35μg                      yes        1:8                  2.5                248           8         377           42
A2           35μg                      no         1:8                  2.5                174           1         92            1
B2           35μg                      no         1:8                  2.5                152           6         81            4
A3           50μg                      yes        1:10                 2.5                226           12        -             -
B3           50μg                      yes        1:10                 2.5                142           8         -             -
A4           50μg                      yes        4:1                  2.5                328           19        -             -
B4           50μg                      yes        4:1                  2.5                317           18        -             -
A5           50μg                      no         1:10                 2.5                197           5         -             -
B5           50μg                      no         1:10                 2.5                165           3         -             -
A6           50μg                      no         4:1                  2.5                175           6         -             -
B6           50μg                      no         4:1                  2.5                160           6         -             -

Sample:formic acid represents the dilution ratio of tryptic peptide to 1% HCOOH/0.05% HFBA; No. IDs, number of confident manually verified protein identifications.

From the LCQ data, increasing the starting amount of protein for analysis from 35μg to 50μg resulted in some improvement in the number of queries and protein identifications. Changing the dilution ratio of tryptic peptides:formic acid from 1:10 to 4:1, when starting with 50μg of protein with RapiGest, gave a ~2-fold improvement in the number of protein identifications (A3, B3 vs. A4, B4; Table 2.4). However, in the absence of RapiGest, query numbers remained similar between both dilutions (A5, B5 vs. A6, B6; Table 2.4). It should be noted that, even though the samples were whole cell lysates, MS intensity was generally weak and very few proteins were confidently identified (<1% genome coverage).

2.3.2.3. LC-MS/MS: Evaluating starting protein amounts, trypsin:protein ratio, protease inhibitor and chelating agent

Two biological replicates, C and D, were used to investigate LC-MS/MS with two starting protein amounts (50μg and 500μg); two trypsin:protein ratios (1:100 and 1:20); and the absence or presence of PMSF and EDTA (Figure 2.2). The number of Mascot queries and the number of confident protein identifications were used to compare the tested parameters. The increase of starting protein amount for analysis, from 50μg to 500μg, resulted in ~2-fold more protein identifications and a consistent increase in Mascot query numbers (Table 2.5). The increase of trypsin:protein from 1:100 to 1:20 resulted in a small and consistent increase in the number of protein identifications (Table 2.5). The use of PMSF and EDTA did not improve the number of confident protein identifications; however, the number of queries in Mascot was generally increased.

Table 2.5. Optimising starting protein amounts and trypsin for LC-MS/MS.

            Extraction  Starting protein                   Mascot
Experiment  buffer      amount (μg)       trypsin:protein  No. queries  No. IDs
A1          Tris        50                1:100            395          8
A2          Tris        50                1:20             359          12
A3          Tris        500               1:100            399          20
A4          Tris        500               1:20             403          21
B1          Tris PE     50                1:100            330          6
B2          Tris PE     50                1:20             409          10
B3          Tris PE     500               1:100            451          16
B4          Tris PE     500               1:20             464          20

No. IDs, number of confident manually verified protein identifications; Tris, 10mM Tris-HCl pH 8.0 protein extraction buffer; Tris PE, 10mM Tris-HCl pH 8.0, 1mM PMSF and 1mM EDTA protein extraction buffer.
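The ~2-fold increase in protein identifications reported for the 50μg to 500μg comparison can be checked directly from the values in Table 2.5. The following is a minimal sketch only; the dictionary layout and the function name `fold_changes` are illustrative choices and not part of the original analysis.

```python
# Confident protein identifications from Table 2.5, keyed by
# (extraction buffer, trypsin:protein ratio, starting protein in micrograms).
ids = {
    ("Tris", "1:100", 50): 8,
    ("Tris", "1:20", 50): 12,
    ("Tris", "1:100", 500): 20,
    ("Tris", "1:20", 500): 21,
    ("Tris PE", "1:100", 50): 6,
    ("Tris PE", "1:20", 50): 10,
    ("Tris PE", "1:100", 500): 16,
    ("Tris PE", "1:20", 500): 20,
}


def fold_changes(table):
    """Return IDs(500 ug) / IDs(50 ug) for each buffer and trypsin ratio."""
    result = {}
    for (buffer, ratio, amount), n_ids in table.items():
        if amount == 50:
            result[(buffer, ratio)] = table[(buffer, ratio, 500)] / n_ids
    return result


folds = fold_changes(ids)
mean_fold = sum(folds.values()) / len(folds)

for condition, fold in sorted(folds.items()):
    print(condition, round(fold, 2))
print("mean fold change:", round(mean_fold, 2))
```

The individual ratios range from 1.75 to about 2.7, which is consistent with the ~2-fold description in the text.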

2.3.2.4. LC-MS/MS: Evaluation of Tris vs. urea, injection volume, and LC gradient length

The inclusion of PMSF and EDTA in a Tris or urea buffer was further evaluated for LC-MS/MS. In the LC-MS/MS analysis of Tris PE vs. urea PE samples, increasing injection volume and LC gradient improved the number of confident protein identifications (Table 2.6). Further, using a urea PE extraction buffer for LC-MS/MS-based analysis resulted in more spectra and more protein identifications.

Table 2.6. LC-MS/MS analysis of proteins extracted by urea vs. Tris buffers.

            Injection     LC gradient  Mascot
Experiment  volume (μL)   (min)        No. queries  No. IDs
T1          2.5           30           75           4
T2a         5             60           227          14
T2b         5             60           187          11
U1a         2.5           30           249          14
U1b         2.5           30           269          18
U2          5             60           329          40
U3a         10            90           366          41
U3b         10            90           454          32

No. IDs, number of confident manually verified protein identifications; T, 10mM Tris-HCl pH 8.0; U, 8M urea. Experiments T2, U1 and U3 were performed in duplicate: a and b.

The identified proteins were pooled according to extraction buffer, and duplicate identifications were removed. The overall numbers of identifications were compared between the two extraction buffers (Figure 2.12). The number of unique proteins identified in the urea extraction substantially outnumbered those from the Tris extraction.

Figure 2.12. Number of proteins identified in a urea vs. Tris buffer using an LC-MS/MS platform. Confident protein identifications from the Tris and urea extraction conditions were compared to determine proteins identified in a Tris-only condition, urea-only condition and both Tris and urea.
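The pooling and comparison described above amounts to a set union within each buffer followed by set differences and an intersection between buffers. A minimal sketch is given below; the protein identifiers (P1 to P5) and the function name `compare_identifications` are hypothetical and are used purely for illustration.

```python
def compare_identifications(tris_runs, urea_runs):
    """Pool identifications per buffer, then split them into
    Tris-only, urea-only and shared sets.

    Each argument is a list of sets, one set of protein identifiers per
    LC-MS/MS run. Taking the union removes duplicate identifications.
    """
    tris = set().union(*tris_runs)
    urea = set().union(*urea_runs)
    return {
        "tris_only": tris - urea,
        "urea_only": urea - tris,
        "both": tris & urea,
    }


# Hypothetical identifiers, not data from this study.
example = compare_identifications(
    tris_runs=[{"P1", "P2"}, {"P2", "P3"}],
    urea_runs=[{"P2", "P4"}, {"P3", "P4", "P5"}],
)
counts = {category: len(members) for category, members in example.items()}
print(counts)
```

The three counts correspond to the three regions of a two-set Venn diagram of the kind shown in Figure 2.12.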

2.3.2.5. GeLC-MS/MS: A comparison to LC-MS/MS

GeLC-MS/MS was directly compared to LC-MS/MS using the same samples (Section 2.2.3.2) and the same starting amount of protein for analysis. For GeLC-MS/MS

Table 2.10. LTQ parameter optimisation.
Columns: Experiment; LC parameters (Sample dilution, LC gradient (min), Injection vol (μL)); LTQ-FT parameters (Precursor scan, No. MS scans); Mascot results (No. queries, Top hit score, Top hit % coverage); SEQUEST/DTA Select results (No. proteins, No. NR peptides, Top hit % coverage).
Experiments 1 to 14b: sample dilution 1:100, 1:80 or 1:60; LC gradient 30 or 60min; injection volume 2.5 or 5μL; LTQ precursor scan.

Table 2.10. continued. LTQ parameter optimisation.
Experiments 15a to 20b: sample dilution 1:20; LC gradient 45 or 75min; injection volume 5μL; LTQ or FT precursor scan; 2 or 4 MS scans.

Grey shaded experiments represent the final parameters used for 10ºC vs. 30ºC experiments. * Indicates that the instrument was not running consistently; there were gaps in the TIC because electrospray was unstable due to sample pooling. MS2, the most intense precursor ion from the MS trace was isolated and fragmented in MS2; MS2-5, the 1st most intense precursor ion from the MS trace was isolated and fragmented in MS2, followed by the isolation and fragmentation of the 1st most intense fragment ion in MS3 and so on. No. protein, number of confident protein identifications. No. NR peptides, number of non-redundant peptide identifications. The number of Mascot queries is the same as the number of unfiltered SEQUEST peptides; these are an indication of the number of spectra generated by MS analysis. The larger the number of queries or unfiltered peptides, the more MS spectra there are in the data. Since these two values were the same, only Mascot query numbers have been provided.


2.3.4.1. LTQ-FT Ultra: Tryptic digest dilution in formic acid

The number of confidently identified proteins using SEQUEST and DTA Select was considerably higher when the tryptic digest was diluted the least in the HCOOH/HFBA solution (Table 2.10 and Figure 2.18). The 1:100, 1:80 and 1:60 dilutions of tryptic peptides performed similarly, with an average of 37, 41 and 37 proteins identified at each dilution ratio, respectively. In contrast, a dilution of 1:20 gave an average of 75 protein identifications; this constituted ~2-fold more identifications regardless of other tested parameters (Figure 2.17).

Figure 2.18. Number of confident protein identifications in optimising tryptic digest dilution in formic acid for LTQ-FT analysis. Tryptically digested samples were diluted in 1% (v/v) formic acid and 0.05% (v/v) HFBA, and proteins were identified using SEQUEST and filtered with DTA Select. The columns represent the average number of proteins identified in 14 experiments, and the error bars represent the standard deviation across these experiments.

2.3.4.2. LTQ-FT Ultra: Sample injection volume

Two sample injection volumes, 2.5μL and 5μL, were investigated for LTQ-FT Ultra optimisation. Using a 5μL injection volume of sample gave ~1.5-fold more confident protein identifications with SEQUEST (Table 2.10 and Figure 2.19), regardless of all other parameters.


Figure 2.19. Number of confident protein identifications in optimising sample injection volume for LTQ-FT analysis. Samples were injected in either 2.5μL or 5μL volumes and analysed by LC-MS/MS. Proteins were identified using SEQUEST and filtered with DTA Select. The columns represent the average number of proteins identified across 15 experiments for 2.5μL injection and 29 experiments for 5μL injection; the error bars represent standard deviation.

2.3.4.3. LTQ-FT Ultra: LC gradient length

Four different RPLC gradients were evaluated for optimal peptide separation with regard to maximising the number of protein identifications. A 75min gradient yielded the most protein identifications, with an average of 83 proteins and 550 peptides identified using SEQUEST and DTA Select (Table 2.10 and Figure 2.20). The number of SEQUEST-identified peptides and proteins using a 60min gradient was lower than with a 45min gradient (Figure 2.20). In contrast, the number of Mascot queries from the 60min gradient experiments was greater than at 45min (Figure 2.21).


Figure 2.20. Number of confident peptide and protein identifications in optimising LC gradient length for LTQ-FT analysis. Peptides were separated by RPLC for 30, 45, 60 or 75min and analysed by LC-MS/MS. Proteins were identified using SEQUEST and filtered with DTA Select. The columns represent average proteins or peptides identified across 12 experiments for 30min, 6 experiments for 45min, 18 experiments for 60min, and 8 experiments for 75min. The error bars represent standard deviation.

Figure 2.21. Number of Mascot queries in optimising LC gradient length for LTQ-FT analysis. Peptides were separated by RPLC for 30, 45, 60 or 75min and analysed by LC-MS/MS. Spectra were searched using Mascot. The columns represent the average number of Mascot queries across 12 experiments for 30min, 6 experiments for 45min, 18 experiments for 60min, and 8 experiments for 75min. The error bars represent standard deviation.

2.3.4.4. LTQ-FT Ultra: Precursor scanning

The performance of precursor scanning, using either the LTQ linear ion trap or the FTICR cell, was compared with respect to the number of Mascot queries generated. Using the ion trap for MS survey scans clearly generated ~4-fold more queries than when the FTICR cell was used (Table 2.10 and Figure 2.22).

Figure 2.22. Number of Mascot queries generated in ion trap vs. FTICR survey scans in LTQ-FT optimisation. The columns represent the average number of queries across 34 experiments for LTQ precursor scans, and 10 experiments for FT Ultra precursor scans. The error bars represent standard deviation.

The performance of the LTQ ion trap vs. FTICR cell precursor scanning was examined, where 8 experiments (15a, 15b, 16a, 16b, 18a, 18b, 20a, 20b; Table 2.10) were compared. Experiment type A involved the analysis of 5μL of a 1:20 formic acid diluted sample separated by a 45min RPLC gradient where 2 MS2 scans were performed (Figure 2.23). Two experiments that used the ion trap to perform survey scans (15a and 15b, Table 2.10) were compared to two experiments that used the FTICR cell to perform survey scans (16a and 16b, Table 2.10). Experiment type B had a 75min LC gradient, and all other settings were the same as experiment type A (Figure 2.23). Two experiments that used the ion trap to perform survey scans (18a and 18b, Table 2.10) were compared to two experiments that used the FTICR cell to perform survey scans (20a and 20b, Table 2.10). The experiments where precursor ions were scanned using the LTQ ion trap consistently gave ~1.5-fold more confident protein identifications after SEQUEST searches and DTA Select filtering against the S. alaskensis database (Figure 2.23).


Figure 2.23. Number of protein identifications in optimising precursor scan parameters for LTQ-FT analysis. Experiment A involved the analysis of 5μL of a 1:20 formic acid diluted sample separated by a 45min RPLC gradient where 2 MS2 scans were performed. Experiment B involved the analysis of 5μL of a 1:20 formic acid diluted sample separated by a 75min RPLC gradient where 2 MS2 scans were performed. In both experiments, precursor scanning was performed using either the LTQ ion trap (grey columns) or the FTICR cell (black columns). The columns represent the average number of confident identifications, and the error bars represent standard deviation.

2.3.4.5. LTQ-FT Ultra: Type and number of MSn scans

The type and number of MSn scans were evaluated with respect to the number of confident protein identifications after SEQUEST searching and DTA Select filtering. On average, there were more confident protein identifications using 4 MS2 scans than 2 MS2 scans; however, the difference was small (Table 2.10 and Figure 2.24). Using MS2-5 scanning gave the fewest protein identifications, ~2.5-fold fewer than 2 and 4 MS2 scans. In addition, the numbers of Mascot queries generated by each of the three MSn scan types were compared (Figure 2.25). Using 2 MS2 scans gave the most Mascot queries, ~1.5-fold more than using 4 MS2 or MS2-5 scans, while 4 MS2 scans and MS2-5 scans performed similarly.


Figure 2.24. Number of confidently identified peptides in optimising number and types of MSn scans in LTQ-FT analysis. Peptides were identified using SEQUEST and filtered with DTA Select. The columns represent the number of confident peptide identifications, the error bars represent standard deviation.

Figure 2.25. Number of Mascot queries generated by varying number and types of MSn scans in LTQ-FT optimisation. The columns represent the number of Mascot queries, and the error bars represent standard deviation.

2.3.5. Data processing optimisation

S. alaskensis cultures (14N 10ºC and 15N 30ºC) were harvested and combined, followed by protein extraction. Approximately 1mg of protein was subjected to SDS-PAGE separation (Figure 2.26B).


Figure 2.26. SDS-PAGE of 14N 10ºC combined with 15N 30ºC proteins for DTA Select and RelEx parameter optimisation. S. alaskensis cultures were grown at 10ºC and 30ºC in 14N and 15N, respectively. A, ~100μg of protein was loaded into 1 lane of a 5 lane 1.5mm gel; these data were only used for RelEx parameter optimisation. B, ~1mg of protein was loaded across 4 lanes of a 1.5mm gel; these data were used for both DTA Select and RelEx parameter optimisation.

The entire gel was sliced into 22 fractions and processed by GeLC-MS/MS using the LTQ-FT Ultra mass spectrometer, and the spectral data were searched against the actual S. alaskensis database and a decoy S. alaskensis database using SEQUEST. These data were used to optimise DTA Select filtering parameters (Table 2.11) and RelEx quantitation parameters (Table 2.12). An additional GeLC-MS/MS experiment was performed with 10-fold less protein (Figure 2.26A) to test whether RelEx quantitation performed consistently with different amounts of starting protein for analysis (Table 2.12).

2.3.5.1. 1% FDR for protein identification: Optimisation of DTA Select parameters

The FDR for protein identification using DTA Select was evaluated according to Peng et al. (2003). As XCorr values were incrementally increased from 1.9 to 3.3, the numbers of singly, doubly and triply charged peptides passing the XCorr threshold were tallied and the FDR was calculated (Table 2.11). The XCorr cut-offs that gave FDR ~1% were 2.1 for singly charged, 2.7 for doubly charged and 3.2 for triply charged peptides.

Table 2.11. Evaluation of FDR using different XCorr cut-off thresholds in DTA Select.

        +1              +2              +3
XCorr   % FDR   n       % FDR   n       % FDR   n
1.9     2.6     194     23.7    9583    65.0    9305
2.0     1.3     157     17.0    8446    53.4    7072
2.1     0.8*    127     12.4    7616    41.0    5517
2.2     0.0     106     8.5     6946    31.2    4479
2.6     -       -       2.0     5284    14.3    3106
2.7     -       -       1.2*    5003    9.8     2855
2.8     -       -       0.7     4716    5.8     2538
2.9     -       -       0.4     4476    3.8     2296
3.0     -       -       0.3     4274    2.3     2180
3.1     -       -       0.3     3995    1.7     2086
3.2     -       -       0.2     3784    1.0*    1998
3.3     -       -       -       -       0.7     1918

Mass spectra were searched using SEQUEST against the actual S. alaskensis database and the decoy S. alaskensis database, both containing 3208 proteins. Trypsin was the selected enzyme with 1 missed cleavage allowed, carbamidomethylation and methionine oxidation were selected as variable modifications, MS tolerance was ±0.8Da, and MS/MS tolerance was ±0.2Da. DTA Select was used to filter identifications, dCN was 0.8 and XCorr values for +1, singly charged peptide species; +2, doubly charged peptide species; +3, triply charged peptide species were tested (first column); FDR, false discovery rate; n, sum of peptides above the XCorr threshold. Entries marked * were those that gave FDR ~1%.
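The choice of cut-offs can be illustrated using the % FDR values in Table 2.11. The sketch below assumes that "FDR ~1%" means the tested XCorr value whose FDR lies closest to 1%; this selection rule and the function name `closest_to_target` are assumptions made for illustration, not a description of how DTA Select was operated.

```python
# % FDR values from Table 2.11, keyed by charge state and XCorr cut-off.
fdr_table = {
    "+1": {1.9: 2.6, 2.0: 1.3, 2.1: 0.8, 2.2: 0.0},
    "+2": {1.9: 23.7, 2.0: 17.0, 2.1: 12.4, 2.2: 8.5, 2.6: 2.0, 2.7: 1.2,
           2.8: 0.7, 2.9: 0.4, 3.0: 0.3, 3.1: 0.3, 3.2: 0.2},
    "+3": {1.9: 65.0, 2.0: 53.4, 2.1: 41.0, 2.2: 31.2, 2.6: 14.3, 2.7: 9.8,
           2.8: 5.8, 2.9: 3.8, 3.0: 2.3, 3.1: 1.7, 3.2: 1.0, 3.3: 0.7},
}


def closest_to_target(fdr_by_xcorr, target=1.0):
    """Return the XCorr cut-off whose % FDR is closest to the target.

    Ties are broken in favour of the lower cut-off (assumed rule).
    """
    return min(fdr_by_xcorr, key=lambda x: (abs(fdr_by_xcorr[x] - target), x))


cutoffs = {charge: closest_to_target(values) for charge, values in fdr_table.items()}
print(cutoffs)
```

Under this assumed rule the selected cut-offs agree with those reported in the text for singly, doubly and triply charged peptides.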

2.3.5.2. Optimisation of RelEx quantitation parameters

The aim of optimising RelEx (MacCoss et al., 2003) quantitation parameters was to establish stringent parameters for protein quantitation of S. alaskensis samples. The data from the ~1mg and ~100μg protein SDS-PAGE gels were both used to optimise RelEx quantitation parameters. Firstly, after searching the spectra from both datasets against the S. alaskensis database using SEQUEST, DTA Select was used to filter the data using the optimised 1% FDR parameters determined in Section 2.3.5.1. Using both datasets, the RelEx parameters evaluated were: 2 or 4 scans before and after peak detection; 0.7, 0.8 or 0.9 regression factor at signal to noise ratio (S/N) of 1; 0.4, 0.5, 0.7 or 0.8 regression factor at S/N of 10; S/N cut-off at 5 or 10; and a minimum number of 1 or 2 peptides for analysis (Table 2.12).

Table 2.12. Optimisation of RelEx parameters.

Integration settings          A      B      C      D      E      F      G      H
Peak detect                   14N    14N    14N    14N    14N    14N    14N    14N
Peak shift                    1      1      1      1      1      1      1      1
Scans before                  4      2      4      2      2      2      4      4
Scans after                   4      2      4      2      2      2      4      4
Threshold factor              0.15   0.15   0.15   0.15   0.15   0.15   0.15   0.15
Apply Savitzky-Golay filter   Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
Number of points              7      7      7      7      7      7      7      7
Apply regression factor       No     No     Yes    Yes    Yes    Yes    Yes    Yes
Min correlation at 1          -      -      0.7    0.7    0.7    0.8    0.9    0.8
Min correlation at 10         0.4    0.4    0.4    0.4    0.4    0.5    0.8    0.7
Apply S/N filter              No     Yes    Yes    Yes    Yes    Yes    Yes    Yes
S/N cutoff                    -      5      5      5      5      10     5      5
Apply min peptide filter      No     No     No     No     Yes    Yes    Yes    Yes
Min number of peptides        -      -      -      -      2      2      2      2
Express relative abundance    Ratio  Ratio  Ratio  Ratio  Ratio  Ratio  Ratio  Ratio
Proteins quantified
  Experiment A                89     88     86     76     48     36     42     52
  Experiment B                732    697    676    598    382    294    308    425

Grey shaded settings represent the final optimised parameters used for 10ºC vs. 30ºC experiments. Experiment A is the ~100μg gel and Experiment B is the ~1mg gel. S/N, signal to noise ratio.

The success of the RelEx parameters was judged by examining the number of proteins quantified, as well as the quality of quantification data. The aim of optimising RelEx parameters was to control false negative and false positive information. Eight different combinations of RelEx parameters were tested against both datasets, where the parameters in columns A to G ranged from least to most stringent, respectively (Table 2.12), and columns F and G were of similar stringency. The parameters in column H were the final optimised settings: where 4 scans before and after peak detection were included; regression of 0.8 was applied at S/N of 1, and 0.7 at S/N of 10; S/N cut-off of 5 was applied; and a minimum of 2 peptides per protein was required for quantitation.
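One way to examine whether the settings behaved consistently across the two protein loads is to express the number of proteins quantified under each setting as a fraction of the least stringent setting (column A). The sketch below uses the counts from Table 2.12; the normalisation itself and the function name `retained_fraction` are illustrative assumptions rather than part of the original evaluation.

```python
# Proteins quantified under settings A to H (Table 2.12).
settings = list("ABCDEFGH")
quantified = {
    "experiment_A": [89, 88, 86, 76, 48, 36, 42, 52],         # ~100 ug gel
    "experiment_B": [732, 697, 676, 598, 382, 294, 308, 425],  # ~1 mg gel
}


def retained_fraction(counts):
    """Express each count relative to the first (least stringent) setting."""
    baseline = counts[0]
    return [count / baseline for count in counts]


retained = {name: retained_fraction(counts) for name, counts in quantified.items()}

for index, setting in enumerate(settings):
    a = retained["experiment_A"][index]
    b = retained["experiment_B"][index]
    print(f"{setting}: experiment A {a:.2f}, experiment B {b:.2f}")
```

With the final settings (column H), both experiments retain roughly 58% of the proteins quantified under column A, despite the 10-fold difference in starting protein.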


2.4. Discussion

2.4.1. Metabolic labelling and 15N APE measurement

A high and consistent 15N APE is required for accurate quantitation because it is the basis of the estimation of amino acid mass shifts in database searching and the estimation of 15N peptide peak m/z distance from the 14N peak for ion chromatogram extraction during quantitation (Beynon & Pratt, 2005; Wu & MacCoss, 2007). It has been suggested that a GC-MS approach is required for precisely measuring 15N incorporation in complex heterogeneous samples (such as brain tissues) (Wu & MacCoss, 2007). This method was deemed unnecessary for homogeneous samples that are easily labelled to completion (such as pure bacterial cultures) (MacCoss et al., 2005). The alternative approach for measuring 15N APE requires the comparison of theoretically predicted isotope distributions of peptides against experimental isotope distributions generated by high resolution precursor ion mass scanning. This alternative approach was used in the current study. The extent to which labelling can be considered ‘complete’ is dependent on the purity of the 15N label. The Cambridge Isotope Laboratories 15N label used in the current study was of > 99% purity, which was factored into the theoretical calculation of APE [1]. The theoretical calculation of 15N APE after 10 generations of growth was 98.9%; interestingly, the experimentally measured 15N APE was slightly higher at an average of 99.6% (Figure 2.4 to Figure 2.11 and Appendix A). Previous metabolic labelling studies have achieved complete 15N labelling with at least 8 generations of growth (MacCoss et al., 2005; Snijders et al., 2005). In the current study, the 15N APE experiments confirmed that a minimum of 10 growth generations achieves maximal (> 99%) 15N incorporation into S. alaskensis cells. It should be noted that culturing S. alaskensis for additional generations in 15N media would potentially increase 15N APE.
However, the improvement of 15N APE with increased generations was deemed not sufficiently significant to merit additional incubation, because of the extended incubation times required for 10 generations of growth at 10ºC (~6 weeks). A control for the theoretical modelling of isotope distributions of selected peptides was the theoretical modelling of the 14N peptide species. All of the theoretical 14N profiles of the peptides that were examined matched the experimentally derived isotopic peaks, thus indicating the approach was reliable.
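The theoretical APE value quoted above can be approximated with a simple dilution model. Equation [1] is not reproduced in this section, so the sketch below is an assumption rather than the thesis calculation: it supposes that the unlabelled inoculum is halved with each generation, that newly synthesised biomass carries 15N at the label purity (taken here as exactly 99%), and that the inoculum has a natural 15N abundance of about 0.366%.

```python
def theoretical_ape(generations, label_purity=0.99, natural_abundance=0.00366):
    """Estimate 15N atomic percent enrichment after growth in 15N medium.

    Assumed model: after n generations a fraction 0.5**n of the biomass
    derives from the unlabelled inoculum (natural 15N abundance) and the
    remainder was synthesised from the label at the stated purity.
    """
    residual = 0.5 ** generations
    return label_purity * (1.0 - residual) + natural_abundance * residual


for n in (1, 5, 8, 10, 12):
    print(n, f"{100 * theoretical_ape(n):.1f}%")
```

Under these assumptions the model returns 98.9% for 10 generations, in agreement with the theoretical value stated in the text, and it shows that the gain from additional generations is small because the estimate is limited by the label purity.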

2.4.2. Optimisation of protein extraction

Sonication was employed as a successful method for the extraction of S. alaskensis proteins for proteomics analyses in previous studies performed in the Cavicchioli laboratory (Fegatella et al., 1999; Ostrowski et al., 2001; Ostrowski et al., 2004). The published sonication times were 2-5min (Fegatella et al., 1999; Ostrowski et al., 2001) and 5min (Ostrowski et al., 2004). The digital Branson sonicator used in the current study was different to those used in the cited publications. Therefore, the aim of evaluating the optimum sonication time was to achieve high protein yield while minimising unwanted protein degradation. Over-sonication of proteins may result in unpredictable protein denaturation (unfolding and breakage) leading to unpredictable peptides after protease digestion (Morel et al., 2000; Einarson & Orlinick, 2002). As a result, peptides may not be accurately identified by MS analysis. It was concluded that maximum cell disruption occurred when the mean protein concentration reached ~2.0-2.5 mg/mL. The optimal sonication time for S. alaskensis extracted in urea was 2min, and 4min in Tris. Urea is capable of disrupting cells by permeabilising the cell wall due to its chaotropic activity. Also, urea breaks non-covalent bonds in proteins by encouraging the solvation of hydrophobic groups in the protein core. Thus, protein denaturation by urea occurs directly and indirectly (Bennion & Daggett, 2003). Direct denaturation involves the hydrogen bonding of urea directly to the polar molecules of the protein, namely the peptide backbone. The peripheral component of the hydrophobic core is bound by urea first, which causes the unfolding of the protein, and consequently allows the infiltration of water to the inner hydrophobic core, followed by urea infiltration and binding (Bennion & Daggett, 2003). 
Indirect urea denaturation of proteins involves the weakening of water-water interactions by urea, thus allowing for the easier solvation of proteins (Bennion & Daggett, 2003). When a urea buffer is combined with sonication, less exposure to sonication is expected to be needed because chemical and mechanical disruption act together. Consistently high protein yields were achieved at shorter sonication times than with the Tris extraction buffer, which required longer sonication times to achieve similar protein yields. The latter is indicative of the performance of mechanical disruption alone.

2.4.3. Assessment of LC-MS/MS for application in quantitative proteomics of S. alaskensis

2.4.3.1. LC-MS/MS: Evaluating RapiGest, starting protein amount and dilution

In an LC-MS/MS analytical platform for protein identification, the μg amount of protein for analysis, the addition of RapiGest and the dilution ratios of tryptic peptides were evaluated. The number of Mascot queries and confident peptide and protein identifications from SEQUEST were used to compare the parameters. The digestion and MS analysis of 50μg initial protein resulted in more protein identifications than with 35μg total protein. Also, the inclusion of RapiGest during trypsin digestion resulted in more queries. In addition, a smaller dilution ratio of 4:1 tryptic digest:HCOOH/HFBA solution yielded more protein identifications than a 1:10 dilution. Samples were initially analysed using a 3D ion trap mass spectrometer (LCQ Deca XP+); however, as a consequence of the generally weak spectra and the low number of confident protein identifications, some samples were analysed using a hybrid QTOF mass spectrometer (QSTAR Pulsar i), a more sensitive mass spectrometer with a higher duty cycle. The aim of using the QSTAR was to eliminate the possibility that poor protein identification was due to an LCQ instrumental problem, and to confirm the low intensity of proteins in the samples. More proteins were expected to be identified using the QSTAR, and digestion with RapiGest yielded ~3-5-fold more proteins (A1, B1; Table 2.4). However, samples without RapiGest had so few proteins identified using the LCQ that the lack of improvement when analysed by the QSTAR was not surprising. Overall, the data indicated that the intensity of the samples analysed was weak. This was unexpected as the samples were whole cell lysates from an organism with 3208 protein coding genes. It was clear that factors other than the starting amount of protein for analysis were affecting protein identification.
Qualitative analysis of the same amounts of protein from the same samples, using a Coomassie stained SDS-PAGE (Figure 2.13), revealed that a large amount of protein was present in the sample. It indicated that the low number of proteins identified could be due to sub-optimal sample preparation or protein extraction methods.


2.4.3.2. LC-MS/MS: Evaluating starting protein amounts, trypsin:protein ratio, protease inhibitor and chelating agent

The starting amount of protein (50μg and 500μg); 1:20 and 1:100 trypsin:protein ratios; and the absence or presence of PMSF and EDTA were evaluated for LC-MS/MS. The number of queries and protein identifications were used to compare the tested parameters. A 500μg starting amount of protein with a 1:20 trypsin:protein ratio for LC-MS/MS analysis resulted in the most protein identifications (Table 2.5). The numbers of proteins identified with the Tris and the Tris PE buffers were similar. The number of queries, however, was higher with the Tris PE buffer (Table 2.5). PMSF is an irreversible serine protease inhibitor and EDTA is a chelating agent that inhibits metalloproteases (North, 1989). It was expected that the addition of these two compounds would inhibit any active endogenous proteases from S. alaskensis, and consequently increase the number of protein identifications. The improvement of the number of queries suggests that there was an increase in the number of spectra corresponding to peptides. This is probably a result of the protection of proteins from endogenous S. alaskensis serine proteases and metalloproteases, which are inhibited by PMSF and EDTA, respectively. However, since the number of identified proteins did not increase with the addition of PMSF and EDTA, it is possible that the query number improvement may correspond to an increase in the sequence coverage of identified proteins. The improvement of query number due to the inclusion of 1mM PMSF and 1mM EDTA led to the decision to include these compounds in all further work. The minimal improvement of query number and protein identifications was not expected when the starting amount of protein for analysis was increased from 50μg to 500μg. It is noteworthy that the samples analysed by LC-MS/MS all had weak spectra, which impacted directly on the low number of queries and confident protein identifications. Usually, this effect is consistent with low starting amounts of protein for analysis.
Further, the amount of injected protein for analysis in LC-MS/MS is limited to the upper limit of detection of the mass spectrometer. This is combined with the care that was taken to avoid contaminating the mass spectrometers with S. alaskensis proteins by injecting too much sample. Overall, the data and observations indicated that the low intensity problem was more complex than simply starting amounts of protein for analysis.


2.4.3.3. LC-MS/MS: Evaluation of Tris vs. urea, injection volume, and LC gradient length

The addition of PMSF and EDTA to Tris and urea was evaluated for LC-MS/MS analysis of proteins. The injection volume was also increased in order to increase the MS peak height of peptides. In addition, the LC gradients were lengthened with the aim of increasing MS peak width in order to compensate for possible peptide co-elutions and MS duty cycle limitations. Increasing sample injection volume and LC gradient generally improved data. In contrast, the separation of a 10μL sample over 90min for samples U3a and U3b (Table 2.6) resulted in an equivalent or lesser number of identified proteins when compared to 5μL separated over 60min. The MS performance of U3a and U3b illustrated that at a sample-specific point (dependent on the amount of protein present in the sample), the further increase of injection volume and LC time did not further improve spectra or the number of identified proteins. LC gradients that are too long negatively impact on peak height, resulting in lowered signal to noise and poorer quality peptide matches. As a further measure of the superior performance of urea PE compared to Tris PE, when all of the proteins from the urea extraction were pooled and compared to the pooled proteins from the Tris extraction, ~3.5-fold more proteins were identified using the former (Figure 2.12). Also, the 6 proteins unique to the Tris buffer may not be only due to a buffer effect; it is known that with each additional MS analysis of the same sample, the number of protein identifications will increase (Link et al., 1999; Goodchild et al., 2004). Compared to the Tris PE extraction buffer, the urea PE buffer generated more spectral data and protein identifications. However, the disadvantage of using a urea-based extraction buffer is its incompatibility with LC-MS/MS. High concentrations of urea in samples affect the quality of mass spectra.
Because urea is a highly positively charged compound, if it is not removed prior to LC-MS/MS analysis, it will release all the contaminants bound to the RP C18 trap into the mass spectrometer, resulting in high background noise and undesired peaks, and contributing to the overall accumulation of contaminants in the mass spectrometer. Samples containing urea should be cleaned up, usually with an offline C18 column or ZipTip. It should be noted, however, that the intensity of the urea PE samples was still low, and in general the number of confident protein identifications corresponded to <1% genome coverage, indicating that a clean-up step to remove urea prior to LC-MS/MS would not significantly benefit
analysis. Finally, the low intensity of the data indicated that the method required more optimisation to achieve comprehensiveness in surveying the proteins in a sample.

2.4.3.4. An in silico investigation of poor LC-MS/MS of S. alaskensis proteins

An in silico analysis of the theoretical S. alaskensis proteome was performed to elucidate the reason for poor protein identification using LC-MS/MS. Many sample preparation optimisation experiments were performed, which generally resulted in weak spectra and few identifications. It was hypothesised that the challenge faced with improving identification could be caused by an anomaly in the proteome. The S. alaskensis proteome was hypothesised to be resistant to trypsin digestion due to a low proportion of R and K residues; alternatively, the tryptic peptides produced after digestion were hypothesised not to be within the optimal range for MS analysis. The S. alaskensis proteome characteristics were compared to a number of organisms known to perform well in MS-based investigation; a comparison was also made to all the sequenced organisms in the NCBI database. To investigate the first hypothesis, the proportion of R and K residues was examined in the S. alaskensis proteome and compared to E. coli, S. cerevisiae, P. angustum and all organisms in the NCBI database. Trypsin cleavage of proteins occurs at R and K residues; therefore, the summed R and K residues were examined (Table 2.9). The proportion of R and K residues in all tested organisms and the NCBI database was found to be very similar, thus indicating that the proteome of S. alaskensis should not be resistant to trypsin proteolysis. It should be noted that this analysis does not account for species-specific protein folding, where R and K residues may become inaccessible to trypsin. The use of a Tris or Tris PE buffer would be an example of this situation, because Tris has no protein unfolding action.
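The two in silico checks described in this section can be illustrated with a minimal sketch. It is not the procedure used in this study: the function names are hypothetical, the digestion rule is simplified to cleavage C-terminal to every K or R with no missed cleavages, and the 7-25 residue window is taken from the optimal peptide length quoted in this section.

```python
def kr_fraction(proteins):
    """Fraction of all residues that are K or R across a list of sequences."""
    total = sum(len(p) for p in proteins)
    kr = sum(p.count("K") + p.count("R") for p in proteins)
    return kr / total if total else 0.0


def tryptic_peptides(protein):
    """Simplified in silico digest: cleave after every K or R.

    Assumption: no missed cleavages and no special handling of proline.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def fraction_in_range(proteins, lo=7, hi=25):
    """Fraction of tryptic peptides whose length lies within [lo, hi]."""
    peptides = [pep for p in proteins for pep in tryptic_peptides(p)]
    if not peptides:
        return 0.0
    return sum(lo <= len(pep) <= hi for pep in peptides) / len(peptides)
```

Applied to whole theoretical proteomes, `kr_fraction` corresponds to the first hypothesis (summed R and K content) and `fraction_in_range` to the second (proportion of peptides within the MS-friendly length window).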
The use of a protein extraction buffer such as urea or urea PE is expected to unfold globular proteins and generally make proteins more amenable to digestion, thereby resulting in increased protein identification data. The outcomes from the LC-MS/MS experiments followed these expectations; however, the number of queries or spectra and the number of confident protein identifications were far below what was expected from a whole cell lysate. Additionally, the intensity of all samples analysed was consistently weak, i.e. low S/N. To address the second hypothesis, where S. alaskensis tryptic peptides were expected to be outside the optimum range for MS analysis, tryptically proteolysed
S. alaskensis peptides were generated in silico, and compared to E. coli and S. cerevisiae. The optimal peptide length for MS analysis is ~7-25 amino acid residues (Siuzdak, 1996). Very small peptides have small m/z values and may therefore fall below the detection limit of the mass spectrometer; similarly, very large peptides have large m/z values that can also be beyond the detection range of the mass spectrometer. Although the three organisms examined have different sized theoretical proteomes, where S. alaskensis, E. coli and S. cerevisiae have 3,208, 4,391 and 5,869 protein coding genes, respectively, the theoretical tryptic peptide profiles of these organisms were identical, with ~50% of all tryptic peptides between 7-25 residues long. Therefore, the data demonstrated that S. alaskensis tryptic peptides were amenable to MS analysis, and that poor performance of LC-MS/MS was not due to sub-optimal peptide length. The in silico analyses of the S. alaskensis proteome clearly showed that the frequency of R and K residues and the trypsin digested peptide lengths were similar to those of a range of organisms and the NCBI database. Most significantly, E. coli and S. cerevisiae are model organisms commonly used in MS analyses, and do not have problems with trypsin digestion. Therefore, the low intensity samples generated by gel-free LC-MS/MS cannot be attributed to an anomaly in the proteome of S. alaskensis.

2.4.3.5. Outcome of optimising LC-MS/MS for application in a quantitative proteomics analysis of S. alaskensis

Many of the parameters investigated were tested in parallel with other parameters due to the expense and limited number of hours of machine time. Regardless of the mixed effects of multiple parameters on the performance of LC-MS/MS, it is clear that there was a challenge in achieving a comprehensive survey of the proteins present in the samples. The experiments evaluating LC-MS/MS as a method for analysis of S.
alaskensis clearly demonstrated that the low intensity of spectra, which resulted in low Mascot query numbers and low SEQUEST and DTA Select filtered peptide and protein identifications, was not a simple problem, and that the most likely solution lies in a multifaceted approach for optimising sample preparation. Further experimental options to improve sample quality could include a change in the cell disruption approach. The sonication time required to optimally disrupt S. alaskensis cells is very long when compared to other organisms such as Escherichia coli (~30s to 1min) (Harlow & Lane, 1999). Sonication has been documented to cause local heating in the sample, resulting in protein denaturation and aggregation (Harlow &
Lane, 1999); however, this probably did not occur in the above samples because a GeLC-MS/MS platform was successful in generating many confident protein identifications from the same samples that performed poorly in LC-MS/MS. Regardless of this, it would be of interest to investigate the difference in LC-MS/MS performance, if any, with a different method of cell disruption such as French press or detergent lysis. Further parameters for optimisation could include modifying the protein extraction buffer composition and adding extra clean-up procedures such as TCA precipitation. Optimising LC-MS/MS to achieve strong MS signals and acceptable numbers of protein identifications (in the hundreds to thousands) was considered to be a large task and unnecessary. A GeLC-MS/MS approach for analysis was also evaluated, and met the requirements for application in the quantitative proteomics study of S. alaskensis proteins (Section 2.4.3.6).

2.4.3.6. GeLC-MS/MS: A comparison to LC-MS/MS

To directly compare the performance of a gel-free LC-MS/MS vs. GeLC-MS/MS platform, the same samples (A and B) were analysed by GeLC-MS/MS. GeLC-MS/MS identified 98 and 72 proteins, with high confidence, for samples A and B, respectively. For a direct comparison with LC-MS/MS, samples A2 and B2 (Table 2.4) had the most similar conditions tested: no RapiGest, and all LCQ parameters were identical. However, only 1 and 6 proteins were identified from samples A2 and B2, respectively. Even with different parameters expected to improve protein identification numbers for LC-MS/MS (A3-B6, Table 2.4), there were generally ~2- to 6-fold more identifications using GeLC-MS/MS (Table 2.4, Table 2.5 and Table 2.6). GeLC-MS/MS was expected to generate more protein identifications because of the extra gel-based dimension of protein separation.
Comparing GeLC-MS/MS to LC/LC-MS/MS instead of LC-MS/MS would have been a more realistic comparison, because the extra dimension of separation results in additional protein identifications. However, the overall weak intensity and low number of protein identifications using LC-MS/MS alone did not merit an extra dimension of LC. Thus, analysing the samples by LC/LC-MS/MS would not have improved protein identifications due to the absence of strong peptide signals. The ideal method for sample preparation was a gel-free approach where most of the steps are automated because it is fast and minimises sample handling. One of the primary reasons for the reduction of sample handling is to avoid loss of proteins, and
consequently quantitative information of the S. alaskensis proteome. It should be noted, however, that quantitative inaccuracies caused by sample handling error were expected to be negated once the 14N and 15N samples were combined; this is one of the advantages of metabolic labelling, where errors or loss would occur in parallel for both conditions tested. Consequently, protein loss would most probably not result in quantitative errors, but in a general loss of information about cell biology. The advantage of using a GeLC-MS/MS platform was that the lysate from every experiment was visualised by SDS-PAGE. The efficiency of protein extraction and separation could be qualitatively confirmed with every gel run. Further, using a gel-based separation of proteins, where proteins were boiled in SDS and DTT, can be more successful than an LC-based separation of proteins or peptides. Proteins unfolded by SDS and thermal denaturation are more vulnerable to proteolytic attack; therefore, more proteins were expected to be available for tryptic digestion, resulting in more identifications from MS analysis using GeLC-MS/MS.

2.4.3.7. GeLC-MS/MS: Tris PE vs. urea PE

Increasing the culture volume and starting amount of protein for a larger capacity SDS-PAGE gel directly resulted in a consistent increase in the number of confidently identified proteins. To illustrate this point, 98 and 72 proteins were confidently identified by GeLC-MS/MS analysis of 35µg total protein from a 50mL culture (Table 2.7), while 235 and 230 proteins were identified from 1mg of total protein from a 250mL culture (Table 2.8). In addition, using Tris PE or urea PE for protein extraction generated very similar numbers of confidently identified proteins (Table 2.8 and Figure 2.15), as opposed to gel-free LC-MS/MS, where there was a clear skew in the number of protein identifications towards a urea extraction.
A large proportion of confidently identified proteins were extracted in both Tris PE and urea PE buffers, and equal proportions of proteins were unique to either Tris PE or urea PE (Figure 2.15). This observation clearly supports the use of both Tris- and urea-based extraction buffers in order to increase proteome coverage.

2.4.3.8. Outcome of optimising GeLC-MS/MS for application in a quantitative proteomics analysis of S. alaskensis

GeLC-MS/MS as an analytical platform outperformed LC-MS/MS and was determined to be the best option for a comprehensive quantitative survey of S. alaskensis. The advantages of using GeLC-MS/MS include the tolerance of SDS-PAGE to urea and the qualitative visualisation step in separating proteins using polyacrylamide gels. A disadvantage, however, is the labour intensive experimental protocol. Nevertheless, the large number of high quality and confidently identified peptides and proteins using GeLC-MS/MS is convincing evidence to support the use of GeLC-MS/MS for quantitative analysis of the S. alaskensis proteome.

2.4.4. Mass spectrometry optimisation

The LCQ Deca XP+ was the primary mass spectrometer used for evaluating LC-MS/MS vs. GeLC-MS/MS and optimising the sample preparation protocols. In 2006, one year after commencement of the current study, an LTQ-FT Ultra was commissioned at the Bioanalytical Mass Spectrometry Facility at UNSW. Due to the superior performance of the instrument, the MS analysis of S. alaskensis samples was transferred to the LTQ-FT Ultra. LC and MS optimisation experiments were performed prior to using the LTQ-FT Ultra for quantitative analysis of biological samples.

2.4.4.1. Comparison of parameters

The performance of each tested LC and MS parameter was compared and judged by the number of Mascot queries and the number of confident protein identifications from Mascot and SEQUEST searches against the S. alaskensis database (Table 2.10). These were the most important results with respect to LC and MS parameters. Since SEQUEST and DTA Select output files were required for RelEx protein quantitation, the numbers of non-redundant peptides and proteins after SEQUEST database interrogation and DTA Select filtering were central in assessing the performance of the tested parameters. Although the data generated from Mascot database searches cannot be directly quantified using RelEx, these searches were still performed in order to gain extra information regarding parameter optimisation. In addition, the number of Mascot queries provided an indication of the number of spectra generated in each experiment, where the larger the number of queries or unfiltered peptides, the more MS spectra present in the data. It should be noted, however, that the number of Mascot queries is
the same as the number of unfiltered SEQUEST peptides. Since these two values were the same, only the Mascot query numbers were provided (Table 2.10). Secondary to the numbers of queries, peptides and proteins, the protein score and percent coverage for the top protein hit were also provided for Mascot and SEQUEST searches (Table 2.10). It was decided that although these data were important in protein identification studies, they were not effective in differentiating the performance of the LC and MS parameters tested. For example, all of the protein scores for the Mascot top hits were very similar, and did not show any trend according to the tested parameters (Table 2.10).

2.4.4.2. LTQ-FT Ultra: LC parameter optimisation

A smaller dilution of tryptically digested peptides in the HCOOH/HFBA solution resulted in more protein identifications (Table 2.10 and Figure 2.18). Similarly, injecting more sample volume resulted in more protein identifications (Table 2.10 and Figure 2.19). The primary aim of testing four different sample dilution ratios and two sample injection volumes, from the least to the greatest amount of protein, was to avoid the signal saturation and instrument contamination that result from introducing excessive amounts of protein into the mass spectrometers. After testing four different LC gradients, a 75min gradient generated the most protein identifications. The longer the separation gradient, the more data can be obtained due to the effect of broadening peptide peaks in MS1. However, this feature is dependent on the initial amount of peptide, where insufficient amounts of starting sample or overly long gradients decrease the maximum intensity of peptide peaks. The reduction in peak intensity can result in a reduction of S/N to the point of losing protein identifications. The numbers of identified peptides and proteins from SEQUEST were primarily used to assess the performance of increasing LC gradient lengths.
Unexpectedly, a 60min gradient did not yield more protein identifications when compared to a 45min gradient (Table 2.10 and Figure 2.20). A consistent increase in the number of confidently identified proteins was expected from 30min to 75min. The expected increase was present on inspection of the average number of Mascot queries generated by each gradient length (Figure 2.21). Since Mascot queries correlate with the number of spectra in each experiment, it is clear that in general, more spectra were generated as the LC gradient time increased. The most likely reason for the decreased number of
identified proteins and peptides using a 60min vs. 45min gradient was that 45min samples were analysed at higher concentrations than 60min samples. Specifically, all 45min samples were diluted 1:20 in HCOOH/HFBA, while 60min samples were diluted 1:60, 1:80 or 1:100 (Table 2.10).

2.4.4.3. LTQ-FT Ultra: MS method optimisation

The comparison of the performance of LTQ ion trap vs. FTICR cell survey scans clearly demonstrated that using the ion trap for MS scans generated more Mascot queries and protein identifications. For this reason, and to allow for closer comparisons with the data already collected from the LCQ, it was decided to use the linear ion trap to obtain survey scans for quantitative measurements. Many groups have successfully used MS scans from an LCQ (e.g. MacCoss et al., 2003) or LTQ for quantitative studies (e.g. Zybailov et al., 2005). The aim of optimising the number and type of MSn scans was to maximise quantitative information. Peptides were identified in SEQUEST using fragment ion information, i.e. MS2 spectra, while peptides were quantified in RelEx using precursor ion information, i.e. MS1 spectra. Therefore, to maximise quantitative information it was expected that reducing the number of MS2 scans to increase the frequency of survey scanning would increase data. However, a balance was sought between maximising protein quantitation and maximising identifications, as RelEx cannot quantify proteins without identification information. Therefore, sufficient MS2 scans were also required in order to achieve high quality confident identifications. Regardless of whether the ion trap or FTICR cell was used for precursor ion scanning, more protein identifications were generated with 4 MS2 scans, while more quantitative information was generated with 2 MS2 scans (Table 2.10). The numbers of peptide and protein identifications were examined in combination with the number of queries.
The number of queries was used as an estimate for the quantity of data available for quantitation because the number of queries represents the number of possible peptides in each experiment. As the number of queries increases, so does the number of candidate peptides for quantitation. It is noteworthy that the number of protein identifications using 2 MS2 was comparable to 4 MS2 (Table 2.10). However, with 2 MS2 scanning there was a significant improvement in query number; therefore, 2 MS2 scanning was determined to be the most favourable for the purposes of this project.

In addition, MS2-5 scanning was performed to investigate its effect on identification and quantitation of proteins. As expected, the numbers of identifications were lower using MS2-5 scanning when compared to only MS2 scanning. MSn scans above MS2 yield additional structural information of selected precursor ions, resulting in their application in top down proteomics (Macek et al., 2006), glycosylation studies (Khidekel et al., 2004), and the structural analysis of small molecules (Miraglia et al., 2002), carbohydrates (Ashline et al., 2005), and (Ejsing et al., 2006). Since these applications were not the aim of the current study, it is consistent that MS2-5 scanning did not produce as many queries or protein identifications when compared to only MS2 analysis. This could be attributed to the loss of MS2 information, which was required by Mascot or SEQUEST for protein identifications, while MS3, MS4 and MS5 scans were performed.

2.4.4.4. Optimisation of RelEx quantitation parameters

The aim of optimising RelEx (MacCoss et al., 2003) was to establish a set of stringent parameters in order to confidently measure the differential abundance of proteins in S. alaskensis at 10ºC vs. 30ºC. The success of the RelEx approach was judged by examining the number of proteins quantified, as well as the quality of quantification data. A balance was sought between accepting too many false positives and incurring too many false negatives. Experiment B (~1mg protein) was used to evaluate eight different combinations of RelEx parameters, and setting H was determined to successfully control both false positives and negatives and give high quality and confident data for protein quantitation (Table 2.12). The eight combinations of RelEx parameters were also tested against experiment A (~100µg protein) to evaluate the performance of protein quantitation from a small starting sample.
To evaluate the best combination of RelEx parameters, eight different combinations of parameters were applied to the data from Experiment B (Table 2.12). Peak detection was set to the default of 4 scans before and after. A large peak detection window would increase the chance of erroneous quantitative measurements; thus, a 2 scan window was tested (Settings B, D, E and F; Table 2.12). When compared to the 4 scan parameter settings, less protein quantitative information resulted. For example, a 4 scan peak detection window in setting H resulted in 425 proteins with confident quantitative information, and a 2 scan peak detection window in setting E gave 382 proteins. It should be noted that setting H had a stricter minimum correlation value set for ratios 1
and 10, clearly demonstrating that reducing the scan window contributed significantly to quantitation stringency, regardless of other parameters. It is most likely that all the resulting proteins from a 2 scan peak detection window will be correct; however, the compromise in using such a stringent parameter is the loss of correct quantitative information that lies beyond a 2 scan peak detection window. Thus, an indiscriminate small peak detection window was deemed too strict, due to the loss of information (i.e. unacceptable false negatives); therefore, a more tolerant 4 scan peak detection window was chosen. To increase stringency, the application of a regression factor was required. A range of minimum correlation factors for regression was tested. Parameter settings C, G and H were directly comparable to each other, as all other parameters were the same (except that C did not have a minimum 2 peptide requirement) (Table 2.12). The lack of a 2 peptide minimum for quantitation using parameter setting C was expected to increase the resulting number of quantified proteins; however, this was not expected to confound the comparison of minimum correlation. The increase of minimum correlation factors consistently decreased the resulting number of quantified proteins due to an increase in quantitation stringency. However, the minimum correlation in parameter setting G (0.9 at ratios 1 and 0.8 at ratios 10) was deemed too strict, and was relaxed to 0.8 at ratios 1 and 0.7 at ratios 10 for parameter setting H. A minimum of 2 peptides per protein was required for quantitation in order to increase confidence in the measurements. A single peptide measurement is regarded as statistically unreliable and a minimum of 2 peptides is generally accepted as requisite for confident quantitation of proteins (e.g. Lacerda et al., 2008; Usaite et al., 2008; Xun et al., 2008).
In addition, an S/N filter was evaluated with respect to maximising the number of quantified proteins while maintaining stringent conditions in rejecting peptides with signal intensities similar to the background noise. S/N values of 5 and 10 were tested, and it was established that using an S/N 10 cut-off was likely to result in a large proportion of false negatives (Table 2.12). This was illustrated by the comparison of parameter setting F to H, where all other parameters were the same (except for minimum correlation for regression at 10). Approximately half of the quantified proteins were rejected when S/N was increased from 5 (setting H) to 10 (setting F). Also, a recent publication established that over half of all confidently identified peptides had S/N <10 in a model large-scale SILAC experiment, and that quantitation accuracy was closely correlated with S/N (Bakalarski et al., 2008).
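How the S/N cut-off, minimum correlation and 2 peptide minimum interact can be illustrated with a minimal sketch. This is not RelEx itself: the function name and the tuple layout are hypothetical, and the ratio-dependent correlation thresholds are reduced here to a single `min_corr` value for simplicity.

```python
from collections import defaultdict


def quantifiable_proteins(measurements, min_sn=5.0, min_corr=0.8, min_peptides=2):
    """Return proteins that keep at least `min_peptides` peptides after filtering.

    measurements: iterable of (protein, peptide, signal_to_noise, correlation)
    tuples. This layout is an illustrative assumption, not a RelEx file format.
    """
    kept = defaultdict(set)
    for protein, peptide, sn, corr in measurements:
        # Reject peptide measurements close to background noise or poorly correlated.
        if sn >= min_sn and corr >= min_corr:
            kept[protein].add(peptide)
    # Require a minimum number of distinct peptides per protein.
    return {prot: sorted(peps) for prot, peps in kept.items() if len(peps) >= min_peptides}
```

In this sketch, raising `min_sn` from 5 to 10 removes peptides first and then whole proteins once they fall below the 2 peptide minimum, which is one way a stricter S/N cut-off can sharply reduce the number of quantified proteins.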

The eight different combinations of RelEx parameters were tested against Experiment A to compare the performance of quantitation using a sample with 10-fold less starting protein. The trends observed with the different parameter settings were identical to those observed with Experiment B. Moreover, the 10-fold decrease in the number of quantified proteins closely correlated with the 10-fold decrease in starting protein. This indicated that, unlike the LC-MS/MS experiments, the processing of S. alaskensis proteins using a GeLC-MS/MS platform results in predictable numbers of identified and quantified proteins. Finally, the analysis of 100µg of protein clearly indicated that increasing the starting amount of protein for SDS-PAGE separation results in more protein identification and quantitation data. Thus, ~1mg of protein was determined to be the preferred amount for analysis.

2.4.5. Final protocol for quantitative proteomics analysis of S. alaskensis proteins using a GeLC-MS/MS platform

2.4.5.1. Cell culture and metabolic labelling

Cells were grown in 250mL cultures in 14NH4Cl or 15NH4Cl ASW and allowed to grow for 10 generations. Growth was monitored by spectrophotometry, and cells at mid-logarithmic phase (OD433 0.3) were harvested as described in Section 2.2.1.

2.4.5.2. Cell disruption

Cell pellets were resuspended in 1.25mL of 10mM Tris-HCl pH 8.0, 1mM PMSF, 1mM EDTA or in 8M urea, 1mM PMSF, 1mM EDTA in a 1.5mL tube; the same buffer was used for each inverse metabolic labelling experiment pair. Pellets were combined and disrupted by sonication on ice for 4min in Tris PE or 2min in urea PE. Cell debris was pelleted by centrifugation at 30,670g at 4ºC for 25min, and discarded.

2.4.5.3. GeLC-MS/MS

Approximately 1mg of protein was subjected to separation using a 5 lane 1.5mm SDS-PAGE gel and the entire 4 lanes were sliced horizontally into ~22 fractions, washed, reduced, alkylated and digested with trypsin in a 1:20 trypsin:protein ratio (Section 2.2.3.5). Tryptic peptides dried in vacuo were resuspended in 20µL of 1% (v/v) formic acid and 0.05% (v/v) HFBA and diluted 1:20 in formic acid and HFBA. The samples (5µL) were then separated using a nano-LC on an Ultimate 3000 HPLC and autosampler system. They were concentrated and desalted on a micro C18 precolumn (500µm × 2mm) with H2O/ACN (98:2, 0.05% (v/v) HFBA) at 20µL/min. After a 4min wash, the precolumn was switched (10 port valve, Valco) into line with a fritless nano column (75µm × ~10cm) containing C18 media (5µm, 200Å Magic; Michrom Bioresources, USA). Peptides were eluted using a linear gradient of H2O/CH3CN (98:2,
0.1% (v/v) formic acid) to H2O/CH3CN (55:45, 0.1% (v/v) formic acid) at 250nL/min over 75min. High voltage (1.8kV) was applied to a low volume tee (Upchurch Scientific, USA) and the column tip was positioned ~0.5cm from the heated capillary (T = 200°C) of an LTQ-FT Ultra mass spectrometer. Positive ions were generated by electrospray and the mass spectrometer was operated in data dependent acquisition mode. A survey scan was collected by the LTQ linear ion trap (350-1750 m/z) followed by 2 ion trap MS2 scans, where the 1st and 2nd most intense precursor ions from the MS trace were sequentially isolated and fragmented using CID at 35% normalized collision energy with an activation q = 0.25 and activation time of 30ms, with a minimum required signal of 2000 counts. Dynamic exclusion was enabled, where after a maximum of 2 repeated MS2 scans, the parent ion was excluded for 3min.

2.4.5.4. Protein identification and quantitation

MS data were interrogated against the S. alaskensis database using SEQUEST, with trypsin as the selected enzyme and an allowance for 1 missed cleavage; carbamidomethyl, methionine oxidation and acrylamide were selected as variable modifications; MS tolerance was ± 0.8Da; and MS/MS tolerance was ± 0.2Da. DTA Select was used to filter identification data, where accepted peptides were required to have dCn ≥0.08 and Xcorr ≥2.1, 2.7 and 3.2 for +1, +2 and +3 charged peptide species, respectively. Proteins were quantified using RelEx, where 4 scans before and after peak detection were included; regression of 0.8 was applied at 14N/15N ratios 1 and 0.7 at ratios 10; an S/N cut-off of 5 was applied; and a minimum of 2 peptides was required for quantitation.
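The charge-dependent identification filter can be sketched as follows. This is an illustration only, not DTA Select: the names are hypothetical, the quoted values are assumed to be minimum thresholds, and rejecting charge states other than +1 to +3 is an assumption made for the sketch.

```python
# Thresholds quoted in the final protocol; treated here as minimum values (assumption).
XCORR_MIN = {1: 2.1, 2: 2.7, 3: 3.2}
DCN_MIN = 0.08


def passes_identification_filter(xcorr, dcn, charge):
    """Return True if a peptide match meets the Xcorr and dCn thresholds."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        # Assumption: charge states outside +1 to +3 are not accepted.
        return False
    return xcorr >= threshold and dcn >= DCN_MIN
```

For example, a doubly charged peptide with Xcorr 2.8 and dCn 0.10 would pass, whereas the same scores on a triply charged peptide would not, because the Xcorr threshold rises with charge state.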

2.4.6. Conclusion

As this study represented the first example of the application of metabolic labelling-based quantitative proteomics to S. alaskensis, a range of experimental and post-experimental parameters were evaluated and optimised for analysis. The aim of these method development experiments was to determine optimum sample preparation, LC and MS parameters to yield confident and high quality data in order to examine the cold adaptation biology of S. alaskensis, by comparing protein profiles at 10ºC vs. 30ºC. Cell culture, protein extraction, GeLC-MS/MS analysis and post-experimental data processing were optimised to give a standard proteomics workflow applicable to S. alaskensis samples.

Chapter 3. Normalisation and statistical analysis of quantitative proteomics data generated by metabolic labelling

3.1. Summary

Quantitative proteomics is a powerful analytical method for monitoring the responses of biological systems to changes in growth parameters. To make confident inferences about biological responses, it is essential that proteomics approaches incorporate appropriate statistical measures of the data. A critical issue in quantitative proteomics is the lack of a general standard for determining the statistical significance of differential abundance. To create a rigorous statistical approach, a range of practical criteria relevant to proteomics experiments need to be considered. These include accounting for biological and technical replicates (Karp et al., 2005; Chich et al., 2007), sample pooling (Karp et al., 2005), normalisation within and between experiments (Kreil et al., 2004; Callister et al., 2006; Paoletti et al., 2006), accounting for “the missing data problem” (Chang et al., 2004; Jung et al., 2005; Chich et al., 2007; Pedreschi et al., 2008), and deciding on which statistical testing approaches are most suitable for dealing with false positives, false negatives and/or false discovery rates (FDR) in protein identification (Choi & Nesvizhskii, 2008; Käll et al., 2008) and quantitation (Karp et al., 2007). Also, the multiple testing problem has rarely been considered in quantitative proteomics studies to date. In this section, microarray-based normalisation and statistical analysis (significance testing) methods were employed to analyse quantitative proteomics data generated from the 14N/15N metabolic labelling of S. alaskensis. To test approaches for normalisation, cells were grown at a single temperature and metabolically labelled with 14N or 15N, and samples were combined in controlled ratios to give artificially skewed datasets. Inspection of MA plots determined that a fixed-value median normalisation was most suitable for the data.
To determine an appropriate statistical method for assessing differential abundance, a fold change approach, Student’s t-test, unmoderated t-test and empirical Bayes moderated t-test were applied to a large proteomics dataset from cells grown at two temperatures. Inverse metabolic labelling was used with
multiple technical and biological replicates, and proteomics was performed on cells that were combined based on equal optical density of cultures (which resulted in bias due to different light scattering properties of the cultures when grown at different temperatures) or on cell extracts that were combined to give equal amounts of protein (which removed bias). A total of 2,135 high confidence protein identifications representing 66% of the coding capacity of the genome were made and 1,172 proteins with high quality data were quantified. To account for arbitrarily complex experiment-specific parameters, a linear modelling approach was used to analyse the data using the limma package in R/Bioconductor. A high quality list of statistically significant differentially abundant proteins was obtained by using lowess normalisation (after inspection of MA plots) and applying the empirical Bayes moderated t-test. The approach also effectively controlled for the number of false discoveries, and corrected for the multiple testing problem using the Storey-Tibshirani FDR. The approach developed is generally applicable to quantitative proteomics analyses of diverse biological systems.
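The MA values and the fixed-value median normalisation mentioned in this summary can be illustrated with a minimal sketch. It is not the limma workflow used in this chapter: the function names are hypothetical, and the inputs are assumed to be paired, strictly positive 14N and 15N abundance measurements for the same proteins.

```python
import math
from statistics import median


def ma_values(light, heavy):
    """Compute M (log2 ratio) and A (mean log2 abundance) for paired measurements.

    light, heavy: equal-length sequences of positive 14N and 15N abundances
    (an illustrative input format, not a specific software output).
    """
    m = [math.log2(l / h) for l, h in zip(light, heavy)]
    a = [0.5 * math.log2(l * h) for l, h in zip(light, heavy)]
    return m, a


def median_normalise(m):
    """Fixed-value normalisation: subtract the median M from every M value."""
    shift = median(m)
    return [value - shift for value in m]
```

Plotting M against A before and after the shift gives the MA plot. A fixed-value shift is only appropriate when the bias does not depend on A; an intensity-dependent trend would call for a curve-based method such as lowess instead.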

3.2. Materials and methods

3.2.1. Microbial growth and physiological characterisation

S. alaskensis was grown in artificial sea water (ASW) medium (Eguchi et al., 1996) at 10ºC and 30ºC with rotary shaking at 100 rpm as previously described (Fegatella et al., 1998). Colony forming units were measured by the drop plate method on VNSS media as previously described (Eguchi et al., 1996; Fegatella et al., 1998). Protein yields were measured spectrophotometrically, using a Bradford assay with BSA as a standard (Bradford, 1976). Scanning electron microscopy was performed with cells grown at 10ºC or 30ºC to mid-log phase. Cells were fixed with 2% (w/v) glutaraldehyde for 2 h and filtered through a 0.2 µm membrane filter. The filters were washed with 75%, 50%, and 25% ASW that contained 0.04% (w/v) MOPS for 3 min in each washing step. The cells were dehydrated in a graded series of 30%, 50%, 70%, 80%, 90% and 95% ethanol for 10 min. The final dehydration step was performed three times for 10 min with 100% ethanol. Samples were critical point dried with carbon dioxide using a BAL-TEC CPD 030 critical point dryer (BAL-TEC, Liechtenstein) and were sputter coated with chromium to ~0.5 nm thickness using a K575X Peltier cooled high resolution sputter

98 L. Ting, UNSW. Normalisation and statistical analysis of quantitative proteomics data

coater (Emitech, UK). Coated samples were viewed under vacuum at an accelerating voltage of 15 kV and magnifications from 2,500 to 35,000 on a Hitachi S-3400N automated vacuum pump scanning electron microscope (Hitachi, Japan). From the electron micrographs of each growth temperature, 100 cells were selected using a 36 point sampling grid used in unbiased stereology, to visually compare and measure cell biovolume (V), using equation [1].

$V = \frac{\pi}{4} \times W^{2} \times \left(L - \frac{W}{3}\right)$ [1]

where W is the width and L is the length of the cell (Krambeck et al., 1981). All scanning electron microscopy work was performed by Seah Lay Hoon in the School of Biotechnology and Biomolecular Sciences, UNSW.
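As an illustration only, the biovolume calculation can be sketched in Python. The sketch assumes that equation [1] has the usual form for a rod with hemispherical ends, V = (π/4) × W² × (L − W/3); the function name and the example dimensions are hypothetical and are not measurements from this work.

```python
import math


def biovolume(width_um: float, length_um: float) -> float:
    """Biovolume (cubic micrometres) of a rod with hemispherical ends.

    Assumed form of equation [1]: V = (pi / 4) * W^2 * (L - W / 3).
    """
    if width_um <= 0 or length_um < width_um:
        raise ValueError("expected 0 < width <= length")
    return (math.pi / 4.0) * width_um ** 2 * (length_um - width_um / 3.0)


# Hypothetical cell dimensions in micrometres, for illustration only.
example_volume = biovolume(0.4, 1.2)

# Sanity check: when W equals L the formula reduces to the volume of a sphere.
sphere_volume = biovolume(1.0, 1.0)
```

When the width equals the length, the expression reduces to πd³/6, the volume of a sphere, which is a convenient check on the implementation.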

3.2.2. Metabolic labelling

Cells grown at 10ºC and 30ºC were metabolically labelled during growth in unlabelled (14NH4Cl) and labelled (99% enriched 15NH4Cl) media where all other sources of N had been eliminated. Cells grown in 15NH4Cl ASW were labelled to 99% 15N incorporation in 10 generations of growth. 14N and 15N cells were combined after cell harvest at mid logarithmic growth phase of OD433 0.3 (Fegatella et al., 1998). To produce an artificially skewed data set, cells were grown at 30ºC with 14N or 15N and cell pellets combined at OD433 0.3 in 0.8:1, 1:1, and 1.2:1 ratios in triplicate (total of 9 experiments) (Figure 3.1). To assess differential abundance, cells were grown at 10ºC or 30ºC, comprising six biological replicates and a total of 20 experiments. For four biological replicates (A-D), with two experimental replicates and two MS instrumental replicates per sample (16 experiments of 14N/15N and inverse 15N/14N labelling), 10ºC and 30ºC cell pellets were combined 1:1 from cultures harvested at OD433 0.3 (Figure 3.2). For an additional two biological replicates (E-F), with two MS instrumental replicates per sample (four experiments), cells were cultured as above, proteins were extracted separately from the 14N and 15N, 10ºC and 30ºC samples, and the extracts were combined 1:1 (14N/15N and inverse 15N/14N, 10ºC and 30ºC) based on protein concentration (Figure 3.3).

Figure 3.1. Inverse metabolic labelling workflow to produce an artificially skewed data set. S. alaskensis cells grown at 30ºC were metabolically labelled during growth in unlabelled (14NH4Cl) and labelled (99% enriched 15NH4Cl) media where all other sources of N had been eliminated. Samples were combined in 14N/15N 0.8:1, 1:1, and 1.2:1 ratios in three biological replicates representing nine technical replicates in a Tris PE protein extraction buffer. GeLC-MS/MS was used as a platform for protein separation and MS analysis, followed by protein identification, quantitation, and intra-experimental normalisation and statistical testing.

Figure 3.2. Inverse metabolic labelling workflow of experiments A-D. S. alaskensis cells grown at 10ºC and 30ºC were metabolically labelled during growth in unlabelled (14NH4Cl) and labelled (99% enriched 15NH4Cl) media where all other sources of N had been eliminated. 10ºC and 30ºC samples were combined 1:1 14N/15N as cell pellets with OD433 0.3 in four biological replicates (experiments A-D) representing 16 technical replicates in either a Tris (experiments A-B) or urea (experiments C-D) protein extraction buffer. GeLC-MS/MS was used as a platform for protein separation and MS analysis, followed by protein identification, quantitation, and intra-experimental normalisation and statistical testing.

Figure 3.3. Inverse metabolic labelling workflow of experiments E-F for 10ºC vs. 30ºC experiments. S. alaskensis cells were processed as for experiments A-D (Figure 3.2), with the exception that proteins from the 10ºC and 30ºC samples were extracted separately in a Tris extraction buffer and combined 1:1 14N/15N based on protein concentration. Two biological replicates, representing four technical replicates were processed.

3.2.3. Quantitative proteomics using GeLC-MS/MS

Quantitative proteomics methodology has been detailed elsewhere (Section 2.4.5; and Ting et al., 2009). In brief, to enhance proteome coverage, two different extraction buffers were used to extract proteins. Proteins from all artificial skew experiments and temperature comparison experiments A, B, E and F were extracted in a 10 mM Tris-HCl pH 8.0, 1 mM EDTA, and 1 mM PMSF buffer (Figure 3.1, Figure 3.2 and Figure 3.3). Proteins from experiments C and D in the 10ºC vs. 30ºC experiments were extracted in 8 M urea, 1 mM EDTA, and 1 mM PMSF buffer (Figure 3.2). Using the GeLC-MS/MS platform, ~1 mg protein was subjected to separation by 1D SDS-PAGE, and entire lanes were sliced into ~22 fractions. Protein gel bands were reduced, alkylated and digested overnight with trypsin. Extracted peptides were analysed by online nanoLC-MS/MS on a linear ion trap (LTQ, ThermoFisher Scientific, USA) after concentration and desalting on a micro C18 precolumn. A survey scan of 350-1750 (m/z) was collected, followed by two MS/MS scans where the 1st and 2nd most intense precursor ions from the MS trace were sequentially isolated and fragmented using CID. Proteins were identified using the completed S. alaskensis genome (3208 proteins) with the SEQUEST search algorithm in Bioworks Biobrowser (v 3.3), and filtered using DTA Select (Tabb et al., 2002). A decoy database search was performed, using a randomised S. alaskensis sequence, to ensure a ~1% false positive identification rate (Peng et al., 2003). Proteins were quantified using MS survey scans in RelEx software (v 0.92) (MacCoss et al., 2003), and only those with two or more peptides were considered for quantitation.

3.2.4. Data processing

Raw data produced by RelEx were imported into R (v 2.5.1) (Ihaka & Gentleman, 1996), a statistical analysis program, using custom code. Since there was no R software for analysing proteomics data, a library of computer code which extended the limma (v 2.1) library (Gentleman et al., 2004) in R/Bioconductor (Smyth, 2004) was developed (Section 3.2.7 and Appendix B). Peptide ratios were log2 transformed, and S/N ratios were log10 transformed and averaged to obtain the abundance ratio of 14N/15N, and the average S/N for each protein. Since RelEx does not output absolute abundance estimates for each metabolically labelled sample, S/N was used as a proxy for abundance.
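The transformation and averaging step described above can be illustrated with a minimal Python sketch. It is not the custom R code used in this work (Appendix B); the function name, the input layout of (protein, ratio, S/N) tuples and the example values are assumptions made for illustration, and the sketch reads the text as averaging after the log transformation.

```python
import math
from collections import defaultdict
from statistics import mean


def summarise_proteins(peptides):
    """Collapse peptide-level measurements to one row per protein.

    peptides: iterable of (protein_id, ratio_14N_15N, signal_to_noise).
    Returns, per protein, the mean log2 ratio (M), the mean log10 S/N (A)
    and the number of peptides that contributed.
    """
    log_ratios = defaultdict(list)
    log_sn = defaultdict(list)
    for protein_id, ratio, sn in peptides:
        log_ratios[protein_id].append(math.log2(ratio))
        log_sn[protein_id].append(math.log10(sn))
    return {
        pid: {
            "M": mean(log_ratios[pid]),
            "A": mean(log_sn[pid]),
            "n_peptides": len(log_ratios[pid]),
        }
        for pid in log_ratios
    }


# Hypothetical peptide measurements, for illustration only.
example = summarise_proteins([
    ("protA", 2.0, 10.0),
    ("protA", 0.5, 1000.0),
    ("protB", 4.0, 100.0),
])

# Mirror the quantitation rule of Section 3.2.3: keep proteins with >= 2 peptides.
quantifiable = {pid for pid, row in example.items() if row["n_peptides"] >= 2}
```

Working on the log2 scale makes a two-fold increase and a two-fold decrease symmetric about zero, which is why the two hypothetical protA peptides above average to a log2 ratio of zero.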

3.2.5. Normalisation

An MA plot was used for each experiment to determine whether ratios were symmetrically distributed about the 1:1 ratio, or whether there were any systematic trends warranting removal by normalisation. For the artificially skewed dataset, median normalisation was performed according to Yang et al. (2002). For the 10ºC vs. 30ºC dataset, lowess normalisation (Dudoit et al., 2002) was performed using a sliding window to remove non-linear systematic trends by subtracting a value from each protein ratio that was computed using the ratios of proteins with similar S/N values; protein ratios that were closer to the ratio being adjusted received higher weighting.
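The two normalisation strategies can be sketched as follows. This is a simplified Python illustration, not the limma routines used in this work: the locally weighted fit below uses tricube weights in a single pass and omits the robustness iterations of full lowess, and the function names and the window fraction of 0.5 are assumptions.

```python
from statistics import median


def median_normalise(m_values):
    """Subtract the median log2 ratio so the distribution is centred on zero."""
    centre = median(m_values)
    return [m - centre for m in m_values]


def lowess_trend(a_values, m_values, frac=0.5):
    """Locally weighted linear fit of M (log2 ratio) on A (log10 S/N).

    Single pass with tricube weights; robustness iterations are omitted.
    """
    n = len(a_values)
    k = max(2, min(n, round(frac * n)))  # number of points in each window
    trend = []
    for a0 in a_values:
        # Bandwidth: distance from a0 to its k-th nearest neighbour.
        h = sorted(abs(a - a0) for a in a_values)[k - 1] or 1e-12
        sw = swa = swm = swaa = swam = 0.0
        for a, m in zip(a_values, m_values):
            u = abs(a - a0) / h
            if u >= 1.0:
                continue
            w = (1.0 - u ** 3) ** 3  # closer points receive higher weight
            sw += w
            swa += w * a
            swm += w * m
            swaa += w * a * a
            swam += w * a * m
        denom = sw * swaa - swa * swa
        if abs(denom) < 1e-12:
            trend.append(swm / sw)  # window too narrow to estimate a slope
        else:
            slope = (sw * swam - swa * swm) / denom
            trend.append((swm - slope * swa) / sw + slope * a0)
    return trend


def lowess_normalise(a_values, m_values, frac=0.5):
    """Remove the S/N-dependent trend from the log2 ratios."""
    trend = lowess_trend(a_values, m_values, frac)
    return [m - t for m, t in zip(m_values, trend)]


# Hypothetical data with a purely linear dependence of ratio on S/N.
a_demo = [float(i) for i in range(10)]
m_demo = [0.5 * a + 1.0 for a in a_demo]
residuals = lowess_normalise(a_demo, m_demo)
centred = median_normalise([0.1, 0.3, 0.5])
```

Median normalisation shifts every ratio by the same amount and is therefore only appropriate when the MA plot shows no trend, whereas the locally weighted fit subtracts a different value at each S/N level.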

3.2.6. Linear modelling

To estimate the magnitude of the effect of changing temperature on protein abundance, a linear modelling approach, which is widely used in DNA microarray experiments (Smyth, 2004), was adopted. For each experimental parameter of interest, linear models produce estimates of the effect size, standard error and residual degrees of freedom (df) for each protein, which can be used for downstream statistical analysis. The effect sizes due to changing temperature and extraction buffer were determined by fitting the following linear model to the abundance ratios for each protein according to equation [2]:

$y_i = \beta_{temp} \times I_{temp} + \beta_{buffer} \times I_{buffer} + \varepsilon_i$ [2]

where $y_i$ is the normalised observed 14N/15N log2 ratio for protein i across all 20 experiments; $\beta_{temp}$ is a coefficient estimating the average log2 fold change (FC) between 10ºC vs. 30ºC; $I_{temp}$ is an indicator variable of +1 when the 10ºC sample was labelled with 14N and the 30ºC sample was labelled with 15N, or -1 in the inverse labelled condition; $\beta_{buffer}$ is a coefficient estimating the average log2 FC due to different extraction buffers; $I_{buffer}$ is an indicator variable that is 0 when a Tris extraction buffer was used, or 1 when a urea buffer was used (Table 3.1); and $\varepsilon_i$ is the residual error. Thus the observed 14N/15N ratio ($y_i$) of each protein is a combination of the 10ºC vs. 30ºC effect ($\beta_{temp}$), the buffer effect ($\beta_{buffer}$) and a residual error ($\varepsilon_i$). Additionally, prior to fitting the linear model [2], the average magnitude of correlation between technical replicates from within the same biological sample was determined according to Smyth (2004). The values of the coefficients $\beta_{temp}$ and $\beta_{buffer}$ were estimated by least squares regression (Smyth, 2004), including the replicate correlation calculated above. To demonstrate the utility of properly accounting for the effects due to technical replication and the extraction buffer, the results from the full linear model above were compared to those obtained from fitting a simple linear model [3]:

$y_i = \beta_{temp} \times I_{temp} + \varepsilon_i$ [3]

This linear model only corrects for the 14N/15N label reversal, but otherwise treats all observations as independent and attributes any observed variation to the effect of a change in temperature.

Table 3.1. Linear modelling design matrix.

            Description of experiments             Indicator variables
Experiment  Name  14N   15N   Buffer  Biorep       10ºC vs. 30ºC  buffer
1           A1_1  10ºC  30ºC  Tris    A             1             0
2           A1_2  10ºC  30ºC  Tris    A             1             0
3           A2_1  10ºC  30ºC  Tris    A             1             0
4           A2_2  10ºC  30ºC  Tris    A             1             0
5           B1_1  30ºC  10ºC  Tris    B            -1             0
6           B1_2  30ºC  10ºC  Tris    B            -1             0
7           B2_1  30ºC  10ºC  Tris    B            -1             0
8           B2_2  30ºC  10ºC  Tris    B            -1             0
9           C1_1  10ºC  30ºC  Urea    C             1             1
10          C1_2  10ºC  30ºC  Urea    C             1             1
11          C2_1  10ºC  30ºC  Urea    C             1             1
12          C2_2  10ºC  30ºC  Urea    C             1             1
13          D1_1  30ºC  10ºC  Urea    D            -1             1
14          D1_2  30ºC  10ºC  Urea    D            -1             1
15          D2_1  30ºC  10ºC  Urea    D            -1             1
16          D2_2  30ºC  10ºC  Urea    D            -1             1
17          E_1   10ºC  30ºC  Tris    E             1             0
18          E_2   10ºC  30ºC  Tris    E             1             0
19          F_1   30ºC  10ºC  Tris    F            -1             0
20          F_2   30ºC  10ºC  Tris    F            -1             0

The linear model design matrix accounts for experimental variables including the experimental 14N/15N label swap, where some 10ºC samples are labelled with 14N (+1) while others are labelled with 15N (-1), the Tris (0) or urea (1) protein extraction buffer, and the number of technical replicates in each experiment (two for experiments E-F and four for experiments A-D). The biorep factor groups the experiments into biological replicates, and the technical replicates within each biological replicate are expected to have a higher correlation to each other than between biological replicates.
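To show how the indicator variables of Table 3.1 enter the fit, the following Python sketch solves the two-coefficient, no-intercept model [2] by ordinary least squares for a single protein. It is a simplification under stated assumptions: it treats all observations as independent and therefore omits the technical-replicate correlation that the limma analysis included, and the function name and example ratios are hypothetical.

```python
def fit_full_model(y, i_temp, i_buffer):
    """Ordinary least squares for y = b_temp * I_temp + b_buffer * I_buffer.

    Falls back to the temperature-only model [3] when the buffer effect
    cannot be estimated (for example, all observations used one buffer).
    Replicate correlation is ignored in this sketch.
    """
    stt = sum(t * t for t in i_temp)
    sbb = sum(b * b for b in i_buffer)
    stb = sum(t * b for t, b in zip(i_temp, i_buffer))
    sty = sum(t * v for t, v in zip(i_temp, y))
    sby = sum(b * v for b, v in zip(i_buffer, y))
    det = stt * sbb - stb * stb
    if abs(det) < 1e-12:
        beta_temp, beta_buffer, n_par = sty / stt, None, 1
    else:
        beta_temp = (sbb * sty - stb * sby) / det
        beta_buffer = (stt * sby - stb * sty) / det
        n_par = 2
    residuals = [
        v - beta_temp * t - (beta_buffer or 0.0) * b
        for v, t, b in zip(y, i_temp, i_buffer)
    ]
    df = len(y) - n_par
    s2 = sum(r * r for r in residuals) / df if df > 0 else None
    return {"beta_temp": beta_temp, "beta_buffer": beta_buffer, "df": df, "s2": s2}


# Hypothetical log2 ratios for one protein seen once in each of A, B, C and D.
# Indicator values follow Table 3.1: A (+1, 0), B (-1, 0), C (+1, 1), D (-1, 1).
fit = fit_full_model(
    y=[1.0, -1.0, 1.2, -0.8],
    i_temp=[1, -1, 1, -1],
    i_buffer=[0, 0, 1, 1],
)

# A protein observed only in Tris experiments: the buffer term is dropped.
tris_only = fit_full_model(y=[1.0, -1.0], i_temp=[1, -1], i_buffer=[0, 0])
```

Because the label swap enters as +1 or -1, a protein that is genuinely more abundant at 10ºC gives positive ratios in A and C and negative ratios in B and D, and both contribute in the same direction to the temperature coefficient.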

3.2.7. Statistical analysis of differential abundance

Following the estimation of the average FC, standard error and df of each protein by equation [2], an empirical Bayes moderated t-statistic was calculated for each protein (Smyth, 2004). The error estimate for each protein was replaced with a moderated estimate; i.e., one that was pooled towards the population estimate of the average standard error. The df are augmented, thereby allowing statistical analysis of proteins with as few as one measurement. To demonstrate the utility of the moderated t-statistic, and the benefits due to accounting for technical replicate correlation and the extraction buffer, t-statistics were also calculated using, 1) a one sample two-sided Student’s t-test using the simple linear model that only accounted for 14N and 15N label reversal [3] (hereafter referred to as the Student’s t-test), and 2) a one sample two-sided Student’s t-test where data were fitted to the full linear model accounting for 14N and 15N label reversal, correlation due to technical replicates from the same biological sample, and extraction buffers [2] (hereafter referred to as the unmoderated t-test). P-values were determined for all proteins using the t-statistic and the df by standard lookup tables. P-values from each of the three methods were adjusted for multiple testing in R by both the Bonferroni correction and by the Storey-Tibshirani FDR (Storey & Tibshirani, 2003) using the methods p.adjust and qvalue, respectively. All data processing, normalisation and statistical analyses were performed in collaboration with Mark J. Cowley in the School of Biotechnology and Biomolecular Sciences, UNSW.
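The two multiple-testing adjustments can be contrasted with a short Python sketch. It is illustrative only and is not the R code used in this work: the q-value function below estimates the proportion of true null hypotheses (pi0) at a single fixed cut-off (lam = 0.5), which is a simplification of the Storey-Tibshirani procedure, and the function names and example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def storey_qvalues(p_values, lam=0.5):
    """q-values with pi0 estimated at one fixed cut-off (a simplification)."""
    m = len(p_values)
    pi0 = min(1.0, sum(1 for p in p_values if p > lam) / (m * (1.0 - lam)))
    if pi0 <= 0.0:
        pi0 = 1.0  # no p-value above lam: fall back to the conservative choice
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q, pi0


# Hypothetical p-values, for illustration only.
p_demo = [0.001, 0.002, 0.01, 0.02, 0.03, 0.04, 0.2, 0.6, 0.7, 0.9]
q_demo, pi0_demo = storey_qvalues(p_demo)
n_bonferroni = sum(1 for p in bonferroni(p_demo) if p < 0.05)
n_fdr = sum(1 for q in q_demo if q < 0.05)
```

In this hypothetical example the Bonferroni correction retains two of the ten tests at 0.05 while the q-value threshold retains six, which illustrates why the Bonferroni correction is regarded as the more conservative of the two.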

3.3. Results

Over 400,000 tandem mass spectra were generated resulting in 2,135 unique and high confidence protein identifications (66% genome coverage) and 1,172 proteins with high quality data were quantified using two or more peptides (37% genome coverage) (Appendices C, D and E). Approximately 230 proteins were identified in only one experiment, and 65 of the most abundant proteins were detected in all 20 experiments (Figure 3.4).

Figure 3.4. The frequency of detection of new proteins. Quantitative proteomics data from 10ºC vs. 30ºC experiments, from 6 biological replicates and a total of 20 MS analyses.

3.3.1. Normalisation of an artificial dataset

In order to test whether normalisation was useful for effectively removing skews in data, S. alaskensis cultures grown at 30ºC and labelled with either 14N or 15N were combined in known ratios of 0.8:1, 1:1 and 1.2:1. The 14N/15N log2 ratios of each peptide from the same protein were measured. From the distributions of protein ratios from each experiment, the medians were 0.88 ± 0.1, 1.04 ± 0.13, and 1.21 ± 0.15 for experiments mixed 0.8:1, 1:1, and 1.2:1, respectively (Table 3.2). This demonstrated the ability of these methods to detect relatively subtle differences in protein abundance.

Table 3.2. Protein quantitation for artificially skewed data.

            DTA Select                                               RelEx
Experiment  No.      No. matched  No. NR proteins  No. NR peptides   No. proteins     No. proteins   median^b  min^c  max^d  SD^e
            spectra  spectra      identified       identified        after filtering  quantified^a
0.8:1a      42888    1618         153              836               147              106            0.86      0.63   1.31   0.11
0.8:1b      52119    4900         432              1660              399              225            0.94      0.70   1.36   0.10
0.8:1c      49808    4237         496              1975              469              264            0.79      0.55   1.30   0.09
average                                                                                              0.86      0.63   1.33   0.10
1:1a        43083    1204         138              678               133              84             1.01      0.72   1.49   0.15
1:1b        33738    3185         365              1419              333              192            1.02      0.68   1.46   0.13
1:1c        43420    4282         415              1779              368              212            1.04      0.67   1.68   0.12
average                                                                                              1.02      0.69   1.54   0.13
1.2:1a      28519    815          129              527               122              62             1.28      0.92   1.63   0.16
1.2:1b      32324    3272         401              1467              363              189            1.14      0.83   1.76   0.13
1.2:1c      58369    7146         644              3042              585              379            1.20      0.68   1.74   0.16
average                                                                                              1.21      0.81   1.71   0.15

^aThe number of proteins identified in each experiment with at least two peptides. The ^bmedian, ^cminimum, ^dmaximum, and ^estandard deviation of 14N/15N ratios. The mean 14N/15N ratios were within 0.02 of the median in all cases (data not shown). Rows labelled "average" (shown in bold in the original table) are the average values for each 14N/15N ratio combination set.

The distributions of protein ratios were plotted against a normal distribution with the same mean and standard deviation, and a normal distribution with the same standard deviation but using the expected mean (Figure 3.5a-c). The distribution of observed ratios against expected ratios under a normal distribution were compared using a Q-Q plot (Figure 3.6) and revealed that, in general, the protein ratios were heavy-tailed. This was caused by the detection of a greater number of proteins with large ratios than would be expected for a normal distribution, and was likely to reflect proteins that are naturally variable in their abundance. Such a pattern of protein abundance is similar to two-colour microarray analysis of RNA abundance, suggesting that microarray normalisation approaches are applicable to quantitative proteomics data.
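The coordinates of a Q-Q plot of this kind can be computed as in the following Python sketch. The plotting positions (i + 0.5)/n are one common convention and are an assumption here, as are the function name and the example values; the figures in this work were produced in R.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each observed value with its expected normal quantile.

    The reference normal distribution uses the sample mean and standard
    deviation; plotting positions are (i + 0.5) / n.
    """
    n = len(values)
    reference = NormalDist(mean(values), stdev(values))
    ordered = sorted(values)
    return [
        (reference.inv_cdf((i + 0.5) / n), observed)
        for i, observed in enumerate(ordered)
    ]


# Hypothetical log2 ratios, for illustration only.
points = qq_points([-1.0, 0.0, 1.0])
```

Points that fall on the diagonal are consistent with normality, whereas a heavy-tailed sample places its most extreme observed values further from zero than the matching theoretical quantiles.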

Figure 3.5. Normalised and unnormalised density distribution of artificially skewed data. A, Three 1:1 14N/15N experiments; B, Three 0.8:1 experiments; C, Three 1.2:1 experiments. Data (A, B, C) were plotted using 14N/15N ratio (x-axis) and probability density (y-axis). Experimentally observed distribution of the 14N/15N ratios, solid line; normal distribution with the mean and standard deviation of the observed data, dotted line; normal distribution with the expected mean and observed standard deviation, dashed line.

Figure 3.6. Q-Q plots of the artificially skewed dataset. The observed 14N/15N ratio of each protein was plotted (y-axis) against the theoretical value expected under a normal distribution (x-axis). The majority of the data points lie along the straight line, indicating that a majority are consistent with a normal distribution. However, the tails of the distribution show that the observed protein ratios are heavy-tailed, most likely caused by the detection of a larger number of proteins with large ratios than would be expected for a normal distribution. A, Three 1:1 14N/15N experiments; B, Three 0.8:1 experiments; C, Three 1.2:1 experiments.

Inspecting MA plots prior to normalisation helps to determine whether a relationship or trend exists between the protein ratio and abundance levels. With the artificially skewed 30ºC dataset, such a trend was not detected (Figure 3.7), and the ratios appeared to be symmetrically distributed, differing only in their median values (Figure 3.5). Consequently, median normalisation was performed to remove the skew, which forced the medians of each distribution of protein ratios to a log2 FC of zero (Figure 3.7 and Appendix F). Since the shapes of the distributions were very similar (Figure 3.8b), no additional inter-experimental normalisation was performed.

Figure 3.7. MA plots of peptides and proteins pre- and post-normalisation. Unnormalised peptides (column 1), proteins (column 2) and normalised proteins (column 3) were plotted in MA plots, using the log2 14N/15N ratio of abundance (y-axis) and log10 S/N (x-axis), and a non-linear, locally weighted regression line drawn (pink line). Four different datasets are shown (rows 1-4). 1. Artificially skewed 1.2:1 14N/15N experiment, where the absence of a systematic trend in ratio vs. S/N validated a median normalisation of proteins across the experiment. 2. Skewed dataset created by combining 1:1, 10ºC 14N and 30ºC 15N samples from experiments A-D (combined based on the same OD). 3. As for 2. except inversely labelled (10ºC 15N and 30ºC 14N). 4. Non-skewed dataset generated by combining 1:1, 10ºC 14N and 30ºC 15N samples from experiments E-F (combined based on equal protein concentration). Proteins were globally normalised using lowess normalisation for all 10ºC vs. 30ºC experiments (2.-4.).

Figure 3.8. Box and whisker plots of pre- and post-intra-experiment normalisation of artificially skewed data. The median ratio from each experiment, shown as a thick black line, is surrounded by a box which contains the median ± 25% of the data, and the whiskers extend two standard deviations from the median. The horizontal width of each box is proportional to the number of proteins in each experiment. The distributions of protein ratios in the artificially skewed dataset before (panel A, unnormalised protein ratios) and after (panel B, normalised protein ratios) median normalisation show that the skew (0.8:1 and 1.2:1) can be successfully removed by median normalisation, as the dataset does not contain any systematic non-linear trends.

3.3.2. A naturally skewed biological dataset

In the four biological replicates (A-D) of 10ºC vs. 30ºC cultures (representing 16 MS runs), the 14N and 15N labelled cells were combined based on equal OD, which produced a 4- to 8-fold skew towards protein abundance for the 30ºC samples (both in the 14N/15N and 15N/14N inverse labelling conditions) (Figure 3.7 and Appendix G). In contrast, biological samples E and F were combined 1:1 by protein concentration and no skew was observed (Figure 3.7 and Appendix H). To examine the biological cause for the skew, morphological examinations were performed on S. alaskensis cells grown at 10ºC and 30ºC. Scanning electron micrographs showed that at 10ºC, cells appear primarily as individuals, while at 30ºC cells tend to be clumped together, connected by an extracellular matrix (Figure 3.9). The biovolume of cells grown at 10ºC was 0.13 ± 0.03 µm³, whereas they were approximately 1.4-fold larger (0.18 ± 0.03 µm³) at 30ºC.

Figure 3.9. Scanning electron microscope images of S. alaskensis. Cells grown at 10ºC (A-C) and 30ºC (D-F). Scale bars: A and D, 10 µm; B and E, 2 µm; C and F, 1 µm.

3.3.3. Evaluating normalisation for effectively removing skew while preserving differential protein abundance

MA plots revealed there was a strong, non-linear trend between the protein ratios and the S/N ratios, such that the ratio became larger as the S/N increased in the 10ºC vs. 30ºC experiment (Figure 3.7). In two-colour microarrays, lowess normalisation is commonly adopted to remove non-linear dependence between the abundance-ratio and the abundance-level (Yang et al., 2002). Applying lowess normalisation to datasets A-D removed the non-linear trend and produced protein ratios with a median of zero (Figure 3.7, Appendices G and H). The distributions of normalised protein ratios across experiments were very similar and additional inter-experiment normalisation was not performed. Although the normalisation appeared to effectively remove the systematic bias in protein ratios, the large skew was further evaluated to determine whether it compromised the ability to score relative protein abundance. Four samples that had been combined based on equal OD (A1_1, A2_1, B1_1, B2_1) were compared to four samples that had been combined based on equal protein concentration (E_1, E_2, F_1, F_2). A total of 607 proteins were common to the E-F and A-B datasets. The common proteins were sorted into five groups by examining their average FC values (Table 3.3).

Table 3.3. Comparing post-normalisation protein quantities for experiments that were combined by optical density (A-B) or protein concentration (E-F)

Category  Condition                                                                                      No. proteins  Correlation
1         Both proteins FC^a > 1.5-fold, in the same direction^b                                         37            positive
2         One protein FC > 1.5-fold and FC < 1.5-fold in the other protein, both in the same direction   101           positive
3         Both proteins FC < 1.5-fold in the same direction                                              272           positive
4         Both proteins FC < 1.5-fold in different directions                                            145           uncertain
5         Both proteins FC > 1.5-fold, in different directions                                           52            negative

^aFC is expressed as a 14N/15N value. ^bDirection refers to the relative increased abundance of proteins towards either the 14N or 15N label.

The majority (groups 1-3, representing 67.5%) of normalised protein abundances were positively correlated (i.e. equivalent trends with growth temperature in both datasets). An additional 23.9% (group 4) had opposing trends, but with small FCs (< 1.5-fold), and correlation was judged as uncertain. Only 8.6% (group 5) were judged poorly correlated as their normalised quantities were opposing and above a 1.5 FC. The overall agreement between the two datasets was a strong indication that the normalisation effectively removed skew in the data while preserving the more discrete abundance differences that were generated as a function of temperature (Table 3.3).
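The grouping rule of Table 3.3 and the percentages quoted above can be reproduced with a short Python sketch. The function name is hypothetical, and the handling of one case (one protein above 1.5-fold, the other below, in opposite directions) is an assumption because Table 3.3 does not describe it; the protein counts are those reported in Table 3.3.

```python
import math

THRESHOLD = math.log2(1.5)  # 1.5-fold change on the log2 scale


def concordance_group(log2_fc_od, log2_fc_protein):
    """Assign a protein to one of the five groups of Table 3.3.

    Inputs are the post-normalisation log2 FC values from the OD-combined
    (A-B) and protein-combined (E-F) datasets.
    """
    large_od = abs(log2_fc_od) > THRESHOLD
    large_protein = abs(log2_fc_protein) > THRESHOLD
    same_direction = (log2_fc_od >= 0) == (log2_fc_protein >= 0)
    if same_direction:
        if large_od and large_protein:
            return 1
        if large_od or large_protein:
            return 2
        return 3
    if not large_od and not large_protein:
        return 4
    if large_od and large_protein:
        return 5
    return None  # mixed case not described in Table 3.3


# Counts as reported in Table 3.3.
counts = {1: 37, 2: 101, 3: 272, 4: 145, 5: 52}
total = sum(counts.values())
positive_pct = round(100 * (counts[1] + counts[2] + counts[3]) / total, 1)
uncertain_pct = round(100 * counts[4] / total, 1)
negative_pct = round(100 * counts[5] / total, 1)
```

The five groups sum to the 607 proteins common to both datasets, and the three percentages computed from the reported counts match those given in the text.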

3.3.4. Linear modelling as a framework for statistical analysis

A linear model that accounted for the relevant experimental parameters (i.e. the correlation expected to exist due to extraction buffer effect, technical replicates and the 14N/15N label swap) was fitted to the observed protein ratios [2]. Estimates of the effect size, standard errors, and residual df were calculated by least squares regression. Additionally, to demonstrate the downstream effect of accounting for correlations due to experimental parameters, a simple linear model that ignored the effect of the extraction buffer and technical replicate correlation was fitted [3]. To determine the statistical significance of each protein, p-values for each protein were estimated under both linear models, using the Student’s t-distribution with the appropriate df for each protein (Table 3.1). The additional correlation that existed between the technical replicates from each experiment (A-F) was estimated to be 0.672 (Section 3.2.6). Using this value and the experiment label (A-F) as a blocking variable (Table 3.1), the data from the 20 10ºC vs. 30ºC experiments were fitted to a linear model (Section 3.3.4), and the coefficients for the effect of temperature ($\beta_{temp}$) and the extraction buffer ($\beta_{buffer}$) on protein abundance were estimated. The coefficient estimates were directly interpreted as the average 14N/15N FC (log2) due to their respective experimental parameter; thus, proteins with $|\beta_{temp}|$ > 0.585 (log2) or > 1.0 (log2) represent proteins with a 1.5- or 2-fold change, respectively. The $\beta_{temp}$ coefficient estimates were used in subsequent significance testing.

3.3.5. Evaluating the utility of linear modelling using an unmoderated Student’s t-test

Of the 1,172 proteins that were quantified, 954 were detected at least twice and could be analysed by the simple linear model [3] (Table 3.4). The simple linear model coupled with an unmoderated Student’s t-test tested the null hypothesis that the average log2 abundance ratio due to the effect of temperature was zero, with (n-1) df, where n is the number of observations for each protein. Using p < 0.05, 325 proteins had significant changes in abundance due to temperature (Table 3.4). Similarly, 830 proteins detected at least three times, or only two times from the same extraction buffer (i.e. no estimate for $\beta_{buffer}$, but a valid estimate for $\beta_{temp}$), could be analysed by the full linear model [2] (Table 3.4). Using p < 0.05, the full linear model estimated 144 proteins with significant abundance changes.

Table 3.4. Comparing methods of significance testing with the 10ºC vs. 30ºC dataset.

             Student's t-test^a       Unmoderated t-test^b     Moderated t-test^c
             P^d    Bonf^e  FDR^f     P      Bonf   FDR        P      Bonf   FDR
N^g          954    954     954       830    830    830        1172   1172   1172
< 0.05^h     325    56      272       144    14     58         214    11     45
E(FP)^i      48     0.05    14        42     0.05   3          59     0.05   2
FDR (%)^j    15     0.1     5         29     0.4    5          28     0.5    5

Three methods for estimating differential protein abundance were compared: ^aStudent’s t-test where each protein abundance measurement was treated as independent, using a simple linear model accounting for 14N and 15N label reversal only; ^bUnmoderated t-test with the full linear model; ^cEmpirical Bayes moderated t-test with the full linear model. ^hThe number of differentially abundant proteins with statistic < 0.05 for ^dunadjusted p-values, ^eBonferroni corrected p-values or ^fFDR corrected q-values. ^gThe number of proteins analysed from the 1172 quantified proteins; proteins analysed by the Student’s t-test had at least two observations, the unmoderated t-test had at least three observations or two observations from the same extraction buffer, and the moderated t-test had at least one observation. ^iExpected false positives. ^jFDR was calculated by dividing E(FP) by the number of differentially abundant proteins with statistic < 0.05.
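The arithmetic behind the E(FP) and FDR (%) rows of Table 3.4 can be checked with the following Python sketch. The FDR calculation follows the table footnote; the rule used here for the unadjusted E(FP), namely the significance threshold multiplied by the number of proteins tested, is an assumption that happens to reproduce the rounded values in the table. Function names are hypothetical.

```python
ALPHA = 0.05


def expected_false_positives_unadjusted(n_tested, alpha=ALPHA):
    """Assumed rule: with no correction, about alpha * N tests pass by chance."""
    return alpha * n_tested


def fdr_percent(expected_fp, n_significant):
    """FDR (%) as in the footnote of Table 3.4: E(FP) / number significant."""
    return 100.0 * expected_fp / n_significant


# Unadjusted columns of Table 3.4: (N, significant at p < 0.05, reported E(FP)).
unadjusted = {
    "student": (954, 325, 48),
    "unmoderated": (830, 144, 42),
    "moderated": (1172, 214, 59),
}

fdr_unadjusted = {
    name: round(fdr_percent(e_fp, n_sig))
    for name, (_, n_sig, e_fp) in unadjusted.items()
}

# Bonferroni columns: E(FP) is reported as 0.05 for 56, 14 and 11 proteins.
fdr_bonferroni = [round(fdr_percent(0.05, n), 1) for n in (56, 14, 11)]
```

For the unadjusted moderated t-test, 59 expected false positives among 214 significant proteins correspond to an FDR of about 28%, which is the motivation for the corrections compared in this section.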

3.3.6. The empirical Bayes moderated t-test

The moderated t-statistic, which aims to improve the standard error estimate by borrowing information from all observed proteins about the typical standard errors, has been widely used in the analysis of microarrays. The standard error estimate and the number of residual df for each protein are replaced by an estimate that is smoothed towards the population average. This has the additional benefit that proteins with only a single observation can be tested for differential abundance (Lönnstedt & Speed, 2002; Smyth, 2004). Using the full linear model, a moderated t-statistic was calculated for all 1,172 quantified proteins, and p-values were estimated using the adjusted standard errors and df with standard lookup tables (Appendix I). The abundance of 214 proteins was significantly different (p < 0.05) (Table 3.4).
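The shrinkage step can be illustrated with a minimal Python sketch of the moderated t-statistic as defined by Smyth (2004): the protein-wise variance is averaged with a prior variance, weighted by their respective df, and the df are summed. In limma the prior variance and prior df are estimated from all proteins; in this sketch they are supplied as hypothetical inputs, and the function name and example values are assumptions.

```python
import math


def moderated_t(effect, s2, df, unscaled_sd, prior_s2, prior_df):
    """Moderated t-statistic and its augmented degrees of freedom.

    effect       estimated log2 FC for the protein
    s2, df       residual variance and residual df (df = 0 if unavailable)
    unscaled_sd  square root of the unscaled variance of the coefficient
    prior_s2     prior variance (hypothetical input in this sketch)
    prior_df     prior degrees of freedom (hypothetical input in this sketch)
    """
    if df > 0 and s2 is not None:
        posterior_s2 = (prior_df * prior_s2 + df * s2) / (prior_df + df)
    else:
        posterior_s2, df = prior_s2, 0  # single observation: prior only
    t_stat = effect / (unscaled_sd * math.sqrt(posterior_s2))
    return t_stat, prior_df + df


# Hypothetical protein with an unusually small observed variance.
t_mod, df_mod = moderated_t(
    effect=1.0, s2=0.04, df=2, unscaled_sd=0.5, prior_s2=0.10, prior_df=4
)
t_ordinary = 1.0 / (0.5 * math.sqrt(0.04))

# Hypothetical protein with a single observation and no variance estimate.
t_single, df_single = moderated_t(
    effect=1.0, s2=None, df=0, unscaled_sd=1.0, prior_s2=0.10, prior_df=4
)
```

In the first hypothetical case the ordinary t-statistic of 10 is moderated to about 7.1 on 6 df, because the small protein-wise variance is pulled towards the prior; in the second case a test is still possible because the prior supplies both the variance and the df.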

3.3.7. Comparison of the moderated t-test to an unmoderated t-test

The moderated t-test was capable of analysing 342 (41%) more proteins than the unmoderated t-test (Table 3.4), and it identified an additional 70 (49%) differentially abundant proteins. The observed increase in the number of differentially abundant proteins could therefore have arisen by chance, simply because more proteins were tested. In order to perform an unbiased comparison between the unmoderated and the moderated t-tests, the moderated t-test was restricted to the same 830 proteins tested in the unmoderated t-test. Accordingly, 168 proteins were determined to be significantly differentially abundant, 24 more than with the unmoderated t-test (data not shown).

3.3.8. Comparison of correction approaches for multiple hypothesis testing

To correct for multiple hypothesis testing, the popular, yet conservative, Bonferroni correction was compared to the positive FDR proposed by Storey & Tibshirani (2003) using data from the moderated t-test (Figure 3.10). Histograms of unadjusted p-values were plotted (Figure 3.10). The accumulation of proteins with small p-values was indicative of a number of true alternative hypotheses; i.e. there were proteins with evidence for differential abundance. The shapes of the distributions were compatible with the assumptions required for using the Storey-Tibshirani positive FDR. Thus the FDR for each protein was expressed as a q-value (Appendix E), where the q-value provided an intuitive measure of the likelihood of each protein being differentially abundant (Kerr & Churchill, 2001). Using the p-values or q-values estimated by the moderated t-test with the full linear model, 214 proteins with no correction, 11 proteins with the Bonferroni correction and 45 proteins with the FDR correction were significantly differentially abundant. The expected number of false positive proteins was 59, 0.05 and 2, respectively (Table 3.4).

Figure 3.10. Histograms of raw unadjusted p-values arising from the testing of each protein for differential abundance in 10ºC vs. 30ºC experiments. In order to determine the appropriate significance test to adjust for multiple-hypothesis testing in an experiment, a histogram of unadjusted p-values must be examined. The dashed line is the expected density of each bin if all proteins were null (none of the proteins were differentially abundant) (H0). The estimated proportion of truly null hypotheses (pi0) (dotted line) indicated that a large number of proteins show evidence for differential abundance, i.e. there was a large number of true alternate hypotheses. A, A Student’s t-test with a simple linear model accounting only for 14N and 15N label reversal; B, An unmoderated t-test with the full linear model; C, An empirical Bayes moderated t-test with the full linear model. Axes: unadjusted p-values (x-axis) against frequency (y-axis).

3.3.9. Fold change approach

Using the FC approach, 280 and 84 proteins with FC > 1.5 and FC > 2, respectively, were differentially abundant (Table 3.5).

Table 3.5. Fold Change approach outcomes.

              1.5-fold change    2-fold change
N (a)         280                84
R (b)         892                1088
E(FP) (c)     -                  -

(a) The number of proteins with 14N/15N ratios greater than the FC threshold.
(b) The number of rejected proteins below the FC threshold.
(c) The expected number of false positives (FP), which cannot be determined using an FC threshold approach.
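The FC thresholding summarised in Table 3.5 can be illustrated with the short sketch below. It assumes, as a simplification, that the threshold is applied symmetrically on the log2 scale so that increases and decreases are treated alike; the ratios are hypothetical and the function names are illustrative only. Note that the approach yields only two counts (passed and rejected) and no estimate of false positives.

```python
import math


def passes_fold_change(ratio, threshold):
    """True if a 14N/15N ratio differs from 1:1 by more than `threshold`-fold,
    in either direction (assumed symmetric on the log2 scale)."""
    return abs(math.log2(ratio)) > math.log2(threshold)


def count_by_threshold(ratios, threshold):
    """Return (number passing, number rejected) for one FC threshold."""
    passed = sum(1 for r in ratios if passes_fold_change(r, threshold))
    return passed, len(ratios) - passed


# Hypothetical 14N/15N ratios for six proteins.
ratios = [1.0, 1.6, 0.4, 2.5, 0.8, 1.4]
n_15, r_15 = count_by_threshold(ratios, 1.5)
n_2, r_2 = count_by_threshold(ratios, 2.0)
```

In Table 3.5 the two counts for each threshold sum to the 1,172 quantified proteins (280 + 892 and 84 + 1088).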

3.4. Discussion

The objective of this work was to examine approaches for normalisation, and to select an appropriate significance test that maximised the final list of protein candidates from quantitative proteomics analyses. In addition, the number of false discoveries had to be controlled at an acceptable and interpretable level in order to enable effective biological interpretation of the data. It was concluded that an empirical Bayes moderated t-test incorporating FDR q-values gave the largest set of data with a sensible and controlled number of false discoveries.

3.4.1. A naturally skewed biological dataset

Spectrophotometric measurement of cells (OD) is based on the light absorbing quality of the cells in solution, and can be affected by cell size, the properties of the membrane, the internal structure of the cell and the presence of materials that absorb light (McGann et al., 1988). Cell clumping (as observed for S. alaskensis) may cause inconsistent OD measurements and may result in OD values not providing a true reflection of culture turbidity (Meyers et al., 1998). Conceivably, cell aggregation of S. alaskensis at 30ºC could increase light absorbance (and hence OD), but aggregated cells could sink more rapidly than free cells and have the effect of lowering OD readings. The enhanced levels of extracellular matrix at 30ºC could also have affected OD measurements. Furthermore, S. alaskensis is a yellow-pigmented bacterium that produces , and yellow colouration was more intense in cells grown at 10ºC; thus differences in colour may also affect OD readings. Also, CFU/mL measurements might have been skewed due to different responses in plate viability at 10ºC and 30ºC. It has been noted that the temperature at which the cells are cultured affects plate viability when measuring the CFU of S. alaskensis (Schut, 1994). Cells grown in liquid culture at 5ºC for extended periods of time performed better in plate viability measurements than those grown at 30ºC (Schut, 1994), indicating that measuring CFU of 30ºC cultures would result in an under-estimation. These analyses identify factors that could contribute to the skew in protein abundance that is observed for 10ºC vs. 30ºC grown cells, but the explanation is complicated and involves multiple contributing factors. Irrespective of the biological basis for the skew, it is relevant that the skew is caused by growth temperature. In effect, it is important to be able to compensate for the skew so that biological differences between 10ºC and 30ºC cultures (including the skew itself) may be inferred from the comparative proteomics data.

3.4.2. Normalisation

Normalisation of quantitative proteomics data, or indeed of high-throughput biological data in general, greatly assists in reducing differences between datasets caused by experimental artifacts, and highlights real biological differences. Experimental artifacts that might contribute to differences in 14N/15N ratios include pipetting errors (at various stages of sample processing) and sample quality, and various unpredictable or potentially uncontrollable factors. For microarrays, normalisation is typically performed to control for intra-experiment variation, and subsequently for inter-experiment variation (Yang et al., 2002). Determining which normalisation procedures to use requires careful assessment of the data that is generated. This work demonstrated that MA plots were useful for first inspecting metabolic labelling data, assessing symmetry around the 1:1 ratio and identifying non-linear relationships that then warrant removal by normalisation. Fixed value median normalisation was suitable for the artificially skewed dataset because it did not contain any systematic non-linear trends (Table 3.1 and Figure 3.7). Lowess normalisation was useful for the OD-based comparative proteomics dataset (A-D) because there was a systematic non-linear trend associated with the 4- to 8-fold skew (Figure 3.7). As illustrated by the use of these two specific approaches, normalisation needs to be considered on a case-by-case basis. The point should also be made that to avoid over-manipulation of data, normalisation should be kept to a minimum and only used when there is good evidence that it is required (e.g. from MA plots).
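As a minimal sketch of the quantities discussed above, the code below computes M and A values for an MA plot and applies a fixed value median normalisation. The `light` and `heavy` arguments stand for hypothetical 14N and 15N intensities, and the function names are illustrative; they are not taken from the software used in this work. Lowess normalisation, which fits a local regression of M on A, is not shown because it would need a smoothing library.

```python
import math
import statistics


def ma_values(light, heavy):
    """Return (M, A) for one protein.

    M is the log2 ratio of the two channels and A is the mean log2 intensity.
    """
    m = math.log2(light / heavy)
    a = 0.5 * (math.log2(light) + math.log2(heavy))
    return m, a


def median_normalise(m_values):
    """Shift all log2 ratios by one fixed value so their median is 0 (a 1:1 ratio)."""
    centre = statistics.median(m_values)
    return [m - centre for m in m_values]


# Hypothetical log2 ratios with a constant skew away from 0.
skewed = [1.0, 2.0, 3.0, 4.0, 10.0]
normalised = median_normalise(skewed)
```

Because every ratio is shifted by the same amount, this correction can only remove a constant offset; an intensity-dependent trend in the MA plot would remain, which is the situation where lowess normalisation was used instead.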

3.4.3. Linear modelling as a framework for statistical analysis

Accounting for the correlation between arbitrary experimental effects prior to statistical testing is vital for an accurate assessment of significance. If these are not properly accounted for, the estimation of effects due to the primary experimental parameter of interest, in this case, temperature, will be confounded. Experimental effects are unique to each experiment and must be carefully considered; they can include technical replicates, extraction buffers, label reversals, different mass spectrometers, and varying LC and MS parameters. The issue of technical replication is not immediately obvious; however, the peptide ratios obtained from multiple technical replicate runs from the same biological sample were not independent because smaller measurement errors than expected exist between technical replicates. Modelling of MS-based proteomics data for post-experimental data processing has not been extensively used. Limited examples include the work by Fernandez et al. (2008), where a linear mixed model was applied to 2DE data. Fodor et al. (2005) applied a linear model to 2D-DIGE data, and Daly et al. (2008) introduced mixed effects modelling for their gel-free LC-MS/MS data. In the current study, fitting the quantitative proteomics data to a simple linear model [3] versus the full linear model [2] in a Student’s t-test was used to demonstrate the value of linear modelling on downstream statistical analyses (Table 3.4). Despite only an additional 124 (15%) proteins analysed in the simple linear model, there were an additional 185 (129%) differentially abundant proteins in the simple model, relative to the more accurate full model. This marked enrichment existed over a number of different p-value thresholds, and is indicative of the gross under-estimation of the true measurement variation that is otherwise ignored in the simple linear model. 
Finally, the increased number of apparently “significant” proteins is likely to adversely affect biological conclusions drawn from the data; accounting for the correlation between arbitrary experimental effects is critical.
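The effect of ignoring the correlation between technical replicates can be illustrated with a small numerical sketch. All values are hypothetical, and averaging the technical runs within each biological sample is used here only as a simple stand-in for the full linear model; it is not the model fitted in this work.

```python
import math
import statistics


def one_sample_t(values):
    """One-sample t-statistic testing whether the mean log2 ratio differs from 0."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values) / se


# Hypothetical log2 ratios: three biological samples, three technical runs each.
# Technical runs of the same sample agree closely, so they are not independent.
runs = {
    "bio1": [0.50, 0.52, 0.48],
    "bio2": [0.10, 0.12, 0.08],
    "bio3": [0.30, 0.32, 0.28],
}

# Naive analysis: all nine runs treated as independent observations.
naive_t = one_sample_t([x for reps in runs.values() for x in reps])

# Stand-in for accounting for replication: one value per biological sample.
aggregated_t = one_sample_t([statistics.mean(reps) for reps in runs.values()])
```

The naive t-statistic is roughly twice the aggregated one for these values, because the standard error is under-estimated when correlated runs are counted as independent measurements.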

3.4.4. Significance testing

Significance testing was absent in the early development and publication of quantitative proteomics studies. Its purpose is to create a list of confident protein abundances to enable robust biological interpretations based on the observed changes between test samples. An appropriate test should maximise the number of proteins with significant changes in abundance and minimise the number of false discoveries and false negatives. A measure of the suitability of the approach is to examine the FDR, or the expected false positives generated by the test against the number of proteins deemed significant. The four significance testing approaches (FC, Student’s t-test, unmoderated t-test, and moderated t-test) combined with three multiple testing correction approaches (none, Bonferroni, Storey-Tibshirani FDR) provided markedly different outcomes, highlighting (as for normalisation) the need to adopt a considered approach to data treatment.

3.4.4.1. Fold change

A commonly used method for identifying differentially abundant proteins is FC. By this approach, proteins with a FC larger than a defined cut off (e.g. 1.5-fold or 2-fold) are classified as differentially abundant. While this is an intuitively simple approach, there are a number of limitations that have been widely discussed for microarray work (Gusnanto et al., 2007) which are relevant to quantitative proteomics. The FC approach assumes that all proteins have the same variance (or standard error of measurement). However, for many reasons this may not be the case. Proteins with low abundance that are close to the detection limit of the mass spectrometer may tend to have more variability than those with higher cellular abundances. In addition, if a highly abundant protein (e.g. 10,000 copies per cell) such as a ribosomal or cell structure protein increases 1.4-fold, this represents a large increase in the balance of protein synthesis and has implications for nutrient and energy utilisation.
An equivalent 1.4-fold increase for a protein (e.g. gene regulatory protein) with 10 copies per cell will have negligible impact on the energy balance. It is clear that the biological roles of individual proteins must be considered when judging protein abundance differences (Section 3.4.5). Another factor that is not effectively dealt with by only considering FC is how many times the FC difference is observed. Clearly, a single measurement is not as reliable as a FC measurement for a protein that is quantified in 20 out of 20 experiments. One of the most important limitations of basing assessments on FC is the lack of statistical confidence defining the probability of differential abundance. Taking this approach, the risk of making false biological conclusions is, therefore, high.

3.4.4.2. Comparing the empirical Bayes moderated t-test to an unmoderated t-test

The empirical Bayes moderated t-test is an extension of a one-sample Student’s t-test and shrinks large variances towards the population estimate, but also increases the variance for proteins that have very small variances and few observations (Lonnstedt & Speed, 2002). The benefit of this approach is largest for proteins with few observations and even allows the estimation of p-values for proteins with a single measurement. The unmoderated t-test (a Student’s t-test with data fitted to the full linear model) calculates variance from the data that is available for each protein, while the moderated t-test borrows information from the entire dataset to better estimate the variance of each protein. Furthermore, the unmoderated t-test performs poorly when the number of data points in each group is small. In the unbiased comparison of both approaches where the data was fitted to a full linear model, and the same 830 proteins were analysed, the moderated t-test generated 24 extra significant proteins, suggesting that it had more power to detect differential protein abundance than the unmoderated t-test (Appendix E). Further, using Sala_1422 as a specific example, a moderated t-test was more conservative with estimating significance, where Sala_1422 with a small average log2 FC of 0.463 and only 2 observations, had an unmoderated t-statistic of 39.28 (p = 0.016) due to a small standard error estimate; the moderation of standard errors resulted in a far more conservative moderated t-statistic of 1.057 (p = 0.327) (Appendix E).
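The Sala_1422 example can be explored with the sketch below, which uses the usual empirical Bayes form of the moderated variance, a weighted average of the protein-wise variance and a prior variance estimated from the whole dataset. The prior variance and prior degrees of freedom are hypothetical values chosen for illustration; they are NOT the values estimated in this work, so the moderated t-statistic produced here is not expected to reproduce the reported 1.057. Treating the test as a simple one-sample test of the mean is also a simplification of the full linear model.

```python
import math


def moderated_variance(s2, d, s0_2, d0):
    """Shrink a protein-wise variance s2 (d degrees of freedom) towards a
    prior variance s0_2 (d0 degrees of freedom)."""
    return (d0 * s0_2 + d * s2) / (d0 + d)


def t_from_variance(mean, variance, n):
    """t-statistic for a mean of n observations with the given variance."""
    return mean / math.sqrt(variance / n)


def two_sided_p_df1(t):
    """Two-sided p-value for a t-statistic with 1 degree of freedom
    (the t-distribution with 1 degree of freedom is the Cauchy distribution)."""
    return 1.0 - (2.0 / math.pi) * math.atan(abs(t))


mean_log2_fc, n = 0.463, 2      # reported for Sala_1422
unmoderated_t = 39.28           # reported for Sala_1422
# Protein-wise variance implied by the reported unmoderated t-statistic.
s2 = n * (mean_log2_fc / unmoderated_t) ** 2

prior_var, prior_df = 0.4, 6.0  # hypothetical prior, not taken from this work
s2_moderated = moderated_variance(s2, n - 1, prior_var, prior_df)
moderated_t = t_from_variance(mean_log2_fc, s2_moderated, n)
p_unmoderated = two_sided_p_df1(unmoderated_t)
```

With two observations the unmoderated test has a single degree of freedom, and the p-value computed on that basis is consistent with the reported p = 0.016. Under the hypothetical prior the moderated statistic falls to approximately 1, illustrating how a very small protein-wise variance is pulled up towards the dataset-wide estimate.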
Overall, the data demonstrated that compared to an unmoderated t-test, the moderated t-test had more power to detect significant changes in protein abundance while being more conservative in estimating significance.

3.4.4.3. False discovery rates and correcting for multiple testing

Determining a suitable statistical threshold is a trade-off between the false positive and false negative rates. Using a full linear model with the empirical Bayes moderated t-test, the Bonferroni and FDR methods for multiple testing correction were compared to each other, and to no correction. The Bonferroni method clearly demonstrated a strong control over the false positive rate at the expense of identifying differentially abundant proteins (Table 3.4). The FDR method (q < 0.05) identified 4 times as many differentially abundant proteins as the Bonferroni method, with 2 false positives expected. Using FDR thresholds that are frequently used in microarray studies, 90, 138 and 217 differentially abundant proteins were identified with q < 0.1, 0.15 or 0.2, respectively (Figure 3.11). Lastly, it is important to note that when p < 0.05 was used without correcting for multiple testing, the FDR was 28%. This serves to highlight that using a q < 0.2 (20% FDR) value (which may seem to be a large error value) provides a better statistical outcome than a p < 0.05 in an uncorrected significance test.

Figure 3.11. Number of differentially abundant proteins passing q-value thresholds. FDR q-value thresholds of < 0.05, 0.1, 0.15 and 0.2 were applied to the 10ºC vs. 30ºC dataset after significance testing using a moderated t-test. There was a linear increase in the number of significantly changed proteins as the q-value increased, and as a result, a linear increase of the number of expected false positives.
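The arithmetic behind these comparisons is sketched below. The protein counts are those reported in the text and in Table 3.4; approximating the expected number of false positives as the threshold multiplied by the number of proteins passing it is a simplifying assumption made for illustration.

```python
def expected_false_positives(n_significant, fdr):
    """Approximate expected false positives among proteins passing an FDR threshold."""
    return n_significant * fdr


# Number of differentially abundant proteins reported at each q-value threshold.
counts_by_q = {0.05: 45, 0.10: 90, 0.15: 138, 0.20: 217}
expected = {q: expected_false_positives(n, q) for q, n in counts_by_q.items()}

# Uncorrected test at p < 0.05: 59 expected false positives among 214 proteins.
uncorrected_fdr = 59 / 214
```

The uncorrected ratio is approximately 0.28, which is the 28% FDR quoted above and is larger than the 20% accepted when using q < 0.2.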

3.4.5. Statistical vs. biological significance

As discussed above (Section 3.4.4.1) there are good reasons to distinguish statistical significance from biological significance; the most important being that without having confidence in the proteomics outcomes (statistical significance) it is not possible to draw confident inferences about the biology. From the 280 proteins with FC > 1.5 (Table 3.5), approximately half (n = 135) had a q > 0.2 (Appendix E). This illustrates the potential difficulties that would be created for interpreting the biology if half of the proteins are, in fact, not reliably associated with the test conditions being examined (in this case, the effect of temperature). One method for exploring this difference is via a volcano plot, where the relationship between FC and statistical significance (q-value) can be examined (Figure 3.12a). Proteins that have a q-value < 0.2 and a FC > 1.5 are differentially abundant by both the statistical and FC approaches. Proteins that satisfy both statistical and FC criteria (Figure 3.12b), the FC criteria only (Figure 3.12c), or the statistical criteria only (Figure 3.12d), have been highlighted. The majority of proteins with large FCs, but insignificant changes (Figure 3.12c), arise from proteins with fewer than five observations. In these cases, the large FC may be associated with high variance, and in cases that involved few measurements, are less likely to be indicative of a consistent and important biological change. Importantly, the statistical approach is capable of identifying proteins that have small but consistent changes in abundance (Figure 3.12d) that would have been overlooked using a FC thresholding approach.
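The grouping of proteins by the two criteria, and the coordinates used for a volcano plot, can be sketched as follows. The function names and example values are hypothetical, and the FC cut-off is assumed to apply symmetrically on the log2 scale.

```python
import math


def classify(fold_change, q_value, fc_cutoff=1.5, q_cutoff=0.2):
    """Assign a protein to one of four groups by its FC and q-value."""
    large_fc = abs(math.log2(fold_change)) > math.log2(fc_cutoff)
    significant = q_value < q_cutoff
    if significant and large_fc:
        return "both"
    if large_fc:
        return "FC only"
    if significant:
        return "statistical only"
    return "neither"


def volcano_point(fold_change, q_value):
    """Coordinates for a volcano plot: log2 FC (x) against -log10 q-value (y)."""
    return math.log2(fold_change), -math.log10(q_value)


# Hypothetical proteins as (fold change, q-value) pairs.
examples = [(2.5, 0.01), (3.0, 0.6), (1.2, 0.05), (1.1, 0.8)]
groups = [classify(fc, q) for fc, q in examples]
```

The "statistical only" group corresponds to small but consistent changes that an FC threshold alone would discard, while the "FC only" group corresponds to large but unreliable changes.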

Figure 3.12. Assessing statistical vs. biological relevance. A, Volcano plot of all 1172 quantified proteins from the 10ºC vs. 30ºC dataset, displaying the relationship between statistical significance and FC of each protein. The log2 FC (x-axis) was plotted against the -log10 q-value (y-axis). A q-value threshold of 0.2 (dashed horizontal line) and > 1.5-FC (vertical dashed lines) are shown, where open circles are differentially abundant proteins with small q-values and large FCs, and closed circles are proteins with q-values < 0.2 and FC values < 1.5. B-D, Normalised FC values of three representative proteins were plotted across all 10ºC vs. 30ºC experiments to illustrate experimental variance. B, A representative protein with statistically significant differential abundance (q < 0.2) and a large FC value (i.e. the protein satisfied both statistical and FC criteria). C, A representative protein with a large FC that is not significantly differentially abundant (q > 0.2). D, A representative protein with statistically significant differential abundance (q < 0.2) and small FC.

3.5. Conclusion

Complex quantitative proteomics experiments represent an analytical challenge for computing the probability of differential protein abundance while correctly accounting for an experimental design that can include label swapping, different extraction buffers, and biological and technical replicates. This study demonstrated that an optimum normalisation approach is dependent upon the individual experiment and must be assessed on a case-by-case basis by inspection of MA plots. Fitting the data to a full linear model accounting for correlation due to technical replication and extraction buffer, and statistical testing using an empirical Bayes moderated t-test coupled with a Storey-Tibshirani FDR was the best approach for maximising quantitative proteomics data while controlling false discoveries and correcting for multiple testing. The normalisation and statistical testing approach provided rigorous data processing and evaluation of differential abundance in a high throughput quantitative proteomics dataset and is generally applicable to global quantitative proteomics analyses.

Chapter 4. Unravelling the molecular mechanisms of cold adaptation in Sphingopyxis alaskensis

4.1. Summary

A major biosphere of the Earth is a cold marine environment, where ~ 90% of the ocean is less than 5ºC. Growth at low temperature presents several challenges including decreased enzyme activity (where higher activation temperatures are required), the increased viscosity of liquid water, reduced fluidity of lipid membranes and changes in protein conformation. As described in Chapter 1, cold environments affect a range of cellular aspects including the uptake of nutrients, metabolic activity, energy generation, and growth rate. Organisms colonising cold environments must therefore adapt to the fundamental challenges presented by low temperature. S. alaskensis was isolated as a numerically dominant species from permanently cold, nutrient depleted (oligotrophic) ocean waters (4-10ºC) in the Alaskan and Japanese North Pacific, and the European North Sea (Eguchi et al., 1996; Schut et al., 1997; Eguchi et al., 2001). Its ability to proliferate to high numbers in its relatively extreme environment demonstrates that it is successfully adapted to low temperature and oligotrophic conditions. The oligotrophy of S. alaskensis was the focus of previous research (Fegatella et al., 1998; Fegatella et al., 1999; Fegatella & Cavicchioli, 2000; Ostrowski et al., 2001; Ostrowski et al., 2004). This is the first study of S. alaskensis that is concerned with the ability to adapt to the cold. The aim of the work described in this chapter was to elucidate the molecular mechanisms of cold adaptation in S. alaskensis by comparing the global quantitative protein profile at 10ºC vs. 30ºC. Robust experimental approaches involving inverse metabolic labelling, data processing, intra-experimental normalisation and rigorous statistics provided a large dataset with which a global perspective of cold adaptation was gained. Key components of S.
alaskensis cell biology that play a role in adaptation to the cold are lipid transport and metabolism, the cell wall, membrane and envelope, synthesis of exopolysaccharides and storage compounds, amino acid metabolism, transcription, translation, protein folding, and inorganic ion transport and metabolism. The pathways involved in the adaptation to high temperature are energy production and conversion, carbohydrate transport and metabolism, and defense mechanisms.

4.2. Materials and Methods

Cell culture, sample preparation, quantitative proteomics, data processing, normalisation and statistical testing methodology have been detailed in Section 2.4.5, Chapter 3 and Ting et al. (2009).

4.2.1. Manual functional annotation

Significantly differentially abundant proteins at 10ºC vs. 30ºC were considered for biological significance. Proteins of interest were manually annotated for function using an in-house Evidence Rating (ER) system (Allen et al., 2009). Briefly, the protein sequence was searched against the Swiss-Prot and Protein Data Bank databases, using BLAST to identify the most closely related, experimentally characterised homologue available. Experimental evidence that was considered acceptable to define function included references detailing the expression and characterisation of the protein, protein crystallography studies, and mutation and complementation studies. References that only documented the nucleotide sequence, including genome sequences, were not considered to provide sufficient evidence of function (Allen et al., 2009). Similarity of the S. alaskensis protein to the BLAST-selected homologue was confirmed by examining InterPro and Pfam domain assignments. ER values (from 1 to 5) were assigned to each protein according to the confidence of functional assignments, with 1 being the highest confidence (Table 4.1). The validated and updated gene descriptions were uploaded onto the Integrated Microbial Genomes (IMG) platform (Markowitz et al., 2007). After S. alaskensis proteins of interest were functionally annotated according to the ER system, the proteins were sorted into Clusters of Orthologous Genes (COG) categories.

Table 4.1. Evidence Rating system used for functional annotation of S. alaskensis proteins

ER assignment   Requirements
ER1             Functional characterisation of the protein in S. alaskensis
ER2             Functional characterisation of the protein in any organism except for S. alaskensis. BLAST match must be ≥35%, all domains/motifs required for function must be present
ER3             Functional characterisation of the protein in any organism except for S. alaskensis. BLAST match is <35%, all domains/motifs required for function must be present
ER4             No experimental evidence, domains/motifs present
ER5             Hypothetical protein, no experimental evidence, no domain/motif matches
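The decision rules of Table 4.1 can be written as a small function, shown here only as an illustrative sketch of the logic (the annotation in this work was performed manually). The argument names are hypothetical. Two assumptions are made: the ER2 threshold is read as "at least 35%", and a protein with a characterised homologue that lacks some required domains falls through to ER4 or ER5, a case the table does not specify.

```python
from typing import Optional


def assign_evidence_rating(
    characterised_in_s_alaskensis: bool,
    homologue_match_percent: Optional[float],
    all_required_domains_present: bool,
    any_domain_match: bool,
) -> int:
    """Return an Evidence Rating from 1 (highest confidence) to 5.

    homologue_match_percent is the BLAST match to the closest experimentally
    characterised homologue in another organism, or None if there is none.
    """
    if characterised_in_s_alaskensis:
        return 1
    if homologue_match_percent is not None and all_required_domains_present:
        # Assumed reading of Table 4.1: ER2 needs a match of at least 35%.
        return 2 if homologue_match_percent >= 35.0 else 3
    if any_domain_match:
        return 4
    return 5


ratings = [
    assign_evidence_rating(True, None, False, False),
    assign_evidence_rating(False, 60.0, True, True),
    assign_evidence_rating(False, 20.0, True, True),
    assign_evidence_rating(False, None, False, True),
    assign_evidence_rating(False, None, False, False),
]
```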

4.3. Results

4.3.1. Protein identification and quantitation

The premise of the global quantitative proteomics experiments was that temperature sensitive changes in protein abundance correspond to metabolic pathways, thus providing insight into cold adaptation in S. alaskensis. A metabolic labelling platform using 14N/15N was developed to comprehensively identify and quantify proteins from S. alaskensis at 10°C (low temperature) vs. 30°C (high temperature). Over 400,000 tandem mass spectra led to the high confidence identification of 2,135 proteins (66% genome coverage) (Appendix D) and the quantitation of 1,172 proteins (37% genome coverage) (Appendix E). Protein quantities were normalised using a lowess approach and the data from all 20 experiments were combined using linear modelling. To test for statistically significant changes in protein abundance, an empirical Bayes moderated t-test coupled with a positive False Discovery Rate (FDR) was applied to the data. Using a q < 0.2 FDR threshold for significance, 217 proteins had significant quantitative differences at 10ºC vs. 30ºC (Table 4.2).

Table 4.2. MS results and number of proteins from identification, quantitation and statistical testing.

No. MS         No. MS/MS     No. proteins    No. proteins    No. proteins with significant
experiments    generated     identified      quantified      differential abundance
20             > 400,000     2,135           1,172           217

Proteins were identified with <1% false discovery and quantified with a minimum of 2 peptides. Significantly differentially abundant proteins had q < 0.2.

Proteins with significant changes in abundance between 10ºC and 30ºC were functionally annotated and sorted into COG functional categories (Figure 4.1 and Table 4.3). The number of differentially abundant proteins found in each COG category at 10ºC vs. 30ºC revealed that the functional categories of proteins with key roles in cold adaptation included: lipid transport and metabolism; the cell wall, membrane and envelope; transcription and translation; replication, recombination and repair; and inorganic ion transport. Closer inspection of the individual proteins that were significantly increased at 10ºC also indicated the importance of protein folding in cold adaptation (Figure 4.1 and Table 4.3). In contrast, COG categories with increased representation at 30ºC included amino acid transport and metabolism; energy production and conversion; carbohydrate transport and metabolism; defense; and cell motility (Figure 4.1 and Table 4.3). Additionally, 65 proteins were identified in all 20 MS experiments, and they were also functionally annotated and sorted into COG categories (Table 4.4). Since mass spectrometry is an intensity-based technique, where the most abundant peptides in a parent ion scan (MS1) are selected preferentially for second dimension (MS2) fragmentation, the most abundant proteins in a cell will be identified first, followed by proteins of decreasing abundance. As a result, a qualitative assessment can be made regarding the overall relative abundance of a protein based on the number of peptides confidently identified. Therefore, the 65 proteins in Table 4.4 are qualitatively assigned as the most abundant proteins in the cell. Although these proteins were not all significantly differentially abundant, their presence at high levels over 20 experiments may also suggest some ‘hard-wired’ role in cold adaptation. Some of these proteins are discussed below (Section 4.4.13).
Finally, examination of the proteins with unchanged abundance (high q-values) provided insight into metabolic processes that are important to the cell regardless of growth temperature. The 150 proteins with the largest q-values were sorted into COG categories (Appendix J).

Figure 4.1. Significantly differentially abundant proteins sorted by COG categories. Proteins increased at 30ºC are represented by black columns and proteins increased at 10ºC are represented by white columns. Proteins considered significantly differentially abundant were confidently identified using SEQUEST and DTASelect, quantified with a minimum of 2 peptides per protein using RelEx and had q < 0.2 after significance testing using an empirical Bayes moderated t-test.

Table 4.3. Proteins with significant differential abundance at 10ºC vs. 30ºC sorted by COG categories. After statistical testing, 217 proteins were determined to have significant differences in abundance at 10ºC vs. 30ºC. Accession number refers to the RefSeq accession number for each protein. FC; 14N/15N fold change. FDR; false discovery rate expressed as a q-value, where a q < 0.2 significance cut-off was used. Unshaded proteins were significantly increased at 30ºC, shaded significantly increased at 10ºC. ER; the in-house Evidence Rating system used for manual functional annotation of proteins. Proteins were ranked from ER1-ER5 with regard to confidence in function prediction, where ER1 denotes the highest confidence for function and ER5 no information (Allen et al., 2009) (See Table 4.1 for full requirements of the Evidence Rating system). Proteins in bold were identified in all 20 MS experiments, and thus were amongst the most abundant proteins in the cell (see Table 4.4 for full list of proteins identified in all 20 MS experiments).

ACCESSION LOCUS DESCRIPTION FC FDR

Lipid transport and metabolism (I)

0.04 0.01 0.18 0.05 0.15 0.19 0.07 0.11 0.16 0.14 0.14 0.14 0.19 0.01 0.16 0.13 2.E-03 2.E-03
1.6 2.0 1.5 1.3 1.4 1.4 1.4 1.5 1.5 1.7 1.7 1.8 1.8 2.4 2.5 5.8 2.2 1.3
k etothiolase) (EC 2.3.1.16) (ER2) etothiolase) (EC 2.3.1.16) (ER2) etoacyl synthetase) (EC 2.3.1.41) (ER2) etoacyl-(acyl-carrier protein) reductase) (EC 1.1.1.100) k etoacyl-CoA thiolase) (Beta- k etoacyl-CoA thiolase) (Beta- k anoate (PHA) synthesis regulator (ER4) inase (EC 2.7.1.30) (ER2) k etoacyl-acyl-carrier-protein synthase I (beta- (ER2) Acetyl-CoA C-acyltransferase (3- k 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) (ER2) Biotin carboxyl carrier protein of acetyl-CoA carboxylase (ER2) Acetyl-CoA carboxylase carboxyl subunit alpha (EC 6.4.1.2) (ER2) Secreted 3-hydroxybutyrate dehydrogenase (EC 1.1.1.30) (ER2) Acyl-CoA dehydrogenase (EC 1.3.99.3) (ER3) Acetyl-CoA C-acyltransferase (3- k Phosphatidylserine decarboxylase (EC 4.1.1.65) (ER3) Propionyl-CoA carboxylase (EC 6.4.1.3) (ER2) Long chain fatty acid-CoA (acyl-CoA synthetase) (EC 6.2.1.3) (ER3) Phasin (ER4) Polyhydroxyal k Glycerol
Sala_2162 Sala_3157 Sala_1438 Sala_1796 Sala_0938 Sala_1105 Sala_3158 Sala_1963 Sala_1223 Sala_0220 Sala_0504 Sala_1229 Sala_1948
YP_617297 YP_617917 Sala_2255 Acetoacetyl-CoA reductase (EC 1.1.1.36) (ER2) Sala_2879 Glutaryl-CoA dehydrogenase (EC 1.3.99.7) (ER2) YP_615337 Sala_0281 Beta- k YP_615337 Sala_0281 YP_615226 Sala_0169 AMP-dependent synthetase and ligase (ER3) YP_617204 YP_618193 YP_616484 YP_616841 YP_615989 YP_616155 YP_618194 YP_617008 YP_616271 YP_615277 YP_615558 YP_616277 YP_616993 YP_615144 Sala_0085 3-oxoacyl-(acyl-carrier-protein) reductase (beta- k

132 L. Ting. UNSW Mollecular mechanisms of cold adaptation in S. alaskensis 0.20 0.12 0.06 0.04 0.14 0.02 0.14 0.03 0.17 0.13 0.17 0.14 0.18 0.02 0.17 0.06 0.18 0.17 0.14 0.15 0.18 0.18 0.16 0.01 0.02 0.18 3.E-04 1.7 2.7 1.5 1.3 1.4 0.11 1.8 1.3 1.5 0.17 1.4 2.0 1.4 1.9 1.4 1.6 2.0 2.3 1.4 1.3 1.3 1.4 1.2 1.7 1.2 1.8 1.5 1.3

eto-3-deoxygluconate ) (EC 1.1.1.125) (ER2) Energy production and conversion (C) se (EC 1.6.99.5) (ER3) Carbohydrate transport and metabolism (G) B family protein (ER3) k inase pf k n oxidoreductase, Old Yellow Enzyme family inase (EC 2.7.1.2) (ER2) (ER2) (ER2) Aconitate hydratase 1 (EC 4.2.1.3) (ER2) Cytochrome c oxidase subunit II (EC 1.9.3.1) (ER3) Ribose/ isomerase family protein (EC 5.3.1.-) (ER4) 2-deoxy-D-gluconate 3-dehydrogenase (2- k Sala_2096 Sala_1405 Sala_3034 Sala_3035 YP_618094 Sala_3057 Inorganic diphosphatase (EC 3.6.1.1) (ER2) YP_616285 NADH:flavi YP_618022 Sala_2984 Sala_1237 Dihydrolipoamide dehydrogenase (2-oxoglutarate dehydrogenase, E3 component) (EC 1.8.1.4) (ER2) 1.4 YP_616221 YP_616364 Carbohydrate Sala_1172 YP_617910 Sala_2872 Triosephosphate isomerase (EC 5.3.1.1) (ER2) Sala_1317 (ER2) -bisphosphate aldolase (EC 4.1.2.13) YP_616283 Sala_1235 Dihydrolipoamide succinyltransferase (2-oxoglutarate dehydrogenase complex, E2 component) (EC 2.3.1.61) YP_615236 YP_615580 Sala_0179 Electron transfer flavoprotein (Etf), alpha-subunit (ER2) Sala_0526 Pyruvate dehydrogenase E1 component, beta subunit (EC 1.2.4.1) (ER2) YP_615577 YP_615845 Gluco k Sala_0523 YP_616160 Sala_1110 Enolase (2-phosphoglycerate dehydratase) (EC 4.2.1.11) (ER2) Sala_0792 Ribose-5-phosphate isomerase B (EC 5.3.1.6) (ER2) YP_616344 Sala_1297 NADH-quinone oxidoreducta NADH-quinone YP_616344 Sala_1297 YP_617272 YP_617016 Sala_2230 YP_615235 Malate dehydrogenase (EC 1.1.1.37) (ER2) Sala_1971 YP_617267 FAD-dependent pyridine nucleotide-disulphide oxidoreductase (1.18.1.-) (ER2) Sala_0178 YP_617269 Electron transfer flavoprotein (Etf), beta-subunit (ER2) Sala_2225 Dihydrolipoamide dehydrogenase (2-oxoglutarate dehydrogenase, E3 component) (EC 1.8.1.4) (ER2) Sala_2227 Dihydrolipoamide succinyltransferase (2-oxoglutarate dehydrogenase complex, E2 component) (EC 2.3.1.61) 1.5 YP_617146 Sala_2104 Putative ATP12 family chaperone protein (ER3) 12.8 YP_615233 YP_617330 Sala_0176 YP_616884 
Succinyl-CoA synthetase, beta subunit (EC 6.2.1.5) (ER2) Sala_2288 ATP synthase F1, beta subunit (EC 3.6.3.15) (ER2) Sala_1839 Zn-dependent alcohol dehydrogenase (ER3) YP_617138 YP_616452 YP_618071 YP_618072 YP_615241 YP_615248 Sala_0184 Alpha glucosidase (EC 3.2.1.20) (ER2) Sala_0191 6-phosphogluconate dehydratase (EC 4.2.1.12) (ER2) YP_616366 Sala_1319 -3-phosphate dehydrogenase (EC 1.2.1.12) (ER2)

L. Ting. UNSW 133 Chapter 4 0.10 0.14 0.10 0.04 0.18 0.06 0.17 0.06 0.17 0.19 0.16 0.07 0.20 0.18 0.12 0.07 0.18 0.02 0.17 0.06 0.06 0.16 0.15 0.13 0.07 0.11 0.20 2.E-05 2.E-03 7 .5 1.6 1.8 1.3 1.3 1.5 1.5 1.6 1.7 1.8 2.5 3.0 1.6 1.5 1.3 1.8 2.5 1.5 1.5 1.7 1.4 1.4 1.4 1.3 2.2 1.6 1.5 doreductase) (PGDH) (EC 1.1.1.95) (ER2) 1.5 Nucleotide transport and metabolism (F) Amino acid transport and metabolism (E) inase (EC 2.7.2.4) (ER3) k 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase (EC 2.3.1.117) (ER2) 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase (EC 2.3.1.117) (ER2) Secreted Zn-dependent membrane (EC 3.4.13.19) (ER2) Histidinol phosphate aminotransferase (EC 2.6.1.9) (ER3) (ER2) Imidazole glycerol phosphate synthase subunit hisH (EC 2.4.2.-) Glutamate-5-semialdehyde dehydrogenase (EC 1.2.1.41) (ER2) Anthranilate synthase (EC 4.1.3.27) (ER2) Aspartate semialdehyde dehydrogenase (EC 1.2.1.11) (ER2) Aminotransferase class III (EC 2.6.1.-) (ER4) Sala_2828 Sala_2312 Sala_2887 Sala_3148 Sala_0581 Sala_1174 Sala_2718 Sala_1112 YP_616596 Sala_1550 Diadenosine tetraphosphate (EC 3.6.1.41) (ER2) YP_617866 YP_617354 YP_617925 YP_618184 YP_615635 YP_616223 YP_617756 YP_616221 YP_616171 Sala_1122 Beta-alanine-pyruvate aminotransferase (omega-amino acid--pyruvate aminotransferase) (EC 2.6.1.18) (ER2) 2. 
YP_616247 Sala_1198 Diaminopimelate decarboxylase (DAP decarboxylase) (EC 4.1.1.20) (ER3) YP_616913 YP_617862 Sala_1868 Glycine dehydrogenase (decarboxylating) alpha subunit (EC 1.4.4.2) (ER2) Sala_2824 Secreted beta-aspartyl-peptidase (EC 3.4.19.5) (ER2) YP_615224 YP_615826 Sala_0167 YP_615863 Phosphoserine aminotransferase (EC 2.6.1.52) (ER2) Sala_0773 Succinylglutamate-semialdehyde dehydrogenase (EC 1.2.1.71) (ER3) Sala_0810 Homoserine dehydrogenase (EC 1.1.1.3) (ER2) YP_616567 YP_615106 Sala_1521 Leucine dehydrogenase (EC 1.4.1.9) (ER2) Sala_0047 Branched-chain amino acid aminotransferase (EC 2.6.1.42) (ER2) YP_618033 Sala_2995 Secreted oligopeptidase B (Prolyl ) (Clan SC, family S9, subfamily S9A) (EC 3.4.21.83) (ER3) 1 YP_615670 Sala_0616 D-3-phosphoglycerate dehydrogenase (Phosphoglycerate oxi YP_616135 YP_616406 Sala_1085 YP_616518 3-isopropylmalate dehydrogenase (EC 1.1.1.85) (ER2) Sala_1359 Acetylornithine aminotransferase (EC 2.6.1.11) (ER2) Sala_1472 Ketol-acid reductoisomerase (EC 1.1.1.86) (ER2) YP_615549 Sala_0495 (ER2) Argininosuccinate synthetase (EC 6.3.4.5) YP_616774 YP_615206 Sala_1728 Secreted Zn-dependent membrane oligopeptidase (EC 3.4.24.71) (ER2) Sala_0149 Glutamine synthetase I (Glutamate--ammonia ligase I) (EC 6.3.1.2) (ER2) YP_615229 Sala_0172 Aspartate aminotransferase (EC 2.6.1.1) (ER2) YP_616912 Aspartate YP_617991 Sala_2953 Sala_1867 Glycine dehydrogenase (decarboxylating) beta subunit (EC 1.4.4.2) (ER2)

134 L. Ting. UNSW Mollecular mechanisms of cold adaptation in S. alaskensis 0.19 0.09 0.04 0.12 0.13 0.01 0.16 0.14 0.19 0.12 0.06 0.16 0.09 0.02 0.17 0.16 0.17 0.09 0.19 0.05 0.16 0.15 0.13 0.04 3.E-05 2.E-03 3.2 1.4 0.18 1.7 2.5 1.3 1.7 2.1 1.6 4.0 4.2 5.2 9.2 1.6 1.4 1.4 1.5 1.5 1.8 1.3 1.7 1.4 1.7 1.9 1.9 2.8 R4) 33.5 inase (EC 2.7.6.1) (ER2) k Coenzyme transport and metabolism (H) Replication, recombination and repair (L) Inorganic ion transport and metabolism (P) Inorganic ion transport and metabolism (P) inase (EC 2.7.1.24) (ER3) k Secondary metabolites biosynthesis, transport and catabolism (Q) inase (EC 2.7.4.8) (ER2) k aline (EC 3.1.3.1) (ER2) formyltransferase (AICAR transformylase); IMP cyclohydrolase (Inosinicase)] (EC 2.1.2.3) 3.5.4.10) (ER2) Guanylate Guanosine-5'-triphosphate,3'-diphosphate diphosphatase (pppGpp-5'-phosphohydrolase) (EC 3.6.1.40) (ER3) TonB-dependent receptor with predicted cobalamin (vitamin B12) specificity (ER3) Biotin synthase (EC 2.8.1.6) (ER2) Cobaltochelatase CobT subunit (EC 6.6.1.2) (ER2) TonB-dependent siderophore receptor (ER3) Copper resistance protein B precursor (CopB homolog) (ER3) Bacterioferritin (ER2) Al k Phosphate import ATP-binding protein pstB homolog (Phosphate-transporting ATPase) (EC 3.6.3.27) (ER2) Secreted amidohydrolase (ER4) Helix-destabilising single stranded DNA binding protein (Ssb) (ER2) DNA gyrase subunit A (EC 5.99.1.3) (ER2) DNA Polymerase I (EC 2.7.7.7) (ER2) DNA topoisomerase I (EC 5.99.1.2) (ER2) (ER2) Integration host factor, alpha subunit Sala_3155 Sala_1753 Sala_3108 Sala_1222 Sala_0642 Sala_1913 Sala_0785 Sala_0588 Sala_1625 Sala_0823 Sala_0955 Sala_2988 Sala_1164 Sala_0625 Sala_1211 Sala_1417 YP_617980 Sala_2942 Ribose-phosphate pyrophospho Ribose-phosphate YP_617980 Sala_2942 YP_615998 YP_615088 Sala_0947 TonB-dependent siderophore receptor (ER3) Sala_0029 TonB-dependent siderophore receptor (ER3) YP_617133 Sala_2091 TonB-dependent siderophore receptor (ER3) receptor siderophore 
TonB-dependent YP_617133 Sala_2091 YP_617877 Sala_2839 Dephospho-CoA YP_617877 Sala_2839 YP_616195 Sala_1146 Predicted small secreted protein, putative entericidin antidote to entericidin B (E A membrane lipoprotein, YP_617990 YP_616173 Sala_2952 Predicted ectoine hydroxylase (EC 1.17.-.-) (ER3) Sala_1124 Fumarylacetoacetate hydrolase domain-containing protein (ER4) 1.4 YP_618159 Sala_3123 Bifunctional purine biosynthesis protein purH [Includes: Phosphoribosylaminoimidazolecarboxamide YP_618191 YP_616799 YP_618145 YP_616270 YP_615696 YP_616958 YP_615838 YP_615642 YP_616671 YP_615874 YP_616006 YP_618026 YP_616213 YP_615679 YP_616260 YP_616464 YP_615215 Sala_0158 Inactivated superfamily I helicase domain protein (ER4) YP_616956 Sala_1911 GTP cyclohydrolase I (EC 3.5.4.16) (ER2)

L. Ting. UNSW 135 Chapter 4 0.06 0.17 0.16 0.16 0.08 0.08 0.01 0.08 0.01 0.07 0.06 0.18 0.08 0.17 0.18 0.08 0.20 0.19 0.04 0.10 0.11 0.17 0.20 0.06 0.09 0.07 0.19 4.E-03 2.2 1.7 0.06 1.8 1.3 1.3 1.3 1.3 1.7 1.7 2.1 23.6 1.4 1.4 1.4 1.5 1.5 1.5 1.6 1.6 1.6 1.6 2.8 1.3 1.4 1.4 1.6 1.7 2.7 t (EC 2.7.7.6) (ER2) Transcription (K) subunit (EC 2.7.7.6) (ER2) Cell wall/membrane/envelope biogenesis (M) Translation, ribosomal structure and biogenesis (J) protein, CspA-family homolog (ER2) (ER3) III (EC 3.1.11.2) (ER2) DNA-directed RNA polymerase, beta subuni t (EC 2.7.7.6) (ER2) DNA-directed RNA polymerase, alpha DNA-directed RNA polymerase beta-prime subuni Transcription factor NusA (ER2) Transcription antitermination protein NusG (ER2) Transcription termination factor Rho (ATP-dependent helicase rho) (ER2) Uncharacterised HTH-type transcriptional regulator (ER4) Cold shoc k Peptide chain release factor 2 (RF-2) (ER2) D (RNase D) (EC 3.1.13.5) (ER3) Ribonuclease E (RNase E) (EC 3.1.4.-) (ER3) Proline--tRNA ligase (EC 6.1.1.15) (ER2) Putative ribosomal N-acetyltransferase (ER3) DEAD box ATP-dependent RNA helicase (EC 3.6.1.-) (ER3) Translation initiation factor 2 (IF-2) (ER2) 50S ribosomal protein L28 (ER3) 30S ribosomal protein S17 (ER2) Polyribonucleotide nucleotidyltransferase (Polynucleotide phosphorylase) (PNPase) (EC 2.7.7.8) (ER2) Putative inhibited division (GidA) family protein (ER4) lipoprotein (OmpAOuter membrane peptidoglycan-associated family) (ER3) Lytic transglycosylase (ER3) UDP-4-amino-4-deoxy-L-arabinose--oxoglutarate aminotransferase (EC 2.6.1.-) (ER3) Outer-membrane lipoprotein carrier protein (ER3) Sala_2180 Sala_1486 Sala_0565 Sala_1485 Sala_0610 Sala_1444 Sala_2845 Sala_2675 Sala_1480 Sala_3177 Sala_2343 Sala_3172 Sala_1230 Sala_1431 Sala_1832 Sala_0612 Sala_2750 Sala_2809 Sala_0543 Sala_1159 Sala_1988 Sala_1732 Sala_1573 Sala_2670 YP_618044 Sala_3006 Putative (Endoribonuclease L-PSP family) (ER4) YP_617875 Sala_2837 Ribosome-associated stress response 
protein (YfiA/ProteinY (pY) and sigma54 homolog (RaiA domain) YP_617222 YP_616532 YP_615619 YP_616531 YP_615664 YP_616490 YP_617883 YP_617713 YP_616526 YP_618213 YP_617385 YP_618208 YP_616278 YP_616478 YP_616877 YP_615666 YP_617788 YP_617847 YP_615597 YP_616208 YP_617032 YP_616778 YP_616619 YP_617708 YP_617711 Sala_2673 Response regulator receiver protein (ER3) 1.4 YP_615294 YP_616609 Sala_0237 Outer membrane protein with OmpA domain (ER3) Sala_1563 UDP-glucose 6-dehydrogenase (EC 1.1.1.22) (ER2)

136 L. Ting. UNSW Mollecular mechanisms of cold adaptation in S. alaskensis 0.12 0.20 0.12 0.18 0.17 0.13 0.14 0.08 0.01 0.05 0.13 0.09 0.18 0.17 0.02 0.08 0.18 0.01 0.11 0.13 0.01 0.11 0.08 3.E-04 3.E-04 1.4 1.8 1.8 2.4 3.3 4.1 1.7 1.3 1.7 1.8 1.4 1.5 1.5 2.9 1.9 3.2 1.5 1.3 13.9 7.8 1.7 Cell motility (N) ing, secretion and vesicular transport (U) ing, secretion and vesicular transport (U) Defense mechanisms (V) Signal transduction mechanisms (T) Intracellular traffic k Cell cycle control, cell division, chromosome partitioning (D) protein, IbpA homolog (HSP20 family) (ER2) Posttranslation modifications, protein turnover and chaperones (O) UDP-glucuronate decarboxylase (EC 4.1.1.35) (ER2) Periplasmic C-terminal processing peptidase (EC 3.4.21.102) (ER2) Outer membrane OmpA family protein (ER2) UDP-glucose 4-epimerase (EC 5.1.3.2) (ER3) Putative secreted polysaccharide export protein (ER3) OmpR/PhoB family response regulator with CheY-like receiver and winged-helix DNA-binding domains (ER2) GTP-binding protein TypA/BipA homolog (ER2) Predicted protein subunit YajC (ER3) Preprotein translocase subunit SecB (ER2) 3-mercaptopyruvate sulfurtransferase (EC 2.8.1.2) (ER2) Maf protein (ER2) Sala_1566 Sala_0574 Sala_3101 Sala_1585 Sala_1928 Sala_0801 Sala_2303 Sala_1715 Sala_2015 Sala_3061 Sala_1915 YP_616140 YP_616495 Sala_1090 YP_616590 Putative beta-lactamase (EC 3.5.2.6) (ER3) Sala_1449 YP_615526 Multidrug resistance protein AcrB (ER2) Sala_1544 6-aminohexanoate-dimer hydrolase (EC 3.5.1.46) (ER3) Sala_0472 Multidrug resistance protein MdtB (multidrug transporter) (ER2) 1.5 3.5 1.7 YP_616176 Sala_1127 Predicted cell division ATPase (ER3) 2.7 YP_617226 Sala_2184 Peroxiredoxin (EC1.11.1.15) (ER3) (EC1.11.1.15) Peroxiredoxin YP_617226 Sala_2184 YP_617868 Sala_2830 Small heat shoc k YP_616441 Sala_1394 Diguanylate cyclase (EC 2.7.7.65) (ER3) YP_617976 Sala_2938 Flagellin protein (ER2) YP_616612 YP_615628 YP_618138 YP_616631 YP_616973 YP_615854 YP_617345 YP_616761 YP_617059 
YP_618098 YP_616960 YP_617101 Sala_2059 Chaperone protein DnaJ (ER2) YP_617416 YP_615308 Sala_2374 (ER2) Organic hydroperoxide resistance protein (EC 1.11.1.15) Sala_0252 GrpE adenine nucleotide exchange factor (ER2) YP_617303 Sala_2261 Protein-L-isoaspartate(D-aspartate) O-methyltransferase (EC 2.1.1.77) (ER2) YP_617289 Sala_2247 Predicted periplasmic HtrA/DegP-like serine protease (EC 3.4.21.107) (ER3)

L. Ting. UNSW 137 Chapter 4 0.14 0.14 0.16 0.10 0.03 0.06 0.05 0.10 0.12 0.12 0.19 0.18 0.17 0.06 0.08 0.02 0.02 0.04 0.11 0.20 0.12 0.16 0.14 0.06 0.06 0.14 0.17 0.16 0.11 0.05 1.9 1.6 1.3 1.4 1.4 1.5 1.5 1.6 1.8 3.7 1.3 1.4 1.9 1.9 1.9 2.0 2.2 2.2 2.9 3.7 5.9 21.7 9.1 5.2 6.0 3.5 1.7 1.6 3.8 3.5 nown (S) Function un k e TPP-binding General function prediction only (R) e e e nown function DUF336 e curved DNA-binding protein (ER3) eto reductase S-tranferase (EC 2.5.1.18) (ER3) Glutathione S-tranferase (EC 2.5.1.18) (ER3) Peptidyl-prolyl cis-trans isomerase (PPIase) (EC 5.2.1.8) (ER3) Thioredoxin (ER3) Chaperonin GroEL (ER2) (ER2) Chaperone ClpB DnaJ-li k Aspartate beta-hydroxylase domain-containing protein (EC 1.14.11.-) (ER3) Beta-lactamase-li k Conserved hypothetical protein 730 Small GTP-binding Conserved hypothetical protein, similar to SMa10599 Secreted hypothetical protein with surface antigen 2 domain (ER4) HAD-superfamily hydrolase subfamily IA, variant 3 Integral membrane protein with predicted TerC domain (ER4) Putative catechol 2,3-dioxygenase (EC 1.13.11.2) (ER3) Aldo/ k Thiamine pyrophosphate enzyme-li k Protein of un k Sala_1722 Sala_1476 Sala_2665 Sala_0452 Sala_0406 Sala_0404 Sala_3011 Sala_1293 Sala_0619 Sala_1705 Sala_2595 Sala_1092 Sala_2997 Sala_2846 Sala_3120 Sala_1661 Sala_0639 Sala_0106 YP_617316 Sala_2274 Hypothetical protein with TadE domain and 1 transmembrane (ER4) YP_617333 Sala_2291 Methyltransferase type 11 YP_615613 Sala_0559 Hypothetical protein Hypothetical YP_615613 Sala_0559 YP_615463 Sala_0408 Hypothetical protein Hypothetical YP_615463 Sala_0408 YP_616027 Sala_0977 Hypothetical protein Hypothetical YP_616027 Sala_0977 YP_616768 YP_616522 YP_617703 YP_615506 YP_615461 YP_615459 YP_618049 YP_616340 YP_615673 YP_616751 YP_617634 YP_616142 YP_618035 YP_617884 YP_618156 YP_616707 YP_615693 YP_615164 YP_617528 Sala_2488 RadC domain protein (ER4) YP_617745 Sala_2707 Putative signal-transduction protein with CBS domains YP_617424 
Sala_2382 Alpha-2-macroglobulin-li k YP_617424 Sala_2382 YP_617554 YP_618181 Sala_2514 YP_615726 Hypothetical protein with Abi domain (ER4) Sala_3145 3-oxoacyl-[acyl-carrier protein] synthase Sala_0672 Putative acetyltransferase (ER3) YP_615639 Sala_0585 Peptidase M16-li k Peptidase YP_615639 Sala_0585

138 L. Ting. UNSW Mollecular mechanisms of cold adaptation in S. alaskensis 0.06 0.11 0.13 0.14 0.17 0.16 0.19 0.20 0.07 0.13 0.01 0.17 0.19 0.18 0.16 0.04 0.18 0.04 0.16 0.05 0.01 0.16 0.17 0.13 0.00 0.16 0.13 0.02 0.01 3.E-04 1.6 3.1 1.8 1.3 1.5 1.5 1.6 1.7 1.7 1.8 1.9 2.0 2.0 2.1 2.5 2.8 3.3 4.3 4.6 3.7 3.4 2.8 2.5 1.6 1.5 1.4 1.2 1.7 1.6 nown function UPF0040 nown function DUF152 nown function DUF11 nown function DUF28 k k Conserved hypothetical protein Uncharacterised conserved protein UCP033924 Conserved hypothetical protein Hypothetical protein Hypothetical protein with signal peptide and GumN domain (ER4) Conserved hypothetical protein Hypothetical protein containing a signal peptide (ER5) Protein of un k Conserved hypothetical protein a signal peptide (ER5) Hypothetical protein containing Hypothetical protein Hypothetical protein (ER5) Hypothetical protein containing carboxymuconolactone decarboxylase domain (ER4) Secreted hypothetical protein with SAF domain (ER4) Protein of un k Hypothetical protein containing a signal peptide (ER5) Sala_0039 Sala_1340 Sala_2712 Sala_1219 Sala_1615 Sala_2734 Sala_0409 Sala_1890 Sala_0104 Sala_1951 Sala_1746 Sala_0100 Sala_2163 Sala_1130 Sala_0485 Sala_2471 YP_617077 Sala_2033 Hypothetical protein Hypothetical YP_617077 Sala_2033 YP_615098 YP_616387 YP_617750 YP_616267 YP_616661 YP_617772 YP_615464 YP_616935 YP_615162 YP_616996 YP_616792 YP_615158 YP_617205 YP_616179 YP_615539 YP_617511 YP_615997 YP_616403 Sala_0946 Hypothetical protein containing 2 TM helices, a PepSY-associated TM helix and PiuB domain (ER4) Sala_1356 Conserved hypothetical protein 3.6 YP_615441 Sala_0385 Hypothetical protein YP_615083 Hypothetical YP_615441 Sala_0385 Sala_0024 Hypothetical protein containing DUF583 domain (ER4) YP_615573 Sala_0519 Conserved hypothetical protein YP_616374 Sala_1327 Protein of un of YP_617482 Protein YP_616374 Sala_1327 Sala_2442 Putative copper efflux system periplasmic protein CusF YP_615393 Sala_0337 Protein of un of 
Protein YP_615393 Sala_0337 YP_615990 Sala_0939 Conserved hypothetical protein YP_615513 Sala_0459 Hypothetical protein YP_616023 Hypothetical YP_615513 Sala_0459 Sala_0973 Hypothetical protein containing a signal peptide and fasciclin domain (ER4) YP_616252 Sala_1203 Hypothetical protein Hypothetical YP_616252 Sala_1203 YP_615996 Sala_0945 Hypothetical protein Hypothetical YP_615996 Sala_0945

Table 4.4. Proteins identified in all 20 MS experiments sorted into COG categories. There were 65 proteins identified in all 20 MS experiments. Accession number refers to the RefSeq accession number for each protein. FC; 15N/14N fold change. FDR; false discovery rate expressed as a q-value. Unshaded proteins had 15N/14N ratios increased towards 30ºC, shaded proteins had 15N/14N ratios increased towards 10ºC. ER; the in-house Evidence Rating system used for manual functional annotation of proteins; proteins were ranked from 1-5 (Allen et al., 2009).

Columns: ACCESSION, LOCUS, PROTEIN, FC, FDR.

COG categories represented: Lipid transport and metabolism (I); Energy production and conversion (C); Amino acid transport and metabolism (E); Carbohydrate transport and metabolism (G); Nucleotide transport and metabolism (F); Coenzyme transport and metabolism (H); Inorganic ion transport and metabolism (P); Replication, recombination and repair (L); Transcription (K); Translation, ribosomal structure and biogenesis (J); Cell wall/membrane/envelope biogenesis (M); Cell motility (N); Posttranslational modification, protein turnover, chaperones (O); Signal transduction mechanisms (T); General function prediction only (R); Function unknown (S).

[Table body not recoverable from the extracted text.]


4.4. Discussion

The proteins determined to be differentially abundant (q < 0.2) at 10ºC vs. 30ºC provided a snapshot of the molecular mechanisms of cold adaptation in S. alaskensis. Functional categories with interesting thermally-induced biological effects include lipid transport and metabolism, the structure of the cell membrane, energy generation, carbohydrate metabolism, amino acid metabolism, transcription, translation, protein folding, detoxification and secretion (Figure 4.1 and Figure 4.2).

Figure 4.2. Cellular processes important for cold adaptation in S. alaskensis. The numbers in parentheses represent the number of proteins with significantly decreased abundance at 10ºC compared with 30ºC, and the number of proteins with significantly increased abundance at 10ºC vs. 30ºC, respectively.
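The paired counts reported for each cellular process follow from applying the q < 0.2 cut-off and then tallying the direction of change per functional category. A minimal sketch of that bookkeeping is given below; the record layout, the function name `tally_by_cog` and the example entries are hypothetical and are not data from Table 4.3.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Protein(NamedTuple):
    locus: str            # placeholder identifier, not a real locus tag
    cog: str              # single-letter COG category
    higher_at_10c: bool   # True if more abundant at 10 degrees C than at 30
    q_value: float        # false discovery rate expressed as a q-value


def tally_by_cog(proteins: Iterable[Protein],
                 q_cutoff: float = 0.2) -> Dict[str, Tuple[int, int]]:
    """Return {COG: (decreased at 10 C, increased at 10 C)} for q < q_cutoff."""
    counts = defaultdict(lambda: [0, 0])
    for p in proteins:
        if p.q_value < q_cutoff:  # same cut-off as quoted in the text
            counts[p.cog][1 if p.higher_at_10c else 0] += 1
    return {cog: (down, up) for cog, (down, up) in counts.items()}


# Hypothetical example entries, for illustration only.
example = [
    Protein("locus_A", "I", True, 0.04),
    Protein("locus_B", "I", True, 0.19),
    Protein("locus_C", "G", False, 0.01),
    Protein("locus_D", "G", True, 0.35),  # fails the cut-off, not counted
]
print(tally_by_cog(example))
```

The cut-off is a parameter so that the effect of a stricter or looser threshold on the per-category counts can be inspected directly.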

4.4.1. Lipid transport and metabolism and the cell membrane

The enzymes of the fatty acid degradation pathway (β-oxidation) were consistently increased at low growth temperature, which could affect the efficiency of energy generation or the supply of substrate for the synthesis of polyhydroxyalkanoate (PHA) storage material in the cold (Table 4.3 and Figure 4.1). Also, the degradation of the lipid membrane would allow for membrane restructuring to occur. Overall, the breakdown of lipids is an important adaptation to the cold that affects other metabolic pathways or cell structures dependent on the end products of β-oxidation.

4.4.1.1. Lipid metabolism, possible energy generation and storage materials

In the β-oxidation pathway, the β-ketothiolase (Sala_2162), acyl-CoA dehydrogenase (Sala_1105), 3-hydroxyacyl-CoA dehydrogenase (Sala_3157) and acetyl-CoA C-acyltransferase (Sala_2162, Sala_3158) enzymes for lipid degradation had significantly increased abundance at 10ºC (Table 4.3 and Figure 4.3). The acetyl-CoA end product of lipid degradation can enter the TCA cycle as a specific component of the energy generation process. This is particularly relevant because the oxidation of fatty acids produces more energy per carbon atom than the oxidation of carbohydrates. It is probable that free fatty acids, and possibly fatty acids from the cell membrane, are utilized for efficient energy generation as an adaptation to the challenge of the cold (Section 4.4.1.2).

The enzymes for lipid degradation also have a role in the propionate pathway, where propionyl-CoA is the central metabolic intermediate. Propionyl-CoA is formed as an end product of β-oxidation of fatty acids with an odd number of carbons. It is also a product of the metabolism of valine, leucine and isoleucine, and β-alanine. Since propionyl-CoA is a three-carbon acyl-CoA, it is unable to enter the TCA cycle directly, and must first be carboxylated to S-methylmalonyl-CoA by propionyl-CoA carboxylase (Sala_1223). The abundance of the latter was increased 1.8-fold at 10ºC (Table 4.3). The enzymes responsible for the subsequent isomerisation and final conversion of methylmalonyl-CoA to succinyl-CoA were also increased at 10ºC, and even though their q-values were above the 0.2 threshold of statistical significance, their increased abundance is consistent with the general increase of lipid degradation. Succinyl-CoA is an end product of the propionate pathway and, together with the acetyl-CoA end product of β-oxidation, would anaplerotically replenish the TCA cycle for energy generation. Also, the enzymes for fatty acid degradation overlap with some of the enzymes involved in leucine, isoleucine and valine degradation (Section 4.4.4.4).

Hydroxyacyl-CoA is a mid-pathway product of β-oxidation and is the central substrate for PHA synthesis (Section 4.4.3.2). The increase in the lipid degradation pathway that provides PHA substrate is consistent with an increase in PHA synthesis during 10ºC growth. Further, the energy-generating membrane-bound components of the electron transport chain were generally increased at 30ºC (Section 4.4.2.2).
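Two statements in this section lend themselves to simple bookkeeping: that odd-chain fatty acids end β-oxidation with a propionyl-CoA unit, and that fatty acids yield more energy per carbon than carbohydrates. The sketch below illustrates both under stated assumptions. The ATP equivalents per NADH and FADH2 are commonly quoted textbook (mitochondrial) values and are not measurements for S. alaskensis, and the function and constant names are hypothetical.

```python
# Assumed textbook ATP equivalents; bacterial respiratory chains can differ,
# so these constants are illustrative only.
ATP_PER_NADH = 2.5
ATP_PER_FADH2 = 1.5
# One TCA cycle turn per acetyl-CoA: 3 NADH + 1 FADH2 + 1 GTP.
ATP_PER_ACETYL_COA = 3 * ATP_PER_NADH + ATP_PER_FADH2 + 1.0
ACTIVATION_COST = 2.0  # ATP equivalents spent forming the acyl-CoA


def beta_oxidation_products(n_carbons: int) -> dict:
    """Products of complete beta-oxidation of a saturated fatty acyl-CoA."""
    if n_carbons < 2:
        raise ValueError("need at least two carbons")
    if n_carbons % 2 == 0:
        cycles, acetyl, propionyl = n_carbons // 2 - 1, n_carbons // 2, 0
    else:
        # Odd chains finish with a three-carbon propionyl-CoA unit.
        cycles = (n_carbons - 3) // 2
        acetyl, propionyl = cycles, 1
    return {"cycles": cycles, "acetyl_coa": acetyl, "propionyl_coa": propionyl}


def atp_per_carbon_even_chain(n_carbons: int) -> float:
    """Approximate net ATP per carbon for an even-chain saturated fatty acid."""
    p = beta_oxidation_products(n_carbons)
    if p["propionyl_coa"]:
        raise ValueError("estimate implemented for even-chain fatty acids only")
    # Each beta-oxidation cycle yields one NADH and one FADH2.
    atp = (p["cycles"] * (ATP_PER_NADH + ATP_PER_FADH2)
           + p["acetyl_coa"] * ATP_PER_ACETYL_COA
           - ACTIVATION_COST)
    return atp / n_carbons


# Glucose oxidised via glycolysis (2 ATP + 2 NADH), pyruvate dehydrogenase
# (2 NADH) and the TCA cycle (2 acetyl-CoA), under the same assumptions.
GLUCOSE_ATP = 2.0 + 2 * ATP_PER_NADH + 2 * ATP_PER_NADH + 2 * ATP_PER_ACETYL_COA

print(beta_oxidation_products(15))
print(atp_per_carbon_even_chain(16), GLUCOSE_ATP / 6)
```

Under these assumptions a 16-carbon fatty acid gives a higher ATP yield per carbon than glucose, which is the comparison made in the text; the absolute numbers depend on the assumed coupling ratios.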

4.4.1.2. Lipid metabolism and cold adaptation of the cell membrane

A change in the physiology of the lipid membrane is a well-documented structural adaptation to the cold in archaea, bacteria and eukarya (Russell, 1997; Russell & Nichols, 1999; Weber et al., 2001; Nichols et al., 2004). Most commonly, cold adaptive changes to the membrane are characterised by desaturation of fatty acids, a reduction in fatty acid chain length and increased methyl branching of the fatty acid chain, all of which regulate membrane fluidity (Russell & Nichols, 1999). There are three desaturase enzymes in the S. alaskensis genome; however, these enzymes were either not identified in the proteomics experiments (Sala_0465 and Sala_3136), or identified by a single peptide (Sala_2178, LVANLVAGSAR). The lack of identification of the desaturases indicated that the cold-induced modification of lipids by desaturases may not occur at 10ºC. It is possible that desaturases may be more responsive when the cells experience cold shock, where the existing lipid composition must be modified. In the current study, cells were acclimatised to cold growth; thus, it is possible that de novo synthesis of shorter unsaturated lipids occurs at 10ºC, beginning with the first committed and irreversible step of fatty acid biosynthesis, which is catalysed by the biotin-dependent acetyl-CoA carboxylase enzyme (Sala_1796). Sala_1796, its biotin carboxyl carrier protein (Sala_1438) and biotin synthase (Sala_1222) were significantly increased at 10ºC (Table 4.3), which provides strong evidence for this hypothesis. Acetyl-CoA C-acyltransferase (Sala_2162 and Sala_3158) was significantly more abundant at 10ºC (Table 4.3). It catalyses the fourth and final step of β-oxidation and is also involved in polyunsaturated fatty acid biosynthesis. An increase in polyunsaturated fatty acids is a well-documented adaptation to the cold (Russell & Nichols, 1999). Also, a peptidoglycan lytic transglycosylase (Sala_1732), responsible for the degradation of cell wall material and recycling of the membrane in bacteria (Dijkstra & Keck, 1996; Blackburn & Clarke, 2001) and bacteriophage (Blackburn & Clarke, 2001), was more abundant at 10ºC (Table 4.3). The increased abundance of acetyl-CoA C-acyltransferase and the lytic transglycosylase at 10ºC indicates that membrane restructuring was occurring as a cold adaptive response.

L. Ting, UNSW. 145 Chapter 4

Figure 4.3. Lipid metabolism, energy generation and PHA storage in S. alaskensis. The degradation of lipids by β-oxidation is increased at 10ºC, while the generation of energy is generally increased at 30ºC (also see Section 4.4.2). A by-product of β-oxidation is hydroxyacyl-CoA, which is the substrate for PHA synthesis. The end products of β-oxidation can enter the TCA cycle and the electron transport chain as a non-carbohydrate source of substrate. Pink shading represents enzymatic reactions increased at 30ºC, and blue shading represents reactions increased at 10ºC. ETFred, electron transfer flavoprotein reducing; ETFox, electron transfer flavoprotein oxidizing. I, II, III and IV refer to the electron transport chain complexes I to IV; complexes I, III and IV are proton pumps, as indicated by the intracellular small black arrow. UQ, ubiquinone. Cyt c, cytochrome c. ? represents a possible source of fatty acyl chains from the cell membrane. Free, represents a source of free fatty acyl chains from the cytoplasm.

The abundances of phosphatidylserine decarboxylase (Sala_1963) and glycerol kinase (Sala_1948) were significantly increased at 10ºC (Table 4.3). Phosphatidylserine decarboxylase catalyses the synthesis of phosphatidylethanolamine, a key aminoglycerophospholipid in cell membranes (Voelker, 1997). Glycerol kinase phosphorylates glycerol into glycerol-3-phosphate, which forms the backbone of all phospholipids and is a major component of biological membranes (Morbidoni et al., 1995; Lemieux et al., 2004). Glycerol is either sourced from outside of the cell or from the degradation of monoacylglycerol, a key lipid constituent of the bacterial cell wall (Goldfine, 1982). As glycerol was not provided in the growth medium, its only source must be monoacylglycerol. The latter is broken down into its glycerol and fatty acid components by a monoacylglycerol lipase (Sala_0197); however, this enzyme was identified by only a single peptide in a single experiment (DRAPEIGLPLLLQHGAEDR), indicating that it was present at very low levels. Regardless of this, the increased abundance of phosphatidylserine decarboxylase and glycerol kinase provides further evidence to suggest that at 10ºC the membrane of S. alaskensis is being degraded and restructured.

4.4.2. Carbohydrate metabolism and energy generation

The majority of proteins with differential abundance in carbohydrate metabolism were from the 30ºC growth condition (Figure 4.1). The over-representation of proteins at 30ºC is consistent with the increased demand for energy and substrate for biomass synthesis due to an increased growth rate at this temperature.

4.4.2.1. Glycolysis using the Entner-Doudoroff pathway

The Embden-Meyerhof pathway for glycolysis is incomplete in S. alaskensis, as 6-phosphofructokinase, responsible for the phosphorylation of fructose-6-phosphate to fructose-1,6-bisphosphate, is absent from the genome (Williams et al., 2009). As an alternative, glucose can be metabolised using the Entner-Doudoroff (ED) glycolytic pathway (Figure 4.4). Glucokinase (Sala_1110) and 6-phosphogluconate dehydrogenase (Sala_0191), two enzymes in the Entner-Doudoroff pathway, were significantly increased at 30ºC (Table 4.3 and Figure 4.4). Glucokinase is particularly interesting because it catalyses the first committed step in the Entner-Doudoroff pathway, due to the requirement of ATP for the phosphorylation of glucose to glucose-6-phosphate. Finally, the remaining glycolytic enzymes, triosephosphate isomerase (Sala_1172), fructose bisphosphate aldolase (Sala_1317), glyceraldehyde-3-phosphate dehydrogenase and enolase (Sala_0523), had increased abundance at 30ºC (Table 4.3 and Figure 4.4).

4.4.2.2. Energy generation via the TCA cycle and the electron transport chain

The energy generation COG category had an over-representation of proteins with increased abundance at 30ºC (Figure 4.1). This is closely linked with the increased demands on carbohydrate metabolism at 30ºC. The TCA cycle enzymes dihydrolipoamide succinyltransferase (Sala_2227 and Sala_1235), succinyl-CoA synthetase (Sala_0176), and malate dehydrogenase (Sala_2230) were significantly increased at 30ºC (Table 4.3 and Figure 4.5), as were three components of the electron transport chain (Figure 4.3 and Section 4.4.14).

Figure 4.4. Glycolysis in S. alaskensis. Glycolysis by the Embden-Meyerhof pathway is not possible due to the absence of 6PFK in S. alaskensis. Instead, glycolysis occurs via the Entner-Doudoroff pathway. Text in blue or pink represents enzymes significantly increased at 10ºC and 30ºC, respectively. X represents an enzyme absent from the S. alaskensis genome; represents an enzyme identified in all 20 MS experiments; Z represents a predicted spontaneous non-enzymatic reaction. 6PDH, 6-phosphogluconate dehydrogenase (EC 4.2.1.12); 6PFK, 6-phosphofructokinase (EC 2.7.1.11); BPA, bisphosphate aldolase (EC 4.1.2.13); EN, enolase (EC 4.2.1.11); G3DH, glyceraldehyde 3-phosphate dehydrogenase; TPI, triosephosphate isomerase (EC 5.3.1.1).

Figure 4.5. Carbon and nitrogen metabolism in S. alaskensis. The glutamate cycle is central to the assimilation of nitrogen, which is required for amino acid, nucleotide, CoA and peptidoglycan biosynthesis. Text in blue or pink represents enzymes significantly increased at 10ºC and 30ºC, respectively. Dotted line represents a connection of a compound to another metabolic pathway; X represents an enzyme absent from the S. alaskensis genome; represents an enzyme identified in all 20 MS experiments. ACOAT, acetylornithine transaminase (EC 2.6.1.11); ACON, aconitate hydratase (EC 4.2.1.3); ADH, alanine dehydrogenase (EC 1.4.1.1); AK, aspartate kinase (EC 2.7.2.4); ALT, alanine transaminase (EC 2.6.1.2); ARG, arginase; ASS, argininosuccinate synthase (EC 6.3.4.5); ASADH, aspartate semialdehyde dehydrogenase (EC 1.2.1.11); ASD, aspartate-4-decarboxylase (EC 4.1.1.12); AST, aspartate transaminase (EC 2.6.1.1); BAPT, β-alanine-pyruvate transaminase (EC 2.6.1.18); DAPDC, diaminopimelate decarboxylase (EC 4.1.1.20); DSCS, dihydrolipoyllysine-residue succinyltransferase (EC 2.3.1.61); GDH, glutamate dehydrogenase (EC 1.4.1.2); GOGAT, glutamate synthetase (EC 1.4.1.13); GS, glutamine synthetase (EC 6.3.1.2); HSDH, homoserine dehydrogenase (EC 1.1.1.3); IDH, isocitrate dehydrogenase (EC 1.1.1.42); MDH, malate dehydrogenase (EC 1.1.1.37); PDH, pyruvate dehydrogenase (EC 1.2.4.1); SDH, succinate dehydrogenase (EC 1.3.99.1); SCS, succinyl-CoA synthetase (EC 6.2.1.5); THDPST, 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase (EC 2.3.1.117).

4.4.3. Synthesis of storage materials

S. alaskensis preferentially utilises amino acids as a source of carbon and nitrogen (Schut, 1994; Schut et al., 1995). Exogenous glucose (even when provided as the only carbon source) is taken up by the cell but not utilised immediately as a carbon or energy source, as evidenced by a lack of CO2 liberation (Schut, 1994; Schut et al., 1995). Instead, glucose is polymerised into a polysaccharide (and probably slowly released for metabolism, as required), and a second crystalline storage material was also detected in the same study (Schut et al., 1995). Rhizobia synthesise similar storage materials when under carbon excess and limitation of a range of other nutrients including nitrogen, phosphate or iron. These storage materials were found to be exopolysaccharides (EPS) and PHA (Zevenhuizen, 1971; Patel & Gerson, 1974; Ghai et al., 1981). When carbon is exhausted in the media, PHAs are preferentially utilised as a carbon source, and exopolysaccharides are not metabolised (Zevenhuizen, 1981). When microorganisms experience carbon excess and limitation of nitrogen, phosphate, magnesium, iron, or oxygen, the carbon taken up by the cells can be converted into EPS (Dawes & Senior, 1973; Repaske & Repaske, 1976; Ward et al., 1977; Taurhesia & McNeil, 1994; Reeslev et al., 1996; Taylor et al., 1998) and PHA reserves (Dawes & Senior, 1973; Jackson & Dawes, 1976; Sudesh et al., 2000).

4.4.3.1. Carbohydrate storage: Synthesis of EPS

The polysaccharide material detected by Schut et al. (1995) cannot be because the pathway is absent in S. alaskensis. It is possible that a component of the polysaccharide material is xylose because the abundance of UDP-glucuronate decarboxylase (Sala_1566) was significantly greater at 10ºC (Table 4.3). UDP-glucuronate decarboxylase converts UDP-glucuronate to the nucleotide sugar UDP-xylose, indicating that there was an increased production of UDP-xylose at low temperature. UDP-xylose is a sugar donor for poly- and oligosaccharides, proteoglycans and glycoproteins in eukaryotes including yeast (Moriarity et al., 2002) and the pathogenic fungus Cryptococcus neoformans (Bar-Peled et al., 2001). The biosynthesis of xylose is particularly well-studied in plants such as barley (Zhang et al., 2005), soybean (Hayashi et al., 1988), tobacco (Bindschedler et al., 2005) and Arabidopsis species (Harper & Bar-Peled, 2002). In bacteria, nucleotide sugars are precursors for the synthesis of EPS, capsular polysaccharides, glycosphingolipids and biofilm material, which are usually composed of sugar residues such as glucose, galactose, , glucan, maltose, glucuronate, and rhamnose (Sutherland, 1985; Roberts, 1996; Sutherland, 2001). A search of the literature revealed that there are currently only two other bacterial species with evidence of de novo xylose biosynthesis by UDP-glucuronate decarboxylase: Mycobacterium marinum (Ren et al., 2007) and Bordetella bronchiseptica (Irie et al., 2006).

The enzymes for the conversion of exogenous glucose to glucose-6-phosphate, the central metabolite for nucleotide sugar synthesis, were found to be present in the genome. Glucose-6-phosphate can subsequently be converted to either fructose-6-phosphate, to ultimately produce GDP-mannose; or UDP-glucose, to produce UDP-galactose, UDP-glucuronate, or UDP-xylose.

A defining characteristic of organisms is the distinct lack of lipopolysaccharide, which is replaced by glycosphingolipids (Kawahara et al., 1991; White et al., 1996; Batrakov et al., 1999). The side chains are positioned facing the extracellular environment, while the are embedded in the membrane (Kawahara et al., 1999). Most organisms in the genera produce capsular polysaccharides (Yamazaki et al., 1996; Harding et al., 2004). S. alaskensis was recently reclassified into the Sphingopyxis genus from Sphingomonas (Cavicchioli et al., 2003; Godoy et al., 2003), so it is not surprising to also see the presence of the pathway for capsular polysaccharide production. EPS may also provide the ability to localise extracellular enzyme activity, and enhance the survival and persistence of cells in extreme environments (Decho, 2000). Most significantly, EPS appear to have a role in cryoprotection, having been found in brine channels of sea ice (Decho, 2000; Krembs et al., 2002), and identified in many cold adapted marine bacteria such as Pseudoalteromonas antarctica (Nevot et al., 2006), P. profundum (Lauro et al., 2008), Pseudoalteromonas haloplanktis (Corsaro et al., 2004), and a range of Antarctic marine Gammaproteobacteria (Mancuso Nichols et al., 2005).

4.4.3.2. PHA: a carbon and energy source

The unidentified crystalline material detected by Schut et al. (1995) is likely to consist of hydrophobic PHA granules. Detected in a wide range of Gram negative and positive bacteria and archaea, PHAs are storage polyesters deposited as insoluble intracellular granules that are used as a carbon and energy source when exogenous nutrients are exhausted (Rehm & Steinbuchel, 1999). In separate studies, S. alaskensis was reported to accumulate PHA while growing on a glucose medium (Godoy et al., 2003) and to immediately polymerise glucose instead of metabolising it (Schut, 1994; Schut et al., 1995). The endogenous pre-existing PHAs are the most likely substrate source for carbon and energy requirements, while the cell polymerises newly taken-up glucose for EPS, glycosphingolipids and possibly de novo PHA synthesis. Hydroxyacyl-CoA, the central substrate for PHA synthesis, can be supplied from a number of central metabolic pathways including fatty acid metabolism and biosynthesis, carbohydrate metabolism, amino acid metabolism, propanoate metabolism and butanoate metabolism (Rehm & Steinbuchel, 1999). PhaG transacylase (EC 2.4.1.-), the enzyme that links fatty acid biosynthesis to PHA synthesis, is absent from the S. alaskensis genome, so that the supply of hydroxyacyl-CoA for PHA synthesis cannot enter via fatty acid synthesis. Apart from PhaG, S. alaskensis possesses all of the genes in the PHA anabolic and catabolic pathways (Table 4.5). The increased abundance of the enzymes in the β-oxidation pathway at 10ºC would result in an increase in the supply of hydroxyacyl-CoA substrate required for PHA synthesis (Section 4.4.1.1), consistent with the 2.4-fold increase of phasin PhaP (Sala_0504) at 10ºC (Table 4.5). Phasins are the major PHA granule-associated protein and, as PHA chains are elongated, phasin proteins and a phospholipid monolayer coat the surface of PHA to form granules in Ralstonia eutropha (York et al., 2001) (the model organism for PHA synthesis), Paracoccus denitrificans (Maehara et al., 2002), Bacillus megaterium (Griebel et al., 1968) and various strains of Pseudomonas (Brandl et al., 1988; Valentin et al., 1994).

PHA synthesis is regulated by the DNA-binding polyhydroxyalkanoate synthesis regulator (PhaR). Under conditions unfavourable for PHA synthesis, PhaR autoregulates its own expression and represses PhaP expression by binding to the DNA regions directly upstream of the phaR and phaP genes in E. coli (Maehara et al., 1999), R. eutropha (Potter et al., 2002), and P. denitrificans (Maehara et al., 2002). Under conditions favouring PHA synthesis, PhaR recognises and binds directly to PHA polymers as they are synthesised. Repression of PhaP is lifted and phasins are synthesised (Maehara et al., 1999; Potter et al., 2002).

Table 4.5. Proteins associated with polyhydroxyalkanoate storage material in S. alaskensis.

Accession   Locus tag   Description                               Pha homologue   EC number   FC    FDR (q-value)   Pathway
YP_618194   Sala_3158   Beta-ketothiolase                         PhaA            2.3.1.16    1.7   0.14            BO
YP_617297   Sala_2255   Acetoacetyl-CoA reductase                 PhaB            1.1.1.36    1.5   0.007           BM
YP_616941   Sala_1896   Acetoacetyl-CoA reductase                 PhaB            1.1.1.36    0.8   0.34            BM
YP_618105   Sala_3068   3-hydroxyacyl-CoA dehydrogenase           PhaB            1.1.1.35    1.0   0.70            BO
YP_618126   Sala_3089   3-hydroxyacyl-CoA dehydrogenase           PhaB            1.1.1.35    0.9   0.45            BO
YP_615559   Sala_0505   PHA synthase                              PhaC            2.3.1.-     0.9   0.42            PHA
-           -           3-hydroxyacyl-ACP-CoA transacylase        PhaG            2.4.1.-     -     -               PHA
YP_618091   Sala_3054   Enoyl-CoA hydratase                       PhaJ            4.2.1.17    1.3   0.22            BO
YP_615558   Sala_0504   Phasin                                    PhaP            NA          2.4   0.002           PHA
YP_616277   Sala_1229   PHA synthesis regulator                   PhaR            NA          2.5   0.002           PHA
YP_617028   Sala_1984   PHA depolymerase                          PhaZ            3.1.1.75    1.5   0.33            PHA
YP_615989   Sala_0938   Secreted 3-hydroxybutyrate dehydrogenase  -               1.1.1.30    1.5   0.14            PHA

PHA, polyhydroxyalkanoate. Accession, RefSeq accession number. EC number, Enzyme Commission number. FC, 14N/15N fold change. FDR, false discovery rate expressed as a q-value, where proteins with q < 0.2 had significant changes in abundance due to temperature. NA, EC number not applicable because the entries are not enzymes. BO, β-oxidation pathway; BM, butanoate metabolism; PHA, PHA synthesis or metabolism. Grey shaded proteins had 14N/15N FC increased at 10ºC, unshaded proteins had 14N/15N FC increased at 30ºC. Bold entries are significantly changed proteins with q < 0.2.
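
The significance criterion stated in the legend of Table 4.5 (q < 0.2) can be illustrated with a minimal Python sketch that filters the quantified rows by q-value. The names PhaProtein, TABLE_4_5 and significant are hypothetical and are introduced here only for illustration; this is not the analysis pipeline used in this study, and the direction of change (indicated by shading in the table) is not encoded.

```python
from typing import List, NamedTuple

# Significance threshold quoted in the Table 4.5 legend (q < 0.2).
Q_THRESHOLD = 0.2


class PhaProtein(NamedTuple):
    """One quantified row of Table 4.5 (field names are illustrative)."""
    locus_tag: str
    homologue: str
    fold_change: float  # 14N/15N fold change as printed in the table
    q_value: float      # false discovery rate expressed as a q-value


# Values transcribed from Table 4.5. PhaG is omitted because it has no
# locus tag or quantitative data (the gene is absent from the genome).
TABLE_4_5: List[PhaProtein] = [
    PhaProtein("Sala_3158", "PhaA", 1.7, 0.14),
    PhaProtein("Sala_2255", "PhaB", 1.5, 0.007),
    PhaProtein("Sala_1896", "PhaB", 0.8, 0.34),
    PhaProtein("Sala_3068", "PhaB", 1.0, 0.70),
    PhaProtein("Sala_3089", "PhaB", 0.9, 0.45),
    PhaProtein("Sala_0505", "PhaC", 0.9, 0.42),
    PhaProtein("Sala_3054", "PhaJ", 1.3, 0.22),
    PhaProtein("Sala_0504", "PhaP", 2.4, 0.002),
    PhaProtein("Sala_1229", "PhaR", 2.5, 0.002),
    PhaProtein("Sala_1984", "PhaZ", 1.5, 0.33),
    PhaProtein("Sala_0938", "-", 1.5, 0.14),
]


def significant(rows, threshold=Q_THRESHOLD):
    """Return the rows whose q-value is strictly below the threshold."""
    return [row for row in rows if row.q_value < threshold]


significant_tags = [row.locus_tag for row in significant(TABLE_4_5)]
print(significant_tags)
```

Under this criterion, enoyl-CoA hydratase (Sala_3054, q = 0.22) falls just outside the significant set, which matches its treatment in the text.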

Acetoacetyl-CoA reductase (Sala_2255) was one of the most abundant proteins in the cell (Table 4.4), and also showed a 1.5-fold increase at 30ºC when compared to 10ºC (Table 4.5). It is involved in the degradation of PHA towards acetyl-CoA after PhaZ depolymerisation. The end-product of PHA depolymerisation is acetyl-CoA, which can enter a number of metabolic processes including energy generation via the TCA cycle; or satisfy demand for biomass production by conversion to pyruvate and phosphoenolpyruvate and enter gluconeogenesis followed by the pentose phosphate pathway; or peptidoglycan biosynthesis via pyruvate and alanine; or lipid biosynthesis. At 30ºC, the TCA reactions succeeding the glyoxylate shunt reactions were significantly increased (Table 4.3 and Figure 4.5). When the TCA cycle is fuelled by acetyl-CoA only, and substrate is required for biomass production, the glyoxylate shunt must be used in order to avoid carbon losses in the form of CO2 (Johnson et al., 1966; Kornberg, 1966). The hypothesised depletion of endogenous PHA and the shunting of substrate toward energy generation, metabolism and biomass generation clearly illustrate that growth at 30ºC in S. alaskensis requires more carbon and energy sources to fuel a faster growth rate. Finally, it is interesting to note that in Gram negative bacteria, maximum PHA yield is achieved via the Entner-Doudoroff pathway, rather than the Embden-Meyerhof pathway (Gomez et al., 1996). Since S. alaskensis has an incomplete Embden-Meyerhof pathway (Section 4.4.2.1), it seems that this organism is genetically hard-wired for efficient PHA synthesis.

4.4.3.3. PHA and cold adaptation in S. alaskensis

At low temperature, reactive oxygen species (ROS) are increased (Georlette et al., 2004; D'Amico et al., 2006). In nitrogen fixing bacteria, a role of PHA was predicted to provide a carbon and reductant reserve in order to maintain respiratory activities that protect the nitrogenase enzyme from oxidative damage (Kretovich et al., 1977; Romanov et al., 1980; Bergersen et al., 1991; Bergersen & Turner, 1992). Similarly, the increased amount of PHA in S. alaskensis in the cold could reflect the increased potential for oxidative damage detoxification (Section 4.4.10). The increased abundance of enzymes responsible for PHA synthesis at low temperature in S. alaskensis may reflect the compounded evolutionary challenge of a cold and nutrient limited environment, where enzyme activity and nutrient uptake are less efficient. S. alaskensis is comprised of 30% PHA compounds (Schut et al., 1997), indicating that the production of PHA is the normal response in the natural cold environment. The increase of PHA production appears to be a novel physiological adaptation that may reflect the combined challenge of a permanently cold and nutrient depleted environment. It is relevant to note that genomic analysis of the cold adapted bacterium, Colwellia psychrerythraea, revealed a genetic capacity for PHA synthesis (Methe et al., 2005) and PHAs were characterised in the permafrost psychrotroph Exiguobacterium sibiricum (Rodrigues et al., 2006). These examples provide further evidence to suggest that the accumulation of PHA is an adaptation to the cold (Rodrigues & Tiedje, 2008). The exact mechanism of PHAs in cold adaptation has not been elucidated, though they may also act as an intracellular carbon and energy store as described above (Section 4.4.3.2).

The abundance of a secreted 3-hydroxybutyrate dehydrogenase enzyme was 1.5-fold increased at 10ºC in S. alaskensis. Ralstonia pickettii has a secreted PHA depolymerase that degrades PHA into 3-hydroxybutyrate oligomers, and a secreted 3-hydroxybutyrate dehydrogenase that breaks down the oligomers to monomers and to acetoacetate, which can be converted to acetyl-CoA (Sugimoto et al., 2008). Additionally, Alcaligenes faecalis (Tanio et al., 1982), and various species of Comamonas (Jendrossek et al., 1993; Kasuya et al., 1994) and Pseudomonas (Nakayama et al., 1985; Uefuji et al., 1997) secrete 3-hydroxybutyrate dehydrogenase. In S. alaskensis, the increase of 3-hydroxybutyrate dehydrogenase seems to be contradictory to the general trend towards the synthesis of PHA at low temperature; however, the primary source of extracellular PHA is from the release of granules after cell death and lysis (Jendrossek & Handrick, 2002). Thus, it is possible that as an adaptation to growth at low temperature, S. alaskensis scavenges for extracellular PHA released by lysed cells. Finally, the lack of signal sequences at the N terminus on all the PHA associated enzymes indicates that PHA storage materials are intracellular, and should be considered separately from the secreted PHA enzymes.

4.4.4. Amino acid metabolism

The quantitative proteomics data combined with the genomic analysis of S. alaskensis provided valuable information for reconstructing the nitrogen assimilatory pathway. The quantitative proteomics data revealed that there were temperature-dependent preferences for the biosynthesis of specific amino acids at 10ºC (histidine, tryptophan and proline) and at 30ºC (valine, leucine, isoleucine, arginine, serine, and some indication of increased glycine and aspartate metabolism). There was a lack of temperature variation for lysine biosynthesis, and the incomplete catabolic pathway for leucine degradation was increased at 10ºC. Finally, the secretion of amino acid scavenging enzymes at 10ºC and 30ºC reflects the low nutrient environment from which the organism was isolated.

4.4.4.1. Nitrogen assimilation and the glutamate and alanine pathways

Glutamate dehydrogenase (GDH) (Sala_2258), glutamate synthetase (GOGAT) (Sala_2140) and alanine dehydrogenase (ADH) (Sala_0558) were among the most abundant proteins in the cell (Table 4.4) and glutamine synthetase (GS) (Sala_0149) was 1.3-fold increased at 30ºC (Table 4.3). The significance of the abundance of these proteins must be considered with respect to the nitrogen source supplied during growth because of their nitrogen assimilatory function. S. alaskensis was cultured in ASW, where nitrogen was supplemented in the form of NH4Cl. In S. alaskensis, there are two major ammonia assimilatory pathways: either via ADH, or via GS and GOGAT in the GS-GOGAT pathway (Figure 4.5). In a separate study, GDH was not operative in ammonia assimilation, but rather, functioned in a catabolic direction, where glutamate was deaminated to give 2-oxoglutarate (Williams et al., 2009). Furthermore, the alanine produced by the amination of pyruvate (facilitated by ADH) cannot enter nitrogen metabolism via glutamate or aspartate because the necessary genes are absent from the genome (Williams et al., 2009), so that GS-GOGAT is the only available pathway for the biosynthesis of glutamate by ammonia assimilation. Glutamate is a central substrate for downstream amino acid biosynthesis and all nitrogen containing metabolites, and is therefore vital for growth. Since the GS-GOGAT pathway is solely responsible for glutamate synthesis, and GOGAT does not require ATP, it is consistent that GOGAT was exceptionally abundant in the cell regardless of growth temperature (Table 4.4). GS requires ATP and is thus energetically expensive to maintain at high levels, which could explain the significantly decreased abundance of GS at 10ºC. The expense of ammonia assimilation using the glutamate cycle is that 2-oxoglutarate, the key amino acceptor, is diverted from the TCA cycle (Figure 4.5), so that the TCA cycle stalls, and energy generation by substrate level phosphorylation stops. The general high abundance of catabolic GDH is consistent with returning 2-oxoglutarate back to the TCA cycle (Table 4.4). Since alanine cannot enter nitrogen metabolism via glutamate or aspartate, alanine synthesised via ammonia assimilation must be metabolised by a different route. It is most likely that alanine is used to supply several anabolic pathways for which it is a substrate (Figure 4.5).
For example, in CoA synthesis via β-alanine, Sala_1122 is a β-alanine-pyruvate aminotransferase (omega-amino acid-pyruvate aminotransferase) and was 2.7-fold more abundant at 30ºC (Table 4.3). This enzyme facilitates the transamination of an amino group from L-alanine to malonate semialdehyde to give pyruvate and β-alanine, where β-alanine is a substrate of coenzyme A biosynthesis via pantothenate (vitamin B5) (Yonaha et al., 1992). Dephospho-CoA kinase (Sala_2839), which facilitates the last step of CoA biosynthesis, was 3.2-fold increased at 30ºC (Table 4.3). The results highlight a segregation of metabolic roles in S. alaskensis, where alanine supplies nitrogen to non-amino acid biosynthetic pathways, and glutamate supplies nitrogen to amino acid biosynthetic pathways.

The alanine content in the amino acid composition of S. alaskensis is remarkably high when compared to other organisms in the NCBI database (Table 2.9). This trend may be related to the high GC content of the S. alaskensis genome (65.46%), where high-GC content bacteria have a skew of amino acid composition towards the alanine, arginine, glycine and proline amino acids (Singer & Hickey, 2000). Singer & Hickey (2000) determined that alanine was the major representative for alanine, arginine, glycine and proline residue biases. GC content has also been linked to resource limitation in the form of carbon or nitrogen starvation, where high-GC bacteria have a positive correlation with nitrogen and negative correlation with carbon (Baudouin-Cornu et al., 2004; Bragg & Hyder, 2004). Specifically, low carbon availability is correlated with low carbon content in the genome and proteome; and organisms with low carbon content encode proteins enriched with alanine and glycine (Baudouin-Cornu et al., 2004).

4.4.4.2. Amino acid metabolism at 10ºC

A number of amino acid biosynthetic pathways, including those for histidine, tryptophan and proline, were increased at 10ºC. The imidazole glycerol phosphate synthase subunit hisH (Sala_3148) and histidinol-phosphate aminotransferase (Sala_2887) were 1.6-fold and 1.5-fold, respectively, more abundant at 10ºC (Table 4.3). Both of these enzymes are involved in histidine synthesis from phosphoribosyl pyrophosphate (PRPP), the end product of the pentose phosphate pathway. The anabolic reactions of both enzymes require amino group donations from glutamine and glutamate (Section 4.4.4.1). Anthranilate synthase component I (Sala_1174), which catalyses the first committed step of tryptophan synthesis, was also more abundant at 10ºC (Table 4.3). The donation of an amino group from glutamine results in the production of glutamate. In the next step, PRPP donates a ribosyl and phosphate group to the tryptophan precursor. The increase of tryptophan biosynthesis during low temperature growth is similar to the rate limiting uptake of tryptophan in yeast during low temperature growth (Kawamura et al., 1994). In addition, a cold sensitive Salmonella enterica required tryptophan supplementation for low temperature growth (Hoffmann & Ingraham, 1970), and L. monocytogenes transcripts for tryptophan biosynthesis were increased during cold growth (Liu et al., 2002).

With PRPP taking part in both histidine and tryptophan biosynthesis, it seems likely that at 10ºC, it is diverted toward . PRPP is essential for purine and pyrimidine synthesis; however, the growth rate of S. alaskensis at low temperature is relatively slow (see Section 4.4.13) so that the demands for DNA replication would not be as considerable as at 30ºC. Additionally, the requirement of glutamine or glutamate for the amino group donation for amino acid synthesis is consistent with the pivotal role of the GS-GOGAT route of the glutamate cycle in S. alaskensis. Glutamate-5-semialdehyde dehydrogenase (Sala_0581) is involved in the biosynthetic pathway of glutamate conversion to proline, and was 1.7-fold more abundant at 10ºC (Table 4.3). Also, proline-tRNA ligase (Sala_1230) was 1.5-fold increased (Table 4.3), so that proline usage appears to be favoured in protein biosynthesis at low temperature. In summary, growth of S. alaskensis at the in situ isolation temperature provided insight into the preferred amino acids synthesised, metabolised and utilised in protein synthesis, with enzymes in the histidine, tryptophan and proline biosynthetic pathways being increased at 10ºC.

4.4.4.3. Amino acid metabolism at 30ºC

Aspartate aminotransferase (Sala_0172), aspartate kinase (Sala_2953) and homoserine dehydrogenase (Sala_0810) were 1.3-fold, 1.5-fold and 1.6-fold, respectively, increased at 30ºC (Table 4.3). The synthesis of aspartate is core to the amino acids in the aspartate amino acid family (Viola, 2001); one example is the downstream synthesis of homoserine (Figure 4.5), an intermediate in methionine biosynthesis. Approximately 30% of the metabolic flow of nitrogen passes through aspartate and into the aspartate family amino acids including homoserine, threonine, methionine, isoleucine and asparagine (Reitzer & Magasanik, 1996) and into pyrimidine and purine nucleotide biosynthesis (Figure 4.5).

An acetylornithine transaminase (Sala_1359) and an argininosuccinate synthetase (Sala_0495) were both 1.4-fold increased at 30ºC (Table 4.3). Acetylornithine transaminase converts glutamate to citrulline via ornithine (Grisolia & Cohen, 1953; Ramos et al., 1970); and argininosuccinate synthetase catalyses the ligation of aspartate to citrulline to form argininosuccinate, the second-last step in arginine biosynthesis using the urea cycle (Cunin et al., 1986) (Figure 4.5). Arginase converts arginine to ornithine, completing the urea cycle; however, it is absent from the genome (Figure 4.5). This is consistent with a greater demand for arginine at high temperature.

The entire three-step serine biosynthetic pathway was increased at 30ºC. D-3-phosphoglycerate dehydrogenase (Sala_0616) and phosphoserine aminotransferase (Sala_0167), the first and second enzymes, were increased by 1.5-fold and 1.8-fold, respectively (Table 4.3). Phosphoserine phosphatase (Sala_1468), the third enzyme, showed a moderate but not significant increase in abundance at 30ºC (FC = 1.8, q = 0.5). Serine is an intermediate for the synthesis of sphingolipids, purines and pyrimidines, as well as other amino acids such as glycine, cysteine and tryptophan (Staugger, 1996). The alpha and beta subunits of glycine dehydrogenase (decarboxylating) (Sala_1867 and Sala_1868) were more abundant at 30ºC (Table 4.3), and are part of a four subunit enzyme complex involved in the cleavage of glycine. Glycine cleavage provides the precursor, 5,10-methylene tetrahydrofolate, for purine, pyrimidine and haem biosynthesis (Staugger, 1996).

Components of the valine, leucine and isoleucine biosynthetic pathways were increased at 30ºC. Ketol-acid reductoisomerase (Sala_1472) and 3-isopropylmalate dehydrogenase (Sala_1085), involved in the initial stages of the valine, leucine and isoleucine synthesis pathway, were significantly more abundant at high temperature (Table 4.3). A branched-chain amino acid aminotransferase (Sala_0047) was also increased (Table 4.3); it is responsible for the final transamination of the valine, leucine and isoleucine α-keto acid precursor to the amino acid product, a reaction that requires glutamate as the amino donor. Leucine dehydrogenase (Sala_1521) was increased at 30ºC (Table 4.3); it has substrate specificity for leucine, valine and isoleucine, and catalyses the deamination of these amino acids by liberation of ammonium ions to form their keto analogues.
Finally, acetolactate synthase (Sala_1472), which catalyses the first step of the valine, leucine and isoleucine biosynthetic pathway; and leucyl-tRNA synthetase (Sala_3020), which catalyses the ligation of leucine to tRNA, were among the most abundant proteins in the cell (Table 4.4). Thus, it appears that the key steps in the biosynthetic pathway for branched chain amino acids are always turned on.

160 L. Ting, UNSW. Molecular mechanisms of cold adaptation in S. alaskensis

In summary, cells grown at 30ºC had relatively increased synthesis of valine, leucine, isoleucine, arginine, serine and homoserine, and showed some indication of increased glycine and aspartate metabolism (possibly shunting these into different pathways as substrates). The hypothesised changes in amino acid synthesis as a function of temperature reflect the changes in internal solutes and the changes to the intracellular pool of amino acids.

4.4.4.4. An incomplete leucine catabolic pathway and a connection to β-oxidation and energy generation at 10ºC

Many of the enzymes responsible for leucine, isoleucine and valine degradation were significantly increased at 10ºC (Table 4.3 and Figure 4.6). Leucine and lysine are the only amino acids that are solely ketogenic, that is, they give rise only to acetyl-CoA or acetoacetyl-CoA, neither of which can bring about net glucose production (Noda & Ichihara, 1976). However, in S. alaskensis the enzymes that catalyse the fourth to sixth reactions of leucine degradation are absent from the genome (Figure 4.6). A possible solution for the completion of leucine catabolism would require the end product of the third reaction of leucine degradation, 3-methylbut-2-enoyl-CoA, to enter the fatty acid β-oxidation pathway. Here, enoyl-CoA hydratase (Sala_3054) catalyses the hydration of 3-methylbut-2-enoyl-CoA to 3-hydroxyisovaleryl-CoA (Figure 4.6). Leucine degradation and β-oxidation have common enzymes such as acyl-CoA dehydrogenase (Sala_1105) and enoyl-CoA hydratase (Sala_3054). Valine and isoleucine degradation and β-oxidation share additional enzymes including 3-hydroxyacyl-CoA dehydrogenase (Sala_3157), acetyl-CoA C-acyltransferase (β-ketothiolase) (Sala_3158), and acyl-CoA dehydrogenase (Sala_1105). All these enzymes, except for enoyl-CoA hydratase (FC = 1.3, q = 0.22), were significantly increased (q < 0.2) at 10ºC (Table 4.3). Similar to S. alaskensis, a number of bacteria from the Vibrionaceae and Shewanellaceae families also lack the last few enzymes in the leucine degradation pathway (Nemecek-Marshall et al., 1999). Instead, they possess a multistep alternative pathway of leucine catabolism via 3-hydroxyisovaleryl-CoA that results in its decomposition into acetone and acetyl-CoA. The exact mechanism has not been elucidated; however, this pathway prevents the stalling of leucine catabolism by

[Figure 4.6: pathway diagram with panels for leucine degradation, lipid degradation, isoleucine degradation and valine degradation.]

Figure 4.6. Shared enzymes in branched chain amino acid degradation and β-oxidation. The enzymes for degradation of the branched chain amino acids leucine, isoleucine and valine are common with the enzymes responsible for β-oxidation of fatty acids. The last three steps of leucine catabolism are absent from the S. alaskensis genome, and may be compensated for by the ECH enzyme also used in β-oxidation. Blue text represents enzymes with significantly increased abundance at 10ºC. Dotted line represents a connection of compound to another metabolic pathway; X represents an enzyme absent from the S. alaskensis genome; ? unknown enzymatic pathway. ACA, acetyl-CoA C-acyltransferase (EC 2.3.1.16, EC 2.3.1.9); ADH, acyl-CoA dehydrogenase (EC 1.3.99.3); BDH, butyryl-CoA dehydrogenase (EC 1.3.99.2); ECH, enoyl-CoA hydratase (EC 4.2.1.17); HDH, 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35); TCA, tricarboxylic acid cycle.


allowing for the production of acetyl-CoA, thus providing substrate for the TCA cycle. Most significantly, the quantitative proteomics data indicate that S. alaskensis may be the first example of a family member to possess an alternative pathway for leucine catabolism. Since all the bacteria in the Nemecek-Marshall et al. (1999) study were also marine bacteria, the presence of an alternate leucine catabolic pathway might be associated with common environmental and ecological . Finally, the degradation of branched chain amino acids, especially leucine, along with fatty acid oxidation pathways could be used as a source of substrate for generating energy. The end-products of β-oxidation and the degradation of branched chain amino acids include acetyl-CoA, acetoacetyl-CoA, succinyl-CoA, methylmalonyl-CoA and propionyl-CoA, which ultimately contribute to the TCA cycle followed by energy generation via the electron transport chain (Section 4.4.1.1).

4.4.4.5. A lack of temperature variation in lysine biosynthesis

The lysine biosynthetic pathway did not display consistent temperature variation. Aspartate semialdehyde dehydrogenase (Sala_2718) and 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase (Sala_2828) were increased by 2.5-fold and 1.3-fold, respectively, at 10ºC (Table 4.3). Aspartate is the key substrate for lysine biosynthesis and aspartate semialdehyde dehydrogenase links alanine and aspartate metabolism to lysine biosynthesis. In contrast, the first and last steps of lysine biosynthesis, catalysed by aspartate kinase (Sala_2953) and diaminopimelate decarboxylase (Sala_1198), were increased 1.5-fold and 2.5-fold, respectively, at 30ºC (Table 4.3). The relative increase of various enzymes in lysine biosynthesis at 10ºC vs. 30ºC suggested that low or high temperatures influenced different aspects in these steps, either in substrate supply or energy requirements.

4.4.4.6. Secreted amino acid scavenging enzymes

A secreted beta-aspartyl-peptidase (Sala_2824), a secreted oligopeptidase B (Sala_2995) and a secreted Zn-dependent membrane oligopeptidase (Sala_1728) were significantly more abundant at 30ºC (Table 4.3). Beta-aspartyl-peptidase catalyses the hydrolytic cleavage of a beta-linked aspartate residue from the N-terminus of a polypeptide; however, other isopeptide bonds such as gamma-glutamyl and beta-alanyl are not hydrolysed (Marti-Arbona et al., 2005); and oligopeptidase B cleaves arginine and lysine bonds on oligopeptides (Polgár, 2002). Mammalian enzymes homologous to


the secreted Zn-dependent membrane oligopeptidase are endothelin converting enzymes involved in the vasoconstriction of endothelial cells, a process involving the cleavage of oligopeptides into bioactive peptides by the integral membrane Zn-dependent peptidase (Emoto & Yanagisawa, 1995). In contrast, at 10ºC, the abundance of a secreted Zn-dependent dipeptidase (Sala_2312) was increased 1.5-fold (Table 4.3). Homologous mammalian renal dipeptidases are involved in renal metabolism of glutathione and its conjugates (Hinchman & Ballatori, 1990). S. alaskensis has one glutathione S-transferase (Sala_1722) and it was significantly increased at 10ºC (Table 4.3). In eukaryotes, glutathione S-transferases are involved in detoxification of cells and possibly in protein transport (Vuilleumier & Pagni, 2002). In bacteria, their function in degradation of xenobiotics as carbon and energy sources is more important than detoxification. Also, they may have a regulatory role in bacterial adaptation to changing conditions (Vuilleumier & Pagni, 2002). The increase of glutathione-related enzymes at low temperature may reflect another cold adaptation strategy in S. alaskensis. Secreted peptidases would scavenge for dipeptides and oligopeptides, possibly from lysed cells in the culture. Most of the carbon- and/or nitrogen-containing nutritional substrates in the open ocean are macromolecular structures, as opposed to dissolved organic and inorganic compounds that can be directly absorbed by marine microorganisms (Marx et al., 2007). To utilise the macromolecular nutrients in their environment, enzymes able to hydrolyse the compounds must be secreted, and the cells import the hydrolysed products before they are diluted and lost in the ocean. The secreted enzymes must be anchored to the exterior of the cell, after substrate binding, and the hydrolysed nutrients are directly taken up by the cell or immobilised to the exterior of the cell (Marx et al., 2007). 
The significantly increased amounts of different secreted di- or oligopeptidases at both 10ºC and 30ºC suggest that, regardless of growth temperature, the scavenging of extracellular nitrogen and carbon sources is an innate characteristic of the lifestyle of S. alaskensis, resulting from the permanent exposure to an oligotrophic environment.

4.4.5. Transcription in the cold

The core components of the S. alaskensis DNA-directed RNA polymerase (RNAP) had increased abundance at 10ºC. These included the consistent 1.3-fold increase of the α, β and β′ subunits (Table 4.3). RNAP is made up of the α, β, β′ and ω subunits, and a σ factor to form the RNAP holoenzyme (Figure 4.7). In addition to the transcriptional RNAP machinery, transcriptional factor NusA, transcription antitermination factor NusG and transcription termination factor Rho were increased at 10ºC (Table 4.3 and Figure 4.7). The RNAP β and β′ subunits, NusA and Rho were identified in all proteomics experiments (Table 4.4), making them among the most abundant proteins in the cell. However, if only the thermally-induced abundance changes are considered, the overall increased presence of transcriptional machinery and transcriptional factors during cold growth may be a key cold adaptation strategy to overcome the challenge of the cold and sustain transcriptional processes.

Figure 4.7. S. alaskensis transcriptional machinery and three transcriptional factors are increased at 10ºC. The α, β and β′ subunits of RNAP were significantly more abundant at 10ºC, while a stress-associated protein Y (σ54 homolog) was increased at 30ºC. Three transcriptional factors with termination and anti-termination activity were increased at 10ºC. Text in blue represents increased abundance at 10ºC, text in pink represents increased abundance at 30ºC. RNAP, DNA-directed RNA polymerase.

In contrast, at 30ºC, a ribosome associated stress response protein (protein Y or pY) (Sala_2837) was 2-fold increased (Table 4.3 and Figure 4.7). In E. coli, pY binds to and stabilises the ribosomes from dissociation when the cell is exposed to


environmental stress (Ye et al., 2002). The increased abundance of pY at high temperature could be an indicator that S. alaskensis is experiencing stress (Section 4.4.14.2). A response regulator receiver protein (Sala_2673) was also 1.4-fold increased at 30ºC (Table 4.3) and is a possible response mechanism to elevated temperatures.

4.4.6. Translation, ribosomal structure and biogenesis

There was an over-representation of the number of proteins increased at 10ºC in the translation COG category (Table 4.3). The over-representation reflects the adverse effect of cold temperature on the initiation of protein synthesis and stabilisation of mRNA by the formation of secondary structures, which result in translation inhibition (Russell, 1990).

4.4.6.1. Increased RNA degradosome enzymes in the cold

The quantitative proteomics data revealed that RNA degradosome enzymes and proteins associated with increasing translation efficiency were increased in the cold. The increased abundance of the RNA degradosome core proteins at 10ºC (polyribonucleotide nucleotidyltransferase by 1.6-fold, ribonuclease (RNase) D by 1.4-fold, RNase E by 1.4-fold, and a DEAD-box ATP-dependent RNA helicase by 1.5-fold; Table 4.3) indicated that mRNA processing and turnover were increased at low temperature, which is consistent with the ‘cold shock degradosome’ theory (Beran & Simons, 2001; Prud'homme-Genereux et al., 2004). In E. coli, the ‘cold shock degradosome’ involves the interaction of the cold-induced CsdA (a DEAD-box RNA helicase) directly with RNase E in the degradosome to unwind stabilised mRNA secondary structures (Prud'homme-Genereux et al., 2004). Similarly, in E. coli, the increase of polyribonucleotide nucleotidyltransferase is required for the adaptive growth resumption of cold shock-treated cells (Yamanaka & Inouye, 2001); in Pseudomonas syringae, RNase D is crucial for RNA metabolism at low temperature (Purusharth et al., 2007); and in the cold adapted archaeon Methanococcoides burtonii, the mRNA of a putative DEAD-box helicase was only expressed during cold growth (Lim et al., 2000).

4.4.6.2. Compensating for reduced translation efficiency in the cold

There was a 1.6-fold increase of translation initiation factor IF2 (Sala_0612) at 10ºC (Table 4.3). In E. coli, mutations in IF2 result in cold sensitive mutants (Shiba et al., 1986; Laursen et al., 2003); and IF2 in S. oneidensis was induced upon cold shock (Gao et al., 2006). IF2 interacts with unfolded and denatured proteins, similar to molecular


chaperones, to promote functional protein folding and prevent aggregation in E. coli (Caldas et al., 2000). In addition, a peptide chain release factor 2 was increased by 1.4-fold at 10ºC (Table 4.3). Translation initiation factor IF1 has also been associated with cold adaptation, particularly in E. coli (Goldenberg et al., 1996; Giuliodori et al., 2004; Giuliodori et al., 2007; Phadtare et al., 2007). The cold adaptive role of IF1 lies in its transcriptional antiterminator activity, where it mediates melting of cold-induced mRNA secondary structures, and allows translation to occur without obstruction (Phadtare et al., 2007). Cold shock protein CspA is a functional homologue of IF1; both proteins belong to the oligonucleotide/oligosaccharide binding (OB) fold protein family. There are two IF1 genes in S. alaskensis (Sala_1481 and Sala_1916); Sala_1916 was not differentially abundant at 10ºC vs. 30ºC, and Sala_1481 was not identified. However, a CspA homologue (Sala_1480) was 23-fold more abundant at 10ºC (Table 4.3). Therefore, it appears that CspA, rather than IF1, may be responsible for low temperature melting of mRNA secondary structures. Two structural ribosomal proteins were increased at 10ºC, and this increase in the number of ribosomal components could compensate for the cold-induced reduction of translation rates. Few ribosomal proteins were identified in the proteomics experiments because these proteins are often found in the insoluble protein fraction after cell disruption (Burg, D., Pers. Comm). The insoluble protein fraction, including membrane proteins and other large hydrophobic protein complexes, was not examined in the current study. Since MS techniques are abundance-dependent, it is likely that the identification, and thus quantitative measurement, of many structural ribosomal proteins was masked in MS analyses by more abundant soluble proteins in the samples. A GTP-binding protein TypA/BipA homolog (Sala_2303) was 1.3-fold increased at 10ºC (Table 4.3). 
TypA, or tyrosine phosphorylation protein A, is predicted to be involved in signalling. In S. meliloti, TypA is required for stress adaptation to a number of conditions including low temperature (Kiss et al., 2004). In E. coli, BipA is a highly conserved GTPase translation factor that allows for the efficient expression of the Fis transcriptional modulator by destabilising the strong interaction of the 5’ untranslated region (UTR) of fis mRNA with the ribosome. Fis, in turn, modulates a range of downstream processes such as DNA metabolism and secretion (Owens et al.,


2004). Most significantly, a bipA mutant in E. coli is cold sensitive (Krishnan & Flower, 2008). In summary, the data indicate that the increased release of polypeptide, mRNA turnover and processing, structural ribosomal proteins and the increase of translation factors such as IF2, TypA/BipA and CspA are required in S. alaskensis to compensate for reduced translational efficiency during cold growth.

4.4.7. Protein folding in the cold

4.4.7.1. Peptidyl-prolyl cis-trans isomerase

A peptidyl-prolyl cis-trans isomerase (PPIase) was 1.4-fold increased at 10ºC (Table 4.3). PPIase proteins are involved in protein folding at the rate-limiting step of the cis-trans isomerisation of peptidyl-prolyl bonds. A comparative proteomics and biochemical study of the cold adapted Shewanella sp. SIB1 elucidated a cold adaptive role of PPIase in the cell (Suzuki et al., 2004). PPIase was identified in a number of proteomics studies to have a role in cold adaptation in cold adapted bacteria including Shewanella livingstonensis (Kawamoto et al., 2007) and Psychrobacter arcticus (Zheng et al., 2007), and in the cold adapted archaeon Methanococcoides burtonii (Goodchild et al., 2004). Also, a transcriptomics investigation of the hyperthermophilic archaeon Methanococcus jannaschii identified the upregulation of a PPIase during cold shock (Boonyaratanakornkit et al., 2005). Finally, a gene knockout study determined that three secreted FK506-binding PPIase enzymes in Caenorhabditis elegans are required for normal nematode development and exoskeleton formation during cold growth (Winter et al., 2007). Thus, as observed in the current study, the isomerisation of peptidyl-prolyl bonds is important in cold adaptation.

4.4.7.2. A cold-induced and constitutive GroESL protein folding system

The GroESL complex is involved in the reduction of protein aggregates and correct folding of proteins and is induced upon heat shock and stress in all domains of life (Horwich et al., 2006). It is rare to find organisms with more than one GroESL system, and the purpose of what seems to be a redundancy has not been elucidated. There has been speculation on different functions of each repeated system in the organisms in which this phenomenon has been found (Rodríguez-Quiñones et al., 2005). The Rhizobiaceae are reported to have the greatest number of groE operons; most notably, Bradyrhizobium japonicum has seven groEL genes and five groES genes (Kaneko et


al., 2002). It is uncommon for groEL homologues to outnumber the co-chaperonin groES in the organisms with multiple sets (Fischer et al., 1993; Rusanganwa & Gupta, 1993). In S. alaskensis, there are two complete groESL gene sets (Sala_0452 and Sala_0453; and Sala_2079 and Sala_2078) and one transposon-truncated groEL (Sala_0415). The sequences of the three GroEL proteins (Table 4.6) and the two GroES proteins (Table 4.7) were compared using BLAST, and sequence alignments (Figure 4.8 and Figure 4.9) clearly demonstrated that the proteins are unique and that the two complete GroESL protein sets are distinct. The Sala_0452 GroEL is one of the most abundant proteins in S. alaskensis where it was identified and quantified in all 20 experiments (Table 4.3 and Table 4.4), and the Sala_0453 GroES protein was identified in 11 of the 20 experiments and quantified in 5 (Table 4.3). Accordingly, these data indicate that the Sala_0452 GroEL and, by extrapolation of genetic proximity, the Sala_0453 GroES partner are the primary subunits of the S. alaskensis GroESL protein folding system. The bias towards one gene set as the primary GroESL is typical in organisms with GroESL redundancies (Rodríguez-Quiñones et al., 2005).

Sala_2079_chaperonin_GroEL MAAKEVKFASDARDRMLRGVDTLANAVKVTLGPKGRNVVIEKSFGAPRIT 50
Sala_0452_chaperonin_GroEL MAAKDVKFSRDARERILKGVDILADAVKVTLGPKGRNVVIDKSFGAPRIT 50
                           ****:***: ***:*:*:*** **:***************:*********

Sala_2079_chaperonin_GroEL KDGVTVAKEIELADKFENMGAQMLREVASKQNDKAGDGTTTATVLAQAIV 100
Sala_0452_chaperonin_GroEL KDGVSVAKEIELKDKFENMGAQMLREVASKANDKAGDGTTTATVLAQAIV 100
                           ****:******* ***************** *******************

Sala_2079_chaperonin_GroEL REGSKAVAAGMNPMDVKRGIDLAVKAVVKDLETHAKKVSANSEIAQVATI 150
Sala_0452_chaperonin_GroEL REGMKSVAAGMNPMDLKRGIDLAVTKVVEDLKARSTPVSGSSEIAQVGII 150
                           *** *:*********:********. **:**::::. **..******. *

Sala_2079_chaperonin_GroEL SANGDEEVGRILAEAMDKVGNEGVITVEEAKSLATELETVEGMQFDRGYL 200
Sala_0452_chaperonin_GroEL SANGDVEVGEKIAEAMEKVGKEGVITVEEAKGLEFELDVVEGMQFDRGYL 200
                           ***** ***. :****:***:**********.*  **:.***********

Sala_2079_chaperonin_GroEL SPYFITNAEKLKVELDDPYILIHEKKLSNLQAMLPLLEAVVQSGKPLLII 250
Sala_0452_chaperonin_GroEL SPYFITNPEKMIVELTDPYILIFEKKLSNLQSMLPILEAVVQSGRPLLII 250
                           *******.**: *** ******.********:***:********:*****

Sala_2079_chaperonin_GroEL AEDVEGEALATLVVNRLRGGLKVAAVKAPGFGDRRKAMLEDIAILTGGNV 300
Sala_0452_chaperonin_GroEL AEDIEGEALATLVVNRLRGGLKVAAVKAPGFGDRRKAMLQDIAILTKGEM 300
                           ***:***********************************:****** *::

Sala_2079_chaperonin_GroEL VSEDLGIKLENVTVNMLGRAKKVVIDKDNTTIVDGVGARTDIDARIAQIR 350
Sala_0452_chaperonin_GroEL ISEDLGIKLENVTLNMLGQAKRVTIDKDNTTIVDGAGDAEAIKGRVEQIR 350
                           :************:****:**:*.***********.*    *..*: ***

Sala_2079_chaperonin_GroEL QQIDTTTSDYDREKLQERLAKLAGGVAVIRVGGATEVEVKERKDRVDDAL 400
Sala_0452_chaperonin_GroEL AQIETTTSDYDREKLQERLAKLAGGVAVIKVGGATEVEVKERKDRVDDAL 400
                            **:*************************:********************

Sala_2079_chaperonin_GroEL HATRAAVEEGILPGGGIALLRALKALDGLKAANDDQQSGIDIVRRALRAP 450
Sala_0452_chaperonin_GroEL HATRAAVEEGIVPGGGTALLYATKALEGLKGANDDQTRGIDIIRKAIETP 450
                           ***********:**** *** * ***:***.*****  ****:*:*:.:*

Sala_2079_chaperonin_GroEL ARQIADNAGEDGAWIVGKLLESSDYNWGFNAATGEYEDLVKAGVIDPAKV 500
Sala_0452_chaperonin_GroEL LRQIAANAGHDGAVVAGNLLRVGDVEQGFNAATDVYENLKAAGVIDPTKV 500
                            **** ***.*** :.*:**. .* : ******. **:*  ******:**

Sala_2079_chaperonin_GroEL VRTALQDAASVAALLITTEALVAELPKEEKAAPMP--------AMDF 539
Sala_0452_chaperonin_GroEL VRTALQDAASVAGLLITTEAAVSELPEDKPAMPMGSGGMGGMGGMDF 547
                           ************.******* *:***::: * **         .***

Figure 4.8. Alignment of two S. alaskensis GroEL proteins. The third transposon-truncated GroEL protein is not included in this alignment due to its small size (see Table 4.6).


Table 4.6. Sequence similarities of the three S. alaskensis GroEL proteins.

                               Molecular       Sequence similarity
Protein          Amino acids   weight (kDa)    Sala_0452   Sala_2079   Sala_0415
Sala_0452        547           58.0            100%
Sala_2079        547           57.4            79%         100%
Sala_0415        51            5.2             78%         80%         100%
E. coli GroEL    548           57.3            70%         66%         61%

Similarity values were derived from BLAST searches.

Sala_2078_chaperonin_GroES MHFRPLHDRVVVRRIEAEEKSSGGIIIPDTAKEKPQEGEVVAVGPGARAE 50
Sala_0453_chaperonin_GroES MQFRPLHDRVLVRRIEAEEKTAGGIIIPDTAKEKPQEGEVVSVGTGARAD 50
                           *:********:*********::*******************:**.****:

Sala_2078_chaperonin_GroES DGTVTAPDVRVGDRVLFGKWSGTEVRIDGEDLLIMKESDILGVIEQAEAL 100
Sala_0453_chaperonin_GroES DGKVTPLDVKAGDRILFGKWSGTEVKVDGEELLIMKESDILGVIA----- 95
                           **.**. **:.***:**********::***:*************

Sala_2078_chaperonin_GroES KKAA 104
Sala_0453_chaperonin_GroES ----

Figure 4.9. Alignment of the two S. alaskensis GroES proteins.
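The pairwise comparison shown in the GroES alignment can be sketched as a simple percent identity calculation. The snippet below is a minimal illustration only: the function name `alignment_identity` is hypothetical, the gapped sequences are transcribed from Figure 4.9, and percent identity over ungapped columns is not the same measure as the BLAST-derived similarity values reported in Table 4.7.

```python
# Gapped sequences transcribed from the GroES alignment (Figure 4.9).
SALA_2078 = (
    "MHFRPLHDRVVVRRIEAEEKSSGGIIIPDTAKEKPQEGEVVAVGPGARAE"
    "DGTVTAPDVRVGDRVLFGKWSGTEVRIDGEDLLIMKESDILGVIEQAEAL"
    "KKAA"
)
SALA_0453 = (
    "MQFRPLHDRVLVRRIEAEEKTAGGIIIPDTAKEKPQEGEVVSVGTGARAD"
    "DGKVTPLDVKAGDRILFGKWSGTEVKVDGEELLIMKESDILGVIA-----"
    "----"
)


def alignment_identity(seq_a, seq_b, gap="-"):
    """Return (identical, compared) counted over columns without a gap.

    Hypothetical helper for illustration; both inputs must be rows of the
    same alignment and therefore have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    return identical, compared


identical, compared = alignment_identity(SALA_2078, SALA_0453)
print(f"{identical}/{compared} identical ({100 * identical / compared:.1f}%)")
```

For this pair the function counts 78 identical residues over 95 ungapped columns. Gapped columns are excluded from the denominator here; dividing by the full alignment length instead would give a lower value, so the choice of denominator should be stated whenever such a figure is reported.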

Table 4.7. Sequence similarity of the two S. alaskensis GroES proteins.

                               Molecular       Sequence similarity
Protein          Amino acids   weight (kDa)    Sala_0453   Sala_2078
Sala_0453        95            10.3            100%
Sala_2078        104           11.3            82%         100%
E. coli GroES    97            10.4            51%         38%

Similarity values were derived from BLAST searches.

In examining the thermal changes in abundance of the GroESL proteins, only the Sala_0452 GroEL had a significant abundance increase at 10ºC (Table 4.3). Sala_0453 was also consistently increased at 10ºC; although this change was not deemed statistically significant, the data suggest that at low temperature there was an increase in the Sala_0452-Sala_0453 GroESL. Further, the Sala_2079 GroEL had no change in abundance between the two temperatures (FC = 1.2, q = 0.6), and the Sala_2078 GroES and transposon-truncated Sala_0415 GroEL were not quantified. Similarly, in the close relative Rhizobium leguminosarum, which also has three groEL genes (Cpn60.1, Cpn60.2, and Cpn60.3), Cpn60.1 was detected at higher levels than Cpn60.2, and Cpn60.3 could not be detected (Rodríguez-Quiñones et al., 2005). Additionally, these three R. leguminosarum proteins displayed distinct properties in


vitro (George et al., 2004). These findings may represent a precedent that at 10ºC in S. alaskensis, there could be either a cold active homooligomeric GroESL (only Sala_0452 and Sala_0453) or a stoichiometric increase of the Sala_0452 protein at low temperature in a heterooligomeric GroESL complex (Sala_0452, Sala_0453, Sala_2079, Sala_2078), where the Sala_2079 and Sala_2078 subunits may be constitutively expressed. The GroESL system has traditionally been described in a heat shock response; however, it is possible that GroESL in S. alaskensis is a cold adapted, cold specific protein folding machinery. In Gram negative bacteria, RpoH, or σ32, is the heat-induced regulator of the GroESL system (Yura & Nakahigashi, 1999). Upon heat shock, the cellular level of σ32 increases both by enhanced translation of rpoH mRNA and by transient stabilization. This increase, in turn, is responsible for a transient enhanced transcription of heat shock genes (Yura & Nakahigashi, 1999). RpoH in S. alaskensis (Sala_1743) was quantified in two biologically independent experiments and did not show a significant change in abundance at 10ºC vs. 30ºC (FC = 1.2, q = 0.7). Recombinant E. coli (with GroEL and GroES inserted from the cold adapted Antarctic bacterium Oleispira antarctica) grew significantly faster than wild-type E. coli during low temperature growth (Ferrer et al., 2003; Ferrer et al., 2004); GroESL was concluded to be the limiting factor to cold growth, presumably due to slower protein folding (Ferrer et al., 2003). Also, a recent DNA microarray study identified the increased expression of GroES during cold shock in E. coli (White-Ziegler et al., 2008). Additionally, in L. monocytogenes, GroEL was increased during low temperature growth, and it was suggested that it was involved in the maintenance of protein solubility and function in the cytoplasm (Liu et al., 2002).

4.4.7.3. Cold-induced and heat-induced DnaK-DnaJ-GrpE protein folding

Active folding of nascent polypeptide chains is mediated by DnaK, with its co-chaperone DnaJ, and the nucleotide exchange factor GrpE (Zolkiewski, 1999). Short hydrophobic sections of polypeptide are recognised by DnaK, where binding and release of the polypeptide is coupled to ATP hydrolysis. In the ATP-bound state DnaK does not stably bind polypeptide. DnaJ mediates ATP hydrolysis, upon which the polypeptide is stably bound and can undergo correct protein folding. The GrpE co-chaperone returns DnaK to the ATP-bound state, and the folded protein can then be released (Mayer et al., 2000; Ben-Zvi & Goloubinoff, 2001). S. alaskensis has two sets of dnaK-dnaJ-grpE genes and all the proteins were identified in the MS experiments. The two DnaK proteins were among the most abundant proteins in the cell (Table 4.4). Based on genetic proximity, it is most likely that the Sala_0402 DnaK is associated with the Sala_0404 DnaJ and Sala_0401 GrpE; and similarly the Sala_2058 DnaK with the Sala_2059 DnaJ and Sala_0252 GrpE. Additionally, Sala_0252 GrpE and Sala_2059 DnaJ were 1.7-fold and 2.9-fold increased at 30ºC, respectively (Table 4.3), which is indicative of a heat-induced function of these proteins. In contrast, Sala_0404 DnaJ was 1.8-fold increased at 10ºC (Table 4.3), which suggests that the Sala_0402-Sala_0404-Sala_0401 DnaK-DnaJ-GrpE complex is important for protein folding in the cold. It is noteworthy that the E. coli DnaJ homologue of Sala_0404 is not induced upon heat shock, and this protein was determined to bind to curved DNA sections and to function similarly to DnaJ (Yamashino et al., 1994). Another DnaJ homologue in E. coli (Hsc66) was found to be cold induced (Lelivelt & Kawula, 1995). A complementation study of a cold active dnaK from Shewanella livingstonensis sp Ac10 in low temperature growing E. coli was successful in supporting the growth of dnaK-null mutants (Yoshimune et al., 2005). The data provide evidence to suggest that at low temperature, the increased Sala_0404 DnaJ-like abundance is indicative of a cold active Sala_0402-Sala_0404-Sala_0401 DnaK-DnaJ-GrpE protein folding complex. Currently, there are only three examples of multiple dnaK genes in bacteria. There are two genes in E. coli (Seaton & Vickery, 1994; Lelivelt & Kawula, 1995), and two genes in Borrelia burgdorferi (Fraser et al., 1997). Recently, dnaK was found to be cold-induced in a cold shock DNA microarray study of E. coli (White-Ziegler et al., 2008). 
The most widely reported cases of multiple dnaK copies are in Cyanobacteria; there are three dnaK homologues in Synechococcus PCC7942, Synechocystis PCC6803 and Anabaena PCC7120 (Nimura et al., 2001). Also, a comparative genomics study identified between four and five dnaK homologues in a range of photosynthetic eukaryotes (Renner & Waters, 2007), which were found to consistently differ in their heat inducibility and functionality (Kovacs et al., 2001; Nimura et al., 2001; Katano et al., 2006). There are 15 hsp70 genes (the eukaryotic dnaK) in yeast that are expressed


differently under a range of physiological conditions (Werner-Washburne et al., 1989). These examples of multiple DnaK chaperones with different heat inducibilities provide a precedent to suggest that one of the two copies of the dnaK (and thus including dnaJ and grpE) in S. alaskensis has a function in cold protein folding at 10ºC; while the other has a more classical heat shock protein folding response at 30ºC.

4.4.7.4. Clp ATPases

Encoded in the S. alaskensis genome are clpA, clpP, clpX, clpS, clpQ, clpY and two clpB genes (Table 4.8). Clp ATPases are ATP-dependent molecular chaperones that mediate correct protein folding and disaggregation as chaperones (e.g. ClpA, ClpB, ClpX, ClpY (or HslU), and ClpS), or proteases that eliminate damaged proteins unable to be rescued by the chaperones (e.g. ClpP and ClpQ (or HslV)) (Hendrick & Hartl, 1993; Ben-Zvi & Goloubinoff, 2001; Hoskins et al., 2001; Chandu & Nandi, 2004; Young et al., 2004).

Table 4.8. The Clp ATPases in S. alaskensis

Clp protease    Locus tag    14N/15N ratio    FDR (q-value)
ClpA            Sala_0168    1.16             0.60
ClpB            Sala_0406    1.55             0.03
ClpB            Sala_2391    1.03             0.95
ClpP            Sala_2742    1.05             0.85
ClpX            Sala_2743    1.14             0.69
ClpS            Sala_0503    ND               ND
ClpQ (HslV)     Sala_1594    1.33             0.54
ClpY (HslU)     Sala_1595    1.29             0.42

ND, not determined by lack of quantitative proteomics data. Grey shaded proteins had 14N/15N fold change values increased at 10ºC, unshaded proteins had 14N/15N fold change values increased at 30ºC. Bolded entry represents statistically significant difference in abundance at 10ºC vs. 30ºC (q < 0.2).
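The significance criterion stated in the table footnote (q < 0.2) can be applied to the tabulated values as in the following minimal sketch. The names `ClpEntry`, `TABLE_4_8` and `significant` are hypothetical and introduced only for illustration; the values are those listed in Table 4.8, with ND entries represented as `None`. This is not the analysis pipeline used in the thesis.

```python
from typing import List, NamedTuple, Optional


class ClpEntry(NamedTuple):
    protein: str
    locus_tag: str
    ratio: Optional[float]    # 14N/15N ratio; None where not determined (ND)
    q_value: Optional[float]  # FDR q-value; None where not determined (ND)


# Values as listed in Table 4.8.
TABLE_4_8 = [
    ClpEntry("ClpA", "Sala_0168", 1.16, 0.60),
    ClpEntry("ClpB", "Sala_0406", 1.55, 0.03),
    ClpEntry("ClpB", "Sala_2391", 1.03, 0.95),
    ClpEntry("ClpP", "Sala_2742", 1.05, 0.85),
    ClpEntry("ClpX", "Sala_2743", 1.14, 0.69),
    ClpEntry("ClpS", "Sala_0503", None, None),
    ClpEntry("ClpQ (HslV)", "Sala_1594", 1.33, 0.54),
    ClpEntry("ClpY (HslU)", "Sala_1595", 1.29, 0.42),
]

Q_THRESHOLD = 0.2  # significance cut-off given in the table footnote


def significant(entries: List[ClpEntry], q_threshold: float = Q_THRESHOLD) -> List[ClpEntry]:
    """Return quantified entries whose q-value is below the threshold."""
    return [e for e in entries if e.q_value is not None and e.q_value < q_threshold]


for entry in significant(TABLE_4_8):
    print(entry.protein, entry.locus_tag, entry.ratio, entry.q_value)
```

With the tabulated values, only the Sala_0406 ClpB passes the threshold. Entries without quantitative data are skipped rather than treated as non-significant, which keeps "not determined" distinct from "no change".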

Clp ATPases can function alone or in combination. For example, ClpP alone can only degrade short peptides, and ClpA alone has a chaperone function similar to that of DnaK; but a ClpAP complex can degrade large specific proteins (Wickner et al., 1994; Hoskins et al., 2001). ClpS modulates the specificity of the ClpAP-mediated ATP-dependent protein degradation by binding to the N terminus of the ClpA protein (Dougan et al., 2002). ClpYQ or HslVU is the heat shock locus proteasome, where the proteins form a multimeric proteasome arranged in a two-layered ring (Missiakas et al., 1996; Rohrwild et al., 1996). It has been suggested that the HslVU system is not essential for bacteria because it is a eukaryotic feature; the presence of this system may be a result of lateral gene transfer (Gille et al., 2003). Finally, ClpB acts exclusively as a chaperone in reactivating denatured proteins (Barnett et al., 2000).

From the quantitative proteomics experiments, all the Clp proteins, except for ClpS, were identified and quantified. None of the Clp ATPases, except for ClpB, in S. alaskensis displayed any differential changes in abundance between 10ºC and 30ºC; the Clp ATPases are most probably essential for protein folding regardless of temperature. Only Sala_0406 ClpB had significantly increased abundance in the cell at 10ºC (Table 4.3). By genetic proximity, it is likely that the Sala_0406 ClpB interacts with the cold active DnaK-DnaJ-GrpE protein folding complex (Sala_0402, Sala_0404, Sala_0401). In E. coli, ClpB-mediated disaggregation is performed in conjunction with the DnaK chaperone system, and usually, the ClpB-DnaK system is induced upon heat shock (Goloubinoff et al., 1999; Mogk et al., 2003). Cold-induced ClpB proteins have also been reported in Synechococcus sp. strain PCC7942 (Porankiewicz & Clarke, 1997), L. monocytogenes (Liu et al., 2002), and E. coli (Phadtare & Inouye, 2004).

In summary, the quantitative proteomics data indicate that in S. alaskensis, there is a complete protein folding system adapted to the cold. As a polypeptide is synthesised and extruded from the ribosome, it can fold correctly into protein either spontaneously or with the assistance of the DnaK-DnaJ-GrpE and GroESL complexes, where PPIase action is crucial in cold adaptive protein folding (Figure 4.10). Finally, the rescue of misfolded or aggregated protein by the ClpB chaperone seems to be particularly important at 10ºC in S. alaskensis, probably due to the impairment of enzyme activity at low temperature. The consistency of the proteomics data is compelling in attributing a cold-specific function to these proteins in S. alaskensis.


Figure 4.10. A complete cold-active protein folding cycle in S. alaskensis. As a polypeptide is synthesised and extruded from the ribosome, the polypeptide can fold correctly into protein either spontaneously or with the assistance of the DnaK-DnaJ-GrpE and GroESL complexes. Misfolded or aggregated protein can be rescued by ClpB chaperone activity and correct folding can occur. Blue arrows indicate reactions significantly increased at 10ºC.

4.4.8. Inorganic ion and coenzyme transport

At 10ºC, many differentially abundant proteins were associated with the active transport of inorganic ions and coenzymes across the cell membrane in S. alaskensis. The majority of the transport systems that appear to be employed for uptake were TonB-dependent receptors, specific for iron and cobalamin, and transport systems specific for copper and inorganic phosphate (Pi). At 30ºC, TonB-dependent receptors specific for iron were the only proteins that were significantly increased.

4.4.8.1. Phosphate import

An alkaline phosphatase (Sala_1625) and a phosphate import ATP-binding protein PstB homolog (Sala_0823) were 5.2-fold and 9.2-fold more abundant at 10ºC, respectively (Table 4.3). Alkaline phosphatases are located in the periplasm and are non-specific metalloenzymes that hydrolyse phosphate esters to provide the Pi required for many metabolic reactions and biomolecules such as nucleic acids, phosphorylated sugars and proteins (Posen, 1967). The increased production of Pi at low temperature in S. alaskensis is closely linked with the specific PstB import protein that was also consistently increased at 10ºC. The entire high affinity ABC-type Pi uptake system pstSCAB is the only phosphate transport system in the genome. A low affinity phosphate transport system is absent from the genome, which could be due to the genomic streamlining commonly found in organisms inhabiting extreme environments, particularly nutrient depleted conditions (Garcia-Fernandez et al., 2004; Giovannoni et al., 2005), including S. alaskensis (Williams et al., 2009). Genomic streamlining involves a reduction in genome size such that only the most fundamental metabolic and regulatory functions remain (Giovannoni et al., 2005). The S. alaskensis genome is not as intensely streamlined as that of Pelagibacter ubique, a model oligotrophic bacterium; however, in the former, there is obvious metabolic simplification at the level of uptake and subsequent processing of substrates (Williams et al., 2009).

4.4.8.2. Cobalamin uptake

Cobalamin is an essential cofactor in many enzymatic reactions, and S. alaskensis obligately requires cobalamin supplementation because the genomic pathway for its synthesis is absent. Cobalamin is the only vitamin that must be supplemented for growth of S. alaskensis (Eguchi, 1999). The absence of the cobalamin pathway is most probably due to genomic streamlining, as mentioned above. Therefore, it is not surprising that a TonB-dependent receptor with predicted cobalamin specificity (Sala_3108) was amongst the most abundant proteins in the cell regardless of growth temperature (Table 4.4). The same importer protein was also significantly more abundant at 10ºC (Table 4.3). The increased cobalamin uptake during low temperature growth may indicate that more cobalamin is required to compensate for reduced enzyme efficiency, or that cobalamin-requiring reactions are also increased in the cold. For example, the abundance of methylmalonyl CoA mutase, a cobalamin-dependent enzyme in the propionate pathway, was also increased at 10ºC (Section 4.4.1.1).

4.4.8.3. Copper resistance

A copper resistance protein B precursor (CopB homolog) (Sala_0785) was 4-fold increased at 10ºC in S. alaskensis (Table 4.3). Copper is an essential trace element which serves as a cofactor for many enzymes; however, it is toxic and can give rise to free radicals and, ultimately, oxidative degradation of the cell. Copper homeostasis is crucial for all organisms, and in Enterococcus hirae (Solioz & Odermatt, 1995), Moraxella catarrhalis (Sethi et al., 1997), Pseudomonas syringae (Cha & Cooksey, 1991), and Archaeoglobus fulgidus (Mana-Capelli et al., 2003), the CopB protein acts as an ATPase that extrudes copper when it approaches toxic levels. It is relevant that the CopB outer membrane protein is similar to the TonB-dependent receptor protein (Aebi et al., 1996). The data imply that at low temperature in S. alaskensis, the uptake of copper could exceed its demand in enzymatic reactions, therefore increasing the requirement for copper resistance.

4.4.8.4. Iron import and homeostasis

A TonB-dependent receptor specific for iron (Sala_1913) and the bacterioferritin iron storage protein (Sala_0588) were increased 1.6-fold and 4.2-fold, respectively, at 10ºC (Table 4.3). At low temperature, the uptake of iron appears sufficient to supply the metabolic needs of the cell, so that surplus iron is stored in the bacterioferritin protein. At 30ºC, however, three TonB-dependent receptors for iron (Sala_0029, Sala_0947 and Sala_2091) were significantly increased (Table 4.3), consistent with the significantly elevated requirement for iron in a faster growing cell. The biological function of iron relies upon its incorporation into proteins, either as a mono- or binuclear species, in iron-sulfur clusters, as haem groups, or in mixed metal centres (Andrews et al., 2003). However, free iron is poorly available in oxygen-rich environments and is also potentially toxic to the cell; therefore, cells require effective iron homeostasis by scavenging for iron, ensuring adequate intracellular reserves are maintained and providing sufficient protection from its toxic effects (Andrews et al., 2003).

4.4.8.5. Is there TonB-dependent receptor specificity beyond iron and cobalamin?

TonB-dependent receptors are classically associated with the active uptake of iron chelates after siderophore capture and binding (Ferguson et al., 1998). The Gram-negative bacterial TonB-dependent receptor transport systems have components in the outer membrane, cytoplasmic membrane and periplasm. The energy required for actively transporting extracellular nutrients into the cell is transduced by coupling the outer membrane to the inner membrane by an ExbB, ExbD and TonB complex (Wiener, 2005).


Most bacteria possess fewer than 14 TonB-dependent receptors per proteome, while a small proportion of bacteria, usually - and -, aquatic bacteria and phytopathogens, have more than 30 TonB-dependent receptors (Blanvillain et al., 2007). S. alaskensis falls into the latter group with 43 predicted TonB-dependent receptors. In a large-scale comparative genomics study, no correlation was found to exist between the number of TonB-dependent receptors and the genome size of the organism; rather, the ecological niche and physiology of the organism appeared to influence TonB-dependent receptor over-representation (Schauer et al., 2008). Other Sphingomonadales organisms (e.g. Novosphingobium aromaticivorans and Sphingomonas sp. SKA58), aquatic psychrophiles (e.g. Colwellia psychroerythraea, Pseudoalteromonas haloplanktis and Pseudoalteromonas antarctica), and oligotrophs (e.g. Caulobacter crescentus) also fall into the latter category of TonB-dependent receptor over-representation (Blanvillain et al., 2007). These three features (Sphingomonadales order, marine environment and oligotrophic lifestyle) are defining characteristics of S. alaskensis and could be the reason for TonB-dependent receptor over-representation.

There is a clear indication that not all TonB-dependent receptors are involved in iron or cobalamin uptake. Only 9 out of the 72 TonB-dependent receptors in a genetic study of Xanthomonas campestris were found to have direct roles in iron uptake (Blanvillain et al., 2007); 11 of 34 TonB-dependent receptors identified from a proteomics study of Pseudomonas aeruginosa were regulated by iron starvation (Llamas et al., 2003). It is most likely that the remaining non-iron-regulated TonB-dependent receptors are involved in different biological functions such as carbohydrate scavenging or metal uptake (Blanvillain et al., 2007; Schauer et al., 2008).
Two major genetic studies have identified TonB-dependent receptors specific for the transport of a range of carbohydrates in Xanthomonas campestris (Blanvillain et al., 2007) and maltodextrin in Caulobacter crescentus (Neugebauer et al., 2005); and a proteomics-based study of Sphingomonas sp. A1 identified outer membrane proteins, including TonB-dependent receptors, responsible for the uptake of polysaccharides (Hashimoto et al., 2005). With regard to metal specificity, genetic studies have also revealed nickel specific TonB-dependent receptors in Helicobacter pylori (Schauer et al., 2007) and Rhodobacter capsulatus (Rodionov et al., 2006), and cobalt specificity in R. capsulatus and Salmonella enterica (Rodionov et al., 2006).
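The counts quoted above can be put on a common scale with a short Python sketch. The thresholds (fewer than 14, more than 30) and the counts (43, 9 of 72, 11 of 34) are taken from the text; the function names and the "intermediate" label for counts between the two thresholds are assumptions made for illustration only.

```python
# Thresholds and counts are taken from the text (Blanvillain et al., 2007;
# Llamas et al., 2003). Function names and the "intermediate" label are
# illustrative assumptions.

def tbdr_class(n_receptors):
    """Classify a proteome by its number of TonB-dependent receptors."""
    if n_receptors > 30:
        return "over-represented"
    if n_receptors < 14:
        return "typical"
    return "intermediate"


def percent(part, whole):
    """Percentage of receptors with a given role, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    print("S. alaskensis (43 receptors):", tbdr_class(43))
    print("X. campestris, direct role in iron uptake:", percent(9, 72), "%")
    print("P. aeruginosa, regulated by iron starvation:", percent(11, 34), "%")
```

On these figures, roughly one eighth of the X. campestris receptors and about one third of the P. aeruginosa receptors are linked to iron, which leaves the majority available for other substrates.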


The bioinformatic predictions of non-iron or cobalamin specificity of TonB-dependent receptors are commonly based on genomic proximity of carbohydrate or metal transporters or enzymes to the TonB-dependent receptor, and the presence of upstream regulatory elements (Blanvillain et al., 2007). Using this approach, there are several TonB-dependent receptors in S. alaskensis with predicted carbohydrate enzymes nearby (Table 4.9); but none of these receptors had changed abundances as a result of temperature change. The uncharacterised receptors are important to the cell regardless of temperature, consistent with the overall importance of carbohydrates in central metabolism, and also the oligotrophic conditions from which S. alaskensis was isolated. Most importantly, there is a very low number of specific ABC-transporter types on the cell surface (Williams et al., 2009), for which carbohydrate specific TonB-dependent receptor systems may compensate.


Table 4.9. Predicted carbohydrate uptake TonB-dependent receptors in S. alaskensis compared to Xanthomonas campestris spp. campestris (Xcc)

[Table body: for each S. alaskensis TonB-dependent receptor (locus tags Sala_0027, Sala_0181, Sala_0305, Sala_1015, Sala_3041, Sala_0313 and Sala_0914), the table lists the S. alaskensis neighbouring genes, the Xcc gene homolog (among Xcc1037, Xcc2469, Xcc0304, Xcc2665, Xcc2208, Xcc0120, Xcc3427, Xcc2828 and Xcc3963), the Xcc locus description, the Xcc neighbouring genes and a Y/N identification column.]

Grey shaded proteins are likely carbohydrate specific TonB-dependent receptors; unshaded proteins did not have nearby carbohydrate transporter or metabolic enzymes and are likely not to be carbohydrate specific, but were examined because of predicted carbohydrate association in Xanthomonas campestris spp. campestris. This organism is phylogenetically related to S. alaskensis and both organisms share similar physiological characteristics, including an overrepresentation in TonB-dependent receptors. BLAST homology searches were performed against the S. alaskensis genome to search for potential carbohydrate associated TonB-dependent receptors, using known or predicted Xcc carbohydrate associated TonB-dependent receptors. CUT, carbohydrate uptake; CAZy, carbohydrate active enzymes; Xcc, Xanthomonas campestris spp. campestris.


4.4.9. Effect of cold growth temperature on the cell envelope

A common adaptation to the cold involves reorganisation of the cell envelope that is characterised by an increased number of desaturated lipid bonds, increased methyl branching, and shortened acyl chains (Section 4.4.1.2). From the quantitative MS data, many more proteins had significant increases at 10ºC than at 30ºC in the cell wall, membrane and envelope biogenesis COG category (Figure 4.1). Many of the proteins increased at 10ºC are involved in EPS synthesis (Section 4.4.3.1). Two structural membrane proteins (Sala_3101 and Sala_1988) were significantly increased at 10ºC (Table 4.3) and are probably associated with cold adaptive restructuring of the membrane (Section 4.4.1.2). Also, a periplasmic C-terminal processing peptidase (Sala_0574), with a protease domain similar to the DegS protein in Gram-negative bacteria, was 1.8-fold increased at 10ºC (Table 4.3). It also has a PDZ sensor domain that, in Gram-negative bacteria, is a sensor of misfolded periplasmic proteins. Activation of the sensor domain could result in the activation of the protease domain (Wilken et al., 2004). Therefore, the increased abundance of a periplasmic misfolded protein-specific protease indicates that low temperature growth not only results in membrane restructuring, but also in the increased denaturation of exported proteins.

4.4.10. Detoxification: Posttranslational modifications and defense

Detoxification is an important aspect of cell biology that must respond immediately to transient or prolonged exposure to toxic compounds. The quantitative proteomics data suggest that in S. alaskensis, the detoxification of ROS and of antibiotic compounds is influenced by temperature.

4.4.10.1. Detoxification by posttranslational events

There was an equal representation of proteins in the posttranslational modifications COG category (Figure 4.1). Enzymes responsible for ROS detoxification were increased at both growth temperatures. Peroxiredoxin (Sala_2184) was 14-fold increased and an organic hydroperoxide resistance protein (Sala_2374) was 2-fold increased at 30ºC (Table 4.3); both proteins belong to the peroxiredoxin family. Peroxiredoxins interact with hydrogen peroxide, and participate in oxidant scavenging and redox signal transduction in all domains of life (Fourquet et al., 2008). Peroxiredoxin oligomerisation confers chaperone activity that prevents heat shock induced protein aggregation and contributes to heat shock tolerance in bacteria (Chuang et al., 2006), yeast (Moraitis & Curran, 2004; Jara et al., 2007) and human cells (Jang et al., 2004; Moon et al., 2005; Lee et al., 2007). Thus, the 14-fold increase of Sala_2184 and the 2-fold increase of Sala_2374 are probably due to the compound effect of faster growth at high temperature resulting in the creation of more ROS, and the chaperone-like heat shock response function of peroxiredoxins.

At 10ºC, a thioredoxin (Sala_2665) was 1.5-fold increased (Table 4.3). Thioredoxins are small disulfide-containing redox proteins that act as general protein disulfide oxidoreductases and are found in all domains of life (Holmgren, 1985). Thioredoxin was increased during cold shock in the mesophilic B. subtilis (Seo et al., 2004), and during cold growth in the psychrophilic Bacillus psychrosaccharolyticus (Seo et al., 2004) and L. monocytogenes (Liu et al., 2002). The increased detoxification of oxygen radicals in the cold is due to their increased solubility at low temperature (Georlette et al., 2004; D'Amico et al., 2006). There seems to be a preference for peroxiredoxin-based detoxification of ROS at high temperature, and thioredoxin-based detoxification at low temperature.

S. alaskensis is inherently resistant to hydrogen peroxide (Eguchi et al., 1996; Ostrowski et al., 2001); an investigation into the oxidative stress resistance of this organism concluded that catalase was not responsible for hydrogen peroxide resistance (Ostrowski et al., 2001). The quantitative proteomics data in the current study indicate that peroxiredoxin and thioredoxin are likely candidates to contribute to the inherent resistance of S. alaskensis to hydrogen peroxide. The ability to withstand the toxic effects of stress-inducing agents such as hydrogen peroxide is ecologically significant in S. alaskensis because oxidative agents are a common challenge in the marine environment (Gourmelon et al., 1994; Oda et al., 1997).

4.4.10.2. Detoxification as a defense mechanism

There was an over-representation of proteins at 30ºC in the defense and secondary metabolites COG category (Figure 4.1). All of the proteins increased at 30ºC facilitate the detoxification of antibiotics or xenobiotics (Table 4.3). The only protein increased at 10ºC was a 3-mercaptopyruvate sulfurtransferase (Sala_3061) (Table 4.3) that facilitates the detoxification of cyanide. Sulfuration of cyanide by donation of sulfur from 3-mercaptopyruvate yields thiocyanate and pyruvate. Several species of plants exposed to cold temperatures have superior cyanide resistance (Van De Venter, 1985; Grabelnych et al., 2004). Also, bacteria synthesise and secrete cyanide as a competition strategy (Blumer & Haas, 2000). The 3-mercaptopyruvate sulfurtransferase-facilitated cyanide detoxification in S. alaskensis may contribute to the persistence and competitive dominance of this organism in its native environment.

4.4.11. Nucleotide biosynthesis and a possible thermally-controlled stringent response

At 30ºC, the bifunctional purine biosynthesis protein PurH (Sala_3123), with phosphoribosylaminoimidazolecarboxamide formyltransferase and inosine monophosphate cyclohydrolase activity, was increased 1.4-fold (Table 4.3). Consistent with the increase of PurH at 30ºC, GTP cyclohydrolase I (Sala_1911), which facilitates the first step in tetrahydrofolate (THF) biosynthesis, was 1.7-fold increased at 30ºC (Table 4.3). THF is a coenzyme in the formyltransferase reaction of PurH. This suggests that overall there is a greater demand for purines when the cell is growing faster. The final biosynthetic product of the bifunctional PurH enzyme is IMP, which is the intermediary metabolite at which purine biosynthesis bifurcates towards either adenine or guanine nucleotides (Srere, 1987). At 10ºC, guanylate kinase (Sala_3155), responsible for the phosphorylation of GMP to GDP in the synthesis of guanine nucleotides, was 1.7-fold increased (Table 4.3). Therefore, purine biosynthesis seems to be central in S. alaskensis at both growth temperatures.

Also increased at 10ºC was the Sala_1753 guanosine-5'-triphosphate,3'-diphosphate diphosphatase (Table 4.3), which catalyses the conversion of pppGpp to ppGpp. (p)ppGpp is most notably associated with the stringent response, classically induced by N or C starvation, where the (p)ppGpp effector molecule alerts the cell to low levels of amino acids or carbon sources and inhibits RNA synthesis by binding to the β subunit of RNA polymerase (Chatterji et al., 1998). Since the hallmark of the stringent response is the rapid intracellular accumulation of (p)ppGpp, the increase of Sala_1753 guanosine-5'-triphosphate,3'-diphosphate diphosphatase at 10ºC indicates a thermally-controlled stringent response.

In a classical nutritional stringent response, a growth lag is expected while the cell biosynthesises a sufficient amino acid pool to supply the demands of growth (Chatterji & Ojha, 2001). After a 24 h carbon starvation, the outgrowth response of S. alaskensis did not include a typical lag phase (Eguchi et al., 1996; Eguchi et al., 2001; Cavicchioli et al., 2003). The lack of a stringent response in S. alaskensis is consistent with the pattern of protein synthesis rates that occurs as cells enter starvation. In contrast with the spike in the rate of protein synthesis that accompanies the initial starvation phase, rates of synthesis remain low in S. alaskensis (Fegatella & Cavicchioli, 2000). A temperature controlled stringent response has been identified in E. coli (Yang & Ishiguro, 2003) and L. monocytogenes (Liu et al., 2006) and putatively identified in Photobacterium profundum (Lauro et al., 2008). Thus a thermally controlled stringent response is possible; however, the mechanics of such a response have not been elucidated.

There was no evidence of differential abundance of pyrimidine biosynthetic enzymes from the quantitative MS data. However, most of the enzymes in the pyrimidine biosynthesis pathway were identified in the MS analyses (Appendix D), and the lack of significant abundance changes due to temperature reflects the essential nature of nucleotide biosynthesis.

4.4.12. Thermally important proteins with general function prediction only

There was little information on function for nearly one quarter of the proteins with significant abundance changes (Table 4.3). Most of these proteins were in the general function prediction only or function unknown COG categories; however, some proteins were classed into more specific COG categories due to good matches to conserved domains, but with little other information on function. It is noteworthy that many proteins with unknown function had large, if not the largest, 14N/15N FC values in several COG categories. Specific examples include a putative glucose inhibited division (GidA) family protein with the largest increase at 10ºC in the translation COG (Table 4.3). Also in the translation COG, a putative endoribonuclease protein had the largest abundance increase at 30ºC (Table 4.3). Further examples include an aminotransferase class III protein with the largest abundance increase at 10ºC in the amino acid transport and metabolism COG (Table 4.3); a putative ATP12 family chaperone protein as the most increased protein at 30ºC in energy production and conversion (Table 4.3); and an AMP-dependent synthetase and ligase as the most significantly abundant protein at 30ºC in the lipid metabolism COG (Table 4.3). The large dataset of uncharacterised proteins that are clearly important for cold adaptation, as well as heat stress, suggests that there may be many more unidentified processes involved in thermal adaptation in addition to what is already known. The unknown proteins with significantly changed abundances are clear targets for further investigation to continue to unravel the molecular mechanisms of cold adaptation.

4.4.13. Unchanged proteins and the Sec-dependent secretion pathway

Proteins that were consistently unchanged in abundance at 10ºC vs. 30ºC must play a core role in cellular maintenance, growth and replication (Appendix J). Many of the unchanged proteins (with high q-values) have housekeeping roles in central metabolic pathways. Additionally, comparison of this subset of proteins to those with significant abundance changes provided information for understanding cold adaptation at the molecular level. For example, the SecE integral membrane protein in the Sec-dependent protein secretory pathway was unchanged (Appendix J). This is consistent with what is known about the Sec-dependent secretion system, where SecE (along with SecY and SecG) is a primary component of the translocation machinery (Mori & Ito, 2001). In contrast, the SecB chaperone was increased at 10ºC (Table 4.3). The Sec-dependent secretion pathway can operate without SecB (SecA as the ATPase coupled to SecYEG as the channel); however, SecB is employed when proteins destined for export require extra assistance in unfolding (Weiss et al., 1988). This suggests that SecB-facilitated protein unfolding pre-export is particularly important to cold protein translocation in S. alaskensis. The SecF subunit had unchanged abundance (Appendix J) while the YajC integral membrane protein was increased at 10ºC (Table 4.3). The role of the auxiliary SecDF-YajC complex is not clearly defined; however, mutations in the SecDF-YajC operon in E. coli produced cold-sensitive mutants (Nouwen & Driessen, 2005). The mutations were in the secD or secF genes and not yajC (Nouwen & Driessen, 2005). The data from the present study suggest that SecF does not have a cold responsive role in S. alaskensis, but that YajC is important for protein export during cold growth. It is also important to note that protein export is a cold-sensitive process (Pogliano & Beckwith, 1993), which may confound cold-sensitive mutational studies.


4.4.14. The fast and the furious: Growth at 30ºC is fast and stressful

4.4.14.1. The fast

Growth of cells at 30ºC was significantly faster than growth at 10ºC. Cultures grown at 30ºC reached OD433 0.3 in ~3 days, while cultures at 10ºC reached the same OD in ~35 days. Accordingly, many proteins with significant increases of abundance at 30ºC were involved in supplying biomass and energy to fuel a faster growth rate, as indicated by the over-representation of proteins increased at 30ºC in the energy production and conversion COG category (Figure 4.1). Most of these proteins are involved in the TCA cycle or the electron transport chain. An increased growth rate must also be coupled with increased cell division. At 30ºC, an ATPase associated with cell septation (Sala_1127) was 2.7-fold increased (Table 4.3). In contrast, a Maf protein was increased at 10ºC (Table 4.3). Maf is associated with inhibition of septation in B. subtilis; similarly, the OrfE homolog of Maf in E. coli causes inhibition of septation (Butler et al., 1993). Ribose-5-phosphate isomerase B (Sala_0792) and ribose-phosphate pyrophosphokinase (Sala_2942) were both increased at 30ºC (Table 4.3). These enzymes facilitate the sequential biosynthesis of PRPP from ribulose-5-phosphate (Rudolph, 1994). Their respective 1.7-fold and 1.4-fold increases at 30ºC (Table 4.3) clearly indicate that as a result of a faster growth rate at higher temperature, there is an increased demand for biomass production. Furthermore, the increased capacity for PRPP synthesis at 30ºC is consistent with supplying a central substrate for nucleotide biosynthesis (Section 4.4.4.3).
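The difference in growth speed can be expressed as a simple ratio of the times taken to reach the same optical density. The Python sketch below is illustrative only: the function name fold_faster is hypothetical, and treating the ratio of times as a proxy for the ratio of growth rates assumes comparable inoculum densities and growth phases at both temperatures, which is an assumption rather than something reported here.

```python
# Times to reach OD433 0.3 are the approximate values quoted in the text.
# Using their ratio as a proxy for relative growth rate assumes comparable
# starting densities at both temperatures (an illustrative assumption).

DAYS_AT_30C = 3.0
DAYS_AT_10C = 35.0


def fold_faster(days_slow, days_fast):
    """Return how many times sooner the fast culture reaches the target OD."""
    if days_fast <= 0:
        raise ValueError("days_fast must be positive")
    return days_slow / days_fast


if __name__ == "__main__":
    ratio = fold_faster(DAYS_AT_10C, DAYS_AT_30C)
    print(f"Growth at 30ºC is roughly {ratio:.1f}-fold faster than at 10ºC")
```

With ~35 days and ~3 days the ratio is a little under 12, which sits within the approximately 10- to 15-fold range used in Section 4.4.14.2.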

4.4.14.2. The furious: Stress indicators at high temperature growth A ribosome associated stress response protein (YfiA in yeast or protein Y (pY) in bacteria) is a 54 homolog and has one ribosome-associated inhibitor A (RaiA) domain (Sala_2837). This protein was significantly more abundant at 30 C, by 1.7-fold in all 20 MS experiments (Table 4.3 and Table 4.4). pY is associated with the slowing of translation during stress after cold-induction in E. coli (Agafonov et al., 2001). Upon expression in E. coli, pY binds to the 30S subunit interface in the 70S ribosome and inhibits peptide elongation by blocking the binding of aminoacyl-tRNA to the ribosomal A site (Agafonov et al., 2001; Rak et al., 2002). There are pY homologues in other bacteria such as Haemophilus influenza (Parsons et al., 2001), and the LrtA protein

186 L. Ting, UNSW. Molecular mechanisms of cold adaptation in S. alaskensis

from Synechococcus PCC 7002 is a light repressed pY homolog (Tan et al., 1994). It has been suggested that members of the pY family are generally involved in the process of adaptation to stressful environmental conditions (Ye et al., 2002). Since the growth of S. alaskensis at 30 C is approximately 10- to 15-fold faster than at 10 C, and since S. alaskensis is an inherently slow-growing bacterium; it is possible that fast growth is stressful. Protein-L-isoaspartate (D-aspartate) O-methyltransferase (Sala_2261) was 1.5-fold increased at 30ºC (Table 4.3). This enzyme acts on L-isoaspartyl and D-aspartyl, the products of the spontaneous deamidation or isomerisation of normal L-aspartyl and L-asparaginyl residues in proteins in bacteria (Fu et al., 1991; Johnson et al., 1991), eukarya (Johnson et al., 1991), and archaea (Griffith et al., 2001). The general role of this methyltransferase is to repair damaged proteins, where the enzymatic methyl esterification of the abnormal residues can lead to their conversion to normal L-aspartyl residues (Brennan et al., 1994). In S. alaskensis, the increase of this methyltransferase at 30ºC reflects fast cell growth, and therefore an increase in repair mechanisms. In addition, in E. coli (Kindrachuk et al., 2003) and A. thaliana (Villa et al., 2006), the overexpression of this methyltransferase conferred an improvement in heat shock survival by some mechanism independent of methyltransferase activity. Further examples to illustrate the stressful condition of growth at 30ºC for S. alaskensis include the increased abundance of ectoine hydroxylase (Sala_2952) (Table 4.3) that converts ectoine to hydroxyectoine. The latter is a compatible solute that can be induced in osmotically or thermally stressful conditions in bacteria and fungi (Malin & Lapidot, 1996; Kuhlmann & Bremer, 2002). Finally, an IbpA homolog small heat shock protein (Sala_2830) was increased 8-fold at 30ºC (Table 4.3). 
Overall, the increased abundance of repair and survival mechanisms at 30ºC is consistent with stress. Also, the data illustrate that growth temperatures close to the Topt of S. alaskensis (35-40ºC) (Eguchi et al., 1996), where Topt refers to the temperature at which the growth rate of the cell is maximal, are stressful. Consequently, 30ºC may not be the optimum physiological growth temperature for S. alaskensis; rather, it is the fastest growth temperature.

4.4.15. Conclusion

The numerical dominance of S. alaskensis in its environment clearly demonstrates a successful cold adaptive strategy. The premise of the global metabolic labelling-based
quantitative proteomics experiments was that temperature sensitive changes in protein abundance correspond to metabolic pathways, thus providing insight into its cold adaptation. It is clear that the quantitative protein profiles of S. alaskensis at 10ºC vs. 30ºC are different, and investigating the changes in relative protein abundance allows for the reconstruction of metabolic pathways with a role in cold adaptation. At 10ºC many temperature influenced changes in proteins were consistent with the findings and hypotheses in cold adaptation research. For example, the cell envelope undergoes restructuring and possible de novo unsaturated fatty acid biosynthesis, which is consistent with what is known regarding membrane adaptation to cold temperature. Also at low temperature, the structural components of transcription machinery and a number of transcription factors were increased in order to maintain transcriptional processes, which, in turn, was expected to increase translational efficiency, processing and turnover. Transporters for phosphates, cobalamin and iron were increased, which is consistent with predicted lowered enzyme activity at low temperature. Finally, there were a large number of proteins with no clear function that appeared to have an important role in cold adaptation due to their large fold differences. A number of cold adaptive responses in S. alaskensis that could be inferred from the quantitative proteomics data were uncommon or unique. For example, protein folding by PPIase, GroESL, DnaK-DnaJ-GrpE and ClpB was important for the cell at low temperature. GroESL, DnaK-DnaJ-GrpE and ClpB are classically heat shock induced systems, and the cold function of these proteins is novel and merits further attention. Similarly, a stringent response may be influenced by the cold in S. alaskensis. This is uncommon because the stringent response is classically induced by nutritional deficiency, and requires further investigation. 
Lipid degradation was increased and the products are either used as substrate for energy generation, which is more efficient than energy generation from carbohydrates, or converted into PHA storage compounds, which are most likely slowly released as a carbon and energy store. The suggested immediate uptake of glucose and storage into PHA for later use is a unique feature of S. alaskensis and most likely an adaptation due to the compounded pressures of inhabiting a cold and nutrient depleted environment. A literature search did not reveal any other example of such a response. The quantitative MS data also revealed a range of possible carbohydrate specificities for a number of TonB-dependent receptors. The biosynthesis of EPS material was increased at low temperature, consistent with a role in
cryoprotection; however, the xylose component of the S. alaskensis EPS is novel and the de novo biosynthesis of xylose by bacteria has been reported in only two other bacterial species. The quantitative proteomics data revealed that different amino acid biosynthetic and catabolic enzymes were increased at 10ºC and 30ºC, reflecting the changing requirements of cellular processes that draw on the intracellular pool of free amino acids. The MS data also confirmed that the GS-GOGAT pathway of nitrogen assimilation is central to amino acid metabolism in S. alaskensis. Finally, a range of central metabolic processes such as carbohydrate metabolism and energy generation were significantly increased at 30ºC, which was clearly consistent with the increased energy and biomass requirements of a fast growing cell.

Chapter 5. General Discussion

5.1. Summary

The goal of this work was to examine the cold adaptation biology of S. alaskensis by performing a global quantitative proteomics comparison of protein profiles at 10ºC vs. 30ºC growth. The rationale of choice of these two temperatures reflected the in situ isolation temperature (10ºC) and that of common culturing conditions (30ºC). The experimental methodology was empirically evaluated and optimised for application to S. alaskensis in culture, and the post-experimental data processing and analysis were developed and optimised to give high quality and statistically confident quantitative data of the abundance of S. alaskensis proteins. The experimental and post-experimental developments in this work are methodological contributions to advancing the rigour of mass spectrometry investigations. The large body of data generated by this study required the development of a comprehensive analysis of the cell biology of cold adaptation. The biological insights gained in this work not only expand knowledge of S. alaskensis and the biology of cold adaptation, but also have the potential to be applicable to a range of other areas including ecology, biotechnology, biomedical applications and astrobiology.

5.2. Method development for a quantitative proteomics analysis of S. alaskensis

5.2.1. Developing a GeLC-MS/MS analytical platform for S. alaskensis

In order to extract as much information as possible from proteomics data, it is not only important to successfully identify and quantify proteins on a global scale, but also to rigorously test the significance of the data in order to achieve confident biological conclusions. An insight into protein abundances, under different conditions, gives information about the biological processes involved. As the current study is the first example of the application of quantitative proteomics to S. alaskensis, a range of experimental and post-experimental parameters were evaluated and optimised for a metabolic labelling-based quantitative proteomics analysis. Cell culture, protein extraction, GeLC-MS/MS analysis and post-experimental data processing were
empirically optimised to give a standard proteomics workflow applicable to the microorganism. It was established that 10 generations of growth in labelled 15NH4Cl medium was sufficient to achieve maximal (>99%) APE of the 15N label. A large cell culture volume (250 mL) was required to generate sufficient protein yield for MS analysis. The extraction of proteins was performed in either a Tris or urea buffer containing the protease inhibitor PMSF and the chelating agent EDTA. Sonication on ice for 4 and 2 min was required for cell disruption in Tris or urea, respectively. A GeLC-MS/MS platform was determined to achieve more reliable results than LC-MS/MS with respect to generating peptide and protein identifications, partially due to an extra dimension of protein separation by SDS-PAGE, which increased the resulting number of protein identifications. The experimental protocol for GeLC-MS/MS was optimised, including trypsin:protein ratios, peptide sample dilution in HCOOH/HFBA buffer, injection volume, LC gradient length, precursor ion scanning, and number and type of MSn scans using a 3D ion trap mass spectrometer and a hybrid linear ion trap/FTICR mass spectrometer. Furthermore, post-experimental identification and quantitation processing was optimised to increase data confidence while controlling the number of false negatives and false positives. A 1% FDR for protein identification was established for SEQUEST identification and DTA Select filtering, and a set of stringent quantitation parameters in RelEx was determined.
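
The adequacy of 10 generations of labelling can be illustrated with a simple dilution argument. The following is a minimal sketch in Python and is not the measurement procedure used in this work. It assumes that the nitrogen in the inoculum is unlabelled, that all newly synthesised biomass draws its nitrogen from the labelled medium, and that biomass doubles once per generation. The function name and the medium_enrichment parameter are hypothetical and introduced only for illustration.

```python
def labelled_fraction(generations, medium_enrichment=1.0):
    """Fraction of biomass nitrogen carrying the label after a number of doublings.

    Illustrative assumption: unlabelled inoculum nitrogen is diluted two-fold
    per generation, and all new nitrogen comes from the labelled medium.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    if not 0.0 <= medium_enrichment <= 1.0:
        raise ValueError("medium_enrichment must be between 0 and 1")
    # Share of nitrogen still originating from the unlabelled inoculum.
    unlabelled_carryover = 0.5 ** generations
    return medium_enrichment * (1.0 - unlabelled_carryover)


# Print the modelled label incorporation (%) for a few generation counts.
for n in (1, 5, 10):
    print(n, 100 * labelled_fraction(n))
```

Under these assumptions the carry-over of unlabelled nitrogen after 10 generations is 1/1024 (less than 0.1%), which is consistent with the >99% value reported above, provided that the enrichment of the medium itself is not limiting.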

5.2.2. Improving data processing and analysis of quantitative proteomics data: Normalisation and statistics

The analysis of biological systems using a proteomics approach is a technology-driven science (Lee, 2001). The ability to identify, quantify or investigate proteins is dictated by the advances in proteomics technologies in data generation (i.e. experimental methodology), as well as post-experimental developments in rigorously processing the information to allow for confident conclusions to be made with respect to cellular processes. It is important to develop and optimise experimental and post-experimental methodologies specifically for each biological system investigated in order to fully utilise and interpret the generated data. Biological conclusions are commonly drawn from proteomics studies without
robust statistical validation. Avoiding false conclusions is critical when the high throughput proteomics data is not the experimental endpoint, but rather, the beginning for further experiments striving to answer biological questions (such as gene knock-out, protein over-expression and crystallisation, and clinical trial studies). More robust approaches of statistical analyses and validation of quantitative proteomics work are required. Currently, there is no generally accepted benchmark for a robust approach for the processing, analysis and statistical testing of quantitative proteomics data because of the analytical challenge that is faced when estimating the probability of differential protein abundance, while correctly accounting for an experimental design that can include label swapping, different extraction buffers, and biological and technical replicates. Several key publications have described a range of approaches for data normalisation (Chang et al., 2004; Kreil et al., 2004; Callister et al., 2006) and statistical hypothesis testing (Karp et al., 2005; Listgarten & Emili, 2005; Corzett et al., 2006; Zybailov et al., 2006; Chich et al., 2007; Karp et al., 2007; Xia et al., 2007; Choi & Nesvizhskii, 2008; Hill et al., 2008; Kall et al., 2008; Nedenskov et al., 2008; Oberg et al., 2008; Pavelka et al., 2008; Wong et al., 2008); however, the current study appears to be the first example of combining processing of data generated by metabolic labelling mass spectrometry in RelEx and R, linear modelling of the data to the experimental parameters, MA plots to gauge the necessity and extent of normalisation, and an empirical Bayes moderated t-test coupled with the Storey-Tibshirani FDR for statistical testing. The normalisation and statistical testing approach developed in this study provided a platform for rigorous data processing and confident evaluation of differential abundance in a high-throughput quantitative proteomics dataset. 
The approach is applicable to global quantitative proteomics analyses and contributes to the discussion and development of a more sophisticated protocol for processing high-throughput quantitative proteomics data. This is important in the context of using quantitative proteomics to discover changes in biological systems in response to changing growth conditions (e.g. treatment vs. control or disease vs. healthy). If confident inferences are to be made about cell biology, the experimental and post-experimental data processing and statistical testing must be accurate in order to achieve meaningful conclusions.
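
Two of the steps named above, the MA values used to gauge normalisation and the FDR calculation applied to the test p-values, can be sketched as follows. This is a minimal illustration in Python under stated assumptions and is not the RelEx and R implementation used in this study. The function names are hypothetical, median centring stands in for whatever normalisation an MA plot suggests, the proportion of true null hypotheses is estimated at a single fixed lambda rather than by the smoothing procedure of Storey and Tibshirani, and the empirical Bayes moderated t-test that would supply the p-values is not reproduced.

```python
import math
from statistics import median


def ma_values(light, heavy):
    """Return median-centred M (log2 ratio) and A (mean log2 intensity) per protein."""
    if len(light) != len(heavy):
        raise ValueError("light and heavy must have the same length")
    m = [math.log2(l / h) for l, h in zip(light, heavy)]
    a = [0.5 * math.log2(l * h) for l, h in zip(light, heavy)]
    centre = median(m)  # simplest normalisation: shift the median log ratio to zero
    return [x - centre for x in m], a


def storey_qvalues(pvalues, lam=0.5):
    """Simplified q-values: pi0 is estimated at one fixed lambda (an assumption)."""
    n = len(pvalues)
    if n == 0:
        return []
    # Estimated proportion of true nulls, capped at 1.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (n * (1.0 - lam)))
    order = sorted(range(n), key=lambda i: pvalues[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that q-values are monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * n * pvalues[i] / rank)
        q[i] = running_min
    return q
```

In an MA plot, a systematic departure of M from zero across the range of A indicates that normalisation is needed. Median centring is only the simplest correction.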

5.2.3. Maintaining high data quality

A large amount of genome analysis was performed in the current study. Genes of interest (determined from the quantitative proteomics work) were manually functionally annotated using a range of publicly available bioinformatics software and ranked according to the in-house Evidence Rating system established in the Cavicchioli laboratory. This process of validating the auto-annotated gene descriptions ensured that the genes of interest were accurately described with respect to function, and allowed for a high quality and confident interpretation of the data. The manual genomic curation of S. alaskensis has a close relationship with the proteomics work in the current study; genomic analysis allowed for the elucidation of the genetic capacity of S. alaskensis, which, when combined with the information regarding protein expression and abundance, allowed for a more insightful interpretation of the global biology of the cell. In such a systems-biology approach, the data quality from all aspects of the study (i.e. genome annotation, expression and abundance data, bioinformatics processing, normalisation and statistics) affects the quality of the final interpretation of the study (Stead et al., 2008). The integration of genomics and proteomics with a view of maintaining high data quality has clearly given valuable insight into the cold adaptation of S. alaskensis.

5.3. A continuing story of S. alaskensis

Past research on S. alaskensis has revealed information on the general physiology and metabolism (Schut et al., 1993; Schut et al., 1995; Eguchi et al., 1996; Schut et al., 1997; Matallana-Surget et al., 2007; Williams et al., 2009), oligotrophic lifestyle (Fegatella et al., 1998; Fegatella et al., 1999; Fegatella & Cavicchioli, 2000; Ostrowski et al., 2004), hydrogen peroxide resistance (Ostrowski et al., 2001), UV resistance (Matallana-Surget et al., 2008), and genomics (Lauro et al., 2009) of the organism. The current study on cold adaptation adds to the growing body of knowledge of the microorganism. Previous MS-based publications on S. alaskensis used a 2D-PAGE approach coupled with LC-MS/MS (Fegatella et al., 1999; Ostrowski et al., 2004). These studies were performed prior to the sequencing of S. alaskensis, and proteins could only be identified based on cross-species matching. The current study is the first to apply a
label-based quantitative proteomics approach for the analysis of protein abundance. Furthermore, as the genome was sequenced in 2004, it was possible to perform precise organism-specific protein identifications.

5.3.1. S. alaskensis and unravelling the mechanisms of cold adaptation

The aim of the global metabolic labelling-based quantitative proteomics experiments was to determine temperature sensitive changes in protein abundance that corresponded to the cellular processes relevant in cold adaptation. The quantitative protein profiles at 10ºC vs. 30ºC are clearly distinct and represent all aspects of cell biology. Investigating the changes in protein abundances has also allowed for the reconstruction of metabolic pathways in S. alaskensis, which is a contribution to the expanding knowledge of the molecular mechanisms of cold adaptation and biotic processes in cold marine environments.

5.3.1.1. Expected thermal adaptive responses

Many strategies used by S. alaskensis were consistent with the other findings and hypotheses in cold adaptation research. For example, the cell envelope undergoes restructuring and possible de novo unsaturated fatty acid biosynthesis, and EPS production is carried out at cold temperature. However, the desaturase enzymes were not identified and quantified in the MS experiments, suggesting that desaturases may be more responsive when the cells experience cold shock. Future work could include a more detailed inspection of the three S. alaskensis desaturase enzymes and the fatty acid composition of the membrane at low vs. high growth temperatures to confirm the MS results. Furthermore, continuing work on the membrane and membrane-associated components of S. alaskensis could also include targeted analyses of EPS at low and high growth temperatures to confirm the quantitative proteomics data. Another interesting focus could be to elucidate the nucleotide sugar constituents, or the effect of EPS mutants on low temperature growth. Finally, a more detailed comparative genomics investigation into the rarity of microbial de novo xylose biosynthesis could reveal interesting insights into microbial physiology. At low temperature, the structural components of transcription machinery and a number of transcription factors were increased. 
This response most likely compensates for decreased transcriptional, processing and turnover efficiency due to the cold. The efficiency of translation was also increased at low temperature, which was demonstrated
by the increase of ‘cold shock degradosome’ proteins responsible for increased mRNA processing, and a range of proteins associated with increasing translation efficiency. The data confirm that transcription and translation are crucial biotic processes that must be maintained when the growth temperature drops close to, or below, the enzyme activation threshold. Almost a quarter of proteins with significant abundance changes had no clear function, but had evidently important roles in cold adaptation due to their large fold differences between the two culture conditions. These proteins had only general functional assignments, often only at motifs or domains, or no functional assignments (i.e. hypothetical proteins). There would be obvious benefit in subjecting these proteins to further investigations to further the understanding of the mechanisms of cold adaptation by either improving current models or possibly revealing unique cold adaptation strategies. Some experimental approaches could include performing mutagenesis of selected genes and monitoring the flow-on effects; expressing and purifying the unknown recombinant protein to study for biophysical function and structure; or performing complementation studies in E. coli cold-sensitive mutants. At 30ºC, a range of central metabolic processes such as carbohydrate metabolism and energy generation were significantly increased, which was clearly consistent with the increased energy and biomass requirements of a fast growing cell. In addition, a number of proteins likely to be indicators of stress were also increased during 30ºC growth. The data clearly demonstrated that the temperatures near Topt (35-40ºC) are stressful, and do not accurately convey the physiological adaptations of an organism in the context of its natural environment.

5.3.1.2. Novel cold adaptive responses

A number of cold adaptive responses in S. alaskensis inferred from the quantitative proteomics data were novel. 
For example, protein folding by PPIase, GroESL, DnaK-DnaJ-GrpE and ClpB was found to be important for the cell at low temperature. GroESL, DnaK-DnaJ-GrpE and ClpB are classically heat shock-induced systems, and the cold function of these proteins appears to be unusual since it has been documented in only a few organisms (Lelivelt & Kawula, 1995; Liu et al., 2002; Ferrer et al., 2003; Yoshimune et al., 2005; White-Ziegler et al., 2008). The expanding role of classical heat shock protein folding systems in cold shock or adaptation would benefit from further attention. The combination of PPIase, GroESL, DnaK-DnaJ-GrpE and
ClpB in cold protein folding has not previously been reported and may be an interesting avenue for further study. In addition, the future examination of the subunits and stoichiometry of the GroESL complex would be valuable in determining the exact nature of its composition in S. alaskensis, which would assist in understanding the reason for multiple GroESL sets. A possible stringent response appears to be controlled by low temperature in S. alaskensis, as evidenced by the increase of guanosine-5'-triphosphate,3'-diphosphate diphosphatase. This is unusual because the stringent response is classically induced by nutritional deficiency. Usually, upon some form of starvation (e.g. amino acid), it causes a lag in transcription and translation (Chatterji & Ojha, 2001); however, S. alaskensis has been shown to lack a lag phase upon starvation, leading to the hypothesis that it lacks a stringent response (Fegatella & Cavicchioli, 2000). A temperature-controlled stringent response has been observed in several other organisms (Yang & Ishiguro, 2003; Liu et al., 2006; Lauro et al., 2008), but is yet to be elucidated. It is possible that a temperature-controlled stringent response is very different to a nutritional stringent response; further investigation would be required to reveal the control and mechanisms of a cold-induced stringent response. Future investigation of the possible S. alaskensis temperature-controlled stringent response could include direct measurement of (p)ppGpp levels in the cell, and monitoring of downstream metabolic pathways affected by an increase in (p)ppGpp. It would be particularly interesting to elucidate the connection of (p)ppGpp and growth lag in a temperature-controlled stringent response – perhaps the lack of growth lag in S. alaskensis due to the unique uncoupling of protein synthesis and ribosome number is an overriding factor. Finally, since the cold adapted L. monocytogenes and P. 
profundum also display thermally-controlled stringent responses, it would be of interest to compare the cold induced stringent response of these organisms to S. alaskensis. Lipid degradation, where the products of β-oxidation most likely entered energy generation pathways, or were converted into PHA storage compounds, was increased at low temperature growth. PHA stores are most likely slowly released as a carbon and energy source to supply metabolic demands. These outcomes are consistent with the study by Schut et al. (1993), where S. alaskensis was demonstrated to incorporate the glucose it took up into a polysaccharide instead of metabolising it for energy and biomass substrate. If this is the case, then intracellular PHA would be required to supply carbon
instead of glucose. This appears to be a unique feature of S. alaskensis and most likely an adaptation due to the compound pressures of inhabiting a cold and nutrient depleted environment. It should be noted, however, that S. alaskensis prefers amino acids as a source of carbon, rather than glucose; a reflection of the available growth substrates in the nutrient depleted environment from which it was isolated (Schut et al., 1993; Schut et al., 1997). Glucose is a rich source of carbon and energy, while free amino acids are relatively poor. It is possible that glucose is too ‘rich’ for S. alaskensis, resulting in its storage as a polysaccharide, while the supply of carbon and energy is derived from PHA reserves. The hypothesised mechanism is poorly defined; a literature search did not reveal any other example of such a response in other organisms, which warrants further investigation into the unique metabolism of S. alaskensis. Continuing work could include a targeted analysis of the β-oxidation pathway. The downstream metabolic pathways that the products of β-oxidation feed into would be an interesting focus, considering that the quantitative proteomics data show a link between fatty acid degradation and energy generation. Future work to investigate PHAs could include a direct measurement of PHA during cold vs. high growth temperatures. Furthermore, a large-scale comparison of the PHA capabilities of cold adapted organisms would be valuable in determining if PHA accumulation is a common characteristic. The quantitative MS data also revealed that transporters for phosphates (Pi), cobalamin, iron and copper resistance were increased to compensate for lowered enzyme activity at low temperature. Most of these transport systems were TonB-dependent receptors or similar to TonB-dependent receptors. This led to the detailed investigation of the over-representation of TonB-dependent receptors in the S. 
alaskensis genome, which resulted in the prediction that a number of these receptors are likely to be specific for carbohydrates. The non-iron and non-cobalamin uptake role of TonB-dependent receptors is slowly expanding, and the current study has contributed to revealing the full extent of TonB-dependent receptors for substrate uptake in bacteria. Future work could include a thorough examination of the 43 TonB-dependent receptors in S. alaskensis and their specificities by genomic inspection and experimental probing. It would be beneficial to understand the regulation of these receptors, and it is anticipated that only a small proportion would be controlled by iron or cobalamin. Additionally, a cross-species investigation would be constructive in understanding the
commonalities between organisms with high vs. low numbers of TonB-dependent receptors, which is relevant to their ecological significance.

5.4. Significance of this research

Aside from the direct contribution to expanding knowledge on cold adaptation, the insights from this project are also relevant to biotechnological and biomedical applications, understanding the ecology of the marine environment and its contribution to global carbon cycles, and the search for life beyond Earth in astrobiological research and surveys.

5.4.1. Contribution to biotechnology and biomedical applications

Cold adapted organisms possess a range of characteristics that are desirable for biotechnological exploitation. These features include protein flexibility, the high catalytic activity of enzymes at low temperatures and low thermostability at high temperatures (Cavicchioli et al., 2002). The biotechnological potential of these characteristics includes the shortening of process or incubation times, a reduction in energy consumption, prevention of the loss of volatile compounds, improving the performance of reactions that include thermolabile compounds and a reduction in the risk of contamination (Margesin et al., 2007). Current examples of biotechnological applications of cold active enzymes include their use in detergents, antibacterial agents, cosmetics, biotransformation, molecular biology, fermentation, food storage, the dairy industry (e.g. cheese ripening, lactose removal from milk, improving the quality of ice cream), the wine industry (e.g. fermentation), and the textiles industry (e.g. desizing denim, water treatment) (reviewed in Cavicchioli et al., 2002). As a result of the current study, a specific example of the potential exploitation of S. alaskensis in biotechnology involves PHA compounds. This study has demonstrated that there is a significantly increased production of PHA in S. alaskensis during cold temperature growth. Microbial PHAs are of particular interest because of their biodegradability and non-petrochemical origin (Zhao et al., 2003; Hazer & Steinbüchel, 2007). Metabolic engineering strategies have been explored in constructing microbial plastic ‘factories’ to produce bioplastics with desirable structures and properties (Aldor & Keasling, 2003). These include external substrate manipulation, inhibitor addition, recombinant gene expression, host cell genome manipulation and,
most recently, protein engineering of PHA biosynthetic enzymes (Aldor & Keasling, 2003). A recent study compared the cost and environmental load of PHA biopolymers against petrochemical polymers and found that the energy consumption and CO2 emissions of bio-based polymers were markedly lower than typical petrochemical polymers (Akiyama et al., 2002). The use of low cost raw materials or direct integration of PHA polymer production with milling or processing plants responsible for producing the required raw materials would significantly reduce the costs of producing microbial PHA biopolymers. This provides a rationale for further research to develop a commercially viable bioplastic alternative in the development of environmentally sustainable practice. PHAs are also of biomedical interest not only because of their biodegradability, but also due to their biocompatibility. Microbial PHAs have been investigated in the development of scaffolds for tissue engineering in a large range of cells including osteoblasts (Wang et al., 2004), chondrocytes (Deng et al., 2002), fibroblasts (Wang et al., 2005), heart valve cells (Sodian et al., 2000), and bone marrow stromal cells (Yang et al., 2004). Recently, a US patent was issued for the development of PHA-based medical devices such as sutures, screws, bone plates, surgical mesh, repair patches, slings, cardiovascular patches, orthopaedic pins, adhesion barriers, nerve guides, skin substitutes, bone graft substitutes, and wound dressings (Williams et al., 2002). The biosynthesis of EPS material containing xylose was increased at low temperature. The xylose component of the S. alaskensis EPS is uncommon because the de novo biosynthesis of xylose by bacteria has been reported in only two other bacterial species. Xylose is the substrate for producing xylitol – a popular sugar-substitute recommended for its lower calories, ability to inhibit dental caries and absorption in the human gut without involving insulin (Trahan, 1995). 
Xylose is mainly produced industrially by treating wood-derived hemicellulose or agricultural wastes such as wheat bran with acids, enzymes, high temperature and (Maloney et al., 1987; Sanjust et al., 2004). It may be possible that a bacterial-based production of xylose is a more economical option than the current approach using plant-based starting material. These current and potential developments are evidence of the importance of exploiting microbial life in the application of biotechnological and biomedical developments. With the current global dependence on a limited pool of exponentially diminishing fossil fuels, one approach for a solution is to establish a renewable non-fossil fuel-dependent society.

5.4.2. Ecological significance

As described in the introduction, the ocean is the largest biosphere on Earth and is colonised by 60-90% of all of Earth’s microorganisms (Whitman et al., 1998). S. alaskensis was isolated from a cold marine environment, which is significant when considering that ~90% of the Earth’s oceans have a temperature of 5°C or less (Russell, 1990). The estimated 3.6 x 10^28 microorganisms in the open ocean account for ~3 x 10^17 g of carbon, which is ~55% of the estimated total carbon stored in plants (Whitman et al., 1998). The ocean is the largest carbon sink on Earth and accounts for approximately one third of anthropogenic CO2 (Siegenthaler & Sarmiento, 1993; Holligan & Robertson, 1996; Sabine et al., 2004). The ocean is estimated to sequester ~1.2 x 10^17 g of carbon either in the form of CO2 absorption by the ocean or by photosynthesis (Sabine et al., 2004). Non-photosynthetic heterotrophic microbes, such as S. alaskensis, are also important in the consideration of the carbon cycle with regards to their role in the accumulation, export, remineralisation and transformation of the world’s largest pool of organic carbon (Cavicchioli et al., 2003).

The uptake of atmospheric CO2 is dependent upon the inorganic and organic oceanic carbon cycle, thus giving rise to the question as to whether ocean biota have a significant influence over CO2 sequestration; and whether this influence is a direct result of the effects of enhanced CO2 levels on biological processes, or indirectly due to the ecological consequences of global change and increasing CO2 (Siegenthaler & Sarmiento, 1993; Holligan & Robertson, 1996). Understanding the effect of the atmospheric increase of CO2 due to anthropogenic activity – such as the burning of fossil fuels – is important for predicting the carbon cycle and setting carbon budgets. In order to envisage the effects of anthropogenic CO2, a better understanding is required of all the processes that determine the flux of carbon between the atmosphere and the ocean. For example, EPS material of microbial origin was detected in Arctic sea ice at sufficiently high levels to recommend its inclusion into the Arctic carbon budget (Krembs et al., 2002). It was clearly demonstrated that at 10ºC, EPS material was significantly increased in S. alaskensis. If the ecological abundance of S. alaskensis is considered in the context of the environment from which it
was isolated, it is reasonable to infer that a significant proportion of carbon is captured in the EPS in addition to the carbon inside the cell. Thus, its contribution to the carbon budget of cold open oceans may be more than projected due to the production of previously undiscovered EPS material.

Finally, it is significant to note that CO2 absorption is a temperature-dependent process. It is important not only to account carefully for the contribution of marine microorganisms to the global carbon cycle, but also to consider how the temperature of each environment affects the diversity of its microbial inhabitants and all biotic processes, and how this subsequently affects CO2 absorption.
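The temperature dependence of CO2 absorption noted above can be illustrated with a minimal sketch, assuming that dissolved CO2 at equilibrium follows Henry's law with a van 't Hoff temperature correction. The reference constant (about 0.034 mol per litre per atmosphere at 25°C) and the temperature coefficient (about 2400 K) are approximate literature values for pure water rather than seawater, and are not taken from this study. The function names are hypothetical and serve only this illustration.

```python
import math

# Assumed, approximate values for CO2 in pure water (not seawater).
KH_REF = 0.034        # Henry's law constant at T_REF, mol / (L atm)
T_REF = 298.15        # reference temperature, K (25 degrees C)
VANT_HOFF_C = 2400.0  # temperature coefficient d(ln kH)/d(1/T), K


def co2_henry_constant(temp_c):
    """Henry's law constant for CO2 at temp_c (degrees C), in mol / (L atm)."""
    temp_k = temp_c + 273.15
    return KH_REF * math.exp(VANT_HOFF_C * (1.0 / temp_k - 1.0 / T_REF))


def dissolved_co2(temp_c, p_co2_atm):
    """Equilibrium dissolved CO2 (mol / L) for a CO2 partial pressure in atm."""
    return co2_henry_constant(temp_c) * p_co2_atm


if __name__ == "__main__":
    # Compare a cold ocean temperature with a warm reference temperature.
    ratio = co2_henry_constant(5.0) / co2_henry_constant(25.0)
    print(f"Solubility at 5 C relative to 25 C: {ratio:.2f}")
```

Under these assumptions, water at 5°C holds roughly 1.8 times as much dissolved CO2 at equilibrium as water at 25°C, which illustrates why the temperature of each environment matters when CO2 absorption is considered.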

5.4.3. Searching for extra-terrestrial life: An insight into astrobiology

Astrobiology addresses three core questions: how does life begin and evolve, does life exist elsewhere in the universe, and what is the future of life on Earth and beyond (Des Marais et al., 2003). The insight gained in the current study contributes to answering the second question. In order to identify locations in the Solar System and beyond that are able to sustain life, it is necessary to understand the fundamental environmental requirements for habitability and the limits within which life can exist, and the only model available for study is life on Earth. The diversity of life on Earth today is a result of the dynamic interplay between genetic opportunity, metabolic capability, and environmental challenges. While all microorganisms are composed of nearly identical genes and molecules, evolution has enabled microbes to cope with a wide range of physical and chemical conditions. Extremophiles are particularly relevant for study because many microbes can survive and flourish in seemingly harsh conditions such as thermal extremes, intense radiation, very high or low pH, nutritional limitation and high hydrostatic pressure. An understanding of the adaptability of life on Earth, as well as of the molecular systems that some organisms utilise to survive such extremes, will provide a critical foundation for the search for life beyond Earth. These insights will help us to understand the molecular adaptations that define the physical and chemical limits for life on Earth, and will provide a baseline for developing predictions and hypotheses about extraterrestrial life (Cavicchioli, 2002; Des Marais et al., 2003). With respect to the cold adaptation focus of this study, understanding how life on Earth adapts to cold environments is crucial in the search for and understanding of extra-terrestrial life in cold environments. The key candidates in the search for life include Mars, Europa (a moon of Jupiter with a liquid water ocean under an ice crust), Enceladus (a geologically active moon of Saturn with subsurface liquid water also under an ice crust), and Titan (a moon of Saturn with hydrocarbon lakes), which are all permanently cold environments with conditions that can theoretically support life (Kempe & Kazmierczak, 2002; McKay et al., 2008; Raulin, 2008). The insights gained from this study on the cold adaptive mechanisms of a psychrophilic marine bacterium contribute to the body of knowledge on the extreme conditions in which life can survive and the molecular adaptations required to maintain growth in the cold. Studying life in extreme environments expands the understanding of the limits within which life can exist and persist. The determination of the limits of life on Earth is essential for the interdisciplinary study of astrobiology, where building knowledge on the fundamentals of extreme life results in increasingly precise predictions of extra-terrestrial environments that can support life.

5.5. Conclusion

This project has demonstrated the amenability of S. alaskensis to a metabolic labelling-based quantitative GeLC-MS/MS proteomics analysis. The work has contributed methodological advances in experimental design, post-experimental data normalisation and statistical testing of large proteomics datasets. Importantly, valuable insights have been gained into the molecular adaptations required for growth at low temperature. Furthermore, increasing knowledge of microbial adaptation to the cold is directly applicable to expanding our knowledge of life and its application in biotechnology and biomedical sciences, ecology and the future of the environment, and astrobiology and the search for extraterrestrial life.

6. References

Aebersold, R. & Goodlett, D. R. (2001). Mass spectrometry in proteomics. Chem Rev 101:269-296. Aebersold, R. & Mann, M. (2003). Mass spectrometry-based proteomics. Nature 422:198-207. Aebi, C., Stone, B., Beucher, M., Cope, L. D., Maciver, I., Thomas, S. E., McCracken, G. H., Sparling, P. F. & Hansen, E. J. (1996). Expression of the CopB outer membrane protein by Moraxella catarrhalis is regulated by iron and affects iron acquisition from transferrin and lactoferrin. Infect Immun 64:2024-2030. Agafonov, D. E., Kolb, V. A. & Spirin, A. S. (2001). Ribosome-associated protein that inhibits translation at the aminoacyl-tRNA binding stage. EMBO rep 2:399. Aggarwal, K., Choe, L. & Lee, K. (2006). Shotgun proteomics using the iTRAQ isobaric tags. Brief Funct Genomic Proteomic 5:112-120. Allen, M. A., Thomas, T., Burg, D., Williams, T. J., Siddiqui, K. S., Francisci, D. D., Chong, K. W. Y., Pilak, O., Chew, H. H., Maere, M. Z. D., Lauro, F. M., Ting, L., Katrib, M., Ng, C., Sowers, K. R., Anderson, I. J., Ivanova, N., Dalin, E., Martinez, M., Lapidus, A., Hauser, L., Land, M. & Cavicchioli, R. (2009). The genome sequence of the psychrophilic archaeon, Methanococcoides burtonii: the role of genome evolution in cold adaptation. ISME J 3:1012-1035. America, A., Cordewener, J., van Geffen, M., Lommen, A., Vissers, J., Bino, R. & Hall, R. (2006). Alignment and statistical difference analysis of complex peptide data sets generated by multidimensional LC-MS. Proteomics 6:641-653. Anderle, M., Roy, S., Lin, H., Becker, C. & Joho, K. (2004). Quantifying reproducibility for differential proteomics: Noise analysis for protein liquid chromatography-mass spectrometry of human serum. Bioinformatics 20:3575-3582. Andreev, V. P., Li, L., Rejtar, T., Li, Q., Ferry, J. G. & Karger, B. L. (2006). New algorithm for 15N/14N quantitation with LC-ESI-MS using an LTQ-FT mass spectrometer. J Proteome Res 5:2039- 2045. Angelidis, A. S. & Smith, G. M. (2003). 
Role of the glycine betaine and carnitine transporters in adaptation of Listeria monocytogenes to chill stress in defined medium. Appl Environ Microbiol 69:7492- 7498. Araki, T. (1991). The effect of temperature shifts on protein synthesis by the psychrophilic bacterium Vibrio sp. strain ANT-300. J Gen Microbiol 137:817-826. Ashline, D., Singh, S., Hanneman, A. & Reinhold, V. (2005). Congruent strategies for carbohydrate sequencing. 1. Mining structural details by MSn. Anal Chem 77:6250. Atwater, J., Wisdom, R. & Verma, I. (1990). Regulated mRNA stability. Annu Rev Genet 24:519-541. Bachmair, A., Finley, D. & Varshavsky, A. (1986). In vivo half-life of a protein is a function of its amino-terminal residue. Science 234:179-186. Baggerly, K., Morris, J., Wang, J., Gold, D., Xiao, L. & Coombes, K. (2003). A comprehensive approach to the analysis of matrix-assisted laser desorption/ionization-time of flight proteomics spectra from serum samples: Mining MALDI-TOF data. Proteomics 3:1667-1672. Bakalarski, C. E., Elias, J. E., Villen, J., Haas, W., Gerber, S. A., Everley, P. A. & Gygi, S. P. (2008). The impact of peptide abundance and dynamic range on stable-isotope-based quantitative proteomic analyses. J Proteome Res 7:4756-4765. Bantscheff, M., Schirle, M., Sweetman, G., Rick, J. & Kuster, B. (2007). Quantitative mass spectrometry in proteomics: A critical review. Anal Bioanal Chem 389:1017-1031. Bar-Peled, M., Griffith, C. L. & Doering, T. L. (2001). Functional cloning and characterization of a UDP- glucuronic acid decarboxylase: The pathogenic fungus Cryptococcus neoformans elucidates UDP- xylose synthesis. Proc Natl Acad Sci USA 98:12003-12008. Barnett, M. E., Zolkiewska, A. & Zolkiewski, M. (2000). Structure and Activity of ClpB from Escherichia coli. Role of the amino- and carboxyl-terminal domains. J Biol Chem 275:37565-37571. Batrakov, S. G., Sheichenko, V. I. & Nikitin, D. I. (1999). A novel from Gram- negative aquatic bacteria. 
Biochim Biophys Acta, Mol Cell Biol Lipids 1440:163-175. Baudouin-Cornu, P., Schuerer, K., Marliere, P. & Thomas, D. (2004). Intimate evolution of proteins: Proteome atomic content correlates with genomes base composition. J Biol Chem 279:5421-5428. Bayles, D. O., Annous, B. A. & Wilkinson, B. J. (1996). Cold stress proteins induced in Listeria monocytogenes in response to temperature downshock and growth at low temperatures. Appl Environ Microbiol 62:1116-1119. Beckering, C. L., Steil, L., Weber, M. H. W., Volker, U. & Marahiel, M. A. (2002). Genomewide

transcriptional analysis of the cold shock response in Bacillus subtilis. Am Soc Microbiol 184:6395-6402. Belle, A., Tanay, A., Bitincka, L., Shamir, R. & O'Shea, E. K. (2006). Quantification of protein half- lives in the budding yeast proteome. Proc Natl Acad Sci USA 103:13004-13009. Ben-Zvi, A. P. & Goloubinoff, P. (2001). Review: Mechanisms of disaggregation and refolding of stable protein aggregates by molecular chaperones. J Struct Biol 135:84-93. Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 57:289-300. Bennion, B. J. & Daggett, V. (2003). The molecular basis for the chemical denaturation of proteins by urea. Proc Natl Acad Sci USA 100:5142-5147. Beran, R. K. & Simons, R. W. (2001). Cold-temperature induction of Escherichia coli polynucleotide phosphorylase occurs by reversal of its autoregulation. Mol Microbiol 39:112-125. Berger, F., Morellet, N., Menu, F. & Potier, P. (1996). Cold shock and cold acclimation proteins in the psychrotrophic bacterium Arthrobacter globiformis SI55. J Bacteriol 178:2999-3007. Beynon, R. J. & Pratt, J. M. (2005). Metabolic labeling of proteins for proteomics. Mol Cell Proteomics 4:857-872. Bindschedler, L. V., Wheatley, E., Gay, E., Cole, J., Cottage, A. & Bolwell, G. P. (2005). Characterisation and expression of the pathway from UDP-glucose to UDP-xylose in differentiating tobacco tissue. Plant Mol Biol 57:285-301. Biron, D. G., Brun, C., Lefevre, T., Lebarbenchon, C., Loxdale, H. D., Chevenet, F., Brizard, J. P. & Thomas, F. (2006). The pitfalls of proteomics experiments without the correct use of bioinformatics tools. Proteomics 6:5577-5596. Blackburn, N. T. & Clarke, A. J. (2001). Identification of four families of peptidoglycan lytic transglycosylases. J Mol Evol 52:78-84. Blanvillain, S., Meyer, D., Lauber, E. & Arlat, M. (2007). 
Plant carbohydrate scavenging through TonB-dependent receptors: A feature shared by phytopathogenic and aquatic bacteria. PLoS ONE 2:e224. Blondeau, F., Ritter, B., Allaire, P., Wasiak, S., Girard, M., Hussain, N., Angers, A., Legendre- Guillemin, V., Roy, L. & Boismenu, D. (2004). Tandem MS analysis of brain clathrin-coated vesicles reveals their critical involvement in synaptic vesicle recycling. Proc Natl Acad Sci USA 101:3833-3838. Blumer, C. & Haas, D. (2000). Mechanism, regulation, and ecological role of bacterial cyanide biosynthesis. Arch Microbiol 173:170-177. Boguski, M. S. & McIntosh, M. W. (2003). Biomedical informatics for proteomics. Nature 422:233- 237. Bondarenko, P., Chelius, D. & Shaler, T. (2002). Identification and relative quantitation of protein mixtures by enzymatic digestion followed by capillary reversed-phase liquid chromatography-tandem mass spectrometry. Anal Chem 74:4741-4749. Boonyaratanakornkit, B. B., Simpson, A. J., Whitehead, T. A., Fraser, C. M., El-Sayed, N. M. A. & Clark, D. S. (2005). Transcriptional profiling of the hyperthermophilic methanarchaeon Methanococcus jannaschii in response to lethal heat and non-lethal cold shock. Environ Microbiol 7:789-797. Bourot, S., Sire, O., Trautwetter, A., Touze, T., Wu, L. F., Blanco, C. & Bernard, T. (2000). Glycine betaine-assisted protein folding in a lysA mutant of Escherichia coli. J Biol Chem 275:1050-1056. Bouyssie, D., de Peredo, G., Mouton, E., Albigot, R., Roussel, L., Ortega, N., Cayrol, C., Burlet- Schiltz, O., Girard, J. P. & Monsarrat, B. (2007). MFPaQ, a new software to parse, validate, and quantify proteomic data generated by ICAT and SILAC mass spectrometric analyses: Application to the proteomic study of membrane proteins from primary human endothelial cells. Mol Cell Proteomics:T600069. Bradford, M. M. (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 72:248-254. 
Bragg, J. G. & Hyder, C. L. (2004). Nitrogen versus carbon use in prokaryotic genomes and proteomes. Proc. R. Soc. B 271:S374. Brandl, H., Gross, R. A., Lenz, R. W. & Fuller, R. C. (1988). Pseudomonas oleovorans as a source of poly (beta-hydroxyalkanoates) for potential applications as biodegradable polyesters. Appl Environ Microbiol 54:1977-1982. Brennan, T. V., Anderson, J. W., Jia, Z., Waygood, E. B. & Clarke, S. (1994). Repair of spontaneously deamidated HPr phosphocarrier protein catalyzed by the L-isoaspartate-(D-aspartate) O- methyltransferase. J Biol Chem 269:24586-24595. Budde, I., Steil, L., Scharf, C., Volker, U. & Bremer, E. (2006). Adaptation of Bacillus subtilis to growth at low temperature: a combined transcriptomic and proteomic appraisal. Microbiology 152:831- 853. Butler, Y. X., Abhayawardhane, Y. & Stewart, G. C. (1993). Amplification of the Bacillus subtilis maf

gene results in arrested septum formation. J Bacteriol 175:3139-3145. Cairns, D. A., Thompson, D., Perkins, D. N., Stanley, A. J., Selby, P. J. & Banks, R. E. (2008). Proteomic profiling using mass spectrometry - does normalising by total ion current potentially mask some biological differences? Proteomics 8:21-27. Caldas, T., Laalami, S. & Richarme, G. (2000). Chaperone properties of bacterial elongation factor EF- G and initiation factor IF2. J Biol Chem 275:855-860. Callister, S. J., Barry, R. C., Adkins, J. N., Johnson, E. T., Qian, W., Webb-Robertson, B. J. M., Smith, R. D. & Lipton, M. S. (2006). Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. J Prot Res 5:277-286. Cannataro, M. (2008). Computational proteomics: Management and analysis of proteomics data. Brief Bioinform 9:97-101. Carr, S., Aebersold, R., Baldwin, M., Burlingame, A., Clauser, K. & Nesvizhskii, A. (2004). The need for guidelines in publication of peptide and protein identification data: Working group on publication guidelines for peptide and identification data. Mol Cell Proteomics 3:531-533. Cavicchioli, R. (2006). Cold-adapted archaea. Nat Rev Microbiol 4:331-343. Cavicchioli, R., Ostrowski, M., Fegatella, F., Goodchild, A. & Guixa-Boixereu, N. (2003). Life under nutrient limitation in oligotrophic marine environments: An eco/physiological perspective of Sphingopyxis alaskensis (formerly Sphingomonas alaskensis). Microb Ecol 46:249-256. Cavicchioli, R., Thomas, T. & Curmi, P. M. G. (2000). Cold stress response in Archaea. Extremophiles 4:321-331. Celis, J. E. (2004). Gel-based proteomics: What does MCP expect? Mol Cell Proteomics 3:949. Cha, J. & Cooksey, D. A. (1991). Copper resistance in Pseudomonas syringae mediated by periplasmic and outer membrane proteins. Proc Natl Acad Sci USA 88:8915-8919. Chamot, D., Magee, W. C., Yu, E. & Owttrim, G. W. (1999). A cold shock-induced Cyanobacterial RNA helicase. 
J Bacteriol 181:1728-1732. Chamot, D. & Owttrim, G. W. (2000). Regulation of cold shock-induced RNA helicase gene expression in the Cyanobacterium Anabaena sp. strain PCC 7120. J Bacteriol 182:1251-1256. Chandu, D. & Nandi, D. (2004). Comparative genomics and functional roles of the ATP-dependent proteases Lon and Clp during cytosolic protein degradation. Res Microbiol 155:710-719. Chang, J., Remmen, H. V., Ward, W. F., Regnier, F. E., Richardson, A. & Cornell, J. (2004). Processing of data generated by 2-dimensional gel electrophoresis for statistical analysis: missing data, normalization, and statistics. J Proteome Res 3:1210-1218. Charollais, J., Pflieger, D., Vinh, J., Dreyfus, M. & Iost, I. (2003). The DEAD-box RNA helicase SrmB is involved in the assembly of 50S ribosomal subunits in Escherichia coli. Mol Microbiol 48:1253- 1265. Chatterji, D., Fujita, N. & Ishihama, A. (1998). The mediator for stringent control, ppGpp, binds to the beta-subunit of Escherichia coli RNA polymerase. Genes Cells 3:279-287. Chatterji, D. & Ojha, A. K. (2001). Revisiting the stringent response, ppGpp and starvation signaling. Curr Opin Microbiol 4:160-165. Chelius, D. & Bondarenko, P. V. (2002). Quantitative profiling of proteins in complex mixtures using liquid chromatography and mass spectrometry. J Proteome Res 1:317-323. Chen, G., Gharib, T. G., Huang, C. C., Taylor, J. M. G., Misek, D. E., Kardia, S. L. R., Giordano, T. J., Iannettoni, M. D., Orringer, M. B. & Hanash, S. M. (2002). Discordant protein and mRNA expression in lung adenocarcinomas. Mol Cell Proteomics 1:304-313. Chen, R., Yi, E., Donohoe, S., Pan, S., Eng, J., Cooke, K., Crispin, D., Lane, Z., Goodlett, D. & Bronner, M. (2005). Pancreatic cancer proteome: The proteins that underlie invasion, metastasis, and immunologic escape. Gastroenterology 129:1187-1197. Chich, J.-F., David, O., Villers, F., Schaeffer, B., Lutomski, D. & Huet, S. (2007). 
Statistics for proteomics: Experimental design and 2-DE differential analysis. J Chromatogr B 849:261-272. Cho, S., Goodlett, D. & Franzblau, S. (2006). ICAT-based comparative proteomic analysis of non- replicating persistent Mycobacterium tuberculosis. Tuberculosis 86:445-460. Choe, L., D’Ascenzo, M., Relkin, N., Pappin, D., Ross, P., Williamson, B., Guertin, S., Pribil, P. & Lee, K. (2007). 8-Plex quantitation of changes in cerebrospinal fluid protein expression in subjects undergoing intravenous immunoglobulin treatment for Alzheimer’s disease. Proteomics 7:3651–3660. Chuang, M. H., Wu, M. S., Lo, W. L., Lin, J. T., Wong, C. H. & Chiou, S. H. (2006). The antioxidant protein alkylhydroperoxide reductase of Helicobacter pylori switches from a peroxide reductase to a molecular chaperone function. Proc Natl Acad Sci USA 103:2552-2557. Conrads, T. P., Alving, K., Veenstra, T. D., Belov, M. E., Anderson, G. A., Anderson, D. J., Lipton, M. S., Pasa-Tolic, L., Udseth, H. R., Chrisler, W. B., Thrall, B. D. & Smith, R. D. (2001).

Quantitative analysis of bacterial and mammalian proteomes using a combination of cysteine affinity tags and 15N-metabolic labeling. Anal Chem 73:2132-2139. Corsaro, M. M., Lanzetta, R., Parrilli, E., Parrilli, M., Tutino, M. L. & Ummarino, S. (2004). Influence of growth temperature on lipid and phosphate contents of surface polysaccharides from the Antarctic bacterium Pseudoalteromonas haloplanktis TAC 125. J Bacteriol 186:29-34. Corzett, T. H., Fodor, I. K., Choi, M. W., Walsworth, V. L., Chromy, B. A., Turteltaub, K. W. & McCutchen-Maloney, S. L. (2006). Statistical analysis of the experimental variation in the proteomic characterization of human plasma by two-dimensional difference gel electrophoresis. J Proteome Res 5:2611-2619. Cox, J. & Mann, M. (2008). MaxQuant enables high peptide identification rates, individualized ppb- range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367-1372. Craig, R., Cortens, J. & Beavis, R. (2005). The use of proteotypic peptide libraries for protein identification. Rapid Commun Mass Spectrom 19:1844–1850. Cunin, R., Glansdorff, N., Pierard, A. & Stalon, V. (1986). Biosynthesis and metabolism of arginine in bacteria. Microbiol Mol Biol Rev 50:314-352. D'Amico, S., Claverie, P., Collins, T., Gerolette, D., Gratia, E., Hoyoux, A., Meuwis, M. A., Feller, G. & Gerday, C. (2002). Molecular basis of cold adaptation. Philos Trans R Soc Lond B Biol Sci 357:917-925. D'Amico, S., Collins, T., Marx, J.-C., Feller, G. & Gerday, C. (2006). Psychrophilic microorganisms: challenges for life. EMBO rep 7:385-389. Daly, D. S., Anderson, K. K., Panisko, E. A., Purvine, S. O., Fang, R., Monroe, M. E. & Baker, S. E. (2008). Mixed-effects statistical model for comparative LC-MS proteomics studies. J Proteome Res. Dawes, E. A. & Senior, P. J. (1973). The role and regulation of energy reserve polymers in micro- organisms. Adv Microb Physiol 10:135-266. Decho, A. W. (2000). 
Microbial biofilms in intertidal systems: an overview. Cont Shelf Res 20:1257- 1273. Dersch, P., Kneip, S. & Bremer, E. (1994). The nucleoid-associated DNA-binding protein H-NS is required for the efficient adaptation of Escherichia coli K-12 to a cold environment. Mol Genet Genomics 245:255-259. Dijkstra, A. J. & Keck, W. (1996). Identification of new members of the lytic transglycosylase family in Haemophilus influenzae and Escherichia coli. Microb Drug Resist 2:141-145. Dougan, D. A., Reid, B. G., Horwich, A. L. & Bukau, B. (2002). ClpS, a substrate modulator of the ClpAP machine. Mol Cell 9:673-683. Drobnis, E. Z., Crowe, L. M., Berger, T., Anchordoguy, T. J., Overstreet, J. W. & Crowe, J. H. (1993). Cold shock damage is due to lipid phase transitions in cell membranes: A demonstration using sperm as a model. J Exp Zool 265:432-437. Dubouzet, J. G., Sakuma, Y., Ito, Y., Kasuga, M., Dubouzet, E. G., Miura, S., Seki, M., Shinozaki, K. & Yamaguchi-Shinozaki, K. (2003). OsDREB genes in rice, Oryza sativa L., encode transcription activators that function in drought-, high-salt-and cold-responsive gene expression. Plant J 33:751-763. Dudoit, S., Yang, Y. H., Callow, M. J. & Speed, T. P. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat Sin 12:111-139. Eguchi, M. (1999). The nonculturable state of marine bacteria. In Microbial biosystems: new frontiers, pp. 717–722. Edited by C. R. Bell, M. Brylinsky & P. Johnson-Green. Halifax, Canada: Atlantic Canada Society for Microbial Ecology. Eguchi, M., Nishikawa, T., MacDonald, K., Cavicchioli, R., Gottschal, J. & Kjelleberg, S. (1996). Responses to stress and nutrient availability by the marine ultramicrobacterium Sphingomonas sp. strain RB2256. Appl Environ Microbiol 62:1287-1294. Eguchi, M., Ostrowski, M., Fegatella, F., Bowman, J., Nichols, D., Nishino, T. & Cavicchioli, R. (2001). 
Sphingomonas alaskensis, strain AF01: An abundant oligotrophic ultramicrobacterium from the North Pacific. Appl Environ Microbiol 67:4945-4954. Einarson, M. & Orlinick, I. (2002). Identification of protein-protein interactions with glutathione-S-transferase fusion proteins. In Protein-Protein Interactions: A Molecular Cloning Manual. Edited by E. Golemis & P. D. Adams. Cold Spring Harbor, NY, USA: Cold Spring Harbor Laboratory Press. Ejsing, C. S., Moehring, T., Bahr, U., Duchoslav, E., Karas, M., Simons, K. & Shevchenko, A. (2006). Collision-induced dissociation pathways of yeast sphingolipids and their molecular profiling in total lipid extracts: a study by quadrupole TOF and linear ion trap-orbitrap mass spectrometry. J Mass Spectrom 41:372-389. Emoto, N. & Yanagisawa, M. (1995). Endothelin-converting enzyme-2 is a membrane-bound, phosphoramidon-sensitive metalloprotease with acidic pH optimum. J Biol Chem 270:15262-15268.

Eng, J. K., McCormack, A. L. & Yates III, J. R. (1994). An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 5:976–989. Engelsberger, W. R., Erban, A., Kopka, J. & Schulze, W. X. (2006). Metabolic labeling of plant cell cultures with K15NO3 as a tool for quantitative analysis of proteins and metabolites. Plant Methods 2:14. Engen, J. R., Bradbury, E. M. & Chen, X. (2002). Using stable-isotope-labeled proteins for hydrogen exchange studies in complex mixtures. Anal Chem 74:1680-1686. Everley, P. A., Krijgsveld, J., Zetter, B. R. & Gygi, S. P. (2004). Quantitative cancer proteomics: Stable Isotope Labeling with Amino Acids in Cell Culture (SILAC) as a tool for prostate cancer research. Mol Cell Proteomics 3:729-735. Fang, R., Elias, D., Monroe, M., Shen, Y., Mcintosh, M., Wang, P., Goddard, C., Callister, S., Moore, R. & Gorby, Y. (2006). Differential label-free quantitative proteomic analysis of Shewanella oneidensis cultured under aerobic and suboxic conditions by accurate mass and time tag approach. Mol Cell Proteomics 5:714-725. Farewell, A. & Neidhardt, F. C. (1998). Effect of temperature on in vivo protein synthetic capacity in Escherichia coli. J Bacteriol 180:4704-4710. Fegatella, F. & Cavicchioli, R. (2000). Physiological responses to starvation in the marine oligotrophic ultramicrobacterium Sphingomonas sp. strain RB2256. Appl Environ Microbiol 66:2037-2044. Fegatella, F., Lim, J., Kjelleberg, S. & Cavicchioli, R. (1998). Implication of rRNA operon copy number and ribosome content in the marine oligotrophic ultramicrobacterium Sphingomonas sp. strain RB2256. Appl Environ Microbiol 64:4433-4438. Fegatella, F., Ostrowski, M. & Cavicchioli, R. (1999). An assessment of protein profiles from the marine oligotrophic ultramicrobacterium, Sphingomonas sp. strain RB2256. Electrophoresis 20:2094-2098. Feller, G. & Gerday, C. (2003). 
Psychrophilic enzymes: hot topics in cold adaptation. Nat Rev Microbiol 1:200-208. Fenselau, C. (2007). A review of quantitative methods for proteomic studies. J Chromatogr B 855:14-20. Ferguson, A. D., Hofmann, E., Coulton, J. W., Diederichs, K. & Welte, W. (1998). Siderophore- mediated iron transport: crystal structure of FhuA with bound lipopolysaccharide. Science 282:2215- 2220. Fernandez, E. A., Girotti, M. R., del Olmo, J. A. L., Llera, A. S., Podhajcer, O. L., Cantet, R. J. C. & Balzarini, M. (2008). Improving 2D-DIGE protein expression analysis by two-stage linear mixed models: assessing experimental effects in a melanoma cell study. Bioinformatics 24:2706-2712. Ferrer, M., Chernikova, T. N., Yakimov, M. M., Golyshin, P. N. & Timmis, K. N. (2003). Chaperonins govern growth of Escherichia coli at low temperatures. Nat Biotechnol 21:1266-1267. Ferrer, M., Lunsdorf, H., Chernikova, T. N., Yakimov, M., Timmis, K. N. & Golyshin, P. N. (2004). Functional consequences of single:double ring transitions in chaperonins: life in the cold. Mol Microbiol 53:167-182. Fey, S. J., Larsen, P. M., Görg, A., München, F., Weihenstephan, G. & Odense, D. (2000). Towards higher resolution: Two-dimensional electrophoresis of Saccharomyces cerevisiae proteins using overlapping narrow immobilized pH gradients. Electrophoresis 21:2610-2616. Fischer, H. M., Babst, M., Kaspar, T., Acuña, G., Arigoni, F. & Hennecke, H. (1993). One member of a gro-ESL-like chaperonin multigene family in Bradyrhizobium japonicum is co-regulated with symbiotic nitrogen fixation genes. EMBO J 12:2901-2912. Fodor, I., Nelson, D., Alegria-Hartman, M., Robbins, K., Langlois, R., Turteltaub, K., Corzett, T. & McCutchen-Maloney, S. (2005). Statistical challenges in the analysis of two-dimensional difference gel electrophoresis experiments using DeCyder™. Bioinformatics 21:3733-3740. Fourquet, S., Huang, M. E., D'Autreaux, B. & Toledano, M. B. (2008). 
The dual functions of thiol- based peroxidases in H2O2 scavenging and signaling. Antioxid Redox Signal 10:1565-1576. Fraser, C. M., Casjens, S., Huang, W. M., Sutton, G. G., Clayton, R., Lathigra, R., White, O., Ketchum, K. A., Dodson, R. & Hickey, E. K. (1997). Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390:580-586. Fu, J. C., Ding, L. I. & Clarke, S. (1991). Purification, gene cloning, and sequence analysis of an L- isoaspartyl protein carboxyl methyltransferase from Escherichia coli. J Biol Chem 266:14562-14572. Gao, H., Yang, Z. K., Wu, L., Thompson, D. K. & Zhou, J. (2006). Global transcriptome analysis of the cold shock response of Shewanella oneidensis MR-1 and mutational analysis of its classical cold shock proteins. J Bacteriol 188:4560-4569. Garcia-Fernandez, J. M., de Marsac, N. T. & Diez, J. (2004). Streamlined regulation and gene loss as adaptive mechanisms in Prochlorococcus for optimized nitrogen utilization in oligotrophic environments. Microbiol Mol Biol Rev 68:630-638.

Gatlin, C. L., Kleemann, G. R., Hays, L. G., Link, A. J. & Yates III, J. R. (1998). Protein identification at the low femtomole level from silver-stained gels using a new fritless electrospray interface for liquid chromatography-microspray and nanospray mass spectrometry. Anal Biochem 263:93- 101. Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y. & Gentry, J. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5. George, R., Kelly, S. M., Price, N. C., Erbse, A., Fisher, M. & Lund, P. A. (2004). Three GroEL homologues from Rhizobium leguminosarum have distinct in vitro properties. Biochem Biophys Res Commun 324:822-828. Georlette, D., Blaise, V., Collins, T., D'Amico, S., Gratia, E., Hoyoux, A., Marx, J.-C., Sonan, G., Feller, G. & Gerday, C. (2004). Some like it cold: Biocatalysis at low temperatures. FEMS Microbiol Rev 28:25-42. Gerber, S., Rush, J., Stemman, O., Kirschner, M. & Gygi, S. (2003). Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc Natl Acad Sci USA 100:6940-6945. Gerday, C., Aittaleb, M., Bentahir, M., Chessa, J. P., Claverie, P., Collins, T., D'Amico, S., Dumont, J., Garsoux, G. & Georlette, D. (2000). Cold-adapted enzymes: from fundamentals to biotechnology. Trends Biotechnol 18:103-107. Ghai, S. K., Hisamatsu, M., Amemura, A. & Harada, T. (1981). Production and chemical composition of extracellular polysaccharides of Rhizobium. J Gen Microbiol 122:33-40. Giangrossi, M., Giuliodori, A. M., Gualerzi, C. O. & Pon, C. L. (2002). Selective expression of the beta-subunit of nucleoid-associated protein HU during cold shock in Escherichia coli. Mol Microbiol 44:205-216. Giaquinto, L., Curmi, P. M. G., Siddiqui, K. S., Poljak, A., DeLong, E., DasSarma, S. & Cavicchioli, R. (2007). Structure and function of cold shock proteins in Archaea. J Bacteriol 189:5738- 5748. 
Gibson, F., Anderson, L., Babnigg, G., Baker, M., Berth, M., Binz, P. A., Borthwick, A., Cash, P., Day, B. W. & Friedman, D. B. (2008). Guidelines for reporting the use of gel electrophoresis in proteomics. Nat Biotechnol 26:863. Gille, C., Goede, A., Schlöetelburg, C., Preißner, R., Kloetzel, P. M., Göbel, U. B. & Frömmel, C. (2003). A comprehensive view on proteasomal sequences: Implications for the evolution of the proteasome. J Mol Biol 326:1437-1448. Giovannoni, S. J., Tripp, H. J., Givan, S., Podar, M., Vergin, K. L., Baptista, D., Bibbs, L., Eads, J., Richardson, T. H. & Noordewier, M. (2005). Genome streamlining in a cosmopolitan oceanic bacterium. Science 309:1242-1245. Giuliodori, A. M., Brandi, A., Giangrossi, M., Gualerzi, C. O. & Pon, C. L. (2007). Cold-stress- induced de novo expression of infC and role of IF3 in cold-shock translational bias. RNA 13:1355-1365. Giuliodori, A. M., Brandi, A., Gualerzi, C. O. & Pon, C. L. (2004). Preferential translation of cold- shock mRNAs during cold adaptation. RNA 10:265-276. Godoy, F., Vancanneyt, M., Martinez, M., Steinbuchel, A., Swings, J. & Rehm, B. H. A. (2003). Sphingopyxis chilensis sp. nov., a chlorophenol-degrading bacterium that accumulates polyhydroxyalkanoate, and transfer of Sphingomonas alaskensis to Sphingopyxis alaskensis comb. nov. Int J Syst Evol Microbiol 53:473-477. Goldenberg, D., Azar, I. & Oppenheim, A. B. (1996). Differential mRNA stability of the cspA gene in the cold-shock response of Escherichia coli. Mol Microbiol 19:241-248. Goldfine, H. (1982). Lipids of prokaryotes - Structure and distribution. In Current topics in membranes and transport: Membrane lipids of prokaryotes. Edited by F. Bronner, A. Kleinzeller, S. Razin & A. Rotten. New York: Academic Press. Goloubinoff, P., Mogk, A., Zvi, A. P. B., Tomoyasu, T. & Bukau, B. (1999). Sequential mechanism of solubilization and refolding of stable protein aggregates by a bichaperone network. Proc Natl Acad Sci USA 96:13732. Gomez, J. G. 
C., Rodrigues, M. F. A., Alli, R. C. P., Torres, B. B., Netto, C. L. B., Oliveira, M. S. & da Silva, L. F. (1996). Evaluation of soil gram-negative bacteria yielding polyhydroxyalkanoic acids from carbohydrates and propionic acid. Appl Microbiol Biotechnol 45:785-791. Goodchild, A., Raftery, M., Saunders, N. F. W., Guilhaus, M. & Cavicchioli, R. (2004). Biology of the cold adapted archaeon, Methanococcoides burtonii determined by proteomics using liquid chromatography-tandem mass spectrometry. J Proteome Res 3:1164-1176. Goodchild, A., Raftery, M., Saunders, N. F. W., Guilhaus, M. & Cavicchioli, R. (2005). Cold adaptation of the antarctic archaeon, Methanococcoides burtonii assessed by proteomics using ICAT. J Proteome

Res 4:473-480. Goodlett, D. R. & Yi, E. C. (2002). Proteomics without polyacrylamide: qualitative and quantitative uses of tandem mass spectrometry in proteome analysis. Funct Integr Genomics 2:138-153. Gorg, A., Obermaier, C., Boguth, G., Harder, A., Scheibe, B., Wildgruber, R. & Weiss, W. (2000). The current state of two-dimensional electrophoresis with immobilized pH gradients. Electrophoresis 21:1037-1053. Gourmelon, M., Cillard, J. & Pommepuy, M. (1994). Visible light damage to Escherichia coli in seawater: oxidative stress hypothesis. J Appl Microbiol 77:105-112. Grabelnych, O. I., Sumina, O. N., Funderat, S. P., Pobezhimova, T. P., Voinikov, V. K. & Kolesnichenko, A. V. (2004). The distribution of electron transport between the main cytochrome and alternative pathways in plant mitochondria during short-term cold stress and . J Therm Biol 29:165-175. Graham, R. L. J., Pollock, C. E., Ternan, N. G. & McMullan, G. (2006). Top-down proteomic analysis of the soluble sub-proteome of the obligate thermophile, Geobacillus thermoleovorans T80: insights into its cellular processes. J Prot Res 5:822-828. Grau, R., Gardiol, D., Glikin, G. C. & Mendoza, D. (1994). DNA supercoiling and thermal regulation of unsaturated fatty acid synthesis in Bacillus subtilis. Mol Microbiol 11:933-941. Graumann, J., Hubner, N. C., Kim, J. B., Ko, K., Moser, M., Kumar, C., Cox, J., Scholer, H. & Mann, M. (2008). Stable isotope labeling by amino acids in cell culture (SILAC) and proteome quantitation of mouse embryonic stem cells to a depth of 5,111 proteins. Mol Cell Proteomics 7:672-683. Graumann, P. L. & Marahiel, M. A. (1998). A superfamily of proteins that contain the cold-shock domain. Trends Biochem Sci 23:286-290. Graumann, P. L. & Marahiel, M. A. (1999). Cold shock response in Bacillus subtilis. J Mol Microbiol Biotechnol 1:203-209. Griebel, R., Smith, Z. & Merrick, J. M. (1968). Metabolism of poly-beta-hydroxybutyrate. I. 
Purification, composition, and properties of native poly-beta-hydroxybutyrate granules from Bacillus megaterium. Biochemistry 7:3676-3681. Griffin, T. J., Han, D. K., Gygi, S. P., Rist, B., Lee, H., Aebersold, R. & Parker, K. C. (2001). Toward a high-throughput approach to quantitative proteomic analysis: expression-dependent protein identification by mass spectrometry. J Am Soc Mass Spectrom 12:1238-1246. Griffith, S. C., Sawaya, M. R., Boutz, D. R., Thapar, N., Katz, J. E., Clarke, S. & Yeates, T. O. (2001). Crystal structure of a protein repair methyltransferase from Pyrococcus furiosus with its l-isoaspartyl peptide substrate. J Mol Biol 313:1103-1116. Grisolia, S. & Cohen, P. P. (1953). Catalytic role of glutamate derivatives in citrulline biosynthesis. J Biol Chem 204:753-758. Gualerzi, C. O., Maria Giuliodori, A. & Pon, C. L. (2003). Transcriptional and post-transcriptional control of cold-shock genes. J Mol Biol 331:527-539. Gusnanto, A., Calza, S. & Pawitan, Y. (2007). Identification of differentially expressed genes and false discovery rate in microarray studies. Curr Opin Lipidol 18:187-193. Guthrie, C., Nashimoto, H. & Nomura, M. (1969). Structure and function of E. coli ribosomes, VIII. Cold-sensitive mutants defective in ribosome assembly. Proc Natl Acad Sci USA 63:384-391. Gygi, S. P., Corthals, G. L., Zhang, Y., Rochon, Y. & Aebersold, R. (2000). Evaluation of two-dimensional gel electrophoresis based proteome analysis technology. Proc Natl Acad Sci USA 97:9390-9395. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H. & Aebersold, R. (1999). Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol 17:994-999. Gygi, S. P., Rist, B., Griffin, T. J., Eng, J. K. & Aebersold, R. (2002). Proteome analysis of low-abundance proteins using multidimensional chromatography and isotope-coded affinity tags. J Prot Res 1:47-54. Gygi, S. P., Rochon, Y., Franza, B. R. & Aebersold, R. (1999). 
Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 19:1720-1730. Habermann, B., Oegema, J., Sunyaev, S. & Shevchenko, A. (2004). The power and the limitations of cross-species protein identification by mass spectrometry-driven sequence similarity searches. Mol Cell Proteomics 3:238-249. Halligan, B. D., Slyper, R. Y., Twigger, S. N., Hicks, W., Olivier, M. & Greene, A. S. (2005). ZoomQuant: An application for the quantitation of stable isotope labeled peptides. J Am Soc Mass Spectrom 16:302-306. Han, D. K., Eng, J., Zhou, H. & Aebersold, R. (2001). Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat Biotechnol 19:946-

951. Haqqani, A., Nesic, M., Preston, E., Baumann, E., Kelly, J. & Stanimirovic, D. (2005). Characterization of vascular protein expression patterns in cerebral ischemia/reperfusion using laser capture microdissection and ICAT-nanoLC-MS/MS. FASEB J 19:1809-1821. Harding, N. E., Patel, Y. N. & Coleman, R. J. (2004). Organization of genes required for gellan polysaccharide biosynthesis in Sphingomonas elodea ATCC 31461. J Ind Microbiol 31:70-82. Harlow, E. & Lane, D. (1999). Using antibodies: A laboratory manual. Cold Spring Harbor, NY, USA: Cold Spring Harbor Laboratory Press. Harper, A. D. & Bar-Peled, M. (2002). Biosynthesis of UDP-xylose. Cloning and characterization of a novel Arabidopsis gene family, UXS, encoding soluble and putative membrane-bound UDP-glucuronic acid decarboxylase isoforms. Plant Physiol 130:2188-2198. Hashimoto, W., He, J., Wada, Y., Nankai, H., Mikami, B. & Murata, K. (2005). Proteomics-based identification of outer-membrane proteins responsible for import of macromolecules in Sphingomonas sp. A1: Alginate-binding flagellin on the cell surface. Biochemistry 44:13783-13794. Hayashi, T., Koyama, T. & Matsuda, K. (1988). Formation of UDP-xylose and xyloglucan in soybean golgi membranes. Plant Physiol 87:341-345. Hebraud, M., Dubois, E., Potier, P. & Labadie, J. (1994). Effect of growth temperatures on the protein levels in a psychrotrophic bacterium, Pseudomonas fragi. J Bacteriol 176:4017-4024. Hebraud, M. & Potier, P. (1999). Cold shock response and low temperature adaptation in psychrotrophic bacteria. J Mol Microbiol Biotechnol 1:211-219. Hendrick, J. P. & Hartl, F. (1993). Molecular chaperone functions of heat-shock proteins. Ann Rev Biochem 62:349-384. Hendrickson, E., Xia, Q., Wang, T., Leigh, J. & Hackett, M. (2006). Comparison of spectral counting and metabolic stable isotope labeling for use with quantitative microbial proteomics. Analyst 131:1335-1341. Hengst, L. & Reed, S. I. (1996). 
Translational control of p27Kip1 accumulation during the cell cycle. Science 271:1861-1864. Hill, E. G., Schwacke, J. H., Comte-Walters, S., Slate, E. H., Oberg, A. L., Eckel-Passow, J. E., Therneau, T. M. & Schey, K. L. (2008). A statistical model for iTRAQ data analysis. J Proteome Res 7:3091-3101. Hillenkamp, F. & Peter-Katalinic, J. (2007). A practical guide to MALDI-MS. Instrumentation, methods and applications. New York: Wiley. Hinchman, C. & Ballatori, N. (1990). Glutathione-degrading capacities of liver and kidney in different species. Biochem Pharmacol 40:1131-1135. Hoffmann, B. & Ingraham, J. L. (1970). A cold sensitive mutant of Salmonella typhimurium which requires tryptophan for growth at 20ºC. Biochim Biophys Acta 201:970-972. Holmgren, A. (1985). Thioredoxin. Annu Rev Biochem 54:237-271. Horton, A. J., Hak, K. M., Steffan, R. J., Foster, J. W. & Bej, A. K. (2000). Adaptive response to cold temperatures and characterization of cspA in Salmonella typhimurium LT2. Antonie Van Leeuwenhoek 77:13-20. Horwich, A. L., Farr, G. W. & Fenton, W. A. (2006). GroEL-GroES-mediated protein folding. Chem Rev 106:1917-1930. Hoskins, J. R., Sharma, S., Sathyanarayana, B. K., Wickner, S. & Arthur, H. (2001). Clp ATPases and their role in protein unfolding and degradation. In Advances in Protein Chemistry, pp. 413-420: Academic Press. Hu, J., Coombes, K. R., Morris, J. S. & Baggerly, K. A. (2005). The importance of experimental design in proteomic mass spectrometry experiments: Some cautionary tales. Brief Funct Genomic Proteomic 3:322-331. Hunger, K., Beckering, C. L., Wiegeshoff, F., Graumann, P. L. & Marahiel, M. A. (2006). Cold- induced putative DEAD box RNA helicases CshA and CshB are essential for cold adaptation and interact with cold shock protein B in Bacillus subtilis. J Bacteriol 188:240-248. Hunt, S. M. N., Thomas, M. R., Sebastian, L. T., Pedersen, S. K., Harcourt, R. L., Sloane, A. J. & Wilkins, M. R. (2005). 
Optimal replication and the importance of experimental design for gel-based quantitative proteomics. J Proteome Res 4:809-819. Ihaka, R. & Gentleman, R. (1996). R: A language for data analysis and graphics. J Comput Graph Stat 5:299-314. Irie, Y., Preston, A. & Yuk, M. H. (2006). Expression of the primary carbohydrate component of the Bordetella bronchiseptica biofilm matrix is dependent on growth phase but independent of Bvg regulation. J Bacteriol 188:6680-6687.

Ishihama, Y., Oda, Y., Tabata, T., Sato, T., Nagasu, T., Rappsilber, J. & Mann, M. (2005). Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics 4:1265-1272. Issaq, H. (2001). The role of separation science in proteomics research. Electrophoresis 22:3629. Jackson, F. A. & Dawes, E. A. (1976). Regulation of the tricarboxylic acid cycle and poly-beta-hydroxybutyrate metabolism in Azotobacter beijerinckii grown under nitrogen or oxygen limitation. J Gen Microbiol 97:303-312. Jaffe, J. D., Mani, D. R., Leptos, K. C., Church, G. M., Gillette, M. A. & Carr, S. A. (2006). PEPPeR, a Platform for Experimental Proteomic Pattern Recognition. Mol Cell Proteomics 5:1927-1941. Jaglo, K. R., Kleff, S., Amundsen, K. L., Zhang, X., Haake, V., Zhang, J. Z., Deits, T. & Thomashow, M. F. (2001). Components of the Arabidopsis C-repeat/dehydration-responsive element binding factor cold-response pathway are conserved in Brassica napus and other plant species. Plant Physiol 127:910-917. Jang, H. H., Lee, K. O., Chi, Y. H., Jung, B. G., Park, S. K., Park, J. H., Lee, J. R., Lee, S. S., Moon, J. C. & Yun, J. W. (2004). Two enzymes in one: Two yeast peroxiredoxins display oxidative stress-dependent switching from a peroxidase to a molecular chaperone function. Cell 117:625-635. Jara, M., Vivancos, A. P., Calvo, I. A., Moldon, A., Sanso, M. & Hidalgo, E. (2007). The peroxiredoxin Tpx1 is essential as a H2O2 scavenger during aerobic growth in fission yeast. Mol Biol Cell 18:2288-2295. Jendrossek, D. & Handrick, R. (2002). Microbial degradation of polyhydroxyalkanoates. Annu Rev Microbiol 56:403-432. Jendrossek, D., Knoke, I., Habibian, R., Steinbüchel, A. & Schlegel, H. (1993). Degradation of poly(3-hydroxybutyrate), PHB, by bacteria and purification of a novel PHB depolymerase from Comamonas sp. J Polym Environ 1:53-63. Johnson, B. A., Ngo, S. Q. & Aswad, D. W. 
(1991). Widespread phylogenetic distribution of a protein methyltransferase that modifies L-isoaspartyl residues. Biochem Int 24:841-847. Johnson, G. V., Evans, H. J. & Ching, T. (1966). Enzymes of the glyoxylate cycle in Rhizobia and nodules of legumes 1. Plant Physiol 41:1330-1336. Johnson, K. L. & Muddiman, D. C. (2004). A method for calculating 16O/18O peptide ion ratios for the relative quantification of proteomes. J Am Soc Mass Spectrom 15:437-445. Jones, P. G., Krah, R., Tafuri, S. R. & Wolffe, A. P. (1992). DNA gyrase, CS7.4, and the cold shock response in Escherichia coli. J Bacteriol 174:5798-5802. Jones, P. G., Mitta, M., Kim, Y., Jiang, W. & Inouye, M. (1996). Cold shock induces a major ribosomal-associated protein that unwinds double-stranded RNA in Escherichia coli. Proc Natl Acad Sci USA 93:76. Jones, P. G., VanBogelen, R. A. & Neidhardt, F. C. (1987). Induction of proteins in response to low temperature in Escherichia coli. J Bacteriol 169:2093-2095. Jorgens, K. & Matz, C. (2002). Predation as a shaping force for the phenotypic and genotypic composition of planktonic bacteria. Antonie Van Leeuwenhoek 81:413-434. Joux, F., Jeffrey, W. H., Lebaron, P. & Mitchell, D. L. (1999). Marine bacterial isolates display diverse responses to UV-B radiation. Appl Environ Microbiol 65:3820-3827. Julka, S. & Regnier, F. (2004). Quantification in proteomics through stable isotope coding: A review. J Prot Res 3:350-363. Julseth, C. R. & Inniss, W. E. (1990). Induction of protein synthesis in response to cold shock in the psychrotrophic yeast Trichosporon pullulans. Can J Microbiol 36:519-524. Kaiser, J. (2002). Public-private group maps out initiatives. Science 296:827. Kandror, O., DeLeon, A. & Goldberg, A. L. (2002). Trehalose synthesis is induced upon exposure of Escherichia coli to cold and is essential for viability at low temperatures. Proc Natl Acad Sci USA 99:9727-9732. 
Kaneko, T., Nakamura, Y., Sato, S., Minamisawa, K., Uchiumi, T., Sasamoto, S., Watanabe, A., Idesawa, K., Iriguchi, M. & Kawashima, K. (2002). Complete genomic sequence of nitrogen-fixing symbiotic bacterium Bradyrhizobium japonicum DNA Res 9:225-256. Karp, N., Kreil, D. & Lilley, K. (2004). Determining a significant change in protein expression with DeCyder™ during a pair-wise comparison using two-dimensional difference gel electrophoresis. Proteomics 4:1421-1432. Karp, N. A., Griffin, J. L. & Lilley, K. S. (2005). Application of partial least squares discriminant analysis to two-dimensional difference gel studies in expression proteomics. Proteomics 5:81-90. Karp, N. A., McCormick, P. S., Russell, M. R. & Lilley, K. S. (2007). Experimental and statistical considerations to avoid false conclusions in proteomic studies using differential in-gel electrophoresis.

Mol Cell Proteomics 6:1354-1364. Karp, N. A., Spencer, M., Lindsay, H., O'Dell, K. & Lilley, K. S. (2005). Impact of replicate types on proteomic expression analysis. J Proteome Res 4:1867-1871. Kasuya, K., Doi, Y. & Yao, T. (1994). Enzymatic degradation of poly (R)-3-hydroxybutyrate by Comamonas testosteroni ATSU of soil bacterium. Polym Degrad Stab 45:379-386. Katano, Y., Nimura-Matsune, K. & Yoshikawa, H. (2006). Involvement of DnaK3, one of the three DnaK proteins of Cyanobacterium Synechococcus sp. PCC7942, in translational process on the surface of the thylakoid membrane. Biosci Biotechnol Biochem 70:1592-1598. Kawahara, K., Kuraishi, H. & Zähringer, U. (1999). Chemical structure and function of glycosphingolipids of Sphingomonas spp and their distribution among members of the α-4 subclass of Proteobacteria. J Ind Microbiol 23:408-413. Kawahara, K., Seydel, U., Matsuura, M., Danbara, H., Rietschel, E. T. & Zarhringer, U. (1991). Chemical structure of glycosphingolipids isolated from Sphingomonas paucimobilis. FEBS Lett 292:107- 110. Kawamoto, J., Kurihara, T., Kitagawa, M., Kato, I. & Esaki, N. (2007). Proteomic studies of an Antarctic cold-adapted bacterium, Shewanella livingstonensis Ac10, for global identification of cold- inducible proteins. Extremophiles 11:819-826. Kawamura, D., Yamashita, I., Nimi, O. & Toh-e, A. (1994). Cloning and nucleotide sequence of a gene conferring ability to grow at a low temperature on Saccharomyces cerevisiae tryptophan auxotrophs. J Ferment Bioeng 77:1-9. Keller, A., Eng, J., Zhang, N., Li, X. & Aebersold, R. (2005). A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 2:1-8. Kerr, M. K. & Churchill, G. A. (2001). Experimental design for gene expression microarrays. Biostatistics 2:183. Khidekel, N., Ficarro, S. B., Peters, E. C. & Hsieh-Wilson, L. C. (2004). Exploring the O-GlcNAc proteome: Direct identification of O-GlcNAc-modified proteins from the brain. 
Proc Natl Acad Sci USA 101:13132-13137. Kindrachuk, J., Parent, J., Davies, G. F., Dinsmore, M., Attah-Poku, S. & Napper, S. (2003). Overexpression of l-isoaspartate O-methyltransferase in Escherichia coli increases heat shock survival by a mechanism independent of methyltransferase activity. J Biol Chem 278:50880-50886. Kirkpatrick, D., Gerber, S. & Gygi, S. (2005). The absolute quantification strategy: A general procedure for the quantification of proteins and post-translational modifications. Methods 35:265-273. Kiss, E., Huguet, T., Poinsot, V. & Batut, J. (2004). The typA gene is required for stress adaptation as well as for symbiosis of Sinorhizobium meliloti 1021 with certain Medicago truncatula lines. Mol Plant Microbe Interact 17:235-244. Ko, R., Smith, L. T. & Smith, G. M. (1994). Glycine betaine confers enhanced osmotolerance and cryotolerance on Listeria monocytogenes. J Bacteriol 176:426-431. Kornberg, H. L. (1966). The role and control of the glyoxylate cycle in Escherichia coli. Biochem J 99:1-11. Kovacs, E., van der Vies, S. M., Glatz, A., Torok, Z., Varvasovszki, V., Horvath, I. & Vigh, L. (2001). The chaperonins of Synechocystis PCC 6803 differ in heat inducibility and chaperone activity. Biochem Biophys Res Commun 289:908-915. Krambeck, C., Krambeck, H. J. & Overbeck, J. (1981). Microcomputer-assisted biomass determination of plankton bacteria on scanning electron micrographs. Appl Environ Microbiol 42:142- 149. Kreil, D. P., Karp, N. A. & Lilley, K. S. (2004). DNA microarray normalization methods can remove bias from differential protein expression analysis of 2D difference gel electrophoresis results. Bioinformatics 20:2026-2034. Krembs, C., Eicken, H., Junge, K. & Deming, J. W. (2002). High concentrations of exopolymeric substances in Arctic winter sea ice: implications for the polar ocean carbon cycle and cryoprotection of diatoms. Deep Sea Res Part I Oceanogr Res Pap 49:2163-2181. Krijgsveld, J. & Heck, A. J. R. (2004). 
Quantitative proteomics by metabolic labeling with stable isotopes. Drug Discov Today: Targets 3:S11-S15. Krijgsveld, J., Ketting, R. F., Mahmoudi, T., Johansen, J., Artal-Sanz, M., Verrijzer, C. P., Plasterk, R. H. A. & Heck, A. J. R. (2003). Metabolic labeling of C. elegans and D. melanogaster for quantitative proteomics. Nat Biotechnol 21:927-931. Krishnan, K. & Flower, A. M. (2008). Suppression of ΔbipA phenotypes in Escherichia coli by abolishment of pseudouridylation at specific sites on the 23S rRNA. J Bacteriol:JB.00835-00808. Kuhlmann, A. U. & Bremer, E. (2002). Osmotically regulated synthesis of the compatible solute

ectoine in Bacillus pasteurii and related Bacillus spp. Appl Environ Microbiol 68:772-783. Kultima, K., Scholz, B., Alm, H., Sköld, K., Svensson, M., Crossman, A. R., Bezard, E., Andrén, P. E. & Lönnstedt, I. (2006). Normalization and expression changes in predefined sets of proteins using 2D gel electrophoresis: A proteomic study of L-DOPA induced dyskinesia in an animal model of Parkinson's disease using DIGE. BMC Bioinformatics 7:475. Lam, T. C., Li, K. K., Lo, S. C. L., Guggenheim, J. A. & To, C. H. (2006). A chick retinal proteome database and differential retinal protein expressions during early ocular development. J Prot Res 5:771-784. Lange, R. & Hengge-Aronis, R. (1994). The cellular concentration of the sigma S subunit of RNA polymerase in Escherichia coli is controlled at the levels of transcription, translation, and protein stability. Genes Dev 8:1600-1612. Lasonder, E., Ishihama, Y., Andersen, J. S., Vermunt, A. M. W., Pain, A., Sauerwein, R. W., Eling, W. M. C., Hall, N., Waters, A. P., Stunnenberg, H. G. & Mann, M. (2002). Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature 419:537-542. Lauro, F. M. & Bartlett, D. H. (2007). Prokaryotic lifestyles in deep sea habitats. Extremophiles. Lauro, F. M., McDougald, D., Thomas, T., Williams, T. J., Egan, S., Rice, S., DeMaere, M. Z., Ting, L., Ertan, H., Johnson, J., Ferriera, S., Lapidus, A., Anderson, I., Kyrpides, N., Munk, A. C., Detter, C., Han, C. S., Brown, M. V., Robb, F. T., Kjelleberg, S. & Cavicchioli, R. (2009). The genomic basis of trophic strategy in marine bacteria. Proc Natl Acad Sci USA 106:15527-15533. Lauro, F. M., Tran, K., Vezzi, A., Vitulo, N., Valle, G. & Bartlett, D. H. (2008). Large-scale transposon mutagenesis of Photobacterium profundum SS9 reveals new genetic loci important for growth at low temperature and high pressure. J Bacteriol 190:1699-1709. Laursen, B. 
S., Siwanowicz, I., Larigauderie, G., Hedegaard, J., Ito, K., Nakamura, Y., Kenney, J. M., Mortensen, K. K. & Sperling-Petersen, H. U. (2003). Characterization of mutations in the GTP- binding domain of IF2 resulting in cold-sensitive growth of Escherichia coli. J Mol Biol 326:543-551. Lee, D.-G., Ahsan, N., Lee, S.-H., Kang, K. Y., Bahk, J. D., Lee, I.-J. & Lee, B.-H. (2007). A proteomic approach in analyzing heat-responsive proteins in rice leaves. Proteomics 7:3369-3383. Lee, K. H. (2001). Proteomics: a technology-driven and technology-limited discovery science. Trends Biotechnol 19:217-222. Lee, W. S., Choi, K. S., Riddell, J., Ip, C., Ghosh, D., Park, J. H. & Park, Y. M. (2007). Human peroxiredoxin 1 and 2 are not duplicate proteins: The unique presence of Cys83 in Prx1 underscores the structural and functional differences between Prx1 and Prx2. J Biol Chem 282:22011-22022. Lelivelt, M. J. & Kawula, T. H. (1995). Hsc66, an Hsp70 homolog in Escherichia coli, is induced by cold shock but not by heat shock. J Bacteriol 177:4900-4907. Lemieux, M. J., Huang, Y. & Wang, D. N. (2004). Glycerol-3-phosphate transporter of Escherichia coli: Structure, function and regulation. Res Microbiol 155:623-629. Leptos, K. C., Sarracino, D. A., Jaffe, J. D., Krastins, B. & Church, G. M. (2006). MapQuant: Open- source software for large-scale protein quantification. Proteomics 6:1770-1782. Levy, F., Bulet, P. & Ehret-Sabatier, L. (2004). Proteomic analysis of the systemic immune response of Drosophila. Mol Cell Proteomics 3:156-166. Li, L., Li, Q., Rohlin, L., Kim, U., Salmon, K., Rejtar, T., Gunsalus, R., Karger, B. & Ferry, J. (2007). Quantitative proteomic and microarray analysis of the archaeon Methanosarcina acetivorans grown with acetate versus methanol. J Proteome Res 6:759-771. Li, X., Yi, E. C., Kemp, C. J., Zhang, H. & Aebersold, R. (2005). 
A software suite for the generation and comparison of peptide arrays from sets of data collected by liquid chromatography-mass spectrometry. Mol Cell Proteomics 4:1328-1340. Li, X. J., Zhang, H., Ranish, J. A. & Aebersold, R. (2003). Automated statistical analysis of protein abundance ratios from data generated by stable-isotope dilution and tandem mass spectrometry. Anal Chem 75:6648-6657. Lilley, K., Razzaq, A. & Dupree, P. (2002). Two-dimensional gel electrophoresis: recent advances in sample preparation, detection and quantitation. Curr Opin Chem Biol 6:46-50. Lim, J., Thomas, T. & Cavicchioli, R. (2000). Low temperature regulated DEAD-box RNA helicase from the antarctic archaeon, Methanococcoides burtonii. J Mol Biol 297:553-567. Lin, W. T., Hung, W. N., Yian, Y. H., Wu, K. P., Han, C. L., Chen, Y. R., Chen, Y. J., Sung, T. Y. & Hsu, W. L. (2006). Multi-Q: A fully automated tool for multiplexed protein quantitation. J Proteome Res 5:2328-2338. Link, A. J., Eng, J. K., Schieltz, D. M., Carmack, E., Mize, G. J., Morris, D. R., Garvik, B. M. & Yates, J. R. I. (1999). Direct analysis of protein complexes using mass spectrometry detection. Nat Biotechnol 17:676-682.

Listgarten, J. & Emili, A. (2005). Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry. Mol Cell Proteomics 4:419-434. Liu, H., Sadygov, R. & Yates III, J. (2004). A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal Chem 76:4193-4201. Liu, S., Bayles, D. O., Mason, T. M. & Wilkinson, B. J. (2006). A cold-sensitive Listeria monocytogenes mutant has a transposon insertion in a gene encoding a putative membrane protein and shows altered (p)ppGpp levels. Appl Environ Microbiol 72:3955-3959. Liu, S., Graham, J. E., Bigelow, L., Morse, P. D., II & Wilkinson, B. J. (2002). Identification of Listeria monocytogenes genes expressed in response to growth at low temperature. Appl Environ Microbiol 68:1697-1705. Llamas, M. A., Sparrius, M., Kloet, R., Jimenez, C. R., Vandenbroucke-Grauls, C. & Bitter, W. (2006). The heterologous siderophores ferrioxamine B and ferrichrome activate signaling pathways in Pseudomonas aeruginosa. J Bacteriol 188:1882-1891. Lonnstedt, I. & Speed, T. P. (2002). Replicated microarray data. Stat Sin 12:31-46. Lopez-Garcia, P. & Forterre, P. (1997). DNA topology in hyperthermophilic archaea: reference states and their variation with growth phase, growth temperature, and temperature stresses. Mol Microbiol 23:1267-1279. Lopez-Garcia, P. & Forterre, P. (1999). Control of DNA topology during thermal stress in hyperthermophilic archaea: DNA topoisomerase levels, activities and induced thermotolerance during heat and cold shock in Sulfolobus. Mol Microbiol 33:766-777. Lu, P., Vogel, C., Wang, R., Yao, X. & Marcotte, E. M. (2006). Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol 25:117- 124. MacCoss, M. J. & Matthews, D. E. (2005). Quantitative MS for proteomics: teaching a new dog old tricks. Anal Chem 77:294A-302A. MacCoss, M. J., Wu, C. 
C., Liu, H., Sadygov, R. & Yates III, J. R. (2003). A correlation algorithm for the automated quantitative analysis of shotgun proteomics data. Anal Chem 75:6912-6921. MacCoss, M. J., Wu, C. C., Matthews, D. E. & Yates III, J. R. (2005). Measurement of the isotope enrichment of stable isotope-labeled proteins using high-resolution mass spectra of peptides. Anal Chem 77:7646-7653. Macek, B., Waanders, L. F., Olsen, J. V. & Mann, M. (2006). Top-down protein sequencing and MS3 on a hybrid linear quadrupole ion trap-orbitrap mass spectrometer. Mol Cell Proteomics 5:949-958. Maehara, A., Taguchi, S., Nishiyama, T., Yamane, T. & Doi, Y. (2002). A repressor protein, PhaR, regulates polyhydroxyalkanoate (PHA) synthesis via its direct interaction with PHA. J Bacteriol 184:3992-4002. Maehara, A., Ueda, S., Nakano, H. & Yamane, T. (1999). Analyses of a polyhydroxyalkanoic acid granule-associated 16-kilodalton protein and its putative regulator in the pha locus of Paracoccus denitrificans. J Bacteriol 181:2914-2921. Malin, G. & Lapidot, A. (1996). Induction of synthesis of tetrahydropyrimidine derivatives in Streptomyces strains and their effect on Escherichia coli in response to osmotic and heat stress. J Bacteriol 178:385-395. Mana-Capelli, S., Mandal, A. K. & Arguello, J. M. (2003). Archaeoglobus fulgidus CopB is a thermophilic Cu2+-ATPase. J Biol Chem 278:40534-40541. Mancuso Nichols, C. A., Guezennec, J. & Bowman, J. P. (2005). Bacterial exopolysaccharides from extreme marine environments with special consideration of the southern ocean, sea ice, and deep-sea hydrothermal vents: A review. Mar Biotechnol 7:253-271. Mancuso Nichols, C. A., Lardière, S. G., Bowman, J. P., Nichols, P. D., Ae Gibson, J. & Guézennec, J. (2005). Chemical characterization of exopolysaccharides from Antarctic marine bacteria. Microb Ecol 49:578-589. 
Maniak, M. & Nellen, W. (1988). A developmentally regulated membrane protein gene in Dictyostelium discoideum is also induced by heat shock and cold shock. Mol Cell Biol 8:153-159. Manly, K. F., Nettleton, D. & Hwang, J. T. G. (2004). Genomics, prior probability, and statistical tests of multiple hypotheses. Genome Res 14:997-1001. Mann, M. & Jensen, O. (2003). Proteomic analysis of post-translational modifications. Nat Biotechnol 21:255-261. Markowitz, V. M., Szeto, E., Palaniappan, K., Grechkin, Y., Chu, K., Chen, I. M. A., Dubchak, I.,

Anderson, I., Lykidis, A., Mavromatis, K., Ivanova, N. N. & Kyrpides, N. C. (2007). The integrated microbial genomes (IMG) system in 2007: Data content and analysis tool extensions. Nucl Acids Res 36:D528-D533. Marouga, R., David, S. & Hawkins, E. (2005). The development of the DIGE system: 2D fluorescence difference gel analysis technology. Anal Bioanal Chem 382:669-678. Marti-Arbona, R., Fresquet, V., Thoden, J. B., Davis, M. L., Holden, H. M. & Raushel, F. M. (2005). Mechanism of the reaction catalyzed by isoaspartyl dipeptidase from Escherichia coli. Biochemistry 44:7115-7124. Marx, J.-C., Collins, T., D'Amico, S., Feller, G. & Gerday, C. (2007). Cold-adapted enzymes from marine Antarctic microorganisms. Mar Biotechnol 9:293-304. Matallana-Surget, S., Joux, F., Lebaron, P. & Cavicchioli, R. (2007). Isolation and characterization of marine oligotrophic bacteria. J Soc Biol 201:41-50. Matallana-Surget, S., Joux, F., Raftery, M. J. & Cavicchioli, R. (2009). The response of the marine bacterium Sphingopyxis alaskensis to solar radiation assessed by quantitative proteomics. Environ Microbiol In Press:doi: 10.1111/j.1462-2920.2009.01992.x. Matallana-Surget, S., Meador, J. A., Joux, F. & Douki, T. (2008). Effect of the GC content of DNA on the distribution of UVB-induced bipyrimidine photoproducts. Photochemical & Photobiological Sciences 7:794-801. Matz, C. & Jürgens, K. (2003). Interaction of nutrient limitation and protozoan grazing determines the phenotypic structure of a bacterial community. Microb Ecol 45:384-398. May, D., Fitzgibbon, M., Liu, Y., Holzman, T., Eng, J., Kemp, C. J., Whiteaker, J., Paulovich, A. & McIntosh, M. (2007). A platform for accurate mass and time analyses of mass spectrometry data. J Prot Res 6:2685-2694. Mayer, M. P., Rudiger, S. & Bukau, B. (2000). Molecular basis for interactions of the DnaK chaperone with substrates. Biol Chem 381:877-885. McGann, L. E., Walterson, M. L. & Hogg, L. M. (1988). 
Light scattering and cell volumes in osmotically stressed and frozen-thawed cells. Cytometry 9:33-38. Medigue, C., Krin, E., Pascal, G., Barbe, V., Bernsel, A., Bertin, P. N., Cheung, F., Cruveiller, S., D'Amico, S., Duilio, A., Fang, G., Feller, G., Ho, C., Mangenot, S., Marino, G., Nilsson, J., Parrilli, E., Rocha, E. P. C., Rouy, Z., Sekowska, A., Tutino, M. L., Vallenet, D., von Heijne, G. & Danchin, A. (2005). Coping with cold: The genome of the versatile marine Antarctica bacterium Pseudoalteromonas haloplanktis TAC125. Genome Res 15:1325-1335. Meng, F., Wiener, M., Sachs, J., Burns, C., Verma, P., Paweletz, C., Mazur, M., Deyanova, E., Yates, N. & Hendrickson, R. (2007). Quantitative analysis of complex peptide mixtures using FTMS and differential mass spectrometry. J Am Soc Mass Spectrom 18:226-233. Methe, B. A., Nelson, K. E., Deming, J. W., Momen, B., Melamud, E., Zhang, X., Moult, J., Madupu, R., Nelson, W. C., Dodson, R. J., Brinkac, L. M., Daugherty, S. C., Durkin, A. S., DeBoy, R. T., Kolonay, J. F., Sullivan, S. A., Zhou, L., Davidsen, T. M., Wu, M., Huston, A. L., Lewis, M., Weaver, B., Weidman, J. F., Khouri, H., Utterback, T. R., Feldblyum, T. V. & Fraser, C. M. (2005). The psychrophilic lifestyle as revealed by the genome sequence of Colwellia psychrerythraea 34H through genomic and proteomic analyses. Proc Natl Acad Sci USA 102:10913-10918. Meunier, B., Bouley, J., Piec, I., Bernard, C., Picard, B. & Hocquette, J.-F. (2005). Data analysis methods for detection of differential protein expression in two-dimensional gel electrophoresis. Anal Biochem 340:226-230. Meyers, P. R., Bourn, W. R., Steyn, L. M., van Helden, P. D., Beyers, A. D. & Brown, G. D. (1998). Novel method for rapid measurement of growth of Mycobacteria in detergent-free media. J Clin Microbiol 36:2752-2754. Michel, V., Lehoux, I., Depret, G., Anglade, P., Labadie, J. & Hebraud, M. (1997). 
The cold shock response of the psychrotrophic bacterium Pseudomonas fragi involves four low-molecular-mass nucleic acid-binding proteins. J Bacteriol 179:7331-7342. Miller, S., Su, Q., McGrath, A., Estock, M., Parmar, P., Zhao, M., Huang, S., Zhou, J., Wang, F. & Esquer-Blasco, R. (2003). The human serum proteome: Display of nearly 3700 chromatographically separated protein spots on two-dimensional electrophoresis gels and identification of 325 distinct proteins. Proteomics 3:1345-1364. Miraglia, N., Basile, A., Pieri, M., Acampora, A., Malorni, L., Giulio, B. D. & Sannolo, N. (2002). Ion trap mass spectrometry in the structural analysis of haemoglobin peptides modified by epichlorohydrin and diepoxybutane. Rapid Commun Mass Spectrom 16:840-847. Missiakas, D., Schwager, F., Betton, J., Georgopoulos, C. & Raina, S. (1996). Identification and characterization of HslV HslU (ClpQ ClpY) proteins involved in overall proteolysis of misfolded proteins

in Escherichia coli. EMBO J 15:6899-6909. Miyagi, M. & Rao, K. C. (2007). Proteolytic 18O-labeling strategies for quantitative proteomics. Mass Spectrom Rev 26:121-136. Mizushima, T., Kataoka, K., Ogata, Y., Inoue, R. & Sekimizu, K. (1997). Increase in negative supercoiling of plasmid DNA in Escherichia coli exposed to cold shock. Mol Microbiol 23:381-386. Mogk, A., Deuerling, E., Vorderwulbecke, S., Vierling, E. & Bukau, B. (2003). Small heat shock proteins, ClpB and the DnaK system form a functional triade in reversing protein aggregation. Mol Microbiol 50:585-595. Molloy, M., Phadke, N., Maddock, J. & Andrews, P. (2001). Two-dimensional electrophoresis and peptide mass fingerprinting of bacterial outer membrane proteins. Electrophoresis 22:1686-1696. Molloy, M. P., Brzezinski, E. E., Hang, J., McDowell, M. T. & VanBogelen, R. A. (2003). Overcoming technical variation and biological variation in quantitative proteomics. Proteomics 3:1912-1919. Monroe, M. E., Shaw, J. L., Daly, D. S., Adkins, J. N. & Smith, R. D. (2008). MASIC: A software program for fast quantitation and flexible visualization of chromatographic profiles from detected LC-MS(/MS) features. Comput Biol Chem 32:215-217. Moon, J. C., Hah, Y. S., Kim, W. Y., Jung, B. G., Jang, H. H., Lee, J. R., Kim, S. Y., Lee, Y. M., Jeon, M. G. & Kim, C. W. (2005). Oxidative stress-dependent structural and functional switching of a human 2-Cys peroxiredoxin isotype II that enhances HeLa cell resistance to H2O2-induced cell death. J Biol Chem 280:28775-28784. Moraitis, C. & Curran, B. P. G. (2004). Reactive oxygen species may influence the heat shock response and stress tolerance in the yeast Saccharomyces cerevisiae. Yeast 21:313-323. Morbidoni, H. R., de Mendoza, D. & Cronan Jr, J. E. (1995). Synthesis of sn-glycerol 3-phosphate, a key precursor of membrane lipids. J Bacteriol 177:5899–5905. Morel, M. H., Dehlon, P., Autran, J. C., Leygue, J. P. & Bar-L'Helgouac'h, C. (2000). 
Effects of temperature, sonication time, and power settings on size distribution and extractability of total wheat flour proteins as determined by size-exclusion high-performance liquid chromatography. Cereal Chem 77:685-691. Mori, H. & Ito, K. (2001). The Sec protein-translocation pathway. Trends Microbiol 9:494-500. Moriarity, J. L., Hurt, K. J., Resnick, A. C., Storm, P. B., Laroy, W., Schnaar, R. L. & Snyder, S. H. (2002). UDP-glucuronate decarboxylase, a key enzyme in proteoglycan synthesis. Cloning, characterization, and localization. J Biol Chem 277:16968-16975. Murata, Y., Homma, T., Kitagawa, E., Momose, Y., Sato, M. S., Odani, M., Shimizu, H., Hasegawa-Mizusawa, M., Matsumoto, R. & Mizukami, S. (2006). Genome-wide expression analysis of yeast response during exposure to 4°C. Extremophiles 10:117-128. Naidu, B. P., Paleg, L. G., Aspinall, D., Jennings, A. C. & Jones, G. P. (1991). Amino acid and glycine betaine accumulation in cold-stressed wheat seedlings. Phytochemistry 30:407-409. Nakaminami, K., Karlson, D. T. & Imai, R. (2006). Functional conservation of cold shock domains in bacteria and higher plants. Proc Natl Acad Sci USA 103:10122-10127. Nakayama, K., Saito, T., Fukui, T., Shirakura, Y. & Tomita, K. (1985). Purification and properties of extracellular poly (3-hydroxybutyrate) depolymerases from Pseudomonas lemoignei. Biochim Biophys Acta 827:63-72. Nedwell, D. B. (1999). Effect of low temperature on microbial growth: Lowered affinity for substrates limits growth at low temperature. FEMS Microbiol Ecol 30:101-111. Nelson, C. J., Huttlin, E. L., Hegeman, A. D., Harms, A. C. & Sussman, M. R. (2007). Implications of 15N-metabolic labeling for automated peptide identification in Arabidopsis thaliana. Proteomics 7:1279-1292. Nemecek-Marshall, M., Wojciechowski, C., Wagner, W. P. & Fall, R. (1999). Acetone formation in the Vibrio family: A new pathway for bacterial leucine catabolism. J Bacteriol 181:7493-7499. Nesvizhskii, A. I., Vitek, O. 
& Aebersold, R. (2007). Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat Methods 4:787-797. Neugebauer, H., Herrmann, C., Kammer, W., Schwarz, G., Nordheim, A. & Braun, V. (2005). ExbBD-dependent transport of maltodextrins through the novel MalA protein across the outer membrane of Caulobacter crescentus. J Bacteriol 187:8300-8311. Nevot, M., Deroncele, V., Messner, P., Guinea, J. & Mercade, E. (2006). Characterization of outer membrane vesicles released by the psychrotolerant bacterium Pseudoalteromonas antarctica NF3. Environ Microbiol 8:1523-1533. Nichols, D. S., Miller, M. R., Davies, N. W., Goodchild, A., Raftery, M. & Cavicchioli, R. (2004). Cold adaptation in the Antarctic archaeon Methanococcoides burtonii involves membrane lipid unsaturation. J Bacteriol 186:8508-8515.

Nimura, K., Takahashi, H. & Yoshikawa, H. (2001). Characterization of the dnaK multigene family in the Cyanobacterium Synechococcus sp. strain PCC7942. J Bacteriol 183:1320-1328. Noda, C. & Ichihara, A. (1976). Control of ketogenesis from amino acids. IV. Tissue specificity in oxidation of leucine, tyrosine, and lysine. J Biochem 80:1159-1164. North, M. J. (1989). Prevention of unwanted proteolysis. In Proteolytic Enzymes: A Practical Approach, pp. 105–124. Edited by R. J. Beynon & J. S. Bond. Oxford: IRL Press. Nouwen, N. & Driessen, A. J. M. (2005). Inactivation of protein translocation by cold-sensitive mutations in the yajC-secDF operon, pp. 6852-6855: Am Soc Microbiol. Oberg, A. L., Mahoney, D. W., Eckel-Passow, J. E., Malone, C. J., Wolfinger, R. D., Hill, E. G., Cooper, L. T., Onuma, O. K., Spiro, C., Therneau, T. M. & Bergen III, H. R. (2008). Statistical analysis of relative labeled mass spectrometry data from complex samples using ANOVA. J Proteome Res 7:225-233. Oda, T., Nakamura, A., Shikayama, M., Kawano, I., Ishimatsu, A. & Muramatsu, T. (1997). Generation of reactive oxygen species by raphidophycean phytoplankton. Biosci Biotechnol Biochem 61:1658-1662. Oda, Y., Huang, K., Cross, F. R., Cowburn, D. & Chait, B. T. (1999). Accurate quantitation of protein expression and site-specific phosphorylation. Proc Natl Acad Sci USA 96:6591-6596. Old, W. M., Meyer-Arendt, K., Aveline-Wolf, L., Pierce, K. G., Mendoza, A., Sevinsky, J. R., Resing, K. A. & Ahn, N. G. (2005). Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Mol Cell Proteomics 4:1487-1502. Ong, S. E., Blagoev, B., Kratchmarova, I., Kristensen, D. B., Steen, H., Pandey, A. & Mann, M. (2002). Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 1:376-386. Ong, S. E., Foster, L. J. & Mann, M. (2003). Mass spectrometric-based approaches in quantitative proteomics. 
Methods 29:124-130. Ong, S. E., Kratchmarova, I. & Mann, M. (2003). Properties of 13C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC). J Prot Res 2:173-181. Ostrowski, M., Cavicchioli, R., Blaauw, M. & Gottschal, J. (2001). Specific growth rate plays a critical role in hydrogen peroxide resistance of the marine oligotrophic ultramicrobacterium Sphingomonas alaskensis strain RB2256. Appl Environ Microbiol 67:1292-1299. Ostrowski, M., Fegatella, F., Wasinger, V., Guilhaus, M., Corthals, G. & Cavicchioli, R. (2004). Cross-species identification of proteins from proteome profiles of the marine oligotrophic ultramicrobacterium, Sphingopyxis alaskensis. Proteomics 4:1779-1788. Owens, R. M., Pritchard, G., Skipp, P., Hodey, M., Connell, S. R., Nierhaus, K. H. & O'Connor, C. D. (2004). A dedicated translation factor controls the synthesis of the global regulator Fis. EMBO J 23:3375-3385. Pace, N. R. (1997). A molecular view of microbial diversity and the biosphere. Science 276:734-740. Palagi, P. M., Walther, D., Quadroni, M., Catherinet, S., Burgess, J., Zimmermann-Ivol, C. G., Sanchez, J. C., Binz, P. A., Hochstrasser, D. F. & Appel, R. D. (2005). MSight: an image analysis software for liquid chromatography-mass spectrometry. Proteomics 5:2381-2384. Pandey, A. & Mann, M. (2000). Proteomics to study genes and genomes. Nature 405:837-846. Panoff, J. M., Thammavongs, B. & Guéguen, M. (2000). Cryoprotectants lead to phenotypic adaptation to freeze–thaw stress in Lactobacillus delbrueckii ssp. bulgaricus CIP 101027T. Cryobiology 40:264-269. Paoletti, A. C., Parmely, T. J., Tomomori-Sato, C., Sato, S., Zhu, D., Conaway, R. C., Conaway, J. W., Florens, L. & Washburn, M. P. (2006). Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. Proc Natl Acad Sci USA 103:18928-18933. Park, S. K., Venable, J. D., Xu, T. & Yates III, J. R. (2008). 
A quantitative analysis software tool for mass spectrometry–based proteomics. Nat Methods 5:319-322. Parsons, L., Eisenstein, E. & Orban, J. (2001). Solution structure of HI 0257, a bacterial ribosome binding protein. Biochemistry 40:10979-10986. Pasa-Tolic, L., Jensen, P. K., Anderson, G. A., Lipton, M. S., Peden, K. K., Martinovic, S., Tolic, N., Bruce, J. E. & Smith, R. D. (1999). High throughput proteome-wide precision measurements of protein expression using mass spectrometry. J Am Chem Soc 121:7949-7950. Patel, J. J. & Gerson, T. (1974). Formation and utilisation of carbon reserves by Rhizobium. Arch Microbiol 101:211-220. Patterson, S. D. (1994). From electrophoretically separated protein to identification: Strategies for sequence and mass analysis. Anal Biochem 221:1-15. Patton, W. F., Schulenberg, B. & Steinberg, T. H. (2002). Two-dimensional gel electrophoresis; Better

than a poke in the ICAT? Curr Opin Biotechnol 13:321-328. Pavelka, N., Fournier, M. L., Swanson, S. K., Pelizzola, M., Ricciardi-Castagnoli, P., Florens, L. & Washburn, M. P. (2008). Statistical similarities between transcriptomics and quantitative shotgun proteomics data. Mol Cell Proteomics 7:631-644. Peng, J., Elias, J. E., Thoreen, C. C., Licklider, L. J. & Gygi, S. P. (2003). Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: The yeast proteome. J Proteome Res 2:43-50. Peng, J. & Gygi, S. P. (2001). Proteomics: the move to mixtures. J Mass Spectrom 36:1083-1091. Perkins, D. N., Pappin, D. J., Creasy, D. & Cottrell, J. (1999). Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551-3567. Phadtare, S. & Inouye, M. (2004). Genome-wide transcriptional analysis of the cold shock response in wild-type and cold-sensitive, quadruple-csp-deletion strains of Escherichia coli. J Bacteriol 186:7007-7014. Phadtare, S., Kazakov, T., Bubunenko, M., Court, D. L., Pestova, T. & K., S. (2007). Transcription antitermination by translation initiation factor IF1. J Bacteriol 189:4087-4093. Pogliano, K. J. & Beckwith, J. (1993). The Cs sec mutants of Escherichia coli reflect the cold sensitivity of protein export itself. Genetics 133:763-773. Polgár, L. (2002). The prolyl oligopeptidase family. Cell Mol Life Sci 59:349-362. Porankiewicz, J. & Clarke, A. K. (1997). Induction of the heat shock protein ClpB affects cold acclimation in the cyanobacterium Synechococcus sp. strain PCC 7942. J Bacteriol 179:5111-5117. Posen, S. (1967). Alkaline phosphatase. Ann Intern Med 67:183-203. Potier, P., Drevet, P., Gounot, A.-M. & Hipkiss, A. R. (1990). Temperature dependent changes in proteolytic activities and protein composition in the psychrotrophic bacterium Arthrobacter globiformis S155. J Gen Microbiol 136:283-291. Potter, M., Madkour, M. 
H., Mayer, F. & Steinbuchel, A. (2002). Regulation of phasin expression and polyhydroxyalkanoate (PHA) granule formation in Ralstonia eutropha H16. Microbiology 148:2413-2426. Prud'homme-Genereux, A., Beran, R. K., Iost, I., Ramey, C. S., Mackie, G. A. & Simons, R. W. (2004). Physical and functional interactions among RNase E, polynucleotide phosphorylase and the cold-shock protein, CsdA: evidence for a 'cold shock degradosome'. Mol Microbiol 54:1409-1421. Purusharth, R. I., Madhuri, B. & Ray, M. K. (2007). R in Pseudomonas syringae is essential for growth at low temperature and plays a novel role in the 3' end processing of 16 and 5S ribosomal RNA. J Biol Chem 282:16267. Py, B., Higgins, C. F., Krisch, H. M. & Carpousis, A. J. (1996). A DEAD-box RNA helicase in the Escherichia coli RNA degradosome. Nature 381:169-172. Qin, F., Sakuma, Y., Li, J., Liu, Q., Li, Y. Q., Shinozaki, K. & Yamaguchi-Shinozaki, K. (2004). Cloning and functional analysis of a novel DREB1/CBF transcription factor involved in cold-responsive gene expression in Zea mays L. Plant Cell Physiol 45:1042-1052. Qiu, Y., Kathariou, S. & Lubman, D. M. (2006). Proteomic analysis of cold adaptation in a Siberian permafrost bacterium- Exiguobacterium sibiricum 255-15 by two-dimensional liquid separation coupled with mass spectrometry. Proteomics 6:5221-5233. Rabilloud, T. (2002). Two-dimensional gel electrophoresis in proteomics: Old, old fashioned, but it still climbs up the mountains. Proteomics 2:3-10. Rak, A., Kalinin, A., Shcherbakov, D. & Bayer, P. (2002). Solution structure of the ribosome-associated cold shock response protein Yfia of Escherichia coli. Biochem Biophys Res Commun 299:710-714. Ramos, F., Thuriaux, P., Wiame, J. M. & Bechet, J. (1970). The participation of ornithine and citrulline in the regulation of arginine metabolism in Saccharomyces cerevisiae. Eur J Biochem 12:40-47. Ramos-Fernandez, A., Lopez-Ferrer, D. & Vazquez, J. (2007). 
Improved method for differential expression proteomics using trypsin-catalyzed 18O labeling with a correction for labeling efficiency. Mol Cell Proteomics 6:1274-1286. Rappsilber, J., Ryder, U., Lamond, A. & Mann, M. (2002). Large-scale proteomic analysis of the human spliceosome. Genome Res 12:1231-1245. Reeslev, M., Jorgensen, B. B. & Jorgensen, O. B. (1996). Exopolysaccharide production and morphology of Aureobasidium pullulans grown in continuous cultivation with varying ammonium-glucose ratio in the growth medium. J Biotechnol 51:131-135. Rehm, B. H. A. & Steinbuchel, A. (1999). Biochemical and genetic analysis of PHA synthases and other proteins required for PHA synthesis. Int J Biol Macromol 25:3-19. Reitzer, L. J. & Magasanik, B. (1996). Ammonia assimilation and the biosynthesis of glutamine,

glutamate, aspartate, asparagine, L-alanine, and D-alanine. In Escherichia coli and Salmonella: Cellular and molecular biology, 2nd ed, pp. 391–407. Edited by F. C. Neidhardt, Curtiss, R., Ingraham, J.L., Lin, E.C.C., Low, K.B., Magasanik, B. Washington, DC: American Society for Microbiology Press. Ren, H., Dover, L. G., Islam, S. T., Alexander, D. C., Chen, J. M., Besra, G. S. & Liu, J. (2007). Identification of the lipooligosaccharide biosynthetic gene cluster from Mycobacterium marinum. Mol Microbiol 63:1345-1359. Renner, T. & Waters, E. R. (2007). Comparative genomic analysis of the Hsp70s from five diverse photosynthetic eukaryotes. Cell Stress Chaperones 12:172. Repaske, R. & Repaske, A. C. (1976). Quantitative requirements for exponential growth of Alcaligenes eutrophus. Appl Environ Microbiol 32:585-591. Roberts, I. S. (1996). The biochemistry and genetics of capsular polysaccharide production in bacteria. Annu Rev Microbiol 50:285-315. Roberts, M. R. & Inniss, W. E. (1992). The synthesis of cold shock proteins and cold acclimation proteins in the psychrophilic bacterium Aquaspirillum arcticum. Curr Microbiol 25:275-278. Rocke, D. M. (2004). Design and analysis of experiments with high throughput biological assay data. Semin Cell Dev Biol 15:703-713. Rodionov, D. A., Hebbeln, P., Gelfand, M. S. & Eitinger, T. (2006). Comparative and functional genomic analysis of prokaryotic nickel and cobalt uptake transporters: Evidence for a novel group of ATP-binding cassette transporters. J Bacteriol 188:317-327. Rodrigues, D. F., Goris, J., Vishnivetskaya, T., Gilichinsky, D., Thomashow, M. F. & Tiedje, J. M. (2006). Characterization of Exiguobacterium isolates from the Siberian permafrost. Description of Exiguobacterium sibiricum sp. nov. Extremophiles 10:285-294. Rodrigues, D. F. & Tiedje, J. M. (2008). Coping with our cold planet. Appl Environ Microbiol 74:1677-1686. Rodríguez-Quiñones, F., Maguire, M., Wallington, E. J., Gould, P. S., Yerko, V., Downie, J. A. 
& Lund, P. A. (2005). Two of the three groEL homologues in Rhizobium leguminosarum are dispensable for normal growth. Arch Microbiol 183:253-265. Rohrwild, M., Coux, O., Huang, H., Moerschell, R., Yoo, S., Seol, J., Chung, C. & Goldberg, A. (1996). HslV-HslU: A novel ATP-dependent protease complex in Escherichia coli related to the eukaryotic proteasome. Proc Natl Acad Sci USA 93:5808-5813. Ross, P. L., Huang, Y. N., Marchese, J. N., Williamson, B., Parker, K., Hattan, S., Khainovski, N., Pillai, S., Dey, S., Daniels, S., Purkayastha, S., Juhasz, P., Martin, S., Bartlet-Jones, M., He, F., Jacobson, A. & Pappin, D. J. (2004). Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 3:1154-1169. Roxas, B. & Li, Q. (2008). Significance analysis of microarray for relative quantitation of LC/MS data in proteomics. BMC Bioinformatics 9:187. Roy, S., Anderle, M., Lin, H. & Becker, C. (2004). Differential expression profiling of serum proteins and metabolites for biomarker discovery. Int J Mass Spectrom 238:163-171. Rudolph, F. B. (1994). The biochemistry and physiology of nucleotides. J Nutr 124:1245-1275. Rusanganwa, E. & Gupta, R. S. (1993). Cloning and characterization of multiple groEL chaperonin-encoding genes in Rhizobium meliloti. Gene 126:67-75. Russell, N. J. (1990). Cold adaptation of microorganisms. Philos Trans R Soc Lond B Biol Sci 326:595-611. Russell, N. J. (1997). Psychrophilic bacteria-molecular adaptations of membrane lipids. Comp Biochem Physiol A Physiol 118:489-493. Russell, N. J. (2000). Toward a molecular understanding of cold activity of enzymes from psychrophiles. Extremophiles 4:83-90. Russell, N. J. & Nichols, D. S. (1999). Polyunsaturated fatty acids in marine bacteria--a dogma rewritten. Microbiology 145 ( Pt 4):767-779. Sabine, C. L., Feely, R. A., Gruber, N., Key, R. M., Lee, K., Bullister, J. L., Wanninkhof, R., Wong, C. S., Wallace, D. W. 
R., Tilbrook, B., Millero, F. J., Peng, T.-H., Kozyr, A., Ono, T. & Rios, A. F. (2004). The oceanic sink for anthropogenic CO2. Science 305:367-371. Sauve, A. & Speed, T. (2004). Normalization, baseline correction and alignment of high-throughput mass spectrometry data. Proceedings Gensips 4. Schade, B., Jansen, G., Whiteway, M., Entian, K. D. & Thomas, D. Y. (2004). Cold adaptation in budding yeast. Mol Biol Cell 15:5492-5502. Schauer, K., Gouget, B., Carriere, M., Labigne, A. & Reuse, H. d. (2007). Novel nickel transport mechanism across the bacterial outer membrane energized by the TonB/ExbB/ExbD machinery. Mol Microbiol 63:1054-1068.

Schauer, K., Rodionov, D. A. & de Reuse, H. (2008). New substrates for TonB-dependent transport: do we only see the 'tip of the iceberg'? Trends Biochem Sci 33:330-338. Schulze, W. X. & Mann, M. (2004). A novel proteomic screen for peptide-protein interactions. J Biol Chem 279:10756-10764. Schut, F. (1994). Ecophysiology of a marine ultramicrobacterium. PhD thesis:Department of Microbiology, University of Groningen. Schut, F., de Vries, E., Gottschal, J., Robertson, B., Harder, W., Prins, R. & Button, D. (1993). Isolation of typical marine bacteria by dilution culture: Growth, maintenance, and characteristics of isolates under laboratory conditions. Appl Environ Microbiol 59:2150-2160. Schut, F., Gottschal, J. C. & Prins, R. A. (1997). Isolation and characterisation of the marine ultramicrobacterium Sphingomonas sp. strain RB2256. FEMS Microbiol Rev 20:363-369. Schut, F., Jansen, M., Gomes, T., Gottschal, J., Harder, W. & Prins, R. (1995). Substrate uptake and utilization by a marine ultramicrobacterium. Microbiology 141:351-361. Seaton, B. L. & Vickery, L. E. (1994). A gene encoding a DnaK/hsp70 homolog in Escherichia coli. Proc Natl Acad Sci USA 91:2066-2070. Seo, J. B., Kim, H. S., Jung, G. Y., Nam, M. H., Chung, J. H., Kim, J. Y., Yoo, J. S., Kim, C. W. & Kwon, O. (2004). Short communication psychrophilicity of Bacillus psychrosaccharolyticus: A proteomic study. Proteomics 4:3654-3659. Sethi, S., Surface, J. M. & Murphy, T. F. (1997). Antigenic heterogeneity and molecular analysis of CopB of Moraxella (Branhamella) catarrhalis. Infect Immun 65:3666-3671. Shadforth, I. P., Dunkley, T. P., Lilley, K. S. & Bessant, C. (2005). i-Tracker: For quantitative proteomics using iTRAQ. BMC Genomics 6:145-151. Shen, Y., Jacobs, J., Camp 2nd, D., Fang, R., Moore, R., Smith, R., Xiao, W., Davis, R. & Tompkins, R. (2004). Ultra-high-efficiency strong cation exchange LC/RPLC/MS/MS for high dynamic range characterization of the human plasma proteome. Anal Chem 76:1134-1144. 
Shiba, K., Ito, K., Nakamura, Y., Dondon, J. & Grunberg-Manago, M. (1986). Altered translation initiation factor 2 in the cold-sensitive ssyG mutant affects protein export in Escherichia coli. EMBO J 5:3001-3006. Shinozaki, K., Yamaguchi-Shinozaki, K. & Seki, M. (2003). Regulatory network of gene expression in the drought and cold stress responses. Curr Opin Plant Biol 6:410-417. Siddiqui, K. S. & Cavicchioli, R. (2006). Cold-adapted enzymes. Annu Rev Biochem 75:403-433. Singer, G. A. C. & Hickey, D. A. (2000). Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol 17:1581-1588. Siuzdak, G. (1996). Mass spectrometry for biotechnology. San Diego, CA, USA: Academic Press. Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3:3. Snijders, A. P. L., de Vos, M. G. J. & Wright, P. C. (2005). Novel approach for peptide quantitation and sequencing based on 15N and 13C metabolic labeling. J Prot Res 4:578-585. Solioz, M. & Odermatt, A. (1995). Copper and silver transport by CopB-ATPase in membrane vesicles of Enterococcus hirae. J Biol Chem 270:9217-9221. Srere, P. A. (1987). Complexes of Sequential Metabolic Enzymes. Annu Rev Biochem 56:89-124. Stauffer, G. V. (1996). Biosynthesis of serine, glycine, and one-carbon units. In Escherichia coli and Salmonella: cellular and molecular biology, 2nd ed. Edited by F. C. Neidhardt, R. C. III, J. L. Ingraham, E. C. C. Lin, K. B. Low, B. Magasanik, W. S. Reznikoff, M. Riley, M. Schaechter & H. E. Umbarger. Washington DC: ASM Press. Stead, D. A., Paton, N. W., Missier, P., Embury, S. M., Hedeler, C., Jin, B., Brown, A. J. P. & Preece, A. (2008). Information quality in proteomics. Brief Bioinform 9:174-188. Storey, J. D. & Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100:9440-9445. Sudesh, K., Abe, H. & Doi, Y. (2000). 
Synthesis, structure and properties of polyhydroxyalkanoates: Biological polyesters. Prog Polym Sci 25:1503-1555. Sugimoto, A., Shiraki, M., Hatakeyama, S. & Saito, T. (2008). Secretion pathway for the poly (3-hydroxybutyrate) depolymerase in Ralstonia pickettii T1. Antonie Van Leeuwenhoek 94:223-232. Sutherland, I. W. (1985). Biosynthesis and composition of Gram-negative bacterial extracellular and wall polysaccharides. Annu Rev Microbiol 39:243-270. Sutherland, I. W. (2001). Microbial polysaccharides from Gram-negative bacteria. Int Dairy J 11:663-674. Suzuki, Y., Haruki, M., Takano, K., Morikawa, M. & Kanaya, S. (2004). Possible involvement of an FKBP family member protein from a psychrotrophic bacterium Shewanella sp. SIB1 in cold-adaptation.

Eur J Biochem 271:1372-1381. Tabb, D. L., McDonald, W. H. & Yates III, J. R. (2002). DTASelect and Contrast: Tools for assembling and comparing protein identifications from shotgun proteomics. J Prot Res 1:21-26. Tan, X., Varughese, M. & Widger, W. R. (1994). A light-repressed transcript found in Synechococcus PCC 7002 is similar to a chloroplast-specific small subunit ribosomal protein and to a transcription modulator protein associated with sigma 54. J Biol Chem 269:20905-20912. Tang, H., Arnold, R., Alves, P., Xun, Z., Clemmer, D., Novotny, M., Reilly, J. & Radivojac, P. (2006). A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics 22:e481-e488. Tanio, T., Fukui, T., Shirakura, Y., Saito, T., Tomita, K., Kaiho, T. & Masamune, S. (1982). An extracellular poly (3-hydroxybutyrate) depolymerase from Alcaligenes faecalis. Eur J Biochem 124:71-77. Taurhesia, S. & McNeil, B. (1994). Physicochemical factors affecting the formation of the biological response modifier scleroglucan. J Chem Technol Biotechnol 59:157-163. Taylor, C. F., Binz, P. A., Aebersold, R., Affolter, M., Barkovich, R., Deutsch, E. W., Horn, D. M., Andreas, H., Kussmann, M. & Lilley, K. (2008). Guidelines for reporting the use of mass spectrometry in proteomics. Nat Biotechnol 26:860. Taylor, C. F., Hermjakob, H., Julian Jr, R. K., Garavelli, J. S., Aebersold, R. & Apweiler, R. (2006). The work of the Human Proteome Organisation's Proteomics Standards Initiative (HUPO PSI). OMICS 10:145-151. Taylor, C. F., Paton, N. W., Lilley, K. S., Binz, P. A., Julian, R. K., Jones, A. R., Zhu, W., Apweiler, R., Aebersold, R. & Deutsch, E. W. (2007). The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 25:887. Taylor, C. J., Anderson, A. J. & Wilkinson, S. G. (1998). 
Phenotypic variation of lipid composition in Burkholderia cepacia: a response to increased growth temperature is a greater content of 2-hydroxy acids in phosphatidylethanolamine and ornithine amide lipid. Microbiology 144:1737-1745. Thatje, S., Anger, K., Calcagno, J. A., Lovrich, G. A., Pörtner, H. O. & Arntz, W. E. (2005). Challenging the cold: Crabs reconquer the Antarctic. Ecology 86:619-625. Thieringer, H. A., Jones, P. G. & Inouye, M. (1998). Cold shock and adaptation. Bioessays 20:49-57. Tiku, P. E., Gracey, A. Y., Macartney, A. I., Beynon, R. J. & Cossins, A. R. (1996). Cold-induced expression of delta9-desaturase in Carp by transcriptional and posttranslational mechanisms. Science 271:815-818. Ting, L., Cowley, M. J., Hoon, S. L., Guilhaus, M., Raftery, M. & Cavicchioli, R. (2009). Normalization and statistical analysis of quantitative proteomics data generated by metabolic labeling. Mol Cell Proteomics 8:2227-2242. Uefuji, M., Kasuya, K. & Doi, Y. (1997). Enzymatic degradation of poly [(R)-3-hydroxybutyrate]: Secretion and properties of PHB depolymerase from Pseudomonas stutzeri. Polym Degrad Stab 58:275- 281. Unlu, M., Morgan, M. & Minden, J. (1997). Difference gel electrophoresis: A single gel method for detecting changes in protein extracts. Electrophoresis 18:2071-2077. Usaite, R., Wohlschlegel, J., Venable, J. D., Park, S. K., Nielsen, J., Olsson, L. & Yates III, J. R. (2008). Characterization of global yeast quantitative proteome data generated from the wild-type and glucose repression Saccharomyces cerevisiae strains: The comparison of two quantitative methods. J Proteome Res 7:266-275. Valentin, H. E., Lee, E. Y., Choi, C. Y. & Steinbüchel, A. (1994). Identification of 4-hydroxyhexanoic acid as a new constituent of biosynthetic polyhydroxyalkanoic acids from bacteria. Appl Microbiol Biotechnol 40:710-716. Van De Venter, H. A. (1985). Cyanide-resistant and cold resistance in seedlings of Maize (Zea mays L.). Ann Bot 56:561-563. Venable, J. 
D., Dong, M. Q., Wohlschlegel, J., Dillin, A. & Yates III, J. R. (2004). Automated approach for quantitative analysis of complex mixtures from tandem mass spectra. Nat Methods 1:1-7. Villa, S. T., Xu, Q., Downie, A. B. & Clarke, S. G. (2006). Arabidopsis protein repair l-isoaspartyl methyltransferases: Predominant activities at lethal temperatures. Physiol Plant 128:581-592. Viola, R. E. (2001). The central enzymes of the aspartate family of amino acid biosynthesis. Acc Chem Res 34:339-350. Voelker, D. R. (1997). Phosphatidylserine decarboxylase. Biochim Biophys Acta, Lipid Lipid Metab 1348:236-244. Vuilleumier, S. & Pagni, M. (2002). The elusive roles of bacterial glutathione S-transferases: new lessons from genomes. Appl Microbiol Biotechnol 58:138-146.

Wagner, M., Naik, D. & Pothen, A. (2003). Protocols for disease classification from mass spectrometry data. Proteomics 3:1692-1698.
Wang, Q. F., Miao, J. L., Hou, Y. H., Ding, Y. & Li, G. Y. (2006). Expression of CspA and GST by an Antarctic psychrophilic bacterium Colwellia sp. NJ341 at near-freezing temperature. World J Microbiol Biotechnol 22:311-316.
Wang, W., Zhou, H., Lin, H., Roy, S., Shaler, T., Hill, L., Norton, S., Kumar, P., Anderle, M. & Becker, C. (1999). Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Nat Biotechnol 17:994-999.
Wang, W., Zhou, H., Lin, H., Roy, S., Shaler, T., Hill, L., Norton, S., Kumar, P., Anderle, M. & Becker, C. (2003). Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal Chem 75:4818-4826.
Wang, Y. K., Ma, Z., Quinn, D. F. & Fu, E. W. (2002). Inverse 15N-metabolic labeling/mass spectrometry for comparative proteomics and rapid identification of protein markers/targets. Rapid Commun Mass Spectrom 16:1389-1397.
Ward, A. C., Rowley, B. I. & Dawes, E. A. (1977). Effect of nitrogen and oxygen limitation on poly-β-hydroxybutyrate biosynthesis in ammonium-grown Azotobacter beijerinckii. J Gen Microbiol 102:61-68.
Washburn, M. P., Ulaszek, R., Deciu, C., Schieltz, D. M. & Yates III, J. R. (2002). Analysis of quantitative proteomic data generated via multidimensional protein identification technology. Anal Chem 74:1650-1657.
Wasinger, V. C., Cordwell, S. J., Cerpa-Poljak, A., Yan, J. X., Gooley, A. A., Wilkins, M. R., Duncan, M. W., Harris, R., Williams, K. L. & Humphery-Smith, I. (1995). Progress with gene-product mapping of the Mollicutes: Mycoplasma genitalium. Electrophoresis 16:1090-1094.
Weber, M. H. W., Klein, W., Muller, L., Niess, U. M. & Marahiel, M. A. (2001). Role of the Bacillus subtilis fatty acid desaturase in membrane adaptation during cold shock. Mol Microbiol 39:1321-1329.
Weber, T. E. & Bosworth, B. G. (2005). Effects of 28 day exposure to cold temperature or feed restriction on growth, body composition, and expression of genes related to muscle growth and metabolism in channel catfish. Aquaculture 246:483-492.
Weinbauer, M. & Rassoulzadegan, F. (2004). Are viruses driving microbial diversification and diversity? Environ Microbiol 6:1.
Weinberg, M. V., Schut, G. J., Brehm, S., Datta, S. & Adams, M. W. W. (2005). Cold shock of a hyperthermophilic archaeon: Pyrococcus furiosus exhibits multiple responses to a suboptimal growth temperature with a key role for membrane-bound glycoproteins. J Bacteriol 187:336-348.
Weiss, J. B., Ray, P. H. & Bassford, P. J. (1988). Purified SecB protein of Escherichia coli retards folding and promotes membrane translocation of the maltose-binding protein in vitro. Proc Natl Acad Sci USA 85:8978-8982.
Werner-Washburne, M., Becker, J., Kosic-Smithers, J. & Craig, E. A. (1989). Yeast Hsp70 RNA levels vary in response to the physiological status of the cell. J Bacteriol 171:2680-2688.
West, M. (2003). Bayesian factor regression models in the “large p, small n” paradigm. In Bayesian Statistics, pp. 723-732. Edited by J. M. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith & M. West. Oxford: Oxford University Press.
White, D. C., Sutton, S. D. & Ringelberg, D. B. (1996). The genus Sphingomonas: physiology and ecology. Curr Opin Biotechnol 7:301-306.
White-Ziegler, C. A., Um, S., Perez, N. M., Berns, A. L., Malhowski, A. J. & Young, S. (2008). Low temperature (23°C) increases expression of biofilm-, cold-shock- and RpoS-dependent genes in Escherichia coli K-12. Microbiology 154:148-166.
Whitman, W. B., Coleman, D. C. & Wiebe, W. J. (1998). Prokaryotes: The unseen majority. Proc Natl Acad Sci USA 95:6578-6583.
Whyte, L. G. & Inniss, W. E. (1992). Cold shock proteins and cold acclimation proteins in a psychrotrophic bacterium. Can J Microbiol 38:1281-1285.
Wiener, M. C. (2005). TonB-dependent outer membrane transport: going for Baroque? Curr Opin Struct Biol 15:394-400.
Wilken, C., Kitzing, K., Kurzbauer, R., Ehrmann, M. & Clausen, T. (2004). Crystal structure of the DegS stress sensor: How a PDZ domain recognizes misfolded protein and activates a protease. Cell 117:483-494.
Wilkins, M. R., Appel, R. D., Van Eyk, J. E., Chung, M. C. M., Gorg, A., Hecker, M., Huber, L. A., Langen, H., Link, A. J., Paik, Y.-K., Patterson, S. D., Pennington, S. R., Rabilloud, T., Simpson, R. J., Weiss, W. & Dunn, M. J. (2006). Guidelines for the next 10 years of proteomics. Proteomics 6:4-8.
Wilkins, M. R., Pasquali, C., Appel, R. D., Ou, K., Golaz, O., Sanchez, J. C., Yan, J. X., Gooley, A. A., Hughes, G. & Humphery-Smith, I. (1996). From proteins to proteomes: Large scale protein identification by two-dimensional electrophoresis and amino acid analysis. Bio/Technology 14:61-65.
Williams, T. J., Ertan, H., Ting, L. & Cavicchioli, R. (2009). Carbon and nitrogen substrate utilization in the marine facultative oligotroph Sphingopyxis alaskensis strain RB2256. ISME J 3:1036-1052.
Wilm, M., Shevchenko, A., Houthaeve, T., Breit, S., Schweigerer, L., Fotsis, T. & Mann, M. (1996). Femtomole sequencing of proteins from polyacrylamide gels by nano-electrospray mass spectrometry. Nature 379:466-469.
Winter, A. D., Eschenlauer, S. C. P., McCormack, G. & Page, A. P. (2007). Loss of secretory pathway FK506-binding proteins results in cold-sensitive lethality and associated extracellular matrix defects in the nematode Caenorhabditis elegans. J Biol Chem 282:12813-12821.
Wittke, S., Kaiser, T. & Mischak, H. (2004). Differential polypeptide display: the search for the elusive target. J Chromatogr B 803:17-26.
Wouters, J. A., Rombouts, F. M., de Vos, W. M., Kuipers, O. P. & Abee, T. (1999). Cold shock proteins and low-temperature response of Streptococcus thermophilus CNRZ302. Appl Environ Microbiol 65:4436-4442.
Wu, C. C. & MacCoss, M. J. (2007). Quantitative proteomic analysis of mammalian organisms using metabolically labeled tissues. In Quantitative Proteomics by Mass Spectrometry, pp. 191-201. Edited by S. Sechi. Totowa: Humana Press.
Wu, C. C., MacCoss, M. J., Howell, K. E., Matthews, D. E. & Yates III, J. R. (2004). Metabolic labeling of mammalian organisms with stable isotopes for quantitative proteomic analysis. Anal Chem 76:4951-4959.
Xia, Q., Hendrickson, E. L., Zhang, Y., Wang, T., Taub, F., Moore, B. C., Porat, I., Whitman, W. B., Hackett, M. & Leigh, J. A. (2006). Quantitative proteomics of the archaeon Methanococcus maripaludis validated by microarray analysis and real time PCR. Mol Cell Proteomics 5:868-881.
Xia, Q., Wang, T., Park, Y., Lamont, R. J. & Hackett, M. (2007). Differential quantitative proteomics of Porphyromonas gingivalis by linear ion trap mass spectrometry: Non-label methods comparison, q-values and LOWESS curve fitting. Int J Mass Spectrom 259:105-116.
Xing, W. & Rajashekar, C. B. (2001). Glycine betaine involvement in freezing tolerance and water stress in Arabidopsis thaliana. Environ Exp Bot 46:21-28.
Xiong, L., Schumaker, K. S. & Zhu, J. K. (2002). Cell signaling during cold, drought, and salt stress. Plant Cell 14:S165-183.
Yakhnin, A. V. & Babitzke, P. (2002). NusA-stimulated RNA polymerase pausing and termination participates in the Bacillus subtilis trp operon attenuation mechanism in vitro. Proc Natl Acad Sci USA 99:11067-11072.
Yamanaka, K. & Inouye, M. (2001). Selective mRNA degradation by polynucleotide phosphorylase in cold shock adaptation in Escherichia coli. J Bacteriol 183:2808-2816.
Yamashino, T., Kakeda, M., Ueguchi, C. & Mizuno, T. (1994). An analogue of the DnaJ molecular chaperone whose expression is controlled by sigma s during the stationary phase and phosphate starvation in Escherichia coli. Mol Microbiol 13:475-483.
Yamazaki, M., Thorne, L., Mikolajczak, M., Armentrout, R. W. & Pollock, T. J. (1996). Linkage of genes essential for synthesis of a polysaccharide capsule in Sphingomonas strain S88. J Bacteriol 178:2676-2687.
Yan, W. & Chen, S. S. (2005). Mass spectrometry-based quantitative proteomic profiling. Brief Funct Genomic Proteomic 4:1-12.
Yang, X. & Ishiguro, E. E. (2003). Temperature-sensitive growth and decreased thermotolerance associated with relA mutations in Escherichia coli. J Bacteriol 185:5765-5771.
Yang, Y. H., Dudoit, S., Luu, P., Lin, D. M., Peng, V., Ngai, J. & Speed, T. P. (2002). Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 30:e15.
Yao, X., Afonso, C. & Fenselau, C. (2003). Dissection of proteolytic 18O labeling: endoprotease-catalyzed 16O-to-18O exchange of truncated peptide substrates. J Proteome Res 2:147-152.
Yates III, J. R. (2004). Mass spectral analysis in proteomics. Annu Rev Biophys Biomol Struct 33:297-316.
Yates III, J. R. & Snyder, M. (2004). Proteomics and genomics. Curr Opin Chem Biol 8:1-2.
Ye, K., Serganov, A., Hu, W., Garber, M. & Patel, D. J. (2002). Ribosome-associated factor Y adopts a fold resembling a double-stranded RNA binding domain scaffold. Eur J Biochem 269:5182-5191.
Ye, M., Zou, H., Liu, Z. & Ni, J. (2000). Separation of peptides by strong cation-exchange capillary electrochromatography. J Chromatogr A 869:385-394.
Yonaha, K., Nishie, M. & Aibara, S. (1992). The primary structure of omega-amino acid:pyruvate aminotransferase. J Biol Chem 267:12506-12510.


York, G. M., Junker, B. H., Stubbe, J. A. & Sinskey, A. J. (2001). Accumulation of the PhaP phasin of Ralstonia eutropha is dependent on production of polyhydroxybutyrate in cells. J Bacteriol 183:4217-4226.
Yoshimune, K., Galkin, A., Kulakova, L., Yoshimura, T. & Esaki, N. (2005). Cold-active DnaK of an Antarctic psychrotroph Shewanella sp. Ac10 supporting the growth of dnaK-null mutant of Escherichia coli at cold temperatures. Extremophiles 9:145-150.
Young, J. C., Agashe, V. R., Siegers, K. & Hartl, F. U. (2004). Pathways of chaperone-mediated protein folding in the cytosol. Nat Rev Mol Cell Biol 5:781-791.
Yu, X. X., Lewin, D. A., Forrest, W. & Adams, S. H. (2002). Cold elicits the simultaneous induction of fatty acid synthesis and beta-oxidation in murine brown adipose tissue: prediction from differential gene expression and confirmation in vivo. FASEB J 16:155-168.
Yura, T. & Nakahigashi, K. (1999). Regulation of the heat-shock response. Curr Opin Microbiol 2:153-158.
Zangrossi, S., Briani, F., Ghisotti, D., Regonesi, M. E., Tortora, P. & Deho, G. (2000). Transcriptional and post-transcriptional control of polynucleotide phosphorylase during cold acclimation in Escherichia coli. Mol Microbiol 36:1470-1480.
Zavodszky, P., Kardos, J., Svingor, A. & Petsko, G. A. (1998). Adjustment of conformational flexibility is a key event in the thermal adaptation of proteins. Proc Natl Acad Sci USA 95:7406-7411.
Zevenhuizen, L. (1971). Chemical composition of exopolysaccharides of Rhizobium and Agrobacterium. J Gen Microbiol 68:239-243.
Zevenhuizen, L. P. T. M. (1981). Cellular glycogen, β-1,2-glucan, poly-β-hydroxybutyric acid and extracellular polysaccharides in fast-growing species of Rhizobium. Antonie Van Leeuwenhoek 47:481-497.
Zhang, B., VerBerkmoes, N., Langston, M., Uberbacher, E., Hettich, R. & Samatova, N. (2006). Detecting differential and correlated protein expression in label-free shotgun proteomics. J Proteome Res 5:2909-2918.
Zhang, Q., Shirley, N., Lahnstein, J. & Fincher, G. B. (2005). Characterization and expression patterns of UDP-d-glucuronate decarboxylase genes in Barley. Plant Physiol 138:131-141.
Zheng, S., Ponder, M. A., Shih, J. Y. J., Tiedje, J. M., Thomashow, M. F. & Lubman, D. M. (2007). A proteomic analysis of Psychrobacter arcticus 273-4 adaptation to low temperature and salinity using a 2-D liquid mapping approach. Electrophoresis 28:467-488.
Zhong, H., Marcus, S. L. & Li, L. (2004). Two-dimensional mass spectra generated from the analysis of 15N-labeled and unlabeled peptides for efficient protein identification and de novo peptide sequencing. J Proteome Res 3:1155-1163.
Zolkiewski, M. (1999). ClpB Cooperates with DnaK, DnaJ, and GrpE in Suppressing Protein Aggregation: A novel multi-chaperone system from Escherichia coli. J Biol Chem 274:28083-28086.
Zybailov, B., Coleman, M. K., Florens, L. & Washburn, M. P. (2005). Correlation of relative abundance ratios derived from peptide ion chromatograms and spectrum counting for quantitative proteomic analysis using stable isotope labeling. Anal Chem 77:6218-6224.
Zybailov, B., Mosley, A. L., Sardiu, M. E., Coleman, M. K., Florens, L. & Washburn, M. P. (2006). Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J Proteome Res 5:2339-2347.


Appendix A.

Table A.1. Additional 15N APE experiments.

Experiment  Locus tag  Peptide                       15N APE (%)
1           Sala_0409  AREQIEMAQALGYATK              99
1           Sala_0409  EAVSALDETQAAISAIDAGNNK        99.7
1           Sala_0409  EQIEMAQALGYATK                99.8
1           Sala_1959  ALAENNGDIEASIDWLR             99.7
1           Sala_1959  TPVAEVVAAAGK                  99.4
1           Sala_1959  VAAEGLVGFATDGTR               99.6
1           Sala_2287  DIGFDQASEISAR                 99.5
1           Sala_2287  SDTHLLVVLNSDR                 99.5
1           Sala_2287  VLFYLVGR                      99.5
2           Sala_0786  AQEMALVAEAQSLPERDLPEYLAVDGTK  99.2
2           Sala_0786  GDTGQNLIGLLER                 99.5
2           Sala_0786  VPTLDEVPYPVK                  99.5
2           Sala_1694  AQVVAAVDQAAAANEAK             99.5
2           Sala_1694  SASADAVVVTTPAGEDVSLPR         99.7
2           Sala_1694  TAAAADDAAVAAALVAGAEVR         99.6
2           Sala_1951  ENNVDITEAVVTELNR              99.4
2           Sala_1951  ILPNVSIAVPAGYQPGQLVQQR        99.6
2           Sala_1951  KTPQNQAALQAAAK                99.6
3           Sala_0452  ANDKAGDGTTTATVLAQAIVR         99.8
3           Sala_0452  VGDVEQGFNAATDVYENLK           99.7
3           Sala_2286  TAVAIDTFINQK                  99.6
3           Sala_2286  VVDGLGNPIDGK                  99.7
3           Sala_2286  VVDGLGNPIDGKGPIK              99.6
3           Sala_2288  APEFVDQSTEASILVTGIK           99.7
3           Sala_2288  FTQAGSEVSALLGR                99.6
3           Sala_2288  LVLEVAQHLGENTVR               99.6
4           Sala_0799  AVEAVFDAITGALK                99.5
4           Sala_0799  MNKQDLIAAVADSSGLTK            99.3
4           Sala_0799  TGETMTIAASNQPK                99.2
4           Sala_2814  FVPVSVNEDMVGMK                99.3
4           Sala_2814  GPFVDLHLLK                    99.4
4           Sala_2814  KAETAQEGGSTAPIK               99.5
4           Sala_2816  STLLSENNAVVFK                 99.6
4           Sala_2816  VVGVNTLVTK                    99.6

Average                                              99.53
Standard deviation                                   0.17

APE, atomic percent excess. Experimental isotopic 15N peptide profiles were compared to theoretically generated 15N profiles using IDCalc. 15N APE was determined by closest matches between experimental and theoretical profiles.
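The summary rows of Table A.1 can be checked directly from the 34 tabulated values. The short sketch below is illustrative only: the list `ape` is transcribed from the table, and the thesis does not state which standard deviation estimator was used, so both the population and the sample estimator are computed.

```python
from statistics import mean, pstdev, stdev

# 15N APE values (%) transcribed from Table A.1, grouped by experiment.
ape = [
    99.0, 99.7, 99.8, 99.7, 99.4, 99.6, 99.5, 99.5, 99.5,  # experiment 1
    99.2, 99.5, 99.5, 99.5, 99.7, 99.6, 99.4, 99.6, 99.6,  # experiment 2
    99.8, 99.7, 99.6, 99.7, 99.6, 99.7, 99.6, 99.6,        # experiment 3
    99.5, 99.3, 99.2, 99.3, 99.4, 99.5, 99.6, 99.6,        # experiment 4
]

summary = {
    "n": len(ape),
    "mean": round(mean(ape), 2),
    # Population SD divides by n; sample SD divides by n - 1.
    "population_sd": round(pstdev(ape), 2),
    "sample_sd": round(stdev(ape), 2),
}
print(summary)
```

With these values the mean rounds to 99.53 and the population estimator rounds to 0.17, in agreement with the table, whereas the sample estimator rounds to 0.18.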

Figure A.1. Experimental and theoretical isotopic profiles of Sala_0409 AREQIEMAQALGYATK. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.4% 15N APE. E, Theoretical isotopic profile of 99.6% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_1 #2636, RT: 33.49, AV: 1, NL: 5.92E4, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: AREQIEMAQALGYATK, Ions Score: 97, Expect: 1.8e-010.]
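As a rough consistency check on panel A, the expected spacing between the 14N and fully 15N-labelled monoisotopic peaks can be estimated from the number of nitrogen atoms in the peptide. The sketch below is illustrative only: the function names, the assumed 2+ charge state, and the use of standard side-chain nitrogen counts for unmodified residues are assumptions, not details taken from the thesis.

```python
# Side-chain nitrogen atoms for the standard amino acids that carry any.
# Every residue additionally contributes one backbone nitrogen.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}

# Mass difference between 15N and 14N in Da (15.000109 - 14.003074).
DELTA_15N = 0.997035


def nitrogen_count(peptide: str) -> int:
    """Count nitrogen atoms in an unmodified peptide sequence."""
    return sum(1 + SIDE_CHAIN_N.get(residue, 0) for residue in peptide)


def full_label_mz_shift(peptide: str, charge: int) -> float:
    """m/z shift of the monoisotopic peak when every N is replaced by 15N."""
    return nitrogen_count(peptide) * DELTA_15N / charge


peptide = "AREQIEMAQALGYATK"
n_atoms = nitrogen_count(peptide)
shift = full_label_mz_shift(peptide, charge=2)  # 2+ charge is an assumption
print(n_atoms, round(shift, 3))
```

For AREQIEMAQALGYATK this gives 22 nitrogen atoms and a shift of about 10.97 m/z units at charge 2+, which is in line with the peak labels near m/z 890.46 and 901.43 that survive in the extracted spectrum.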

Figure A.1. Experimental and theoretical isotopic profiles of Sala_0409 EAVSALDETQAAISAIDAGNNK. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.7% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.8% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_1 #3341, RT: 38.07, AV: 1, NL: 4.67E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: EAVSALDETQAAISAIDAGNNK, Ions Score: 98, Expect: 1.1e-010.]
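The closest-match comparison described in these captions can be illustrated with a deliberately simplified model. The sketch below is not IDCalc's implementation: it assumes that only nitrogen contributes to the profile (natural 13C, 2H, 18O and 34S are ignored), models the number of residual 14N atoms as binomial, and selects the candidate APE by least squares. The function names and the observed intensities are hypothetical.

```python
from math import comb


def n15_profile(n_nitrogen: int, ape_percent: float, n_peaks: int = 3) -> list:
    """Relative intensities of the peaks carrying k = 0, 1, ... residual 14N atoms.

    Each nitrogen is assumed to be 15N with probability ape_percent / 100,
    so k follows a binomial distribution. Intensities are scaled so that the
    most abundant peak equals 1.
    """
    p = ape_percent / 100.0
    probs = [
        comb(n_nitrogen, k) * (1 - p) ** k * p ** (n_nitrogen - k)
        for k in range(n_peaks)
    ]
    top = max(probs)
    return [x / top for x in probs]


def closest_ape(observed: list, n_nitrogen: int, candidates: list) -> float:
    """Return the candidate APE whose profile has the smallest squared error."""
    def squared_error(ape: float) -> float:
        theoretical = n15_profile(n_nitrogen, ape, len(observed))
        return sum((o - t) ** 2 for o, t in zip(observed, theoretical))
    return min(candidates, key=squared_error)


# Hypothetical observed intensities for a peptide with 22 nitrogen atoms,
# ordered from the fully labelled peak towards lower m/z.
observed = [1.0, 0.22, 0.02]
best = closest_ape(observed, n_nitrogen=22, candidates=[99.0, 99.4, 99.6])
print(best)
```

In this simplified model the intensity of the first peak below the fully labelled one is very sensitive to enrichment in the 99 to 100% range, which is why candidate profiles a few tenths of a percent apart can be distinguished.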

Figure A.1. Experimental and theoretical isotopic profiles of Sala_0409 EQIEMAQALGYATK. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.8% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.7% 15N APE. E, Theoretical isotopic profile of 99.9% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_1 #2851, RT: 34.83, AV: 1, NL: 2.16E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: EQIEMAQALGYATK, Ions Score: 108, Expect: 5.4e-012.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_1959 ALAENNGDIEASIDWLR. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.7% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.6% 15N APE. E, Theoretical isotopic profile of 99.8% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_1 #3530, RT: 39.39, AV: 1, NL: 1.61E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: ALAENNGDIEASIDWLR, Ions Score: 127, Expect: 2.2e-013.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_1959 TPVAEVVAAAGK. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.4% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.2% 15N APE. E, Theoretical isotopic profile of 99.6% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_1 #2358, RT: 31.75, AV: 1, NL: 1.10E6, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: TPVAEVVAAAGK, Ions Score: 94, Expect: 2.5e-010.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_1959 VAAEGLVGFATDGTR. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.6% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.4% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_1 #2865, RT: 34.91, AV: 1, NL: 2.08E6, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: VAAEGLVGFATDGTR, Ions Score: 114, Expect: 4.7e-012.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_2287 DIGFDQASEISAR. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.5% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.4% 15N APE. E, Theoretical isotopic profile of 99.6% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_1 #2489, RT: 32.59, AV: 1, NL: 2.08E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: DIGFDQASEISAR, Ions Score: 96, Expect: 1.9e-010.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_2287 SDTHLLVVLNSDR. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.5% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.4% 15N APE. E, Theoretical isotopic profile of 99.6% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_1 #2610, RT: 33.34, AV: 1, NL: 6.82E4, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: SDTHLLVVLNSDR, Ions Score: 81, Expect: 6.3e-009.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_2287 VLFYLVGR. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.5% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.3% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_1 #3263, RT: 37.53, AV: 1, NL: 2.73E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: VLFYLVGR, Ions Score: 7.2, Expect: 3.2e-008.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_0786 AQEMALVAEAQSLPERDLPEYLAVDGTK. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.2% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.0% 15N APE. E, Theoretical isotopic profile of 99.4% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_2 #3841, RT: 40.31, AV: 1, NL: 6.62E4, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: AQEMALVAEAQSLPERDLPEYLAVDGTK, Ions Score: 83, Expect: 4e-009.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_0786 GDTGQNLIGLLER. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.6% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_2 #3630, RT: 38.71, AV: 1, NL: 2.72E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: GDTGQNLIGLLER, Ions Score: 63, Expect: 2.9e-007.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_0786 VPTLDEVPYPVK. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.5% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.4% 15N APE. E, Theoretical isotopic profile of 99.6% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_2 #2996, RT: 34.42, AV: 1, NL: 7.44E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: VPTLDEVPYPVK, Ions Score: 66, Expect: 1.7e-007.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_1694 AQVVAAVDQAAAANEAK. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.5% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.4% 15N APE. E, Theoretical isotopic profile of 99.6% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_2 #2294, RT: 30.06, AV: 1, NL: 3.13E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: AQVVAAVDQAAAANEAK, Ions Score: 109, Expect: 5.7e-013.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_1694 SASADAVVVTTPAGEDVSLPR. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.7% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.6% 15N APE. E, Theoretical isotopic profile of 99.8% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_2 #2996, RT: 34.42, AV: 1, NL: 6.83E4, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: SASADAVVVTTPAGEDVSLPR, Ions Score: 87, Expect: 1.5e-009.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_1694 TAAAADDAAVAAALVAGAEVR. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.6% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_2 #3688, RT: 39.17, AV: 1, NL: 1.19E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: TAAAADDAAVAAALVAGAEVR, Ions Score: 111, Expect: 3e-012.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_1951 ENNVDITEAVVTELNR. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.4% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.3% 15N APE. E, Theoretical isotopic profile of 99.5% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_2 #3587, RT: 38.37, AV: 1, NL: 4.98E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: ENNVDITEAVVTELNR, Ions Score: 108, Expect: 1.3e-011.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_1951 ILPNVSIAVPAGYQPGQLVQQR. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.6% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_2 #3480, RT: 37.58, AV: 1, NL: 1.70E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: ILPNVSIAVPAGYQPGQLVQQR, Ions Score: 103, Expect: 9.1e-012.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_1951 KTPQNQAALQAAAK. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.6% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_2 #1372, RT: 24.29, AV: 1, NL: 1.84E5, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: KTPQNQAALQAAAK, Ions Score: 105, Expect: 3.6e-011.]

Figure A.1. Experimental and theoretical isotopic profiles of Sala_0452 ANDKAGDGTTTATVLAQAIVR. A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives. B, Theoretical isotopic profile of 99.8% 15N APE was the closest matching isotopic distribution to experimental. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.6% 15N APE. E, Theoretical isotopic profile of 99.9% 15N APE.
[Figure panels not reproduced. Panel A spectrum: Lily_21_11_08_3 #3137, RT: 35.08, AV: 1, NL: 7.86E4, T: FTMS + p NSI Full ms [350.00-1750.00]; axes: m/z and Relative Abundance. Peptide annotation: ANDKAGDGTTTATVLAQAIVR, Ions Score: 116, Expect: 7.5e-013.]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_0452 VGDVEQGFNAATDVYENLK (Ions Score: 135; Expect: 8.6e-015). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.7% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.6% 15N APE. E, Theoretical isotopic profile of 99.8% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2286 TAVAIDTFINQK (Ions Score: 85; Expect: 4.6e-09). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.6% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2286 VVDGLGNPIDGK (Ions Score: 71; Expect: 7.1e-008). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.7% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.6% 15N APE. E, Theoretical isotopic profile of 99.8% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2286 VVDGLGNPIDGKGPIK (Ions Score: 86; Expect: 1.5e-09). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.6% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2288 APEFVDQSTEASILVTGIK (Ions Score: 112; Expect: 2.5e-012). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.7% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.6% 15N APE. E, Theoretical isotopic profile of 99.8% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2288 FTQAGSEVSALLGR (Ions Score: 113; Expect: 4e-012). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.6% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2288 LVLEVAQHLGENTVR (Ions Score: 104; Expect: 1.1e-011). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.6% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_0799 AVEAVFDAITGALK (Ions Score: 82; Expect: 6.3e-009). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.5% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.6% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_0799 MNKQDLIAAVADSSGLTK (Ions Score: 97; Expect: 1.5e-010). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.3% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.2% 15N APE. E, Theoretical isotopic profile of 99.4% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2814 TGETMTIAASNQPK (Ions Score: 80; Expect: 2.3e-009). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.2% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.1% 15N APE. E, Theoretical isotopic profile of 99.3% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2814 FVPVSVNEDMVGMK (Ions Score: 83; Expect: 5.6e-009). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.3% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.2% 15N APE. E, Theoretical isotopic profile of 99.4% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2814 GPFVDLHLLK (Ions Score: 83; Expect: 8e-010). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.4% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.3% 15N APE. E, Theoretical isotopic profile of 99.5% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2814 KAETAQEGGSTAPIK (Ions Score: 114; Expect: 3.5e-012). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.5% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.4% 15N APE. E, Theoretical isotopic profile of 99.6% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2816 STLLSENNAVVFK (Ions Score: 114; Expect: 3.5e-012). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.6% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]

[Figure A.1. Experimental and theoretical isotopic profiles of Sala_2816 VVGVNTLVTK (Ions Score: 77; Expect: 1.1e-008). A, Experimental LTQ-FT MS survey scan of 14N and 15N peptide derivatives (relative abundance against m/z). B, Theoretical isotopic profile of 99.6% 15N APE. C, Theoretical 14N isotopic profile. D, Theoretical isotopic profile of 99.5% 15N APE. E, Theoretical isotopic profile of 99.7% 15N APE. One 15N APE profile was the closest matching isotopic distribution to experimental (value not recoverable).]


Appendix B.

Code used for data processing, normalisation and statistical testing in R/Bioconductor.

#
# 11/7/2007
# importing files.
#
files1 <- dir("2007-07-11/30-30", pattern="csv", full=T)
x30.30 <- import.relex.experiment(files1)
names(x30.30) <- substring(names(x30.30), 4,9)

# pdf.A4("2007-07-11/30-30/plots.pdf")
# for(i in 1:l(x10.10)) plot.relex.experiment(x10.10[[i]], ylim=c(-2,2))
# dev.off()
pdf("2007-07-11/30-30/plots.pdf", width=11.69, height=5)
par(mar=c(5,4,4,3))
for(i in 1:l(x30.30)) {
    plot.relex.experiment(x30.30[[i]], ylim=c(-2,2))
    mtext(side=4, names(x30.30)[i], cex=1, line=1)
}
dev.off()

files2 <- dir("2007-07-11/10-30", pattern="csv", full=T)
x10.30 <- import.relex.experiment(files2)
names(x10.30) <- sub(" RelEx-OutputMC", "", names(x10.30))
pdf("2007-07-11/10-30/plots.pdf", width=11.69, height=5)
par(mar=c(5,4,4,3))
for(i in 1:l(x10.30)) {
    plot.relex.experiment(x10.30[[i]], ylim=c(-6,6))
    mtext(side=4, names(x10.30)[i], cex=1, line=1)
}
dev.off()

files3 <- dir("2007-07-11/10-30 adjusted", pattern="csv", full=T)
x10.30.adj <- import.relex.experiment(files3)
names(x10.30.adj) <- sub(" RelEx-OutputMC", "", names(x10.30.adj))
pdf("2007-07-11/10-30 adjusted/plots.pdf", width=11.69, height=5)
par(mar=c(5,4,4,3))
for(i in 1:l(x10.30.adj)) {
    plot.relex.experiment(x10.30.adj[[i]], ylim=c(-6,6))
    mtext(side=4, names(x10.30.adj)[i], cex=1, line=1)
}
dev.off()
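
The import-and-plot loops above call `l()`, which is not defined in this listing. The sketch below assumes it is a shorthand for `length()` and uses a small made-up list to show the name clean-up and the loop. Both `l` and the toy list are assumptions for illustration, not the thesis code.

```r
# Assumption: l() is a shorthand for length(); its definition is not shown in the thesis.
l <- function(x) length(x)

# Toy stand-in for the list returned by import.relex.experiment() (hypothetical values).
experiments <- list("A1_1 RelEx-OutputMC" = 1, "A1_2 RelEx-OutputMC" = 2)

# Strip the RelEx suffix so each element is named after its run, as done for x10.30.
names(experiments) <- sub(" RelEx-OutputMC", "", names(experiments))

# seq_along() is a safer loop index than 1:l(x), which yields c(1, 0) for an empty list.
for (i in seq_along(experiments)) {
    cat("experiment", i, "of", l(experiments), ":", names(experiments)[i], "\n")
}
```

Using `seq_along()` rather than `1:l(x)` only matters when a directory yields no files, but it avoids a confusing error inside the plotting loop in that case.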

#
# how often was each protein observed?
#
pdf("2007-07-11/30-30.protein.frequency.pdf", width=10, height=4)
hist.int(x30.30.norm$protein.counts,
    main="Frequency of Protein occurrence in multiple experiments",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
dev.off()
pdf("2007-07-11/10-30.protein.frequency.pdf", width=10, height=4)
hist.int(x10.30.norm$protein.counts,
    main="Frequency of Protein occurrence in multiple experiments",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
dev.off()
pdf("2007-07-11/10-30.adj.protein.frequency.pdf", width=10, height=4)
hist.int(x10.30.adj.norm$protein.counts,
    main="Frequency of Protein occurrence in multiple experiments",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
dev.off()
pdf.A4.portrait("2008-07-03/protein.frequency.pdf")
par(mfrow=c(3,1), las=1)
hist.int(x30.30.norm$protein.counts,
    main="Protein detection frequency: 30-30",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
hist.int(x10.30.norm$protein.counts,
    main="Protein detection frequency: 10-30",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
hist.int(x10.30.adj.norm$protein.counts,
    main="Protein detection frequency: 10-30 adj",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
dev.off()
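
The histograms above plot `protein.counts`, whose construction is not shown in this listing. One plausible reading, sketched here purely as an assumption, is the number of experiments in which each protein has a non-missing ratio. The matrix below is made-up toy data.

```r
# Toy log2-ratio matrix: rows are proteins, columns are experiments (made-up values).
ratio <- matrix(c(0.5,  NA,  0.7,
                   NA,  NA, -1.2,
                  1.1, 0.9,  1.0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("protA", "protB", "protC"),
                                c("A1_1", "A1_2", "A2_1")))

# Assumed definition: count the experiments in which each protein was quantified.
protein.counts <- rowSums(!is.na(ratio))
print(protein.counts)

# Tabulate how many proteins were seen in exactly 1, 2, ... experiments.
print(table(protein.counts))
```

A histogram of these counts shows how many proteins were quantified in only a few runs, which is the information the frequency plots convey.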

#
# cross-experiment boxplots.
#
x10.30.unnorm <- combine.relex.experiments(x10.30, normalised=F)
x10.30.norm <- combine.relex.experiments(x10.30, normalised=T)

pdf.A4.portrait("2007-07-11/10-30.boxplots.pdf")
par(mfrow=c(4,1), mar=c(5,4,4,3)+0.1)
boxplot(x10.30.unnorm$ratio, main="Ratio", ylab="ratio (log2)", varwidth=T)
boxplot(x10.30.unnorm$ratioSD, main="STDDEV of Ratio", ylab="STDDEV of ratio (log2)", varwidth=T)
# mtext(side=4, "normalised", line=1, outer=T)
boxplot(x10.30.unnorm$sn, main="Signal/Noise", ylab="S/N (log10)", varwidth=T)
boxplot(x10.30.unnorm$snSD, main="STDDEV of Signal/Noise", ylab="STDDEV of Signal/Noise (log10)", varwidth=T)
mtext(side=4, "unnormalised", line=1, outer=T)

# par(mfrow=c(4,1), mar=c(5,4,4,3)+0.1)
boxplot(x10.30.norm$ratio, main="Ratio", ylab="ratio (log2)", varwidth=T)
boxplot(x10.30.norm$ratioSD, main="STDDEV of Ratio", ylab="STDDEV of ratio (log2)", varwidth=T)
# mtext(side=4, "normalised", line=1, outer=T)
boxplot(x10.30.norm$sn, main="Signal/Noise", ylab="S/N (log10)", varwidth=T)
boxplot(x10.30.norm$snSD, main="STDDEV of Signal/Noise", ylab="STDDEV of Signal/Noise (log10)", varwidth=T)
mtext(side=4, "normalised", line=1, outer=T)
dev.off()

x10.30.adj.unnorm <- combine.relex.experiments(x10.30.adj, normalised=F)
x10.30.adj.norm <- combine.relex.experiments(x10.30.adj, normalised=T)
pdf.A4.portrait("2007-07-11/10-30.adj.boxplots.pdf")
par(mfcol=c(4,2), mar=c(5,4,4,3)+0.1)
mtext(side=3, "unnormalised", line=1, outer=T)
boxplot(x10.30.adj.unnorm$ratio, main="Ratio", ylab="ratio (log2)", varwidth=T)
boxplot(x10.30.adj.unnorm$ratioSD, main="STDDEV of Ratio", ylab="STDDEV of ratio (log2)", varwidth=T)
# mtext(side=4, "normalised", line=1, outer=T)
boxplot(x10.30.adj.unnorm$sn, main="Signal/Noise", ylab="S/N (log10)", varwidth=T)
boxplot(x10.30.adj.unnorm$snSD, main="STDDEV of Signal/Noise", ylab="STDDEV of Signal/Noise (log10)", varwidth=T)

# par(mfrow=c(4,1), mar=c(5,4,4,3)+0.1)
mtext(side=3, "normalised", line=1, outer=T)
boxplot(x10.30.adj.norm$ratio, main="Ratio", ylab="ratio (log2)", varwidth=T)
boxplot(x10.30.adj.norm$ratioSD, main="STDDEV of Ratio", ylab="STDDEV of ratio (log2)", varwidth=T)
# mtext(side=4, "normalised", line=1, outer=T)
boxplot(x10.30.adj.norm$sn, main="Signal/Noise", ylab="S/N (log10)", varwidth=T)
boxplot(x10.30.adj.norm$snSD, main="STDDEV of Signal/Noise", ylab="STDDEV of Signal/Noise (log10)", varwidth=T)
dev.off()

x30.30.unnorm <- combine.relex.experiments(x30.30, normalised=F)
x30.30.norm <- combine.relex.experiments(x30.30, normalised=T)
pdf.A4.portrait("2007-07-11/30-30.boxplots.pdf")
par(las=2)
par(mfcol=c(4,2), mar=c(5,4,4,3)+0.1)
mtext(side=3, "unnormalised", line=1, outer=T)
boxplot(x30.30.unnorm$ratio, main="Ratio", ylab="ratio (log2)", varwidth=T)
boxplot(x30.30.unnorm$ratioSD, main="STDDEV of Ratio", ylab="STDDEV of ratio (log2)", varwidth=T)
# mtext(side=4, "normalised", line=1, outer=T)
boxplot(x30.30.unnorm$sn, main="Signal/Noise", ylab="S/N (log10)", varwidth=T)
boxplot(x30.30.unnorm$snSD, main="STDDEV of Signal/Noise", ylab="STDDEV of Signal/Noise (log10)", varwidth=T)

# par(mfrow=c(4,1), mar=c(5,4,4,3)+0.1)
mtext(side=3, "normalised", line=1, outer=T)
boxplot(x30.30.norm$ratio, main="Ratio", ylab="ratio (log2)", varwidth=T)
boxplot(x30.30.norm$ratioSD, main="STDDEV of Ratio", ylab="STDDEV of ratio (log2)", varwidth=T)
# mtext(side=4, "normalised", line=1, outer=T)
boxplot(x30.30.norm$sn, main="Signal/Noise", ylab="S/N (log10)", varwidth=T)
boxplot(x30.30.norm$snSD, main="STDDEV of Signal/Noise", ylab="STDDEV of Signal/Noise (log10)", varwidth=T)
dev.off()

#
# ratio boxplots AND density plots
# (changed to CairoP.. on 15/12/07)
#
CairoPNG("2007-07-11/30-30.ratios.png", width=1200, height=700)
# CairoPDF.A4("2007-07-11/30-30.ratios.pdf")
par(mfrow=c(1,2))
par(las=2)
boxplot(x30.30.norm$ratio, main="intra-experiment normalised ratio", ylab="ratio (log2)",
    varwidth=T, ylim=symmetricise(range(x30.30.norm$ratio, na.rm=T)), col=1:8)
par(las=0)
plot.density.matrix(x30.30.norm$ratio, auto.log2=F, main="intra-experiment normalised ratio",
    xlab="ratio (log2)", xlim=symmetricise(range(x30.30.norm$ratio, na.rm=T)))
legend("topright", names(x30.30), lty=1, col=1:l(x30.30), bty="n")
dev.off()

CairoPNG("2007-07-11/10-30.ratios.png", width=1200, height=700)
# CairoPDF.A4("2007-07-11/10-30.ratios.pdf")
par(mfrow=c(1,2))
par(las=2)
boxplot(x10.30.norm$ratio, main="intra-experiment normalised ratio", ylab="ratio (log2)",
    varwidth=T, ylim=symmetricise(range(x10.30.norm$ratio, na.rm=T)), col=1:8)
par(las=0)
plot.density.matrix(x10.30.norm$ratio, auto.log2=F, main="intra-experiment normalised ratio",
    xlab="ratio (log2)", xlim=symmetricise(range(x10.30.norm$ratio, na.rm=T)))
legend("topright", names(x10.30), lty=1, col=1:l(x10.30), bty="n")
dev.off()

CairoPNG("2007-07-11/10-30.adj.ratios.png", width=1200, height=700)
# CairoPDF.A4("2007-07-11/10-30.adj.ratios.pdf")
par(mfrow=c(1,2))
boxplot(x10.30.adj.norm$ratio, main="intra-experiment normalised ratio", ylab="ratio (log2)",
    varwidth=T, ylim=symmetricise(range(x10.30.adj.norm$ratio, na.rm=T)))
plot.density.matrix(x10.30.adj.norm$ratio, auto.log2=F, main="intra-experiment normalised ratio",
    xlab="ratio (log2)", xlim=symmetricise(range(x10.30.adj.norm$ratio, na.rm=T)))
legend("topright", names(x10.30.adj), lty=1, col=1:l(x10.30.adj), bty="n")
dev.off()
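
The axis limits in the ratio plots above are passed through `symmetricise()`, which this listing does not define. A minimal sketch of what such a helper might do, assuming its purpose is to centre a log2 axis on zero so that up- and down-regulation are drawn on the same scale, is given below. The function body and the toy ratios are assumptions.

```r
# Assumption: symmetricise() widens a range so that it is centred on zero.
symmetricise <- function(r) {
    m <- max(abs(r), na.rm = TRUE)
    c(-m, m)
}

ratios <- c(-1.2, 0.4, NA, 2.5)   # made-up log2 ratios
lims <- symmetricise(range(ratios, na.rm = TRUE))
print(lims)
```

With limits of this form, a log2 ratio of +1 and one of -1 sit at equal distances from the centre of the boxplot and density axes.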

# 15/12/07
# UNNORMALISED
#
# ratio boxplots AND density plots
#
CairoPNG("2007-07-11/30-30.unnorm.ratios.png", width=1200, height=700)
# CairoPDF.A4("2007-07-11/30-30.unnorm.ratios.pdf")
par(mfrow=c(1,2))
par(las=2)
boxplot(x30.30.unnorm$ratio, main="intra-experiment unnormalised ratio", ylab="ratio (log2)",
    varwidth=T, ylim=symmetricise(range(x30.30.unnorm$ratio, na.rm=T)), col=1:8)
par(las=0)
plot.density.matrix(x30.30.unnorm$ratio, auto.log2=F, main="intra-experiment unnormalised ratio",
    xlab="ratio (log2)", xlim=symmetricise(range(x30.30.unnorm$ratio, na.rm=T)))
legend("topright", names(x30.30), lty=1, col=1:l(x30.30), bty="n")
dev.off()

CairoPNG("2007-07-11/10-30.unnorm.ratios.png", width=1200, height=700)
# CairoPDF.A4("2007-07-11/10-30.unnorm.ratios.pdf")
par(mfrow=c(1,2))
par(las=2)
boxplot(x10.30.unnorm$ratio, main="intra-experiment unnormalised ratio", ylab="ratio (log2)",
    varwidth=T, ylim=symmetricise(range(x10.30.unnorm$ratio, na.rm=T)), col=1:8)
par(las=0)
plot.density.matrix(x10.30.unnorm$ratio, auto.log2=F, main="intra-experiment unnormalised ratio",
    xlab="ratio (log2)", xlim=symmetricise(range(x10.30.unnorm$ratio, na.rm=T)))
legend("topright", names(x10.30), lty=1, col=1:l(x10.30), bty="n")
dev.off()

CairoPNG("2007-07-11/10-30.adj.unnorm.ratios.png", width=1200, height=700)
# CairoPDF.A4("2007-07-11/10-30.adj.unnorm.ratios.pdf")
par(mfrow=c(1,2))
boxplot(x10.30.adj.unnorm$ratio, main="intra-experiment unnormalised ratio", ylab="ratio (log2)",
    varwidth=T, ylim=symmetricise(range(x10.30.adj.unnorm$ratio, na.rm=T)), col=1:l(x10.30.adj))
plot.density.matrix(x10.30.adj.unnorm$ratio, auto.log2=F, main="intra-experiment unnormalised ratio",
    xlab="ratio (log2)", xlim=symmetricise(range(x10.30.adj.unnorm$ratio, na.rm=T)))
legend("topright", names(x10.30.adj), lty=1, col=1:l(x10.30.adj), bty="n")
dev.off()

#
#
# merge the 10.30 (experiments A-D) and 10.30.adj (experiments E-F) data into one large experiment.
# then do the DE stats
#
x10.30.merged <- merge.relex.experiments(x10.30.norm, x10.30.adj.norm)

pdf("2007-07-11/10-30.merged.protein.frequency.pdf", width=10, height=4)
hist.int(x10.30.merged$protein.counts,
    main="Frequency of Protein occurrence in BOTH 10-30 and 10-30 adjusted experiments",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
dev.off()
pdf.A4.portrait("2007-07-11/protein.frequency.pdf")
par(mfrow=c(4,1))
hist.int(x30.30.norm$protein.counts,
    main="Frequency of Protein occurrence in multiple experiments: 30-30",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
hist.int(x10.30.norm$protein.counts,
    main="Frequency of Protein occurrence in multiple experiments: 10-30",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
hist.int(x10.30.adj.norm$protein.counts,
    main="Frequency of Protein occurrence in multiple experiments: 10-30 adjusted",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
hist.int(x10.30.merged$protein.counts,
    main="Frequency of Protein occurrence in BOTH 10-30 and 10-30 adjusted experiments",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
dev.off()

# added later on to just show the 30-30 and the 10-30 merged.
pdf.A4.portrait("2008-07-03/protein.frequency-V2.pdf")
par(mfrow=c(2,1), las=1)
hist.int(x30.30.norm$protein.counts,
    main="Protein detection frequency: 30-30",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
#hist.int(x10.30.norm$protein.counts, main="Protein detection frequency: 10-30", xlab="Number of observations/experiments", ylab="Number of Proteins")
#hist.int(x10.30.adj.norm$protein.counts, main="Protein detection frequency: 10-30 adj", xlab="Number of observations/experiments", ylab="Number of Proteins")
hist.int(x10.30.merged$protein.counts,
    main="Protein detection frequency: 10-30 and 10-30 adj (merged)",
    xlab="Number of observations/experiments", ylab="Number of Proteins")
dev.off()

#
#
# DE stats
#
# build the design matrix up into something that represents the true
# biological design.
#
# limma user guide treats replicates as bioreps by default
# - technical replicates are often grouped into the biorep block that they hail from
#   eg there would be 6 blocks
# -- work out how to use the duplicateCorrelation function, and blocking variables.
#
# - can we have multiple nested levels of blocks?
#
library(limma)
targets <- read.csv("2007-12-02/10-30.merged.design.matrix.2007-12-02.csv", as.is=T)
colnames(targets)[c(4,5)]
# [1] "N14" "N15"
colnames(targets)[c(4,5)] <- c("Cy3", "Cy5")
targets
#    exp.no Name                    File   Cy3   Cy5 buffer biorep techrep run
# 1       1 A1_1 A1_1 RelEx-OutputMC.csv deg10 deg30  water      A       1   1
# 2       2 A1_2 A1_2 RelEx-OutputMC.csv deg10 deg30  water      A       1   2
# 3       3 A2_1 A2_1 RelEx-OutputMC.csv deg10 deg30  water      A       2   1
# 4       4 A2_2 A2_2 RelEx-OutputMC.csv deg10 deg30  water      A       2   2
# 5       5 B1_1 B1_1 RelEx-OutputMC.csv deg30 deg10  water      B       1   1
# 6       6 B1_2 B1_2 RelEx-OutputMC.csv deg30 deg10  water      B       1   2
# 7       7 B2_1 B2_1 RelEx-OutputMC.csv deg30 deg10  water      B       2   1
# 8       8 B2_2 B2_2 RelEx-OutputMC.csv deg30 deg10  water      B       2   2
# 9       9 C1_1 C1_1 RelEx-OutputMC.csv deg10 deg30   urea      C       1   1
# 10     10 C1_2 C1_2 RelEx-OutputMC.csv deg10 deg30   urea      C       1   2
# 11     11 C2_1 C2_1 RelEx-OutputMC.csv deg10 deg30   urea      C       2   1
# 12     12 C2_2 C2_2 RelEx-OutputMC.csv deg10 deg30   urea      C       2   2
# 13     13 D1_1 D1_1 RelEx-OutputMC.csv deg30 deg10   urea      D       1   1
# 14     14 D1_2 D1_2 RelEx-OutputMC.csv deg30 deg10   urea      D       1   2
# 15     15 D2_1 D2_1 RelEx-OutputMC.csv deg30 deg10   urea      D       2   1
# 16     16 D2_2 D2_2 RelEx-OutputMC.csv deg30 deg10   urea      D       2   2
# 17     17  E_1  E_1 RelEx-OutputMC.csv deg10 deg30  water      E       1   1
# 18     18  E_2  E_2 RelEx-OutputMC.csv deg10 deg30  water      E       1   2
# 19     19  F_1  F_1 RelEx-OutputMC.csv deg30 deg10  water      F       1   1
# 20     20  F_2  F_2 RelEx-OutputMC.csv deg30 deg10  water      F       1   2

# design <- modelMatrix(targets, ref="deg10")
# design <- cbind(DyeEffect=1, deg10vs30=design[,1])
# design <- cbind(dye=1, deg30vs10=modelMatrix(targets, ref="deg10")[,1], buffer=factor(targets$buffer, levels=c("water", "urea")), biorep=factor(targets$biorep))
design3 <- cbind(dye=1,
    deg30vs10=modelMatrix(targets, ref="deg10")[,1],
    buffer=factor(targets$buffer, levels=c("water", "urea")))
rownames(design3) <- targets$Name
design3
#      dye deg30vs10 buffer
# A1_1   1         1      1
# A1_2   1         1      1
# A2_1   1         1      1
# A2_2   1         1      1
# B1_1   1        -1      1
# B1_2   1        -1      1
# B2_1   1        -1      1
# B2_2   1        -1      1
# C1_1   1         1      2
# C1_2   1         1      2
# C2_1   1         1      2
# C2_2   1         1      2
# D1_1   1        -1      2
# D1_2   1        -1      2
# D2_1   1        -1      2
# D2_2   1        -1      2
# E_1    1         1      1
# E_2    1         1      1
# F_1    1        -1      1
# F_2    1        -1      1
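
The printed `design3` can be read directly from the `targets` table: `deg30vs10` is +1 when the deg30 sample is in the second channel (`Cy5`, originally N15) and -1 for the label-swapped runs, while `buffer` appears as 1 for water and 2 for urea. The sketch below rebuilds the same coding in base R for four representative runs, without limma. It is an illustration of the coding only, not the `modelMatrix()` implementation, and the reduced `targets` table is an assumption.

```r
# Four representative runs taken from the targets table above (one per biorep A-D).
targets <- data.frame(
    Name   = c("A1_1", "B1_1", "C1_1", "D1_1"),
    Cy3    = c("deg10", "deg30", "deg10", "deg30"),
    Cy5    = c("deg30", "deg10", "deg30", "deg10"),
    buffer = c("water", "water", "urea", "urea"),
    stringsAsFactors = FALSE)

# +1 when deg30 is in the second channel, -1 when the labels are swapped.
deg30vs10 <- ifelse(targets$Cy5 == "deg30", 1, -1)

# Integer codes of the buffer factor: water = 1, urea = 2.
buffer <- as.integer(factor(targets$buffer, levels = c("water", "urea")))

design <- cbind(dye = 1, deg30vs10 = deg30vs10, buffer = buffer)
rownames(design) <- targets$Name
print(design)

# The three columns are linearly independent, so all three effects are estimable.
print(qr(design)$rank)
```

Because `buffer` is coded 1/2 rather than 0/1, the constant `dye` column absorbs the offset; the `buffer` coefficient is still the urea-minus-water difference.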

# # what's the dupcor b/w the technical replicates? # duplicateCorrelation(M, block=factor(targets$biorep))$consensus.cor # [1] 0.628 # # there are technical replicates within each biorep too # duplicateCorrelation(M[,1:4+0], block=factor(targets$techrep[1:4+0]))$consensus.cor duplicateCorrelation(M[,1:4+4], block=factor(targets$techrep[1:4+4]))$consensus.cor duplicateCorrelation(M[,1:4+8], block=factor(targets$techrep[1:4+8]))$consensus.cor duplicateCorrelation(M[,9:10], block=factor(targets$techrep[9:10]))$consensus.cor duplicateCorrelation(M[,11:12], block=factor(targets$techrep[11:12]))$consensus.cor # > duplicateCorrelation(M[,1:4+0], block=factor(targets$techrep[1:4+0]))$consensus.cor # [1] -0.0582 # > duplicateCorrelation(M[,1:4+4], block=factor(targets$techrep[1:4+4]))$consensus.cor # [1] 0.138 # > duplicateCorrelation(M[,1:4+8], block=factor(targets$techrep[1:4+8]))$consensus.cor # [1] 0.0338 # > duplicateCorrelation(M[,9:10], block=factor(targets$techrep[9:10]))$consensus.cor # [1] NaN # > duplicateCorrelation(M[,11:12], block=factor(targets$techrep[11:12]))$consensus.cor # [1] NaN # # why is there such low correlation between the technical replicates, within each biorep? # # This implies that using a blocking variable on the experiment (A-F) leaves essentially # independent data within each block? # fit3 <- lmFit(M, design3, block=factor(targets$biorep), correlation=duplicateCorrelation(M, block=factor(targets$biorep))$consensus.cor) fit3e <- eBayes(fit3) tt3 <- topTableQ(fit3e, coef="deg30vs10", nrow(M)) head(tt3) # ID logFC t P.Value adj.P.Val B q # 817 2718_Aspartate-semial 1.323 8.15 5.64e-08 1.96e-07 8.53 2.97e-05 # 179 0588_bacterioferritin 2.092 8.87 2.08e-07 7.19e-07 7.47 5.46e-05 # 637 1951_outer_membrane_c 0.993 6.60 1.15e-06 3.97e-06 5.51 2.02e-04


# 409 1319_glyceraldehyde-3 -1.025 -6.33 2.17e-06  7.47e-06 4.88 2.41e-04
# 247 0801_two_component_tr  0.769  6.24 2.67e-06  9.15e-06 4.68 2.41e-04
# 953 3101_OmpAORMotB        1.244  6.22 2.76e-06  9.41e-06 4.65 2.41e-04
summarise.tt(tt3)
#   P<0.05 P<0.001 P<1e-04 q<0.25 q<0.1 q<0.05
# 1    172      27      12    289    87     64
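#
# A minimal sketch of how a single "consensus" correlation can be formed from
# per-protein within-block correlations: average them on the Fisher (atanh)
# scale with some trimming, then transform back. This follows the description
# of duplicateCorrelation in the limma documentation as far as we understand it;
# the trim fraction of 0.15 and the per-protein values below are assumptions
# made for illustration and are NOT the thesis data.
#

```r
# Hypothetical per-protein intra-block correlations (illustrative only).
rho <- c(0.71, 0.55, 0.80, -0.10, 0.62, 0.95, 0.40, 0.66)

# Fisher z-transform, trimmed mean, then back-transform to the correlation scale.
consensus <- function(r, trim = 0.15) {
  z <- atanh(r[is.finite(r)])
  tanh(mean(z, trim = trim))
}

consensus.cor <- consensus(rho)
print(consensus.cor)
```

# A value near zero, as seen within individual bioreps above, would indicate that
# the replicates sharing a block are close to independent.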

#
# did the buffer have an effect on the ratios??
#
topTableQ(fit3e, coef="buffer")
#                        ID  logFC     t  P.Value adj.P.Val     B     q
# 514 1577_short-chain_dehy  2.364  5.05 0.000463   0.00110 -2.84 0.314
# 755 2274_hypothetical_pro -1.205 -2.79 0.014986   0.03513 -3.25 0.995
# 575 1756_hypothetical_pro -1.173 -2.92 0.013622   0.03199 -3.30 0.995
# 302 1040_2OG-Fe(II)_oxyge -2.089 -3.39 0.006636   0.01565 -3.45 0.995
# 464 1442_ribosomal_protei  1.153  2.78 0.020735   0.04812 -3.51 0.995
# 954 3103_conserved_hypoth  2.080  3.63 0.005276   0.01247 -3.54 0.995
# 323 1104_acyl-CoA_dehydro  1.802  3.06 0.015106   0.03534 -3.62 0.995
# 112 0388_hypothetical_pro -1.207 -2.35 0.034730   0.07966 -3.64 0.995
# 836 2782_phenylalanyl-tRN -3.632 -3.82 0.004797   0.01136 -3.65 0.995
# 380 1272_Host_factor_Hfq  -0.751 -2.30 0.041397   0.09439 -3.75 0.995
#
# No, not after multiple testing correction....
#
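#
# A minimal sketch of why a small raw P-value need not survive multiple testing
# correction. It uses base R's p.adjust with the Benjamini-Hochberg method on
# simulated null P-values; the number of tests (1000) is an assumption, and only
# the single value 0.000463 is taken from the buffer table above. The q column
# in the script comes from topTableQ, which is not reproduced here.
#

```r
set.seed(1)
m <- 1000                    # hypothetical number of proteins tested
p <- runif(m)                # simulated null P-values (NOT thesis data)
p[1] <- 0.000463             # smallest raw P-value shown for the buffer coefficient

bh <- p.adjust(p, method = "BH")

n.raw <- sum(p < 0.05)       # tests that look significant before correction
n.bh  <- sum(bh < 0.05)      # tests that remain significant after BH correction
print(c(raw = n.raw, BH = n.bh))
print(bh[1])                 # adjusted value for the 0.000463 test
```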

#
# simple models, up to more complicated ones...
#

# Ignore the buffer, and biological replicates -- Just use the "dye-swap" info
tt.ver1 <- topTableQ(eBayes(lmFit(M, modelMatrix(targets, ref="deg10"))), coef="deg30", number=nrow(M))
# add an intercept term (for the effect of the "dye-swap")
tt.ver2 <- topTableQ(eBayes(lmFit(M, cbind(dye=1, modelMatrix(targets, ref="deg10")))), coef="deg30", number=nrow(M))
# add a factor for the buffer treatment -- water vs urea
tt.ver3 <- topTableQ(eBayes(lmFit(M, cbind(dye=1, modelMatrix(targets, ref="deg10"), buffer=factor(targets$buffer, levels=c("water", "urea"))))), coef="deg30", number=nrow(M))
# add biorep as a factor (it should probably be a blocking variable)
tt.ver4 <- topTableQ(eBayes(lmFit(M, cbind(dye=1, modelMatrix(targets, ref="deg10"), buffer=factor(targets$buffer, levels=c("water", "urea")), biorep=factor(targets$biorep)))), coef="deg30", number=nrow(M))
# use biorep as a blocking variable -- work out duplicateCorrelation using this blocking variable, and pass this into lmFit
design5 <- cbind(dye=1, deg30vs10=modelMatrix(targets, ref="deg10")[,1], buffer=factor(targets$buffer, levels=c("water", "urea")))
rownames(design5) <- targets$Name
fit5 <- lmFit(M, design5, block=factor(targets$biorep), correlation=duplicateCorrelation(M, block=factor(targets$biorep))$consensus.cor)
rownames(fit5$coefficients) <- rownames(M)
tt.ver5 <- topTableQ(eBayes(fit5), coef="deg30vs10", number=nrow(M))
# 15/12/07 -- fit the full model but without the dye swap term
# all models that have a dye term introduce 337 NA's in the topTable.


# simpler to just remove this dye swap term, since:
# what's a dye swap term mean in proteomics? I guess there would be an N14 bias over N15 given the N14 in the air.
#
design6 <- cbind(deg30vs10=modelMatrix(targets, ref="deg10")[,1], buffer=factor(targets$buffer, levels=c("water", "urea")))
rownames(design6) <- targets$Name
fit6 <- lmFit(M, design6, block=factor(targets$biorep), correlation=duplicateCorrelation(M, block=factor(targets$biorep))$consensus.cor)
tt.ver6 <- topTableQ(eBayes(fit6), coef="deg30vs10", number=nrow(M))
rownames(fit6$coefficients) <- rownames(M)
t(sapply(list(tt.ver1, tt.ver2, tt.ver3, tt.ver4, tt.ver5, tt.ver6), function(x) c(nrow=nrow(x), summarise.tt(x))))
#      nrow P<0.05 P<0.001 P<1e-04 q<0.25 q<0.1 q<0.05
# [1,] 1172    383     123      69    660   416    302
# [2,] 1172    352     123      69    638   432    349
# [3,] 1172    374     129      69    690   488    388
# [4,] 1172    382     132      76    809   570    448
# [5,] 1172    172      27      12    289    87     64
# [6,] 1172    214      33      13    265    90     45
t(sapply(list(tt.ver1, tt.ver2, tt.ver3, tt.ver4, tt.ver5, tt.ver6), function(x) c(nrow=nrow(x), numNA=count.na(x$adj.P.Val), summarise.tt(x))))
#      nrow numNA P<0.05 P<0.001 P<1e-04 q<0.25 q<0.1 q<0.05
# [1,] 1172     0    383     123      69    660   416    302
# [2,] 1172   337    352     123      69    638   432    349
# [3,] 1172   337    374     129      69    690   488    388
# [4,] 1172   337    382     132      76    809   570    448
# [5,] 1172   337    172      27      12    289    87     64
# [6,] 1172     0    214      33      13    265    90     45
#
# so what has happened here?
# num DE proteins (q<0.05) changed from:
# 1to2: 302->349 when we included a 'dye intercept' term
# 2to3: 349->388 when we included a term for the buffer
# 3to4: 388->448 when we included a factor for the biological replicate
# 4to5: 448->64 when we accounted for the correlation within technical replicates
# 5to6: 64->45 when we removed the dye term (since this introduces NA's into toptables...)
#
# Conclusion: number of DE proteins is too high when we don't account for
# correlation between technical replicates.
#
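#
# A minimal simulation sketch of the conclusion above: when technical replicates
# within a biological replicate are correlated, treating every run as independent
# inflates the number of "significant" results even when nothing is changing.
# It uses plain one-sample t-tests in base R rather than limma's
# duplicateCorrelation/lmFit fit, and the design (3 bioreps x 4 techreps) and
# variance components are assumptions chosen for illustration only.
#

```r
set.seed(42)
n.sim <- 2000; n.bio <- 3; n.tech <- 4
sd.bio <- 1; sd.tech <- 0.3          # assumed between-biorep and technical SDs

one.sim <- function() {
  bio   <- rnorm(n.bio, mean = 0, sd = sd.bio)       # true log-ratio is 0 (no DE)
  y     <- rep(bio, each = n.tech) + rnorm(n.bio * n.tech, sd = sd.tech)
  block <- rep(seq_len(n.bio), each = n.tech)
  c(naive = t.test(y)$p.value,                       # all 12 runs treated as independent
    means = t.test(as.numeric(tapply(y, block, mean)))$p.value)  # one value per biorep
}

pvals   <- replicate(n.sim, one.sim())
fp.rate <- rowMeans(pvals < 0.05)    # false positive rate at P < 0.05 for each approach
print(fp.rate)
```

# The "means" approach is cruder than estimating the correlation and passing it to
# the fit, but it shows the same direction of effect as going from model 4 to model 5.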

# # What's the deal with the NA's when there's an intercept? # numobs <- ncol(M) - rapply(M, count.na) table(numobs) # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 # 218 123 78 91 42 51 51 32 37 38 23 29 32 29 28 34 44 55 72 65 idx <- which(is.na(tt.ver5$adj.P.Val)) names(idx) <- tt.ver5$ID[idx] # The P/q-vals are NA because logFC is NA... tt.ver5[idx[1:5],] # ID logFC t P.Value adj.P.Val B q # 4 0014_glutathione_S-tr NA NA NA NA NA NA # 6 0016_glutathione_S-tr NA NA NA NA NA NA # 8 0024_protein_of_unkno NA NA NA NA NA NA


# 12 0034_transcriptional_ NA NA NA NA NA NA # 13 0036_5-methyltetrahyd NA NA NA NA NA NA # How many observations were there for these cases where logFC = NA? table(ncol(M) - rapply(is.na(M[names(idx),]), sum)) # 1 2 3 4 5 7 # 218 80 16 12 8 3 # # The vast majority of these NA's have only 1 observation. # ... but what about those with 7 observations? # M[names(which(ncol(M) - rapply(is.na(M[names(idx),]), sum)==7)),] # A1_1 A1_2 A2_1 A2_2 B1_1 B1_2 B2_1 B2_2 C1_1 C1_2 C2_1 C2_2 D1_1 # 0591_phosphoribosylfo NA NA NA NA 0.592 0.98 1.1049 0.56 NA NA NA NA NA # 1775_Ribonucleoside-d NA NA NA NA -0.327 -0.99 -0.0891 NA NA NA NA NA NA # 2008_phosphoribosyltr -0.524 -0.676 -0.258 NA NA NA NA NA NA NA -0.312 -0.807 NA # D1_2 D2_1 D2_2 E_1 E_2 F_1 F_2 # 0591_phosphoribosylfo 0.831 0.2463 NA NA NA NA -0.0589 # 1775_Ribonucleoside-d NA -0.0946 -0.221 NA NA -0.218 -0.0517 # 2008_phosphoribosyltr NA NA NA -0.736 -0.493 NA NA cbind(fit5$coefficients, numobs)[names(idx)[1:10],] # dye deg30vs10 buffer numobs # 0014_glutathione_S-tr 0.737 NA NA 1 # 0016_glutathione_S-tr 0.206 NA 0.133 4 # 0024_protein_of_unkno 1.640 NA NA 1 # 0034_transcriptional_ -0.758 NA NA 1 # 0036_5-methyltetrahyd 0.470 NA -0.285 3 <<<<< must be from different buffers # 0075_Twin-arginine_tr -0.309 NA NA 1 # 0126_peptide_methioni -0.564 NA NA 2 <<<<< must be from the same buffer # 0147_Rhodanese-like -0.637 NA NA 1 # 0160_aminoglycoside_p 0.521 NA NA 1 # 0162_PASORPAC_sensor_ -0.052 NA NA 2 cbind(fit6$coefficients, numobs)[names(idx)[1:10],] # deg30vs10 buffer numobs # 0014_glutathione_S-tr 0.737 NA 1 # 0016_glutathione_S-tr -0.206 0.133 4 # 0024_protein_of_unkno -1.640 NA 1 # 0034_transcriptional_ 0.758 NA 1 # 0036_5-methyltetrahyd -0.470 -0.285 3 # 0075_Twin-arginine_tr 0.309 NA 1 # 0126_peptide_methioni -0.564 NA 2 # 0147_Rhodanese-like 0.637 NA 1 # 0160_aminoglycoside_p 0.521 NA 1 # 0162_PASORPAC_sensor_ -0.052 NA 2

#
# so, if the data was only observed in one of the "dye swaps", the protein ratio
# goes into the intercept column, rather than the deg30vs10, where I think it
# should belong. Because dye=rep(1,...), these ratios don't have the correct sign.
#
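#
# A minimal base-R sketch of the behaviour described above, using lm.fit on one
# hypothetical protein (the values are made up). When the protein is observed in
# only one labelling orientation, the dye (intercept) and deg30vs10 columns are
# collinear over the observed runs, so the second coefficient is not estimable
# and comes back as NA. Dropping the dye column gives an estimate with the
# orientation taken into account.
#

```r
# Design for four hypothetical runs: two in each labelling orientation.
X <- cbind(dye = 1, deg30vs10 = c(1, 1, -1, -1))

# Hypothetical protein observed only in the swapped (-1) orientation.
y   <- c(NA, NA, 0.6, 0.9)
obs <- !is.na(y)

with.dye    <- lm.fit(X[obs, , drop = FALSE], y[obs])$coefficients
without.dye <- lm.fit(X[obs, "deg30vs10", drop = FALSE], y[obs])$coefficients

print(with.dye)      # the dye column absorbs the mean ratio; deg30vs10 is NA
print(without.dye)   # the ratio is assigned to deg30vs10, with the sign flipped
```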

#
# We can either:
# - Manually move these ratios across from the dye column to the ratio column, then fit eBayes; or
# - Use the model without the "dye swap" term.


# In the first case, the values from fit6 have the correct sign, whereas the values for # fit5$coefficients$dye are not adjusted for the dye swap... # fit7 <- fit5 fit7$coefficients[names(idx),2] <- fit6$coefficients[names(idx),1] fit7$coefficients[names(idx),1] <- NA fit7$coefficients[names(idx)[1:10],] tt.ver7 <- topTableQ(eBayes(fit7), coef="deg30vs10", number=nrow(M)) t(sapply(list(tt.ver1, tt.ver2, tt.ver3, tt.ver4, tt.ver5, tt.ver6, tt.ver7), function(x) c(nrow=nrow(x), numNA=count.na(x$adj.P.Val), summarise.tt(x)))) # nrow numNA P<0.05 P<0.001 P<1e-04 q<0.25 q<0.1 q<0.05 # [1,] 1172 0 383 123 69 660 416 302 # [2,] 1172 337 352 123 69 638 432 349 # [3,] 1172 337 374 129 69 690 488 388 # [4,] 1172 337 382 132 76 809 570 448 # [5,] 1172 337 172 27 12 289 87 64 # [6,] 1172 0 214 33 13 265 90 45 # [7,] 1172 337 172 27 12 289 87 64 # # same number DE... should I be updating the stdev slot? # rownames(fit5$stdev.unscaled) <- rownames(M) rownames(fit6$stdev.unscaled) <- rownames(M) fit7 <- fit5 fit7$coefficients[names(idx),2] <- fit6$coefficients[names(idx),1] fit7$coefficients[names(idx),1] <- NA fit7$coefficients[names(idx)[1:10],] fit7$stdev.unscaled[names(idx),2] <- fit6$stdev.unscaled[names(idx),1] fit7$stdev.unscaled[names(idx),1] <- NA tt.ver7 <- topTableQ(eBayes(fit7), coef="deg30vs10", number=nrow(M)) t(sapply(list(tt.ver1, tt.ver2, tt.ver3, tt.ver4, tt.ver5, tt.ver6, tt.ver7), function(x) c(nrow=nrow(x), numNA=count.na(x$adj.P.Val), summarise.tt(x)))) # nrow numNA P<0.05 P<0.001 P<1e-04 q<0.25 q<0.1 q<0.05 # [1,] 1172 0 383 123 69 660 416 302 # [2,] 1172 337 352 123 69 638 432 349 # [3,] 1172 337 374 129 69 690 488 388 # [4,] 1172 337 382 132 76 809 570 448 # [5,] 1172 337 172 27 12 289 87 64 # [6,] 1172 0 214 33 13 265 90 45 # [7,] 1172 0 211 30 12 260 84 47 # # just 2 more DE proteins... This is hardly worth the effort that will be required # to explain it in the manuscript! #

############################################################################
# Since we are ultimately concerned with DE proteins b/w the temperatures, #
# and not the dye effects, then we should fit the model that is compatible #
# with the sparse nature of the data.... that is, model6 with no dye term  #
############################################################################

#
# reproduce the master table using tt.ver6
# (see below)
#
# have a look at the expression values
#


tmp <- t(rapply(x10.30.merged$ratio, function(x) x * as.numeric(modelMatrix(targets, ref="deg10")))) # # make a master expression ratio table that includes the statistics, using the latest # and greatest model. # tmp <- cbind(tmp, mean=rowMeans(tmp, na.rm=T), stdev=rowSD(tmp), count=20-rowSums(is.na(tmp))) tmp <- rownames2col(tmp, 1, "ID") master <- merge(tt.ver5, tmp, by="ID") master <- master[order(master$B, decreasing=T), ] head(master) # ID logFC t P.Value adj.P.Val B A1_1 A1_2 # 971 2718_Aspartate-semial 1.323 8.15 5.64e-08 4.71e-05 8.53 NA 1.110 # 217 0588_bacterioferritin 2.092 8.87 2.08e-07 8.68e-05 7.47 2.892 3.220 # 757 1951_outer_membrane_c 0.993 6.60 1.15e-06 3.21e-04 5.51 1.197 1.062 # 486 1319_glyceraldehyde-3 -1.025 -6.33 2.17e-06 3.83e-04 4.88 -0.460 -0.296 # 299 0801_two_component_tr 0.769 6.24 2.67e-06 3.83e-04 4.68 0.623 0.768 # 1136 3101_OmpAORMotB 1.244 6.22 2.76e-06 3.83e-04 4.65 1.857 1.890 # A2_1 A2_2 B1_1 B1_2 B2_1 B2_2 C1_1 C1_2 C2_1 C2_2 D1_1 # 971 1.659 1.832 1.640 1.63 1.249 1.223 0.5932 0.327 0.999 0.938 0.925 # 217 2.746 2.670 1.663 1.75 1.361 1.591 NA NA 1.497 1.974 1.091 # 757 1.221 1.357 1.214 1.15 0.854 0.835 0.7930 0.947 1.251 1.287 0.553 # 486 -0.914 -0.929 -1.125 -0.94 -1.564 -0.896 -0.4547 -0.665 -0.882 -0.286 -1.544 # 299 0.756 0.649 0.717 1.07 0.971 0.502 0.5592 0.633 0.860 0.959 0.712 # 1136 1.883 1.917 1.754 1.74 1.624 1.484 0.0947 1.218 1.481 1.402 1.555 # D1_2 D2_1 D2_2 E_1 E_2 F_1 F_2 mean stdev count # 971 1.01 1.188 1.108 1.472 1.423 1.764 1.908 1.26 0.423 19 # 217 1.40 NA NA NA NA NA 2.907 2.06 0.722 13 # 757 1.22 1.202 1.221 0.314 0.126 1.466 1.238 1.03 0.352 20 # 486 -1.23 -0.830 -1.359 -1.201 -1.200 -1.289 -1.553 -0.98 0.401 20 # 299 0.79 1.082 0.718 1.097 1.037 0.422 0.477 0.77 0.210 20 # 1136 1.14 0.744 1.043 1.094 1.145 0.488 0.604 1.31 0.518 20 write.csv(master, "2007-12-02/10-vs-30-master.csv", na="") # # -> 2007-12-02/10-vs-30-master.xls # num.obs.threshold <- 1 idx <- x10.30.merged$protein.counts >= 
num.obs.threshold tt.1 <- topTable(eBayes(lmFit(M[idx,], design, block=factor(targets$biorep), correlation=duplicateCorrelation(M[idx,], block=factor(targets$biorep))$consensus.cor)), coef=2, number=l(idx)) num.obs.threshold <- 10 # tt.10 <- topTable( eBayes( lmFit(M[x10.30.merged$protein.counts >= num.obs.threshold,], design) ), coef=1, n=sum(x10.30.merged$protein.counts >= num.obs.threshold) ) idx <- x10.30.merged$protein.counts >= num.obs.threshold tt.10 <- topTable(eBayes(lmFit(M[idx,], design, block=factor(targets$biorep), correlation=duplicateCorrelation(M[idx,], block=factor(targets$biorep))$consensus.cor)), coef=2, number=l(idx)) num.obs.threshold <- 16 # tt.16 <- topTable( eBayes( lmFit(M[x10.30.merged$protein.counts >= num.obs.threshold,], design) ), coef=1, n=sum(x10.30.merged$protein.counts >= num.obs.threshold) ) idx <- x10.30.merged$protein.counts >= num.obs.threshold


tt.16 <- topTable(eBayes(lmFit(M[idx,], design, block=factor(targets$biorep), correlation=duplicateCorrelation(M[idx,], block=factor(targets$biorep))$consensus.cor)), coef=2, number=l(idx)) num.obs.threshold <- 20 # tt.20 <- topTable( eBayes( lmFit(M[x10.30.merged$protein.counts >= num.obs.threshold,], design) ), coef=1, n=sum(x10.30.merged$protein.counts >= num.obs.threshold) ) idx <- x10.30.merged$protein.counts >= num.obs.threshold tt.20 <- topTable(eBayes(lmFit(M[idx,], design, block=factor(targets$biorep), correlation=duplicateCorrelation(M[idx,], block=factor(targets$biorep))$consensus.cor)), coef=2, number=l(idx)) write.csv(tt.1, "2007-12-02/DEproteins.observed.atleast.1.csv") write.csv(tt.10, "2007-12-02/DEproteins.observed.atleast.10.csv") write.csv(tt.16, "2007-12-02/DEproteins.observed.atleast.16.csv") write.csv(tt.20, "2007-12-02/DEproteins.observed.atleast.20.csv")

save.image("lily.RData", compress=T)

#
# todo, start making figures/tables for the paper.
#

#
## the data in 10-vs-30-master.xls file is missing statistics for those genes
## that weren't present in both extraction conditions.
##
#
## A number of solutions were proposed; we are going with:
## - fit the full model for all genes
## - fit a model without the extraction buffer term, and use this output for those genes
##   that were not present in both conditions.
##
#
# ACTUALLY, these NA statistics turned out to be due to the dye term.
# - much simpler solution:
#   fit a model without the dye term, and simply produce a toptable, and master table.
#
design <- cbind(deg30vs10=modelMatrix(targets, ref="deg10")[,1], buffer=factor(targets$buffer, levels=c("water", "urea")))
rownames(design) <- targets$Name
tt.ver6 <- topTable(eBayes(lmFit(M, design, block=factor(targets$biorep), correlation=duplicateCorrelation(M, block=factor(targets$biorep))$consensus.cor)), coef=1, number=nrow(M))

# # have a look at the expression values # tmp <- t(rapply(x10.30.merged$ratio, function(x) x * as.numeric(modelMatrix(targets, ref="deg10")))) # # make a master expression ratio table that includes the statistics, using the latest # and greatest model. # tmp <- cbind(tmp, mean=rowMeans(tmp, na.rm=T), stdev=rowSD(tmp), count=20-rowSums(is.na(tmp))) tmp <- rownames2col(tmp, 1, "ID")


master <- merge(tt.ver6, tmp, by="ID") master <- master[order(master$B, decreasing=T), ] count.na(master$logFC) # [1] 0 head(master) # ID logFC t P.Value adj.P.Val B A1_1 A1_2 # 971 2718_Aspartate-semial 1.323 8.15 5.64e-08 4.71e-05 8.53 NA 1.110 # 217 0588_bacterioferritin 2.092 8.87 2.08e-07 8.68e-05 7.47 2.892 3.220 # 757 1951_outer_membrane_c 0.993 6.60 1.15e-06 3.21e-04 5.51 1.197 1.062 # 486 1319_glyceraldehyde-3 -1.025 -6.33 2.17e-06 3.83e-04 4.88 -0.460 -0.296 # 299 0801_two_component_tr 0.769 6.24 2.67e-06 3.83e-04 4.68 0.623 0.768 # 1136 3101_OmpAORMotB 1.244 6.22 2.76e-06 3.83e-04 4.65 1.857 1.890 # A2_1 A2_2 B1_1 B1_2 B2_1 B2_2 C1_1 C1_2 C2_1 C2_2 D1_1 # 971 1.659 1.832 1.640 1.63 1.249 1.223 0.5932 0.327 0.999 0.938 0.925 # 217 2.746 2.670 1.663 1.75 1.361 1.591 NA NA 1.497 1.974 1.091 # 757 1.221 1.357 1.214 1.15 0.854 0.835 0.7930 0.947 1.251 1.287 0.553 # 486 -0.914 -0.929 -1.125 -0.94 -1.564 -0.896 -0.4547 -0.665 -0.882 -0.286 -1.544 # 299 0.756 0.649 0.717 1.07 0.971 0.502 0.5592 0.633 0.860 0.959 0.712 # 1136 1.883 1.917 1.754 1.74 1.624 1.484 0.0947 1.218 1.481 1.402 1.555 # D1_2 D2_1 D2_2 E_1 E_2 F_1 F_2 mean stdev count # 971 1.01 1.188 1.108 1.472 1.423 1.764 1.908 1.26 0.423 19 # 217 1.40 NA NA NA NA NA 2.907 2.06 0.722 13 # 757 1.22 1.202 1.221 0.314 0.126 1.466 1.238 1.03 0.352 20 # 486 -1.23 -0.830 -1.359 -1.201 -1.200 -1.289 -1.553 -0.98 0.401 20 # 299 0.79 1.082 0.718 1.097 1.037 0.422 0.477 0.77 0.210 20 # 1136 1.14 0.744 1.043 1.094 1.145 0.488 0.604 1.31 0.518 20 write.csv(master, "2008-07-03/10-vs-30-master.csv", na="") # # -> 2007-12-02/10-vs-30-master.xls # num.obs.threshold <- 1 idx <- x10.30.merged$protein.counts >= num.obs.threshold tt.1 <- topTableQ(eBayes(lmFit(M[idx,], design, block=factor(targets$biorep), correlation=duplicateCorrelation(M[idx,], block=factor(targets$biorep))$consensus.cor)), coef=1, number=l(idx)) num.obs.threshold <- 10 # tt.10 <- topTable( eBayes( lmFit(M[x10.30.merged$protein.counts >= 
num.obs.threshold,], design) ), coef=1, n=sum(x10.30.merged$protein.counts >= num.obs.threshold) ) idx <- x10.30.merged$protein.counts >= num.obs.threshold tt.10 <- topTableQ(eBayes(lmFit(M[idx,], design, block=factor(targets$biorep), correlation=duplicateCorrelation(M[idx,],

block=factor(targets$biorep))$consensus.cor)), coef=1, number=l(idx)) num.obs.threshold <- 16 # tt.16 <- topTable( eBayes( lmFit(M[x10.30.merged$protein.counts >= num.obs.threshold,], design) ), coef=1, n=sum(x10.30.merged$protein.counts >= num.obs.threshold) ) idx <- x10.30.merged$protein.counts >= num.obs.threshold tt.16 <- topTableQ(eBayes(lmFit(M[idx,], design,


block=factor(targets$biorep), correlation=duplicateCorrelation(M[idx,],

block=factor(targets$biorep))$consensus.cor)), coef=1, number=l(idx)) num.obs.threshold <- 20 # tt.20 <- topTable( eBayes( lmFit(M[x10.30.merged$protein.counts >= num.obs.threshold,], design) ), coef=1, n=sum(x10.30.merged$protein.counts >= num.obs.threshold) ) idx <- x10.30.merged$protein.counts >= num.obs.threshold tt.20 <- topTableQ(eBayes(lmFit(M[idx,], design, block=factor(targets$biorep), correlation=duplicateCorrelation(M[idx,],

block=factor(targets$biorep))$consensus.cor)), coef=1, number=l(idx)) tt.1 <- merge(tt.1, info[,c("ID", "Description")], by="ID", all.x=T, all.y=F) tt.1 <- tt.1[order(tt.1$q, decreasing=F),c("ID", "Description", "logFC", "t", "P.Value", "adj.P.Val", "B", "q")] tt.10 <- merge(tt.10, info[,c("ID", "Description")], by="ID", all.x=T, all.y=F) tt.10 <- tt.10[order(tt.10$q, decreasing=F),c("ID", "Description", "logFC", "t", "P.Value", "adj.P.Val", "B", "q")] tt.16 <- merge(tt.16, info[,c("ID", "Description")], by="ID", all.x=T, all.y=F) tt.16 <- tt.16[order(tt.16$q, decreasing=F),c("ID", "Description", "logFC", "t", "P.Value", "adj.P.Val", "B", "q")] tt.20 <- merge(tt.20, info[,c("ID", "Description")], by="ID", all.x=T, all.y=F) tt.20 <- tt.20[order(tt.20$q, decreasing=F),c("ID", "Description", "logFC", "t", "P.Value", "adj.P.Val", "B", "q")] export.topTable(tt.1, "2008-07-03/DEproteins.observed.atleast.1.xls") export.topTable(tt.10, "2008-07-03/DEproteins.observed.atleast.10.xls") export.topTable(tt.16, "2008-07-03/DEproteins.observed.atleast.16.xls") export.topTable(tt.20, "2008-07-03/DEproteins.observed.atleast.20.xls") t(sapply(list(tt.1, tt.10, tt.16, tt.20), summarise.tt)) # P<0.05 P<0.001 P<1e-04 q<0.25 q<0.1 q<0.05 # [1,] 214 33 13 265 90 45 # [2,] 125 26 13 203 112 65 # [3,] 81 19 10 152 84 54 # [4,] 22 7 4 45 26 20
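#
# A minimal sketch of the filtering step used above: count the non-missing
# observations per protein and keep only proteins seen at least k times. The
# matrix below is hypothetical (5 proteins x 6 runs); in the script the counts
# come from x10.30.merged$protein.counts and the thresholds are 1, 10, 16 and 20.
#

```r
# Hypothetical log2-ratio matrix with missing values (NOT thesis data).
M.toy <- matrix(c( 0.5,  0.7,   NA,  0.6,  0.4,  0.8,
                    NA,   NA,   NA,  1.1,   NA,   NA,
                  -0.3, -0.2, -0.4,   NA, -0.5, -0.1,
                   0.1,   NA,   NA,   NA,  0.2,   NA,
                   1.2,  1.0,  1.3,  1.1,  0.9,  1.4),
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("prot", 1:5), paste0("run", 1:6)))

n.obs      <- rowSums(!is.na(M.toy))     # observations per protein
thresholds <- c(1, 3, 5, 6)              # illustrative thresholds for the toy data
n.kept     <- sapply(thresholds, function(k) sum(n.obs >= k))
names(n.kept) <- paste0(">=", thresholds)
print(n.kept)

M.filtered <- M.toy[n.obs >= 5, , drop = FALSE]   # rows that would go into the fit
```

# Raising the threshold trades coverage (fewer proteins tested) for more
# observations per test, which is the pattern seen in the table above.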

save.image("lily.RData", compress=T) # 15/12/07

#
# The P-values withOUT the dye term (v6) are marginally smaller (more significant) than with it (v5);
# However, due to the smaller number of tests with the dye term (v5 -- due to the NA's), the q-values
# are smaller in v5 than in v6
#
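#
# A minimal sketch isolating the effect of the number of tests on FDR-adjusted
# values. The same simulated P-values are adjusted once over 835 tests
# (1172 minus the 337 NA's) and once as if 1172 tests had been performed, via the
# n argument of base R's p.adjust. The P-values are simulated, and in the real
# comparison the extra 337 proteins contribute their own P-values, so this only
# illustrates the direction of the effect.
#

```r
set.seed(7)
p <- c(rbeta(50, 0.3, 8), runif(785))              # 835 simulated P-values (NOT thesis data)

q.fewer <- p.adjust(p, method = "BH")              # adjusted over 835 tests
q.more  <- p.adjust(p, method = "BH", n = 1172)    # same P-values, 1172 tests assumed

print(c(fewer.tests = sum(q.fewer < 0.05), more.tests = sum(q.more < 0.05)))
```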

#
# They are generally quite well correlated at the low end, and the P-values do lie on unity
#
CairoPNG("2008-07-03/xyplots-models5vs6.png", 1200, 800)
par(mfrow=c(1,2), las=1, mgp=c(4,1,0), mar=c(par()$mar+1))
plot(tt.ver5$P.Value, tt.ver6$P.Value, log="xy", main="unadjusted P-values", xlab="Model with dye term (P-value)", ylab="Model without dye term (P-value)")
abline(0,1, lty="dashed")


abline(v=0.05, h=0.05, col=6)
plot(tt.ver5$adj.P.Val, tt.ver6$adj.P.Val, log="xy", main="FDR q-values", xlab="Model with dye term (q-value)", ylab="Model without dye term (q-value)")
abline(0,1, lty="dashed")
abline(v=0.05, h=0.05, col=6)
dev.off()

#
# are there many with P < 0.05 in both lists?
# __ there's a pretty good overlap there.
#
print.venn(
    tt.ver5$ID[tt.ver5$adj.P.Val<0.05 & !is.na(tt.ver5$adj.P.Val)],
    tt.ver6$ID[tt.ver6$adj.P.Val<0.05 & !is.na(tt.ver5$adj.P.Val)] # yes, i mean not NA in ver5 list
)
# BH q-values (2008-07-03)
#   A ~ B A & B B ~ A A | B
# N    59    31     4    94
# % 62.77 32.98  4.26   100
print.venn(
    tt.ver5$ID[tt.ver5$q<0.05 & !is.na(tt.ver5$q)],
    tt.ver6$ID[tt.ver6$q<0.05 & !is.na(tt.ver5$q)] # yes, i mean not NA in ver5 list
)
# ST qvalues (2008-07-03)
#   A ~ B A & B B ~ A A | B
# N    24    40     5    69
# % 34.78 57.97  7.25   100

# # map the truncated ID to the full protein name # fa <- import.fasta("/home/mark/Work/data/lab/lily/2007-07-11/Salasken_complete_21_2_06_edited.fa", F) save(fa, file="/home/mark/Work/data/lab/lily/Salasken.fa.Rda.gz", compress=T) # load("/home/mark/Work/data/lab/lily/Salasken.fa.Rda.gz") info <- cbind(names(fa), gsub("_.*", "", names(fa)), gsub("^[^_]+_", "", names(fa))) colnames(info) <- c("FullName", "NumericID", "Description") info <- as.df(info) save(info, file="protein.info.Rda.gz", compress=T) # load("protein.info.Rda.gz")

# # try to truncate the FullName to the same length as the truncated ID # then map using == # # pmap <- list() # pmap[[nrow(tt.ver6)]] <- NA # names(pmap) <- tt.ver6$ID # for(i in 1:l(pmap)) { # n <- names(pmap)[i] # tmp <- substr(info$FullName,0,strlen(n)) # pmap[[i]] <- which(tmp == n) # } # # # table(sapply(pmap, l)) # # 0 1


# # 50 1122 # # names(pmap)[sapply(pmap,l)==0] # # [1] "3101_OmpAORMotB" "0973_beta-Ig-H3ORfasc" "0237_OmpAORMotB" "3011_AspartylORAspara" # # [5] "1661_aldoORketo_reduc" "3120_GlyoxalaseORbleo" "1198_OrnORDAPORArg_de" "1753_PpxORGppA_phosph" # # [9] "1521_GluORLeuORPheORV" "1988_OmpAORMotB" "1832_DEADORDEAH_box_h" "1395_glucoseORgalacto" # # [13] "1573_DegTORDnrJOREryC" "1449_HydrophobeORamph" "1838_GlyoxalaseORbleo" "0017_alphaORbeta_hydr" # # [17] "0685_GlyoxalaseORbleo" "0982_GatBORYqey" "1363_aldoORketo_reduc" "1051_DEADORDEAH_box_h" # # [21] "2176_permease_YjgPORY" "0019_GlyoxalaseORbleo" "1769_Sua5ORYciOORYrdC" "1344_cytochrome_bORb6" # # [25] "1947_DEADORDEAH_box_h" "1280_cyclaseORdehydra" "2725_OmpAORMotB" "0687_alphaORbeta_hydr" # # [29] "1973_GluORLeuORPheORV" "2214_HpcHORHpaI_aldol" "2222_alphaORbeta_hydr" "2181_HesBORYadRORYfhF" # # [33] "1300_NADH-ubiquinoneO" "2053_tRNAORrRNA_methy" "0997_beta-Ig-H3ORfasc" "0600_Na+ORH+_antiport" # # [37] "0455_alphaORbeta_hydr" "0807_MotAORTolQORExbB" "1220_GlyoxalaseORbleo" "0162_PASORPAC_sensor_" # # [41] "3127_Heparinase_IIORI" "1836_GlyoxalaseORbleo" "0327_OmpAORMotB" "2146_catalaseORperoxi" # # [45] "1383_NitrilaseORcyani" "0774_alphaORbeta_hydr" "0958_aldoORketo_reduc" "1419_fatty_acidORphos" # # [49] "0331_MotAORTolQORExbB" "0770_nitriteORsulfite"

#
# just pull out the 4 digit numeric ID and match using that!
#
pmap <- list()
pmap[[nrow(tt.ver6)]] <- NA
names(pmap) <- tt.ver6$ID
for(i in 1:l(pmap)) {
    n <- sub("_.*","",names(pmap)[i])
    pmap[[i]] <- which(as.character(info$NumericID) == as.character(n))
}
table(sapply(pmap, l))
#    1
# 1172
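#
# A minimal sketch of why prefix matching failed for some IDs and why matching
# on the numeric prefix works. The two glutathione/membrane names are taken from
# the info table in this script; the full name "3101_OmpA/MotB" is inferred from
# the truncated ID and the "/ -> OR" mapping and should be treated as an
# assumption. Base R's match() stands in for the which() loop above.
#

```r
full.names <- c("0002_glutathione_S-transferase-like",
                "0004_60_kDa_inner_membrane_insertion_protein",
                "3101_OmpA/MotB")                    # inferred, see note above
short.ids  <- c("3101_OmpAORMotB", "0002_glutathione_S-tr")

# Attempt 1: compare each truncated ID with the same-length prefix of the full names.
prefix.hit <- sapply(short.ids, function(id)
  any(substr(full.names, 1, nchar(id)) == id))
print(prefix.hit)       # the "OR" ID finds no match because the full name contains "/"

# Attempt 2: match on the leading numeric ID only.
numeric.id <- function(x) sub("_.*", "", x)
idx <- match(numeric.id(short.ids), numeric.id(full.names))
print(data.frame(ID = short.ids, FullName = full.names[idx]))
```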

#
# excellent. we can now map the truncated ID to the real one.
#
info$ID <- NA
info$ID[unlist(pmap)] <- names(pmap)
# sort and reorder columns
info <- info[order(info$ID), c("ID", "Description", "NumericID", "FullName")]
# truncate FullName to an ID of length 21 chars (for those that we haven't already got an ID for.)
# map the / char to OR
# These chars do not need changing: _ , -
#
info$ID[is.na(info$ID)] <- substr(gsub("/", "OR", info$FullName[is.na(info$ID)]),0,21)
save(info, file="protein.info.Rda.gz", compress=T)
# load("protein.info.Rda.gz")


head(info)
#    ID Description NumericID FullName
# 32 0002_glutathione_S-tr glutathione_S-transferase-like 0002 0002_glutathione_S-transferase-like
# 33 0003_GTP-binding GTP-binding 0003 0003_GTP-binding
# 34 0004_60_kDa_inner_mem 60_kDa_inner_membrane_insertion_protein 0004 0004_60_kDa_inner_membrane_insertion_protein
# 41 0011_3-demethylubiqui 3-demethylubiquinone-9_3-methyltransferase 0011 0011_3-demethylubiquinone-9_3-methyltransferase
# 44 0014_glutathione_S-tr glutathione_S-transferase-like 0014 0014_glutathione_S-transferase-like
# 45 0015_protein_of_unkno protein_of_unknown_function_DUF1428 0015 0015_protein_of_unknown_function_DUF1428
tail(info)
#      ID Description NumericID FullName
# 3198 3179_hypothetical_pro hypothetical_protein 3179 3179_hypothetical_protein
# 3199 3180_UBA/THIF-type_NA UBA/THIF-type_NAD/FAD_binding_fold 3180 3180_UBA/THIF-type_NAD/FAD_binding_fold
# 3203 3184_Post-segregation Post-segregation_antitoxin_CcdA 3184 3184_Post-segregation_antitoxin_CcdA
# 3205 3186_ubiquinone/menaq ubiquinone/menaquinone_biosynthesis_methyltransferases 3186 3186_ubiquinone/menaquinone_biosynthesis_methyltransferases
# 3206 3187_formamidopyrimid formamidopyrimidine-DNA_glycosylase 3187 3187_formamidopyrimidine-DNA_glycosylase
# 3207 3188_ribosomal_protei ribosomal_protein_S20 3188 3188_ribosomal_protein_S20

# # fix 10-vs-30-master.csv # master <- merge(master, info[,c("ID", "Description")], by="ID", all.x=T, all.y=F, sort=F)[,c("ID", "Description", colnames(master)[2:ncol(master)]) ] master <- master[order(master$adj.P.Val),] write.csv(master, "2008-07-03/10-vs-30-master.xls", na="") # # -> 2007-12-02/10-vs-30-master.xls # save.image("lily.RData", compress=T)

#
# here are the expected ratios of spike ins in the 30-30 experiments.
#
#               ratio
# [1,] "11to15" "1:1"
# [2,] "15to24" "1.2:1"
# [3,] "19to21" "1:1"
# [4,] "27to29" "0.8:1"
# [5,] "28to31" "0.8:1"
# [6,] "29to3_" "1:1"
# [7,] "4to10_" "1.2:1"
# [8,] "4to8_8" "0.8:1"
# [9,] "4to8_9" "1.2:1"
ratios <- c(1, 1.2, 1, 0.8, 0.8, 1, 1.2, 0.8, 1.2)
sample.ratio.info <- as.df(cbind(ratio=ratios,


    samples=colnames(x30.30.unnorm$ratio),
    median=prettyNum(2^capply(x30.30.unnorm$ratio, median, na.rm=T)),
    mean=prettyNum(2^capply(x30.30.unnorm$ratio, mean, na.rm=T))))
sample.ratio.info <- sample.ratio.info[order(rownames(sample.ratio.info)),]
write.csv(sample.ratio.info, "2007-12-24/30-30-ratios.csv")
# save as 30-30-ratios.xls
sample.ratio.info
#         ratio samples median  mean
# 0.8:1     0.8  27to29  0.852 0.837
# 0.8:1.1   0.8  28to31  0.788 0.798
# 0.8:1.2   0.8  4to8_8  0.934 0.942
# 1.2:1     1.2  15to24   1.13  1.14
# 1.2:1.1   1.2  4to10_   1.25  1.24
# 1.2:1.2   1.2  4to8_9   1.19  1.19
# 1:1         1  11to15   1.04  1.04
# 1:1.1       1  19to21   1.01  1.01
# 1:1.2       1  29to3_  0.987 0.994
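#
# A minimal sketch that puts the table above on the log2 scale, so that each
# sample's deviation from its expected spike-in ratio can be compared directly.
# The expected ratios and observed medians are copied from the table; the
# variable names are introduced here for illustration.
#

```r
samples  <- c("27to29", "28to31", "4to8_8", "15to24", "4to10_", "4to8_9",
              "11to15", "19to21", "29to3_")
expected <- c(0.8, 0.8, 0.8, 1.2, 1.2, 1.2, 1, 1, 1)
observed.median <- c(0.852, 0.788, 0.934, 1.13, 1.25, 1.19, 1.04, 1.01, 0.987)

# log2(observed / expected): 0 means the median ratio hit the expected value.
log2.bias <- log2(observed.median / expected)
names(log2.bias) <- samples
print(round(log2.bias, 3))

worst <- names(which.max(abs(log2.bias)))   # sample furthest from its expected ratio
print(worst)
```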

# # Do the 30-30 samples cluster together before normalisation? # par(mfrow=c(1,2)) plot(hclust(dist(t(x30.30.unnorm$ratio))), lab=names(ratios), main="30-30 samples: unnormalised ratios", sub="", xlab="") rect.hclust(hclust(dist(t(x30.30.unnorm$ratio))), k=3) plot(hclust(dist(t(x30.30.norm$ratio))), lab=names(ratios), main="30-30 samples: normalised ratios", sub="", xlab="") rect.hclust(hclust(dist(t(x30.30.norm$ratio))), k=3)

CairoPDF("/home/mark/data/lab/lily/2007-12-24/HCL-30-30-unnorm.pdf") plot(hclust(dist(t(x30.30.unnorm$ratio))), lab=names(ratios), main="30-30 samples: unnormalised ratios", sub="", xlab="") plot(hclust(dist(t(x30.30.unnorm$ratio))), main="30-30 samples: unnormalised ratios", sub="", xlab="") dev.off() CairoPDF("/home/mark/data/lab/lily/2007-12-24/HCL-30-30-norm.pdf") plot(hclust(dist(t(x30.30.norm$ratio))), lab=names(ratios), main="30-30 samples: normalised ratios", sub="", xlab="") dev.off() capply(x30.30.norm$ratio, sd, na.rm=T) # 11to15 15to24 19to21 27to29 28to31 29to3_ 4to10_ 4to8_8 4to8_9 # 0.158 0.153 0.167 0.216 0.148 0.204 0.172 0.140 0.189 mean(capply(x30.30.norm$ratio, sd, na.rm=T)) # [1] 0.172
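#
# A minimal sketch of the clustering question asked above, on simulated data:
# nine samples whose log2 ratios are centred on log2 of the expected spike-in
# ratio, with a per-protein SD of 0.17 (close to the mean SD reported above).
# The number of proteins (500) and the amount of missing data are assumptions.
# With an idealised shift like this the three groups separate cleanly; the real
# samples need not.
#

```r
set.seed(3)
ratios.sim <- c(1, 1.2, 1, 0.8, 0.8, 1, 1.2, 0.8, 1.2)   # expected ratios from the script
n.prot <- 500                                              # hypothetical number of proteins

# One column per sample: log2 ratios scattered around log2(expected ratio).
x <- sapply(ratios.sim, function(r) rnorm(n.prot, mean = log2(r), sd = 0.17))
colnames(x) <- paste0("s", seq_along(ratios.sim))
x[sample(length(x), 200)] <- NA          # sprinkle in missing values; dist() tolerates them

hc     <- hclust(dist(t(x)))             # cluster samples, as in the script
groups <- cutree(hc, k = 3)              # same k as rect.hclust(..., k=3)
print(table(cluster = groups, expected = ratios.sim))
```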

# tmp <- seq(-1,1,l=5) # tmp <- c(-1,-0.585,0,0.585,1) tmp <- c(-1,log2(1/1.2),0,log2(1.2),1) tmp2 <- c("1:2", "1:1.2", "1:1", "1.2:1", "2:1") tmp <- c(-1,log2(1/1.2),0,log2(1.2),1) tmp2 <- c("0.5:1", "0.8:1", "1:1", "1.2:1", "2:1") # tmp <- c(-1,log2(1/1.5),log2(1/1.2),0,log2(1.2),log2(1.5), 1) # tmp2 <- c("1:2", "1:1.5", "1:1.2", "1:1", "1.2:1", "1.5:1", "2:1") CairoPDF("/home/mark/data/lab/lily/2008-07-03/densities-30-30-all9samples.pdf", width=10, height=10) par(mfrow=c(3,3), las=1) for(i in c(1,3,6,4,5,8,2,7,9)) {


#plot(density(na.rm(x30.30.unnorm$ratio[,i])), col=match(ratios[i], c(0.8,1,1.2)), main=colnames(x30.30.norm$ratio)[i], xlab="ratio", sub=paste("expected ratio", names(ratios)[i]), ylim=c(0,3.5), xlim=symmetricise(na.rm(x30.30.unnorm$ratio[,i]))) plot(density(na.rm(x30.30.unnorm$ratio[,i])), col=match(ratios[i], c(0.8,1,1.2)), main=colnames(x30.30.norm$ratio)[i], xlab="ratio", sub=paste("expected ratio", names(ratios)[i]), ylim=c(0,3.5), xlim=c(-1.2,1.2), xaxt="n") lines(density(rnorm(mean=mean(x30.30.unnorm$ratio[,i], na.rm=T), sd=sd(x30.30.unnorm$ratio[,i], na.rm=T), n=1e05)), col=match(ratios[i], c(0.8,1,1.2)), lty=3) lines(density(rnorm(mean=log2(ratios[i]), sd=sd(x30.30.unnorm$ratio[,i], na.rm=T), n=1e05)), col=match(ratios[i], c(0.8,1,1.2)), lty=2) axis(side=1, at=tmp, labels=tmp2) if(i==1) legend("topright", c("observed ratios", "~ N(obs, obs)", "~ N(exp, obs)"), lty=c(1,3,2), col=match(ratios[i], c(0.8,1,1.2)), inset=0.01) } dev.off()

CairoPDF.A4("/home/mark/data/lab/lily/2008-07-03/densities-30-30-3groups.pdf")
par(mfrow=c(1,3), mar=c(5,4,5,2)+0.1, las=1)
for(ratio in c(0.8,1,1.2)) {
    cols <- which(ratios==ratio)
    plot.density.matrix(x30.30.unnorm$ratio[,cols], col=match(ratio, c(0.8,1,1.2)), xlab="ratio", main=p("\nexpected ratio ", ratio, ":1"), ylim=c(0,3.5))
    polygon(density(rnorm(mean=mean(x30.30.unnorm$ratio[,cols], na.rm=T), sd=sd(x30.30.unnorm$ratio[,cols], na.rm=T), n=1e06)), col="#11111133", border=NA)
    plot.density.matrix(x30.30.unnorm$ratio[,cols], col=match(ratio, c(0.8,1,1.2)), xlab="ratio", main=p("\nexpected ratio ", ratio, ":1"), add=T)
}
mtext(side=3, "30-30 samples: unnormalised ratios", outer=T, line=-1)
dev.off()

#
# QQ plots -- are these data normally distributed??
#
CairoPNG("2008-07-03/30-30-qqplots.png", 1200, 1200)
par(mfrow=c(3,3), las=1, cex=1.5, mar=c(4,4,3,1)+0.1)
for(i in c(1,3,6,4,5,8,2,7,9)) {
    qqnorm(x30.30.unnorm$ratio[,i], main=colnames(x30.30.unnorm$ratio)[i], cex=0.8)
    qqline(x30.30.unnorm$ratio[,i])
}
dev.off()
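#
# A minimal sketch of what the QQ plots above are comparing, using simulated
# log2 ratios (NOT the thesis data). qqnorm pairs each observed value with the
# normal quantile expected for its rank, and qqline draws the line through the
# first and third quartiles; points falling along that line are consistent with
# a normal distribution.
#

```r
set.seed(11)
y <- rnorm(200, mean = 0, sd = 0.17)     # simulated log2 ratios
y[sample(200, 20)] <- NA                 # missing values are ignored by qqnorm

qq <- qqnorm(y, plot.it = FALSE)         # qq$x: theoretical quantiles, qq$y: data
n  <- sum(!is.na(y))
theoretical <- qnorm(ppoints(n))         # the quantiles qqnorm assigns, in sorted order

# Slope and intercept of the line that qqline draws (through the quartiles).
probs     <- c(0.25, 0.75)
slope     <- diff(quantile(y, probs, na.rm = TRUE)) / diff(qnorm(probs))
intercept <- quantile(y, probs[1], na.rm = TRUE) - slope * qnorm(probs[1])
print(c(slope = unname(slope), intercept = unname(intercept)))
```

# For normally distributed data the slope estimates the SD and the intercept the mean.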

#
# volcano plots
#
CairoPNG.SVGA("2008-07-03/Volcano-Pvals.png")
plot.volcano.tt(tt.ver6, pthresh=0.05, qthresh=NULL, pch=19)
dev.off()

CairoPNG.SVGA("2008-07-03/Volcano-Qvals.png")
plot.volcano.tt(tt.ver6, pthresh=NULL, qthresh=0.05, pch=19)
dev.off()

# # deconvolute the normalisation from the import process. # files1 <- dir("2007-07-11/30-30", pattern="csv", full=T) unnorm30.30 <- import.relex.experiment(files1, normalize=F, verbose=F)


tmp <- normalizeWithin.relex.experiment(unnorm30.30, "none") tmp <- normalizeWithin.relex.experiment(unnorm30.30, "median") tmp <- normalizeWithin.relex.experiment(unnorm30.30, "loess") tmp <- normalizeBetween.relex.experiment(unnorm30.30, "none") tmp <- normalizeBetween.relex.experiment(unnorm30.30, "scale") tmp <- normalizeBetween.relex.experiment(unnorm30.30, "quantile") tmp <- normalizeWithin.relex.experiment(unnorm30.30, "loess") tmp <- normalizeBetween.relex.experiment(tmp, "scale") tmp <- normalize.relex.experiment(unnorm30.30, "loess", "quantile")

#
# are the assumptions behind using the ST q-value method accurate?
#
library(qvalue)
qvalue(tt.1$P.Value)$pi0
qvalue(tt.1$P.Value, lambda=0.4)$pi0
qvalue(tt.1$P.Value, lambda=0.5)$pi0
qvalue(tt.1$P.Value, lambda=0.6)$pi0
# > qvalue(tt.1$P.Value)$pi0
# [1] 0.717
# > qvalue(tt.1$P.Value, lambda=0.4)$pi0
# [1] 0.717
# > qvalue(tt.1$P.Value, lambda=0.5)$pi0
# [1] 0.705
# > qvalue(tt.1$P.Value, lambda=0.6)$pi0
# [1] 0.67
sum(qvalue(tt.1$P.Value)$q<0.05)
sum(qvalue(tt.1$P.Value, lambda=0.4)$q<0.05)
sum(qvalue(tt.1$P.Value, lambda=0.5)$q<0.05)
sum(qvalue(tt.1$P.Value, lambda=0.6)$q<0.05)
# > sum(qvalue(tt.1$P.Value)$q<0.05)
# [1] 45
# > sum(qvalue(tt.1$P.Value, lambda=0.4)$q<0.05)
# [1] 45
# > sum(qvalue(tt.1$P.Value, lambda=0.5)$q<0.05)
# [1] 45
# > sum(qvalue(tt.1$P.Value, lambda=0.6)$q<0.05)
# [1] 50
#
# so the estimate of the proportion of truly null H's remains steady over
# a range of lambda's 0.4-0.6, as does the number of DE proteins with q < 0.05.
#
# From the p-histogram, about 0.4-0.6 looks like a good spot to estimate this
# proportion, however using the automated, cubic spline procedure of ST identifies
# a higher proportion of truly nulls, ie a MORE CONSERVATIVE estimate than if we
# manually specify a lambda...
# We shall use the default value for completeness, and because Lily has been using
# this data for a while now; that is, the justification for changing now is not
# strong enough.
#
q <- qvalue(tt.1$P.Value)
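#
# A minimal base-R sketch of the fixed-lambda estimate of pi0 discussed above:
# P-values above lambda are assumed to come almost entirely from true nulls,
# which are uniform, so pi0 is estimated as #{p > lambda} / (m * (1 - lambda)).
# The P-values are simulated with an assumed 80% of true nulls; this does not
# reproduce the qvalue package's spline smoothing over lambda.
#

```r
set.seed(5)
m0 <- 800; m1 <- 200                       # assumed numbers of null / non-null tests
p  <- c(runif(m0), rbeta(m1, 0.2, 5))      # simulated P-values (NOT thesis data)

# Fixed-lambda estimate of the proportion of true null hypotheses.
pi0.at <- function(p, lambda) mean(p > lambda) / (1 - lambda)

lambdas <- c(0.4, 0.5, 0.6)
pi0.hat <- sapply(lambdas, function(l) pi0.at(p, l))
names(pi0.hat) <- lambdas
print(round(pi0.hat, 3))
```

# Estimates that stay roughly flat across lambda, as in the output above, suggest
# that the choice of lambda is not driving the result.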

CairoPDF("2008-07-03/qvalues.pdf", 9, 9)
plot(q)
dev.off()


CairoPDF("2008-07-03/hist-pvals.pdf")
par(las=1)
hist(tt.1$P.Value, breaks=25, main="raw, unadjusted P-values", xlab="P-value")
abline(h=nrow(tt.1)/25, lty="dashed")
abline(h=nrow(tt.1)/25*q$pi0, lty="dotted")
legend("topright", c("pi0 under H0", "estimated pi0"), lty=c("dashed", "dotted"))
dev.off()

CairoPDF("2008-07-03/pvals-and-qvals.pdf")
par(las=1)
hist(tt.1$P.Value, breaks=25, main="raw, unadjusted P-values", xlab="P-value")
abline(h=nrow(tt.1)/25, lty="dashed")
abline(h=nrow(tt.1)/25*q$pi0, lty="dotted")
legend("topright", c("pi0 under H0", "estimated pi0"), lty=c("dashed", "dotted"))
plot(q)
dev.off()

#
# How many proteins pass various q-value thresholds???
#
qthresh <- c(0.05,0.1,0.15,0.2,0.25)
tmp <- sapply(qthresh, function(x) sum(q$q < x))
names(tmp) <- p("< ", qthresh)
w <- 0.6
CairoPDF("2008-07-03/Proportions-passing-qval-thresholds.pdf")
par(las=1, mgp=c(4,1,0), mar=c(5,5,4,2)+0.1)
barplot(tmp, xlab="FDR (%)", ylab="Number of differentially expressed proteins",
        width=w, space=w*0.9, ylim=c(0,290))
barplot(tmp*qthresh, col="black", add=T, width=w, space=w*0.9)
# grid(col="black")
legend("topleft", c("DE proteins", "Expected FP"), fill=c("grey", "black"), inset=0.02)
text(x=barplot.getx(tmp, w=w, sp=w*0.9), y=tmp, labels=tmp, pos=3)
dev.off()

#
# demonstrate the effect of intra-array normalisation for the 30-30 data
#
CairoPDF.A4("2008-07-03/30-30.unnorm-vs-norm.ratios.pdf")
par(mfrow=c(1,2))
par(las=2)
boxplot(x30.30.unnorm$ratio, main="intra-experiment unnormalised ratio", ylab="ratio (log2)",
        varwidth=T, ylim=symmetricise(range(x30.30.unnorm$ratio, na.rm=T)), col=1:8)
axis.Mvals()
boxplot(x30.30.norm$ratio, main="intra-experiment normalised ratio", ylab="ratio (log2)",
        varwidth=T, ylim=symmetricise(range(x30.30.norm$ratio, na.rm=T)), col=1:8)
dev.off()

#
# on 2008-07-
# are there any DE proteins in the 30-30 data? we would expect not!
#
files1 <- dir("2007-07-11/30-30", pattern="csv", full=T)
x30.30.raw <- import.relex.experiment(files1, normalize=F, verbose=F)
x30.30.mn <- combine.relex.experiments(normalize.relex.experiment(x30.30.raw, "median", "none"),
                                       normalized=T)
x30.30.ln <- combine.relex.experiments(normalize.relex.experiment(x30.30.raw, "loess", "none"),
                                       normalized=T)
x30.30.ms <- combine.relex.experiments(normalize.relex.experiment(x30.30.raw, "loess", "scale"),
                                       normalized=T)
x30.30.nn <- combine.relex.experiments(normalize.relex.experiment(x30.30.raw, "none", "scale"),
                                       normalized=T)

# intra-array norm: median
# inter-array norm: none
# = mn
x30.30.mn.stats <- list()
x30.30.mn.stats$design <- rep(1,9)
x30.30.mn.stats$fit <- lmFit(x30.30.mn$ratio, x30.30.mn.stats$design)
x30.30.mn.stats$fit2 <- eBayes(x30.30.mn.stats$fit)
x30.30.mn.stats$tt <- topTableQ(x30.30.mn.stats$fit2, coef=1,
                                number=length(x30.30.mn.stats$fit2$coefficients))
sum(x30.30.mn.stats$tt$q < 0.05)
# [1] 0

# intra-array norm: loess
# inter-array norm: none
# = ln
x30.30.ln.stats <- list()
x30.30.ln.stats$design <- rep(1,9)
x30.30.ln.stats$fit <- lmFit(x30.30.ln$ratio, x30.30.ln.stats$design)
x30.30.ln.stats$fit2 <- eBayes(x30.30.ln.stats$fit)
x30.30.ln.stats$tt <- topTableQ(x30.30.ln.stats$fit2, coef=1,
                                number=length(x30.30.ln.stats$fit2$coefficients))
sum(x30.30.ln.stats$tt$q < 0.05)
# [1] 0

# intra-array norm: median
# inter-array norm: scale
# = ms
x30.30.ms.stats <- list()
x30.30.ms.stats$design <- rep(1,9)
x30.30.ms.stats$fit <- lmFit(x30.30.ms$ratio, x30.30.ms.stats$design)
x30.30.ms.stats$fit2 <- eBayes(x30.30.ms.stats$fit)
x30.30.ms.stats$tt <- topTableQ(x30.30.ms.stats$fit2, coef=1,
                                number=length(x30.30.ms.stats$fit2$coefficients))
sum(x30.30.ms.stats$tt$q < 0.05)
# [1] 0

# intra-array norm: none
# inter-array norm: none
# = nn
x30.30.nn.stats <- list()
x30.30.nn.stats$design <- rep(1,9)
x30.30.nn.stats$fit <- lmFit(x30.30.nn$ratio, x30.30.nn.stats$design)
x30.30.nn.stats$fit2 <- eBayes(x30.30.nn.stats$fit)
x30.30.nn.stats$tt <- topTableQ(x30.30.nn.stats$fit2, coef=1,
                                number=length(x30.30.nn.stats$fit2$coefficients))
sum(x30.30.nn.stats$tt$q < 0.05)
# [1] 1
#
# NO differentially expressed proteins in any of the normalized 30-30 experiments -- great result.
# 1 DE protein found in the unnormalised data...
#


summary(x30.30.mn.stats$tt$logFC)
summary(x30.30.ln.stats$tt$logFC)
summary(x30.30.nn.stats$tt$logFC)
summary(x30.30.ms.stats$tt$logFC)
# > summary(x30.30.mn.stats$tt$logFC)
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# -0.81500 -0.06440  0.00926  0.01700  0.08830  0.53300
# > summary(x30.30.ln.stats$tt$logFC)
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# -0.82900 -0.06920  0.00353  0.00580  0.07030  0.54000
# > summary(x30.30.nn.stats$tt$logFC)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# -0.5100 -0.0492  0.0323  0.0555  0.1310  1.0600
# > summary(x30.30.ms.stats$tt$logFC)
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
# -0.63700 -0.06750  0.00396  0.00730  0.07970  0.71400

CairoPNG.SVGA("2008-07-15/x30-30.densityplots.png")
par(las=1)
plot(density(x30.30.nn.stats$tt$logFC), col=1, main="Comparison of normalised ratios: 30-30 data",
     xlab="ratio (log2)", xlim=c(-1.1,1.1), ylim=c(0,4))
lines(density(x30.30.mn.stats$tt$logFC), col=2)
lines(density(x30.30.ln.stats$tt$logFC), col=3)
lines(density(x30.30.ms.stats$tt$logFC), col=4)
legend("topright", c("nn", "mn", "ln", "ms"), lty=1, col=1:4, inset=0.01)
dev.off()

CairoPNG.XGA("2008-07-15/x30-30.median-vs-loess.comparison-of-ratios.png")
par(las=1)
plot.cor(x30.30.mn.stats$fit$coefficients, x30.30.ln.stats$fit$coefficients,
         main="Comparison of normalised data: 30-30\n median norm vs loess norm",
         xlab="Average Ratio (log2) - median norm",
         ylab="Average Ratio (log2) - loess norm")
dev.off()
# R^2 = 0.965

CairoPNG.XGA("2008-07-15/x30-30.median.none-vs-scale.comparison-of-ratios.png")
par(las=1)
plot.cor(x30.30.mn.stats$fit$coefficients, x30.30.ms.stats$fit$coefficients,
         main="Comparison of normalised data: 30-30\n median, none norm vs median+scale norm",
         xlab="Average Ratio (log2) - median+none norm",
         ylab="Average Ratio (log2) - median+scale norm")
dev.off()
# R^2 = 0.927

t(sapply(
  list(x30.30.nn.stats$tt, x30.30.mn.stats$tt, x30.30.ln.stats$tt, x30.30.ms.stats$tt),
  summarise.tt))
#      P<0.05 P<0.001 P<1e-04 q<0.25 q<0.1 q<0.05   N
# [1,]     17       3       1      3     2      1 479
# [2,]     52       3       0     48    16      0 479
# [3,]     47       1       0     16     0      0 479
# [4,]     42       0       0      0     0      0 479
#
# So while there are no DE proteins in mn at q < 0.05, there are more DE proteins than nn at q<0.1,
# and more at P<0.05 than you'd expect by chance (479*.05=24)
#
# This is not a clear cut demonstration that unnormalised data -> messy signals that can induce DE
# and normalised signals are cleaner and avoid DE.
# Perhaps normalising, and thus cleaning up the signals introduces some uniformity that allows some
# DE signals to appear.
#
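The "expected by chance" figure quoted above (479*.05=24) can be extended to every P-value threshold in the summary table. The following is a minimal sketch, assuming that all 479 proteins are true nulls with uniformly distributed P-values, so that the expected count at a threshold is N times that threshold. The observed counts are copied from the table above; the row labels nn, mn, ln and ms are assumed from the order of the list passed to sapply.

```r
# Expected number of proteins passing each P-value threshold if every
# protein were a true null (uniform P-values): N * alpha.
n_proteins <- 479
alpha <- c("P<0.05" = 0.05, "P<0.001" = 0.001, "P<1e-04" = 1e-04)
expected <- n_proteins * alpha

# Observed counts copied from the summarise.tt output above; the row order
# (nn, mn, ln, ms) is assumed from the order of the list in the sapply call.
observed <- rbind(nn = c(17, 3, 1),
                  mn = c(52, 3, 0),
                  ln = c(47, 1, 0),
                  ms = c(42, 0, 0))
colnames(observed) <- names(alpha)

# Observed minus expected, threshold by threshold
excess <- sweep(observed, 2, expected, "-")
print(round(expected, 2))
print(round(excess, 2))
```

A positive entry in excess marks a threshold at which more proteins pass than the all-null assumption predicts; a negative entry marks fewer.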

#
# Export the topTables....
#
export.topTable(merge(x30.30.nn.stats$tt, info[,c("ID", "Description")], by="ID",
                      all.x=T, all.y=F, sort=F),
                "2008-07-15/x30-30.topTable.nn.xls")
export.topTable(merge(x30.30.mn.stats$tt, info[,c("ID", "Description")], by="ID",
                      all.x=T, all.y=F, sort=F),
                "2008-07-15/x30-30.topTable.mn.xls")
export.topTable(merge(x30.30.ln.stats$tt, info[,c("ID", "Description")], by="ID",
                      all.x=T, all.y=F, sort=F),
                "2008-07-15/x30-30.topTable.ln.xls")
export.topTable(merge(x30.30.ms.stats$tt, info[,c("ID", "Description")], by="ID",
                      all.x=T, all.y=F, sort=F),
                "2008-07-15/x30-30.topTable.ms.xls")

CairoPNG("2008-07-15/x30-30.Volcano-Pvals.png", 1200, 1200)
par(mfrow=c(2,2))
plot.volcano.tt(x30.30.nn.stats$tt, main="Volcano Plot: 30-30 nn")
plot.volcano.tt(x30.30.mn.stats$tt, main="Volcano Plot: 30-30 mn")
plot.volcano.tt(x30.30.ln.stats$tt, main="Volcano Plot: 30-30 ln")
plot.volcano.tt(x30.30.ms.stats$tt, main="Volcano Plot: 30-30 ms")
dev.off()

CairoPNG("2008-07-15/x30-30.Volcano-Qvals.png", 1200, 1200)
par(mfrow=c(2,2))
plot.volcano.tt(x30.30.nn.stats$tt, main="Volcano Plot: 30-30 nn", pt=NULL, qt=0.05)
plot.volcano.tt(x30.30.mn.stats$tt, main="Volcano Plot: 30-30 mn", pt=NULL, qt=0.05)
plot.volcano.tt(x30.30.ln.stats$tt, main="Volcano Plot: 30-30 ln", pt=NULL, qt=0.05)
plot.volcano.tt(x30.30.ms.stats$tt, main="Volcano Plot: 30-30 ms", pt=NULL, qt=0.05)
dev.off()

# what are the duplicate correlations??
x10.30.AF.stats$dupcor
x10.30.AB.stats$dupcor
x10.30.AD.stats$dupcor
x10.30.EF.stats$dupcor
# > x10.30.AF.stats$dupcor
# [1] 0.628
# > x10.30.AB.stats$dupcor
# [1] 0.583
# > x10.30.AD.stats$dupcor
# [1] 0.435
# > x10.30.EF.stats$dupcor
# [1] 0.952


Appendix C.

Table C.1. 10ºC vs. 30ºC MS identification and quantitation results.

            DTA Select                                              RelEx
Experiment  No.      No. matched  No. NR proteins  No. NR peptides  No. proteins  No. proteins
            spectra  spectra      identified       identified       quantified    after filtering
A1_1        174139   10484        1009             4887             623           320
A1_2        177269   10608        1044             5168             729           401
A2_1        126253   12963        984              6165             760           503
A2_2        121939   13171        969              6207             774           481
B1_1        104313   15111        1024             7780             840           590
B1_2        102754   14628        1003             7516             838           561
B2_1        245500   16733        1242             7550             861           526
B2_2        245173   16547        1251             7678             851           511
C1_1        216769   14537        1047             5045             727           401
C1_2        216378   14560        1069             5156             738           403
C2_1        247590   17791        1326             7064             769           402
C2_2        242480   17857        1316             7095             776           399
D1_1        218228   16190        1244             6366             858           481
D1_2        205593   15140        1288             6052             802           461
D2_1        194832   13353        1129             5559             818           481
D2_2        187540   12683        1130             5428             804           475
E_1         294112   24247        1242             6267             968           645
E_2         291247   23860        1233             6224             992           654
F_1         266625   9262         1173             4843             861           508
F_2         236505   8436         1127             4557             839           495

In experiments A-D, cell pellets were combined based on equal OD. In experiments E-F, proteins were extracted separately and combined 1:1 to give equal protein concentration for 14N and 15N samples. Experiments A, C and E were 10ºC 14N combined with 30ºC 15N; experiments B, D and F were 10ºC 15N combined with 30ºC 14N. NR, non-redundant.


Appendix D.

Table D.1. S. alaskensis proteins identified using metabolic labeling and a GeLC-MS/MS platform. 2135 proteins were confidently identified in all MS experiments and are shown below. Proteins were identified using the SEQUEST search algorithm in Bioworks BioBrowser (v 3.3) with the following search parameters against the S. alaskensis database: monoisotopic precursor and fragment mass type, fully enzymatic trypsin (KR) enzyme with allowance for one missed cleavage, and variable acrylamide, carbamidomethyl and oxidation modifications. For LCQ data files a 1.2 Da peptide tolerance and a 0.6 Da fragment ion tolerance were used, while for LTQ data files a 0.8 Da peptide tolerance and a 0.6 Da fragment ion tolerance were used. The S. alaskensis database contains 3208 proteins. Identifications were filtered using DTA Select based on the following parameters: DeltaCN of at least 0.08, and a minimum XCorr of 2.1 for +1, 2.7 for +2, and 3.2 for +3 charged peptides. The MS/MS data were also interrogated against a decoy database (randomised S. alaskensis), giving a ~1% false positive identification rate in DTA Select. For each identified protein the following is shown: RefSeq accession number, locus tag, protein description, sequence coverage (%), number of peptides identified and the identified peptides.

See CD attached to thesis for table.
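The DTA Select thresholds quoted in the caption of Table D.1 amount to a charge-dependent filter on each peptide match. The following is a minimal sketch of that rule only; the data frame, its column names and the helper passes_filter are hypothetical, and the sketch does not reproduce DTA Select itself.

```r
# Hypothetical peptide-spectrum matches; the column names are assumptions.
psm <- data.frame(
  peptide = c("pepA", "pepB", "pepC", "pepD"),
  charge  = c(1, 2, 2, 3),
  xcorr   = c(2.3, 2.5, 3.1, 3.4),
  deltacn = c(0.10, 0.12, 0.05, 0.09)
)

# Minimum XCorr per charge state and minimum DeltaCN, as stated in the caption
min_xcorr   <- c("1" = 2.1, "2" = 2.7, "3" = 3.2)
min_deltacn <- 0.08

# TRUE when a match meets both thresholds; charge states without a listed
# XCorr threshold are rejected (an assumption of this sketch).
passes_filter <- function(charge, xcorr, deltacn) {
  threshold <- min_xcorr[as.character(charge)]
  unname(!is.na(threshold) & xcorr >= threshold & deltacn >= min_deltacn)
}

psm$keep <- passes_filter(psm$charge, psm$xcorr, psm$deltacn)
print(psm)
```

In this toy input, pepB fails on XCorr for its charge state and pepC fails on DeltaCN, so only pepA and pepD are kept.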


Appendix E.

Table E.1. Quantified proteins from all 10ºC vs 30ºC metabolic labeling experiments (A-F). 1172 proteins were quantified using RelEx (v 0.92) and are shown below. The quantitation parameters used were: 4 scans before; 4 scans after; 0.15 threshold factor; apply Savitzky-Golay filter with 7 points; apply S/N filter at 5; apply regression filter with 0.8 minimum correlation at 1, 0.7 minimum correlation at 10; 99% incorporation of 15N. Only proteins with two or more peptides were considered for quantitation. The following is shown for each quantified protein: locus tag; RefSeq accession number; protein description; averaged 14N/15N fold change ratio (log2 FC); the number of experiments in which the protein was quantified (N). Three statistical tests were performed: the Student's t-test, the Student's t-test accounting for replicate correlation and buffers using linear modeling, and an empirical Bayes moderated t-test. For each of the statistical tests, the T-value, the p-value unadjusted for multiple testing (p-value), the Bonferroni adjusted p-value (p-value(BON)) and the Storey-Tibshirani FDR adjusted q-value (q-value(ST)) (where these exist) are shown for each protein. The experimental averaged 14N/15N ratio (where these exist across experiments A-F), the mean across all experiments, and the standard deviation for each experiment are also shown.

See CD attached to thesis for table.
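The Bonferroni adjusted p-values listed in Table E.1 follow a simple rule: each unadjusted p-value is multiplied by the number of tests and capped at 1. The following is a minimal sketch using p.adjust from base R; the five p-values are hypothetical and are not taken from the table.

```r
# Hypothetical unadjusted p-values for five proteins (not thesis data)
p_raw <- c(0.0001, 0.004, 0.02, 0.3, 0.8)

# Bonferroni adjustment via base R
p_bon <- p.adjust(p_raw, method = "bonferroni")

# The same adjustment written out: multiply by the number of tests, cap at 1
p_manual <- pmin(1, p_raw * length(p_raw))

print(cbind(p_raw, p_bon, p_manual))
```

Because the multiplier is the number of proteins tested, the adjustment becomes very strict for a table of more than a thousand proteins, which is one reason an FDR-based q-value is reported alongside it.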


Appendix F.

Figure F.1. MA plots of peptides and proteins pre- and post-normalisation for artificially skewed data. Unnormalised peptides (left column), proteins (middle column) and normalised proteins (right column) were represented in MA plots, using the log2 14N/15N ratio of abundance (y-axis) and log10 S/N (x-axis), and a non-linear, locally weighted regression line (lowess) was drawn (pink line). Cells were combined in 0.8:1 14N/15N ratios (panels A-C), 1:1 14N/15N ratios (panels D and E), and 1.2:1 14N/15N ratios (panels F-H). Proteins were globally normalised using lowess normalisation. Continued on next page.


Figure F.1. continued.
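The lowess normalisation named in the caption of Figure F.1 can be sketched generically: fit a lowess trend of the log2 ratio against log10 S/N and subtract it, so that the normalised ratios centre on zero. The following is a minimal sketch on simulated data using lowess and approx from base R; the variable names, the simulated bias and the span f = 2/3 are assumptions, and it is not the RelEx-specific normalisation code used in Appendix B.

```r
set.seed(42)

# Simulated MA data (an assumption, not thesis data):
# A = log10 S/N, M = log2 14N/15N ratio with an intensity-dependent bias.
A <- runif(200, min = 0.5, max = 3)
M <- 0.3 - 0.1 * A + rnorm(200, sd = 0.2)

# Fit the lowess trend of M against A (the pink line in an MA plot)
fit <- lowess(A, M, f = 2/3)

# lowess returns the fit sorted by A; interpolate back to the original order
trend <- approx(fit$x, fit$y, xout = A, ties = mean)$y

# Normalised ratios: subtract the trend so the data centre on M = 0
M_norm <- M - trend

print(round(c(mean_before = mean(M), mean_after = mean(M_norm)), 3))
```

Plotting M_norm against A should show the fitted trend lying close to the horizontal line at zero.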


Appendix G.

Figure G.1. MA plots of peptides and proteins pre- and post-normalisation for skewed 10ºC vs. 30ºC data. Unnormalised peptides (left column), proteins (middle column) and normalised proteins (right column) were represented in MA plots. Using the log2 14N/15N ratio of abundance (y-axis) and log10 S/N (x-axis), a non-linear, locally weighted regression line (lowess) was drawn (pink line). Cells were combined: 10ºC 14N with 30ºC 15N (panels A-G) or 10ºC 15N with 30ºC 14N (panels H-N). Proteins were extracted in a Tris PE buffer (panels D-G and H-K) or Urea PE buffer (panels A-C and L-N). Proteins were globally normalised using lowess normalisation. Continued on the next 3 pages.


Figure G.1. continued.






Appendix H.

Figure H.1. MA plots of peptides and proteins pre- and post-normalisation for non-skewed 10ºC vs. 30ºC data. Unnormalised peptides (left column), proteins (middle column) and normalised proteins (right column) were represented in MA plots. Using the log2 14N/15N ratio of abundance (y-axis) and log10 S/N (x-axis), a non-linear, locally weighted regression line (lowess) was drawn (pink line). Protein extracts from 10ºC 14N with 30ºC 15N samples were combined 1:1. Proteins were globally normalised using lowess normalisation.


Appendix I.

Table I.1. Linear modelling design matrix.

Description of experiments                    Indicator variables
Experiment  Name  14N   15N   Buffer  Biorep  10ºC vs 30ºC  buffer
1           A1_1  10ºC  30ºC  Tris    A        1            0
2           A1_2  10ºC  30ºC  Tris    A        1            0
3           A2_1  10ºC  30ºC  Tris    A        1            0
4           A2_2  10ºC  30ºC  Tris    A        1            0
5           B1_1  30ºC  10ºC  Tris    B       -1            0
6           B1_2  30ºC  10ºC  Tris    B       -1            0
7           B2_1  30ºC  10ºC  Tris    B       -1            0
8           B2_2  30ºC  10ºC  Tris    B       -1            0
9           C1_1  10ºC  30ºC  Urea    C        1            1
10          C1_2  10ºC  30ºC  Urea    C        1            1
11          C2_1  10ºC  30ºC  Urea    C        1            1
12          C2_2  10ºC  30ºC  Urea    C        1            1
13          D1_1  30ºC  10ºC  Urea    D       -1            1
14          D1_2  30ºC  10ºC  Urea    D       -1            1
15          D2_1  30ºC  10ºC  Urea    D       -1            1
16          D2_2  30ºC  10ºC  Urea    D       -1            1
17          E_1   10ºC  30ºC  Tris    E        1            0
18          E_2   10ºC  30ºC  Tris    E        1            0
19          F_1   30ºC  10ºC  Tris    F       -1            0
20          F_2   30ºC  10ºC  Tris    F       -1            0

The linear model design matrix accounts for experimental variables including the experimental 14N/15N label swap, where some 10ºC samples are labeled with 14N (+1) while others are labeled with 15N (-1), the Tris (0) or urea (1) protein extraction buffer, and the number of technical replicates in each experiment (two for experiments E-F and four for experiments A-D). The biorep factor groups the experiments into biological replicates, and the technical replicates within each biological replicate are expected to have a higher correlation to each other than between biological replicates.
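The indicator variables in Table I.1 can be built programmatically from the experiment names. The following is a minimal sketch in base R; the object and column names (runs, temp, buffer, design) are illustrative assumptions and are not taken from the thesis scripts.

```r
# Experiment names in the order of Table I.1
runs <- c("A1_1", "A1_2", "A2_1", "A2_2", "B1_1", "B1_2", "B2_1", "B2_2",
          "C1_1", "C1_2", "C2_1", "C2_2", "D1_1", "D1_2", "D2_1", "D2_2",
          "E_1", "E_2", "F_1", "F_2")

# Biological replicate: the leading letter of each experiment name
biorep <- substr(runs, 1, 1)

# +1 when the 10ºC sample carries 14N (A, C, E); -1 for the label swap (B, D, F)
temp <- ifelse(biorep %in% c("A", "C", "E"), 1, -1)

# 1 for the urea extraction buffer (C, D); 0 for Tris
buffer <- as.numeric(biorep %in% c("C", "D"))

design <- cbind(temp = temp, buffer = buffer)
rownames(design) <- runs
print(design)
print(table(biorep))
```

In limma, a grouping such as biorep is typically supplied as the block argument of duplicateCorrelation, which estimates the within-replicate correlation of the kind reported as dupcor in Appendix B.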


Appendix J.

Table J.1. Proteins with unchanged abundance at 10ºC vs. 30ºC. The proteins with unchanged abundance were the 150 proteins with the largest q-values. Accession number refers to the RefSeq accession number for each protein. FC, 14N:15N fold change. FDR, false discovery rate expressed as a q-value.

ACCESSION  LOCUS  PROTEIN  FC  FDR

Lipid transport and metabolism (I)
YP_615736  Sala_0682  AMP-dependent synthetase and ligase  1.0  0.71
YP_618105  Sala_3068  3-hydroxyacyl-CoA dehydrogenase, NAD-binding  1.0  0.70
YP_617999  Sala_2961  Putative acyl-carrier protein  1.0  0.70
YP_617337  Sala_2295  AMP-dependent synthetase and ligase  1.0  0.70
YP_616942  Sala_1897  Malonyl CoA-acyl carrier protein transacylase  1.0  0.70
YP_618093  Sala_3056  3-hydroxyisobutyrate dehydrogenase  0.9  0.69

Energy production and conversion (C)
YP_616407  Sala_1360  Luciferase-like protein  1.0  0.71
YP_616719  Sala_1673  Cytochrome c oxidase, cbb3-type, subunit III  1.0  0.71
YP_615420  Sala_0364  Alcohol dehydrogenase, zinc-binding  1.0  0.71
YP_618118  Sala_3081  Phosphoenolpyruvate carboxylase  1.0  0.71
YP_617335  Sala_2293  NADH:ubiquinone oxidoreductase 17.2 kD subunit  1.0  0.72
YP_618099  Sala_3062  Hypothetical protein  1.0  0.72
YP_618000  Sala_2962  Transketolase, central region  1.0  0.70
YP_617779  Sala_2741  4Fe-4S ferredoxin, iron-sulfur binding  1.0  0.70
YP_616234  Sala_1185  Electron-transferring-flavoprotein dehydrogenase  1.0  0.70
YP_617174  Sala_2132  Alcohol dehydrogenase, zinc-binding  1.0  0.69
YP_616390  Sala_1343  Cytochrome c1  1.0  0.69
YP_617494  Sala_2454  Aldehyde dehydrogenase (NAD+)  1.1  0.67

Carbohydrate transport and metabolism (G)
YP_616656  Sala_1610  Pyruvate, phosphate dikinase  1.0  0.71

Amino acid transport and metabolism (E)
YP_616844  Sala_1799  Shikimate kinase  1.0  0.71
YP_615096  Sala_0037  5-methyltetrahydrofolate-- methyltransferase  1.0  0.71
YP_617401  Sala_2359  Peptidase M19, renal  .0  0.71
YP_615320  Sala_0264  Extracellular -binding receptor  1.0  0.72
YP_616250  Sala_1201  dehydrogenase  1.0  0.72
YP_617213  Sala_2171  Glycine C-acetyltransferase  1.0  0.70
YP_616986  Sala_1941  Succinylarginine dihydrolase  1.0  0.70
YP_616985  Sala_1940  Arginine N-succinyltransferase  1.0  0.70
YP_616517  Sala_1471  Acetolactate synthase, small subunit  1.0  0.70
YP_615821  Sala_0768  Phosphoadenosine phosphosulfate reductase  1.0  0.70
YP_616887  Sala_1842  Threonine dehydratase  1.0  0.70
YP_617026  Sala_1982  Gamma-glutamyltransferase  1.0  0.70
YP_616573  Sala_1527  Sulfate adenylyltransferase, small subunit  1.0  0.70
YP_615723  Sala_0669  Cysteine desulfurases, SufS subfamily  1.0  0.69
YP_615753  Sala_0699  ABC transporter related  1.1  0.69
YP_616916  Sala_1871  Glycine cleavage system T protein  1.1  0.68
YP_615735  Sala_0681  Gamma-glutamyltransferase  1.0  0.68
YP_616445  Sala_1398  Threonine synthase  1.1  0.67
YP_617988  Sala_2950  Diaminobutyrate--2-oxoglutarate aminotransferase  1.0  0.67

Nucleotide transport and metabolism (F)
YP_618141  Sala_3104  --glycine ligase  1.0  0.71
YP_616819  Sala_1773  Ribonucleoside-diphosphate reductase  1.0  0.71
YP_615974  Sala_0923  Amidohydrolase  1.0  0.70
YP_617835  Sala_2797  Adenylate kinase  1.0  0.67

Coenzyme transport and metabolism (H)
YP_617412  Sala_2370  Thiamine biosynthesis protein ThiC  1.0  0.71
YP_618028  Sala_2990  Riboflavin synthase, alpha subunit  1.0  0.72
YP_617923  Sala_2885  Methionine biosynthesis MetW  1.0  0.72
YP_618019  Sala_2981  3,4-dihydroxy-2-butanone 4-phosphate synthase  1.0  0.70
YP_616703  Sala_1657  3,4-dihydroxy-2-butanone 4-phosphate synthase  1.0  0.70
YP_616342  Sala_1295  Biotin--acetyl-CoA-carboxylase ligase  1.1  0.70
YP_615626  Sala_0572  Ubiquinone biosynthesis protein COQ7  0.9  0.69

Inorganic ion transport and metabolism (P)
YP_615873  Sala_0822  Phosphate uptake regulator, PhoU  1.0  0.71
YP_616501  Sala_1455  TonB-dependent receptor, plug  1.0  0.71
YP_616800  Sala_1754  Polyphosphate kinase  1.0  0.70
YP_615877  Sala_0826  Phosphate binding protein, putative  1.0  0.70
YP_615794  Sala_0741  TonB-dependent receptor  1.0  0.70
YP_615654  Sala_0600  Na+/H+ antiporter NhaA  1.0  0.70
YP_617421  Sala_2379  TonB-dependent receptor  0.9  0.69
YP_618204  Sala_3168  TonB-dependent receptor  1.0  0.69
YP_617108  Sala_2066  Ferric uptake regulator, Fur family  1.1  0.67

Secondary metabolites biosynthesis, transport and catabolism (Q)
YP_616460  Sala_1413  Amidohydrolase  1.0  0.72
YP_615554  Sala_0500  DSBA oxidoreductase  0.9  0.68
YP_618107  Sala_3070  Short-chain dehydrogenase/reductase SDR  1.1  0.68

Replication, recombination and repair (L)
YP_616801  Sala_1755  ATPase involved in DNA replication initiation  1.0  0.71
YP_617150  Sala_2108  DNA topoisomerase IV, A subunit  1.0  0.69

Transcription (K)
YP_617675  Sala_2637  Helicase-like protein  1.0  0.70
YP_616138  Sala_1088  Transcriptional regulator, LysR family  1.0  0.70
YP_617986  Sala_2948  Transcriptional regulator, MarR family  0.9  0.69

Translation, ribosomal structure and biogenesis (J)
YP_615405  Sala_0349  Cysteinyl-tRNA synthetase  1.0  0.71
YP_615517  Sala_0463  Lysyl-tRNA synthetase  1.0  0.72
YP_616961  Sala_1916  Translation initiation factor IF-1  1.0  0.71
YP_617764  Sala_2726  16S rRNA processing protein RimM  1.0  0.70
YP_616361  Sala_1314  Translation elongation factor P  1.0  0.70
YP_615839  Sala_0786  RNA-binding S4  1.0  0.70
YP_615746  Sala_0692  Threonyl-tRNA synthetase  1.0  0.70
YP_615305  Sala_0249  Methionyl-tRNA formyltransferase  1.0  0.70
YP_617083  Sala_2041  Ribosomal protein L21  1.0  0.69
YP_617002  Sala_1957  Ribosome recycling factor  1.0  0.69
YP_615892  Sala_0841  G-tRNA(Gln) amidotransferase, A subunit  1.0  0.68
YP_615304  Sala_0248  tRNA pseudouridine synthase A  1.0  0.68
YP_617095  Sala_2053  tRNA/rRNA methyltransferase (SpoU)  0.9  0.68

Cell wall/membrane/envelope biogenesis (M)
YP_617199  Sala_2157  Glycosyl transferase, group 1  1.0  0.71
YP_615766  Sala_0712  Cell wall-associated hydrolase  1.0  0.71
YP_615525  Sala_0471  Secretion protein HlyD  1.0  0.72
YP_618212  Sala_3176  Glycosyl transferase, family 51  1.0  0.70
YP_618014  Sala_2976  Penicillin-binding protein 1A  1.0  0.70
YP_616627  Sala_1581  Acylneuramite cytidylyltransferase  1.0  0.70
YP_615092  Sala_0033  Outer membrane protein  1.0  0.70
YP_616622  Sala_1576  N-acylneuramite-9-phosphate synthase  1.0  0.69
YP_616496  Sala_1450  RND efflux system, outer membrane lipoprotein, NodT  0.9  0.69
YP_617178  Sala_2136  3-beta hydroxysteroid dehydrogenase/isomerase  1.0  0.69
YP_616417  Sala_1370  Nucleotidyl transferase  0.9  0.68
YP_616925  Sala_1880  UDP-N-acetylmuramate--alanine ligase  1.0  0.68

Signal transduction mechanisms (T)
YP_615778  Sala_0724  Transcriptional regulators, TraR/DksA family  1.0  0.71
YP_617091  Sala_2049  HPr kinase  1.0  0.71
YP_617191  Sala_2149  PTSINtr with GAF domain, PtsP  1.0  0.70
YP_616700  Sala_1654  Diguanylate cyclase/  1.0  0.70
YP_615872  Sala_0821  Two component transcriptional regulator, winged helix family  1.0  0.70
YP_615219  Sala_0162  PAS/PAC sensor signal transduction histidine kinase  1.0  0.70
YP_616322  Sala_1275  Two component, sigma54 specific, transcriptional regulator, Fis family  0.9  0.69
YP_617228  Sala_2186  Response regulator receiver protein  1.0  0.69

Intracellular trafficking, secretion and vesicular transport (U)
YP_616491  Sala_1445  SecE subunit of protein translocation complex  1.0  0.71
YP_616969  Sala_1924  Conserved hypothetical protein  1.0  0.72
YP_616759  Sala_1713  SecF protein  1.0  0.70

Defense mechanisms (V)
YP_616225  Sala_1176  ABC transporter related  1.0  0.70

Cell cycle control, cell division, chromosome partitioning (D)
YP_617312  Sala_2270  Chromosome segregation protein SMC  1.0  0.72
YP_617707  Sala_2669  Cell division FtsK/SpoIIIE  1.1  0.69

Posttranslation modifications, protein turnover and chaperones (O)
YP_616776  Sala_1730  SsrA-binding protein  1.0  0.71
YP_617263  Sala_2221  Signal peptide peptidase SppA, 67K type  1.0  0.71
YP_616561  Sala_1515  Thioredoxin-like protein  1.0  0.70
YP_617291  Sala_2249  HflK protein  1.0  0.69
YP_617257  Sala_2215  Band 7 protein  1.0  0.69
YP_615447  Sala_0391  Heat shock protein Hsp20  1.0  0.68
YP_617433  Sala_2391  ATPase AAA-2  1.0  0.68
YP_615075  Sala_0016  Glutathione S-transferase-like protein  0.9  0.68
YP_616857  Sala_1812  20S proteasome, A and B subunits  0.9  0.67
YP_616559  Sala_1513  Cytochrome c biogenesis factor  1.1  0.67

General function prediction only (R)
YP_616105  Sala_1055  Alcohol dehydrogenase, zinc-binding  1.0  0.71
YP_616319  Sala_1272  Host factor Hfq  1.0  0.71
YP_615973  Sala_0922  2-nitropropane dioxygenase, NPD  1.0  0.72
YP_617815  Sala_2777  Molybdopterin binding domain  1.0  0.72
YP_618019  Sala_2981  3,4-dihydroxy-2-butanone 4-phosphate synthase  1.0  0.70
YP_616098  Sala_1048  SapC  1.0  0.70
YP_615707  Sala_0653  Phenazine biosynthesis PhzC/PhzF protein  1.0  0.70
YP_615827  Sala_0774  Alpha/beta hydrolase fold  0.9  0.69
-  Sala_1436  Thiamine biosynthesis protein  1.0  0.68
YP_618205  Sala_3169  Tetratricopeptide TPR_2  0.9  0.67
YP_618017  Sala_2979  Electron transport protein SCO1/SenC  0.9  0.67
YP_617989  Sala_2951  Ectoine synthase  0.9  0.67
YP_617930  Sala_2892  ABC transporter related  0.9  0.67
YP_617865  Sala_2827  Pyrimidine 5-  1.0  0.67
YP_617704  Sala_2666  Peptidase S16, lon-like protein  1.1  0.67
YP_616642  Sala_1596  Transcriptional regulator, XRE family  1.0  0.67

Function unknown (S)
YP_615227  Sala_0170  Hypothetical protein  1.0  0.71
YP_615547  Sala_0493  Hypothetical protein  1.0  0.71
YP_616617  Sala_1571  Hypothetical protein  1.0  0.71
YP_618057  Sala_3019  Hypothetical protein  1.0  0.70
YP_617340  Sala_2298  Hypothetical protein  1.0  0.70
YP_617247  Sala_2205  Hypothetical protein  1.0  0.70
YP_616749  Sala_1703  Protein of unknown function DUF1275  1.0  0.70
YP_616667  Sala_1621  Hypothetical protein  1.0  0.70
YP_616486  Sala_1440  Hypothetical protein  1.0  0.70
YP_616500  Sala_1454  Hypothetical protein  1.0  0.70
YP_616222  Sala_1173  Hypothetical protein  1.0  0.70
YP_615625  Sala_0571  Hypothetical protein  1.0  0.70
YP_618135  Sala_3098  Hypothetical protein  1.0  0.70
YP_617299  Sala_2257  Hypothetical protein  1.1  0.69
YP_616480  Sala_1433  Hypothetical protein  1.0  0.69
YP_617066  Sala_2022  Protein of unknown function DUF853, NPT hydrolase putative  0.9  0.69
YP_618009  Sala_2971  Hypothetical protein  0.9  0.68
YP_615818  Sala_0765  Uncharacterized conserved protein UCP032146  1.1  0.67
YP_616763  Sala_1717  Protein of unknown function DUF847  1.1  0.67
YP_616846  Sala_1801  Hypothetical protein  1.1  0.67
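The selection rule stated in the caption of Table J.1 (the 150 proteins with the largest q-values) can be sketched as a sort followed by a cut. The following is a minimal sketch on simulated data; the data frame tt, its columns and the total of 1000 proteins are hypothetical and do not reproduce the thesis results.

```r
set.seed(7)

# Hypothetical results table: one row per protein with a q-value
tt <- data.frame(ID = sprintf("prot%04d", 1:1000), q = runif(1000))

# Rank by q-value, largest first, and keep the first 150 rows
ranked <- tt[order(tt$q, decreasing = TRUE), ]
unchanged <- head(ranked, 150)

print(range(unchanged$q))
```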
