Master’s degree in Proteomics & Bioinformatics, January 2008

Master’s report

November 2006 – April 2007

Characterisation of human cerebrospinal fluid (CSF) after Tandem Mass Tag (TMT0) labelling

Virginie LICKER

Supervision: Vaksha Patel & Dr. Malcolm Ward, Proteome Sciences

Geneva University coordinator: Dr. Jean-Charles Sanchez, BPRG

Proteome Sciences Plc Laboratory, London, Institute of Psychiatry, King’s College London, De Crespigny Park, London SE5 8AF, United Kingdom


In memory of Alexis, MPB student 2005-2007

His friendship, generosity, sincerity, sense of humour and infectious good spirits will remain lasting memories of these shared university years.


Acknowledgements

I would like to express my gratitude to Dr. Malcolm Ward, head scientist of the Proteome Sciences London research team, for welcoming me and allowing me to complete my internship in the Proteome Sciences laboratory. I would particularly like to thank Vaksha Patel, my supervisor, and Dr. Helen Byers for guiding me through my first laboratory experiments and for their scientific advice. Many thanks to all my other lab mates, James, Annette, Darragh, Abdul and “Maxton”, and to the King’s academics, Andy, Mirreia and Steve, for their care, kindness and help, and for making my everyday life in the lab fun.

I would like to thank Dr. Jean-Charles Sanchez for allowing me to continue, on the Geneva Proteomics platform, the reflection begun in London. I am grateful for his patience and for trusting in my capabilities, as well as for his encouragement and communication skills. I would also like to thank Loic Dayon for sharing his knowledge of TMT and other techniques through scientific discussions and lab experiments. Thanks to my MPB classmates (Yann, Harris, Ilona, Bea, Nath, Greg & Co.) and the members of the BPRG for spurring me on to write up my report! Last but not least, I would like to thank my friends, especially Dany and Alain, as well as my family for their unfailing support.



A. ABSTRACT

B. INTRODUCTION

1. Introduction to proteomics

2. Sample preparation for mass spectrometry: chromatography
  2.1 Principles and instrumentation
  2.2 Reversed phase and strong cation exchange characteristics
  2.3 Workflows

3. Mass spectrometry
  3.1 Principles and instrumentation (MALDI-TOF and Q-TOF)
  3.2 Protein identification
  3.3 Database search and interpretation

4. Quantitative proteomics: MS based strategies for relative and absolute quantification

5. Biomarker discovery in the cerebrospinal fluid (CSF)

6. Description of the project

C. MATERIAL AND METHODS

1. Reagents
2. Samples
3. Determination of total protein concentration: Bradford assay
4. and Staining
5. Digestion
  5.1 In-solution digestion
    5.1.1 Standard
    5.1.2 TMT protocol
  5.2 In-gel digestion: TMT protocol
6. Labeling using TMT0 reagent
7. Chromatography
  7.1 Manual SCX
  7.2 HPLC
    7.2.1 RPC18-SCX in series
    7.2.2 a. RPC18 b. SCX
8. Sample preparation for mass spectrometry: zip-tipping with C18
9. Mass spectrometry
  9.1 Protein identification by Peptide Mass Fingerprinting (PMF)
    9.2.1 Sample preparation
    9.2.2 Data acquisition: MALDI-TOF Voyager DE Pro
    9.2.3 Data analysis: PMF
  9.3 Protein identification by (MS/MS)
    9.3.1 Sample preparation
    9.3.2 Data acquisition: ESI Q-TOF Micro
    9.3.3 Data analysis

D. RESULTS & DISCUSSION

1. Sample preparation and pre-fractionation for using isobaric reagent TMT
  1.1 Evaluation of TMT protocols using BSA
    a) Digestion protocols: “in solution” and “in gel” digestion
    b) Buffers: TEAB versus borate buffer
    c) Labeling “efficiency”
  1.2 Optimization of the purification and fractionation of TMT0 labeled on HPLC using BSA
    a) Column configuration
    b) Buffer composition
    c) Sample preparation (RP C18, SCX)
    d) Method

2. Mass spectrometry
  2.1 MS and MS/MS data for the identification and quantification of
    2.1.1 MALDI-TOF analysis
      2.1.1.1 Sample preparation: methods and matrix
      2.1.1.2 Peptide representation
      2.1.1.3 Limits of detection (gel, peptide standard)
      2.1.1.4 Applications: protein identification (PMF), quality control
    2.1.2 ESI-Q-TOF analysis
      2.1.2.1 Sample preparation: zip-tipping or not?
      2.1.2.2 Protein identification & quantification

3. Protein profiling of CSF after isobaric TMT labeling
  3.1 Sample preparation
    3.1.1 Determination of total protein amounts
    3.1.2 Workflow
    3.1.3 Influence of buffer pH on SCX fractionation
  3.2 Protein analysis
    3.2.1 Protein identification
    3.2.2 Protein quantification
    3.2.3 Biological significance
    3.3.1 Method improvements

E. CONCLUSION

F. TABLES

G. REFERENCES

A. ABSTRACT

Clinical diagnosis of neurodegenerative diseases such as Alzheimer’s or Parkinson’s disease is still unsatisfactory, and sensitive and specific biomarkers are needed for their detection, prevention and treatment. Extensive characterisation of cerebrospinal fluid (CSF) can contribute to a better understanding of the underlying pathogenic mechanisms of neurodegeneration and provide a valuable source for biomarker discovery, owing to its proximity to the brain and privileged connection to blood. The analysis of the CSF proteome is challenging because of its diversity and high dynamic range, which call for very robust and sensitive quantitative proteomics platforms able to detect proteins down to femtomole levels. In this project, a shotgun proteomics approach using the isobaric tandem mass tag TMT0 was employed to label undepleted human CSF, followed by off-line 2D-LC and identification of proteins by tandem mass spectrometry (MS/MS). The optimised protocol allowed the characterisation of 73 proteins, of which 67 were labeled. Among them, brain-specific proteins and other documented potential biomarkers were found. These results demonstrate the applicability of the established protocol and suggest that multiplex quantitative proteomics methods using the TMT technology can be applied to CSF for the detection of neurodegenerative biomarkers in up to six samples simultaneously.


B. INTRODUCTION

1. Introduction to proteomics

The term “proteomics” refers to the systematic study of all proteins present in a cell or tissue at a given time, to describe their structure, function and expression in various biological systems [1]. As proteins are involved in almost all biological activities, their identification and the determination of their covalent structures have been central to the life sciences, providing a window into complex cellular regulatory networks [2]. Before the genomics revolution, chemical methods such as stepwise Edman N-terminal degradation were used to determine the sequence of single, highly purified proteins [3]. With the emergence of genomics, these time-consuming methods were replaced by large-scale protein identification based on the correlation of mass-spectrometric measurements with newly available translated gene sequence databases. As rapid identification of proteins can now be achieved for most species, current proteomic studies have been extended to the systematic determination of diverse properties of proteins, including their sequence, quantity, post-translational modifications, protein-protein interactions and structure [4].

The analysis of the full proteome remains challenging because of its size and unknown complexity. The number of genes in a species is not representative of the number of proteins, which is much greater as a result of, for example, alternative splicing, RNA editing and post-translational modifications. Moreover, the range of protein concentrations generally exceeds the dynamic range of any single analytical method. High-throughput proteomic analysis workflows generally consist of three stages: protein separation, mass spectrometry analysis and, finally, identification with the help of bioinformatics tools.

The two most common approaches in proteomics (Figure 1) rely either on fractionation at the protein level by one- or two-dimensional gel electrophoresis (“gel-based”) or on separation at the peptide level (“gel-free” or “shotgun proteomics”) by one- or two-dimensional liquid chromatography (2D-LC). Both involve digestion of the proteins into peptides with a site-specific protease (e.g. trypsin) prior to mass spectrometry, and are often referred to as “bottom-up” approaches [5]. The shotgun strategy has allowed the high-throughput identification of thousands of proteins from highly complex mixtures such as cell lysates [6].

Figure 1. Proteomics analysis by (A) gel-based and (B) gel-free approaches (Fournier et al., 2007) [7]. In bottom-up proteomics, the analytes introduced into the mass spectrometer are peptides generated by enzymatic cleavage of one or many proteins. A. Gel approach: a protein mixture is separated by two-dimensional electrophoresis, first by isoelectric focusing and then by SDS-PAGE. After visualisation, protein spots are excised from the gel, digested and analysed by mass spectrometry for identification by database searching. B. Gel-free approach (shotgun proteomics): a protein mixture is directly digested by a specific enzyme into a peptide mixture, which is separated by multidimensional separation methods and finally analysed by mass spectrometry. Proteins are identified from the generated mass spectra by database searching. By contrast, “top-down” proteomics involves the gas-phase fragmentation, in the mass analyser, of intact protein ions generated by ESI.


2. Sample preparation for mass spectrometry: chromatography

The large number of peptides generated by the digestion of complex protein mixtures, present over a wide dynamic range, is too complex for a mass spectrometer to interpret directly. Prior to mass spectrometry (MS) analysis, protein and peptide mixtures are therefore separated by gel electrophoresis and chromatography in order to reduce sample complexity and thus simplify the MS spectra. Moreover, multidimensional separations exploiting two or more independent physical properties of the peptides or proteins can achieve a higher level of resolution and a higher loading capacity than a single dimension [8]. Two-dimensional gel electrophoresis (2D gel) is probably the most widely used technique; it combines isoelectric focusing in the first dimension with sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) in the second to provide an efficient separation of protein mixtures. However, this technique has major drawbacks, such as a bias against membrane proteins [9], very acidic or basic proteins and very small or large proteins [10]. The detection of low-abundance proteins is difficult because of the limited sensitivity of the available staining methods [11] and their propensity to be masked by comigrating high-abundance proteins. Moreover, good reproducibility between gels and automation for high-throughput identification have still not been achieved.

1. Principles and instrumentation [12]

High Performance Liquid Chromatography (HPLC) is a well-established separation method (Figure 2) with excellent recovery and resolution that has found many applications in biochemistry. The technique relies on a liquid mobile phase that carries the components of a mixture, dissolved in a solvent, through a solid chromatographic support (the stationary phase, or column) under high pressure. The relative affinity of the sample components for each phase leads to sample partition: compounds forming strong interactions with the stationary phase have longer retention and elution times than those interacting more strongly with the mobile phase.

[Figure 2: schematic of the BioLC system, showing buffers A, B, C and D with helium degassing and flow control, the pumps, the sample injection valve with its sample loop (mode A: sample loop disconnected; mode B: sample loop connected) and waste line, the chromatographic column, the UV detector, fraction collection, and the computer used for remote control and data processing.]

Figure 2. BioLC components. An HPLC system such as the Dionex BioLC requires a solvent delivery system (pumps), an injector valve, a column and a UV detector. The sample is introduced into the flowing solvent using a manual injection valve with a fixed sample loop: in mode A, the flow from the pump is sent directly into the column; when the valve is switched to mode B, the flow from the pump is diverted via the sample loop into the column, thus performing the injection. The sample enters the chromatographic column (RP, SCX) and is resolved into its various components according to their relative affinity for the mobile and stationary phases. Analyte elution from the column is continuously monitored by an in-line UV detector and appears as a peak in the chromatogram on a computer.


The sample is finally eluted from the column by changing the properties of the mobile phase, so that the mobile phase displaces the sample components from the stationary phase. Various stationary phases with different surface properties (e.g. size exclusion, normal phase, reversed phase, ion exchange and affinity) have been developed for protein or peptide analysis, in order to separate molecules according to particular physical features (size, solubility in water or organic solvent, charge or ligand affinity). In practice, multi-step pre-fractionation methods most often combine reversed phase (RP) and strong cation exchange (SCX) liquid chromatography (LC).

2. RPLC and SCX characteristics [12]

Reversed phase liquid chromatography (RPLC) separates molecules on the basis of their hydrophobicity and can be used for many purposes, including complex sample fractionation, concentration, desalting and removal of very hydrophobic components. The stationary phase is made of hydrophobic alkyl chains of variable length (C-4, C-8 and C-18) covalently bound to silica-based packings. The C-18 column, built with an octadecyl ligand, is particularly suited to the separation of peptides or small molecules, which need longer chains to be captured. Adsorption is a dynamic equilibrium between free and adsorbed molecules, controlled by the concentration of organic solvent (usually acetonitrile) in the mobile phase. Common mobile phases for RP separation of peptides consist of a highly aqueous solvent A with a minimal concentration of organic solvent (0.1-5%) and a highly organic solvent B (70-100%), with added acid (usually 0.1% TFA). The organic solvent allows peptide solubilisation and UV detection (between 200 and 300 nm), whereas the acid improves peptide binding through ion-pairing effects and gives tryptic peptides a net positive charge, which is particularly useful for coupling with positive-ion electrospray (ESI) mass spectrometry. Finally, compounds are eluted by raising the proportion of organic solvent.

Ion exchange chromatography separates peptides on the basis of their charge. The stationary phase consists of charge-bearing functional groups fixed to a polymer matrix and associated with counter-ions. The sample is retained by a selective exchange of its ions with these counter-ions. Cation exchange columns are used to retain basic molecules that can carry a positive charge under the given conditions (cations) and interact with the negatively charged functional groups of the chromatographic support. In a mobile phase at acidic pH (around 3), peptide carboxyl groups (C-terminal groups) are predominantly protonated, so that basic residues (arginine, lysine, histidine) and N-terminal groups confer a net positive charge on the peptides. The sample is finally eluted, for example, by raising the pH or increasing the salt concentration.
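To illustrate why peptides are retained on a cation exchanger at acidic pH, the following Python sketch estimates the net charge of a peptide with the Henderson-Hasselbalch equation. The pKa values are approximate textbook figures, and the peptide LVNELTEFAK is used purely as an example; neither is taken from this report, and real pKa values shift with sequence context.

```python
# Approximate pKa values (textbook figures; an assumption, not data from this report).
PKA_POSITIVE = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_TERM": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def _positive_fraction(pka: float, ph: float) -> float:
    """Charge contributed by a basic group (+1 when fully protonated)."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def _negative_fraction(pka: float, ph: float) -> float:
    """Charge contributed by an acidic group (-1 when fully deprotonated)."""
    return -1.0 / (1.0 + 10 ** (pka - ph))


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of an unmodified peptide at a given pH."""
    charge = _positive_fraction(PKA_POSITIVE["N_TERM"], ph)
    charge += _negative_fraction(PKA_NEGATIVE["C_TERM"], ph)
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += _positive_fraction(PKA_POSITIVE[residue], ph)
        elif residue in PKA_NEGATIVE:
            charge += _negative_fraction(PKA_NEGATIVE[residue], ph)
    return charge


if __name__ == "__main__":
    peptide = "LVNELTEFAK"  # illustrative tryptic-style peptide
    for ph in (3.0, 7.0):
        print(f"pH {ph}: net charge {net_charge(peptide, ph):+.2f}")
```

Under these assumptions the example peptide is positively charged at pH 3 and negatively charged at pH 7, which is consistent with binding to SCX in acidic buffer and elution as the pH rises.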

3. Workflow

The digestion of all proteins in a sample followed by two-dimensional liquid chromatography (2D-LC) is a common procedure that usually separates peptides first according to their charge, using strong cation exchange (SCX) chromatography, and then on the basis of their hydrophobicity, using a reversed phase (RP) column. This procedure can be performed either off-line, using uncoupled SCX and RPLC systems that require SCX fraction collection and reinjection onto RP, or on-line, if the SCX column elutes directly onto the RP column. A variation of this approach, called MudPIT (Multidimensional Protein Identification Technology) [13], is a fast and automated technique that minimises sample losses. These methods, traditionally coupled to ESI, have also been developed for MALDI [14].


3. Mass spectrometry

1. Concepts & Instrumentation

Mass spectrometry (MS) is an analytical tool that measures the mass-to-charge ratio (m/z) of ions and has found many applications in proteomics through its ability to identify and quantify proteins. A mass spectrometer has three main elements: the “ion source”, where analytes are ionised and transferred from the condensed to the gas phase; the “analyser”, which separates ions according to their mass-to-charge ratio under high vacuum; and the “detector”, which registers the number of ions at each m/z and creates a signal that is converted into a mass spectrum.

In the 1980s, the emergence of two soft ionisation sources, matrix-assisted laser desorption/ionisation (MALDI) and electrospray ionisation (ESI), able to convert polar biomolecules into gas-phase ions without excessive degradation, made the analysis of proteins and peptides by mass spectrometry possible. In MALDI, singly charged ions are produced by desorption and ionisation of a dry, co-crystallised matrix-analyte mixture by short laser pulses [15]. In ESI, multiply charged ions are created from molecules in solution by spraying an electrically generated fine mist into the inlet of a mass analyser at atmospheric pressure [16], which facilitates coupling to liquid-based separation tools. MALDI is relatively tolerant of salts and contaminants compared with ESI, but suffers from a lack of signal reproducibility. ESI, on the other hand, has the advantage that it can be coupled directly to liquid chromatography (LC) separation, which increases the dynamic range of the analysis and allows automation.

After ionisation, analytes reach the mass analyser, the key component of the instrument, which uses an electric or magnetic field to manipulate ion motion in a mass-dependent manner and direct the ions to the detector. Four basic types of mass analyser are used in proteomics: time of flight (TOF), ion trap (IT), quadrupole (Q) and Fourier transform ion cyclotron resonance (FTICR). They differ in sensitivity, mass accuracy and resolution. Hybrid mass spectrometers such as Q-TOFs combine different types of mass analyser to obtain unique capabilities. The combination of ion source, mass analyser and detector is determined by the application, but ESI is often coupled to ion traps or quadrupoles, whereas MALDI is commonly used with TOFs.

Single-stage mass spectrometers such as MALDI-TOFs allow the quick determination of the molecular mass (MW) of biomolecules, referred to as MS scan analysis (MS). Tandem mass spectrometers such as Q-TOFs additionally enable the determination of amino acid sequence, together with features such as post-translational modifications, through a procedure called tandem mass spectrometry or MS/MS. Briefly, ions of a specific m/z selected by a first mass analyser are confined in a collision cell or a trap, where they are fragmented by collision with an inert gas (e.g. argon) in a process called Collision Induced Dissociation (CID). The product ions are then analysed in the second mass analyser, which generates fragment ion spectra containing peptide sequence information for the selected precursor ions. Figure 3 details the features of the two mass spectrometers, MALDI-TOF (MS) and Q-TOF (MS/MS), that were used in the Proteome Sciences laboratory for routine analysis.
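Because MALDI mostly yields singly charged ions while ESI yields multiply charged ones, the same peptide appears at different m/z values depending on the source. The short Python sketch below shows the usual conversion between neutral mass and m/z for protonated ions; the function names are illustrative, and the example assumes that charge is carried only by added protons.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mass_to_mz(neutral_mass: float, charge: int) -> float:
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mz_to_mass(mz: float, charge: int) -> float:
    """Neutral mass recovered from an observed m/z and an assumed charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    peptide_mass = 1500.0  # hypothetical neutral peptide mass in Da
    for z in (1, 2, 3):
        print(f"[M+{z}H]{z}+ : m/z {mass_to_mz(peptide_mass, z):.4f}")
```

The singly charged value is what a MALDI-TOF spectrum would typically show, whereas the doubly and triply charged values are typical of ESI spectra of tryptic peptides.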


Figure 3. Mass spectrometers used in the Proteome Sciences laboratory (Aebersold R. et al., 2003) [17]. A. MALDI ionisation (a.) is usually associated with simple TOF instruments. In the MALDI time-of-flight (TOF) instrument, ion packets are accelerated to high kinetic energy and separated along a flight tube by the time they take to traverse it, as a result of their different velocities [18]. In reflectron mode, ions are turned around in a reflector, which compensates for slight differences in kinetic energy. In linear mode, used for the analysis of high molecular masses, ions are not reflected but follow a linear trajectory until they hit the detector. Moreover, delayed extraction of ions from the source is used to improve both mass accuracy and resolution [19]. B. ESI (b.) is often associated with quadrupoles. In quadrupole mass spectrometers, mass separation is achieved by applying oscillating electric fields between four rods, which allow only ions of a desired m/z to be transmitted. The first quadrupole (Q1) acts as a filter to select ions of a particular m/z; the second (q2) is a collision cell where ion fragmentation is induced. The third part separates the fragment ions and consists of either a quadrupole (triple quadrupole) or a TOF analyser, in the case of the hybrid Q-TOF instrument.
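The TOF principle described in the caption can be made concrete with a minimal Python sketch of an idealised linear analyser: all ions of charge z receive the same kinetic energy z·e·U, so the flight time grows with the square root of m/z. The tube length and accelerating voltage below are hypothetical defaults, not the settings of the instrument used in this work, and the model ignores delayed extraction, the reflectron and the initial energy spread.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def flight_time(mass_da: float, charge: int,
                tube_length_m: float = 1.0,
                accel_voltage_v: float = 20000.0) -> float:
    """Flight time in seconds of an ion in an idealised linear TOF analyser."""
    if mass_da <= 0 or charge < 1:
        raise ValueError("mass must be positive and charge a positive integer")
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_voltage_v  # z * e * U
    velocity = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))
    return tube_length_m / velocity


if __name__ == "__main__":
    for mass in (1000.0, 2000.0, 4000.0):  # hypothetical singly charged ions
        print(f"{mass:.0f} Da: {flight_time(mass, 1) * 1e6:.2f} microseconds")
```

In this idealised model, quadrupling the mass of a singly charged ion doubles its flight time, and two ions with the same m/z arrive together regardless of their individual mass and charge.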

2. Protein identification

Protein identification by mass spectrometry relies on two approaches that analyse peptides rather than full-length proteins: sequence database searching (PMF and PFF) and de novo sequencing. First, peptide mass fingerprinting (PMF) [20] is a relatively simple and robust method, widely used in proteomics to identify proteins from simple mixtures. Peptide masses generated by sequence-specific endoproteases such as trypsin are compared by dedicated software with theoretical masses obtained by in silico digestion of all protein entries in selected databases (Swiss-Prot, TrEMBL). The method requires protein fractionation, usually by one- or two-dimensional gel electrophoresis, to obtain purified protein mixtures that can be analysed by single-stage mass spectrometers (typically MALDI-TOF). However, the specific combination of masses in the spectrum, or “mass fingerprint”, is sometimes not discriminating enough when searching very large eukaryotic databases.

The use of gas-phase peptide ion fragmentation spectra (MS/MS) provides more reliable protein identification, because it is based on additional sequence information rather than on peptide masses alone. This technique, known as peptide fragmentation fingerprinting (PFF), is gradually replacing PMF as the accepted standard for protein identification. It relies on the comparison of the fragment ions in experimental MS/MS spectra with all the fragments predicted, following known fragmentation rules, for the candidate precursors of a given mass in the database [21,23].

Finally, de novo peptide sequencing [22] can be applied to proteins that are not present in the database or to confirm doubtful identifications. This technique does not require a database search but involves the labour-intensive manual examination of m/z peak intervals in MS/MS spectra to reconstruct amino acid sequences.
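The in silico digestion step behind PMF can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of any particular search engine: it applies the common trypsin rule (cleavage after K or R, except before P), allows no missed cleavages or modifications, and uses standard monoisotopic residue masses. The function names and the mass tolerance are assumptions made for the example.

```python
import re

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless followed by P (requires Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein.upper()) if p]


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed: list[float], theoretical: list[float],
                  tolerance_da: float = 0.1) -> int:
    """Number of observed masses lying within the tolerance of a theoretical mass."""
    return sum(
        any(abs(obs - theo) <= tolerance_da for theo in theoretical)
        for obs in observed
    )


if __name__ == "__main__":
    sequence = "AKRPGRLL"  # toy sequence, for illustration only
    peptides = tryptic_digest(sequence)
    for pep in peptides:
        print(pep, round(peptide_mass(pep), 4))
```

A PMF search would repeat this digestion for every database entry and rank the entries by how well their theoretical masses match the observed peak list, using a proper statistical score rather than the simple count shown here.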

3. Database search & interpretation

Once mass spectra have been recorded, the data must be searched against an appropriate database with variable or fixed settings (molecular mass range, mass tolerances, amino acid modifications) using one of the existing algorithms, each with its own strengths and weaknesses. The most widely used program, Mascot [23], relies on the probability-based scoring algorithm MOWSE (Molecular Weight Search) [24], which computes the probability that the observed match between the experimental data and the data calculated from a peptide sequence in the database is a random event. The software returns a list of proteins, ranked according to a score that reflects the statistical significance of the hit. High Mascot peptide scores, associated with a large number of peptides matched to the protein and high sequence coverage, are usually accepted as true identifications without the need for visual inspection of the spectra, whereas low scores indicate a random correlation and may be excluded. The difficulty lies in estimating the confidence of intermediate scores.

These automated identification systems have greatly improved the handling of the huge amounts of data generated by high-throughput experiments. However, limitations remain, such as the lack of fast, reliable and accepted standards for determining significant matches (and estimating false positive rates). Moreover, because sequence databases are growing rapidly, more stringent criteria for protein identification (larger numbers of matched peptides, higher scores), combined with smaller error tolerances, are necessary. Finally, as identifications are restricted to species for which comprehensive databases are available, databases must be completed and their redundancy (multiple proteins with similar sequences identified through homologous peptides) reduced.
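The link between the random-match probability and the reported score can be illustrated with a short Python sketch. It assumes the convention commonly documented for Mascot, in which the score is -10·log10(P) with P the probability that the match is a random event; the function names are illustrative, and the significance thresholds actually applied by the software also depend on the number of candidate sequences searched, which is not modelled here.

```python
import math


def score_from_probability(p_random: float) -> float:
    """Probability-based score: -10 * log10(P), P being the random-match probability."""
    if not 0.0 < p_random <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -10.0 * math.log10(p_random)


def probability_from_score(score: float) -> float:
    """Inverse conversion: random-match probability implied by a score."""
    return 10.0 ** (-score / 10.0)


if __name__ == "__main__":
    for p in (0.05, 1e-3, 1e-6):  # hypothetical probabilities
        print(f"P = {p:g} -> score {score_from_probability(p):.1f}")
```

On this scale, each increase of 10 score units corresponds to a tenfold decrease in the probability that the match arose by chance, which is why high scores are taken as confident identifications.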

4. Quantitative proteomics: MS-based strategies for relative and absolute quantification

As genome and transcriptome expression profiling do not accurately reflect proteome complexity, the direct measurement of global protein expression levels, termed quantitative proteomics, can provide valuable information on biological processes and contribute to biomarker discovery. Quantitative proteomics usually combines protein separation with identification of the protein species by mass spectrometry (MS) or tandem mass spectrometry (MS/MS). In large-scale proteomic projects, relative rather than absolute amounts of the identified peptides or proteins are determined, by comparing samples of the same type that represent different conditions. Relative quantification was traditionally performed by 2-DE separation followed by staining and image analysis, to identify differences in gel patterns by comparing staining intensities [25]. However, this labour-intensive and sequential method is difficult to automate and lacks sensitivity. As an alternative, high-throughput quantitative proteomics is usually based on shotgun proteomics, using multidimensional LC to resolve complex mixtures.

1. Stable isotope labeling

Several quantitative methods have been developed that rely on the incorporation of stable isotope labels or tags into proteins or peptides. Pairs of peptides with identical sequence but different labels (heavy or light form) can be discriminated by their mass difference in MS mode or, for isobaric tags, in MS/MS mode. They are quantified by measuring the ratio of peak intensities for pairs of peptide ions, which is assumed to reflect accurately the abundance ratio of the compared proteins. Spiking a known quantity of a chemically synthesised, isotope-labeled peptide as an internal standard can provide relative [26] and absolute quantitation (AQUA) [27].

Labeling of the whole proteome in living cells, termed metabolic labeling, can be achieved through the incorporation of metabolic precursors bearing stable isotopes, such as ¹⁵N labeling [28] or isotope-labeled amino acids (SILAC) [29], into proteins during cell growth and protein turnover. MS-distinguishable protein populations (heavy or light) are produced depending on the type of medium used. The principal advantage of this method is that cells from different states can be mixed at a very early experimental stage (before the lysis, digestion and fractionation steps), which considerably reduces sample-to-sample variation and improves quantification accuracy. However, this approach is not suitable for samples such as human tissues and body fluids.

Stable isotopes can also be incorporated into peptides enzymatically, by carrying out protein digestion in the presence of ¹⁸O-labeled water (H₂¹⁸O) [30,31]. However, the occasional exchange of only one of the two carboxyl oxygens introduces variation in quantification and complicates data interpretation, as a mass shift of 2 Da results in isotopic overlap between differently labeled peptides.

Finally, stable isotope tags can be introduced chemically on target amino acid residues of peptides. A wide range of techniques has been developed (e.g. tagging directed at cysteine sulfhydryl groups [32], N-terminal and lysine amines, or aspartic [33]/glutamic carboxyl groups [34]), sometimes coupling isotope labeling to an affinity tag to enrich mixtures in quantifiable peptides, as in the ICAT technology (biotin tag) [32].
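The ratio-based quantification common to these isotope labeling methods can be sketched as follows in Python. Each peptide contributes one heavy/light intensity ratio, and the protein-level ratio is summarised here by the median of its peptide ratios; the use of the median and the intensity values are assumptions made for illustration, not the procedure of a specific software package.

```python
from statistics import median


def peptide_ratios(pairs: list[tuple[float, float]]) -> list[float]:
    """Heavy/light intensity ratio for each (light, heavy) peak pair."""
    ratios = []
    for light, heavy in pairs:
        if light <= 0:
            raise ValueError("light intensity must be positive")
        ratios.append(heavy / light)
    return ratios


def protein_ratio(pairs: list[tuple[float, float]]) -> float:
    """Protein-level ratio summarised as the median of its peptide ratios."""
    return median(peptide_ratios(pairs))


if __name__ == "__main__":
    # Hypothetical (light, heavy) peak intensities for three peptides of one protein.
    pairs = [(100.0, 200.0), (50.0, 100.0), (80.0, 240.0)]
    print("peptide ratios:", peptide_ratios(pairs))
    print("protein ratio:", protein_ratio(pairs))
```

The median is chosen in this sketch because it limits the influence of a single outlying peptide, such as one affected by incomplete labeling or by an overlapping isotopic envelope.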

2. Isobaric labelling: Tandem Mass Tag (TMT)

In contrast to traditional isotopic labeling strategies, isobaric tagging of peptides using iTRAQ [33] (developed by ABI) or TMT [35] (developed by Proteome Sciences) enables the simultaneous identification and quantification by tandem mass spectrometry of up to 8 samples [37] and 6 samples [36], respectively.

Figure 4. Tandem Mass Tag (TMT) chemistry (thanks to P. Giron). The TMT labels contain four parts: 1) a reactive group (N-hydroxysuccinimide, NHS), which covalently labels free amines, namely the N-terminus and internal lysine side chains of the peptides, via an active ester reaction; 2) a mass normaliser group, which varies in mass through the incorporation of appropriate isotopes (¹³C and ¹⁵N) to keep the tag isobaric; 3) a reporter group with a different mass range depending on the TMT reagent set, used to label peptides differentially; and 4) a cleavable linker containing a proline residue to enhance fragmentation and thus enable the release of the TMT reporter fragments in MS/MS. The labels are termed “isobaric” because the overall mass of the reporter and balance (“mass normaliser”) groups is kept constant.

The TMT molecules (Figure 4) consist of a protein-reactive group that allows labeling of primary amine groups (the N-terminus and internal lysine (K) side chains); a reporter group that reports, in MS/MS mode, the abundance of a given peptide after sample mixing; a cleavable linker enabling the release of the TMT reporter fragment in MS/MS; and a mass normalisation group that balances the mass differences between individual reporter fragments, to keep the overall mass of the tags in a set constant (isobaric). Proteome Sciences has developed several sets of tags for different types of experiment. TMT-zero (TMT0) is an inexpensive tag used for method development that does not carry any isotopic substitutions. The duplex TMT2, fourplex TMT4 and sixplex TMT6 are sets of two, four and six isobaric mass labels respectively, with one to five isotopic substitutions per tag, used for anything from simple pairwise comparisons to more complex analyses of protein abundance. These multiplex isobaric tags appear at the same mass in an MS scan but, upon CID fragmentation in the mass spectrometer, give rise to low-mass MS/MS signature ions between m/z 126 and 131 (TMT) that correspond to the released reporter. The relative reporter peak heights or areas indicate the proportions of the labeled peptide contributed by the samples from which they are derived (Figure 5).

[Figure 5: scheme of a sixplex TMT experiment. The six labeled samples are mixed (example mixture ratio 3:1:3:2:5:7) and analysed by MS; the reporter ion region of the MS/MS spectrum shows peaks at m/z 126.1, 127.1, 128.1, 129.1, 130.1 and 131.1.]

Figure 5. Isobaric tagging using the sixplex TMT (6-TMT) reagent. Proteins from six samples are solubilised, reduced, alkylated and digested independently prior to labeling with one of the sixplex 6-TMT reagents (reporter m/z 126 to 131). Samples are then mixed, and identical peptides, each labeled with one reagent of the multiplex set, appear as a single unresolved precursor ion of identical m/z in the MS spectrum. After CID, the reporters (m/z 126-131) are released and can be distinguished in the MS/MS spectrum.
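Reading relative abundances from the reporter ion region can be sketched in Python as follows. The reporter m/z values are the nominal ones shown in Figure 5; the matching tolerance, the peak list and the assignment of the 3:1:3:2:5:7 mixture to reporters 126.1 to 131.1 in ascending order are assumptions made for the example, since the original figure assignment is not recoverable from the text.

```python
REPORTER_MZ = (126.1, 127.1, 128.1, 129.1, 130.1, 131.1)  # nominal sixplex reporters


def reporter_intensities(peaks: list[tuple[float, float]],
                         tolerance: float = 0.05) -> dict[float, float]:
    """Sum the intensity of MS/MS peaks lying within the tolerance of each reporter m/z."""
    totals = {reporter: 0.0 for reporter in REPORTER_MZ}
    for mz, intensity in peaks:
        for reporter in REPORTER_MZ:
            if abs(mz - reporter) <= tolerance:
                totals[reporter] += intensity
    return totals


def relative_abundance(totals: dict[float, float]) -> dict[float, float]:
    """Fraction of the summed reporter signal contributed by each channel."""
    total = sum(totals.values())
    if total == 0:
        raise ValueError("no reporter signal found")
    return {reporter: value / total for reporter, value in totals.items()}


if __name__ == "__main__":
    # Hypothetical MS/MS peak list (m/z, intensity); the last peak is a sequence fragment.
    peaks = [(126.1, 300.0), (127.1, 100.0), (128.1, 300.0),
             (129.1, 200.0), (130.1, 500.0), (131.1, 700.0), (175.12, 900.0)]
    fractions = relative_abundance(reporter_intensities(peaks))
    for reporter, fraction in fractions.items():
        print(f"m/z {reporter}: {fraction:.3f}")
```

Peaks outside the reporter window, such as peptide fragment ions, are ignored for quantification but remain available for sequence identification from the same MS/MS spectrum.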

Because TMT-tagged peptide pairs are isobaric, they comigrate in chromatographic separations, which leads to more accurate quantification than conventional isotope labeling strategies (e.g. ICAT). In addition, the MS signal of each peptide pair is not split into several peaks by mass shifts, which improves sensitivity in MS mode. Thus, recent advances in quantitative proteomics, particularly isobaric tagging technology, provide more sensitive and higher-throughput tools for comparing the proteomes of multiple disease states and thus for discovering biomarkers.

5. An application: biomarker discovery in the cerebrospinal fluid (CSF)

The diagnosis of most neurodegenerative disorders (Alzheimer’s disease, Parkinson’s disease, dementia with Lewy bodies) is mainly based on the analysis of various clinical criteria, laboratory investigations and brain imaging, and remains unsatisfactory because these methods considerably lack sensitivity. Thus, the discovery of one or a combination of biochemical markers able to provide a diagnosis of high sensitivity and specificity, and to monitor disease progression from the preclinical stages, is necessary. As a fluid in direct contact with the brain, cerebrospinal fluid (CSF) has the potential to reflect biochemical changes occurring in the brain and offers an accessible tool to characterise its physiology and dysfunctions. Its role as an interface between the brain and the blood circulation could allow biomarkers initially discovered in CSF to be validated in plasma for a more widespread application. In addition, lumbar puncture is a low-risk procedure for the patient that can, if necessary, be repeated over time. For these reasons, CSF represents a promising material in which to discover biomarkers for neurodegenerative diseases, and the most straightforward way to assess the chemical and cellular environment of the brain in the living patient.

CSF is produced in the choroid plexus, a structure situated within the ventricles of the brain, and circulates around the central nervous system (CNS). The composition of this water-like colorless fluid is characterised by very few cells (0-4 cells/µl), a low protein content (≈400 µg/ml) and salt concentrations comparable to those of blood [38]. As the largest protein fractions present in CSF come from the blood, its protein profile is similar to that of plasma [39], with a different pattern of relative concentrations. Its high dynamic range of protein concentrations (around 10^9) creates a major challenge for the identification of low-abundance proteins. For example, albumin constitutes more than 50% and immunoglobulins more than 15% of the total CSF protein content [39]. Because this fluid is closely regulated via balanced secretion and absorption, with an average volume of 125 to 150 ml in an adult [39], the amounts that can be collected are limited and can be contaminated by blood, which largely biases protein concentrations.
The main proteomic approach for biomarker discovery, apart from protein profiling, is the comparative analysis of diseased versus healthy human CSF proteomes. LC-MS based techniques have considerably increased the number of proteins detected in CSF [40] compared with 2-DE experiments, which suffer from a low dynamic range (10^4) and are restricted to the study of abundant molecules. Recently, the combination of 2D-LC separation with isobaric tagging (iTRAQ) allowed the detection of 1500 proteins and of 300 proteins exhibiting disease-specific quantitative changes, demonstrating the potential of this technology [43].

8. Project

The project took place in the Proteome Sciences laboratory, London, and aimed to characterise human cerebrospinal fluid (CSF) using the new in-house isobaric Tandem Mass Tag (TMT) technology, which will soon be commercialised by the company to rival iTRAQ. A shotgun proteomics approach using the developmental tag TMT0 was employed to label undepleted human CSF, followed by off-line 2D-LC prior to the identification of proteins by tandem mass spectrometry (MS/MS). The sequential steps of the workflow were tested using bovine serum albumin as a standard. Two protocols - in gel and in solution - and two buffers - TEAB and Borate - for the digestion and TMT labeling of proteins were evaluated through a MALDI-TOF analysis of the fragments generated from the standard. Then, the chromatographic steps, consisting of a reversed phase and a strong cation exchange separation, were optimised to maximise sample retention and fractionation. Once the methods had proved efficient, and after understanding and experience of the various techniques had been gained, the optimised protocol was finally applied to CSF samples, allowing the characterisation of 73 proteins, of which 66 were labeled. Among them were brain-specific proteins and possible biomarkers that could be used to diagnose neurodegenerative diseases. These results demonstrated the efficiency of the methods, which could be applied to promising quantitative studies using the multiplexing TMT technology.


C. MATERIAL & METHODS

1. Reagents for proteomics analysis

Water was purified to 18.2 MΩ·cm using a MilliQ system. Dye reagent for the protein assay was obtained from BioRad, HPLC grade acetonitrile (ACN) from Fisher, sequencing grade trypsin from Roche and ZipTip µC18 P10 from Millipore. Ammonium acetate (NH4AcO), formic acid, trifluoroacetic acid (TFA) and hydrochloric acid (HCl) were from BDH (VWR). Ammonium bicarbonate 99%, iodoacetamide (IAA), N,N,N',N'-tetramethylethylenediamine (TEMED), Laemmli sample buffer, Brilliant Blue Colloidal Concentrate, orthophosphoric acid (H3PO4), potassium chloride for molecular biology (KCl), potassium phosphate monobasic (KH2PO4), alpha-cyano-4-hydroxycinnamic acid (CHCA), triethylammonium bicarbonate 1M solution (TEAB), sodium tetraborate decahydrate, sodium dodecyl sulfate (SDS), hydroxylamine 50 wt. % in H2O (99.999%) and albumin from bovine serum (BSA, minimum 98%, electrophoresis grade) were from Sigma-Aldrich. For electrophoresis, the reagents used for the PAGE gel - 10% Bis-tris precast gel 1.5 mm x 15 and NuPAGE® MOPS Running Buffer - were from Invitrogen. The 1 mm cassettes used for casting the stacking gel were from Invitrogen (NC2010). For chromatography, all reagents were HPLC grade; the strong cation exchange cartridge was from the Applied Biosystems ICAT kit (4326747) and the reference of the RPC18 cartridge is unknown.

2. Samples

2.1. Bovine Serum Albumin (BSA)

Bovine serum albumin (BSA) was used as a standard to test and compare the performance of different protocols. A stock solution of 2.96 µg/µl was prepared in ultrapure water, aliquoted and kept at -80°C. Once thawed, aliquots were not reused.

2.2. Human cerebrospinal fluid (CSF)

Human control CSF from two different patients was obtained from Professor Kaj Blennow, University of Göteborg, and had been collected following ethically approved clinical protocols. The 7 and 8 ml samples came from patients undergoing lumbar puncture to exclude infectious disorders of the central nervous system and not presenting any major symptoms of neurological or psychiatric disorders. Sample treatment before shipment was unknown but probably consisted of a rapid centrifugation, to remove insoluble material such as cells, prior to freezing. On arrival, samples were aliquoted on ice into 500 µl fractions and kept at -80°C until further processing. For each experiment, two aliquots, one from each patient, were pooled. The required volume (400 µl, equivalent to 100 µg of protein) was taken out and lyophilized to dryness in a Speedvac. Once thawed, aliquots were not reused.

3. Determination of protein concentrations: protein assay

Total protein contents of CSF samples were determined using the Bio-Rad protein assay [41], based on the observation that the absorbance maximum of an acidic solution of Coomassie brilliant blue G-250 shifts from 465 to 595 nm when binding to protein occurs. Because low total protein concentrations were expected, a microtiter plate assay with a linear range of 0.05 to 0.5 µg/µl was performed. The experiment was conducted on ice to limit evaporation and degradation of the samples. BSA standards of 0.05, 0.1, 0.2, 0.3, 0.4 and 0.6 µg/µl protein content were prepared. Bio-Rad dye stock solution was diluted five times in water and filtered through Whatman paper. 10 µl of each standard and sample were deposited in triplicate into separate microtiter plate wells, mixed with 200 µl of diluted Bio-Rad dye reagent and incubated at room temperature for 5 minutes. The absorbance of the colored reaction product, a direct function of protein amount, was measured with a microplate reader set at 595 nm. For each BSA standard, the mean absorbance - averaged from three well measurements - and the corresponding standard deviation (σ) were calculated in Excel. The standard curve was obtained by plotting in Excel the known BSA concentrations as the independent values (x-axis) and the mean absorbances as the dependent values (y-axis). Assuming that the overall relationship between concentration and absorbance was best described by a straight line between 0.05 and 0.5 µg/µl protein content, a linear regression was calculated in Excel for this set of standards. Sample protein concentrations were finally estimated by solving the regression equation for x, with y equal to the mean absorbance measured for the sample. The assay was considered valid if the standard deviations remained under the 5% threshold (σ < 0.05) and if the coefficient of determination (R²) was higher than 0.99.
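The standard curve calculation described above can be sketched as follows. This is only an illustration of the least-squares procedure, not the Excel worksheet actually used: all absorbance readings are hypothetical, and only the standards lying inside the stated linear range are included.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept, with R squared."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# BSA standards (µg/µl) with HYPOTHETICAL triplicate absorbances at 595 nm.
standards = {
    0.05: (0.102, 0.100, 0.098),
    0.10: (0.151, 0.149, 0.150),
    0.20: (0.252, 0.248, 0.250),
    0.30: (0.349, 0.351, 0.350),
    0.40: (0.450, 0.452, 0.448),
}
concentrations = list(standards)
mean_absorbances = [mean(wells) for wells in standards.values()]

slope, intercept, r_squared = fit_line(concentrations, mean_absorbances)

# Solve the regression equation for x, using a hypothetical sample triplicate.
sample_absorbance = mean((0.301, 0.299, 0.300))
sample_concentration = (sample_absorbance - intercept) / slope

print(f"A595 = {slope:.3f} * c + {intercept:.3f}  (R2 = {r_squared:.4f})")
print(f"Estimated sample concentration: {sample_concentration:.3f} µg/µl")
print("Curve accepted" if r_squared > 0.99 else "Curve rejected")
```

Concentration is treated as the independent variable, so the sample value is obtained by inverting the fitted line instead of fitting concentration against absorbance.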

4. Polyacrylamide Gel Electrophoresis (PAGE) and staining

For monodimensional PAGE, various amounts of BSA (6 x 10 µg, 4 x 1 µg and 4 x 0.1 µg) were prepared and volumes were adjusted to 10 µl with water. Protein samples were solubilised in a 1:1 ratio with 2X concentrated Laemmli buffer (4% SDS, 20% glycerol, 10% 2-mercaptoethanol, 0.004% bromophenol blue and 0.125 M Tris HCl) and heated for a few minutes at 80°C before separation on a NuPAGE 10% Bis-tris precast gel 1 mm x 15. The run was performed at 200 V using NuPAGE® MOPS Running Buffer and stopped when the dye front had reached the bottom of the gel.

After electrophoresis, proteins were stained with Brilliant Blue G-Colloidal concentrate following manufacturer’s procedure and bands (2–3 mm diameter) of interest were precisely cut out of gels using a scalpel.

5. Digestion

Digestion protocols for further TMT labeling were performed using one of two different buffers, 50mM triethylammonium bicarbonate (TEAB) or 100mM Borate buffer. Borate buffer was prepared with sodium tetraborate decahydrate and its pH was adjusted to 8 using HCl. Unless otherwise stated, experiments were done at room temperature.

5.1. In solution digestion

5.1.1. Standard protocol

Proteins were solubilised in 50mM ammonium bicarbonate (ambic), reduced with 5 mM DTT at 60°C for 30 min and alkylated with IAA at room temperature in the dark for 30 min.

5.1.2. TMT protocol

Proteins were solubilised and denatured in 0.1% [w/v] SDS and 50mM TEAB/100mM Borate buffer at a pH adjusted to 7.5 with HCl. They were then reduced with 1mM TCEP for 30 minutes, alkylated with 7.5mM IAA prepared in acetonitrile for 1 hour in darkness and digested with trypsin overnight at 37°C. Sequencing grade trypsin was resolubilized in 50mM TEAB/100mM Borate buffer, pH 7.5, and added to the sample to reach a final concentration of 0.033 µg.

5.2. In gel digestion: TMT protocol

66kDa bands corresponding to the BSA protein were excised from the PAGE gel. Wells number 1-3-8 (10 µg), 4-11 (1 µg) and 6-7 (0.1 µg) were digested using Borate buffer, and wells number 2-9-10 (10 µg), 5-12 (1 µg) and 13-14 (0.1 µg) using TEAB buffer (random distribution).

5.2.1. Reduction and alkylation

Gel cubes were first washed with 50mM TEAB/100mM Borate buffer for 5 minutes, dehydrated with acetonitrile and dried completely in a Speedvac. Gel pieces were then reduced with 1mM TCEP prepared in 50mM TEAB/100mM Borate buffer for 1 hour, dehydrated twice with acetonitrile and dried in the Speedvac. Gel pieces were finally alkylated with 7.5mM IAA prepared in acetonitrile:50mM TEAB/100mM Borate buffer 1:9 (v/v) for 1 hour in darkness. If destaining was not required, the excess IAA solution was removed, and gel pieces were washed with 50mM TEAB/100mM Borate buffer and dehydrated twice with acetonitrile before being completely dried under vacuum.

5.2.2. Coomassie Blue destaining

For gel pieces not fully destained, additional washes with acetonitrile:50mM TEAB/100mM Borate 1:1 (v/v), shaking at 37°C for 30 minutes, were repeated until complete destaining. Gel pieces were then dehydrated with acetonitrile and completely dried in a Speedvac.

5.2.3. Trypsin digestion

A 25 µg trypsin aliquot was resolubilized in 250 µl of 0.1% TFA; 15 µl were taken out and further diluted in 100 µl of 50mM TEAB/75mM Borate buffer. 25 µl of this trypsin solution were added to the gel pieces and left at 4°C for 30 minutes to fully rehydrate them. Excess trypsin was removed and replaced by a minimal volume of 50mM TEAB/75mM Borate buffer to cover the gel pieces and keep them hydrated during digestion. Samples were incubated at 37°C for 2 hours and then overnight at room temperature.

5.2.4. Peptide extraction

The supernatant was transferred into a fresh tube. Gel pieces were washed with 50mM TEAB/75mM Borate buffer at 37°C for 15 minutes and dehydrated twice with acetonitrile at 37°C for 15 minutes. The liquid phases from all intermediate steps were collected, combined with the initial supernatant and finally lyophilized. Prior to labeling, samples were reconstituted in 50mM TEAB/75mM Borate buffer.
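The trypsin dilution of section 5.2.3 can be followed with a few lines of arithmetic. The sketch below assumes that the 15 µl of stock were added to 100 µl of buffer (final volume 115 µl); if the final volume was instead made up to 100 µl, the figures change accordingly.

```python
def concentration(mass_ug, volume_ul):
    """Concentration in µg/µl."""
    return mass_ug / volume_ul


# Stock: 25 µg of trypsin resolubilized in 250 µl of 0.1% TFA.
stock_ug_per_ul = concentration(25.0, 250.0)

# Working solution: 15 µl of stock added to 100 µl of buffer (ASSUMED final volume 115 µl).
trypsin_taken_ug = stock_ug_per_ul * 15.0
working_ug_per_ul = concentration(trypsin_taken_ug, 15.0 + 100.0)

# 25 µl of working solution added to each tube of gel pieces.
trypsin_per_sample_ug = working_ug_per_ul * 25.0

print(f"Stock: {stock_ug_per_ul:.3f} µg/µl")
print(f"Working solution: {working_ug_per_ul:.4f} µg/µl")
print(f"Trypsin added per sample: {trypsin_per_sample_ug:.3f} µg")
```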

6. Labeling with TMT0 reagent

Digested proteins were incubated for 1 hour with a solution of 15mM TMT0 reagent prepared in acetonitrile, and then treated with 0.25% v/v hydroxylamine for 30 minutes to reverse partial side reactions. Samples were frozen at -80°C until further processing.

7. Chromatography

Digested and labeled samples were purified to remove excess reagents - TMT0, hydroxylamine, TCEP and particularly SDS, which would negatively affect later analysis by mass spectrometry (MS) - and, where required, fractionated by strong cation exchange (SCX) to reduce sample complexity. Prior to SCX, a desalting step using RPC18 cartridges was recommended to reduce the cation concentration in the sample to a minimum, thereby preventing early elution of peptides during the SCX washing step.

7.1. Manual Strong Cation Exchange (SCX)

Labeled peptides from in gel digestion were purified as follows. The SCX cartridge was conditioned with 2 ml of SCX loading buffer (25% v/v ACN, 0.1% TFA). Samples were resuspended in 500 µl of SCX loading buffer, the pH was adjusted to 3 with HCl, and they were directly injected onto the cartridge using a 1 ml syringe at a rate of approximately one drop per second. Bound peptides were washed with 1 ml of SCX loading buffer and slowly eluted with 500 µl of SCX eluting buffer (25% v/v ACN, 400mM NH4AcO). Eluates were manually collected into 1.5 ml Eppendorf tubes and lyophilized to dryness before MALDI-TOF analysis.

7.2. Multidimensional High Performance Liquid Chromatography (HPLC)

Samples were injected into a Dionex BioLC HPLC system using a 200 µl or 400 µl loop. UV detection was monitored at two wavelengths, 214 nm and 280 nm (for aromatic peptides). Two chromatographic solid phases - reversed phase C18 (RPC18) and strong cation exchange (SCX) - were used. For RPC18, the mobile phase consisted of a highly aqueous solvent A (1-5% v/v ACN, 0.1% v/v TFA) for column equilibration, loading and washing, and a highly organic solvent B (80-95% v/v ACN, 0.1% v/v TFA) for peptide elution. For SCX purification, the mobile phase consisted of solvent C (25% v/v ACN, 0.1% v/v TFA) for column equilibration, loading and washing, and solvent D, with a high content of volatile salt (400mM NH4AcO:ACN 75:25 v/v), for peptide elution. For SCX fractionation, the mobile phase consisted of solvent C (25% v/v ACN, 5mM KH2PO4, pH 3 with phosphoric acid) and the high-salt solvent D (25% v/v ACN, 5mM KH2PO4, 500mM KCl, pH 3 with phosphoric acid). As all solvents contained dissolved oxygen or nitrogen from the air, which could form bubbles in the system and affect the chromatographic analysis, they were purged with helium. Prior to each experiment, RPC18 and SCX cartridges were washed with 100% Buffer B and 100% Buffer D or F respectively, at 0.5 ml/min for 15 minutes, to ensure complete elimination of peptides or molecules possibly bound to the columns. They were then equilibrated with 100% Buffer A and 100% Buffer C or E respectively, at 0.2 ml/min for 20 minutes, to provide the best conditions for peptide retention. A blank run was performed prior to each sample run.

7.2.1. Method 1: RPC18-SCX in series

In a first approach, RPC18 and SCX chromatography were performed in a single HPLC run. A lyophilized 50 µg BSA standard digest was resuspended in 400 µl of Buffer A (5% v/v ACN, 0.1% v/v TFA), the pH was checked to be 3, and the sample was loaded onto the RPC18 cartridge with 100% A at 0.2 ml/min. During the first 8 min of the method, the sample was injected manually into the 200 µl loop (two injections were necessary). Peptides were then washed for 7 minutes with 100% A at a flow rate increased to 0.5 ml/min, and the SCX cartridge was connected “in series” with the RPC18 cartridge. Peptide elution from RPC18 onto SCX was done isocratically with 50% Buffer B (95% v/v ACN, 0.1% TFA) for 5 minutes. Peptides bound to the SCX cartridge were then washed with 100% Buffer C for 5 minutes and eluted using a 12-step gradient from 6.25% to 50% D in 30 minutes and from 62.5% to 100% D in 20 minutes, spending 5 min per step.


7.2.2. Method 2: RPC18 and SCX one after the other

In a second approach, used for CSF samples, RPC18 and SCX chromatography were performed in two separate HPLC runs. Labeled peptides were first loaded onto the RPC18 column; the eluate was then collected and loaded onto the SCX column for a purification or fractionation step. Some optimisation work (sample preparation, methods, etc.) was carried out using BSA.

7.2.2. a. Reversed Phase Chromatography (RPC18)

The labeled CSF digest was diluted in 1350 µl of Buffer A (5% v/v ACN, 0.1% v/v TFA) to decrease its ACN content. The sample was loaded in three full injections using the 400 µl loop during the first 10 minutes of the method, at 0.2 ml/min with 100% A. Peptides bound to the column were washed for 5 minutes with 100% A at 0.5 ml/min. Peptide elution was then performed at an acetonitrile concentration of about 50%, namely 63% B (80% v/v ACN, 0.1% TFA), for 5 minutes at 0.5 ml/min. The 1 ml of collected eluate was frozen until the next step.
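The acetonitrile concentration delivered by a binary mix of Buffers A and B can be checked with a short calculation. The sketch below assumes ideal, linear volume mixing and uses the buffer compositions stated for the two methods (A at 5% ACN; B at 80% or 95% ACN); the function names are illustrative only.

```python
def acn_percent(percent_b, acn_in_a, acn_in_b):
    """ACN content (% v/v) of a mix containing percent_b of solvent B in solvent A."""
    fraction_b = percent_b / 100.0
    return (1.0 - fraction_b) * acn_in_a + fraction_b * acn_in_b


def percent_b_for_target(target_acn, acn_in_a, acn_in_b):
    """Proportion of B (%) needed to reach a target ACN content."""
    return 100.0 * (target_acn - acn_in_a) / (acn_in_b - acn_in_a)


# Method 2: 63% B, with B at 80% ACN and A at 5% ACN.
method2_acn = acn_percent(63.0, 5.0, 80.0)
# Method 1: 50% B, with B at 95% ACN and A at 5% ACN.
method1_acn = acn_percent(50.0, 5.0, 95.0)

print(f"Method 2 elution: {method2_acn:.2f}% ACN")
print(f"Method 1 elution: {method1_acn:.2f}% ACN")
print(f"%B giving exactly 50% ACN with an 80% ACN buffer B: "
      f"{percent_b_for_target(50.0, 5.0, 80.0):.1f}%")
```

Under these assumptions 63% B corresponds to roughly 52% ACN, which is consistent with the nominal 50% quoted in the method.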

7.2.2. b. Second dimension: Strong Cation Exchange

CSF peptide eluates collected from RPC18 (≈ 1.3 ml) were injected neat into the HPLC-SCX system using a 400 µl loop (3-4 injections).

o Sample purification: the sample was loaded during the initial 10 min washing step at 0.2 ml/min with 100% C and washed for 5 min at 0.5 ml/min. Peptides were then eluted with an 8-step gradient from 0 to 400 mM NH4AcO (50mM, 100mM, 150mM, 200mM, 250mM, 300mM, 350mM and 400mM steps), spending 5 minutes per step at a flow rate of 0.5 ml/min. Fractions were collected at 2.5 min intervals, namely two fractions per step.

o Sample fractionation: the sample was loaded during the initial 10 min washing step (15 min if 4 injections were necessary) at 0.2 ml/min with 100% C, and washed for 5 min at 0.5 ml/min. Peptides were then eluted with various step gradients from 0 to 500 mM KCl (6, 8, 10 or 12 steps, 5 minutes/step) at a flow rate of 0.5 ml/min, with fractions collected at 2.5 min intervals, namely two fractions per step.
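The step gradients above can be laid out as a simple timetable. The following sketch builds the 8-step NH4AcO purification gradient from the stated levels, assuming that elution starts after the 10 min loading and 5 min washing steps. For the KCl fractionation gradients the text gives only the number of steps, so equally spaced steps are an assumption of this example.

```python
def step_gradient(levels_mM, max_mM, start_min, step_min=5.0, fraction_min=2.5):
    """Return gradient rows (start, end, salt mM, % high-salt solvent) and fraction start times."""
    rows = []
    t = start_min
    for level in levels_mM:
        rows.append((t, t + step_min, level, 100.0 * level / max_mM))
        t += step_min
    n_fractions = int(round((t - start_min) / fraction_min))
    fraction_starts = [start_min + i * fraction_min for i in range(n_fractions)]
    return rows, fraction_starts


def equal_steps(max_mM, n_steps):
    """ASSUMPTION: equally spaced salt steps up to max_mM."""
    return [max_mM * i / n_steps for i in range(1, n_steps + 1)]


# Purification: levels as listed in the text, eluting solvent at 400 mM NH4AcO.
purification_levels = [50, 100, 150, 200, 250, 300, 350, 400]
rows, fractions = step_gradient(purification_levels, max_mM=400, start_min=15.0)

for start, end, salt, percent in rows:
    print(f"{start:5.1f}-{end:5.1f} min: {salt:3d} mM NH4AcO ({percent:5.1f}% high-salt solvent)")
print(f"{len(fractions)} fractions collected")

# Fractionation: e.g. a 10-step KCl gradient up to 500 mM (equal spacing assumed).
kcl_rows, kcl_fractions = step_gradient(equal_steps(500, 10), max_mM=500, start_min=15.0)
```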

8. Sample preparation for mass spectrometry: C18 zip-tipping

Prior to mass spectrometry analysis, detergent and salt contaminants were removed from peptide samples using micro scale C18 sample preparation columns (ZipTipµC18). Peptides were solubilized with at least 10µl of 0.1% TFA. ZipTipµC18 were attached onto a 10µl pipettor, hydrated three times with 10µl of an 80% ACN 0.1% TFA v/v solution and equilibrated four times with 10µl of 0.1% TFA. Peptides were then loaded by aspirating and dispensing the resuspended digests through the ZipTips at least eight times to ensure maximal binding on the column. For the MALDI-MS analysis, ZipTips were washed by aspirating 10 µl of 0.1% TFA and dispensing it to waste six times. The purified digests were eluted with 4µl of 80% ACN and directly loaded onto MALDI target plates.


For the LC MS/MS analysis, ZipTips were washed by aspirating 0.1% formic acid and dispensing it to waste six times. The purified digests were eluted with 4µl of 80% ACN and lyophilized to dryness before being resuspended in an appropriate buffer for LC-MS/MS. MALDI-MS was also performed following this protocol when both LC MS/MS and MALDI- MS analysis were required.

9. Mass spectrometry

9.1. Protein identification by Peptide Mass Fingerprinting (PMF)

9.1.1. Sample preparation

BSA digests were zip-tipped before MALDI analysis. For the dried droplet technique, two matrix solutions were prepared for acquisition in reflectron mode, by dissolving alpha-cyano-4-hydroxycinnamic acid (CHCA) matrix (4 mg/ml) in 0.1% TFA, 50% acetonitrile (v/v), or dihydroxybenzoic acid (DHB) matrix (4 mg/ml) in ultrapure water. Equal volumes of matrix and sample solutions (0.5 µl) were spotted on a 2 × 96-well MALDI target plate (Perseptive Biosystems) and allowed to air dry for co-crystallisation. For the thin layer technique, CHCA matrix was dissolved in acetone (4 mg/ml); 0.5 µl of this solution was deposited on the MALDI target to form the first layer and air dried. Then, 0.5 µl of the sample was applied on top to form the second layer.

9.1.2. Data acquisition: MALDI-TOF Voyager DE Pro

The samples were analyzed with a Voyager-DE Pro matrix-assisted laser desorption ionization/time of flight mass spectrometer (Applied Biosystems) equipped with a 337-nm nitrogen laser. Data acquisition and analyses were performed using Voyager version 5.10 and the Data Explorer software supplied by the manufacturer. Tryptic peptides were analysed in the positive-ion reflectron mode with delayed extraction (DE). The accelerating voltage was set at 20 kV, the grid voltage at 76%, the guide wire at 0.004%, the delayed extraction time at 210 nsec, and the low-mass gate between 500 and 800 Da. All spectra were externally mass-calibrated with a mixture consisting of des-Arg-Bradykinin (m/z 921.505), Glu-Fibrinopeptide (m/z 1570.67), ACTH clip 1-17 (m/z 2093.086), ACTH clip 18-39 (m/z 2465.198) and ACTH clip 7-38 (m/z 3660.19). Spectra were obtained by summation of at least 300 consecutive laser shots. Raw mass spectra were processed using the mass spectrometer software Data Explorer (Applied Biosystems). Monoisotopic masses were extracted from the spectra using the deisotoping algorithm, and the peak lists obtained were submitted to the Matrix Science Mascot search engine for database searching and protein identification.

9.1.3. Data analysis: Protein Identification by Peptide Mass Fingerprinting

The matching of the experimental tryptic peptide masses with the in silico derived tryptic peptide masses from the database was performed with Mascot using the Peptide Mass Fingerprinting (PMF) tool. The Swiss-Prot mammalian database was searched with a mass tolerance of ±100 ppm and with two missed cleavages allowed. All modifications, namely alkylation of the cysteine residues (carbamidomethylation), oxidation of methionine, and TMT0 (K) and TMT0 (N-term) for labeled peptides, were considered as variable. Real peptide matches were checked on the experimental spectra. When the isotopic resolution and signal to noise ratio were too low, peptides were eliminated. Protein identifications were then evaluated with the Mowse probability score, with regard to the number of real matched tryptic peptides as well as the percentage of sequence coverage.
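To make the ±100 ppm tolerance concrete, the sketch below computes the relative mass error between an observed and a theoretical m/z and tests whether the pair would fall inside the search window. It is only an illustration of the tolerance arithmetic, not of Mascot's scoring; the theoretical value reuses the Glu-Fibrinopeptide calibrant mass quoted above, and the observed values are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tolerance_ppm=100.0):
    """True if the observed mass lies within +/- tolerance_ppm of the theoretical mass."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tolerance_ppm


theoretical = 1570.67      # m/z quoted in the text for the calibrant
observed_close = 1570.75   # hypothetical measurement
observed_far = 1570.90     # hypothetical measurement

for observed in (observed_close, observed_far):
    error = ppm_error(observed, theoretical)
    verdict = "match" if within_tolerance(observed, theoretical) else "rejected"
    print(f"{observed:.2f} vs {theoretical:.2f}: {error:+.1f} ppm -> {verdict}")
```

At m/z 1570.67, a 100 ppm window corresponds to about ±0.16 Da, which shows why the tolerance is expressed relative to mass, not as a fixed Dalton value.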

9.2. Protein identification by Tandem Mass Spectrometry (MS/MS)

The first five SCX fractions from the purification experiments and 11 fractions of the fractionation experiments using CSF sample were analysed by MS/MS.

9.2.1. Sample preparation

SCX fractions from digested and labeled samples were zip-tipped where necessary. They were then lyophilized and resuspended in 23 µl of buffer A (0.1% formic acid).

9.2.2. Data acquisition: ESI Q-TOF Micro

An automated nanoflow liquid chromatography/tandem mass spectrometry analysis was performed on a quadrupole time of flight (Q-Tof Micro) mass spectrometer. The instrument was calibrated using a Glu-fibrinopeptide peptide in the MS/MS mode. The nanoflow-HPLC system (UltiMate–Switchos2–Famos; LC Packings) was used for chromatographic separation of the peptide mixture prior to MS detection. 21 µl of sample were loaded on a guard column and desalted with a wash of solvent A. The retained peptides were then eluted and progressively separated on a C18 capillary analytical column - 15 cm length, 75 µm inner diameter, packed with 3 µm C18 100 Å - using a 120 min linear gradient from solvent A to 95% solvent B (80% v/v acetonitrile in 0.1% v/v formic acid) at a flow rate of 200 nL/min. The mass spectrometer was operated in the positive ion mode, using data-dependent analysis under the full software control of MassLynx. A full scan mass spectrum (MS-TOF) was recorded for 1 second and the three most intense peaks (most abundant ions) were selected for fragmentation by collision-induced dissociation (Argon, 30 eV) in the second quadrupole, and MS/MS spectra were acquired. Dynamic exclusion was set at 60 s, meaning that an ion could not be selected for fragmentation a second time before a period of 60 seconds had elapsed.
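The data-dependent selection logic described above (three most intense ions per survey scan, 60 s dynamic exclusion) can be sketched as follows. This is a simplified illustration and not the MassLynx implementation: the m/z matching window `mz_tol` is an assumption not given in the text, and the scan times and peak lists are hypothetical.

```python
def select_precursors(survey_scans, top_n=3, exclusion_s=60.0, mz_tol=0.5):
    """Pick up to top_n most intense ions per survey scan, skipping recently selected ones.

    survey_scans: list of (time_s, [(mz, intensity), ...]).
    mz_tol is an ASSUMED matching window for the exclusion list.
    """
    excluded = []      # (mz, time at which the ion was selected)
    selections = []
    for time_s, peaks in survey_scans:
        # Forget exclusions that are older than the exclusion window.
        excluded = [(mz, t) for mz, t in excluded if time_s - t < exclusion_s]
        chosen = []
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if len(chosen) == top_n:
                break
            if any(abs(mz - ex_mz) <= mz_tol for ex_mz, _ in excluded):
                continue
            chosen.append(mz)
        excluded.extend((mz, time_s) for mz in chosen)
        selections.append((time_s, chosen))
    return selections


# Hypothetical survey scans: (time in seconds, [(m/z, intensity), ...]).
scans = [
    (0.0, [(500.3, 900), (622.8, 700), (745.4, 400), (810.9, 100)]),
    (10.0, [(500.3, 950), (622.8, 650), (745.4, 420), (810.9, 120), (433.2, 80)]),
    (65.0, [(500.3, 300), (810.9, 50)]),
]
selections = select_precursors(scans)
for time_s, chosen in selections:
    print(f"t = {time_s:5.1f} s: fragment {chosen}")
```

In the second scan the three most intense ions are skipped because they were fragmented 10 s earlier, which lets weaker precursors be sampled; by 65 s the first exclusions have expired.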

9.2.3. Data analysis

After data acquisition, the MS/MS spectra acquired for each of the precursors were processed using MassLynx and exported in Micromass pkl format as a single Mascot-searchable peak list for automated peptide identification using the in-house Mascot server. The search was conducted against the Swiss-Prot mammalian database for BSA samples and against the Swiss-Prot human database for CSF samples with the following parameters: a maximum of two missed cleavage sites was allowed, the peptide ion mass tolerance was set to ±1.2 Da and the fragment ion mass tolerance to ±0.6 Da, and carbamidomethylation of cysteines (alkylation with iodoacetamide), oxidation of methionines, and TMT0 (K) and TMT0 (N-term) for labeled peptides were all specified as variable modifications. Bold red peptides above the identity score - namely above the 5% false positive probability threshold - were accepted as true matches.


D. RESULTS & DISCUSSION

1. Shotgun proteomics: sample preparation for mass spectrometry

1.1. Evaluation of digestion and TMT labeling protocols using BSA

Protocols for the digestion and TMT labeling of proteins were evaluated through a MALDI-TOF analysis of the fragments generated from BSA. The presence of TMT labeled tryptic peptides allowing a significant BSA identification by a Mascot database search - according to the Mascot Mowse probability score - was used to confirm the effectiveness of the methods.

a) “In solution” and “In gel” digestion

Two protocols differing in their digestion mode - “in gel” or “in solution” - were tested for their ability to produce TMT labeled tryptic peptides. First, an “in gel” digestion protocol, designed for the direct digestion of proteins separated in Coomassie stained monodimensional polyacrylamide gels (Figure 6), followed by extraction and labeling of the resulting peptides, was performed. An average of 38 matched peptides associated with 50.8% sequence coverage - 93% of them TMT0 labeled peptides (Table 1) - was achieved, leading to highly significant protein identification. Moreover, peptide recovery seemed to be good despite the number of experimental steps (peptide extraction, SCX, zip tip), suggesting that confident identification, and thus quantitation, could be achieved using this method.

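[Figure 6 image: Coomassie stained gel with a molecular weight marker (MW) and lanes 1 to 14; the BSA band migrates at 66K. Lane annotations indicate the amount loaded (10 µg, 1 µg or 0.1 µg) and the digestion buffer (Borate or TEAB).]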

Figure 6. Gel electrophoresis of variable amounts of BSA. 10 µg of BSA were loaded on lanes 1, 2, 3, 8, 9 and 10, 1 µg on lanes 4, 5, 11 and 12, and 0.1 µg on lanes 6, 7, 13 and 14. 66kDa bands were excised and trypsin digested with Borate buffer (lanes 1, 4 and 7) or TEAB buffer (lanes 2, 5 and 13). After peptide extraction and TMT0 labeling, BSA samples were purified using an SCX column with the volatile buffer ammonium acetate and desalted prior to MALDI-TOF MS analysis.

Secondly, an “in solution” digestion protocol, designed for the direct solubilisation, digestion and labeling of liquid protein mixtures, was shown to be efficient prior to its application to the CSF sample. The presence of TMT0 labeled BSA peptides was confirmed by MALDI-TOF spectra of SCX fractions eluted at different salt concentrations (data not shown). Using both protocols, cysteine residues were almost always identified in their carbamidomethylated form, and the number of trypsin missed cleavages was low and rarely above 1. This suggests that TCEP reduction, IAA alkylation and trypsin digestion were highly effective. These results demonstrated the efficiency and applicability of the “in gel” and “in solution” digestion protocols, as well as their compatibility with TMT labeling. A direct comparison of both protocols under the same conditions would be interesting in order to evaluate their respective peptide extraction and digestion efficiencies, and to highlight possible differences in protein coverage, peptide representation and TMT labeling. For example, high molecular mass peptides may be underrepresented after in gel digestion because they are more difficult to extract.

Both protocols could be envisaged for shotgun proteomics studies. With a gel based approach, MS protein identification benefits from the removal of low molecular weight impurities (detergents, salts) that impair MS analysis, and from the separation of protein mixtures (according to molecular weight or isoelectric point) into individual entities or gel bands of reduced complexity, which increases the dynamic range of the analysis. On the other hand, major drawbacks such as poor gel to gel reproducibility, variability in band/spot excision and protein/peptide extraction, and the misrepresentation of membrane proteins or very basic/acidic proteins (for 2D gels) weaken its quantitative analysis by introducing additional differences between samples. Therefore, a traditional shotgun proteomics approach, in which tagging takes place at the peptide level after numerous experimental steps, is not truly conceivable with gels.
Intact protein labeling, and thus early sample pooling, followed by gel electrophoresis and peptide MS/MS analyses could however be an alternative: co-migration of differentially labeled proteins in a single gel would reduce sample to sample variability, allowing the simultaneous analysis of protein mixtures from 1D gels (pull down experiments, stacking gels) to 2D gels. This promising procedure was recently demonstrated to be relevant using the isobaric tag iTRAQ [42]. Its application with TMT, which relies on similar chemistry, will be the next challenge. In the gel-free approach, on the other hand, proteins are kept in solution throughout the analytical steps, from digestion through peptide tagging and fractionation to MS/MS analysis, which is particularly suited to soluble samples such as body fluids (plasma, CSF). Sample to sample variability is minimised by pooling at an early stage, prior to any fractionation. This straightforward and well described proteomic method, combined with the innovative TMT isobaric tagging, was considered the method of choice for further CSF analysis.

b) TEAB and Borate buffers

Two different buffers - TEAB and Borate, characteristic of the iTRAQ and TMT labeling protocols respectively - were compared. As illustrated in Table 1, the number of matched peptides was on average higher for samples digested using Borate buffer (41 for Borate versus 35 for TEAB), particularly for the lower 1 µg sample amount, but the protein sequence coverage was not affected (51% for both methods) because of peptide sequence overlaps. Moreover, the percentage of unlabeled peptides appeared to be lower using Borate buffer (5% versus 9% for TEAB), although these numbers must be interpreted with caution: unlabeled and labeled peptides were not confirmed using MS/MS and could therefore correspond to false negatives and false positives, respectively. According to this single “in gel” experiment, Borate buffer provided better data for quantitative analysis. Conversely, data obtained from the comparison of HPLC-RPC18 UV profiles of “in solution” digested samples suggested that TEAB buffer could enhance peptide analysis (cf. 2.1c). Nevertheless, these results, although statistically weak, indicated that both buffers were suitable for TMT analysis.

Table 1 Maldi-TOF analysis data from an “in gel” digestion of BSA followed by TMT0 labeling using different buffers (Borate, TEAB) or protein amounts (10µg, 1µg)

| Buffer | BSA amount (µg) | Nb of matched peptides | Sequence coverage | Nb of TMT labeled peptides | Sequence coverage by TMT labeled peptides | Unlabeled peptides | Error (ppm) |
|---|---|---|---|---|---|---|---|
| TEAB | 10 | 38 | 53% | 35 | 47% | 9% | 38 |
| Borate | 10 | 41 | 51% | 40 | 51% | 5% | 58 |
| TEAB | 1 | 31 | 48% | 28 | 44% | 9% | 24 |
| Borate | 1 | 40 | 51% | 37 | 50% | 5% | 23 |
| Mean | | 37.5 | 50.8% | 35 | 48.0% | 7% | 35.75 |

Table 2. Maldi-TOF analysis average data from “in gel” digestions of BSA followed by TMT0 labeling using different buffers (Borate, TEAB) or protein amounts (10µg, 1µg)

| Means | Buffers | BSA (µg) | Nb of matched peptides | Sequence coverage | Nb of TMT labeled peptides | Sequence coverage by TMT labeled peptides | Unlabeled peptides | Mean error (ppm) |
|---|---|---|---|---|---|---|---|---|
| Total | TEAB+Borate | 10+1 µg | 37.5 | 50.8% | 35 | 48.0% | 7% | 35.75 |
| Buffer specific | TEAB | 10+1 µg | 34.5 | 50.5% | 31.5 | 45.5% | 9% | 35 |
| Buffer specific | Borate | 10+1 µg | 40.5 | 51.0% | 38.5 | 50.5% | 5% | 23.5 |
| Amount specific | Borate+TEAB | 10 µg | 39.5 | 52.0% | 37.5 | 49.0% | 7% | 48 |
| Amount specific | Borate+TEAB | 1 µg | 35.5 | 49.5% | 32.5 | 47.0% | 7% | 23.5 |

c) TMT0 Labeling

Evidence of TMT labeling was provided by the identification of TMT0 labeled BSA peptides in a PMF Mascot search with the TMT modification set as variable, and by the presence of the TMT0 reporter ion (126 m/z) in the low mass region of MS/MS spectra (Figure 14). The completeness of the peptide tagging reaction was estimated from MS data: across four experiments, 7% of the peptides remained unlabeled (on average 37.5 matched peptides, of which 35 were labeled) (Table 2). In addition, a few peptides (not quantified) were only partially labeled, with either the N-terminal or the lysine tag missing. Labeling was thus shown to be efficient, although not complete, and therefore able to provide the basis for an accurate and reliable determination of protein concentration ratios by MS/MS.
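The unlabeled fraction quoted above can be recomputed from the peptide counts of Table 1. The following is a minimal sketch, assuming that the third and fifth numeric columns of Table 1 are the matched and TMT0 labeled peptide counts; the names `TABLE_1` and `unlabeled_fraction` are illustrative and not part of the original work.

```python
# Peptide counts read from Table 1 (assumed column meaning):
# (buffer, BSA amount in µg, matched peptides, TMT0 labeled peptides)
TABLE_1 = [
    ("TEAB", 10, 38, 35),
    ("Borate", 10, 41, 40),
    ("TEAB", 1, 31, 28),
    ("Borate", 1, 40, 37),
]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def unlabeled_fraction(rows):
    """Fraction of matched peptides without a TMT0 label, from mean counts."""
    matched = mean(row[2] for row in rows)
    labeled = mean(row[3] for row in rows)
    return 1 - labeled / matched


overall = unlabeled_fraction(TABLE_1)
teab = unlabeled_fraction([r for r in TABLE_1 if r[0] == "TEAB"])
borate = unlabeled_fraction([r for r in TABLE_1 if r[0] == "Borate"])

# Roughly 7% overall, 9% for TEAB and 5% for Borate
print(f"overall {overall:.1%}, TEAB {teab:.1%}, Borate {borate:.1%}")
```

Computing the fraction from mean counts, rather than averaging per-experiment percentages, mirrors the way the text derives 7% from 37.5 matched and 35 labeled peptides.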

1.2. Evaluation of chromatographic methods on HPLC using a BSA standard

Chromatographic methods described in Table 3 were independently tested using BSA standard digests on the BioLC HPLC instrument.

| Chromatographic method step | Function | Buffer | RP-C18 | SCX |
|---|---|---|---|---|
| 1. Column conditioning | Column brought to a starting state: possible interferences on the column are washed out and packing functional groups are made available for the binding of the analytes | Starting buffer | 100% A, 0.2 ml/min | 100% C, 0.2 ml/min |
| 2. Sample loading | Sample is injected and adsorbed to the column, displacing counter-ions in SCX | Starting buffer | 100% A, 0.5 ml/min | 100% C, 0.5 ml/min |
| 3. Washing | Unbound substances are washed out from the column. RPC18: salts, unreacted TMT reagents, hydroxylamine, TCEP? SCX: SDS | Starting buffer | 100% A, 0.5 ml/min | 100% C, 0.5 ml/min |
| 4. Elution | Analyte is removed from the column by changing to elution conditions unfavourable for hydrophobic (RPC18) or ionic bonding (SCX) | Elution buffer: ↑ elution strength | 63% B (50% acn), 0.5 ml/min | Salt steps (25 mM to 500 mM), 0.5 ml/min |
| 5. Column clean up | Molecules not eluted under the previous experimental conditions are removed | Maximal elution strength | 100% B, 0.5 ml/min | 100% D, 0.5 ml/min |
| 6. Reequilibration | Column is brought back to the starting state for next usage | Starting buffer | 100% A, 0.2 ml/min | 100% C, 0.2 ml/min |

Buffer A: 1% Acn, 0.1% TFA; B: 80% Acn, 0.1% TFA; C: 25% Acn, 0.1% TFA (for purification with NH4AcO) or 25% Acn, 5 mM KH2PO4, pH 3 (for KCl); D: 25% Acn, 400 mM NH4AcO or 25% Acn, 5 mM KH2PO4, 500 mM KCl.

Table 3. Description of chromatographic methods. Prior to any run, the column was cleaned up with a buffer of maximal eluting strength and equilibrated with starting buffer for 30 min. The sample was then injected at a low flow rate (0.2 ml/min) in multiple injections and washed for 5 min at a flow rate increased to 0.5 ml/min. For RPC18, peptides were eluted with 50% organic solvent for 5 min; for SCX, they were eluted with sequential 5 min salt steps using NH4AcO or KCl and collected. The column was then regenerated.
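The correspondence between "63% B" and roughly 50% acetonitrile in Table 3 follows from the buffer compositions given in the legend. A minimal sketch of that arithmetic is shown here, assuming additive volumes; the function name `acn_percent` is illustrative only.

```python
def acn_percent(fraction_b, acn_a, acn_b):
    """Acetonitrile content (% v/v) of a binary mix of buffers A and B.

    fraction_b is the proportion of buffer B (0 to 1); acn_a and acn_b are
    the acetonitrile contents of the two buffers. Volumes are assumed additive.
    """
    if not 0.0 <= fraction_b <= 1.0:
        raise ValueError("fraction_b must be between 0 and 1")
    return (1 - fraction_b) * acn_a + fraction_b * acn_b


# Buffer A: 1% Acn, buffer B: 80% Acn, elution at 63% B (Table 3)
elution = acn_percent(0.63, acn_a=1.0, acn_b=80.0)
print(round(elution, 1))  # 50.8
```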

The RPC18 method (Figure 7A) and column were shown to be appropriate for peptide purification. After washing, peptides were eluted with a relatively high organic solvent concentration to avoid losing very large (hydrophobic) peptides. 50% acetonitrile appeared to be sufficient for complete peptide recovery (Figure 10), as no peptides were recovered in the column clean-up fraction (80% acetonitrile). The SCX method (Figure 7B) and column were shown to fractionate peptide samples efficiently. Eluates collected from SCX were analysed by Maldi-TOF. The presence of a peptide profile at each salt concentration indicated that fractionation was efficient (data not shown). To prevent analyte losses on the SCX column, ionic strength and pH had to be carefully controlled. Sample and buffer pH were lowered to ensure complete ionisation of the analytes. Ionic strength was reduced, in order to limit competition between the analyte and other cations in the sample for ion exchange sites, by performing an RP-C18 purification prior to SCX.

[Figure 7 annotations: sample loading, wash, salt steps at 50 mM, 100 mM, 150 mM, 200 mM, 300 mM and 500 mM, clean-up, re-equilibration]

Figure 7. Chromatographic elution profiles at UV 214nm and 280 nm from a RPC18 (A) and a SCX (B) of a BSA sample.

1.3. Optimization of the purification and fractionation chromatography for TMT0 labeled peptides using BSA

The first aim of the optimisation work was to find “optimal” conditions for peptide retention on the RP-C18 and SCX columns in order to minimise sample losses during the loading and washing steps. This was done by fine-tuning parameters such as column configuration, sample preparation, mobile phase composition and chromatographic method settings (flow rate, step times, etc.). The second aim was to achieve good peptide fractionation in order to efficiently reduce sample complexity before LC/MS/MS analysis. This was done by determining the salt concentrations at which peptide elution started and choosing the salt cut-offs accordingly. Methods were evaluated on the basis of the resulting UV chromatograms monitored at 214 and 280 nm, using either unlabeled or TMT0 labeled BSA digests, as tagging could influence sample behaviour.

Optimizing peptide retention

a) Column configuration

In a first attempt, RPC18 and SCX chromatography were performed in a single HPLC run, with the two cartridges linked by PEEK tubing. A blank run with Buffer A injections was also recorded (data not shown). The resulting chromatograms monitored at 214 nm and 280 nm (Figure 8) indicated probable peptide losses during RP-C18 elution, as an unexpectedly wide peak was observed from 17 to 23 min, and to a lesser extent in the blank. Although the transition to a high organic solvent content could affect the UV trace, as shown by the blank, the high intensity of the peak and its detection at 280 nm could only be fully explained by the presence of peptides in the flowthrough.

[Figure 8: panels A and B. Annotated regions: sample loading (injections), wash, flow rate increase from 0.2 ml/min to 0.5 ml/min, SCX connection, start of RPC18 elution, start of SCX, and salt steps at 25 mM, 50 mM, 75 mM, 100 mM, 125 mM, 150 mM, 175 mM and 200 mM]

Figure 8. UV absorbance trace at 214 nm (A) and 280 nm (B) from an RPC18-SCX run of a BSA digest. A 50 µg BSA digest was lyophilized and resuspended in 500 µl Buffer A (5% Acn v/v, 0.1% TFA v/v). Peptides were loaded in three injections (200 µl loop) onto the RPC18 cartridge, and the SCX column was connected during the washing step. Buffer B (95% Acn v/v, 0.1% TFA v/v) was then set to 53% to elute peptides directly onto the SCX column. Finally, a step gradient of salt (NH4AcO) from 25 mM to 500 mM was applied (25 mM to 200 mM shown) and peptides were sequentially eluted from SCX. The gradient delay was around two minutes. The intense peak observed during the flow rate change is due to a weak cell inlet.

The elution of part of the sample from RPC18 straight through the SCX was probably caused by the “in series” column configuration, which could hinder peptide adsorption to the SCX phase in several ways. First, the SCX column might no longer be conditioned when peptides elute from RPC18, as RPC18 buffers A and B have run through it from minutes 12 to 25. Secondly, the RPC18 elution buffer, and thus the sample, may not be at pH 3 or lower, as recommended to prevent losses of small and weakly charged peptides from SCX. Finally, the high organic solvent content of the RPC18 elution buffer (50% acetonitrile) could to some extent alter the affinity of some peptides for the column, notably by modifying their pKa. In order to maximise peptide retention on the SCX, RPC18 and SCX chromatography were therefore performed in two separate runs. Peptides were first loaded onto RPC18, and the collected eluate was subsequently loaded onto the SCX column after checking that its pH was lower than 3.

b) Mobile phases (buffers)

For RPC18, the recommended composition of buffer A (5% Acn, 0.1% TFA v/v) was shown to cause peptide losses at sample injection (Figure 8). As the RP-C18 column was equilibrated before any injection, an inappropriate mobile phase composition at loading time, and thus inappropriate sample preparation (the sample was reconstituted in A), was the most likely explanation for those losses. The organic solvent concentration was therefore reduced to 1% Acn, 0.1% TFA v/v to maximise peptide binding, and this composition was successfully tested (Figure 10). The acetonitrile concentration was not reduced to 0% because in such a highly aqueous environment the very hydrophobic C18 chains could get tangled and fail to capture analytes. For SCX, buffer compositions were left unchanged.

c) Sample processing

Figure 9 depicts the different sample preparation methods tested. Peptide losses at injection were difficult to assess directly because the elution of various reagents (from digestion and labeling) interfered with UV absorbance and MS analysis; they were instead inferred from peptide recovery at elution time. Losses were thus estimated by comparing the areas and intensities of the major elution peak, which are proportional to peptide amounts.

[Figure 9 diagram: workflow steps with the sample processing options tested at each step]

Workflow: In solution digestion, TMT labeling, Purification & Fractionation (RPC18, then SCX), MS/MS analysis.

Sample processing:
• In solution digestion: Borate vs TEAB buffer?
• Prior to RPC18: dilution in buffer A vs lyophilisation and resuspension in A? Sample: acidic vs basic?
• Prior to SCX: direct loading of RPC18 eluate vs lyophilisation and resuspension in Buffer C?

Figure 9. Optimizing sample processing. Various TMT0 labeled sample preparation methods for chromatographic purification and fractionation were tested. As no major differences were emphasised, the most straightforward modes were selected (in green). Prior to RPC18, sample was simply resolubilised in Buffer A (1%Acn, 0.1%TFA), then the collected eluate was directly injected into the SCX after pH was checked to be under 3.

Samples digested “in solution” using either Borate or TEAB buffer were compared. Over three replicates, the peak intensity and area of samples digested in TEAB were always slightly higher. It was impossible to tell whether this increase in peptide detection was due to a more efficient digestion or to better peptide retention on RPC18 with TEAB buffer, as Maldi-TOF MS and ESI LC-MS/MS analyses were hindered by interfering compounds (salts, SDS …) that remained despite sample zip-tipping. However, these results could indicate that TEAB buffer was more suitable for digestion following the TMT protocol, which did not exactly agree with the data obtained from “in gel” digestion (cf. 1.1b), or that it was specifically more efficient for “in solution” digestion.

For RPC18, samples were recommended to be fully solubilised and to satisfy two criteria known to maximise peptide retention: a total acetonitrile content lower than 5% and an acidic pH (pH<3). As the acetonitrile content was too high (28%) after the TMT labeling protocol, the sample was either lyophilised and reconstituted in Buffer A (450 µl, 1 injection) or simply diluted in Buffer A (1350 µl, 3 injections), to reach 1% and 4% acetonitrile respectively. The comparison of UV traces from lyophilised and diluted samples did not reveal any major differences. Possible losses associated with lyophilisation were therefore thought to be equivalent to those associated with multiple injections and a higher acetonitrile content (4%). It was decided not to lyophilise the sample, in order to minimise the number of experimental steps and to save time, allowing both RPC18 and SCX to be performed in one day. Sample acidification could not be achieved by dilution in Buffer A alone, probably because the very basic hydroxylamine and unreacted TMT reagents neutralised the acid, and required the addition of neat TFA.
The comparison of UV traces from unacidified (pH=6) and acidified (pH<3 with TFA) samples failed to show any pH-dependent improvement in peptide retention, as the intensities and areas of the major elution peak were similar. In addition, sample acidification was found to be a delicate process, associated with occasional peptide precipitation and difficult resolubilisation, leading to sample losses. As sample pH was not shown to interfere dramatically with RPC18 analysis, acidification was discontinued.

For SCX, samples were required to be at a pH lower than 3 to ensure complete ionisation of the analytes. Eluates from RPC18, whether lyophilised and resuspended in C (25% Acn, 0.1% TFA) or directly injected (3 injections), were already at a pH of around 1.5. According to a single experiment, fewer losses were observed at injection time with the lyophilised eluate than with its directly injected counterpart. However, as the SCX elution UV profile could not be interpreted (no peptides were detected at any salt concentration in any sample), it could not be determined whether losses were really reduced or whether fewer losses were observed because most of the sample had already been lost during lyophilisation (sticking to the Eppendorf tube or otherwise). It was decided to inject the sample directly onto the SCX column, to avoid possible losses associated with an additional lyophilisation step and to keep experimental time to a minimum, allowing RPC18 and SCX to be performed in one day. Rather than being discriminated, the preparation methods tested were all shown to be applicable, on the basis of spectrophotometric estimates of peptide amounts that were limited by interfering compounds.
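The dilution used before RPC18 loading (28% acetonitrile brought to 4% with Buffer A at 1% acetonitrile) can be checked with a simple mixing calculation. The sketch below assumes additive volumes and assumes that 1350 µl is the final volume after dilution; the function name `dilution_factor` and the implied sample volume are illustrative and not stated in the report.

```python
def dilution_factor(acn_sample, acn_target, acn_diluent):
    """Ratio of final volume to sample volume needed to bring a sample from
    acn_sample to acn_target (% v/v) with a diluent containing acn_diluent.

    Volumes are assumed to be additive.
    """
    if not acn_diluent < acn_target <= acn_sample:
        raise ValueError("target must lie between diluent and sample content")
    return (acn_sample - acn_diluent) / (acn_target - acn_diluent)


# 28% Acn after labeling, 4% target, Buffer A at 1% Acn
factor = dilution_factor(acn_sample=28.0, acn_target=4.0, acn_diluent=1.0)
print(factor)  # 9.0

# Sample volume implied if 1350 µl is the final volume (an assumption)
implied_sample_ul = 1350 / factor
print(implied_sample_ul)  # 150.0
```

Note that the target cannot go below the acetonitrile content of the diluent itself, which is why dilution alone reaches 4% while lyophilisation and reconstitution reach 1%.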

d) Method: TMT protocol effects on chromatography

Sample loading time was extended to allow multiple injections at a reduced flow rate (0.2 ml/min) to promote maximal peptide binding to the column. During the isocratic sample wash, the UV absorbance baseline (280 nm and 214 nm) was modified after each injection of TMT labeled sample, compared to unlabeled sample. This was probably mainly due to the elution of non-volatile salts (Borate) contained in the TMT digestion buffers on RPC18 (Figure 10), and of anionic SDS together with the high acetonitrile content (50%, cf. above) on SCX (data not shown).

[Figure 10: panels A and B, UV traces at 214 nm and 280 nm. Annotated regions in each panel: column equilibration, sample loading (sample injections), wash, peptide elution, clean-up]

Figure 10. Comparison of UV absorbance traces at 214 nm and 280 nm from RPC18 of unlabeled (A) and TMT0 labeled (B) BSA digests. A. BSA was digested according to the in-solution protocol in volatile Ambic, lyophilised and injected into the HPLC RPC18 system. B. BSA was digested according to the in-solution TMT protocol in non-volatile Borate, lyophilised and injected into the HPLC RPC18 system. Compared to A, intense signals are observed at each sample injection, which could be related to the elution of non-volatile salts (Borate) as well as other digestion reagents. Excess reagents were thus efficiently removed by isocratic washing prior to peptide elution. In addition, the shape of the eluting peak is different, with a small peak preceding the main eluting peak at the retention time of the main peak of unlabeled BSA (A).

Some TMT protocol reagents were hydrophilic (salts, TCEP, hydroxylamine) and were directly eliminated during RPC18, whereas others were sufficiently hydrophobic to co-elute with peptides from RPC18 (SDS, perhaps unreacted TMT0 reagents) and were eluted at SCX injection or at early salt concentrations, as they carried little or no positive charge. These data suggested that reversed phase and SCX chromatography were able to eliminate MS-interfering compounds, but that UV spectrophotometric detection at 280/214 nm was not completely specific to peptides. To ascertain the specific effects of digestion buffers and reagents on UV absorbance, these could have been injected and monitored separately, and the detection wavelength could also have been adjusted. Moreover, the elution peak was less well resolved for TMT labeled samples. This was probably due to the presence of SDS, as a digestion performed without the detergent removed the small peak, as observed for unlabeled samples. Finally, the optimisation work to determine the salt concentration range causing elution of TMT labeled peptides was complicated by the fact that chromatograms were barely interpretable. Indeed, labeled peptides were barely detected on the UV trace, probably because the various digestion and labeling reagents modified the absorbance. SCX methods using six, eight, ten and twelve variable salt cut-offs were tested. None was shown to be better than the others, as most of the TMT0 labeled peptides seemed to elute in the first fraction, after which no peptides could be detected.

To conclude, developing the best methods was a very difficult and never-ending task, as selecting the appropriate conditions required considerable knowledge, experience and time. This optimisation work was therefore not intended to establish the perfect protocol, but to find a method that worked reasonably efficiently, and to become familiar with chromatography and gain an understanding of it.

2. Mass Spectrometry

2.1. MS and MS/MS data as a tool for the identification of proteins

2.1.1 Maldi-TOF MS analysis

2.1.1.1. Sample preparation (data not shown)

Different matrices and sample deposition methods were tested on a BSA digest for their effects on protein identification. The DHB and CHCA cold matrices exhibited different crystallisation features (crystal shapes) and varied in their peptide representation (peptide sequences, intensities). However, both matrices had comparable ionisation efficiency, evaluated by the number of BSA peptides matched; they were considered suitable for peptide studies and were used alternately. The “dried droplet” and “thin layer” sample preparation methods were both shown to be efficient, allowing good sample crystallisation and ionisation. Dried droplet was however preferred because it was the method predominantly used in the laboratory, owing to its reliability and simplicity. It was observed that samples contaminated with salt or detergent, as well as the use of coloured micro PCR Eppendorf tubes, prevented the matrix-analyte mixture from drying, crystallising and/or ionising correctly. Samples were therefore ziptipped and/or SCX purified to remove interfering salts and/or detergent. Coloured vials were avoided, as they probably contained residual quantities of dye that suppressed the signal of analytes present in low quantities. Sample preparation thus appeared to be an important and delicate step.

2.1.1.2 Peptide representation

The analysis of a BSA tryptic digest with a Maldi-TOF mass spectrometer produced a mass spectrum or “mass fingerprint” (Figure 11), representing signals from the unique combination of peptide masses produced by BSA digestion. However, only a subset of the potential tryptic peptides of the protein could be observed. Indeed, many parameters influence peptide representation and protein sequence coverage (e.g. protein amount, site-specific digestion efficiency, differences in ionisation efficiency among peptides, competition for available protons), leading essentially to signal suppression of less abundant and hydrophobic/basic peptides. Sensitivity and performance limitations of the mass spectrometer (e.g. peptide masses outside the detection window of the instrument in reflector mode, 400 to 4000 m/z, are not analysed) can also be a limiting factor. Although the major peaks of the spectrum could be attributed to BSA peptides, some less intense peaks were not matched by the search engine. Non-specific cleavages, missed cleavages, and post-translational or chemical modifications could provide some explanations. Although programs exist for finding these modifications (e.g. FindMod), sequence determination by MS/MS remains the only way to confirm them.

[Figure 11: Maldi-TOF mass spectrum, intensity (%) against mass (m/z). Inset: isotopic cluster of the TMT0 labeled peptide KVPQVSTPTLVEVSR (m/z 2088.35), monoisotopic peak resolution: 7555]

Figure 11. Maldi-TOF mass spectrum of a TMT0 labeled BSA digest (average of 600 scans). The mass spectrum plots peptide ion intensity against mass-to-charge ratio (m/z). As shown in the zoom, isotopic resolution is reached. Peptide charge state can thus be inferred from the m/z distance between adjacent isotopic peaks (1 Da = 1+; 0.5 Da = 2+; 0.33 Da = 3+ …). In MALDI positive ionisation mode, peptide peaks principally appear as singly charged ions [M+H]+.
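The rule given in the caption of Figure 11 (isotopic spacing of about 1/z) and the quoted resolution can be expressed in a few lines. This is a minimal sketch: the function names are illustrative, and the peak width calculation assumes that the quoted resolution is defined as m/Δm at half maximum, which the report does not state.

```python
def charge_from_spacing(delta_mz):
    """Charge state inferred from the m/z spacing between adjacent isotopic
    peaks, which is approximately 1/z."""
    if delta_mz <= 0:
        raise ValueError("spacing must be positive")
    return round(1.0 / delta_mz)


def peak_width_from_resolution(mz, resolution):
    """Peak width in m/z units implied by resolution = m / delta_m."""
    return mz / resolution


# Spacings quoted in the caption of Figure 11
charges = [charge_from_spacing(s) for s in (1.0, 0.5, 0.33)]
print(charges)  # [1, 2, 3]

# Peak at m/z 2088.35 with a resolution of 7555 (values from Figure 11)
width = peak_width_from_resolution(2088.35, 7555)
print(f"{width:.2f}")  # 0.28
```

A peak width of roughly 0.28 m/z units is well below the 1 Da isotopic spacing of a singly charged ion, which is consistent with the isotopic resolution reported in the figure.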

2.1.1.3 Applications

a) Protein identification by PMF

Monoisotopic masses of the 70 most intense peaks of the mass spectrum (Figure 11) were submitted to the Mascot search engine for a search against the Swissprot mammalian database. BSA was identified with high statistical confidence, namely a score of 105 associated with a very low expectation value (3.4 x 10^-7) for this match having occurred by chance (Figure 12). 49 monoisotopic peptide ions obtained from Maldi-TOF were matched to the theoretical tryptic digest, covering 58% of the BSA protein sequence. Manual inspection of the experimental spectra allowed false positives to be eliminated: peptide matches that could not clearly be attributed to monoisotopic peaks, or with too low a signal-to-noise ratio, were removed. 41 true peptides thus remained, representing 51% of the sequence. An additional protein hit, namely serum albumin precursor from sheep, was reported with 34 peptides matched and 39% protein coverage, but with fewer matched peptides than BSA it did not reach the significance level. Discriminating by PMF between homologous proteins sharing extensive amino acid sequence similarity, within a species or between species (if the species is unknown), can sometimes be problematic. Indeed, maximum sequence coverage must be achieved to identify unique peptide masses that are specific to the searched protein. For many reasons this is not always possible (mixtures of proteins, low protein amounts, etc.), and only sequence determination by MS/MS can provide an exact identification.

Figure 12. Mascot search result presentation: significant identification of BSA by mass spectrometry according to the probability-based Mowse score. A. The graphical output from Mascot shows the MOWSE scores of the top protein hits. Proteins with scores greater than 53, such as BSA here, are considered significant matches, expected to occur by chance with a frequency of less than 5% (p<0.05). B. The BSA protein view gives additional information including the protein isoelectric point (pI), the list of matched peptides (not shown), the number of matched peptides, and the protein sequence with assigned peptides shown in red.
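The link between the score, the significance threshold and the expectation value can be illustrated with the relation usually given for Mascot's probability-based scores, in which the expectation value equals the significance level scaled by 10 to the power of (threshold - score)/10. The sketch below rests on that assumption; the function name `mascot_expect` is illustrative and is not a Mascot API call.

```python
def mascot_expect(score, threshold_score, p_threshold=0.05):
    """Expectation value for a probability-based score, assuming the relation
    E = p_threshold * 10 ** ((threshold_score - score) / 10)."""
    return p_threshold * 10 ** ((threshold_score - score) / 10)


# Score of 105 for BSA with a significance threshold of 53 at p < 0.05
expect = mascot_expect(score=105, threshold_score=53)
print(f"{expect:.1e}")  # 3.2e-07
```

The result is of the same order as the reported 3.4 x 10^-7; the small difference would be consistent with the displayed score being rounded to an integer, although this cannot be verified from the report.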

The mass error, between 50 and 100 ppm, was consistent with that expected using an external calibration of the instrument, but could be improved by additional internal calibration with trypsin autolysis products or spiked peptides. In conclusion, Maldi-TOF mass spectrometry allows protein identification by the so-called peptide mass fingerprinting technique, through its ability to accurately measure the masses of proteolytic peptides. A high probability of identification requires a large number of peptide matches and a high percentage of protein sequence coverage, which can hardly be maximised without combining different analysis methods (enzymatic digestion with different proteases, sample preparation using different matrices).
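The mass error in ppm referred to above is the relative deviation between observed and theoretical m/z, scaled by one million. A minimal sketch follows; the function names and the m/z values are hypothetical and chosen only to illustrate the calculation.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical peptide: theoretical m/z 1500.000, observed m/z 1500.105
theoretical = 1500.000
observed = 1500.105
print(round(ppm_error(observed, theoretical)))  # 70
print(within_tolerance(observed, theoretical, tol_ppm=100))  # True
```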

b) Quality control

Maldi-TOF MS was also used as a routine quality control to confirm protein digestion and to prevent the injection of incomplete digests, which could block C18 columns, into the nanoLC/MS/MS system. Data were acquired in reflectron mode to detect any peptides present. However, Maldi-TOF linear mode could have been used to detect undigested protein fragments.

In conclusion, Maldi-TOF mass spectrometry was shown to be an effective tool in proteomics, capable of identifying pure proteins by PMF. This straightforward method also found applications in routine peptide analysis, such as checking sample digestion.

2.1.2 ESI LC MS/MS analysis

2.1.2.1 Sample preparation

Samples were prepared in formic acid instead of TFA, as the latter is known to suppress ESI signals by forming gas-phase ion pairs with positively charged analyte ions. Then, to evaluate the need for a desalting step prior to LC-MS/MS analysis, SCX fractions from a BSA digest were prepared with or without C18 ziptipping. Better quality BPI chromatograms, exhibiting higher peak intensities, a larger number of detected peaks and a cleaner trace, especially in the 85 to 120 min region, were obtained from ziptipped samples (Figure 13). These observations suggested that salts interfered with ESI MS analysis through signal suppression and increased background noise, confirming the common knowledge that ESI is a salt-sensitive source. Although it was a possible cause of analyte losses, C18 ziptipping was therefore performed prior to any MS/MS analysis to promote high quality data, essential for subsequent protein identification and quantitation, and to protect LC systems and mass spectrometers from salt contamination.

[Figure 13: BPI chromatograms; trace label: BSA unlabeled, 50 mM, unziptipped]

Figure 13. Comparison of two base peak ion (BPI) chromatograms from ziptipped and unziptipped SCX fractions. The green BPI chromatogram corresponds to a BSA digest after sample clean-up by C18 ziptipping, whereas the red BPI chromatogram represents the same BSA digest without this procedure. Both chromatograms were obtained by nanoLC-MS on a Q-TOF. NB: a BPI chromatogram plots the highest intensity at each scan, whereas the TIC (total ion current) is the sum of signal and noise at each scan.

2.1.2.2 Protein identification and quantification

Proteins from complex samples, such as those from CSF shotgun proteomics experiments, were successfully identified using the Mascot MS/MS ion search engine against the SwissProt database. Results consisted of a list of matched peptides for each protein hit (as for PMF), together with MS/MS fragmentation spectra of the most intense peptides (Figure 14). TMT labeling was not found to interfere with mass spectrometry, and allowed good peptide ionisation and fragmentation in MS and MS/MS. Using the “Select Peptide Summary”, only fragmented peptides were displayed, with top-ranking peptide matches in red and, in bold, the first (highest scoring) appearance in the report of a peptide match to a query. Red and bold peptides were selected for analysis as they corresponded to the most logical assignments. In addition, peptides were associated with a probability-based ion score for identity or extensive homology: valid spectra, such as the one shown below, typically exhibited clear b and y ion series and a minimal m/z error, with the major peaks assigned and very few unassigned peaks in the experimental fragmentation spectrum. To avoid time-consuming manual inspection of the spectra, only high scoring peptides, reaching the identity score threshold at a 5% false positive probability, were chosen for analysis.
Some peptide sequences were listed more than once, with variable mass errors and ion scores (lower scores in parentheses). In shotgun proteomics, these redundant MS/MS spectra resulted from the repeated analysis of highly abundant peptides (in the same run) or from the same peptides being present in several fractions (in different runs). They were counted as a single match for protein identification, in line with most scientific publications. Moreover, as for PMF, some protein isoforms or proteins sharing extensive sequence homology could not be discriminated, as they matched the same set of non-unique peptides. To summarise, the minimum criterion for protein identification was to match at least one high scoring red bold peptide that was not found in any other protein of the database (unique).
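The identification criteria summarised above (score filtering, collapsing of redundant spectra, and the requirement for at least one unique peptide) can be sketched as a small filtering routine. This is an illustration under stated assumptions, not the procedure actually run in Mascot: the function name `identify_proteins`, the identity threshold of 40 and all entries except the Cystatin C peptide with ion score 77 are hypothetical.

```python
from collections import defaultdict


def identify_proteins(matches, identity_threshold):
    """Apply the minimum identification criteria to peptide matches.

    matches: iterable of (protein, peptide_sequence, ion_score) tuples.
    Returns {protein: {peptide: best_score}} for proteins with at least one
    high scoring peptide that is not shared with any other protein.
    """
    # Keep high scoring matches and collapse redundant spectra of the same
    # peptide into a single match carrying the best score.
    best = {}
    for protein, peptide, score in matches:
        if score < identity_threshold:
            continue
        key = (protein, peptide)
        if score > best.get(key, float("-inf")):
            best[key] = score

    # Record which proteins each retained peptide is assigned to.
    owners = defaultdict(set)
    for protein, peptide in best:
        owners[peptide].add(protein)

    # Keep only peptides that are unique to one protein.
    identified = defaultdict(dict)
    for (protein, peptide), score in best.items():
        if len(owners[peptide]) == 1:
            identified[protein][peptide] = score
    return dict(identified)


matches = [
    ("Cystatin C", "LVGGPMDASVEEEGVR", 77),
    ("Cystatin C", "LVGGPMDASVEEEGVR", 52),   # redundant spectrum, lower score
    ("Isoform A", "SHAREDPEPTIDEK", 65),       # hypothetical shared peptide
    ("Isoform B", "SHAREDPEPTIDEK", 65),
    ("Protein X", "LOWSCOREPEPTIDER", 12),     # hypothetical, below threshold
]
result = identify_proteins(matches, identity_threshold=40)
print(result)  # {'Cystatin C': {'LVGGPMDASVEEEGVR': 77}}
```

In this sketch the two hypothetical isoforms are dropped because their only peptide is shared, which mirrors the situation described for proteins that could not be discriminated.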

[Figure 14 spectrum annotations: reporter ion TMT0 (m/z = 126); precursor ion having lost the reporter]

Figure 14. MS/MS spectrum of a TMT labeled peptide obtained with an ESI-Q-TOF. The spectrum shown is the Mascot search hit for the peptide LVGGPMDASVEEEGVR, TMT0 labeled (N-term), from human Cystatin C, with an ion score of 77. The table lists the observed b and y ion series, in which the mass difference between consecutive fragment ions corresponds exactly to that of an amino acid.

Quantitative information could be extracted from the same MS/MS spectra. When TMT0 labeled peptides were selected and fragmented by CID, the 126 m/z reporter ion could be detected in the low mass region of the MS/MS spectra. If peptides had been tagged with TMT duplex (2-TMT), four-plex (4-TMT) or sixplex (6-TMT), the relative abundances of peptides in the two, four or six compared samples could have been extracted and the relative protein abundance ratios inferred.
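How a relative abundance could be read from reporter ions in a multiplexed experiment can be sketched as follows. No such experiment was performed in this work: the peak list is hypothetical, the second reporter at m/z 127.13 is an assumption for a duplex reagent, and the function names and the 0.05 m/z tolerance are illustrative.

```python
def reporter_intensity(peaks, reporter_mz, tol=0.05):
    """Summed intensity of the peaks found within tol of a reporter m/z.

    peaks: list of (mz, intensity) tuples from one MS/MS spectrum.
    """
    return sum(intensity for mz, intensity in peaks if abs(mz - reporter_mz) <= tol)


def reporter_ratio(peaks, numerator_mz, denominator_mz, tol=0.05):
    """Ratio of two reporter ion intensities, or None if the reference
    reporter is absent from the spectrum."""
    reference = reporter_intensity(peaks, denominator_mz, tol)
    if reference == 0:
        return None
    return reporter_intensity(peaks, numerator_mz, tol) / reference


# Hypothetical low mass region of an MS/MS spectrum
peaks = [(126.13, 1200.0), (127.13, 600.0), (175.12, 300.0)]
ratio = reporter_ratio(peaks, numerator_mz=127.13, denominator_mz=126.13)
print(ratio)  # 0.5
```

Protein-level ratios would then be inferred by combining the ratios of the peptides assigned to each protein, for example with a median, which is a design choice left open here.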

This section reported the potential of the single-stage MALDI-TOF mass spectrometer and the hybrid Q-TOF tandem mass spectrometer to identify proteins in simple or complex sample mixtures, by peptide mass fingerprinting (PMF) and peptide fragment fingerprinting (MS/MS) respectively. Moreover, tandem mass spectrometry combined with TMT isobaric tagging was shown to provide the tools for quantitative analysis.

3. Protein profiling of CSF after isobaric TMT reagent labeling

3.1. Sample preparation

3.1.1 Determination of total protein amounts

According to a four point calibration curve (Figure 15), the total protein concentration of the first CSF sample was estimated at 0.22µg/µl and the second at 0.28 µg/µl. The total protein concentration of the pooled aliquots was approximated by the average of the two values, i.e 0.25 µg/µl.

[Figure 15 plot: BSA standard curve, optical density (0 to 0.8) against protein concentration in µg/µl (0 to 0.5). Linear fit: y = 1.3577x + 0.0457, R2 = 0.994]

Figure 15. Four dilutions BSA standard curve at 595nm. Optical density (OD) was corrected for blank.
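The use of the standard curve can be sketched by inverting the linear fit y = 1.3577x + 0.0457 reported for Figure 15. In the sketch below the optical density of 0.344 is a hypothetical reading, and the sample volume computed for 100 µg of protein is an illustration derived from the pooled concentration, not a figure given in the report.

```python
SLOPE = 1.3577       # OD per (µg/µl), from the linear fit of the standard curve
INTERCEPT = 0.0457   # OD at zero protein, from the same fit


def concentration_from_od(od):
    """Protein concentration (µg/µl) from a blank-corrected OD at 595 nm."""
    return (od - INTERCEPT) / SLOPE


def volume_for_mass(mass_ug, conc_ug_per_ul):
    """Sample volume (µl) containing the requested protein mass."""
    return mass_ug / conc_ug_per_ul


# Hypothetical OD reading giving a concentration close to the first CSF sample
print(round(concentration_from_od(0.344), 2))  # 0.22

# Pooled concentration approximated by the mean of the two CSF samples
pooled = (0.22 + 0.28) / 2
print(round(pooled, 2))  # 0.25

# Volume of pooled CSF containing 100 µg of protein
print(round(volume_for_mass(100, pooled)))  # 400
```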

3.1.2 Workflow

The optimised workflow was finally applied to CSF. Weight-normalised samples of 100 µg protein were digested in solution and TMT0 labeled. The resulting peptide mixture was first desalted by reversed phase (RP) C18 chromatography, and then subjected to two rounds of chromatography for a two-dimensional peptide separation. The first dimension consisted of strong cation exchange chromatography (SCX) using ammonium acetate (NH4AcO) or potassium chloride (KCl) step gradients. UV profiles are not shown as they were very similar to those obtained for the TMT0 labeled BSA sample (Figure 7). For the second dimension, peptide-containing fractions were desalted (C18 ziptip) and loaded onto an RPC18 nanochromatography column coupled to a Q-TOF mass spectrometer for online MS/MS analysis.

3.1.3 Influence of buffer pH on SCX fractionation

Figure 16 plots the number of peptides and proteins identified in the first five fractions of an SCX run performed with increasing concentrations of NH4AcO. All identified proteins were found in the first 50 mM salt fraction, in which 42 peptides were represented, compared to 4 to 7 in the following ones. This early peptide elution was due to the nature of the eluent, whose buffering capacity around pH 7 promoted peptide deprotonation and hence immediate elution. NH4AcO could therefore not be considered an appropriate salt for sample fractionation, but rather for sample purification, since peptides remain bound to the SCX column and can be washed free of impurities such as SDS (or other anions) until the buffer is applied. 50 mM NH4AcO was shown to purify the sample without excessive losses. As this salt is volatile, higher concentrations (>200 mM) could be applied to ensure complete sample recovery without risk of interfering with MS analysis.


A. CSF, 5 SCX fractions (NH4AcO step gradient)

Fraction      1.1      1.2      2        3        4
NH4AcO        50 mM    50 mM    100 mM   150 mM   200 mM
# peptides    42       4        7        2        3
# proteins    23       3        4        2        2

Figure 16. LC-MS/MS analysis of SCX fractions from a TMT0 labeled CSF sample obtained using NH4AcO. Pkl files from each fraction and their merged list (unique file) were searched independently in Mascot with the same criteria.

Figure 17 shows peptide fractionation using a potassium chloride step gradient (pH adjusted to 3) and the resulting protein identifications for SCX fractions collected at 9 salt step concentrations. Peptides were distributed predominantly between 25 mM and 120 mM KCl. This time, peptides eluted only according to salt concentration (cation competition), as the acidic buffer conditions preserved peptide protonation. In contrast to ammonium acetate, potassium chloride allowed sample fractionation.

B. SCX fractionation of CSF (KCl step gradient)

Fraction      1.1      1.2      2        2.2      3        4        5        6        7        8        9
KCl           25 mM    25 mM    50 mM    50 mM    75 mM    100 mM   125 mM   150 mM   200 mM   300 mM   500 mM
# peptides    99       67       76       56       52       17       12       6        2        2        0
# proteins    35       26       30       20       24       8        7        5        2        1        0

Figure 17. LC-MS/MS analysis of SCX fractions from a TMT0 labeled CSF sample obtained using KCl.
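To make the elution profile easier to read, the per-fraction peptide counts of Figure 17 can be converted into a cumulative share of identifications. The short Python sketch below is only an illustration added here, not part of the original analysis; the counts are those read from Figure 17, and they are per-fraction identifications, so a peptide detected in two fractions is counted twice.

```python
# Peptide identifications per SCX fraction, as read from Figure 17 (KCl steps, pH 3)
KCL_STEPS_MM = [25, 25, 50, 50, 75, 100, 125, 150, 200, 300, 500]
PEPTIDES = [99, 67, 76, 56, 52, 17, 12, 6, 2, 2, 0]

def cumulative_share(counts):
    """Return the running fraction of all identifications after each fraction."""
    total = sum(counts)
    shares, running = [], 0
    for count in counts:
        running += count
        shares.append(running / total)
    return shares

shares = cumulative_share(PEPTIDES)
for step, share in zip(KCL_STEPS_MM, shares):
    print(f"up to {step:>3} mM KCl step: {share:.0%} of peptide identifications")
```

With these counts, about 90% of the peptide identifications (350 of 389) are reached by the end of the 75 mM step, which is consistent with the observation that most peptides elute at low KCl concentrations.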

3.2 Protein analysis

3.2.1 Protein identifications

A total of 73 proteins were identified with relatively good confidence in human CSF samples: 50 using KCl fractionation only, 5 using NH4AcO purification only and 18 by both methods (Figure 18). Table 4 gives an overview of the results obtained for each experiment.


                 Identified proteins                        Matched peptides        TMT labeled peptides
                 # ID   >1 unique peptide   Mascot scores   total   non-redundant   total   partial
Fractionation    68     30                  3062 to 40      421     150             295     84
  median                                    107             3       1               2       0
Purification     23     7                   777 to 41       85      43              84      0
  median                                    57              1       1               1       0

Table 4. Summary of the results of the shotgun proteomics experiments carried out on CSF. For each experiment, a single merged file containing the peak lists of all individual fractions was created and searched against SwissProt using Mascot.

Protein identifications were based on the assignment of unique inter-protein and non-redundant intra-protein peptides to protein hits in the human Swissprot database, with stringent criteria. In the fractionation and purification experiments, 30 and 7 proteins respectively were identified with at least two non-redundant matched peptides, and the others with a single high-scoring peptide (Mascot score above the identity threshold). Of these, a few protein isoforms or proteins sharing extensive sequence homology, such as various immunoglobulins, complement factor C4 A/B and neural cell adhesion molecule 1 120 kDa/140 kDa, could not be discriminated, as they matched exactly the same set of non-unique peptides. On average, only 7% of the queries were matched to protein hits (421/5853 for fractionation and 85/1229 for purification); the others were either matched but not assigned to hits (score under the identity threshold) or unmatched. These results could largely be explained by the acquisition of low-quality spectra due to experimental conditions (low analyte amounts, unspecific digestion, low chromatographic separation efficiency, etc.) and to the performance limitations of the mass spectrometer. However, some of these spectra could correspond to protein sequences not yet annotated in the human Swissprot database or to unknown proteins not present in any database. To increase the number of matches, the search could be conducted against a larger database such as Swissprot-TrEMBL, which integrates translations of gene Open Reading Frames (ORFs), and/or spectra of interest could be manually checked to evaluate their chances of being true hits and, where appropriate, analysed by de novo sequencing.
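The proportion of queries assigned to protein hits can be checked with a few lines of Python. This is only a worked illustration of the arithmetic; the dictionary layout and function name are hypothetical, and the counts are those quoted in the text.

```python
# Queries submitted to Mascot and queries assigned to protein hits, per experiment
EXPERIMENTS = {
    "fractionation (KCl)": {"queries": 5853, "assigned": 421},
    "purification (NH4AcO)": {"queries": 1229, "assigned": 85},
}

def assignment_rate(assigned, queries):
    """Fraction of MS/MS queries assigned to a protein hit."""
    return assigned / queries

rates = {name: assignment_rate(**counts) for name, counts in EXPERIMENTS.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%} of queries assigned")
```

Both experiments give an assignment rate close to 7%, so the low proportion of assigned spectra does not depend on the salt used for SCX.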

[Figure 18 Venn diagram, "Influence of sample fractionation on protein identification: number of proteins identified by two different methods": 50 proteins identified only by SCX fractionation using KCl, 5 only by SCX purification using NH4AcO, 18 by both methods]

Figure 18. Comparison of the number of protein identifications with or without sample fractionation

The merged analysis of the SCX fractions from the KCl steps allowed the detection of more non-redundant peptides, namely 150 compared with 43 from the NH4AcO purification steps. This resulted in the identification of 68 proteins, versus 23 for purification, with higher Mascot scores (median score of 107 versus 57) and more matched peptides per identification (median of 3 versus 1). A relatively high overlap of identified proteins was observed between the two experiments. Using KCl fractionation, three times more proteins were identified, and with higher confidence, as shown by the nearly doubled median score and the tripled median number of matched peptides per hit. In addition, the numbers of detected peptides and identified proteins in the first NH4AcO fraction (Figure 16) were lower than in several individual KCl fractions (Figure 17), although that fraction contained nearly all the analytes. This could suggest that excessive quantities of sample hinder peptide detection. This could be particularly true for undepleted CSF, which displays a high dynamic range of protein concentrations, with highly abundant proteins such as albumin and immunoglobulins alongside many low-abundance proteins.
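The overlap between the two experiments can be made explicit from the three counts of Figure 18. The Python sketch below is an added illustration; the variable names are hypothetical and the two overlap measures (share of purification identifications recovered, Jaccard index) are choices made here, not metrics used in the original analysis.

```python
# Counts from Figure 18
ONLY_KCL = 50      # identified only after SCX fractionation (KCl)
ONLY_NH4ACO = 5    # identified only after SCX purification (NH4AcO)
BOTH = 18          # identified by both methods

kcl_total = ONLY_KCL + BOTH                 # proteins identified with KCl
nh4aco_total = ONLY_NH4ACO + BOTH           # proteins identified with NH4AcO
union = ONLY_KCL + ONLY_NH4ACO + BOTH       # all proteins identified

# Share of the purification identifications that fractionation also found
recovered = BOTH / nh4aco_total
# Jaccard index: shared identifications over all identifications
jaccard = BOTH / union

print(f"KCl: {kcl_total}, NH4AcO: {nh4aco_total}, total: {union}")
print(f"purification hits also found by fractionation: {recovered:.0%}")
print(f"Jaccard index: {jaccard:.2f}")
```

Read this way, 18 of the 23 proteins found after purification (78%) were also found after fractionation, whereas the shared proteins represent only about a quarter of all 73 identifications, because fractionation contributes 50 proteins of its own.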

3.2.2 Protein quantification

For protein quantification, only spectra from inter-protein unique peptides should be taken into consideration; otherwise more than one protein could contribute to the reporter intensities of non-unique peptides. Thus, no quantitation could be performed on the 6 homologues identified in this study. Because quantitative ratios obtained from different peptides of the same protein can vary (coefficient of variation, CV), intra-protein redundant peptides should be regarded as single measurements and averaged to improve the statistical significance of the results. The higher the number of redundant peptides, the more accurate the quantitative data can be, even allowing outliers to be detected and eliminated. Sample fractionation greatly increased the number of peptides available for quantification. The merged analysis of the SCX-fractionated analytes (KCl) allowed the detection of more assigned labeled peptides, namely 295 compared with 84 from SCX purification (NH4AcO). Of these, 284 and 78 respectively were unique and thus theoretically quantifiable. Peptides must obviously carry a TMT label to be quantified, but it is not clear whether they need to be fully labeled. Partial labeling results in a loss of sensitivity because unlabeled, partially labeled and fully labeled peptides differ in mass (n × 224 Da) and may not co-migrate during chromatography. Thus, if labeling efficiency varies between samples, quantification may be seriously altered. Otherwise, the reporter intensities of redundant peptides of a given sequence could be summed independently of their labeling state and compared between samples to obtain an accurate quantitation. To evaluate labeling, the number of partially or totally labeled assigned matched peptides was counted, without checking for the presence or absence of the reporter in the MS/MS spectra. Using KCl, 70% (295/421) of the peptides were labeled, 28% (84/295) of them partially. Surprisingly, nearly 100% labeling (84/85 peptides, none partially labeled) was obtained using NH4AcO. Two explanations could be considered. First, as fractions above 200 mM NH4AcO were not analysed, unlabeled peptides may have been more highly charged and still retained on the column. Second, TMT-labeled peptides may for some reason have ionised or fragmented better, giving better-quality data. Only six proteins (Apolipoprotein D, Neuroendocrine protein 7B2, Nucleobindin-1, Collagen alpha-1(I) chain, Neural proliferation differentiation and control protein 1, Intercellular adhesion molecule 5) carried no label on any of their identified peptides and could therefore not be quantified, whereas six others exhibited only one partial label.
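The averaging of peptide-level ratios discussed above can be sketched as follows. This Python example is only an illustration under stated assumptions: the ratios are hypothetical values, the function name is invented for the example, and the outlier filter (a cutoff on the median absolute deviation, MAD) is one possible choice, not a rule applied in this study.

```python
from statistics import mean, median, stdev

def summarise_ratios(ratios, mad_cutoff=3.0):
    """Average the peptide-level reporter ratios of one protein.

    Ratios lying more than mad_cutoff median absolute deviations from the
    median are discarded first (assumed rule). Returns the mean ratio, the
    CV in percent (None if fewer than two ratios remain) and the number of
    discarded ratios.
    """
    med = median(ratios)
    mad = median([abs(r - med) for r in ratios])
    if mad > 0:
        kept = [r for r in ratios if abs(r - med) / mad <= mad_cutoff]
    else:
        kept = list(ratios)
    avg = mean(kept)
    cv = stdev(kept) / avg * 100 if len(kept) > 1 else None
    return avg, cv, len(ratios) - len(kept)

# Hypothetical ratios for five unique peptides of the same protein
avg, cv, dropped = summarise_ratios([1.0, 1.1, 0.95, 1.05, 3.0])
print(f"mean ratio {avg:.3f}, CV {cv:.1f}%, {dropped} outlier(s) removed")
```

A protein identified by a single peptide yields a ratio but no CV, which is why the number of redundant unique peptides per protein matters for the reliability of the quantitation.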

3.2.3 Biological significance

As expected from the analysis of undepleted CSF, most of the identified proteins were already characterised (very few of unknown function), half of them being circulating proteins, synthesised in the liver and secreted into plasma, mostly involved in immune response or defense mechanisms and in transport or binding (e.g. various immunoglobulins, albumin, etc.), as shown in Figure 19. However, some brain-specific proteins that are potential biomarkers for neurodegenerative diseases were identified with the TMT0 label and could therefore potentially be quantified using multiplexing TMT reagents. For example, proteins with changes unique to Alzheimer disease were identified with the TMT0 label43: Secretogranin I, Chromogranin A, Kallikrein, Apolipoprotein E (IDs with more than 1 peptide) and Secretogranin III, Kallikrein 6 (IDs with a single peptide). Proteins with changes unique to Parkinson disease43, such as Cystatin C (ID with more than 1 peptide), Amyloid-like protein 1 and Kallikrein 6 (downregulated), as well as to dementia with Lewy bodies, such as Proenkephalin A and Transthyretin precursor (IDs with a single peptide), were also identified with the TMT0 label.

[Figure 19 pie chart: identified CSF proteins by localisation. Circulating proteins 49%, cell surface/membrane 21%; the other categories shown are secreted, neuroendocrine granules, ECM, intracellular and unknown (slices of 22%, 5% and 3%)]

Figure 19. Classification of the identified human CSF proteins according to their localisation.

The undepleted CSF used in this study may not be an adequate sample for a protein discovery approach, as its analysis is complicated by the large dynamic range of protein concentrations, which can span up to twelve orders of magnitude44. A more extensive characterisation of the CSF proteome could be obtained by removing high-abundance proteins by immunoaffinity depletion or by fractionation (ultrafiltration, precipitation)45, methods shown to increase the detection of low-abundance proteins. However, these techniques could also introduce quantitative variations and remove some proteins of interest that bind non-specifically to the depletion column or interact with depleted proteins. Undepleted CSF could therefore provide supplementary information when analysed on sensitive platforms. To conclude, some markers of interest were found to carry a TMT label, indicating that their relative quantitation between different states or diseases, and eventually their absolute quantitation using synthetic peptides matching the identified peptide sequences, could be performed using the novel TMT technology.

3.1.4 Method improvements

Several improvements could be made to the methods described. First, the number of RP-C18 desalting steps (3) could be reduced to limit sample losses and experimental time. SCX fractionation could be performed using a linear salt gradient to improve the resolution of peptide separation, or the step gradient salt concentrations could be fine-tuned between 25 mM and 100 mM KCl, where most of the peptides were shown to elute. In that case, the second parts of the collected SCX fractions (from 2.5 to 5 min) should be systematically analysed, as they could contain many peptides (cf. fractions 1.2 and 2.2, Figure 17). A number of peptides, and thus proteins, were probably missed, as only the first part of fractions 3 to 9 was analysed. For mass spectrometry, sample MS/MS spectra should be manually checked and, where necessary, annotated to confirm protein identifications based on a single unique peptide. Moreover, sequences should be run in a BLAST search to certify their uniqueness.

F. CONCLUSIONS

The first part of the work consisted of the evaluation and optimisation of sample preparation methods for a shotgun proteomics approach using isobaric TMT0 labeling. The results demonstrated the efficiency and applicability of the “in gel” and “in solution” digestion protocols using either borate or TEAB buffers, as well as their compatibility with TMT labeling. Chromatographic methods were optimised to find adequate conditions for peptide retention on the RP-C18 and SCX columns and thus minimise sample losses. The aim was not to establish perfect protocols, which is a never-ending process requiring considerable knowledge, experience and time, but rather to find a reasonably working method and to gain an understanding of the different technologies. The second part reported the potential of a single-stage MALDI-TOF mass spectrometer and of the hybrid Q-TOF tandem mass spectrometer to identify proteins in simple or complex sample mixtures, by peptide mass fingerprinting (PMF) and peptide fragment fingerprinting (MS/MS) respectively. Moreover, TMT labeling was not found to interfere with mass spectrometry and allowed good peptide ionisation and fragmentation in MS and MS/MS modes. Thus, the “gel free” shotgun proteomic approach combined with the innovative TMT isobaric tagging was shown throughout the report to be the method of choice for qualitative and quantitative CSF analysis. Next, the optimised workflow was applied to undepleted human cerebrospinal fluid, a promising source of neurodegenerative biomarkers.
Briefly, after in-solution digestion and TMT0 labeling, the peptide mixture was desalted by reversed-phase (RP) C18 chromatography and then subjected to two rounds of chromatography for a two-dimensional peptide separation: the first dimension consisted of strong cation exchange (SCX) chromatography using ammonium acetate (NH4AcO) or potassium chloride (KCl) step gradients, whereas the second dimension consisted of online nanoRP-C18 LC coupled to a Q-TOF mass spectrometer for MS/MS analysis. A total of 73 proteins were identified with relatively good confidence in human CSF, of which 67 carried a TMT0 label. Most of them were already known, half being circulating proteins involved in immune response, defense mechanisms and transport or binding. However, some brain-specific proteins and possible biomarkers for neurodegenerative diseases were identified with the TMT0 label, demonstrating the high dynamic range of the proteins observed, from neurosecretory peptides to highly abundant albumin. The quantification of these potentially disease-specific proteins using multiplexing TMT reagents could provide novel therapeutic targets and diagnostic tools to prevent and cure disease and to monitor its progression. Finally, some of the results presented may not be completely reproducible or even comparable, as they were biased by various technical problems. Despite their lack of rigorous scientific value, these preliminary results could be of interest for addressing some of the outstanding technical issues related to the novel TMT technology.

From a personal point of view, my placement in the Proteome Sciences laboratory was rewarding in many respects. Through the application of a shotgun proteomic approach, I progressively gained knowledge and experience of various basic experimental techniques. Additionally, I was given the opportunity to access state-of-the-art mass spectrometry and to benefit from the latest advances in quantitative proteomics using the Tandem Mass Tag technology. The exceptional location of the laboratory, in the heart of the renowned Institute of Psychiatry of King’s College, was a great source of motivation thanks to its stimulating atmosphere and to regular contact and fruitful discussions with experienced scientists from all over the world. Moreover, being integrated into the team of a small company made me more aware of the commercial issues, objectives and marketing rules governing biotechnology companies, in contrast to academic centres. Overall, I acquired autonomy in labwork, developed my sense of initiative and my critical scientific reasoning, and significantly improved my skills in English. Beyond the scientific point of view, the experience of everyday life in another country, the discovery of a new culture and the interaction with new people in such an exciting city as London were unforgettable.


E. TABLE

Table 5. List of human proteins identified in the CSF

Score = protein Mascot score; Pept. = matched peptides; NR = non-redundant peptides; TMT = TMT labeled peptides; Part. = partially labeled peptides; "-" = no value reported.

                 SCX fractionation                SCX purification
No. Swiss-Prot   Score  Pept.  NR   TMT  Part.    Score  Pept.  NR   TMT    Protein name

Proteins identified with > 1 unique peptide

1   ALBU_HUMAN   3062   106    9    79   26       777    31     6    30     Serum albumin precursor
2   PTGDS_HUMAN  1122   31     4    24   4        254    7      3    7      Prostaglandin-H2 D-isomerase precursor
3   CO3_HUMAN    699    19     7    10   2        -      -      -    -      Complement C3 precursor [Contains: Complement C3 beta
4   CYTC_HUMAN   577    18     4    11   3        506    17     7    17     Cystatin-C precursor
5   CMGA_HUMAN   521    12     2    9    3        80     1      1    1      Chromogranin-A precursor
6   CSTN1_HUMAN  481    9      2    5    0        -      -      -    -      Calsyntenin-1 precursor
7   CLUS_HUMAN   355    12     5    7    1        57     1      1    1      Clusterin precursor
8   SCG1_HUMAN   353    12     7    12   5        66     2      2    2      Secretogranin-1 precursor
9   AACT_HUMAN   308    10     4    7    0        42     1      1    1      Alpha-1-antichymotrypsin precursor
10  PEDF_HUMAN   272    9      5    7    0        48     1      1    1      Pigment epithelium-derived factor precursor
11  SPRL1_HUMAN  270    7      3    4    3        -      -      -    -      SPARC-like protein 1 precursor
12  KAC_HUMAN    247    6      5    5    2        43     1      1    1      Ig kappa chain C region
13  GELS_HUMAN   236    7      4    7    0        96     2      2    2      Gelsolin precursor
14  PCSK1_HUMAN  219    6      4    5    0        -      -      -    -      ProSAAS precursor
15  IGHG1_HUMAN  194    6      5    6    5        44     1      1    1      Ig gamma-1 chain C region
16  A1BG_HUMAN   185    7      4    4    2        -      -      -    -      Alpha-1B-glycoprotein precursor
17  OSTP_HUMAN   136    4      2    2    0        84     1      1    1      Osteopontin precursor
18  FINC_HUMAN   132    6      2    4    2        -      -      -    -      Fibronectin precursor
19  ANGT_HUMAN   129    4      3    3    1        41     1      1    1      Angiotensinogen precursor
20  DKK3_HUMAN   122    5      3    3    0        -      -      -    -      Dickkopf-related protein 3 precursor
21  CRAC1_HUMAN  117    3      2    2    0        -      -      -    -      Cartilage acidic protein 1 precursor
22  A2MG_HUMAN   109    3      2    2    0        -      -      -    -      Alpha-2-macroglobulin precursor
23  VGF_HUMAN    107    2      2    1    0        46     1      1    1      Neurosecretory protein VGF precursor
24  CHL1_HUMAN   88     4      2    1    1        -      -      -    -      Neural cell adhesion molecule L1-like protein precursor
25  LSAMP_HUMAN  77     2      2    2    1        -      -      -    -      Limbic system-associated membrane protein precursor
26  KLK6_HUMAN   70     2      2    2    0        -      -      -    -      Kallikrein-6 precursor
27  CBPE_HUMAN   66     2      2    1    0        -      -      -    -      Carboxypeptidase E precursor
28  CNTN1_HUMAN  66     1      2    2    0        -      -      -    -      Contactin-1 precursor
29  APOE_HUMAN   55     2      2    2    0        -      -      -    -      Apolipoprotein E precursor

Protein homologues identified with > 1 non-unique peptide

30  CO4A_HUMAN   1008   24     10   13   3        167    5      4    -      Complement C4-A or C4-B precursor
    CO4B_HUMAN

Proteins identified with 1 unique peptide

31  KNG1_HUMAN   295    8      1    5    3        -      -      -    -      Kininogen-1 precursor
32  FETUA_HUMAN  168    3      1    1    0        52     1      1    1      Alpha-2-HS-glycoprotein precursor
33  TRFE_HUMAN   167    3      1    2    1        132    4      4    4      Serotransferrin precursor
34  CD14_HUMAN   153    3      1    2    0        -      -      -    -      Monocyte differentiation antigen CD14 precursor
35  NPDC1_HUMAN  139    3      1    0    -        -      -      -    -      Neural proliferation differentiation and control protein
36  LAC_HUMAN    137    5      1    4    2        -      -      -    -      Ig lambda chain C regions
37  SCG3_HUMAN   135    4      1    3    3        -      -      -    -      Secretogranin-3 precursor
38  EPHA4_HUMAN  109    2      1    1    0        -      -      -    -      Ephrin type-A receptor 4 precursor
39  FA5_HUMAN    107    3      1    3    2        -      -      -    -      Coagulation factor V precursor
40  IGHA1_HUMAN  100    3      1    3    2        -      -      -    -      Ig alpha-1 chain C region
41  UBIQ_HUMAN   86     2      1    1    0        -      -      -    -      Ubiquitin
42  IBP7_HUMAN   84     3      1    2    1        -      -      -    -      Insulin-like growth factor-binding protein 7 precursor
43  PENK_HUMAN   80     2      1    1    1        -      -      -    -      Proenkephalin A precursor [Contains: Synenkephalin
44  A4_HUMAN     78     2      1    1    0        -      -      -    -      Amyloid beta A4 protein precursor
45  NRCAM_HUMAN  76     3      1    2    0        -      -      -    -      Neuronal cell adhesion molecule precursor
46  CFAB_HUMAN   68     2      1    2    0        -      -      -    -      Complement factor B precursor
47  SODE_HUMAN   68     2      1    2    0        -      -      -    -      Extracellular superoxide dismutase [Cu-Zn] precursor
48  ENPP2_HUMAN  59     1      1    1    1        -      -      -    -      Ectonucleotide pyrophosphatase/phosphodiesteras
49  APLP1_HUMAN  58     2      1    2    0        -      -      -    -      Amyloid-like protein 1 precursor
50  ENDD1_HUMAN  58     1      1    1    0        -      -      -    -      Endonuclease domain-containing 1 protein precursor
51  FCGBP_HUMAN  58     1      1    1    0        -      -      -    -      IgGFc-binding protein precursor
52  IBP6_HUMAN   50     1      1    1    0        -      -      -    -      Insulin-like growth factor-binding protein 6 precursor
53  LV102_HUMAN  49     1      1    0    -        -      -      -    -      Ig lambda chain V-I region HA
54  FBLN1_HUMAN  49     1      1    1    0        -      -      -    -      Fibulin-1 precursor
55  D1IP_HUMAN   46     1      1    1    0        -      -      -    -      D1 dopamine receptor-interacting protein calcyon
56  CO1A1_HUMAN  44     1      1    0    -        -      -      -    -      Collagen alpha-1(I) chain precursor
57  PGBM_HUMAN   43     1      1    1    0        -      -      -    -      Basement membrane-specific heparan sulfate proteoglycan core
58  CADH2_HUMAN  42     1      1    1    1        -      -      -    -      Cadherin-2 precursor
59  7B2_HUMAN    42     1      1    0    -        -      -      -    -      Neuroendocrine protein 7B2 precursor
60  APOD_HUMAN   42     1      1    0    -        -      -      -    -      Apolipoprotein D precursor
61  ZA2G_HUMAN   42     2      1    2    0        -      -      -    -      Zinc-alpha-2-glycoprotein precursor
62  ICAM5_HUMAN  40     1      1    0    -        -      -      -    -      Intercellular adhesion molecule 5 precursor
63  NUCB1_HUMAN  40     1      1    0    -        -      -      -    -      Nucleobindin-1 precursor
64  CNTN2_HUMAN  40     1      1    1    0        -      -      -    -      Contactin-2 precursor
65  TTHY_HUMAN   -      -      -    -    -        93     2      1    2      Transthyretin precursor
66  SCG2_HUMAN   -      -      -    -    -        68     1      1    1      Secretogranin-2 precursor
67  CO7_HUMAN    -      -      -    -    -        44     1      1    1      Complement component C7 precursor
68  VTNC_HUMAN   -      -      -    -    -        46     1      1    1      Vitronectin precursor

Protein homologues identified with a single non-unique peptide

69  KV302_HUMAN  186    4      1    2    2        48     1      1    -      Ig kappa chain V-III region SIE/Ti/WOL/GOL
    KV304_HUMAN
    KV305_HUMAN
    KV307_HUMAN
    KV301_HUMAN
70  HV305_HUMAN  151    2      1    1    0        -      -      -    -      Ig heavy chain V-III region BRO/TEI
    HV316_HUMAN
71  HV304_HUMAN  -      -      -    -    -        47     1      1    1      Ig heavy chain V-III region TIL/POM/WAS/TUR
    HV313_HUMAN
    HV315_HUMAN
    HV318_HUMAN
72  NCA12_HUMAN  48     1      1    1    1        -      -      -    -      Neural cell adhesion molecule 1, 120 kDa/140 kDa isoform precursor
    NCA11_HUMAN
73  HV101_HUMAN  42     1      1    1    0        -      -      -    -      Ig heavy chain V-I region EU/SIE/Mot
    HV106_HUMAN
    HV107_HUMAN


G. REFERENCES

1 Wilkins MR et al., 1996. Biotechnol Genet Eng Rev 13, pp 19-50
2 Domon B et al., 2006. Science 312, pp 212-217
3 Aebersold et al., 1987. Proc. Natl. Acad. Sci. 84, pp 6970-6974
4 Patterson SD and Aebersold RH, 2003. Nat Genet. 33 Suppl, pp 311-323. Review
5 Reid GE and McLuckey SA, 2002. J. Mass Spectrom. 37, pp 663-675
6 Link AJ et al., 1999. Nat. Biotechnol. 17, pp 676-682
7 Fournier et al., 2007. Chem. Rev. 107 (8), pp 3654-3686
8 Delahunty C et al., 2005. Nat. Genetics 35, pp 248-255
9 Santoni V et al., 2000. Electrophoresis 21, pp 1054
10 Corthals G et al., 2000. Electrophoresis 21, pp 1104
11 Gygi S et al., 2000. Proc. Natl. Acad. Sci. U.S.A. 97, pp 9390
12 Various articles on: http://www.lcgceurope.com/ and Oliver Hartley's master course
13 Lin D et al., 2003. Biochim. Biophys. Acta 1646, pp 1-10
14 Hattan S et al., 2005. J. Proteome Res. 4, pp 1931
15 Karas M and Hillenkamp F, 1988. Anal. Chem. 60, pp 2299-2301
16 Fenn JB et al., 1989. Science 246, pp 64-71
17 Aebersold et al., 2003. Nature 422, pp 198-207
18 Dayin L et al., 2003. Biochimica et Biophysica Acta (BBA) 1646, Issues 1-2, pp 1-10
19 Brown RS et al., 1995. Anal. Chem. 67, pp 1998
20 Cottrell JS, 1994. Pept Res. 7, pp 115-124
21 Eng JK et al., 1994. J. Am. Soc. Mass Spectrom. 5, pp 976-989
22 Biemann K, 1990. Methods Enzymol. 193, pp 886-888
23 Perkins DN et al., 1999. Electrophoresis 20, pp 3551-3567
24 Pappin DJ et al., 1993. Curr. Biol. 3, pp 327
25 Link AJ et al., 1997. Electrophoresis 18, pp 1314-1334
26 Kusmierz J et al., 1990. Anal Chem 62, pp 2395-2400
27 Gygi SP et al., 2001. Cell 107, pp 715-726
28 Krijgsveld J et al., 2003. Nat. Biotechnol. 21, pp 927-931
29 Ong SE et al., 2002. Mol. Cell. Proteomics 1, pp 376-386
30 Rose K et al., 1983. Biochem J. 215, pp 273-277
31 Yao X et al., 2001. Anal Chem. 73, pp 2836-2842
32 Gygi et al., 1999. Nat. Biotechnol. 17, pp 994-999
33 Ross et al. Mol. Cell. Proteomics 3, pp 1154-1169
34 Goodlett DR et al. Rapid Commun Mass Spectrom. 15, pp 1214-1221
35 Thompson A et al., 2003. Anal Chem 75, pp 1895-1904
36 Dayon et al., 2008. Anal Chem. (in press)
37 Pierce et al., 2007. Mol Cell Proteomics
38 Wood JH, 1980. Neurobiology of Cerebrospinal fluid, Plenum press, New York
39 Blennow K, 1993. Eur. Neurol 33 (2), pp 129-133
40 Hümer AF et al., 2006. Dis. Marker 22, pp 2-26
41 Bradford M et al., 1976. Anal. Biochem. 72, pp 248-254
42 Wiese et al., 2007. Proteomics 7 (3), pp 340-350
43 Abdi F et al., 2006. Journal of Alzheimer’s Disease 9, pp 293-348
44 Anderson et al., 1998. Electrophoresis 19, pp 1853-1861
45 Zhang J et al., 2005. Neurobiology of Aging 26, pp 207-227
