
A Thesis

entitled

Elucidating the Sequence and Structural Features of Human Bence-Jones Proteins

by

Weliwaththage Thilini Perera

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the

Master of Science in Chemistry

______Dragan Isailovic, Committee Chair

______Dr. Leif Hanson, Committee Member

______Dr. John Bellizzi, Committee Member

______Dr. Amanda Bryant-Friedrich, Dean, College of Graduate Studies

The University of Toledo

August 2018

Copyright 2018, Weliwaththage Thilini Perera

This document is copyrighted material. Under copyright law, no parts of this document may be reproduced without the expressed permission of the author.

An Abstract of

Elucidating the Sequence and Structural Features of Human Bence-Jones Proteins

by

Weliwaththage Thilini Perera

Submitted to the Graduate Faculty as partial fulfillment of the requirements for the Master of Science Degree in Chemistry

The University of Toledo August 2018

Amyloidosis diseases are characterized by the deposition of insoluble aggregates called amyloids. More than 30 diseases are associated with amyloid-forming proteins, including Alzheimer’s, Parkinson’s, Huntington’s and immunoglobulin light chain amyloidosis (AL amyloidosis). The proteins contributing to each disease have distinct primary structures. The mechanism of amyloid formation is not well understood, but appears to be associated with protein misfolding processes accompanied by self-aggregation.

In AL amyloidosis, immunoglobulin light chain proteins become misfolded and accumulate as amyloid. No effective therapeutic solution is available to treat this disease condition, which is usually fatal a few months after diagnosis. AL amyloidosis appears in a subset of patients with multiple myeloma (MM), a malignant disease condition characterized by bone marrow failure. Most patients with MM excrete monoclonal free immunoglobulin light chains, also called Bence-Jones proteins, into the urine. In contrast to AL amyloidosis, amyloid deposits are not observed in vivo in most patients suffering from MM. Consequently, identification of the precise sequence and structural information of proteins related to MM and AL amyloidosis is essential to understand the factors responsible for aggregation.

As reported in the literature, several analytical methods have been used to study primary, secondary, tertiary and quaternary structures of amyloid-forming proteins. In our study, mass spectrometry, circular dichroism (CD) spectroscopy and dynamic light scattering (DLS) were used to obtain sequence and structural information about immunoglobulin light chain proteins, isolated from urine of AL amyloidosis and MM patients.

Accordingly, the molecular masses of seven Bence-Jones protein samples were measured by ESI-MS and MALDI-MS, and the properties of their ions in the gas phase were observed by IMS-MS. Since five of them have unknown sequences, the bottom-up proteomic approach was used to obtain unreported sequences of immunoglobulin light chain proteins. The enzyme-digested protein samples were analyzed by MALDI-MS, ESI-MS, IMS-MS and HPLC-ESI-MS/MS. A de novo sequencing-assisted database search was performed using the PEAKS search tool to obtain possible peptide sequences. Multistep MS/MS approaches were applied to differentiate between leucine and isoleucine residues present in newly identified peptides. Furthermore, CD spectroscopy was used to compare the secondary structure elements of different light chain proteins. Most light chains showed a high percentage of β-sheets, which is common for amyloid-forming proteins. In addition, DLS was used to study the effect of physicochemical parameters on the aggregation behavior of light chain proteins. Overall, novel sequence and structural features of immunoglobulin light chains were obtained by these approaches and will be correlated with the properties of other amyloid-forming proteins.


Acknowledgements

First and foremost, I would like to thank my advisor, Dr. Dragan Isailovic, for his immense support, motivation and the valuable time he spent on discussions from the initial to the final stage of this project. I would also like to thank Dr. Leif Hanson for providing me with biological samples and for his constant support at all stages, including guidance in conducting the MALDI and DLS experiments. I express my gratitude to Dr. Bellizzi, Dr. Mueser and Dr. Edmundson for their valuable advice and suggestions. I would like to convey my special thanks to Dr. Erickson for his assistance in performing the CD spectroscopy experiments.

Many thanks to all current and former members of the Isailovic lab for being so nice and helpful throughout and for making the lab such a wonderful place to work in. This includes Krishani, Sanjee, David, Rachel, Siddhita, Jen and Kevin. My special thanks go to Dr. Rachel Marvin for her continuous help in solving scientific problems since the beginning of my research work.

Most importantly, I owe my deepest gratitude to my loving family, including my husband, parents and siblings, for their constant support. Their unconditional love and continuous encouragement gave me the confidence to get through all the challenging situations of graduate life at the University of Toledo.


Table of Contents

Abstract

Acknowledgements

Table of Contents

List of Tables

List of Figures

List of Abbreviations

1 Introduction

1.1 Amyloidosis Diseases

1.2 Immunoglobulin Light Chain Amyloidosis (AL amyloidosis)

1.3 Analytical Methods Used to Study Amyloid Protein Structure, Folding and Assembly

1.4 Mass Spectrometry

1.5 Ion Sources in Mass Spectrometry

1.5.1 Matrix-Assisted Laser Desorption/Ionization (MALDI)

1.5.2 Electrospray Ionization (ESI)

1.6 Mass Analyzers

1.6.1 Quadrupole Mass Analyzer

1.6.2 TOF Mass Analyzer

1.6.3 Ion Trap Mass Analyzer

1.6.4 Orbitrap Mass Analyzer

1.7 Proteins and Proteomics

1.7.1 Protein separation by gel electrophoresis

1.7.2 Protein digestion

1.8 Mass Spectrometry-Based Proteomics

1.8.1 Tandem Mass Spectrometry (MS/MS)

1.8.2 Peptide Fragmentation

1.8.3 Peptide Mass Fingerprinting (PMF)

1.8.4 Protein Sequencing Techniques

1.8.5 De novo Sequencing of Peptides

1.9 MS-based leucine (Leu) and isoleucine (Ile) discrimination approach

1.10 Mass Spectrometry-Chromatography Coupling

1.11 Ion mobility spectrometry (IMS)

1.12 Sequencing of immunoglobulin light chains by MS

1.13 Circular Dichroism (CD) Spectroscopy

1.14 Dynamic light scattering (DLS)

2 Materials and methodology

2.1 Materials and instruments

2.2 Methodology

2.2.1 Sample preparation

2.2.2 Separation of proteins by SDS-PAGE

2.2.3 Molecular mass determination of immunoglobulin light chain proteins and analysis of their ions in the gas phase

2.2.4 In-gel enzymatic digestion of immunoglobulin light chain proteins

2.2.5 Identification of peptides in Mcg and Sea proteins by MALDI-MS, ESI-MS/MS and IMS-MS

2.2.6 Preparation of peptides for HPLC-MS analysis

2.2.7 Separation and identification of proteins and peptides by HPLC-ESI-MS/MS

2.2.8 Comparison between reported (Mcg, Sea) and unreported (Black, May, Moz, Tew and Jen) light chain protein sequences

2.2.9 De novo sequencing assisted database search for peptide identification of unreported light chain protein sequences (Black, May, Moz, Tew, Jen)

2.2.10 Differentiation between Leu and Ile by ETD-HCD-MS3 analysis

2.2.11 Comparison of secondary structure of light chain proteins by CD spectroscopy

2.2.12 Detection of light chain aggregation by DLS

3 Results and Discussion

3.1 Determination of molecular masses of light chain proteins and the properties of their ions in the gas phase

3.1.1 Separation of proteins by SDS-PAGE

3.1.2 Determination of molecular masses of light chain proteins (Mcg, Sea, Black, May, Moz, Nii, Tew, Jen) by MALDI-MS and ESI-MS

3.1.3 The IMS-MS analysis of Black protein

3.2 The identification of peptides of Mcg and Sea proteins by MALDI-MS, ESI-MS/MS and IMS-MS

3.3 The separation and identification of peptides in enzyme-digested protein samples by HPLC-ESI-MS/MS

3.4 Amino acid sequence assembly of unreported immunoglobulin light chain proteins (Black, May, Moz, Tew and Jen)

3.5 Differentiation between Leu and Ile in light chain peptides having single or multiple Leu/Ile residues

3.6 Comparison of secondary structures of light chain proteins by CD spectroscopy

3.7 Detection of light chain aggregation by DLS

4 Conclusion and Future Work

4.1 Conclusions

4.2 Future Work

References

Appendix A

Appendix B

List of Tables

Table 1.1 Human diseases associated with defective protein folding

Table 1.2 Common methods used to study structural characteristics of amyloid proteins

Table 1.3 Summary of main characteristics of different mass analyzers

Table 1.4 Masses of most common immonium ions

Table 1.5 Disease-related light chains isolated from patients having MM and/or AL

Table 3.1 Molecular weight of light chain proteins measured by ESI-QTOF-MS

Table 3.2 A list of peptides observed by MALDI-MS and ESI-MS of tryptic digested Mcg light chain protein in positive ion mode

Table 3.3 Peptide sequences of Black, May, Moz, Jen and Tew obtained by MASCOT

Table 3.4 A summary of protein sequence coverage by de novo assisted database search

Table 3.5 Sequence alignment of variable region in light chain proteins

Table 3.6 Predicted secondary structure elements for light chain proteins

Table B.1 A list of peptides identified from trypsin digestion of Black protein

Table B.2 A list of peptides identified from chymotrypsin digestion of Black protein

Table B.3 A list of peptides identified from Glu-C digestion of Black protein

Table B.4 A list of peptides identified from trypsin digestion of May protein

Table B.5 A list of peptides identified from chymotrypsin digestion of May protein

Table B.6 A list of peptides identified from Glu-C digestion of May protein

Table B.7 A list of peptides identified from trypsin digestion of Moz protein

Table B.8 A list of peptides identified from chymotrypsin digestion of Moz protein

Table B.9 A list of peptides identified from Glu-C digestion of Moz protein

Table B.10 A list of peptides identified from trypsin digestion of Jen protein

Table B.11 A list of peptides identified from chymotrypsin digestion of Jen protein

Table B.12 A list of peptides identified from Glu-C digestion of Jen protein

Table B.13 A list of peptides identified from trypsin digestion of Tew protein

Table B.14 A list of peptides identified from chymotrypsin digestion of Tew protein

Table B.15 A list of peptides identified from Glu-C digestion of Tew protein

Table B.16 A list of proteins related to immunoglobulin light chains which were identified by SEQUEST search

List of Figures

Figure 1-1 A kidney biopsy from a patient having AL amyloidosis

Figure 1-2 Schematic diagram of V region of light chain protein

Figure 1-3 Schematic diagram of (A) lambda and (B) kappa light chains

Figure 1-4 A general schematic of a mass spectrometer

Figure 1-5 The principle of MALDI (adapted from de Hoffmann and Stroobant)

Figure 1-6 The mechanism of electrospray ionization

Figure 1-7 Schematic diagram of quadrupole mass analyzer

Figure 1-8 Schematic diagram of linear TOF mass analyzer

Figure 1-9 An illustration of TOF fitted with a reflectron mass analyzer

Figure 1-10 Schematic diagram of 3D ion trap mass analyzer

Figure 1-11 Schematic diagram of linear ion trap mass analyzer

Figure 1-12 Schematic diagram of orbitrap

Figure 1-13 The basic structure of a peptide

Figure 1-14 Mass spectrometry based proteomics approaches

Figure 1-15 Biemann nomenclature for peptide fragmentation

Figure 1-16 Overview of PMF for protein identification

Figure 1-17 Differentiation between Leu and Ile based on the abundance of 69 Da ion

Figure 1-18 Differentiation between Leu and Ile based on the formation of w ions

Figure 1-19 Separation of biomolecules by IMS-MS

Figure 1-20 The different types of secondary structure found in proteins

Figure 3-1 Images of Coomassie blue stained gels corresponding to SDS-PAGE

Figure 3-2 ESI-MS spectrum of Mcg light chain protein

Figure 3-3 IMS-MS analysis of Black protein

Figure 3-4 ESI-MS/MS spectrum obtained for ADGSPVK sequence in Mcg

Figure 3-5 ESI-IMS-MS Driftscope plot for tryptic digest of Mcg

Figure 3-6 Base peak chromatogram of the separation of myoglobin tryptic peptides by HPLC-ESI-MS/MS

Figure 3-7 Predicted sequence of Black protein

Figure 3-8 The total ion chromatograms observed for neurotensin peptide in each step of ETD-HCD-MS3 approach

Figure 3-9 MS and MS/MS spectra obtained from ETD-HCD-MS2 approach for neurotensin

Figure 3-10 MS spectrum obtained from HCD-MS3 approach for neurotensin

Figure 3-11 MS spectrum obtained from HCD-MS3 approach for (A) m/z=116 ion and (B) m/z=229 ion

Figure 3-12 The change in most common scattering particle size with temperature

Figure A-1 MALDI-MS spectrum of Mcg light chain protein

Figure A-2 MALDI-MS spectrum of Sea light chain protein

Figure A-3 ESI-MS spectrum of Sea light chain protein

Figure A-4 ESI-MS spectrum of Black light chain protein

Figure A-5 ESI-MS spectrum of May light chain protein

Figure A-6 ESI-MS spectrum of Moz light chain protein

Figure A-7 ESI-MS spectrum of Jen light chain protein

Figure A-8 ESI-MS spectrum of Tew light chain protein

Figure A-9 MALDI-MS spectrum of trypsin digested Mcg

Figure A-10 MALDI-MS spectrum of chymotrypsin digested Mcg

Figure A-11 MALDI-MS spectrum of Glu-C digested Mcg

Figure A-12 ESI-MS spectrum of trypsin digested Mcg

Figure A-13 ESI-MS spectrum of trypsin digested Sea

Figure A-14 ESI-MS/MS spectrum obtained for RPSGVPDR sequence in Mcg

Figure A-15 ESI-MS/MS spectrum obtained for ADSSPVK sequence in Sea

Figure A-16 ESI-MS/MS spectrum obtained for LTVLGQPK sequence in Sea

Figure A-17 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Mcg protein

Figure A-18 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Sea protein

Figure A-19 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Black protein

Figure A-20 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by chymotrypsin digestion of Black protein

Figure A-21 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by Glu-C digestion of Black protein

Figure A-22 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Moz protein

Figure A-23 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by chymotrypsin digestion of May protein

Figure A-24 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Jen protein

Figure A-25 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Tew protein

Figure A-26 Predicted sequence of May protein

Figure A-27 Predicted sequence of Moz protein

Figure A-28 Predicted sequence of Jen protein

Figure A-29 Predicted sequence of Tew protein

Figure A-30 The total ion chromatograms observed in each step of ETD-HCD-MS3 approach for Black tryptic peptide

Figure A-31 MS spectra obtained from ETD-HCD-MS3 approach for doubly-charged Black tryptic peptide

Figure A-32 CD spectra of light chain proteins

Figure A-33 The change in most common scattering particle size of Black light chain with concentration

Figure A-34 The change in proteolytic fragment size for Mcg and Sea with temperature

List of Abbreviations

ALC ...... Average local confidence

BSA ...... Bovine serum albumin

CD ...... Circular dichroism spectroscopy
CDR ...... Complementarity-determining regions
CHCA ...... α-cyano-4-hydroxycinnamic acid
CID ...... Collision-induced dissociation

Da ...... Dalton
DDT ...... Dithiothreitol
DHB ...... 2,5-dihydroxybenzoic acid
DLS ...... Dynamic light scattering

ETD ...... Electron transfer dissociation

FT-ICR ...... Fourier transform ion cyclotron resonance
FT-OT ...... Fourier transform orbitrap

HCD ...... Higher-energy collisional dissociation
HPLC ...... High-performance liquid chromatography

IMS ...... Ion-mobility spectrometry

LIT ...... Linear ion trap

MALDI ...... Matrix-assisted laser desorption/ionization
MWCO ...... Molecular weight cut-off
MM ...... Multiple myeloma
MS ...... Mass spectrometry
m/z ...... Mass-to-charge ratio

PMF ...... Peptide mass fingerprinting
PTM ...... Post-translational modification

QIT ...... Quadrupole ion trap

RF ...... Radio frequency

SA ...... Sinapinic acid
SDS-PAGE ...... Sodium dodecyl sulfate polyacrylamide gel electrophoresis
SIM ...... Selected ion monitoring
SPE ...... Solid-phase extraction

TEMED ...... N,N,N′,N′-tetramethylethylenediamine
TFA ...... Trifluoroacetic acid
TOF ...... Time-of-flight

UV ...... Ultraviolet

v/v ...... Volume-to-volume

List of Symbols

κ ...... kappa type light chain
λ ...... lambda type light chain
β ...... beta sheet structure


Chapter 1

Introduction

1.1 Amyloidosis Diseases

Amyloidosis is a pathological state characterized by the deposition of proteins as amyloid in a variety of organs. The term amyloid refers to insoluble protein aggregates and was first introduced by Rudolf Virchow in the 19th century.1 Amyloid formation is associated with several disease conditions in humans. Table 1.1 lists such disease conditions together with their aggregating proteins or peptides.2,3

Table 1.1 Human diseases associated with defective protein folding.

The clinical symptoms of amyloidosis depend on the type of protein and where it is deposited in the body. At present, 25 proteins are known to be associated with amyloidosis, and additional protein types are continually being added to the list.3,4,5,6 Therefore, a modern nomenclature has been developed to identify proteins and related disease conditions.1 Briefly, the prefix “A” for amyloid is followed by an abbreviation derived from the name of the protein. For instance, AL indicates amyloid derived from immunoglobulin light chain, Aβ2M designates amyloid derived from β2-microglobulin and Aβ indicates amyloid derived from the β-peptide related to Alzheimer’s disease.

Amyloidosis can be classified into two types: localized amyloidosis, in which aggregation occurs in a single type of tissue, and systemic amyloidosis, in which aggregation occurs in multiple tissues.1 Amyloid formation occurs via a nucleation and growth mechanism. Once a nucleus is formed by a protein or a peptide, fibril growth proceeds rapidly by further association of either monomers or oligomers with the nucleus.2

Several critical questions related to amyloid formation remain unanswered, such as the detailed mechanism of the aggregation process, the factors determining the kinetics of aggregation, the structural nature of the intermolecular interactions, and how aggregation can be efficiently prevented in vivo.7

Although different proteins are responsible for a variety of protein misfolding diseases, they do not share similarities in their amino acid sequences. Three main sequence-related factors that influence amyloid formation have been identified.2 One important property is the hydrophobicity of the polypeptide chain. Hydrophobic side chains are largely buried inside the folded structure of a protein. When the protein is partially folded or fragmented, these side chains become exposed and may form fibril structures.8 Substitution with amino acids having hydrophobic side chains in the regions that initiate nucleation may induce aggregation. This explains why three or more consecutive hydrophobic residues are less frequent in natural proteins.9 Another determinant is the charge: a high net charge may hinder the self-association of the protein.10 Finally, a high propensity to form β-sheet structure is an important factor promoting amyloid formation.11
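As a rough illustration of the first two sequence-related factors, the sketch below scans a sequence for runs of three or more consecutive hydrophobic residues and estimates its net charge. The hydrophobic residue set, the simple +1/-1 charge rule (which ignores histidine, the termini and pH), the function names and the example sequence are all assumptions made for illustration; they are not taken from this work, and β-sheet propensity is not modeled.

```python
# Illustrative sketch only: residue classes and the charge rule are simplifying assumptions.
HYDROPHOBIC = set("AVILMFWY")  # assumed hydrophobic set; published scales differ
POSITIVE = set("KR")           # assumed to carry +1 near neutral pH
NEGATIVE = set("DE")           # assumed to carry -1 near neutral pH


def hydrophobic_runs(sequence, min_length=3):
    """Return (start_index, run) for each stretch of >= min_length hydrophobic residues."""
    runs = []
    start = None
    # The trailing "-" is a sentinel so that a run ending at the C-terminus is closed.
    for i, residue in enumerate(sequence + "-"):
        if residue in HYDROPHOBIC:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, sequence[start:i]))
            start = None
    return runs


def approximate_net_charge(sequence):
    """Count K/R as +1 and D/E as -1; histidine and the termini are ignored."""
    positives = sum(residue in POSITIVE for residue in sequence)
    negatives = sum(residue in NEGATIVE for residue in sequence)
    return positives - negatives


if __name__ == "__main__":
    peptide = "GSVLIKDEAFWR"  # hypothetical sequence, not one reported in this work
    print(hydrophobic_runs(peptide))        # two runs: VLI and AFW
    print(approximate_net_charge(peptide))  # K and R balance D and E
```

Such a scan only flags candidate regions; whether a flagged stretch actually promotes aggregation depends on its structural context.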

By definition, amyloid is formed in vivo and has a cross-β sheet structure, which is defined as parallel chains of β peptides arranged perpendicular to the axis of the fibril.12 These structures have a high affinity for Congo red dye molecules and show specific optical behavior, such as apple-green birefringence under polarized light, as shown in Figure 1-1.

Figure 1-1 A kidney biopsy from a patient having AL amyloidosis. (A) Congo-red positive glomerular deposits under light microscope (original magnification x 400). (B) Typical apple-green birefringence under polarized light. Reprinted with permission from reference 12.

1.2 Immunoglobulin Light Chain Amyloidosis (AL amyloidosis)

In AL amyloidosis, abnormal plasma cells are responsible for the production of excess immunoglobulin light chains, which then become misfolded and accumulate as amyloid.13,14 This is considered the most common form of systemic amyloidosis, with an incidence of ~1 case per 100,000 people/year in western countries.13,4 A typical immunoglobulin molecule is composed of four polypeptide chains: two identical heavy chains and two identical light chains. Each light chain consists of the N-terminal half, known as the variable domain (V region), and the C-terminal half, known as the constant domain (C region). The V domain contains three hypervariable regions called complementarity-determining regions (CDRs) that are involved in the antigen binding sites. The remaining portion consists of four less variable regions, designated as framework (FR) regions.15 The schematic representation of the light chain V region is shown in Figure 1-2.

Figure 1-2 Schematic diagram of V region of light chain protein.

The V region and the C region each constitute approximately half of the chain.

There are two isotypes of light chains, named lambda (λ) and kappa (κ). The schematic diagram of light chains is shown below in Figure 1-3. Not all light chains are equally amyloidogenic, and it is suggested that λ is more amyloidogenic than κ.16 As reported in the literature, the light chain V domain is the most susceptible to assembly into amyloid deposits.17,18,19


AL amyloidosis is diagnosed in a subset of patients with multiple myeloma (MM), a malignant disease condition characterized by bone marrow failure.20 In contrast to AL, amyloid deposits are not observed in vivo in most patients having MM.5 It is reported that ~50% of patients having MM excrete immunoglobulin light chains (also called Bence-Jones proteins) in the urine.21 Therefore, sequence and structural analyses of proteins related to these two contrasting pathologies are important to understand which factors determine amyloid formation.5 Several possible factors for amyloid formation are the sequence differences of light chains, high levels of expression of light chains and post-translational modifications.5

Figure 1-3 Schematic diagram of (A) lambda and (B) kappa light chains.

1.3 Analytical Methods Used to Study Amyloid Protein Structure, Folding and Assembly

According to the general mechanism of amyloid formation, unfolded or partially unfolded proteins first form aggregates. They then undergo further assembly into protofibrils and finally into mature fibrils.22 As illustrated in Table 1.2, several analytical methods are used to study these structures and assembly mechanisms.3

Our project is mainly focused on the identification of the human immunoglobulin light chain sequences obtained from several patients having AL and/or MM. Mass spectrometry was the main analytical technique used for this purpose. Moreover, dynamic light scattering and circular dichroism spectroscopy were used to study the aggregation behavior and secondary structure of these proteins, respectively.

Table 1.2 Common methods used to study structural characteristics of amyloid proteins.3

1.4 Mass Spectrometry

Mass spectrometry (MS) is a powerful analytical tool that can provide both qualitative and quantitative information about analyte molecules in the form of ions. A mass spectrometer converts the analyte molecules into gaseous ions, separates them according to mass-to-charge ratio (m/z) and records the relative abundance of each ion. The basic components of a typical mass spectrometer include an ion source, a mass analyzer and a detector, as shown in Figure 1-4.23,24

Figure 1-4 A general schematic of a mass spectrometer.

The sample to be analyzed is introduced through the inlet. It can be directly infused as a liquid or introduced as a solid. Other techniques such as liquid chromatography and gas chromatography can also be coupled with a mass spectrometer. Gaseous ions are formed in the ion source. After ionization, the ions are separated according to their m/z values by the mass analyzer. They then pass to the detector, which counts the ions and transforms the ion count into an electrical signal. This signal is read by a connected computer, and a mass spectrum is generated accordingly.
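A minimal sketch of the last step, converting detector ion counts into the relative abundances plotted in a mass spectrum, is given below. Scaling so that the most abundant ion (the base peak) equals 100% is a common convention assumed here; the function name and the counts are hypothetical.

```python
def to_relative_abundance(ion_counts):
    """Scale raw ion counts so that the most abundant ion (base peak) is 100%.

    ion_counts maps an m/z value to the number of ions counted at that m/z.
    The input must contain at least one entry.
    """
    base_peak_count = max(ion_counts.values())
    return {
        mz: 100.0 * count / base_peak_count
        for mz, count in sorted(ion_counts.items())  # ordered by increasing m/z
    }


if __name__ == "__main__":
    counts = {500.3: 200, 250.1: 50}  # hypothetical detector output
    for mz, abundance in to_relative_abundance(counts).items():
        print(f"m/z {mz:8.2f}  {abundance:6.1f} %")
```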

1.5 Ion Sources in Mass Spectrometry

In the ionization source, analytes of interest are converted into gas-phase ions as they acquire their charge. Several ionization techniques are used in MS, such as electron ionization (EI), matrix-assisted laser desorption/ionization (MALDI), electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI).23 EI is considered a hard ionization method since it is very energetic and causes extensive fragmentation. MALDI, ESI and APCI are soft ionization methods and mostly produce ions of molecular species. The two techniques commonly used to analyze liquid and solid biological samples are ESI and MALDI.24,25 Some desired properties of an ionization process are listed below:25

a) All components in the sample should be ionized.

b) The number of ions formed should be proportional to the amount of sample component.

c) There should not be any unwanted adduct ion formation.

d) No fragmentation during the ionization process is desired unless the fragmentation is essential for the analysis.

Although no ionization source satisfies all these requirements, the choice of ionization source depends on the type of analyte.25

1.5.1 Matrix-Assisted Laser Desorption/Ionization (MALDI)

MALDI is an extremely popular method to produce gas-phase ions from a broad range of compounds such as proteins, oligonucleotides, polymers and inorganic compounds.23 The principle of MALDI is shown in Figure 1-5. The process is achieved in two steps.23 First, the analyte is mixed with a matrix. The matrix solution usually contains small organic molecules having a chromophore that absorbs light at the laser wavelength. A small amount of this analyte-matrix mixture is placed on a metal plate. The second step occurs inside the instrument: the matrix absorbs energy from the laser, and vaporization and ionization of the sample occur. Ions are therefore produced by pulsed-laser irradiation.24

Figure 1-5 The principle of MALDI (adapted from de Hoffmann and Stroobant).23

Typically, singly protonated analytes form in MALDI, but the mechanism of ion formation is not fully understood.23 Two theories have been proposed to explain the ion formation.26 The older model is the Coupled Physical and Chemical Dynamics (CPCD) model, also called the photoionization model, and the more recent one is the “lucky survivor” model.26 The CPCD model assumes that analyte molecules are neutral when they are incorporated in the matrix crystals. Photoionization generates both protonated and deprotonated matrix ions through intermolecular matrix reactions. A charge-transfer process then leads to the formation of analyte ions, which can be abbreviated as [analyte+H]+ and [analyte-H]- ions. According to the “lucky survivor” model, some analytes, such as proteins, are incorporated into the matrix as pre-charged species. Upon desorption, the crystal lattice breaks up into larger and smaller clusters. Some of the analyte-containing clusters carry one or more excess protonated or deprotonated matrix ions.26 Cluster dissociation then generates protonated and deprotonated analyte ions. “Lucky survivors” are those newly formed singly charged analyte ions. This model explains well the observation of singly charged ions in MALDI.

MALDI is a pulsed ionization technique that produces ions in packets. This pulsed nature makes MALDI well suited for coupling with a time-of-flight analyzer. Some of the common UV lasers used in MALDI are N2 lasers (λ=337 nm) and Nd:YAG lasers (λ=266 or 355 nm).23 Matrix selection and sample preparation are considered the most important steps in MALDI analysis.23 For protein and peptide analysis, three matrices are commonly used: α-cyano-4-hydroxycinnamic acid (CHCA), 2,5-dihydroxybenzoic acid (DHB) and 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid or SA).

Some of the favorable attributes of MALDI are its high sensitivity, even for femtomole amounts of some analytes, quick analysis and relatively high tolerance to salts and buffers.24 A drawback of MALDI is the high chemical noise in the low mass range (below 500 Da) due to the presence of matrix ions.24

1.5.2 Electrospray Ionization (ESI)

In ESI, electrical energy is used to assist the transfer of ions from the solution phase to the gas phase. This process is achieved in three steps.27 First, a continuous flow of sample solution is sprayed in a strong electric field at a high voltage (e.g., 2.5-6.0 kV) through a stainless steel or fused silica capillary tube. A mist of highly charged droplets with the same polarity is generated at this point.27 The charged droplets pass down pressure and potential gradients toward the analyzer. Second, due to the high temperature in the ESI source and the continuous flow of nitrogen gas, the size of the charged droplets is reduced by solvent evaporation. The third step is ion ejection. When the electric field strength inside the droplet reaches a critical value, ions are ejected into the gas phase. This is considered a kinetically and energetically feasible process. Under these conditions, most of the ionizable components carry not just a single proton but several, and appear as multiply charged ions.25 The mechanism of ESI is illustrated in Figure 1-6.

Figure 1-6 The mechanism of electrospray ionization (adapted from Ho et al.).27

In its early days, ESI was frequently used for proteins, more specifically to determine their molecular weights. Later on, its use was extended to polymers and small molecules. The ESI mass spectra of large molecules with several ionizable sites show multiply charged ions.

ESI has several advantages that allow its application to the analysis of biological molecules.23,24 First, multiple charging enables the analysis of high-molecular-weight compounds within a working m/z range that is lower than their actual mass. Second, ESI is readily coupled to liquid chromatography separation techniques. In addition, because it is a very soft ionization technique, the detection of protein-protein, protein-drug and DNA-drug complexes is also possible.24
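The effect of multiple charging on the working m/z range can be illustrated with a minimal Python sketch. It assumes that every charge comes from an added proton, i.e. [M + zH]z+ ions. The function names and the 23 kDa example mass are hypothetical and chosen only for illustration.

```python
PROTON = 1.007276  # mass of a proton in Da

def mz(neutral_mass, z):
    """m/z of an [M + zH]z+ ion (assumes protonation only)."""
    return (neutral_mass + z * PROTON) / z

def charge_envelope(neutral_mass, z_min, z_max):
    """m/z values of a series of charge states."""
    return {z: mz(neutral_mass, z) for z in range(z_min, z_max + 1)}

def mass_from_adjacent_peaks(mz_high, mz_low):
    """Recover charge and neutral mass from two adjacent peaks.

    mz_high carries charge z and mz_low carries charge z + 1.
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON)

if __name__ == "__main__":
    # Hypothetical 23 kDa protein observed with 10 to 15 charges.
    for z, value in charge_envelope(23000.0, 10, 15).items():
        print(z, round(value, 2))
```

Although the neutral mass in this example is 23 kDa, all of the computed peaks fall well below m/z 2500, which is why analyzers with a limited m/z range can still be used for intact proteins.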

1.6 Mass Analyzers

The physical property actually measured by a mass analyzer is the m/z value of an ion.23 This measurement is based on different principles, such as the momentum, velocity (i.e., flight time) or resonance frequency of an ion.23 Different types of mass analyzers are used in MS, including magnetic sector, quadrupole, ion trap, time-of-flight (TOF), Fourier transform ion cyclotron resonance (FT-ICR) and Fourier transform orbitrap (FT-OT) analyzers.

They can be divided into two categories based on the way they transmit ions: scanning analyzers and simultaneous transmission analyzers. Scanning analyzers transmit ions of different masses successively, allowing only ions of certain masses to travel across the mass analyzer and be detected at a given time. Two such analyzers are the magnetic sector and the quadrupole. In simultaneous transmission analyzers, such as TOF, ion trap, FT-ICR and FT-OT instruments, ions of all masses pass through simultaneously towards the detector.

The five main characteristics of a mass analyzer are the mass range, analysis speed, ion transmission, mass accuracy and resolution.23 The mass range is the limit of m/z over which the analyzer can measure ions. The analysis speed is the rate at which the analyzer measures over a given mass range and is expressed in mass units per second (u s-1) or per millisecond (u ms-1). The ion transmission is the ratio of the number of ions reaching the detector to the number of ions entering the mass analyzer. Mass accuracy is defined as the difference between the theoretical m/z and the measured m/z, and is often expressed in parts per million (ppm). The ability to resolve two peaks with a small m/z difference is described by the resolution or resolving power. A comparison of the mass analyzers discussed in this chapter is summarized in Table 1.3.
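As a brief illustration of the last two characteristics, the following Python sketch computes a mass error in ppm and a resolving power. The resolving power is taken here as m/z divided by the peak width at half maximum, which is a common convention but an assumption with respect to this text. The function names and numerical values are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def resolving_power(mz, fwhm):
    """Resolving power as m/z divided by the peak width at half maximum."""
    return mz / fwhm

if __name__ == "__main__":
    # Hypothetical peak: theoretical m/z 1000.000 measured at 1000.005.
    print(ppm_error(1000.005, 1000.0))
    print(resolving_power(1000.0, 0.05))
```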

Table 1.3 Summary of main characteristics of different mass analyzers. The Thomson (symbol: Th) is a unit of mass-to-charge ratio.

Mass spectrometers with more than one mass analyzer coupled together have been introduced to increase versatility and to allow multiple experiments to be performed.23 Common hybrid instruments include quadrupole-TOF, ion trap-FT-ICR, triple quadrupole and LIT-orbitrap.

1.6.1 Quadrupole Mass Analyzer

The quadrupole was the typical choice for gas chromatography MS (GC-MS) and liquid chromatography MS (LC-MS) in the 1990s.24 Mass separation in a quadrupole is based on the motion of ions in a dynamic radio frequency (RF) electric field, which is directly related to the m/z of the ions.24 In a quadrupole mass analyzer, four parallel metal rods are kept at an equal distance from each other (Figure 1-7).27 The rods of each diagonally opposite pair are connected electrically. A DC voltage superimposed on a radio frequency AC voltage is applied to each pair, with equal magnitude and opposite sign for the two pairs. Due to the resulting electric field, ions travel in the z direction with oscillatory motion in the x-y plane.27 The amplitude of oscillation is related to the m/z of the ions. Therefore, mass analysis is a function of the RF and DC voltages applied to the rods.24 An attractive feature of the quadrupole is that it can act as a mass filter, tuned to pass a desired m/z range by controlling the amplitude of oscillation through the DC and RF voltages. These voltages can be set so that only ions of the selected m/z reach the detector without hitting the quadrupole rods, which is called “stable” motion.27 Ions of other m/z follow “unstable” trajectories, hit the metal rods and are neutralized before they can reach the detector.27 Depending on the physical parameters of the quadrupole, its upper m/z limit can be 4000 and its mass accuracy is on the order of hundreds of ppm.

Figure 1-7 Schematic diagram of quadrupole mass analyzer (adapted from Ho et al.)27


1.6.2 TOF Mass Analyzer

A TOF analyzer is a vacuum chamber in which ions are separated according to their velocities. The simplest form of TOF analyzer is the linear TOF shown in Figure 1-8. When a TOF analyzer is coupled to a MALDI ion source, all ions are, in theory, expelled from the source at the same time. They are then accelerated through a fixed potential (e.g., 1-20 kV) towards a field-free region called the TOF drift tube.23,24 All ions with the same charge acquire the same kinetic energy after acceleration. When they enter the field-free region, lower m/z ions have higher velocities than higher m/z ions.

Figure 1-8 Schematic diagram of linear TOF mass analyzer. The size of the circle indicates the molecular weight of the ion (adapted from de Hoffmann and Stroobant).23

The field-free region is typically 0.5-2.0 meters in length and all ions travel through this fixed distance. Therefore, m/z values are determined by measuring the time that ions take to reach the detector.23,24 This relationship can be expressed mathematically as shown below.23,26

The kinetic energy of an ion, $E_k$, is given by

$$E_k = zeU$$

where $z$ is the charge state of the ion, $e$ is the elementary charge and $U$ is the potential difference.

The kinetic energy of the ion can also be expressed in terms of its velocity $v$ and its mass $m$:

$$E_k = \tfrac{1}{2}mv^2 \quad \text{or} \quad v = (2E_k/m)^{1/2} = (2zeU/m)^{1/2}$$

The arrival time $t$ is

$$t = L/v = L/(2zeU/m)^{1/2}$$

where $L$ is the flight tube length.

Therefore,

$$m/z = 2eU(t/L)^2$$

and m/z is proportional to $t^2$.
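The relationship between flight time and m/z can be checked numerically with a short Python sketch. It implements only the idealized linear TOF equations given above and ignores the initial energy spread and the time spent in the acceleration region. The function names and the example values (20 kV, 1 m flight tube) are assumptions made for illustration.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in C
DALTON = 1.66053906660e-27  # unified atomic mass unit in kg

def flight_time(mz, potential_v, length_m):
    """Flight time in seconds, from t = L / (2zeU/m)^(1/2); mz in Da per charge."""
    velocity = math.sqrt(2 * E_CHARGE * potential_v / (mz * DALTON))
    return length_m / velocity

def mz_from_time(time_s, potential_v, length_m):
    """Inverse relation, m/z = 2eU(t/L)^2, returned in Da per charge."""
    return 2 * E_CHARGE * potential_v * (time_s / length_m) ** 2 / DALTON

if __name__ == "__main__":
    # Hypothetical instrument: 20 kV acceleration, 1.0 m field-free region.
    for value in (1000.0, 4000.0):
        print(value, flight_time(value, 20000.0, 1.0))
```

Because t is proportional to the square root of m/z, an ion with four times the m/z takes twice as long to reach the detector.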

TOF analysis requires ions to arrive in pulses, which makes it naturally compatible with pulsed ionization methods. With the development of pulsed ionization techniques such as MALDI, TOF has become widely used for a variety of MS applications. The best performance is obtained on TOF instruments that include a reflectron.24 A reflectron is a series of ring electrodes creating a constant electric field that slows down the ions and turns them around before they reach the detector.26 The reflectron corrects the flight times of ions having the same m/z but slightly different velocities,25 and thereby improves the mass resolution. Linear TOF is mainly used for the analysis of proteins, while reflectron TOF is useful in peptide analysis.25 Another way to improve the resolution is delayed pulsed extraction.23,25 In this method, a time delay is introduced between ion formation and extraction. When ions are ejected from the plate, they exhibit a range of velocities. After a delay of a few tens to a few hundreds of nanoseconds from the initial ion ejection, an extraction voltage pulse is applied to extract the ions from the source.23,25 As a result, less energetic ions receive more kinetic energy and catch up with the more energetic ions of the same mass. This corrects the kinetic energy spread of ions leaving the source and thus improves the resolution.23,25 A schematic diagram of a TOF analyzer fitted with a reflectron is shown in Figure 1-9.

Figure 1-9 An illustration of TOF fitted with a reflectron mass analyzer (adapted from Eidhammer et al.).25 The two spheres represent two ions having the same mass and charge, but slightly different kinetic energies.

1.6.3 Ion Trap Mass Analyzer

As the name implies, an ion trap uses an oscillating electric field to trap ions. Ion trap analyzers can be categorized into 2D and 3D ion traps. The 2D ion trap is known as the linear ion trap (LIT) and the 3D ion trap is called the quadrupole ion trap (QIT). Commercial ion traps have a mass range and resolution similar to those of the quadrupole.24 The basic setup of a QIT is shown in Figure 1-10.

It consists of a ring electrode, an entrance end cap electrode and an exit end cap electrode.27 These three electrodes form a cavity in which ions can be stored and analyzed.27 Ions produced in the ionization source enter the cavity. An RF potential (constant frequency and variable amplitude) is applied to the ring electrode to produce an electric field in all three dimensions, x, y and z.23 Different voltages are then applied to eject ions according to their m/z values. The trapping voltage and the m/z value determine the nature of the ion trajectory.27 To detect the ions, the potential of the electrode system is altered so that ions are ejected in one direction in order of increasing m/z.

Figure 1-10 Schematic illustration of 3D ion trap mass analyzer (adapted from Ho et al.).27

The LIT is composed of a four-rod quadrupole terminated by lenses that repel the ions back inside the rods (Figure 1-11).23,28 Both RF and DC voltages are required to trap the ions.23,28 If the same repelling voltage is applied at each end, the ion cloud is squeezed toward the center of the quadrupole.23 Ions are ejected selectively by applying an appropriate RF potential, either along the axis of the trap (axial ejection) or perpendicular to its axis (radial ejection).23


Figure 1-11 Schematic diagram of linear ion trap mass analyzer. (adapted from Schwartz et al.) 28

1.6.4 Orbitrap Mass Analyzer

As the name implies, orbitrap is an ion trap, but the concept of trapping ions is not similar to the conventional ion trap analyzer.29 As illustrated in Figure 1-12, orbitrap contains an outer electrode that has the shape of a barrel split into two parts separated by a small gap.23,29

Figure 1-12 Schematic diagram of orbitrap containing a central electrode (a) and an outer electrode (b). The outer electrode is split into two parts and separated by an insulating ceramic ring (c). Reprinted with permission from reference 29.


The central electrode has a spindle shape. Ions orbit the central electrode while oscillating back and forth along the z axis.23,29 The image current generated by the axial motion of these ions is detected and the signal is Fourier transformed to obtain a high-resolution mass spectrum.23,29,30 The Fourier transform yields the oscillation frequencies of ions with different m/z values.29

1.7 Proteins and Proteomics

Proteins are a diverse and abundant class of biomolecules in cells that catalyze biochemical reactions, regulate gene expression and act as structural components.25 They are composed of amino acids linked head to tail through peptide bonds. There are 20 different naturally occurring amino acids, each containing a carboxyl group, an amino group, a central carbon atom and a side chain. A short chain of amino acids is usually referred to as a peptide, which has the structure shown in Figure 1-13. The repeating arrangement of atoms -N-Cα-C-N- is called the peptide backbone.25 The end of a peptide chain bearing the free amino group (NH2) is the amino-terminus or N-terminus, while the other end is the carboxy-terminus or C-terminus. Each amino acid is abbreviated by a one-letter code and, by convention, the amino acid sequence of a peptide is written from the N-terminus to the C-terminus.25


Figure 1-13 The basic structure of a peptide (adapted from Eidhammer et al.).25

Proteomics is the study of subsets of proteins in an organism and how they change with time and varying conditions.25 This is generally based on large-scale determination of gene and cellular functions at the protein level.31 Various technical approaches contribute to proteomics research, such as cell imaging by microscopy, MS techniques and DNA microarray experiments. Recent successes in proteomics-based applications illustrate the role of MS approaches as a useful tool.29,31–33,6,34–36

1.7.1 Protein separation by gel electrophoresis

Gel electrophoresis is the most common technique for protein separation. Separation can be based on molecular mass and/or isoelectric point. Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) separates proteins based on molecular mass.25 In this technique, polymerized acrylamide is used as the separation medium. Once the polymer is formed, it becomes a porous gel. Because the gel has a defined range of pore sizes, small molecules move through it much more readily than large ones. Therefore, the speed of migration decreases as the size of the molecule increases. Protein masses can be estimated by comparison with a standard sample containing molecules of known masses.
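Mass estimation from a set of standards can be sketched in Python as follows. The sketch assumes the commonly used empirical relation in which log10 of the molecular mass varies linearly with the relative migration distance (Rf). That relation, the function names and all numerical values are assumptions for illustration and are not taken from the cited references.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_mass_kda(rf_unknown, standards):
    """Estimate a mass from (Rf, mass in kDa) pairs of marker proteins."""
    rfs = [rf for rf, _ in standards]
    log_masses = [math.log10(mass) for _, mass in standards]
    slope, intercept = fit_line(rfs, log_masses)
    return 10 ** (slope * rf_unknown + intercept)

if __name__ == "__main__":
    # Hypothetical marker lane: (relative mobility, mass in kDa).
    markers = [(0.15, 100.0), (0.35, 50.0), (0.55, 25.0), (0.80, 10.0)]
    print(round(estimate_mass_kda(0.58, markers), 1))
```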

In their native state, proteins exist as 3D structures. They must be denatured into a linear form to run in a gel. They are treated with a detergent such as SDS, which introduces a net negative charge. The resulting denatured and negatively charged proteins are loaded onto the gel. When the gel is placed in an electric field, the negatively charged proteins migrate towards the positive electrode.

1.7.2 Protein digestion

One of the important steps in protein identification is the cleavage of a protein into peptides, known as protein digestion.25 This can be done chemically or enzymatically. The most common approach is enzymatic digestion using proteases.25 Trypsin has been predominantly used due to its high specificity and widespread availability.32 Six alternative proteases for enzymatic digestion of proteins are chymotrypsin, LysC, LysN, AspN, GluC and ArgC.32 Each enzyme has a distinctive specificity and generates a different set of peptides. Therefore, complementary parts of a protein sequence can be covered by using multiple proteases in parallel.32,35
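The way different proteases generate different peptide sets can be sketched with a simple in silico digestion in Python. The cleavage rules below are deliberately simplified textbook rules and are assumptions for illustration. Real enzymes show exceptions and missed cleavages that are not modelled here. The test sequence is hypothetical.

```python
import re

# Simplified cleavage rules expressed as zero-width regular expressions.
# These are illustrative assumptions, not complete enzyme specificities.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",       # C-terminal to K/R, not before P
    "chymotrypsin": r"(?<=[FWY])(?!P)",  # C-terminal to F/W/Y, not before P
    "LysC": r"(?<=K)",                   # C-terminal to K
    "ArgC": r"(?<=R)",                   # C-terminal to R
    "GluC": r"(?<=E)",                   # C-terminal to E
    "LysN": r"(?=K)",                    # N-terminal to K
    "AspN": r"(?=D)",                    # N-terminal to D
}

def digest(sequence, enzyme):
    """Return the peptides produced by complete cleavage (no missed cleavages)."""
    return [p for p in re.split(CLEAVAGE_RULES[enzyme], sequence) if p]

if __name__ == "__main__":
    example = "AKRPGDEKF"  # hypothetical sequence
    for name in CLEAVAGE_RULES:
        print(name, digest(example, name))
```

Splitting on zero-width matches with `re.split` requires Python 3.7 or later.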

1.8 Mass Spectrometry-Based Proteomics

With the technical advancement of MS instrumentation, MS-based proteomics has become increasingly popular for the analysis of complex protein samples. So far, MS has been successful in the identification of post-translational modifications (PTMs) and in protein profiling.31 As illustrated in Figure 1-14, two proteomics approaches can be distinguished by the level at which the analysis takes place.

In a top-down experiment, purified intact proteins are measured and directly fragmented to obtain the intact protein mass and amino acid sequence. Due to the large size of the analytes, high-resolution MS instruments, such as the orbitrap, are needed to acquire the data.37 However, the majority of proteomics experiments are based on the analysis of peptides produced by enzymatic digestion of proteins. This method is referred to as the bottom-up approach.37 ESI-ion trap, ESI-QTOF and MALDI-TOF/TOF are the most widely used instruments for this purpose.37

Figure 1-14 Mass spectrometry based proteomics approaches (adapted from Switzar et al.)37


1.8.1 Tandem Mass Spectrometry (MS/MS)

This method involves two or more stages of mass analysis and provides additional information about specific ions. The most common way of achieving this is to combine two mass analyzers.23,38 The first analyzer isolates the precursor ion, which then undergoes fragmentation to yield product ions. The product ions are analyzed by the second analyzer. It is also possible to increase the number of steps to perform an MSn experiment, where n represents the number of generations of ions being analyzed.23 Ion fragmentation methods used for the analysis of peptides and proteins include collision-induced dissociation (CID), higher-energy collisional dissociation (HCD) and electron transfer dissociation (ETD).23,39 The success of most proteomics experiments depends on the choice of fragmentation method.

CID is most commonly combined with ESI. Fragmentation in CID is performed in a separate part of the mass spectrometer, called the collision cell, which contains an inert gas.23,25 In this method, the ion of interest is selected and sent into the collision cell. When it collides with neutral gas atoms, part of the kinetic energy of the parent ion is converted into vibrational energy, which is then redistributed over the ion.23,25,39 As a result, the ion dissociates at amide bonds and other bonds within the peptide, generating different fragment types.

In HCD fragmentation, a higher activation energy and a shorter activation time are used compared to CID.39 HCD leads to the production of more fragment ions. This method has been applied to de novo sequencing and PTM studies.39 ETD fragmentation occurs by transferring electrons to precursor ions of higher charge state.40,41 It leads to cleavage of the N-Cα bond along the peptide backbone, generating c- and z-type fragment ions (the nomenclature of fragment ions is described in section 1.8.2). Combining two or more fragmentation methods provides comprehensive information for protein sequencing.42

1.8.2 Peptide Fragmentation

In peptides, fragmentation occurs mainly along the peptide backbone. The resulting fragments are called backbone fragments. Both singly and multiply charged ions can be fragmented. One major advantage of multiply charged precursor ions is that they tend to fragment more easily and yield more fragment ions per fragmentation event.25 The accepted nomenclature for fragment ions is the Biemann nomenclature, illustrated in Figure 1-15.43 Briefly, different types of ions can be produced by fragmentation, and they are detected only if they carry at least one charge. If the charge is retained on the N-terminal part of the peptide, the ion is classified as a, b or c. If the charge is retained on the C-terminal part of the peptide, the ion is classified as x, y or z. The subscript designates the number of residues in the fragment.

Figure 1-15 Biemann nomenclature for peptide fragmentation (adapted from Eidhammer et al.).25
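To make the nomenclature concrete, the following Python sketch computes singly charged b- and y-ion m/z values for a peptide from standard monoisotopic residue masses. It is a minimal illustration that ignores modifications, neutral losses and higher charge states. The function names and the example peptide are assumptions chosen for illustration.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def b_ions(peptide):
    """Singly charged b1 .. b(n-1) m/z values (N-terminal fragments)."""
    total, out = 0.0, []
    for residue in peptide[:-1]:
        total += MONO[residue]
        out.append(total + PROTON)
    return out

def y_ions(peptide):
    """Singly charged y1 .. y(n-1) m/z values (C-terminal fragments)."""
    total, out = WATER, []
    for residue in reversed(peptide[1:]):
        total += MONO[residue]
        out.append(total + PROTON)
    return out

if __name__ == "__main__":
    for i, (b, y) in enumerate(zip(b_ions("PEPTIDE"), y_ions("PEPTIDE")), 1):
        print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

A useful internal check is that each complementary pair (b_i and y_(n-i)) sums to the neutral peptide mass plus two protons.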

Other important fragment types are immonium fragments and side chain fragments.25 Immonium ions are formed by a combination of a-type and y-type fragmentation. They are observed at an m/z about 27 lower than the residue mass of the corresponding amino acid. A list of immonium ions commonly found as strong signals in MS is shown in Table 1.4.
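The mass relationship for immonium ions can be written as a short Python sketch. It assumes that a singly charged immonium ion corresponds to the residue mass minus CO plus one proton, which gives the shift of about 27 mentioned above. The function name is hypothetical.

```python
CO = 27.994915     # monoisotopic mass of carbon monoxide in Da
PROTON = 1.007276  # mass of a proton in Da

def immonium_mz(residue_mass):
    """m/z of the singly charged immonium ion of a residue."""
    return residue_mass - CO + PROTON

if __name__ == "__main__":
    # Monoisotopic residue masses of Leu/Ile and Phe.
    for name, mass in (("Leu/Ile", 113.08406), ("Phe", 147.06841)):
        print(name, round(immonium_mz(mass), 4))
```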

Side chain fragmentation occurs through additional fragmentation of the amino acid side chain. The resulting ions are abbreviated as d, v and w ions. Partial side chain fragmentation of a ions and z ions forms d ions and w ions, respectively. Complete side chain loss from y ions is responsible for v ion formation. Side chain fragmentation is useful for differentiating between leucine and isoleucine residues.

Table 1.4 Masses of most common immonium ions.23

1.8.3 Peptide Mass Fingerprinting (PMF)

There are several ways of using MS data for protein/peptide identification, one of which is PMF. In this method, a protein is digested with an enzyme and the resulting peptide mixture is analyzed by MS. The observed peptide masses are searched against a database of protein sequences using a search engine.33,34 A scoring system is used to identify the best matches. MASCOT is one of the popular search engines.34 An overview of the PMF workflow is provided in Figure 1-16.

Figure 1-16 Overview of PMF for protein identification (adapted from Eidhammer et al.).25

When performing PMF, the user should specify several parameters, such as the enzyme, fixed and variable modifications, number of missed cleavages, mass error threshold and the preferred database.
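The principle of PMF can be sketched in a few lines of Python. The example below digests each database entry with a simplified trypsin rule, computes neutral monoisotopic peptide masses and counts how many observed masses match within a ppm tolerance. Counting matches is a deliberately naive stand-in for the probabilistic scoring used by search engines such as MASCOT. The function names, tolerance and toy database are assumptions for illustration.

```python
import re

MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[residue] for residue in peptide) + WATER

def tryptic_peptides(sequence):
    """Simplified trypsin rule: cleave after K/R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def count_matches(observed, sequence, tol_ppm=50.0):
    """Number of observed masses matching a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical)
        for obs in observed
    )

def best_match(observed, database, tol_ppm=50.0):
    """Name of the database entry with the most matching peptide masses."""
    return max(database, key=lambda name: count_matches(observed, database[name], tol_ppm))

if __name__ == "__main__":
    database = {"protA": "AAKGGR", "protB": "SSKTTR"}  # hypothetical entries
    observed = [peptide_mass("AAK"), peptide_mass("GGR")]
    print(best_match(observed, database))
```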

1.8.4 Protein Sequencing Techniques

As reported in the literature there are four methods to determine the amino acid sequence of a protein.44 Two direct methods are Edman sequencing45 and tandem MS.25,6,36

It is also possible to obtain amino acid sequences indirectly by mRNA or cDNA sequencing46 and X-ray crystallography.47

The traditional technique for protein sequencing is Edman degradation. It involves sequential identification of amino acids starting from the N-terminus. The first step of Edman sequencing is to remove the N-terminal amino acid from a peptide while leaving the rest of the chain intact.25,45 The removed amino acid is identified in the next step. This process is repeated sequentially to obtain the N-terminal sequence of the protein/peptide. In this way, approximately 40-50 amino acid residues can be determined in one sequencing run.25,45

The use of crystallography is greatly limited by the resolution of the diffraction data and the quality of the protein crystals.44 In addition, several amino acids (e.g., glutamine and glutamic acid, asparagine and aspartic acid, threonine and valine) have side chains that are similar in shape and give rise to similar electron density maps.44 As a result, the use of X-ray crystallography for sequencing depends on the availability of prior sequence and structural information.44

Tandem MS is an efficient method to obtain primary sequence information. There are different approaches to deduce peptide/protein sequences by this method. Most common methods include the use of bottom-up approach with multi enzyme digestions, use of different fragmentation/ion activation methods and de novo sequencing.6,36

1.8.5 De novo Sequencing of Peptides

This method is used to derive the peptide sequence directly from MS/MS spectra without any prior knowledge of the amino acid sequence.25 Historically, it was thought to be slow and to require spectra with high mass accuracy. However, with recent developments in computer algorithms and in mass spectrometers with high mass accuracy, de novo sequencing has become a viable choice for proteomics research.6,48
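The basic idea of reading a sequence from a spectrum can be sketched in Python: the mass difference between consecutive ions of one series is matched against the residue masses. This is only an illustration of the principle under idealized conditions (a complete, noise-free ladder of singly charged ions of one type) and is not the algorithm of any particular software. The function names, tolerance and example values are assumptions.

```python
# Monoisotopic residue masses in Da (L is listed before I on purpose).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def residues_for_delta(delta, tol=0.01):
    """Residues whose mass matches a mass difference within tol (Da)."""
    return [aa for aa, mass in MONO.items() if abs(mass - delta) <= tol]

def read_ladder(ion_mzs, tol=0.01):
    """Assign a residue (or alternatives) to each step of an ion ladder."""
    ions = sorted(ion_mzs)
    calls = []
    for low, high in zip(ions, ions[1:]):
        candidates = residues_for_delta(high - low, tol)
        calls.append("/".join(candidates) if candidates else "?")
    return calls

if __name__ == "__main__":
    # b2 to b5 ions of a hypothetical peptide.
    print(read_ladder([227.102626, 324.155386, 425.203066, 538.287126]))
```

The "L/I" call in the last step reflects the fact that leucine and isoleucine cannot be separated by mass alone.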


SEQUEST is an algorithm that, in contrast to de novo sequencing, uses a database to find the sequence that best matches the experimental spectra.23 Briefly, such algorithms search the database for all peptides with a mass similar to that of the experimental precursor ion. The predicted fragments of these candidates are then compared with the observed fragments of the actual peptide. Based on the results, the most probable sequence is proposed.23 However, this approach is not reliable for the sequencing and identification of proteins/peptides that are not present in the database.6

Instead, new algorithms have been developed, and PEAKS is one of them.6,36,48 It can extract amino acid sequence information without the use of a database. PEAKS performs de novo sequencing directly from MS/MS spectra and computes the best possible sequence from all possible amino acid combinations.6 The method can be summarized in four steps:6 1) preprocessing, 2) candidate computation, 3) refined scoring, and 4) confidence scoring.

The first step is to filter noise and deconvolute multiply charged ions to singly charged ions in the raw MS/MS data. In the second step, 10 000 candidate sequences are computed from all possible combinations of amino acids for a given precursor ion. The third step consists of re-evaluating these sequences with another scoring scheme and selecting the best candidates. User-specified parameters play a major role in this step. In the final step, a confidence score is output for each of the top-scoring peptide sequences.6,48 This allows the user to select reliable and correct peptides related to the protein of interest. In addition, PEAKS provides an option to perform a database search on raw MS/MS data.48 This differs from the conventional database search approach in that it relies heavily on de novo sequencing results.

Furthermore, PEAKS assigns a local confidence score to each amino acid in a de novo sequence. This value varies from 0% to 99% and indicates how confident the algorithm is that a particular amino acid is correctly sequenced. Color coding is also used to represent the confidence scores: red for scores greater than 90%, purple for 80%-90%, blue for 60%-80% and black for scores below 60%. Moreover, PEAKS provides an average local confidence (ALC) score for each peptide sequence. The ALC is the average of the local confidence scores of all amino acids in a peptide. This feature is useful for filtering out low-quality spectra with poor sequences, in which residues are incorrectly identified during de novo sequencing.
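The ALC calculation and the color bands described above can be expressed as a short Python sketch. It is not PEAKS code. The function names, the handling of values that fall exactly on a band boundary and the 80% filter threshold are assumptions made for illustration.

```python
def alc(local_scores):
    """Average local confidence (%) over all residues of a peptide."""
    return sum(local_scores) / len(local_scores)

def colour(score):
    """Colour band for a local confidence score (boundary handling assumed)."""
    if score > 90:
        return "red"
    if score >= 80:
        return "purple"
    if score >= 60:
        return "blue"
    return "black"

def filter_peptides(peptides, min_alc=80.0):
    """Keep sequences whose ALC is at least min_alc.

    peptides maps a sequence to its list of per-residue local confidences.
    """
    return [seq for seq, scores in peptides.items() if alc(scores) >= min_alc]

if __name__ == "__main__":
    candidates = {"AAK": [95, 85, 60], "GGR": [50, 40, 30]}  # hypothetical
    print(filter_peptides(candidates))
    print([colour(s) for s in candidates["AAK"]])
```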

1.9 MS-based leucine (Leu) and isoleucine (Ile) discrimination approach

Leu and Ile residues are considered to be indistinguishable by MS because their masses are exactly the same (113.0840 Da). During the last three decades, considerable effort has been devoted to developing an MS approach to distinguish them.49 Lebedev et al. recently developed an ETD-HCD MS3 method using an Orbitrap Fusion mass spectrometer.50 Later, Xiao et al. demonstrated a multistage MS approach, called HCD MSn, using the same instrument.51 These two approaches can be used to distinguish between Leu and Ile residues in proteins and peptides, as well as for de novo sequencing.


In this approach, the HCD MS3 method can be used for peptides with a single Leu or Ile residue. First, the precursor peptide ion is isolated from the MS spectrum and fragmented to obtain the MS2 spectrum. The 86 Da immonium ion of Leu or Ile should appear in this MS2 spectrum. Further fragmentation of the 86 Da ion produces the 69 Da ion, which corresponds to loss of NH3 from the Leu or Ile immonium ion. The relative abundance of the 69 Da ion can be used to differentiate between Leu and Ile, as shown in Figure 1-17.

When peptides have multiple Leu and Ile residues, the ETD-HCD MS3 method is recommended (Figure 1-18).49,51 This method starts with ETD fragmentation of selected peptides containing Leu and Ile. The observed z-ions related to Leu and Ile are isolated from the MS2 spectrum and fragmented in an additional HCD step. The characteristic loss of C3H7 (-43 Da) or C2H5 (-29 Da) from the selected z-ions produces w-ions, which makes it possible to distinguish Leu from Ile.
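The w-ion test can be sketched in Python as a comparison of an MS3 peak list with the two expected neutral losses. The sketch assumes the usual assignment, in which Leu loses C3H7 and Ile loses C2H5, and it assumes that the z-ion and the w-ion carry the same charge. The function names, tolerance and m/z values are hypothetical.

```python
C3H7 = 43.054775  # monoisotopic mass of the lost C3H7 radical (Leu, assumed)
C2H5 = 29.039125  # monoisotopic mass of the lost C2H5 radical (Ile, assumed)

def w_ion_candidates(z_ion_mz, charge=1):
    """Expected w-ion m/z values for a Leu or an Ile residue."""
    return {
        "Leu": z_ion_mz - C3H7 / charge,
        "Ile": z_ion_mz - C2H5 / charge,
    }

def assign_residue(z_ion_mz, observed_mzs, tol=0.01, charge=1):
    """Return 'Leu' or 'Ile' if exactly one expected w-ion is observed."""
    hits = [
        residue
        for residue, expected in w_ion_candidates(z_ion_mz, charge).items()
        if any(abs(expected - obs) <= tol for obs in observed_mzs)
    ]
    return hits[0] if len(hits) == 1 else None

if __name__ == "__main__":
    # Hypothetical z-ion at m/z 500.0000 and a hypothetical MS3 peak list.
    print(assign_residue(500.0, [456.9452, 300.1234]))
```

Returning `None` when neither or both losses are observed keeps ambiguous cases from being assigned automatically.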

Figure 1-17 Differentiation between Leu and Ile based on the abundance of 69 Da ion. Reprinted with permission from reference 51.


Figure 1-18 Differentiation between Leu and Ile based on the formation of w ions. Reprinted with permission from reference 51.

1.10 Mass Spectrometry-Chromatography Coupling

The combination of high-performance liquid chromatography (HPLC) and MS is another powerful analytical technique for the analysis of complex mixtures. LC separates the sample components and then introduces them to the mass spectrometer to detect charged ions. The HPLC-MS data can be used to determine the information about the molecular weight, structure, identity and quantity of specific sample components.

Reversed-phase HPLC (RP-HPLC) is one of the most important techniques in proteomics research. The LC separation is based on an adsorption/desorption process. In RP chromatography, the particle surface is hydrophobic due to the presence of hydrocarbon groups. Proteins and hydrophobic peptides are retained on the column by adsorbing to the hydrophobic surface. As the organic solvent concentration is increased, the proteins/peptides desorb from the surface and elute from the column.


1.11 Ion mobility spectrometry (IMS)

Ion mobility is a technique for studying the shape and conformation of small molecules and proteins in the gas phase.52,53 When coupled to MS, the technique is abbreviated as IMS-MS. One commercially available IMS-MS instrument is the Synapt HDMS, composed of a quadrupole, an ion mobility cell and a TOF analyzer.54 The principle behind this technique is the separation of ions based on their shape, charge and size (Figure 1-19).

After the initial ionization process, ions are injected into a region containing a neutral gas (also called the buffer gas). They undergo mobility separation under the influence of an electric field and separate according to their mobility. Ions with higher charge travel faster because they experience a greater force from the electric field than ions with lower charge. In addition, large ions experience more collisions with gas molecules and therefore need more time to reach the detector than small ions. With regard to shape, compact folded structures travel more easily through the buffer gas because they undergo fewer collisions than unfolded structures.

Figure 1-19 Separation of biomolecules by IMS-MS. Reprinted with permission from reference 55.


1.12 Sequencing of immunoglobulin light chains by MS

Previously, a combination of enzymatic digestion and Edman degradation was used to determine the sequences of light chains.56 In addition, electron-density maps obtained by X-ray crystallography were used to compare the sequence similarities of light chains.47 Sequencing of mRNA isolated from bone marrow cells is another way to obtain sequence information.57 Most recently, MS-based approaches with fragment ion spectral interpretation and de novo sequencing of peptides have provided useful sequence information on immunoglobulins.56 Additionally, several efforts have been made to characterize light chains in serum by MS.13,58 Bergen et al. used ESI-MS to identify monomers and dimers of κ light chains.13 Mills et al. established an MS-based method to detect low levels of immunoglobulin light chains in serum.58 Recently, Barnidge et al. reported several methods to monitor light chains in serum and urine by MALDI-MS and ESI-MS.59,60,61

This study focuses on de novo sequencing-assisted database searching for peptide identification of the disease-related light chain proteins shown in Table 1.5. Each light chain, abbreviated by a three-letter or five-letter code, was isolated from a patient with MM and/or AL amyloidosis.


Table 1.5 Disease related light chains isolated from patients having MM and AL.62,63

1.13 Circular Dichroism (CD) Spectroscopy

CD measures the difference in the absorption of left (L) and right (R) circularly polarized light. This effect occurs when a molecule contains one or more chiral chromophores. In a CD spectrometer, plane polarized light is split into L and R circularly polarized components by passing through a modulator that is subjected to an alternating electric field.64,65 If the sample absorbs one of the two components more than the other, the resultant radiation is elliptically polarized and is detected by a photomultiplier.65 The CD instrument therefore displays the dichroism at a given wavelength either as the difference in absorbance of the two components (ΔA = AL - AR) or as the ellipticity (θ) in millidegrees.64
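The two ways of reporting dichroism are related by a fixed factor, and the ellipticity is often normalized to a mean residue ellipticity for comparison between proteins. The Python sketch below uses the standard conversions, which are not stated in this text and are included here as assumptions: θ (deg) = ΔA × ln(10) × 180/(4π), and [θ] = θ (mdeg) / (10 × c × l × n) with c in mol/L, l in cm and n the number of residues. The function names and example values are hypothetical.

```python
import math

# Conversion factor from absorbance difference to ellipticity in degrees.
DEG_PER_DELTA_A = math.log(10) * 180.0 / (4.0 * math.pi)  # about 32.98

def delta_a_to_mdeg(delta_a):
    """Ellipticity in millidegrees from the absorbance difference A_L - A_R."""
    return delta_a * DEG_PER_DELTA_A * 1000.0

def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Mean residue ellipticity in deg cm2 dmol-1."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)

if __name__ == "__main__":
    # Hypothetical measurement: -20 mdeg, 10 uM protein, 0.1 cm cell, 200 residues.
    print(delta_a_to_mdeg(1e-3))
    print(mean_residue_ellipticity(-20.0, 1e-5, 0.1, 200))
```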

CD is an excellent method for determining the secondary structure of proteins.66 In proteins, the chromophores of interest include peptide bonds and aromatic amino acid side chains. Absorption below 240 nm is due to peptide bonds (a weak and broad n → π* transition around 220 nm and a more intense π → π* transition around 190 nm).65 Aromatic amino acids absorb in the range of 250-320 nm.65 Figure 1-20 illustrates the far-UV CD spectra associated with various types of protein secondary structure.64

Figure 1-20 The different types of secondary structure found in proteins: solid curve, α-helix; long dashes, anti-parallel β-sheet; dots, type I β-turn; dots and short dashes, irregular structure. Reprinted with permission from reference 64.

1.14 Dynamic light scattering (DLS)

Light scattering measurements are used in biochemistry to detect aggregates in macromolecular solutions, to determine the size of macromolecules or to monitor the binding of ligands.67 In DLS, laser light scattered by particles undergoing Brownian motion in a solvent is detected at a scattering angle θ.68 DLS provides measurements related to the diffusion coefficient of macromolecules in solution. According to Fick’s law, the diffusion coefficient relates the concentration gradient of a solute in a solvent along an axis to the flux across a 1 cm2 area.67 Two characteristics can be derived from the diffusion coefficient: the mean hydrodynamic radius and the polydispersity. The mean hydrodynamic radius is calculated assuming that the particles have a simple geometry, such as that of a sphere.67 The scattering signals are analyzed with a correlator to find this value. Polydispersity is given by the standard deviation of the diffusion coefficient. DLS is also known as photon correlation spectroscopy or quasi-elastic light scattering.69
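For spherical particles, the step from diffusion coefficient to hydrodynamic radius is usually made with the Stokes-Einstein equation, R_h = k_B T / (6 π η D). This equation is not given in the text above and is introduced here as an assumption, as are the function names and the example values (water at 25 °C with a viscosity of 0.89 mPa s).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def hydrodynamic_radius(diff_coeff, temperature_k=298.15, viscosity=8.9e-4):
    """Hydrodynamic radius in m from a diffusion coefficient in m2/s.

    viscosity is the solvent viscosity in Pa s (default: water at 25 C).
    """
    return K_B * temperature_k / (6.0 * math.pi * viscosity * diff_coeff)

def diffusion_coefficient(radius_m, temperature_k=298.15, viscosity=8.9e-4):
    """Inverse relation: diffusion coefficient in m2/s from a radius in m."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity * radius_m)

if __name__ == "__main__":
    # Hypothetical protein with D = 1.0e-10 m2/s.
    radius = hydrodynamic_radius(1.0e-10)
    print(f"{radius * 1e9:.2f} nm")
```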


Chapter 2

2 Materials and methodology

2.1 Materials and instruments

HPLC-grade water and acetonitrile were obtained from Fisher Scientific (Pittsburgh, PA). Ammonium bicarbonate, ammonium peroxydisulfate, dithiothreitol (DTT), iodoacetamide (IAM), 2-mercaptoethanol, trifluoroacetic acid (TFA), formic acid (FA), sodium chloride, potassium chloride, disodium hydrogen phosphate, potassium dihydrogen phosphate, lysozyme, bovine serum albumin (BSA), 2,5-dihydroxybenzoic acid (DHB) and α-cyano-4-hydroxycinnamic acid (CHCA) were purchased from Sigma-Aldrich (St. Louis, MO). Neurotensin (8-13) peptide was obtained from AnaSpec (Fremont, CA). The proteases trypsin, chymotrypsin and endoproteinase glu-C were obtained from Sigma-Aldrich (St. Louis, MO).

For SDS-PAGE preparation, Bio-Safe Coomassie Brilliant Blue G-250 stain, Laemmli sample buffer, 10% (w/v) SDS, 30% acrylamide/bis solution (29:1), 1.5 M Tris-HCl (pH 8.8), 0.5 M Tris-HCl (pH 6.8) and TEMED were obtained from Bio-Rad (Hercules, CA). The SDS-PAGE setup was from Bio-Rad, and overnight incubation was done using an Isotemp water bath (Model 215, Fisher Scientific). For protein purification, Slide-A-Lyzer dialysis cassettes (MWCO of 7 kDa, volume range 0.5-3 mL) and syringe filters (0.2 μm) were purchased from Thermo Fisher. Sep-Pak C18 cartridges from Waters (Milford, MA) were used for solid-phase extraction (SPE). A purified human lambda light chain protein sample was purchased from Bio-Rad.

For MALDI-MS, a Bruker Daltonics UltrafleXtreme MALDI-TOF/TOF (equipped with a pulsed 355 nm Nd:YAG laser) was used to analyze samples. A Waters Synapt high definition mass spectrometer (HDMS) equipped with a nano-ESI source was used for direct ESI-MS analyses. An HPLC system (Shimadzu) coupled to an Orbitrap Fusion (Thermo Fisher) was used to perform HPLC-ESI-MS/MS. The programs flexControl (version 3.4, Bruker Daltonics), MassLynx (version 4.1, Waters) and Xcalibur (version 3.0, Thermo) were used for data acquisition. MS data analysis was done with flexAnalysis (version 3.4, Bruker Daltonics), BioTools (version 3.2, Bruker Daltonics) and Intact Mass software (Protein Metrics). For the protein sequence analysis, Byonic (Protein Metrics), Byologic (Protein Metrics) and PEAKS Studio (version 8.5, Bioinformatics Solutions Inc.) were used. Ion mobility data were visualized and processed using the Driftscope module (version 2.0, Waters).

DLS experiments were conducted with a dynamic light scattering instrument (Litesizer model 500, Anton Paar), and CD spectroscopy measurements were obtained with an AVIV circular dichroism spectrometer (model 62DS).

2.2 Methodology

2.2.1 Sample preparation

The immunoglobulin light chain protein samples were obtained from our collaborators Dr. Leif Hanson and Dr. Allen Edmundson. These proteins were isolated using an established method.20,63 Briefly, the urine samples were obtained from patients with multiple myeloma and/or AL amyloidosis. The proteins in each urine sample were precipitated by adding solid ammonium sulfate to 75% saturation. The precipitate was recovered by centrifugation (5000 RPM, 20 minutes).20,63 The pellet was washed two or three times with 75% ammonium sulfate. The resulting ammonium sulfate-protein paste was stored at -10 °C.20,63

In our study, these immunoglobulin light chains were used for sequence and structural analyses by MS, DLS and CD. MALDI-MS and ESI-MS were performed for molecular mass determination of light chain proteins. To obtain sequence information, in-gel enzymatically digested samples were subjected to MALDI-MS, ESI-MS/MS and HPLC-ESI-MS/MS. Moreover, DLS and CD experiments were performed to obtain structural information about light chains.

2.2.2 Separation of proteins by SDS-PAGE

For SDS-PAGE, 50 µL of protein sample was combined with 2.5 µL of 2-mercaptoethanol and 47.5 µL of Laemmli sample buffer. This mixture was heated to 90 °C for five minutes to denature the proteins. After the sample had cooled to room temperature, 20 µL of sample was loaded into the wells. The gel was run at 180 V for about 45 minutes. The proteins were visualized by staining with Coomassie Blue. This process involved rinsing the gel with deionized water, incubating the gel with stain for one hour and destaining with water. Finally, the stained gel was scanned using a desktop scanner at its maximum resolution.

2.2.3 Molecular mass determination of immunoglobulin light chain proteins and analysis of their ions in the gas phase

For MALDI analysis, 1 µL of each protein sample was mixed with 1 µL of 10 mg/mL DHB in acetonitrile:water (50:50, v/v) containing 0.1% TFA and spotted on the MALDI plate. Samples were analyzed in linear positive ion mode in the m/z range of 10000-80000. For ESI-MS, protein samples were purified by dialysis. Each purified sample was collected and dried using a vacuum centrifuge (Eppendorf). The dried sample was re-dissolved in acetonitrile:water (50:50, v/v) containing 0.1% formic acid. The nano-ESI ion source was used with a capillary voltage of 3 kV and a sample flow rate of 500 nL/min. Mass spectra were obtained in the m/z range of 500-4000. MassLynx was used for data acquisition, and Intact Mass was used to obtain the deconvoluted mass spectrum of each protein.
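The principle behind charge state deconvolution can be illustrated with a minimal sketch. The Intact Mass software uses its own algorithm, which is not described here. The code below only shows the textbook relationship between two adjacent peaks of a charge state series, assuming protonated ions of the form [M+zH]z+. The function names are hypothetical, and the two m/z values are synthetic: they are generated from the 45610.3 Da mass reported for Mcg in section 3.1.2 rather than read from a measured spectrum.

```python
PROTON = 1.007276  # mass of a proton, Da


def mz_for_charge(mass, z):
    """m/z of the [M+zH]z+ ion of a molecule with neutral mass `mass`."""
    return (mass + z * PROTON) / z


def deconvolve_adjacent(mz_low, mz_high):
    """Charge and neutral mass from two adjacent peaks of one charge series.

    mz_low carries one more charge than mz_high. Returns (z, mass), where z
    is the charge of the mz_high peak.
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON)


# Synthetic adjacent peaks for the 20+ and 21+ ions of a 45610.3 Da protein.
peak_20 = mz_for_charge(45610.3, 20)
peak_21 = mz_for_charge(45610.3, 21)
charge, mass = deconvolve_adjacent(peak_21, peak_20)
print(charge, round(mass, 1))
```

In practice, every pair of adjacent peaks in the envelope gives an independent estimate of the neutral mass, and the estimates are averaged.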

Light chain protein samples were further analyzed by IMS-MS to study their properties in the gas phase. Samples were dissolved in acetonitrile:water (50:50, v/v) containing 0.1% formic acid and introduced into the nano-ESI source by direct infusion. The Synapt HDMS instrument was operated in ESI positive ion mode with a capillary voltage of 1.0 kV, a cone voltage of 30 V and a source temperature of 80 °C. The ion mobility cell containing nitrogen was operated at an indicated pressure of 0.55 mbar with a sample flow rate of 500 nL/min. Data analysis was performed using MassLynx (version 4.1, Waters). Ion mobility data were visualized and processed using the Driftscope module (version 2.0, Waters).

2.2.4 In-gel enzymatic digestion of immunoglobulin light chain proteins

The observed protein bands related to light chains (~23 kDa) were subjected to in-gel digestion as described by Shevchenko et al.70 Briefly, the bands were excised from the gel, cut into ~1 mm3 cubes and placed into microcentrifuge tubes. The gel pieces were destained using neat acetonitrile for 10 minutes. When the gel pieces became opaque and stuck together, all the solvent was removed. Then, 50 μL of 10 mM DTT in 100 mM ammonium bicarbonate was added to completely cover the gel pieces, and the tubes were incubated at 56 °C for 30 minutes in a heat block. Samples were then cooled to room temperature, acetonitrile (500 μL) was added, and the samples were incubated for 10 minutes. Next, all the solvent was removed and 50 μL of 55 mM IAM in 100 mM ammonium bicarbonate was added. Sample tubes were incubated for 20 minutes at room temperature in the dark. After 20 minutes, acetonitrile (500 μL) was added and samples were incubated for another 10 minutes. At this point, all the liquid was removed from the tubes. For the enzymatic digestion, 50 μL of 13 ng/μL trypsin in 10 mM ammonium bicarbonate was added to completely cover the gel pieces. The samples were then placed in the refrigerator to allow the trypsin to saturate the gel pieces. After two hours, 100 mM ammonium bicarbonate (15 μL) was added, and the tubes were transferred to a water bath for overnight (~16 hours) incubation at 37 °C. The same procedure was repeated for two other enzymes: chymotrypsin and glu-C.


2.2.5 Identification of peptides in Mcg and Sea proteins by MALDI-MS, ESI-MS/MS and IMS-MS

For MALDI-MS, 1 μL of each protein digest was co-spotted with saturated CHCA matrix in acetonitrile:water (70:30, v/v) containing 0.1% TFA onto an MTP 384 ground steel target plate. MALDI spectra were obtained in reflectron positive ion mode in the m/z range of 600-5000. flexAnalysis (version 3.4, Bruker Daltonics) and BioTools (version 3.2, Bruker Daltonics) were used for data analysis. A MASCOT (Matrix Science) search against the Swiss-Prot database was performed to identify possible peptide hits related to immunoglobulins. The taxonomy was set to Homo sapiens, with one or two missed cleavages allowed and a mass tolerance of 0.2 Da. Carbamidomethylation of cysteine was set as the fixed modification and oxidation of methionine as the variable modification.

For ESI-MS, 50 μL of each protein digest was mixed with 50 μL of acetonitrile:water (50:50, v/v) containing 0.1% FA. The sample was injected into the nano-ESI source with a capillary voltage of 3 kV at a flow rate of 500 nL/min. The ESI-MS spectrum was acquired in the m/z range of 100-2000. MassLynx (version 4.1, Waters) was used for data analysis. The experimental data were compared with the theoretical peptides obtained by in-silico digestion. To obtain the in-silico digestion results, an MS-Digest search was performed for the reported Mcg and Sea protein sequences using the Protein Prospector proteomics tool (http://prospector.ucsf.edu/prospector/mshome.htm). Furthermore, ESI-MS/MS was performed to confirm the order of amino acids in each peptide. One precursor ion was selected at a time and fragmented at the specified collision energy. The resulting MS/MS spectra were analyzed manually to assign the possible amino acid residues.
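The idea of an in-silico tryptic digest can be sketched in a few lines. This is not the MS-Digest implementation. It is a simplified illustration that assumes the common trypsin rule (cleavage C-terminal to K or R, but not before P), no missed cleavages, monoisotopic residue masses and carbamidomethylated cysteine. The function names are hypothetical. The test fragment is assembled from two peptides listed in Table 3.2 (residues 154-170 of Mcg).

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
CARBAMIDOMETHYL = 57.02146  # assumed fixed modification on Cys (IAM alkylation)


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    mass += CARBAMIDOMETHYL * peptide.count("C")
    return mass + PROTON


fragment = "ADGSPVKAGVETTKPSK"
for pep in tryptic_digest(fragment):
    print(pep, round(mh_plus(pep), 4))
```

The lysine followed by proline in AGVETTKPSK is not cleaved, which is why this peptide appears intact in the digest. The theoretical peptide list produced this way is then compared with the measured m/z values.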

For IMS-MS, 50 μL of each protein digest was mixed with 50 μL of acetonitrile:water (50:50, v/v) containing 0.1% FA and introduced into the nano-ESI source by direct infusion. The instrument was operated in ESI positive ion mode with a capillary voltage of 1.0 kV, a cone voltage of 30 V and a source temperature of 80 °C. The ion mobility cell containing nitrogen was operated at an indicated pressure of 0.55 mbar with a sample flow rate of 500 nL/min. Data analysis was performed using MassLynx (version 4.1, Waters). Ion mobility data were visualized and processed using the Driftscope module (version 2.0, Waters).

2.2.6 Preparation of peptides for HPLC-MS analysis

For HPLC-MS, enzyme digested protein samples were prepared as described in section 2.2.4. Then, the peptide extraction was done by following the protocol described by Shevchenko et al.70 Briefly, 100 μL of 5% FA:acetonitrile (1:2, v/v) was added to each microcentrifuge tube and incubated for 15 minutes in water bath at 37 °C. The supernatant was collected into an Eppendorf tube and dried in the vacuum centrifuge. The dried extract was desalted using SPE protocol. To perform SPE, the sample was re-dissolved in ~1 mL of 0.1% FA.

The SPE protocol has five main steps: conditioning, equilibration, sample loading, desalting and elution. Initially, the SPE cartridge was conditioned by passing 2 mL of methanol:water (90:10, v/v) containing 0.1% FA through it. Then, it was equilibrated by adding 2 mL of 0.1% FA. At this point, the sample was loaded slowly onto the cartridge packing. Next, the salts and impurities were removed by passing 1 mL of 0.1% FA through the cartridge. Finally, the sample was eluted using 1 mL of acetonitrile:water (50:50, v/v) containing 0.1% FA. The eluted sample was collected and dried in the vacuum centrifuge. The dried extract was re-dissolved in 200 μL of acetonitrile:water (50:50, v/v) containing 0.1% FA and subjected to HPLC-ESI-MS/MS.

2.2.7 Separation and identification of proteins and peptides by HPLC-ESI-MS/MS

Each sample obtained as described in section 2.2.6 was subjected to HPLC-ESI-MS/MS using HPLC coupled to the ESI-Orbitrap Fusion. The peptides were separated using an XBridge C8 column (3 mm x 100 mm, 3.5 μm particles, 130 Å pore size) on the HPLC system. The flow rate was set to 0.2 mL/min. Mobile phase A was 0.1% FA in water, and mobile phase B was 0.1% FA in acetonitrile. A multistep gradient was used: a linear increase of B from 5% to 40% in 28 min, a linear ramp from 40% to 60% in 2 min, and a decrease of B from 60% to 5% in 2 min, followed by re-equilibration.

The eluting peptides were analyzed by the ESI-Orbitrap Fusion. Full MS scans were acquired in positive ion mode with an Orbitrap resolution of 120 000 and a scan range of m/z 300-3000. The spray voltage was 2400 V and the ion transfer tube temperature was set to 300 °C. MS/MS spectra were acquired in data-dependent mode using CID fragmentation in the ion trap. The parameters for MS/MS selection were a charge state of 1-4, an ion intensity threshold of 5.0e3 and a mass tolerance of 10 ppm.

The MS/MS spectra were analyzed by SEQUEST HT using Proteome Discoverer (version 1.4, Thermo). The taxonomy was set to Homo sapiens with two allowed missed cleavages. Carbamidomethylation of cysteine was set as the fixed modification and methionine oxidation as the variable modification. The mass tolerance for the parent ion was set to 10 ppm, while the fragment ion mass tolerance was 0.6 Da. The search results were used to confirm the presence of the expected proteins in our samples.
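A relative tolerance in ppm and an absolute tolerance in Da scale differently with m/z. The following minimal sketch shows how a 10 ppm precursor tolerance translates into an acceptance test. The function names and the two "measured" values are hypothetical and are used only for illustration. The theoretical value is the [M+H]+ of the ADGSPVK peptide listed in Table 3.2.

```python
def ppm_error(measured, theoretical):
    """Signed relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def within_tolerance(measured, theoretical, tol_ppm=10.0):
    """True if the measured m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(measured, theoretical)) <= tol_ppm


theoretical_mz = 673.3515
# Hypothetical measured values, not data from this work.
for measured_mz in (673.3550, 673.3600):
    print(measured_mz, round(ppm_error(measured_mz, theoretical_mz), 1),
          within_tolerance(measured_mz, theoretical_mz))
```

At m/z 673, a 10 ppm window corresponds to less than 0.01 Da, which is far narrower than the 0.6 Da window applied to the ion trap fragment ions.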

Initially, the HPLC-ESI-MS/MS method optimization was performed using myoglobin tryptic digest. Then, the same procedure was repeated for enzyme-digested light chain proteins.

2.2.8 Comparison between reported (Mcg, Sea) and unreported (Black, May, Moz, Tew and Jen) light chain protein sequences

To compare the sequences of the reported light chain proteins with those of the proteins of interest in this project (Black, May, Moz, Tew and Jen), the ESI-MS/MS spectra obtained from CID fragmentation were submitted to the Byonic and Byologic tools from Protein Metrics. In Byonic’s control window, two main inputs were provided: the spectrum data file in a standard format (e.g. Thermo raw data file) and a protein database in FASTA format. Byonic performed peptide and protein identification on the HPLC-ESI-MS/MS raw data files and created a result folder, which was then loaded into the Byologic tool. The final fragment map coverage provided information related to the protein sequence. This search was performed against 91 reported light chain sequences from AL and MM patients provided by AL-Base, a visual platform tool to study immunoglobulin light chain protein sequences.57

2.2.9 De novo sequencing assisted database search for peptide identification of unreported light chain protein sequences (Black, May, Moz, Tew, Jen)

The ESI-MS/MS data obtained as described in section 2.2.7 were used for a de novo sequencing-assisted database search using PEAKS software.6 The performance of PEAKS in de novo sequencing has been evaluated previously, and it has been reported that more than 80% of the amino acids in a protein sequence can be determined correctly by this software.6

To perform the PEAKS search, raw MS/MS data were imported into PEAKS Studio 8.5 for data refinement to improve the overall quality of the data. Four search options were available after the pre-processing: DENOVO, PEAKS DB, PEAKS PTM and SPIDER. The DENOVO search was performed by de novo sequencing with precursor and fragment mass error tolerances of 10 ppm and 0.5 Da, respectively. Carbamidomethylation of cysteine was set as the fixed modification and oxidation of methionine as the variable modification. The low-quality de novo sequences were filtered out using a 70% threshold on the average local confidence (ALC) score.

PEAKS DB was used to identify peptide spectrum matches (PSMs) from the database search. The data sets were searched against the Swiss-Prot database with a precursor mass tolerance of 10 ppm, a fragment ion mass tolerance of 0.5 Da and a maximum of three missed cleavages per peptide. Carbamidomethylation of cysteine was selected as the fixed modification and oxidation of methionine as the variable modification.

PEAKS PTM was used to analyze spectra with good de novo sequences that were not identified by PEAKS DB. A threshold on de novo ALC score was specified as 70%.

Finally, the SPIDER search was performed to carry out homology searches on spectra with de novo sequences that were not identified by either PEAKS DB or PEAKS PTM. The resulting peptides were assembled to generate the full sequence of each light chain by homology to the already reported light chain sequences.

2.2.10 Differentiation between Leu and Ile by ETD-HCD-MS3 analysis

This procedure was performed as a multistep MS acquisition using HPLC coupled to the ESI-Orbitrap Fusion. Precursor ions for MS2 (ETD fragmentation) and z-ions for MS3 (HCD fragmentation) were entered manually at the beginning of each experiment depending on the size and the sequence of the peptide.

The method was optimized using neurotensin (8-13) peptide (amino acid sequence RRPYIL, [M+2H]2+ at m/z 409.2). A solution of neurotensin (8-13) peptide (100 μg/L in methanol:water (50:50) containing 0.1% FA) was prepared and desalted using the SPE protocol described in section 2.2.6. This sample was subjected to HPLC separation using an XBridge C8 column (3 mm x 100 mm, 3.5 μm particles, 130 Å pore size). The flow rate was set to 0.2 mL/min. Mobile phase A was 0.1% FA in water, and mobile phase B was 0.1% FA in acetonitrile. A multistep gradient was used: a linear increase of B from 5% to 40% in 28 min, a linear ramp from 40% to 60% in 2 min, and a decrease of B from 60% to 5% in 2 min, followed by re-equilibration. The eluted peptide was subjected to multistep MS acquisition by the ESI-Orbitrap Fusion. A full MS scan was acquired in positive ion mode with an Orbitrap resolution of 120 000 and a scan range of m/z 300-3000. The spray voltage was 2400 V and the ion transfer tube temperature was set to 300 °C. The MS/MS spectrum was acquired in data-dependent mode using CID fragmentation in the ion trap. The parameters for MS/MS selection were a charge state of 1-4, an ion intensity threshold of 5.0e3 and a mass tolerance of 10 ppm.

As the next step, SIM scans were acquired in positive ion mode with an Orbitrap resolution of 120 000. The targeted masses were then subjected to ETD fragmentation with an ETD reaction time of 100 ms and 40% supplemental activation collision energy. Then, the expected z-ions were selected for HCD-MS3 fragmentation with an HCD collision energy of 30%. Parameters such as the ETD reaction time, supplemental activation collision energy and HCD collision energy were optimized to obtain high-quality spectra.

This optimized method was used to unambiguously distinguish Leu from Ile residues in light chain peptides. Peptides containing Leu/Ile were isolated by the quadrupole using selected ion monitoring (SIM). Six individual peptides were isolated from the Mcg light chain sample and subjected to ETD-HCD-MS3 analysis as described above.
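The general principle usually cited for this type of experiment is that HCD of a radical z-ion whose N-terminal residue is Leu or Ile produces a w-ion by side-chain loss: an isopropyl radical (C3H7, 43.0548 Da) for Leu and an ethyl radical (C2H5, 29.0391 Da) for Ile. The sketch below shows how such diagnostic losses could be checked in an MS3 peak list. It is a simplified illustration and is not the data analysis procedure used in this work. The function name, the tolerance and the example peak values are hypothetical.

```python
LOSS_LEU = 43.05478  # C3H7 radical lost from a Leu side chain (Da)
LOSS_ILE = 29.03913  # C2H5 radical lost from an Ile side chain (Da)


def assign_leu_ile(z_ion_mz, ms3_peaks, charge=1, tol=0.02):
    """Assign Leu or Ile from the diagnostic side-chain loss of a z-ion.

    z_ion_mz  : m/z of the isolated z-ion (MS3 precursor)
    ms3_peaks : list of m/z values observed in the HCD-MS3 spectrum
    charge    : charge state of the z-ion (the mass loss is divided by it)
    tol       : assumed m/z tolerance for matching a peak
    """
    def seen(loss):
        target = z_ion_mz - loss / charge
        return any(abs(peak - target) <= tol for peak in ms3_peaks)

    leu, ile = seen(LOSS_LEU), seen(LOSS_ILE)
    if leu and not ile:
        return "Leu"
    if ile and not leu:
        return "Ile"
    return "undetermined"


# Hypothetical peak lists for a singly charged z-ion at m/z 500.30.
print(assign_leu_ile(500.30, [300.10, 457.245]))
print(assign_leu_ile(500.30, [300.10, 471.261]))
```

Returning "undetermined" when neither or both losses are present keeps ambiguous spectra from being forced into an assignment.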


2.2.11 Comparison of secondary structure of light chain proteins by CD spectroscopy

For CD measurements, each protein sample was prepared as a 0.1 mg/mL solution in 10 mM phosphate-buffered saline (PBS). Each sample was filtered using a 0.2 μm syringe filter prior to CD analysis. CD spectra were collected on the AVIV spectrometer at 25 °C in a 1 mm quartz cuvette. Each CD spectrum represents the average of 2 accumulated scans. Data were collected between 190 nm and 260 nm in steps of 0.5 nm with an averaging time of 1.5 seconds. The wait time between scans was 0.3 seconds. The raw CD data were analyzed by CAPITO, a web server-based CD analysis and plotting tool (http://capito.nmr.leibniz-fli.de/).

2.2.12 Detection of light chain aggregation by DLS

For temperature-dependent DLS experiments, each protein sample was prepared as a 0.5 mg/mL solution in 10 mM PBS buffer. This solution was filtered using a 0.2 μm nylon membrane syringe filter and the protein sample was filtered using a 0.2 μm surfactant-free cellulose acetate syringe filter. A series of three measurements was recorded at 25 °C in back-scattering mode at 175°. The solvent used was 154 mM NaCl with a refractive index of 1.3318. The equilibration time was set between 5 and 7 minutes depending on the experiment. The number of runs, the time for each run and the analysis model were set automatically by the instrument for all measurements. To investigate temperature-dependent protein aggregation, the variation of the hydrodynamic radius of the light chains with temperature was measured. The temperature was increased from 25 °C to 60 °C in steps of 5 °C. All measurements were recorded after a 10-minute equilibration time.

For concentration dependent DLS experiments, a series of Black light chain samples were prepared ranging from 0.25 mg/mL to 6 mg/mL. The sample preparation method and the instrumental parameters were set similar to the temperature dependent experiment described above. All the measurements were recorded at 37 °C with 5-minute equilibration time.


Chapter 3

3 Results and Discussion

3.1 Determination of molecular masses of light chain proteins and the properties of their ions in the gas phase

3.1.1 Separation of proteins by SDS-PAGE

The purity of the light chain protein samples was checked using SDS-PAGE as shown in Figure 3-1. The Mcg, Sea and Black samples each showed one band close to the ~25 kDa molecular weight (MW) marker, indicating that light chain dimers were dissociated into monomers due to cleavage of disulfide bonds. The other four proteins showed several bands in addition to the band at ~25 kDa, suggesting that they are less pure than Mcg, Sea and Black. Depending on the type of experiment, those protein samples were further purified by dialysis or by using MWCO filters. For HPLC-ESI-MS/MS experiments, samples were run on a gel and the observed protein bands related to light chains were subjected to in-gel digestion.


Figure 3-1 Images of Coomassie blue-stained gels corresponding to SDS-PAGE analyses of light chain protein samples.

3.1.2 Determination of molecular masses of light chain proteins (Mcg, Sea, Black, May, Moz, Nii, Tew, Jen) by MALDI-MS and ESI-MS

The reported predominant forms of light chain proteins are lambda and kappa, having masses of ~46 kDa and ~23 kDa, respectively.13,56 We used MALDI-MS and ESI-MS to measure the molecular masses of light chains, as both methods can provide molecular mass information for large proteins. This was performed using the protocol described in section 2.2.3. Figures A-1 and A-2 in Appendix A show the MALDI spectra obtained for the Mcg and Sea light chain samples, respectively. These mass spectra showed peaks at ~46 kDa, which were assigned to singly charged dimers of the Mcg and Sea light chains. The peaks at 69 kDa were tentatively assigned to a singly charged trimer, which may be formed during the MALDI process. The additional peaks at 22, 15 and 11 kDa were assigned to doubly, triply and quadruply charged dimers, respectively. Generally, MALDI-MS produces singly charged ions, but multiply charged ions can be observed for large molecules such as proteins.
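The assignment of the lower m/z peaks to multiply charged dimers can be checked with simple arithmetic, since the m/z of an [M+zH]z+ ion is (M + z × 1.007276)/z. The following is a minimal sketch of that check. The function name is hypothetical, and the dimer mass is an assumption borrowed from the ESI result for Mcg reported later in this section (45610.3 Da), not a value read from the MALDI spectra.

```python
PROTON = 1.007276  # mass of a proton, Da


def expected_mz(neutral_mass, charge):
    """m/z of the [M+zH]z+ ion for a species of the given neutral mass."""
    return (neutral_mass + charge * PROTON) / charge


dimer_mass = 45610.3  # Da, assumed from the ESI-MS result for Mcg
series = {z: expected_mz(dimer_mass, z) for z in (1, 2, 3, 4)}
for z, mz in series.items():
    print(f"{z}+ dimer: m/z {mz:.1f}")
```

The 2+, 3+ and 4+ values fall near 22.8, 15.2 and 11.4 kDa, which is consistent with the approximate peak positions quoted above. A singly charged monomer would appear at nearly the same m/z as the doubly charged dimer, so linear-mode MALDI alone does not separate these two assignments.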


However, ESI-MS usually provides more accurate molecular weight information for proteins than MALDI-MS. As shown in Figure 3-2, ESI mass spectrum of Mcg light chain sample showed a distribution of multiply charged ions.

Figure 3-2 ESI-MS spectrum of Mcg light chain protein. The multiply charged ion corresponding to each peak is labeled in the spectrum. The deconvoluted mass spectrum is shown in the inset, indicating that the molecular weight of the protein is 45610 Da.

The deconvoluted mass of the Mcg light chain obtained from this spectrum using Protein Metrics software is 45610.3 Da. A lower intensity peak with a MW close to that of a Mcg monomer (~24.7 kDa) was observed in the deconvoluted spectrum as well. Additionally, three minor species with MWs of ~45.6 kDa may be present in this sample, based on low-intensity multiply charged ion series that appear in this spectrum (Figure 3-2). The molecular masses of the other light chain proteins were obtained in a similar way to identify the light chain type. The ESI mass spectra of these light chain proteins are shown in Appendix A (Figures A-3 to A-8). The predominant MWs of the light chains obtained using Protein Metrics software are shown in Table 3.1.

Table 3.1 Molecular masses of light chain proteins measured by ESI-Q-TOF-MS

Based on these results, the Black, May and Moz light chains belong to the λ type and most likely contain a similar number of amino acids to Mcg and Sea. Jen and Tew can be categorized as κ type and contain approximately half the number of amino acids present in Mcg and Sea.

3.1.3 The IMS-MS analysis of Black protein

Proteins can populate many conformational states simultaneously, and ESI-MS shows overlapping charge state distributions, which make it difficult to identify the number of conformational states. IMS-MS resolves ions having identical masses but different collisional cross-sections and charge states. Therefore, the conformational properties of proteins can be observed when ESI is coupled to IMS-MS.

Figure 3-3 IMS-MS analysis of Black protein. (A) ESI-IMS-MS Driftscope plot showing m/z (x axis) versus drift time (y axis). (B) The full scan m/z spectrum. The charge state of each ion is shown in each plot.

As can be seen in Figure 3-3, the multiple charge states of Black protein can be clearly differentiated and detected by IMS. Ions exhibit different drift times depending on the collisional cross-sections and the charge states of the protein. However, further experiments are needed to identify the conformations of Black protein under different conditions.


3.2 The identification of peptides of Mcg and Sea proteins by MALDI-MS, ESI-MS/MS and IMS-MS

The enzyme-digested Mcg and Sea protein samples were analyzed by MALDI-MS as described in section 2.2.5 to elucidate their amino acid sequences. The experimental data obtained for the Mcg digest were compared with the reported Mcg sequence. Ten singly charged peptide ions were identified by MALDI-MS (Figure A-9), while both singly and doubly charged ions corresponding to eight peptides were observed by ESI-MS (Figure A-12). The results are summarized in Table 3.2. The calculated sequence coverage for Mcg using trypsin by both methods was 52%.

To increase the sequence coverage, the same peptide digestion approach was extended using two additional enzymes, chymotrypsin and glu-C. Several new peptides were obtained by this approach (Figures A-10 and A-11) and the total sequence coverage was increased to 78%. The same set of experiments was performed for Sea sample and the total sequence coverage was improved to 81%. Therefore, the preliminary results suggested that this workflow can be used to obtain ≥78% sequence coverage for Mcg and Sea.
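Sequence coverage is the percentage of residues in the protein that are covered by at least one identified peptide, so overlapping peptides must not be counted twice. The sketch below shows one straightforward way to compute it from peptide start and end positions. The function name, the protein length and the peptide ranges are hypothetical and do not correspond to the Mcg or Sea data.

```python
def sequence_coverage(peptide_ranges, protein_length):
    """Percent of residues covered by at least one peptide.

    peptide_ranges : iterable of (start, end) positions, 1-based and inclusive
    protein_length : total number of residues in the protein
    """
    covered = set()
    for start, end in peptide_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


# Hypothetical example: a 100-residue protein and three peptides,
# two of which overlap (residues 5-10 are shared).
example_ranges = [(1, 10), (5, 20), (50, 59)]
print(sequence_coverage(example_ranges, 100))
```

Pooling the peptide ranges from the trypsin, chymotrypsin and glu-C digests into one list before the calculation gives the combined coverage.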

The same set of experimental procedures was combined with a MASCOT search to identify the possible peptides in Black, May, Moz, Tew and Jen. While the sequences of those light chains have not been reported, they were identified by MASCOT as constant and variable regions of Ig light chains using the Swiss-Prot database. The summary of peptide sequence coverage for both the C and V regions is shown in Table 3.3.


Table 3.2 A list of peptides observed by MALDI-MS and ESI-MS of the tryptic digest of Mcg light chain protein in positive ion mode.

Ionization  m/z          m/z        Charge  Sequence  Peptide sequence
method      Theoretical  Measured   state   position
MALDI       673.3515     673.444    1       154-160   ADGSPVK
MALDI       841.5142     841.548    1       107-114   VTVLGQPK
MALDI       865.3608     863.53     1       209-216   TVAPTECS
MALDI       883.4744     883.536    1       56-63     RPSGVPDR
MALDI       977.5666     977.61     1       48-55     VIIYEVNK
MALDI       1389.7233    1389.832   1       56-68     RPSGVPDRFSGSK
MALDI       1712.7432    1711.858   1       194-208   SYSCQVTHEGSTVEK
MALDI       1842.2032    1842.09    1       48-63     VIIYEVNKRPSGVPDR
MALDI       2043.0393    2043.106   1       115-133   ANPTVTLFPPSSEELQANK
MALDI       2212.1359    2211.2     1       134-153   ATLVCLISDFYPGAVTVAWK
ESI         673.3515     673.3257   1       154-160   ADGSPVK
ESI         841.5142     841.4818   1       107-114   VTVLGQPK
ESI         864.3768     864.3018   1       209-216   TVAPTECS
ESI         883.4744     883.433    1       56-63     RPSGVPDR
ESI         977.5666     977.51     1       48-55     VIIYEVNK
ESI         1017.5575    1017.4686  1       161-170   AGVETTKPSK
ESI         673.3515     337.1805   2       154-160   ADGSPVK
ESI         841.5142     421.2458   2       107-114   VTVLGQPK
ESI         864.3768     442.2573   2       209-216   TVAPTECS
ESI         883.4744     489.2573   2       56-63     RPSGVPDR
ESI         977.5666     509.262    2       48-55     VIIYEVNK
ESI         1610.7115    856.3302   2       161-175   AGVETTKPSKQSNNK
ESI         1743.8588    872.3648   2       176-190   YAASSYLSLTPEQWK
ESI         1021.5196    1022.4504  2       115-133   ANPTVTLFPPSSEELQANK


Table 3.3 Peptide sequences of Black, May, Moz, Jen and Tew obtained by MASCOT

In addition, ESI-MS/MS was used to confirm the order of amino acids in tryptic peptides originating from the Mcg and Sea light chains. In this method, a precursor ion was selected and fragmented to obtain fragment ions. For instance, the ESI-MS/MS spectrum obtained for the ADGSPVK tryptic peptide of Mcg (m/z = 673.3, z = +1) is shown in Figure 3-4. The b- and y-ions were assigned to the observed fragments in the MS/MS spectrum based on the Biemann nomenclature for the fragmentation of peptides.43 The amino acid sequence was confirmed based on the m/z values of the observed ions. The ESI-MS/MS spectra of a few tryptic peptides of Mcg and Sea are shown in Figures A-14 to A-16 in Appendix A.


Figure 3-4 ESI-MS/MS spectrum obtained for ADGSPVK sequence in Mcg.

To further separate the peptides in the gas phase, IMS was combined with ESI-MS. The observed ESI-IMS spectrum for the tryptic digest of the Mcg sample is shown in Figure 3-5. The solid lines indicate the positions of the [M+H]+, [M+2H]2+ and [M+3H]3+ charge state families. Several triply charged ions were identified by this experiment, such as m/z = 769.3, m/z = 681.8 and m/z = 326.3. Those ions were not identified by ESI-MS alone because of the background noise. Therefore, IMS is also a feasible technique to separate and identify peptides obtained by enzymatic digestion of light chain proteins.


Figure 3-5 ESI-IMS-MS Driftscope plot for tryptic digest of Mcg. The solid lines indicate the position of [M+H]+, [M+2H]2+ and [M+3H]3+ charge state families. The identified triply charged ions are labeled as m/z = 326.3, 682.8 and 769.3.

The above-mentioned approaches are feasible only with known protein sequences, and the manual interpretation of MS/MS spectra is relatively time consuming. Therefore, an HPLC-ESI-MS/MS approach was used to analyze complex light chain peptide mixtures, and data interpretation was performed with the help of several search tools.

3.3 The separation and identification of peptides in enzyme-digested protein samples by HPLC-ESI-MS/MS

In this approach, peptides were separated by reversed-phase HPLC. Different peptides eluted at different times, and full scan mass spectra of the eluted peptides were obtained by ESI-Orbitrap-MS. Then, each precursor ion was subjected to CID fragmentation to obtain MS/MS spectra. The MS/MS spectra were analyzed with the help of the SEQUEST, Byologic and PEAKS search tools.

The parameters of the HPLC-ESI-MS/MS method were initially adjusted using tryptically digested myoglobin. Sample preparation was performed as described in section 2.2.6. Different gradients were tested to obtain an optimal separation of peptides, and many of the peaks were relatively well resolved with the 40-min gradient described in section 2.2.7. Figure 3-6 shows the base peak chromatogram of the HPLC-ESI-MS separation of the trypsin-digested myoglobin sample. The peptide identification was performed using the SEQUEST, Byologic and PEAKS search tools, and sequence coverages of 88.96%, 99.35% and 95% were calculated, respectively. For myoglobin, combining the results obtained from these search tools provides useful information about the protein sequence.

Therefore, the above optimized HPLC-ESI-MS/MS method was used to obtain MS/MS data for enzyme-digested light chain protein samples. The base peak chromatograms showing the HPLC-ESI-MS separation of trypsin-digested light chain samples are shown in Figures A-17 to A-25 in Appendix A. The ESI-MS/MS raw data files were then introduced to SEQUEST, Byologic and PEAKS search tools to obtain the sequence information.


Figure 3-6 Base peak chromatogram of the separation of myoglobin tryptic peptides by HPLC-ESI-MS/MS.

3.4 Amino acid sequence assembly of unreported immunoglobulin light chain proteins (Black, May, Moz, Tew and Jen)

The HPLC-ESI-MS/MS raw data obtained for three different enzymatic digestions were imported into PEAKS Studio 8.5. After data refinement, three lists of peptides were generated for each data set. PEAKS DB was then used to identify peptide matches with peptides related to immunoglobulin light chains. Based on the database search and de novo sequencing results, the second set of peptides was obtained by PEAKS PTM. Next, a PEAKS SPIDER homology search was performed on spectra with de novo sequences that were not identified by either PEAKS DB or PEAKS PTM. The accuracy of the target sequences was supported by validation against three criteria: (1) the false discovery rate (FDR) at the peptide-spectrum match (PSM) level was kept below 0.1%; (2) PSMs that relate to contaminant proteins were filtered out; (3) average local confidence (ALC) scores of PSMs were set to 70%.6,36
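The three validation criteria above can be sketched as a simple filter over a list of PSMs. This is only an illustration under stated assumptions: the `PSM` record, its field names, the order in which the criteria are applied, and the decoy-counting FDR estimate are all hypothetical stand-ins, since PEAKS computes its FDR internally and its procedure is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float          # search score, higher is better (hypothetical field)
    alc: float            # average local confidence, in percent
    accession: str
    is_decoy: bool = False


def filter_psms(psms, max_fdr=0.001, min_alc=70.0, contaminants=frozenset()):
    """Apply ALC, contaminant and FDR filters; return accepted target PSMs."""
    # Criteria (2) and (3): drop contaminant hits and low-ALC PSMs.
    kept = [p for p in psms
            if p.alc >= min_alc and p.accession not in contaminants]
    # Criterion (1): keep the longest score-ranked prefix whose
    # estimated FDR (decoys / targets) does not exceed max_fdr.
    kept.sort(key=lambda p: p.score, reverse=True)
    best = targets = decoys = 0
    for n, p in enumerate(kept, start=1):
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = n
    return [p for p in kept[:best] if not p.is_decoy]


# Hypothetical PSMs, not data from this study.
psms = [
    PSM("pepA", 50.0, 95.0, "LIGHT_CHAIN"),
    PSM("pepB", 40.0, 65.0, "LIGHT_CHAIN"),        # fails the ALC cut-off
    PSM("pepC", 35.0, 90.0, "KERATIN"),            # contaminant
    PSM("pepD", 30.0, 85.0, "LIGHT_CHAIN"),
    PSM("decoy1", 10.0, 80.0, "DECOY", is_decoy=True),
]
accepted = filter_psms(psms, contaminants={"KERATIN"})
print([p.peptide for p in accepted])
```

With a 0.1% FDR limit, a single decoy is tolerated only once about a thousand target PSMs rank above it, which is why the decoy and everything below it are rejected in this small example.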

Under the above criteria, possible peptides for each protein were identified using all peptide lists generated for the three enzyme digestions. The three lists of peptides obtained for Black protein digestion are shown in Tables B.1 to B.3 in Appendix B. The peptide assembler option to generate the full possible sequence was not included in the trial version of PEAKS Studio 8.5. Therefore, the assembly task was done manually, using homology with Mcg, Sea and other reported light chain protein sequences. The results obtained for Black protein are outlined in Figure 3-7.

Figure 3-7 Predicted sequence of Black protein. Peptides obtained by three different enzyme digestions are shown here; tryptic peptides in blue, peptide from chymotrypsin digestion in green and peptides from glu-c digestion in red. “X” indicates the undetermined amino acids.


Peptides obtained from the three enzyme digestions are shown in three different colors. The same procedure was repeated to sequence the May, Moz, Jen and Tew proteins, and the results are shown in Figures A-26 to A-29 in Appendix A. The lists of peptides identified for each protein are summarized in Appendix B (Tables B.4 to B.15). The sequence coverages of light chains obtained by the de novo sequencing approach are summarized in Table 3.4. Although the sequences of the variable regions were not complete, sequence coverages were significantly improved in comparison to those obtained by PMF (Table 3.3). The sequence alignment for the V regions of all light chains is presented in Table 3.5, showing some of the conserved amino acid sequences in the FR1, FR2, FR3 and CDR1 regions of those proteins. A search of AL-Base57 indicates that similar amino acids are found in the FR1 regions of the human λ light chains present in this database and of the light chains analyzed in this study.

Table 3.4 A summary of protein sequence coverage by de novo assisted database search


Table 3.5 Sequence alignment of variable region in light chain proteins. The framed regions indicate similar amino acids.

3.5 Differentiation between Leu and Ile in light chain peptides having single or multiple Leu/Ile residues

PEAKS uses Leu to represent both Leu and Ile when generating de novo sequences. However, a precise experimental approach is needed to confirm the presence of Leu and/or Ile residues in newly identified peptide sequences. As reported in the literature, the suggested MS-based approach for this purpose is ETD-HCD-MS3. Therefore, this method was incorporated into our experiments to differentiate Leu and Ile residues.

The method optimization was performed using the neurotensin (8-13) peptide (RRPYIL), which contains Ile and Leu residues at positions 5 and 6, respectively. This peptide was subjected to ETD-HCD-MS3 using a multistep MS/MS approach. The total ion chromatograms obtained in each step are illustrated in Figure 3-8.


Figure 3-8 The total ion chromatograms observed for neurotensin peptide in each step of ETD-HCD-MS3 approach.

This method starts with ETD-HCD fragmentation of the doubly charged precursor ion at m/z 409.2 (Figure 3-9). The resulting MS2 spectrum contains fragment ions including the immonium ion at m/z 86, the Z1 ion at m/z 116 and the Z2 ion at m/z 229. Further isolation and subsequent HCD fragmentation of the immonium ion and the Z ions generated unique ions which allow the confident identification of each Leu and Ile residue.
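The precursor m/z quoted above can be checked from the peptide sequence. The sketch below sums standard monoisotopic residue masses, adds one water, and adds one proton per charge; the function and constant names are illustrative choices, and no modifications are considered.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # H2O added for the free N- and C-termini
PROTON = 1.00728    # mass of one proton


def peptide_mz(sequence: str, charge: int) -> float:
    """Monoisotopic m/z of an unmodified peptide carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    neutral_mass = sum(MONO[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Neurotensin (8-13), doubly protonated.
mz_rrpyil = peptide_mz("RRPYIL", 2)
print(round(mz_rrpyil, 2))  # 409.26
```

The same function gives approximately 449.25 for doubly protonated RPSGIPDR, close to the value listed for that peptide in Table B.1.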


Figure 3-9 MS and MS/MS spectra obtained from ETD-HCD-MS2 approach.

As shown in Figure 3-10, the immonium ion at m/z 86 produced a highly abundant fragment at m/z 69, which confirms the presence of Ile. On the other hand, HCD fragmentation of the Z1 ion generated the W1 ion at m/z 73.9, due to the loss of C3H7 (-43 Da) from the Leu at position 6 (Figure 3-11). In addition, HCD fragmentation of the Z2 ion produced the W2 ion corresponding to the characteristic loss of C2H5 (-29 Da), confirming that the residue at position 5 is Ile (Figure 3-11).
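The side-chain losses described above (C3H7, about 43 Da, for Leu and C2H5, about 29 Da, for Ile) can be turned into a small lookup that predicts the W ion for a given singly charged Z ion and compares it with an MS3 peak list. This is a sketch only: the function names, the 0.3 m/z tolerance and the example peak list are assumptions made for illustration, not instrument settings or data from this study.

```python
# Monoisotopic masses of the radicals lost from the Z ion (Da).
LOSS_LEU = 43.0548   # C3H7, isopropyl radical from the Leu side chain
LOSS_ILE = 29.0391   # C2H5, ethyl radical from the Ile side chain


def expected_w_ions(z_mz: float) -> dict[str, float]:
    """Expected W-ion m/z if the N-terminal residue of the Z ion is Leu or Ile."""
    return {"Leu": z_mz - LOSS_LEU, "Ile": z_mz - LOSS_ILE}


def assign_residue(z_mz: float, observed_mz: list[float], tol: float = 0.3) -> str:
    """Return 'Leu', 'Ile', 'ambiguous' or 'unassigned' for one Z ion."""
    expected = expected_w_ions(z_mz)
    hits = [residue for residue, mz in expected.items()
            if any(abs(mz - obs) <= tol for obs in observed_mz)]
    if len(hits) == 1:
        return hits[0]
    return "ambiguous" if hits else "unassigned"


# Z2 ion of RRPYIL at nominal m/z 229 (from the text); hypothetical peak list.
call = assign_residue(229.2, [86.1, 200.1, 229.2])
print(call)
```

Returning "ambiguous" when both losses are matched keeps the sketch from forcing a call when the MS3 spectrum does not discriminate between the two residues.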


Figure 3-10 MS spectrum obtained from HCD-MS3 approach.

Figure 3-11 MS spectrum obtained from HCD-MS3 approach for (A) m/z=116 ion and (B) m/z=229 ion.

The application of the HCD-MS3 method to confirm the presence of Leu vs Ile in the Black tryptic peptide (RPSGIPDRFSGSK) is shown in Appendix A (Figures A-30 and A-31). The doubly charged precursor ion of this peptide was fragmented to obtain the HCD-MS2 spectrum. The resulting immonium ion appeared at 86 Da. Further fragmentation of the 86 Da ion showed a 69 Da ion with high intensity in the HCD-MS3 spectrum. This confirms the presence of an Ile residue at position 5. Therefore, this approach is feasible for the identification of Leu and Ile residues in newly identified light chain peptides.

3.6 Comparison of secondary structures of light chain proteins by CD spectroscopy

The purpose of this experiment was to evaluate the similarities and differences between the secondary structure elements of unreported light chain proteins. The CD spectrum of each protein was obtained using the method described in section 2.2.11, and each data set was analyzed with the CAPITO web server. Following the input of experimental parameters such as protein concentration, cuvette pathlength and the number of amino acids, spectral data (in millidegrees) were converted to mean residue ellipticity (Figure A-32 in Appendix A). The predicted secondary structure elements of each protein are shown in Table 3.6. Among the seven light chain proteins, six contain a high percentage of β-strands.
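The conversion from millidegrees to mean residue ellipticity uses the three experimental parameters listed above. The sketch below applies the commonly used relation [θ] = θ(mdeg) / (10 × c × l × n), with c in mol/L, l in cm and n the number of peptide bonds; whether CAPITO divides by the number of residues or by the number of peptide bonds is not stated in this text, so the use of n = N - 1 is an assumption, as are the example values.

```python
def mean_residue_ellipticity(theta_mdeg: float, conc_molar: float,
                             path_cm: float, n_residues: int) -> float:
    """Convert observed ellipticity (mdeg) to mean residue ellipticity.

    The result is in deg cm^2 dmol^-1 per residue. The number of peptide
    bonds (n_residues - 1) is used as the divisor, which is an assumption.
    """
    if conc_molar <= 0 or path_cm <= 0 or n_residues < 2:
        raise ValueError("concentration, pathlength and residue count must be positive")
    n_bonds = n_residues - 1
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_bonds)


# Hypothetical values: -10 mdeg, 10 uM protein, 0.1 cm cuvette, 215 residues.
mre = mean_residue_ellipticity(-10.0, 1.0e-5, 0.1, 215)
print(round(mre, 1))
```

Normalizing by concentration, pathlength and chain length is what allows spectra of light chains measured under slightly different conditions to be compared on one scale.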

Table 3.6 Predicted secondary structure elements for light chain proteins


When the above data are compared with X-ray diffraction data for Mcg (helix: 6% and β-strand: 47%), they are quite similar. However, for Sea, the X-ray data (helix: 6% and β-strand: 49%) are not similar to the CD data. Therefore, more experiments are needed to account for these differences.

Although there is no direct correlation between stability and the percentage of β-strands, proteins with high β-strand content are strongly associated with the formation of amyloid-like aggregates. Therefore, these light chains are likely to form aggregates. However, further experiments are needed to confirm this behavior, and some of the suggested experiments are described in section 4.2.

3.7 Detection of light chain aggregation by DLS

DLS measures the speed of particle diffusion due to Brownian motion by recording the scattered intensity over time at a fixed scattering angle. It is well suited to studying the dynamics of protein aggregation.68 Information about such aggregates can be derived from particle size distributions given as hydrodynamic radius. The primary aim of this experiment was to study the effect of parameters, such as protein concentration and sample temperature, on the aggregation behavior of light chain proteins.
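The link between the measured diffusion speed and the reported hydrodynamic radius is usually the Stokes-Einstein relation, R_h = k_B T / (6 π η D). The sketch below applies it directly; the function name and the example diffusion coefficient and viscosity are hypothetical, and the instrument software used in this work may apply additional corrections that are not described here.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_radius_nm(diff_coeff_m2_s: float, temp_k: float,
                           viscosity_pa_s: float) -> float:
    """Stokes-Einstein hydrodynamic radius (nm) from a diffusion coefficient."""
    if diff_coeff_m2_s <= 0 or temp_k <= 0 or viscosity_pa_s <= 0:
        raise ValueError("all inputs must be positive")
    radius_m = K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * diff_coeff_m2_s)
    return radius_m * 1e9


# Hypothetical example: D = 6.0e-11 m^2/s in water at 25 C (0.89 mPa s).
radius = hydrodynamic_radius_nm(6.0e-11, 298.15, 0.00089)
print(f"{radius:.2f} nm")
```

Because both temperature and viscosity enter the relation, a temperature series needs the solvent viscosity at each temperature, otherwise apparent size changes can reflect the solvent rather than the protein.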

When the concentration of light chain proteins was changed, no significant change in particle size was observed (Figure A-33 in Appendix A), which suggests that the degree of aggregation is independent of this parameter. To investigate the effect of temperature, the DLS experiments were performed as described in section 2.2.12. When measured at different temperatures, the particle size provides information about denaturation and aggregation, both related to the thermal stability of the protein.69 As can be noted in Figure 3-12, Black, a lambda light chain, is as stable as BSA and lysozyme until the critical melting temperature is reached. However, other light chains are less well-behaved. The same experiment was performed for the Mcg and Sea proteins, which are known to be amyloidogenic, and the data are shown in Figure A-34 in Appendix A. The observed particle sizes were very small, suggesting that the samples may contain some degradation products. Given this observation, it is difficult to draw conclusions about the thermal stability of Mcg and Sea. A commercially prepared standard lambda light chain was highly aggregated at all temperatures and also contained small fragments suggestive of degradation.

Future experiments have been suggested in section 4.2 to further study the aggregation behavior of Mcg, Sea and other light chain proteins.

Figure 3-12 The change in the most common scattering particle size with temperature


Chapter 4

4 Conclusion and Future Work

4.1 Conclusions

In our study, average molecular masses of light chain proteins were obtained by ESI-MS. In-gel digested light chain protein samples were subjected to HPLC-ESI-MS and MS/MS. De novo sequencing-assisted database search was used to identify the new peptide sequences of light chains. While the sequences of the variable regions of immunoglobulin light chains are still not complete, progress was made in determining many of the peptides. Amino acid sequence assembly was performed manually, using the homology between unknown light chain sequences and the reported sequences of Mcg, Sea and other Ig light chains. Using the ETD-HCD-MS3 approach, the presence of Leu or Ile residues was confidently identified in selected light chain peptides.

Furthermore, the secondary structure elements of light chains were predicted by CD spectroscopy. Most lambda type light chains showed a high percentage of β-sheets, a feature that is associated with aggregation behavior. Aggregation of light chains was monitored by DLS with respect to concentration and temperature. Preliminary results indicate that concentration had no effect on aggregation. Compared to other proteins, light chains showed a high degree of aggregation at elevated temperatures. However, further experiments are needed to confirm these observations.

4.2 Future Work

Future experiments will aim to identify the complete light chain sequences using X-ray diffraction data. The light chain proteins will be crystallized, and the electron density maps of the resultant crystal structures will be used to fill the sequence gaps not covered by LC-ESI-MS/MS. By combining tandem mass spectrometry and X-ray structures of light chain proteins, complete amino acid sequences of light chains will be determined. Additionally, these electron density maps will be used to confirm the presence of Leu vs Ile residues, which were differentiated by the ETD-HCD-MS3 approach.

To study the structural stability of the light chain proteins, the effect of pH, denaturing agents and temperature will be investigated using spectroscopic techniques. The β-sheet content at low and high pH conditions and over a range of temperatures will be obtained by CD spectroscopy. The effect of denaturing agents such as guanidinium hydrochloride will be monitored by a tryptophan fluorescence assay.

After characterizing the structural stability, DLS experiments will be performed to study the in vitro aggregation propensity of light chain proteins. The effect of biologically relevant salts (e.g. NaCl, NaH2PO4, Na2SO4) on aggregation will be investigated by this approach. Considering the sequence and structural features of light chains obtained from previous studies and from the current study, factors responsible for amyloid formation will be investigated.


References

(1) Picken, M. M. Amyloidosis—Where Are We Now and Where Are We Heading? Arch. Pathol. Lab. Med. 2010, 134, 545-551.

(2) Chiti, F.; Dobson, C. M. Protein Misfolding, Functional Amyloid, and Human Disease. Annu. Rev. Biochem. 2006, 75 (1), 333–366.

(3) Li, H.; Rahimi, F.; Sinha, S.; Maiti, P.; Bitan, G. Amyloids and Protein Aggregation – Analytical Methods. In: Meyers RA, editor. Encyclopedia in Analytical Chemistry, New York: Wiley; 2009, pp 1–32.

(4) Kyle, R. A.; Linos, A.; Beard, C. M.; Linke, R. P.; Gertz, M. A.; O'Fallon, W. M.; Kurland, L. T. Incidence and Natural History of Primary Systemic Amyloidosis in Olmsted County, Minnesota, 1950 Through 1989. Blood 1992, 79 (7), 1817–1822.

(5) Andrich, K.; Hegenbart, U.; Kimmich, C.; Kedia, N.; Bergen 3rd, H. R.; Schönland, S.; Wanker, E.; Bieschke, J. Aggregation of Full Length Immunoglobulin Light Chains from AL Amyloidosis Patients Is Remodeled by Epigallocatechin-3-Gallate. J. Biol. Chem. 2017, 292 (6), 2328-2344.

(6) Ma, B.; Zhang, K.; Hendrie, C.; Liang, C.; Li, M.; Doherty-Kirby, A.; Lajoie, G. PEAKS: Powerful Software for Peptide de Novo Sequencing by Tandem Mass Spectrometry. Rapid Commun. Mass Spectrom. 2003, 17 (20), 2337–2342.

(7) Ghahghaei, A.; Faridi, N. Review: Structure of Amyloid Fibril in Diseases. J. Biomed. Sci. Eng. 2009, 2, 345–358.

(8) Chiti, F.; Stefani, M.; Taddei, N.; Ramponi, G.; Dobson, C. M. Rationalization of the Effects of Mutations on Peptide and Protein Aggregation Rates. Nature 2003, 424 (6950), 805–808.

(9) Schwartz, R.; King, J. Frequencies of Hydrophobic and Hydrophilic Runs and Alternations in Proteins of Known Structure. Protein Sci. 2006, 15 (1), 102–112.

(10) Chiti, F.; Calamai, M.; Taddei, N.; Stefani, M.; Ramponi, G.; Dobson, C. M. Studies of the Aggregation of Mutant Proteins in Vitro Provide Insights into the Genetics of Amyloid Diseases. Proc. Natl. Acad. Sci. U. S. A. 2002, 99 (4), 16419-16426.

(11) Broome, B. M.; Hecht, M. H. Nature Disfavors Sequences of Alternating Polar and Non-Polar Amino Acids: Implications for Amyloidogenesis. J. Mol. Biol. 2000, 296 (4), 961–968.

(12) Merlini, G.; Bellotti, V.; Sirac, C.; Delbes, S.; Bender, S.; Fernandez, B.; Quellard, N.; Lacombe, C.; Goujon, J.-M.; Lavergne, D.; et al. Molecular Mechanisms of Amyloidosis. N. Engl. J. Med. 2003, 349 (6), 583–596.

(13) Bergen, H. R.; Abraham, R. S.; Johnson, K. L.; Bradwell, A. R.; Naylor, S. Characterization of Amyloidogenic Immunoglobulin Light Chains Directly from Serum by on-Line Immunoaffinity Isolation. Biomed. Chromatogr. 2004, 18 (3), 191–201.

(14) Ashcroft, A. E. Mass Spectrometry and the Amyloid Problem-How Far Can We Go in the Gas Phase? J. Am. Soc. Mass Spectrom. 2010, 21 (7), 1087–1096.

(15) Solomon, A. Light Chains of Immunoglobulins: Structural-Genetic Correlates. Blood 1986, 68 (3), 603–606.

(16) Dispenzieri, A.; Gertz, M. A.; Buadi, F. What Do I Need to Know about Immunoglobulin Light Chain (AL) Amyloidosis? Blood Rev. 2012, 26, 137–154.

(17) Hurle, M. R.; Helms, L. R.; Li, L.; Chan, W.; Wetzel, R. A Role for Destabilizing Amino Acid Replacements in Light-Chain Amyloidosis. Proc. Natl. Acad. Sci. U. S. A. 1994, 91 (June), 5446–5450.

(18) Peterson, F. C.; Baden, E. M.; Owen, B. A. L.; Volkman, B. F.; Ramirez-Alvarado, M. A Single Mutation Promotes Amyloidogenicity through a Highly Promiscuous Dimer Interface. Structure 2010, 18 (5), 563–570.

(19) Brumshtein, B.; Esswein, S. R.; Landau, M.; Ryan, C. M.; Whitelegge, J. P.; Phillips, M. L.; Cascio, D.; Sawaya, M. R.; Eisenberg, D. S. Formation of Amyloid Fibers by Monomeric Light-Chain Variable Domains. J. Biol. Chem. 2014, 289 (40), 27513-27525.

(20) Alvarado, U. R.; DeWitt, C. R.; Shultz, B. B.; Ramsland, P. A.; Edmundson, A. B. Crystallization of a Human Bence–Jones Protein in Microgravity Using Vapor Diffusion in Capillaries. J. Cryst. Growth 2001, 223 (3), 407–414.

(21) Salomo, M.; Gimsing, P.; Nielsen, L. B. Simple Method for Quantification of Bence Jones Proteins. Clin. Chem. 2002, 48 (12), 2202–2207.

(22) Dobson, C. M. Protein Folding and Misfolding. Nature 2003, 426 (6968), 884–890.

(23) Hoffmann, E. de.; Stroobant, V. Mass Spectrometry : Principles and Applications; Third ed.; John Wiley & Sons Ltd: West Sussex, England, 2007.

(24) Glish, G. L.; Vachet, R. W. The Basics of Mass Spectrometry in the Twenty-First Century. Nat. Rev. Drug Discov. 2003, 2 (2), 140–150.

(25) Eidhammer, I.; Flikka, K.; Martens, L.; Mikalsen, S.-O. Computational Methods for Mass Spectrometry Proteomics; Second ed.; John Wiley & Sons, Ltd: Chichester, UK, 2007.

(26) Hillenkamp, F.; Jaskolla, T.W.; Karas, M. The MALDI process and Method. In MALDI MS, Wiley-VCH Verlag GmbH & Co. KGaA: 2013; pp 1-40.

(27) Ho, C. S.; Lam, C. W. K.; Chan, M. H. M.; Cheung, R. C. K.; Law, L. K.; Lit, L. C. W.; Ng, K. F.; Suen, M. W. M.; Tai, H. L. Electrospray Ionisation Mass Spectrometry: Principles and Clinical Applications. Clin. Biochem. Rev. 2003, 24 (1), 3–12.

(28) Schwartz, J. C.; Senko, M. W.; Syka, J. E. P. A Two-Dimensional Quadrupole Ion Trap Mass Spectrometer. J. Am. Soc. Mass Spectrom. 2002, 13 (6), 659-669.

(29) Scigelova, M.; Makarov, A. Orbitrap Mass Analyzer – Overview and Applications in Proteomics. Proteomics 2006, 6 (S2), 16–21.

(30) Michalski, A.; Damoc, E.; Hauschild, J.-P.; Lange, O.; Wieghaus, A.; Makarov, A.; Nagaraj, N.; Cox, J.; Mann, M.; Horning, S. Mass Spectrometry-Based Proteomics Using Q Exactive, a High-Performance Benchtop Quadrupole Orbitrap Mass Spectrometer. Mol. Cell. Proteomics 2011, 10 (9), M111.011015.

(31) Aebersold, R.; Mann, M. Mass Spectrometry-Based Proteomics. Nature 2003, 422 (6928), 198–207.

(32) Giansanti, P.; Tsiatsiani, L.; Low, T. Y.; Heck, A. J. R. Six Alternative Proteases for Mass Spectrometry-Based Proteomics beyond Trypsin. Nat. Protoc. 2016, 11 (5), 993–1006.

(33) Henzel, W. J.; Watanabe, C.; Stults, J. T. Protein Identification: The Origins of Peptide Mass Fingerprinting. J. Am. Soc. Mass Spectrom. 2003, 14 (9), 931-942.

(34) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Probability-Based Protein Identification by Searching Sequence Databases Using Mass Spectrometry Data. Electrophoresis 1999, 20 (18), 3551–3567.

(35) Swaney, D. L.; Wenger, C. D.; Coon, J. J. The Value of Using Multiple Proteases for Large-Scale Mass Spectrometry-Based Proteomics. J. Proteome Res. 2010, 9 (3), 1323-1329.

(36) Tran, N. H.; Ziaur Rahman, M.; He, L.; Xin, L.; Shan, B.; Li, M. Complete De Novo Assembly of Monoclonal Sequences. Sci. Rep. 2016, 6 (31730), 1-10.

(37) Switzar, L.; Giera, M.; Niessen, W. M. A. Protein Digestion: An Overview of the Available Techniques and Recent Developments. J. Proteome Res. 2013, 12 (3), 1067–1077.

(38) Amorim Madeira, P. J.; Helena, M. Applications of Tandem Mass Spectrometry: From Structural Analysis to Fundamental Studies. In Tandem Mass Spectrometry - Applications and Principles; InTech, 2012, DOI: 10.5772/31736.

(39) Quan, L.; Liu, M. CID, ETD and HCD Fragmentation to Study Protein Post-Translational Modifications. Mod. Chem. Appl. 2013, 1, e102. DOI: 10.4172/mca.1000e102.

(40) Elviri, L. ETD and ECD Mass Spectrometry Fragmentation for the Characterization of Protein Post Translational Modifications. InTech, 2012, DOI: 10.5772/35277.

(41) Syka, J. E. P.; Coon, J. J.; Schroeder, M. J.; Shabanowitz, J.; Hunt, D. F.; Mclafferty, F. W. Peptide and Protein Sequence Analysis by Electron Transfer Dissociation Mass Spectrometry. Proc. Natl. Acad. Sci. U. S. A. 2004, 101 (26), 9528-9533.

(42) Nardiello, D.; Palermo, C.; Natale, A.; Quinto, M.; Centonze, D. Strategies in Protein Sequencing and Characterization: Multi-Enzyme Digestion Coupled with Alternate CID/ETD Tandem Mass Spectrometry. Anal. Chim. Acta 2015, 854, 106–117.

(43) Biemann, K. Contributions of Mass Spectrometry to Peptide and Protein Structure. Biol. Mass Spectrom. 1988, 16 (1–12), 99–111.

(44) Guo, J.; Uppal, S.; Easthon, L. M.; Mueser, T. C.; Griffith, W. P. Complete Sequence Determination of from Endangered Feline Species Using a Combined ESI-MS and X-Ray Crystallography Approach. Int. J. Mass Spectrom. 2012, 312, 70–77.

(45) Smith, J. B. Peptide Sequencing by Edman Degradation. In Encyclopedia of Life Sciences. Wiley-Blackwell, Hoboken, NJ, USA, 2001.

(46) Mardis, E. R. Next-Generation DNA Sequencing Methods. Annu. Rev. Genomics Hum. Genet 2008, 9, 387–402.

(47) Bourne, P. C.; Ramsland, P. A.; Shan, L.; Fan, Z. C.; DeWitt, C. R.; Shultz, B. B.; Terzyan, S. S.; Moomaw, C. R.; Slaughter, C. A.; Guddat, L. W. Three-Dimensional Structure of an Immunoglobulin Light-Chain Dimer with Amyloidogenic Properties. Acta Crystallogr. D. Biol. Crystallogr. 2002, 58 (Pt 5), 815–823.

(48) Zhang, J.; Xin, L.; Sha, B.; Chen, W.; Xie, M.; Yuen, D.; Zhang, W.; Zhang, Z.; Lajoie, G. A.; Ma, B. PEAKS DB: De Novo Sequencing Assisted Database Search for Sensitive and Accurate Peptide Identification. Mol. Cell. Proteomics 2011, 10.1074/mcp.M111.010587.

(49) Zhokhov, S. S.; Kovalyov, S. V; Yu Samgina, T.; Lebedev, A. T. An EThcD-Based Method for Discrimination of Leucine and Isoleucine Residues in Tryptic Peptides. J. Am. Soc. Mass Spectrom. 2017, 28, 1600–1611.

(50) Lebedev, A. T.; Damoc, E.; Makarov, A. A.; Yu Samgina, T. Discrimination of Leucine and Isoleucine in Peptides Sequencing with Orbitrap Fusion Mass Spectrometer. Anal.Chem. 2014, 86, 7017-7022.

(51) Xiao, Y.; Vecchi, M. M.; Wen, D. Distinguishing between Leucine and Isoleucine by Integrated LC−MS Analysis Using an Orbitrap Fusion Mass Spectrometer. Anal. Chem. 2016, 88, 10757–10766.

(52) Kriwacki, R.; Reisdorph, N.; Siuzdak, G. Protein Structure Characterization with Mass Spectrometry. Spectroscopy 2004, 18 (1), 37–47.

(53) Lanucara, F.; Holman, S. W.; Gray, C. J.; Eyers, C. E. The Power of Ion Mobility-Mass Spectrometry for Structural Characterization and the Study of Conformational Dynamics. Nat. Chem. 2014, 6 (4), 281–294.

(54) Ruotolo, B. T.; Benesch, J. L. P.; Sandercock, A. M.; Hyung, S.-J.; Robinson, C. V. Ion Mobility–mass Spectrometry Analysis of Large Protein Complexes. Nat. Protoc. 2008, 3, 1139–1152.

(55) Baker, E. S.; Burnum-Johnson, K. E.; Ibrahim, Y. M.; Orton, D. J.; Monroe, M. E.; Kelly, R. T.; Moore, R. J.; Zhang, X.; Théberge, R.; Costello, C. E.; et al. Enhancing Bottom-up and Top-down Proteomic Measurements with Ion Mobility Separations. Proteomics 2015, 15 (16), 2766–2776.

(56) Barnidge, D. R.; Lundström, S. L.; Zhang, B.; Dasari, S.; Murray, D. L.; Zubarev, R. A. Subset of Kappa and Lambda Germline Sequences Result in Light Chains with a Higher Molecular Mass Phenotype. J. Proteome Res. 2015, 14 (12), 5283–5290.

(57) Bodi, K.; Prokaeva, T.; Spencer, B.; Eberhard, M.; Connors, L. H.; Seldin, D. C. AL-Base: A Visual Platform Analysis Tool for the Study of Amyloidogenic Immunoglobulin Light Chain Sequences. Amyloid 2009, 16 (1), 1–8.

(58) Mills, J. R.; Barnidge, D. R.; Murray, D. L. Detecting Monoclonal Immunoglobulins in Human Serum Using Mass Spectrometry. Methods 2015, 81, 56–65.

(59) Barnidge, D. R.; Krick, T. P.; Griffin, T. J.; Murray, D. L. Using Matrix-Assisted Laser Desorption/ionization Time-of-Flight Mass Spectrometry to Detect Monoclonal Immunoglobulin Light Chains in Serum and Urine. Rapid Commun. Mass Spectrom. 2015, 29 (21), 2057–2060.

(60) Barnidge, D. R.; Dasari, S.; Botz, C. M.; Murray, D. H.; Snyder, M. R.; Katzmann, J. A.; Dispenzieri, A.; Murray, D. L. Using Mass Spectrometry to Monitor Monoclonal Immunoglobulins in Patients with a Monoclonal Gammopathy. J. Proteome Res. 2014, 13 (3), 1419–1427.

(61) Barnidge, D. R.; Dispenzieri, A.; Merlini, G.; Katzmann, J. A.; Murray, D. L. Monitoring Free Light Chains in Serum Using Mass Spectrometry. Clin. Chem. Lab. Med. 2016, 54 (6), 1073–1083.

(62) Edelman, G. M.; Gally, J. A. Nature of Bence-Jones Proteins. Chemical Similarities to Polypeptide Chains of Myeloma and Normal Gamma-Globulins. J. Exp. Med. 1962, 116 (8), 207–227.

(63) Ely, K. R.; Peabody, D. S.; Holm, T. R.; Cheson, B. D.; Edmundson, A. B. Accessible Intrachain Disulfide Bonds in Hybrids of Light Chains. Mol. Immunol. 1985, 22 (2), 85–92.

(64) Kelly, S. M.; Price, N. C. The Application of Circular Dichroism to Studies of Protein Folding and Unfolding. Biochim. Biophys. Acta 1997, 1338, 161–185.

(65) Kelly, S. M.; Jess, T. J.; Price, N. C. How to Study Proteins by Circular Dichroism. Biochim. Biophys. Acta - Proteins Proteomics 2005, 1751 (2), 119–139.

(66) Greenfield, N. J. Using Circular Dichroism Spectra to Estimate Protein Secondary Structure. Nat. Protoc. 2006, 1 (6), 2876–2890.

(67) Lorber, B.; Fischer, F.; Bailly, M.; Roy, H.; Kern, D. Protein Analysis by Dynamic Light Scattering: Methods and Techniques for Students. Biochem. Mol. Biol. Educ. 2012, 40 (6), 372–382.


(68) Georgalis, Y.; Starikov, E. B.; Hollenbach, B.; Lurz, R.; Scherzinger, E.; Saenger, W.; Lehrach, H.; Wanker, E. E. Huntingtin Aggregation Monitored by Dynamic Light Scattering. Biophysics (Oxf). 1998, 95, 6118–6121.

(69) Nobbmann, U.; Connah, M.; Fish, B.; Varley, P.; Gee, C.; Mulot, S.; Chen, J.; Zhou, L.; Lu, Y.; Sheng, F.; et al. Dynamic Light Scattering as a Relative Tool for Assessing the Molecular Integrity and Stability of Monoclonal Antibodies. Biotechnol. Genet. Eng. Rev. 2007, 24 (24), 117–128.

(70) Shevchenko, A.; Tomas, H.; Havliš, J.; Olsen, J. V; Mann, M. In-Gel Digestion for Mass Spectrometric Characterization of Proteins and Proteomes. Nat. Protoc. 2007, 1 (6), 2856–2860.


Appendix A

Supplemental figures and identifications

Figure A-1 MALDI-MS spectrum of Mcg light chain protein.

Figure A-2 MALDI-MS spectrum of Sea light chain protein.


Figure A-3 ESI-MS spectrum of Sea light chain protein.

Figure A-4 ESI-MS spectrum of Black light chain protein.

Figure A-5 ESI-MS spectrum of May light chain protein.

Figure A-6 ESI-MS spectrum of Moz light chain protein.


Figure A-7 ESI-MS spectrum of Jen light chain protein.

Figure A-8 ESI-MS spectrum of Tew light chain protein.


Figure A-9 MALDI-MS spectrum of trypsin digested Mcg.

Figure A-10 MALDI-MS spectrum of chymotrypsin digested Mcg.


Figure A-11 MALDI-MS spectrum of glu-c digested Mcg.

Figure A-12 ESI-MS spectrum of trypsin digested Mcg.


Figure A-13 ESI-MS spectrum of trypsin digested Sea.

Figure A-14 ESI-MS/MS spectrum obtained for RPSGVPDR sequence in Mcg.


Figure A-15 ESI-MS/MS spectrum obtained for ADSSPVK sequence in Sea.

Figure A-16 ESI-MS/MS spectrum obtained for LTVLGQPK sequence in Sea.


Figure A-17 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Mcg protein.

Figure A-18 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Sea protein.

Figure A-19 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Black protein.

Figure A-20 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by chymotrypsin digestion of Black protein.


Figure A-21 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by Glu-C digestion of Black protein.

Figure A-22 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Moz protein.


Figure A-23 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by chymotrypsin digestion of May protein.

Figure A-24 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Jen protein.


Figure A-25 HPLC-ESI-MS/MS base peak chromatogram showing the separation of peptides obtained by trypsin digestion of Tew protein.

Figure A-26 Predicted sequence of May protein. This was created using the sequence homology of known light chains. Peptides obtained by three different enzyme digestions are shown here; tryptic peptides in blue, peptide from chymotrypsin digestion in green and peptides from glu-c digestion in red. “X” indicates the undetermined amino acids.


Figure A-27 Predicted sequence of Moz protein. This was created using the sequence homology of known light chain proteins. Peptides obtained by three different enzyme digestions are shown here; tryptic peptides in blue, peptide from chymotrypsin digestion in green and peptides from glu-c digestion in red. “X” indicates the undetermined amino acids.

Figure A-28 Predicted sequence of Jen protein. This was created using the sequence homology of known light chain proteins. Peptides obtained by three different enzyme digestions are shown here; tryptic peptides in blue, peptide from chymotrypsin digestion in green and peptides from glu-c digestion in red. “X” indicates the undetermined amino acids.


Figure A-29 Predicted sequence of Tew protein. This was created using the sequence homology of known light chain proteins. Peptides obtained by three different enzyme digestions are shown here; tryptic peptides in blue, peptide from chymotrypsin digestion in green and peptides from glu-c digestion in red. “X” indicates the undetermined amino acids.

Figure A-30 The total ion chromatograms observed in each step of ETD-HCD-MS3 approach for Black tryptic peptide (amino acid sequence of RPSGIPDRFSGSK and m/z of 468.5).

Figure A-31 MS spectra obtained from ETD-HCD-MS3 approach for doubly-charged Black tryptic peptide (amino acid sequence of RPSGIPDRFSGSK and m/z of 468.5).


Figure A-32 CD spectra of light chain proteins (A) lambda type light chains (B) kappa type light chain.

Figure A-33 The change in most common scattering particle size of Black light chain with concentration.


Figure A-34 The change in proteolytic fragment size for Mcg and Sea with temperature


Appendix B

Supplemental tables

Table B.1 A list of peptides identified from trypsin digestion of Black protein

Peptide    m/z    tR (min)    Accession
RPSGIPDR    449.2499    3.71    P01701|LV151_HUMAN
SYSC(+57.02)QVTHEGSTVEK    856.3857    3.72    P0DOY3|IGLC3_HUMAN
KADSSPVK    416.2331    3.73    P0DOY3|IGLC3_HUMAN
SC(+57.02)QVTHEGSTVEK    731.3379    3.74    P0DOY3|IGLC3_HUMAN
ADSSPVKAGVETTTPSK    558.9581    3.8    P0DOY3|IGLC3_HUMAN
ADSSPVK    352.1848    9.7    P0DOY3|IGLC3_HUMAN
AGVETTTPSK    495.7589    11.34    P0DOY3|IGLC3_HUMAN
PSGIPDR    371.1983    13.19    P01701|LV151_HUMAN
TQPPSVSAAPGQK    634.3368    13.7    P01701|LV151_HUMAN
QQLPGTAPK    470.2669    13.93    P01701|LV151_HUMAN
YAASSY    661.2834    14.55    P0DOY3|IGLC3_HUMAN
RPSGIPDRFSGSK    468.5847    14.85    P01701|LV151_HUMAN
PDRFSGSK    447.2276    14.94    P01701|LV151_HUMAN
GIPDRFSGSK    532.2806    15.25    P01701|LV151_HUMAN
YQQLPGTAPK    551.7987    15.52    P01701|LV151_HUMAN
PSGIPDRFSGSK    416.5512    15.68    P01701|LV151_HUMAN
LLIYDNNKRPSGIPDR    936.0112    17.81    P01701|LV151_HUMAN
SVLTQPPSVSAAPGQK    522.9555    18.3    P01701|LV151_HUMAN
LLIYDNNK    496.7748    19.55    P01701|LV151_HUMAN
QSVLTQPPSVSAAPGQK    847.959    20.78    P01701|LV151_HUMAN
LSLTPEQWK    551.3011    22.32    P0DOY3|IGLC3_HUMAN
QSNNKYAASSYLSLTPEQWK    772.3829    23.23    P0DOY3|IGLC3_HUMAN
AAPSVTLFPPSSEELQANK    993.5162    23.6    P0DOY3|IGLC3_HUMAN
SSYLSLTPEQWK    719.8665    24.15    P0DOY3|IGLC3_HUMAN
SYLSLTPEQWK    676.3502    24.26    P0DOY3|IGLC3_HUMAN
YAASSYLSLTPEQWK    872.4355    25.29    P0DOY3|IGLC3_HUMAN
ISDFYPGAVTVAWK    777.4065    25.46    P0DOY3|IGLC3_HUMAN
LISDFYPGAVTVAWK    556.3008    27.17    P0DOY3|IGLC3_HUMAN
YAASSYLSLTPEQW    808.3877    28.31    P0DOY3|IGLC3_HUMAN
VC(+57.02)LISDFYPGAVTVAWK    963.4964    28.94    P0DOY3|IGLC3_HUMAN
KATLVC(+57.02)LISDFYPGAVTVAWK    780.4225    30.63    P0DOY3|IGLC3_HUMAN
ATLVC(+57.02)LISDFYPGAVTVAWKADSSPVK    724.6306    31.33    P0DOY3|IGLC3_HUMAN
TLVC(+57.02)LISDFYPGAVTVAWK    1070.5621    31.84    P0DOY3|IGLC3_HUMAN
ATLVC(+57.02)LISDFYPGAVTVAWK    1106.0818    33.25    P0DOY3|IGLC3_HUMAN
AAPSVTLFPPSSEELQANKATLVC(+57.02)LISDFYPGAVTVAWK    1045.2964    34.08    P0DOY3|IGLC3_HUMAN

Table B.2 A list of peptides identified from chymotrypsin digestion of Black protein

Peptide  m/z  tR (min)  Accession
KSHKSY  375.2011  3.14  P0DOY3|IGLC3_HUMAN
KADSSPVKAGVETTTPSKQSNNKY  846.4348  3.71  P0DOY3|IGLC3_HUMAN
SC(+57.02)QVTHEGSTVEK  731.3373  3.72  P0DOY3|IGLC3_HUMAN
KADSSPVKAGVETTTPSKQSNNKYAASSY  754.8792  3.74  P0DOY3|IGLC3_HUMAN
TVAPTEC(+57.02)S  432.693  3.75  P0DOY3|IGLC3_HUMAN
AGVETTTPSKQSNNKYAASSY  735.3563  3.76  P0DOY3|IGLC3_HUMAN
QSNNKYAASSY  616.7825  3.76  P0DOY3|IGLC3_HUMAN
AGVETTTPSKQSNNKY  862.9269  12.4  P0DOY3|IGLC3_HUMAN
SNNKYAASSY  552.7518  12.81  P0DOY3|IGLC3_HUMAN
SGSKSGTSATL  498.2541  12.94  P01701|LV151_HUMAN
SC(+57.02)QVTHEGSTVEKTVAPTEC(+57.02)S  1154.0175  15.77  P0DOY3|IGLC3_HUMAN
DNNKRPSGIPDRF  505.9276  17.2  P01701|LV151_HUMAN
IYDNNKRPSGIPDRF  597.9767  17.77  P01701|LV151_HUMAN
AGVETTTPSKQSNNKYAASSYL  773.0515  18.56  P0DOY3|IGLC3_HUMAN
LSLTPEQWKSHKSY  568.6306  18.74  P0DOY3|IGLC3_HUMAN
QSNNKYAASSYL  673.3237  19.37  P0DOY3|IGLC3_HUMAN
SNNKYAASSYL  609.2942  19.71  P0DOY3|IGLC3_HUMAN
GITGLQTGDEADYY  751.8372  22.34  P01701|LV151_HUMAN
QQLPGTAPKLL  583.3512  22.4  P01701|LV151_HUMAN
YQQLPGTAPKLL  664.8836  23.26  P01701|LV151_HUMAN
SGSKSGTSATLGITGLQTGDEADYY  1240.0781  23.35  P01701|LV151_HUMAN
GQPKAAPSVTLFPPSSEEL  978.0114  25.11  P0DOY3|IGLC3_HUMAN
GITGLQTGDEADYYC(+57.02)GTW  1003.9271  25.54  P01701|LV151_HUMAN
YQQLPGTAPKLLIY  802.9576  25.7  P01701|LV151_HUMAN
LSLTPEQW  973.5015  26.16  P0DOY3|IGLC3_HUMAN
AASSYLSLTPEQW  726.8558  27.42  P0DOY3|IGLC3_HUMAN
ISDFYPGAVTVAW  713.3572  29.21  P0DOY3|IGLC3_HUMAN
C(+57.02)LISDFYPGAVTVAW  849.9178  31.75  P0DOY3|IGLC3_HUMAN
VC(+57.02)LISDFYPGAVTVAW  899.4504  32.18  P0DOY3|IGLC3_HUMAN
TLVC(+57.02)LISDFYPGAVTVAW  1006.5175  34.67  P0DOY3|IGLC3_HUMAN

Table B.3 A list of peptides identified from Glu-C digestion of Black protein

Peptide  m/z  tR (min)  Accession
QWKSHKSYSC(+57.02)QVTHE  635.6292  3.68  P0DOY3|IGLC3_HUMAN
KTVAPTE  373.2082  3.69  P0DOY3|IGLC3_HUMAN
SHKSYSC(+57.02)QVTHE  731.8232  3.7  P0DOY3|IGLC3_HUMAN
KTVAPTEC(+57.02)S  496.7385  10.39  P0DOY3|IGLC3_HUMAN
KTVAPTEC(+57.02)  453.2226  10.4  P0DOY3|IGLC3_HUMAN
SYSC(+57.02)QVTHE  555.7284  11.49  P0DOY3|IGLC3_HUMAN
SSPVKAGVE  437.2364  12.3  P0DOY3|IGLC3_HUMAN
ADSSPVKAGVE  530.2684  12.87  P0DOY3|IGLC3_HUMAN
GSTVEKTVAPTE  609.8137  13.07  P0DOY3|IGLC3_HUMAN
SLYQSKYEE  573.7684  15.04  P04264|K2C1_HUMAN
WKADSSPVKAGVE  458.5726  15.21  P0DOY3|IGLC3_HUMAN
LQANKATLVC(+57.02)  559.3046  15.81  P0DOY3|IGLC3_HUMAN
AVTVAWKADSSPVKAGVE  605.6592  18.58  P0DOY3|IGLC3_HUMAN
GAVTVAWKADSSPVKAGVE  936.4946  18.81  P0DOY3|IGLC3_HUMAN
NNKRPSGIPDRFSGSKSGTSATLGITGLQTGDE  837.9247  19.08  P01701|LV151_HUMAN
TTTPSKQSNNKYAASSYLSLTPE  830.0785  20.1  P0DOY3|IGLC3_HUMAN
ADYYC(+57.02)GTWD  575.71  20.4  P01701|LV151_HUMAN
DRFSGSKSGTSATLGITGLQTGDE  795.7216  20.41  P01701|LV151_HUMAN
GSKSGTSATLGITGLQTGDE  940.465  20.45  P01701|LV151_HUMAN
SKSGTSATLGITGLQTGDE  911.9544  20.46  P01701|LV151_HUMAN
KSGTSATLGITGLQTGDE  868.4374  20.5  P01701|LV151_HUMAN
FSGSKSGTSATLGITGLQTGDE  1057.513  20.97  P01701|LV151_HUMAN
TTTPSKQSNNKYAASSYLSLTPEQ  977.4856  21.03  P0DOY3|IGLC3_HUMAN
GIPDRFSGSKSGTSATLGITGLQTGDE  884.7736  21.04  P01701|LV151_HUMAN
QSNNKYAASSYLSLTPE  936.9513  21.42  P0DOY3|IGLC3_HUMAN
SNNKYAASSYLSLTPE  872.9237  21.46  P0DOY3|IGLC3_HUMAN
GQPKAAPSVTLFPPSSEE  921.4688  21.54  P0DOY3|IGLC3_HUMAN
FYPGAVTVAWKADSSPVKAGVE  760.3939  21.97  P0DOY3|IGLC3_HUMAN
TSATLGITGLQTGDE  732.3639  22.22  P01701|LV151_HUMAN
SGTSATLGITGLQTGDE  804.3908  22.24  P01701|LV151_HUMAN
ATLGITGLQTGDE  638.324  22.39  P01701|LV151_HUMAN
AASSYLSLTPE  569.784  22.45  P0DOY3|IGLC3_HUMAN
LQANKATLVC(+57.02)LISD  773.418  23.61  P0DOY3|IGLC3_HUMAN
AAPSVTLFPPSSEE  716.3531  24.15  P0DOY3|IGLC3_HUMAN
ELQANKATLVC(+57.02)LISD  837.9396  24.16  P0DOY3|IGLC3_HUMAN
KATLVC(+57.02)LISDFYPGAVTVAWKADSSPVKAGVE  845.6965  28.78  P0DOY3|IGLC3_HUMAN
LQANKATLVC(+57.02)LISDFYPGAVTVAWKADSSPVKAGVE  1269.335  29.32  P0DOY3|IGLC3_HUMAN
LQANKATLVC(+57.02)LISDFYPGAVTVAWKAD  984.5159  30.57  P0DOY3|IGLC3_HUMAN
ATLVC(+57.02)LISDFYPGAVTVAWKADSSPVKAGVE  813.6719  30.64  P0DOY3|IGLC3_HUMAN

Table B.4 A list of peptides identified from trypsin digestion of May protein

Peptide  m/z  tR (min)  Accession
VC(+57.02)LISDFYPGAVTVAWK  963.4976  27.7  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
VDNALQSGNSQESVTEQDSK  712.662  13.34  P01834|IGKC_HUMAN
AAPSVTLFPPSSEELQANK  993.5153  23.24  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
ATLVC(+57.02)LISDFYPGAVTVAWK  737.7241  31.72  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
SYSC(+57.02)QVTHEGSTVEK  856.3847  11.93  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
AGVETTTPSK  495.7593  9.07  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
LISDFYPGAVTVAWK  556.3005  25.85  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
ISDFYPGAVTVAWK  518.6057  24.11  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
SYLSLTPEQWK  676.3501  22.95  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
RPSGVPDR  442.2425  3.59  P01700|LV147_HUMAN:P01699|LV144_HUMAN
WYQQLPGTAPK  644.839  18.2  P01700|LV147_HUMAN:P01699|LV144_HUMAN
SSYLSLTPEQWK  719.8666  22.9  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
YAASSYLSLTPEQWK  581.959  23.73  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
QSNNKYAASSYLSLTPEQWK  772.3842  22.02  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
YQQLPGTAPK  551.7989  13.79  P01700|LV147_HUMAN:P01699|LV144_HUMAN
YAASSYLSLTPEQW  808.387  26.88  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P15814|IGLL1_HUMAN
ATLVC(+57.02)LISDFYPGAVTVAW  1042.036  34.56  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
ADSSPVK  352.1858  3.59  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
GQPKAAPSVTLFPPSSEELQANK  799.4183  26.61  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
LSLTPEQWK  551.3018  21  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
YAASSY  661.2839  12.34  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P15814|IGLL1_HUMAN
TVAPTEC(+57.02)S  864.3784  10.62  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
LTVLSQPK  443.2744  16.64  P15814|IGLL1_HUMAN

Table B.5 A list of peptides identified from chymotrypsin digestion of May protein

Peptide  m/z  tR (min)  Accession
KADSSPVKAGVETTTPSKQ  644.3438  3.63  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
KADSSPVKAGVETTTPSK  601.6575  3.63  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
YQQLPGTAPKLL  664.8843  21.66  P01699|LV144_HUMAN:P01701|LV151_HUMAN
AGVETTTPSKQSNNKY  862.9277  11.28  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
VC(+57.02)LISDFYPGAVTVAW  899.4503  31.13  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
TISGLQAEDEADYY  787.849  20.93  P01706|LV211_HUMAN:P01704|LV214_HUMAN
ASLAISGLQSEDEADYY  916.4177  24.02  P01699|LV144_HUMAN
C(+57.02)LISDFYPGAVTVAW  849.916  30.68  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
QQLPGTAPKLL  583.3519  20.71  P01699|LV144_HUMAN:P01701|LV151_HUMAN
ISGLQSEDEADYY  745.3226  19.91  P01699|LV144_HUMAN
QVSLQDKTGF  561.7944  18.33  P17538|CTRB1_HUMAN:Q6GPI1|CTRB2_HUMAN
AC(+57.02)EVTHQGL  507.7384  14.2  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
ISDFYPGAVTVAW  713.3577  28.08  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
SLAISGLQSEDEADYY  880.9002  23.33  P01699|LV144_HUMAN
SC(+57.02)QVTHEGSTVEK  731.339  3.65  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
ATLVC(+57.02)LISDFYPGAVTVAW  1042.036  34.69  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
AAPSVTLFPPSSEEL  772.8976  26.67  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
AISGLQSEDEADYY  780.8395  20.99  P01699|LV144_HUMAN
LSLTPEQW  973.5005  25.11  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|
YQQLPGTAPKLLIY  802.9581  24.44  P01699|LV144_HUMAN:P01701|LV151_HUMAN

Table B.6 A list of peptides identified from Glu-C digestion of May protein

Peptide  m/z  tR (min)  Accession
GSTVEKTVAPTEC(+57.02)S(+14.02)  740.3517  13.51  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
KTVAPTEC(+57.02)S(+14.02)  503.7458  10.77  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
SYS(+14.02)C(+57.02)QVTHE  562.7356  12.37  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
TTTPSKQSNNKYAASSYLSLTPE  830.0765  19.99  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
TTTPSKQ(+.98)SNNKYAASSYLSLTPE  830.4049  20.66  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
KTVAPTE(+14.02)C(+57.02)S  1006.485  3.68  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
KTVAPTE  373.207  3.68  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
QWK(+14.02)SHRSYSC(+57.02)QVTHE  487.4771  3.86  P0DOY2|IGLC2_HUMAN
KTVAPT(+14.02)EC(+57.02)S  1006.484  10.54  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
QWK(+42.01)SHKSYSC(+57.02)QVTHE  487.4771  10.89  P0DOY3|IGLC3_HUMAN
GSTVE  492.2284  5.15  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
Q(-17.03)WK(+42.01)SHKSYSC(+57.02)QVTHE  483.2207  13.5  P0DOY3|IGLC3_HUMAN
SYSC(+57.02)QVT(+14.02)HE  562.7354  11.7  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
QWKSHK(+42.01)SYSC(+57.02)QVTHE  487.4774  3.68  P0DOY3|IGLC3_HUMAN
AAPSVTLFPPSSEE  716.3519  23.77  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
QWKSHRSYS(+14.02)C(+57.02)QVTHE  649.634  3.68  P0DOY2|IGLC2_HUMAN
Q(+42.01)WKSHKS(-18.01)YSC(+57.02)QVTHE  482.9746  3.69  P0DOY3|IGLC3_HUMAN
SNNKYAASSYLSLTPE  872.9219  21.62  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
QSNNKYAASSYLSLTPE  936.9498  21.58  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN

Table B.7 A list of peptides identified from trypsin digestion of Moz protein

Peptide  m/z  tR (min)  Accession
SYSC(+57.02)QVTHEGSTVEK  856.387  3.68  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
ADSSPVK  703.3635  5.52  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
AGVETTTPSK  495.7598  9.48  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
TVAPTEC(+57.02)S  432.6927  10.74  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
VTVLGQPK  421.2613  14.78  P0DOX8|IGL1_HUMAN
GTLVTVSSASTK  575.8202  14.96  P0DOX5|IGG1_HUMAN
VYAC(+57.02)EVTHQGLSSPVTK  625.9815  15.73  P01834|IGKC_HUMAN
LMIYEVSKR  569.8193  17.6  P01709|LV208_HUMAN
LM(+15.99)IYEVSK  499.7664  17.78  P01709|LV208_HUMAN
QSNNKYAASSYL  673.3242  18.1  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
FSGSGSGTDFTLK  652.3134  18.63  P06310|KV230_HUMAN:A0A075B6S6|KVD30_HUMAN:A0A075B6P5|
SLSSTLTLSK  518.7989  18.91  P01834|IGKC_HUMAN
STSGGTAALGC(+57.02)LVK  661.3442  19.39  P0DOX5|IGG1_HUMAN
YAASSYL  774.3687  19.94  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
LMIYEVSK  491.7685  19.98  P01709|LV208_HUMAN
GPSVFPLAPSSK  593.8282  20.84  P0DOX5|IGG1_HUMAN
LSLTPEQWK  551.3022  20.93  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
QSNNKYAASSYLSLTPEQWK  772.3854  22.08  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
AAPSVTLFPPSSEELQANK  993.517  22.15  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
YAASSYLSLTPEQ  715.3488  22.51  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
ANPTVTLFPPSSEELQANK  1022.028  22.63  P0DOX8|IGL1_HUMAN
SYLSLTPEQWK  676.3511  23.05  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
SSYLSLTPEQWK  719.8672  23.08  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
SGNTASLTISGLQAEDEADYYC(+57.02)SSYT  935.0659  23.85  P01704|LV214_HUMAN
SGNTASLTISGLQAEDEADYY  1102.999  24.04  P01704|LV214_HUMAN
YAASSYLSLTPEQWK  872.4359  24.14  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
ISDFYPGAVTVAWK  777.4062  24.2  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
TVAAPSVFIFPPSDEQLK  649.3493  25.49  P01834|IGKC_HUMAN
LISDFYPGAVTVAWK  556.3009  25.89  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
VVSVLTVLHQDWLNGK  603.3419  27.48  P0DOX5|IGG1_HUMAN
VC(+57.02)LISDFYPGAVTVAWK  642.6677  27.86  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
SGNTASLTISGLQAEDEADYYC(+57.02)SSYTSSSTLHS  1168.168  28.95  P01704|LV214_HUMAN
KATLVC(+57.02)LISDFYPGAVTVAWK  780.4235  29.55  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
ATLVC(+57.02)LISDFYPGAVTVAWKADSSPVK  724.6326  30.28  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
TLVC(+57.02)LISDFYPGAVTVAWK  714.0458  30.72  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
ATLVC(+57.02)LISDFYPG  728.3664  31.79  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
ATLVC(+57.02)LISDFYPGAVTVAWK  1106.085  31.89  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
ATLVC(+57.02)LISDFYPGAVT  863.9435  31.97  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
ANPTVTLFPPSSEELQANKATLVC(+57.02)LISDFYPGAVTVAWK  1059.553  32.31  P0DOX8|IGL1_HUMAN
ATLVC(+57.02)LISDFYPGAVTV  609.3201  33.02  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN
AAPSVTLFPPSSEELQANKATLVC(+57.02)LISDFYPGAVTVAWK  1045.297  33.76  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN
ATLVC(+57.02)LISDFYPGAVTVAW  1042.036  34.57  P0DOY2|IGLC2_HUMAN:P0DOY3|IGLC3_HUMAN:P0DOX8|IGL1_HUMAN

Table B.8 A list of peptides identified from chymotrypsin digestion of Moz protein

Peptide  m/z  tR (min)  Accession
KADSSPVKAGVETTTPSKQ  644.3437  3.64  P0DOY2|IGLC2_HUMAN
AGVETTTPSKQSNNKY  862.93  3.65  P0DOY2|IGLC2_HUMAN
KADSSPVKAGVETTTPSK  901.9827  3.66  P0DOY2|IGLC2_HUMAN
KADSSPVK  416.2329  4.23  P0DOY2|IGLC2_HUMAN
SC(+57.02)QVTHEGSTVEK  487.8937  9.39  P0DOY2|IGLC2_HUMAN:P0DOX8|IGL1_HUMAN
SGSKSGNTASL  504.7525  10.71  P01705|LV223_HUMAN:P0DOX8|IGL1_HUMAN:P01706|LV211_HUMAN
SNNKYAASSY  552.753  12.16  P0DOY2|IGLC2_HUMAN:P0DOX8|IGL1_HUMAN
KADSSPVKAGVETTTPSKQSNNKYAASSYL  783.1513  16.19  P0DOY2|IGLC2_HUMAN
LSLTPEQWKSHR  741.3997  17.39  P0DOY2|IGLC2_HUMAN:P0DOX8|IGL1_HUMAN
TQPASVSGSPGQSITS  808.9129  18.38  P01705|LV223_HUMAN:P01706|LV211_HUMAN
YQQHPGKAPKLMLY  577.9694  18.42  P01705|LV223_HUMAN:P01706|LV211_HUMAN
SNNKYAASSYL  609.295  18.42  P0DOY2|IGLC2_HUMAN:P0DOX8|IGL1_HUMAN
QSNNKYAASSYL  673.3245  18.57  P0DOY2|IGLC2_HUMAN:P0DOX8|IGL1_HUMAN
TISGLQAEDEADY  706.3169  19.41  P01705|LV223_HUMAN:P01706|LV211_HUMAN
SVSVVETDYDQY  702.8148  19.94  P41222|PTGDS_HUMAN
RFSGSKSGNTASLTISGLQAEDEADY  902.0974  20.54  P01705|LV223_HUMAN:P01706|LV211_HUMAN
TISGLQAEDEADYY  787.8486  20.66  P01705|LV223_HUMAN:P01706|LV211_HUMAN
LLQPAGSLGSY  553.2998  21.68  P41222|PTGDS_HUMAN
SGSKSGNTASLTISGLQAEDEADYY  1282.588  22.16  P01705|LV223_HUMAN:P01706|LV211_HUMAN
SGNTASLTISGLQAEDEADYY  1103  23.8  P01705|LV223_HUMAN:P01706|LV211_HUMAN
LSLTPEQW  487.2549  24.95  P0DOY2|IGLC2_HUMAN:P0DOX8|IGL1_HUMAN
AASSYLSLTPEQW  726.8564  26.25  P0DOY2|IGLC2_HUMAN:P0DOX8|IGL1_HUMAN
SDFYPGAVTVAW  656.8165  27.28  P0DOY2|IGLC2_HUMAN:P0DOX8|IGL1_HUMAN
ISDFYPGAVTVAW  713.3591  28.18  P0DOY2|IGLC2_HUMAN:P0DOX8|IGL1_HUMAN

Table B.9 A list of peptides identified from Glu-C digestion of Moz protein

Peptide  m/z  tR (min)  Accession
KTVAPTE  373.2092  3.63  P0DOY2|IGLC2_HUMAN
QWKSHRSYSC(+57.02)QVTHE  483.9751  10.72  P0DOY2|IGLC2_HUMAN
SYSC(+57.02)QVTHE  555.7302  11.65  P0DOY2|IGLC2_HUMAN
SSPVKAGVE  437.2376  11.85  P0DOY2|IGLC2_HUMAN
SYVVHTNYDE  613.7698  15.53  P02760|AMBP_HUMAN
LRADGTVNQIEGE  701.3536  16.35  P05090|APOD_HUMAN
KAPKLM(+15.99)IYE  554.8075  17.14  P01705|LV223_HUMAN:P01704|LV214_HUMAN:P01709|LV208_HUMAN
LLRFSN  375.2192  18.43  P02760|AMBP_HUMAN
AVTVAWKADSSPVKAGVE  605.6601  18.52  P0DOY2|IGLC2_HUMAN
GAVTVAWKADSSPVKAGVE  624.6679  18.73  P0DOY2|IGLC2_HUMAN
TTTPSKQSNNKYAASSYLSLTPE  830.0809  19.98  P0DOY2|IGLC2_HUMAN
SGSKSGNTASLTISGLQAEDE  1026.493  20.08  P01705|LV223_HUMAN:P01704|LV214_HUMAN:P01706|LV211_HUMAN
FSGSKSGNTASLTISGLQAEDE  733.6871  20.64  P01705|LV223_HUMAN:P01704|LV214_HUMAN:P01706|LV211_HUMAN
SNNKYAASSYLSLTPE  872.9244  21.44  P0DOY2|IGLC2_HUMAN
NKYAASSYLSLTPE  772.387  21.51  P0DOY2|IGLC2_HUMAN
GQPKAAPSVTLFPPSSEE  921.4689  21.57  P0DOY2|IGLC2_HUMAN
SLTISGLQAEDE  631.8101  21.72  P01705|LV223_HUMAN:P01704|LV214_HUMAN:P01706|LV211_HUMAN
SGNTASLTISGLQAEDE  846.9015  21.73  P01705|LV223_HUMAN:P01704|LV214_HUMAN:P01706|LV211_HUMAN
TASLTISGLQAEDE  717.8537  22.07  P01705|LV223_HUMAN:P01704|LV214_HUMAN:P01706|LV211_HUMAN
SYLSLTPE  455.2325  22.55  P0DOY2|IGLC2_HUMAN
FYPGAVTVAWKAD  712.8655  23  P0DOY2|IGLC2_HUMAN
LQANKATLVC(+57.02)LISD  773.4221  23.69  P0DOY2|IGLC2_HUMAN
NFDVNKYLGRWYE  568.6122  24.37  P05090|APOD_HUMAN
LQANKATLVC(+57.02)LISDFYPGAVTVAWKADSSPVKAGVE  1269.339  29.33  P0DOY2|IGLC2_HUMAN
TLLQDFRVVAQGVGIPE  921.5127  29.47  P02760|AMBP_HUMAN

Table B.10 A list of peptides identified from trypsin digestion of Jen protein

Peptide  m/z  tR (min)  Accession
VDNALQSGNSQESVTEQDSK  1068.487  17.71  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
TVAAPSVFIFPPSDEQLK  973.517  28.38  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SGTASVVC(+57.02)LLNNFYPR  899.4512  30.1  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
EIVLTQSPGTLSLSPGER  942.5074  25.25  P01619|KV320_HUMAN
DSTYSLSSTLTLSK  751.8835  24.3  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
GVC(+57.02)EETSGAYEK  665.2845  15.95  P02760|AMBP_HUMAN
VLGEGATEAEISM(+15.99)TSTR  884.4256  20.25  P02760|AMBP_HUMAN
SDVVYTDWK  556.7663  21.17  P02763|A1AG1_HUMAN
WYNLAIGSTC(+57.02)PWLK  854.9294  28.6  P02760|AMBP_HUMAN
LVNEVTEFAK  575.3116  21.29  P02768|ALBU_HUMAN
AYLEEEC(+57.02)PATLR  726.3455  20.86  P25311|ZA2G_HUMAN
RHPDYSVVLLLR  489.9528  24.12  P02768|ALBU_HUMAN
TVAAC(+57.02)NLPIVR  607.3352  21.43  P02760|AMBP_HUMAN
THQGLSSPVTK  577.8116  15.2  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
DDNPNLPR  470.7274  16.58  P02768|ALBU_HUMAN
LLNNFYPR  518.7825  22.36  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VFDEFKPLVEEPQNLIK  682.3706  27.11  P02768|ALBU_HUMAN
AVMDDFAAFVEK  671.8217  27.46  P02768|ALBU_HUMAN
AC(+57.02)EVTHQGLSSPVTK  538.6032  16.98  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
QNC(+57.02)ELFEQLGEYK  829.3802  25.06  P02768|ALBU_HUMAN
AGALNSNDAFVLK  660.3516  22.57  P06396|GELS_HUMAN
SLSSTLTLSK  518.7982  20.56  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SLHTLFGDK  509.2717  19.97  P02768|ALBU_HUMAN
C(+57.02)LLNNFYPR  598.7979  24.89  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
KVDNALQSGNSQESVTEQDSK  755.3591  16.41  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AGEVQEPELR  564.288  18.14  P25311|ZA2G_HUMAN
EVTHQGLSSPVTK  691.8672  16.88  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
TVAAPSVFIFPPSDEQLKSGTASVVC(+57.02)LLNNFYPR  1242.306  33.77  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VVAQGVGIPEDSIF  715.8798  28.64  P02760|AMBP_HUMAN
YQQKPGQAPR  586.8121  13.99  P01619|KV320_HUMAN:P04433|KV311_HUMAN
EIPAWVPFDPAAQITK  891.9747  29.34  P25311|ZA2G_HUMAN
VFIFPPSDEQLK  710.3795  26.03  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AVM(+15.99)DDFAAFVEK  679.8196  24.8  P02768|ALBU_HUMAN
RTVAAPSVFIFPPSDEQLK  701.3817  26.29  P01834|IGKC_HUMAN
ATLSLSPGER  515.78  18.94  P04433|KV311_HUMAN
SGTASVVC(+57.02)LLN  560.7872  26.17  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
HPYFYAPELLFFAK  581.6361  30.25  P02768|ALBU_HUMAN
AQPVQVAEGSEPDGFWEALGGK  1136.548  27.6  P06396|GELS_HUMAN
YQQLPGTAPK  551.7982  17.52  P01701|LV151_HUMAN:P01700|LV147_HUMAN:P01699|
SVVC(+57.02)LLNNFYPR  741.3824  27.41  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
KVPQVSTPTLVEVSR  547.3174  21.31  P02768|ALBU_HUMAN
AFIQLWAFDAVK  704.885  30.49  P02760|AMBP_HUMAN
LDELRDEGK  537.7746  15.28  P02768|ALBU_HUMAN
GTVAAPSVFIFPPSDEQLK  1002.028  28.44  P0DOX7|IGK_HUMAN
GEIVLTQSPGTLSLSPGER  971.0184  25.3  P01619|KV320_HUMAN
VVAQGVGIPEDSIFTM(+15.99)ADR  1011.005  25.74  P02760|AMBP_HUMAN
RPC(+57.02)FSALEVDETYVPK  955.969  23.68  P02768|ALBU_HUMAN
YIC(+57.02)ENQDSISSK  722.324  16.96  P02768|ALBU_HUMAN
EYC(+57.02)GVPGDGDEELLR  854.8779  22.79  P02760|AMBP_HUMAN
LNNFYPR  462.2401  19.74  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
FQNALLVR  480.7849  21.37  P02768|ALBU_HUMAN
DSQEEEKTEALTSAK  833.3954  16.64  P06396|GELS_HUMAN
ETLLQDFR  511.2693  25.02  P02760|AMBP_HUMAN
ETYGEM(+15.99)ADC(+57.02)C(+57.02)AK  725.7686  15.36  P02768|ALBU_HUMAN
AAFTEC(+57.02)C(+57.02)QAADK  686.2867  16.65  P02768|ALBU_HUMAN
TC(+57.02)VADESAENC(+57.02)DK  749.7925  14.7  P02768|ALBU_HUMAN
DVFLGM(+15.99)FLYEYAR  820.3954  31.53  P02768|ALBU_HUMAN
VLGEGATEAEISMTSTR  876.4279  22.19  P02760|AMBP_HUMAN
EQLGEFYEALDC(+57.02)LR  871.9056  28.73  P02763|A1AG1_HUMAN
YSLTYIYTGLSK  704.8718  25.9  P25311|ZA2G_HUMAN

Table B.11 A list of peptides identified from chymotrypsin digestion of Jen protein

Peptide  m/z  tR (min)  Accession
AC(+57.02)EVTHQGL  507.7383  13.78  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AC(+57.02)EVTHQGLSSPVTK  807.4027  13.86  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
KVDNALQSGNSQESVTEQDSKDSTY  910.7522  14.5  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SSPVTKSF  426.7275  14.86  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SLSPGERATL  515.7809  16.35  P01619|KV320_HUMAN
AC(+57.02)EVTHQGLSSPVTKSF  924.4525  17.94  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
QVSLQDKTGF  561.7945  18.42  P17538|CTRB1_HUMAN:Q6GPI1|CTRB2_HUMAN
RTVAAPSVF  474.2698  19.59  P01834|IGKC_HUMAN
TLSLSPGERATL  622.8477  20.42  P01619|KV320_HUMAN
LNNFYPREAKVQW  832.9326  20.82  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
IFPPSDEQLKSGTASVV  887.967  21.49  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
IFPPSDEQL  523.265  21.51  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
EIVLTQSPGTL  579.3255  22.71  P01619|KV320_HUMAN
IFPPSDEQLKSGTASVVC(+57.02)L  1024.526  23.15  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
KSGTASVVC(+57.02)LL  567.8138  23.62  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
TLVGIVSW  437.756  28.24  P17538|CTRB1_HUMAN:Q6GPI1|CTRB2_HUMAN

Table B.12 A list of peptides identified from Glu-C digestion of Jen protein

Peptide  m/z  tR (min)  Accession
SPVTKSFNRGEC(+57.02)  461.2213  12.45  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VTHQGLSSPVTK  627.3448  12.63  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
QDSKDSTYSLS  615.7771  13.95  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VTHQGLSSPVTKSFNRGEC(+57.02)  702.0115  15.24  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VTHQGLSSPVTKSFNRGE  648.668  15.29  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
GLSSPVTKSFNRGEC(+57.02)  546.9338  16.33  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AKVQWKVDNALQSGNSQE  1001.501  17.81  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AKVQWKVDNALQSGNSQESVTE  806.733  18.59  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
QDSKDSTYSLSSTLTLSKADYE  813.7179  21.29  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
IVLTQSPGTLSLSPGE  799.9346  23.19  P01619|KV320_HUMAN
EIVLTQSPGTLSLSPGE  864.4551  23.83  P01619|KV320_HUMAN
GEIVLTQSPGTLSLSPGE  892.9667  23.87  P01619|KV320_HUMAN
QLKSGTASVVC(+57.02)LLNNFYPRE  766.0626  26.58  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN

Table B.13 A list of peptides identified from trypsin digestion of Tew protein

Peptide  m/z  tR (min)  Accession
ASGVPDR  351.1838  3.6  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
SFNRGEC(+57.02)  435.1823  8.85  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VDNALQSGN  459.2204  12.11  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AC(+57.02)EVTHQGLSSPVTK  538.6042  13.5  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
HKVYAC(+57.02)EVTHQGL  514.5898  13.66  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VDNALQSGNSQESVTEQDSK  1068.49  13.69  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VYAC(+57.02)EVTHQGLSSPVTK  938.4681  15.69  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
LNNFYPR  462.2408  16.53  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VYAC(+57.02)EVTHQGL  638.8043  16.98  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SLPVTPGEPASISC(+57.02)R  524.2691  17.82  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
VQWKVDNALQSGNSQESVTEQDSK  893.0971  17.97  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
ISRVEAEDVGVY  668.8421  18.15  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
VEAEDVGVY  490.7329  18.2  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
FSGSGSGTDFTLK  652.3138  18.73  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
SLSSTLTLSK  518.7986  18.89  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
ASGVPDRFSGSGSGTDFTLK  662.658  19.24  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
ISRVEAEDVGVYY  750.3741  20.2  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
VEAEDVGVYY  572.2647  20.54  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
TLNNDIMLIK  587.8297  22.42  P07477|TRY1_HUMAN
RTVAAPSVFIFPPSDEQLK  701.3825  23.64  P01834|IGKC_HUMAN
DIVMTQSPLSLPVTPGEPASISC(+57.02)R  852.4346  24.46  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
GDIVMTQSPLSLPVTPGEPASISC(+57.02)R  871.4428  24.78  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
TVAAPSVFIFPPSDEQLK  973.5185  25.99  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SGTASVVC(+57.02)LLNNFYPR  899.4533  27.89  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN

Table B.14 A list of peptides identified from chymotrypsin digestion of Tew protein

Peptide  m/z  tR (min)  Accession
EKHKVY  402.225  3.61  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
LQKPGQSPQ  491.771  3.64  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN:A2NJV5
TLSKADYEKHKVY  527.9489  3.69  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SFNRGEC(+57.02)  435.1829  9.46  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SKADYEKHKVY  456.5705  9.49  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
TLSKADYEK  527.7755  10.24  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
KVDNALQSGNSQ  630.8134  11.35  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
YLQKPGQSPQ  573.3022  11.58  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN:A2NJV5
TLSKADY  797.4055  12.92  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AC(+57.02)EVTHQGL  507.7387  13.82  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AC(+57.02)EVTHQGLSSPVTK  538.605  14.04  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SGSGSGTDF  814.324  14.22  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN:A2NJV5|
KVDNALQSGNSQESVTEQDSKDSTY  1365.624  14.54  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
RASGVPDRF  502.769  14.65  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
TLSKADYEKHKVYAC(+57.02)EVTHQGL  645.0768  14.83  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AC(+57.02)EVTHQGLSSPVTKSFNRGEC(+57.02)  822.0518  15.8  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AC(+57.02)EVTHQGLSSPV  692.8315  16.45  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SLSPGERATL  515.7812  16.52  P01619|KV320_HUMAN:P04433|KV311_HUMAN:A0A0C4DH25|
RVEAEDVGVY  568.7844  17.23  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN:A2NJV5|
VYAC(+57.02)EVTHQGL  638.8049  17.31  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
LNNFYPREA  562.2814  17.45  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SLPVTPGEPA  484.2599  18.04  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
SLPVTPGEPASISC(+57.02)R  524.2694  18.09  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
WKVDNALQSGNSQESVTEQDSKDSTY  972.7789  18.26  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
ISRVEAEDVGVY  668.8435  18.26  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN:A2NJV5|
SGVPDRFSGSGSGTDF  786.8516  18.93  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN:A2NJV5|
VQWKVDNALQSGNSQESVTEQDSKDSTY  1048.488  18.99  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
NNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTY  1066.505  19.02  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
RVEAEDVGVYY  650.3162  19.17  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN:A2NJV5|
ASGVPDRFSGSGSGTDF  822.3704  19.18  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
TLKISRVEAEDVGVY  839.9567  19.43  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN:A2NJV5|
RTVAAPSVF  474.2704  19.58  P01834|IGKC_HUMAN
SLPVTPGEPASISC(+57.02)  707.8495  19.6  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
SLPVTPGEPASIS  627.8336  19.74  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
SLPVTPGEPASISC(+57.02)RSSQSL  691.6828  19.75  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
NNFYPREAKVQW  776.3909  19.77  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
LQKPGQSPQLL  604.8554  19.95  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN:A2NJV5|
DIVM(+15.99)TQSPL  510.2589  20.39  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
LNNFYPREAKVQW  832.9335  20.85  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
IFPPSDEQL  523.2644  21.29  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SLPVTPGEPASI  584.3179  21.33  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
IVMTQSPL  444.7476  21.44  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
IFPPSDEQLKSGTASVV  887.9669  21.48  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
ASGVPDRFSGSGSGTDFTL  929.4375  22.54  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
GDIVMTQSPL  530.7722  23.13  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
DIVMTQSPL  502.2612  23.44  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
SGSGSGTDFTLKISRVEAEDVGVYY  1319.142  23.57  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN:A2NJV5|
IFPPSDEQLKSGTASVVC(+57.02)L  1024.525  24.49  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
DIVMTQSPLSLPVTPGEPASISC(+57.02)RSSQSL  1019.85  25.37  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
GDVVMTQSPL  523.7638  25.84  P06310|KV230_HUMAN:A0A075B6S6|KVD30_HUMAN
DIVMTQSPLSLPVTPGEPASISC(+57.02)RSSQSLL  1057.544  26.97  A0A075B6P5|KV228_HUMAN:P01615|KVD28_HUMAN
IFPPSDEQLKSGTASVVC(+57.02)LL  1081.067  27.85  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN

Table B.15 A list of peptides identified from Glu-C digestion of Tew protein

Peptide  m/z  tR (min)  Accession
KHKVYAC(+57.02)E  517.7578  3.61  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VTHQGLS  371.198  3.64  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VTHQGLSSPVTK  418.5661  12.46  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SPVTKSFNRGEC(+57.02)  691.3301  12.61  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
NALQSGNSQESVTE  732.3344  14.47  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VTHQGLSSPVTKSFNRG  605.655  15.02  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VTHQGLSSPVTKSFNRGEC(+57.02)  702.0129  15.16  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VTHQGLSSPVTKSFNRGE  648.6692  15.27  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AKVQWKVDNALQSGNSQE  668.0044  17.54  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
STLTLSKADYE  614.3084  17.72  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
WKVDNALQSGNSQE  788.3736  18.14  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AKVQWKVDNALQSGNSQESVTEQD  887.7623  18.29  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
AKVQWKVDNALQSGNSQESVTE  605.303  18.45  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
WKVDNALQSGNSQESVTE  996.469  18.87  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VQWKVDNALQSGNSQE  901.9358  19.25  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
QDSKDSTYSLSSTLTLSKAD  716.3485  20.63  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
SKDSTYSLSSTLTLSKADYE  732.6907  21.22  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
QDSKDSTYSLSSTLTLSKADYE  813.7188  21.68  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
STYSLSSTLTLSKADYE  622.6392  22.47  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
VVC(+57.02)LLNNFYPRE  762.3871  24.99  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
LKSGTASVVC(+57.02)LLNNFYPRE  723.377  26.48  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN
QLKSGTASVVC(+57.02)LLNNFYPRE  1148.592  26.67  P01834|IGKC_HUMAN:P0DOX7|IGK_HUMAN

Table B.16 A list of proteins related to immunoglobulin light chains that were identified by SEQUEST search

Protein (digestion)  Accession  Amino acids  MW (kDa)  Coverage (%)
Black (Trypsin digestion)  A0A075B6L0  106  11.33952827  93.4
Black (Trypsin digestion)  A0A075B6K8  106  11.39358646  77.36
Black (Trypsin digestion)  P01702  111  11.44653666  54.05
May (Trypsin digestion)  P0CG05  106  11.28654974  97.17
May (Trypsin digestion)  P01834  106  11.60166763  25.47
May (Trypsin digestion)  P01707  111  11.55363598  7.21
Moz (Trypsin digestion)  A0A075B6L0  106  11.33952827  93.4
Moz (Trypsin digestion)  P01834  106  11.60166763  83.96
Moz (Trypsin digestion)  B9A064  214  23.0486394  42.06
Tew (Trypsin digestion)  P01834  106  11.60166763  92.45
Tew (Trypsin digestion)  P01617  113  12.30815059  41.59
Tew (Trypsin digestion)  P0CG06  106  11.23051229  40.57
Jen (Trypsin digestion)  P01834  106  11.60166763  54.72
Jen (Trypsin digestion)  P01620  109  11.76784497  31.19
Black (Chymotrypsin digestion)  P0CG05  106  11.28654974  90.57
Black (Chymotrypsin digestion)  P0CG06  106  11.23051229  67.92
Black (Chymotrypsin digestion)  P01702  111  11.44653666  45.95
May (Chymotrypsin digestion)  A0A075B6L0  106  11.33952827  83.02
May (Chymotrypsin digestion)  A0A075B6K1  120  12.58900688  50.83
May (Chymotrypsin digestion)  P01834  106  11.60166763  39.62
May (Chymotrypsin digestion)  P06887  112  11.78215946  32.14
Moz (Chymotrypsin digestion)  P0CG05  106  11.28654974  90.57
Moz (Chymotrypsin digestion)  P04209  112  11.57359255  58.04
Moz (Chymotrypsin digestion)  P01834  106  11.60166763  43.4
Moz (Chymotrypsin digestion)  P15814  213  22.94863573  25.35
Tew (Chymotrypsin digestion)  P01834  106  11.6  100
Tew (Chymotrypsin digestion)  P01617  113  12.3  77.88
Tew (Chymotrypsin digestion)  A0A075B6R1  110  12.1  60
Tew (Chymotrypsin digestion)  A0A075B6K8  106  11.4  27.36
Tew (Chymotrypsin digestion)  P01615  113  12.7  9.73
Jen (Chymotrypsin digestion)  P01834  106  11.60166763  94.34
Jen (Chymotrypsin digestion)  P01620  109  11.76784497  27.52
Jen (Chymotrypsin digestion)  P80362  108  11.72977787  17.59
Black (Glu-C digestion)  P0CG05  106  11.28654974  85.85
Black (Glu-C digestion)  P01702  111  11.44653666  37.84
Moz (Glu-C digestion)  P0CG05  106  11.28  100
Tew (Glu-C digestion)  P01834  106  11.6  100
Tew (Glu-C digestion)  P01617  113  12.3  53
Jen (Glu-C digestion)  P01834  106  11.60166763  90.57
