Biomarker Discovery for Bronchopulmonary Dysplasia Using Mass Spectrometry Based Urine Proteomics


Citation Ahmed, Saima. 2016. Biomarker Discovery for Bronchopulmonary Dysplasia Using Mass Spectrometry Based Urine Proteomics. Master's thesis, Harvard Extension School.

Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:33797336

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Biomarker Discovery for Bronchopulmonary Dysplasia using Mass Spectrometry-Based Urine Proteomics

Saima Ahmed

A Thesis in the Field of Biotechnology

for the Master of Liberal Arts Degree in Extension Studies

Harvard University

May 2016

Abstract

Bronchopulmonary dysplasia (BPD) is a chronic lung disorder that primarily affects premature infants. BPD commonly begins as respiratory distress syndrome (RDS) and progresses to BPD when respiratory complications persist past the original due date of the infant. BPD can last for years depending on the severity, with respiratory complications seen into adulthood for a subset of cases. To ensure the best possible health outcome, it is imperative to identify the premature infants at risk for BPD as early as possible. Given the fragility of these premature infants and the limited availability and invasive extraction of blood, I investigated the use of urine for the discovery of biomarker candidates for BPD. These samples were uniquely collected by inserting cotton balls into the diapers of these premature infants. Using less than 150 μl of urine per sample, I employed data-dependent and data-independent acquisition LC/MS methods to perform high-throughput urine proteomics on premature infants with and without BPD.

I identified several urinary proteins that show altered abundance levels in the severe BPD population. Interestingly, many of these proteins have been described before in the context of BPD, though not as urinary proteins. The identified proteins potentially point to prognostic markers to identify infants at risk of BPD and ultimately to develop novel targeted therapeutics for prevention and treatment of BPD.


You were born with wings, why prefer to crawl through life?

~ Rumi

Dedication

I dedicate this thesis to the motivation behind my vision and hard work, my son Haroon.

Acknowledgements

I would like to sincerely thank Professor Hanno Steen for his genuine mentorship and for believing in me. I would also like to thank Dr. Stella Kourembanas for providing and trusting me with her precious patient samples. I sincerely thank Dr. Melissa Rotunno for her input in experiments and in my thesis. Also, I would like to sincerely thank Dr. Sebastian T Berger for his guidance in my data analysis and helping to frame my thesis.

Finally, a special recognition and thank you to Maria Khan, BS for meticulously organizing and cataloging my precious patient samples.

Table of Contents

Dedication

Acknowledgements

List of Tables

List of Figures

I. Introduction

Bronchopulmonary Dysplasia

Biomarkers

Proteomics and Mass Spectrometry

What is a proteome?

What is a Mass Spectrometer?

Proteomic Sample Processing

Mass Spectrometry-based Proteomic Analysis Methods

II. Materials and Methods

Urine Sample Cohort

MStern Blot Sample Preparation Method

Pilot Study

Mass Spec Analysis Data Dependent Acquisition and Data Independent Acquisition

Data Analysis

Validation

Mass Spec Analysis Data Dependent Acquisition and Data Independent Acquisition

Data Analysis

III. Results

Pilot Study

Proteomics Reveals 5 Significant Proteins in 4 Different Quantitation Methods

Validation Study

IV. Discussion

V. Appendix

Endnotes

List of Tables

Table 1. NIH Severity Based Diagnosis Criteria for BPD

Table 2. Patient cohort representation

Table 3. Table of 46 significant proteins from validation study

Table 4. DAVID Ontology (GO)

Table 5. Top 7 proteins resulting from Steen Lab Biomarker Discovery Classification Tool

List of Figures

Figure 1. Schematic of Biomarker Stages

Figure 2. Representation for potential outcomes for biomarker validation of disease

Figure 3. Relationship of plasma, glomerulus, tubules and urine

Figure 4. Representation of adult and fetal tissue types

Figure 5. Mass Spectrometer

Figure 6. A typical proteomic schematic

Figure 7. MStern Blot

Figure 8. BPD Proteomics Workflow

Figure 9. Proteomics reveals 5 significant proteins in 4 different quantitation methods

Figure 10. Volcano plot of pilot study

Figure 11. Box Plot analysis of Ferritins

Figure 12. Box Plot analysis of Plasminogen

Figure 13. Box Plot analysis of Uteroglobin

Figure 14. Box Plot analysis of Cartilage Intermediate

Figure 15. Volcano plot of validation study

Figure 16. PCA analysis (Dante R) of 46 proteins

Figure 17. PCA analysis (Dante R) of 7 proteins

Figure 18. Box Plot analysis of Plastin-2

Figure 19. Box Plot analysis of Transmembrane emp24 domain-containing protein 7 levels

Figure 20. Box Plot analysis of Heat shock cognate 71 kDa protein

Figure 21. Box Plot analysis of Heat shock protein HSP 90-beta

Figure 22. Box Plot of Carboxypeptidase M

Figure 23. Box Plot analysis of Carboxypeptidase A1

Figure 24. Box Plot analysis of HLA class I histocompatibility antigen

Figure 25. 3D Scatter Plot

Figure 26. ROC Curve

Chapter I

Introduction

Preterm birth is a leading cause of infant morbidity and mortality, and extremely premature infants are at a much greater risk. One of the most confounding health consequences of preterm birth is related to respiratory complications. Bronchopulmonary dysplasia is a chronic lung disorder with possible life-long complications.

Bronchopulmonary Dysplasia

Preterm infants born before 37 weeks have risen to 35% in the past 25 years in North America. Although these fragile infants continue to suffer from debilitating morbidities, improvements in prenatal care have increased survival rates (Piersigilli & Bhandari, 2015). The increase in maternal age at first delivery and the increase in assisted reproduction are believed to contribute to preterm birth. Infant respiratory distress syndrome (RDS) is the most common preterm infant morbidity. Infants with RDS are born unable to generate enough lung surfactant and have difficulty breathing. Approximately 7% of preterm infants develop RDS, with incidence increasing as gestational age decreases. Extremely low birth weight infants born at <28 weeks can have up to a 93% risk of developing RDS (Zysman-Colman, Tremblay, Bandeali, & Landry, 2013). Infants who have RDS are treated with surfactant replacement therapy and oxygen therapy by mechanical ventilation. If surfactant therapy and oxygen support do not improve breathing by the infant’s original due date, these infants are diagnosed with bronchopulmonary dysplasia (BPD). BPD is considered a chronic lung disease characterized by a spectrum of disease phenotypes with possible lifelong complications. Furthermore, exposure to high levels of oxygen from mechanical ventilation, lung inflammation and infection, lung injury, and alveolar dysfunction are all confounding factors in the development of BPD (Lal & Ambalavanan, 2015).

Because the diagnosis of BPD is still unclear and BPD is considered a spectrum of disease, clinicians use an array of diagnostic tools such as chest X-rays, oxygen level blood tests, and echocardiography. These diagnostic procedures are rather invasive given the limited blood supply of such fragile infants. Also, chest X-rays and echocardiography can be expensive and give only a narrow view of BPD characteristics.

Table 1. NIH Severity Based Diagnosis Criteria for BPD (Ehrenkranz et al., 2005). This table represents stages of BPD as defined by the NIH according to oxygen needed.

In 2000, the US National Institute of Child Health and Human Development and the National Heart, Lung, and Blood Institute suggested a new definition of BPD based on severity for infants <32 weeks gestational age (GA) (Table 1). Mild BPD is defined as infants needing supplemental oxygen (O2) for ≥ 28 days but no longer at 36 weeks postmenstrual age (PMA) or discharge. Moderate BPD is defined as infants needing supplemental O2 for ≥ 28 days plus treatment with <30% O2 at a PMA of 36 weeks. Severe BPD is defined as infants needing supplemental O2 for ≥ 28 days plus ≥ 30% O2 and/or positive pressure at a PMA of 36 weeks (Ehrenkranz et al., 2005). This definition provides a solid foundation for severity-based classification, but fails to incorporate a definitive pathophysiological or clinical phenotype of the disease.
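The severity criteria described above amount to a small decision rule, which can be sketched in code. The following is only an illustrative sketch, not a clinical tool: the record type `OxygenHistory`, the function `classify_bpd_severity`, the "no BPD" label for infants with fewer than 28 days of supplemental oxygen, and the use of 21% O2 (room air) as the threshold for "no supplemental oxygen" are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass

ROOM_AIR_FIO2 = 0.21  # assumption: room air is treated as "no supplemental O2"


@dataclass
class OxygenHistory:
    """Hypothetical record of one infant's oxygen requirements."""
    days_on_supplemental_o2: int          # cumulative days of supplemental O2
    fio2_at_36_weeks_pma: float           # fraction of inspired O2 at 36 weeks PMA
    positive_pressure_at_36_weeks: bool   # positive pressure support at 36 weeks PMA


def classify_bpd_severity(history: OxygenHistory) -> str:
    """Apply the severity criteria summarized in the text (Ehrenkranz et al., 2005)."""
    # All three BPD grades require supplemental O2 for at least 28 days.
    if history.days_on_supplemental_o2 < 28:
        return "no BPD"  # assumed label; the text does not name this group
    # Severe: >= 30% O2 and/or positive pressure at 36 weeks PMA.
    if history.positive_pressure_at_36_weeks or history.fio2_at_36_weeks_pma >= 0.30:
        return "severe"
    # Moderate: still on supplemental O2, but below 30%, at 36 weeks PMA.
    if history.fio2_at_36_weeks_pma > ROOM_AIR_FIO2:
        return "moderate"
    # Mild: no longer on supplemental O2 at 36 weeks PMA.
    return "mild"


print(classify_bpd_severity(OxygenHistory(40, 0.35, False)))
```

The sketch ignores the "or discharge" alternative to the 36-week PMA assessment, which would need an additional field in a fuller model.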


Currently there is no definitive treatment regimen for BPD other than close monitoring and meeting basic survival needs such as warmth, nutrition, and protection (Budhraja et al., 2014). Early predictors identifying infants at greater risk of developing BPD would allow early intervention to reduce disease severity and complications. The ability to accurately predict the onset of BPD with a more definite biological marker could initiate appropriate preventative therapies to slow down or even inhibit the progression of BPD and avoid or minimize its detrimental effects. Furthermore, knowing which infants are at risk for BPD at later stages could initiate targeted therapies to improve health outcomes of patients at risk for developing BPD (Zhang, Huang, & Lu, 2014). Because BPD can have a spectrum of severity and a variety of clinical phenotypes, a more definitive biological marker of disease or of the risk of developing disease can better equip clinicians with appropriate information to provide targeted interventions for BPD. A more sensitive and specific biological marker is necessary to accurately distinguish BPD from non-BPD patients. Furthermore, this biomarker should reflect the pathophysiology of BPD, which is reflected primarily in body fluids. Because limited blood supply and invasive blood collection are suboptimal in these fragile patients, another body fluid just as reflective as blood should be used to facilitate the early prognostication and, ultimately, the treatment of BPD.

Biological markers are used in the clinic to help characterize the health state of a patient. Biomarkers can be used to distinguish healthy from diseased states or even to stratify disease states. Clinicians can use biomarkers to best plan treatment for a patient.

Biomarkers

Biological markers or biomarkers, as defined by the World Health Organization, are any substance, structure, or process that can be measured in the body or its products and influence or predict the incidence of outcome or disease (Strimbu & Tavel, 2010). An excellent example of a biomarker is human chorionic gonadotropin (hCG), detected by a blood or urine test to accurately identify pregnancy. Biomarkers can reflect a cell’s state, therefore providing insight into patient physiology (Strimbu & Tavel, 2010). Moreover, biomarkers can be key indicators for diagnosis, prognosis, and treatment of medical conditions. Recently, the importance of biomarkers in a clinical setting for personalized and precision medicine has been further recognized, leading to substantial public and private funding for biomarker discovery research.

The conventional paradigm for biomarker research includes three fundamental stages: discovery, validation, and implementation (Figure 1). The first stage, the discovery of biomarker candidates, requires an initial unbiased analysis of human samples to identify differentially abundant proteins that allow for the distinction between the physiological states of interest. Typically, the discovery phase uses only a few samples but tests thousands of analytes to identify tens to hundreds of candidate biomarkers. The second stage requires validating these potential biomarkers to determine whether these proteins can truly be used in a clinical setting. Generally, validation studies require distinguishing true positives and true negatives from false positives and false negatives, which in turn allows for the estimation of sensitivity and specificity, and of negative and positive predictive values, of any biomarker or combination thereof.

Briefly, sensitivity is the measurable ability to detect true positives, and specificity is the measurable ability to correctly identify true negatives and thereby eliminate false positives (Figure 2). In order to perform proper sensitivity and specificity studies during validation, a large number of samples is required, which ultimately narrows down the list of candidate biomarkers (Vitzthum, Behrens, Anderson, & Shaw, 2005).
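To make these quantities concrete, the following minimal sketch computes sensitivity, specificity, and the positive and negative predictive values from the four possible outcomes of a validation experiment. The function name `diagnostic_metrics` and the counts in the example are hypothetical and are not results from this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp: true positives, fp: false positives,
    tn: true negatives, fn: false negatives.
    """
    return {
        # Fraction of diseased patients that the biomarker flags as positive.
        "sensitivity": tp / (tp + fn),
        # Fraction of non-diseased patients that the biomarker flags as negative.
        "specificity": tn / (tn + fp),
        # Fraction of positive calls that are truly diseased.
        "ppv": tp / (tp + fp),
        # Fraction of negative calls that are truly non-diseased.
        "npv": tn / (tn + fn),
    }


# Made-up counts for illustration only: 50 diseased and 50 non-diseased patients.
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the predictive values, unlike sensitivity and specificity, depend on the ratio of cases to controls in the cohort being tested.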

Figure 1. Schematic of Biomarker Stages. Biomarker workflow showing the three steps required for biomarker discovery for routine clinical application (Pisitkun, Johnstone, & Knepper, 2006).

Figure 2. Representation of potential outcomes for biomarker validation of disease: sensitivity and specificity. Graphical representation of sensitivity and specificity calculations. (Specificity)

Finally, in the implementation stage, the measurement of the previously validated biomarker candidate (or a set thereof) is optimized using a clinically appropriate assay, such as an enzyme-linked immunosorbent assay (ELISA), that can be performed quickly and economically.

Among the many body fluids, blood-derived fluids such as serum or plasma are the most widely used sources for biomarker candidates. Although these blood-derived fluids have the advantage of being readily available and relatively noninvasive to collect in the general population, the procedure becomes more invasive and disadvantageous when dealing with more vulnerable patient populations such as (extremely) premature infants. Furthermore, serum/plasma proteomes are characterized by an extreme dynamic range covering about 11 orders of magnitude of concentration, i.e. a handful of the most abundant proteins, such as albumins and immunoglobulins, account for >95% of the entire protein content (Veenstra et al., 2005). Therefore, mining for the very low abundant proteins that could potentially indicate a differential disease state is very challenging (Veenstra et al., 2005).

Recently, urine has become a biological fluid of choice in biomarker research (Kentsis et al., 2010; Kentsis et al., 2013; Veenstra et al., 2005). Urine is an excellent body fluid in which to investigate biomarkers because it is abundant and easily and noninvasively obtainable (Kentsis et al., 2010; Kentsis et al., 2013; Pedroza-Diaz & Rothlisberger, 2015). As a systemic filtrate body fluid, urine is not only a source of biomarkers for nephrological and urogenital conditions, but has also been associated with distal organ diseases such as cancer, Kawasaki disease, and pediatric appendicitis (Beretov, Wasinger, Schwartz, Graham, & Li, 2014; Jedinak et al., 2015; Yamamoto, Langham, Ronco, Knepper, & Thongboonkerd, 2008).

Urine contains approximately 0.1 mg/ml protein. There are two major sources of urinary proteins. First, blood-based proteins are filtered through the glomerulus in the kidneys, and some of them end up in the urine (Figure 3). Second, other proteins are shed or secreted by proximal organs such as the kidney and urogenital tract. Therefore, because urine is a filtrate of serum, its low complexity but rich protein content can be used to develop routine diagnostic tests (Beretov et al., 2014).

Figure 3. Relationship of plasma, glomerulus, tubules and urine. Schematic indicating how proteins come to be present in urine: (1) plasma proteins, (2) glomerular filtration of plasma components, (3) glomerular proteins, (4) tubular absorption/secretion, (5) urine proteins (Yamamoto et al., 2008).

It is estimated that approximately 2500 proteins constitute the urinary proteome. Around 1000 of those proteins are derived from the urogenital tract and kidney (Adachi, 2006), which can be exploited to discover biomarkers for diseases of the kidney and the urogenital tract, including prostate cancer (Pedroza-Diaz & Rothlisberger, 2015). As mentioned earlier, urine as a systemic filtrate body fluid has been associated with distal organ diseases such as cancer, Kawasaki disease, and pediatric appendicitis (Beretov et al., 2014; Jedinak et al., 2015; Yamamoto et al., 2008). One study found over 2000 proteins in urine that could be assigned to physiologically relevant molecular groups, including glomerularly filtered cytokines, renal and urogenital structural proteins, as well as proteins from distal organs glomerularly filtered via serum. Also discovered in this data set were proteins associated with over 500 human diseases, including biomarkers for pediatric acute appendicitis and pediatric Kawasaki disease (Kentsis et al., 2012; Kentsis et al., 2013).

As described earlier, urine is an especially suitable body fluid for biomarker research in a pediatric population. Body fluids collected from such fragile patient populations should be easily obtainable by noninvasive means. Since using catheters in newborns carries a risk of infection, obtaining urine from diapers is preferable. It has also been shown that proteins can be extracted from urine collected from diapers for biomarker discovery research (Kennedy, Griffin, Su, Merchant, & Klein, 2009). Diapers with cotton fibers were shown in that study to be the best for recovering urine for proteomic biomarker investigation.

Proteomics is the study of all proteins, or proteome, in a living system and their functions. This rapidly growing field in molecular biology uses very sensitive and accurate analytical instrumentation called mass spectrometers to identify and quantify proteins in a sample. Proteomics can have a wide range of applications including clinical proteomic applications such as biomarker discovery.


Proteomics and Mass Spectrometry

What is a proteome?

The genome is an inventory of all the genes in an organism. Genes consist of specific DNA sequences made up of four nucleotides. In 2003, the Human Genome Project (HGP) marked an unprecedented milestone in the history of human health and disease research (Lander et al., 2001). This global collaboration to sequence and map all 20,000 genes in the human body impacted the fields of biotechnology and medicine in previously inconceivable ways (Lander et al., 2001). Mapping the human genome would not have been possible without superior, technologically advanced instrumentation such as the DNA sequencer (Chial, 2008) and associated highly advanced bioinformatics capabilities.

Subsequently, another revolutionary global collaboration of unprecedented potential, aiming to map the human proteome for all genes previously sequenced in the Human Genome Project, began in 2009 (Kim et al., 2014; Wilhelm et al., 2014) (Figure 4). Using mass spectrometry, proteins equivalent to 17,000 genes (almost 85% of all genes mapped in the Human Genome Project) were identified, including nearly 200 proteins not previously identified (Kim et al., 2014; Wilhelm et al., 2014). The Human Proteome Project (HPP), led by the Human Proteome Organization (HUPO), likewise aims to deepen our understanding of human health and disease with unprecedented depth and precision. Just as the DNA sequencer was essential to mapping the human genome, the mass spectrometer, a pillar of the HPP, is analogously essential to identifying the human proteome (Yamamoto et al., 2008).

The proteome is a comprehensive library of all the proteins present in an organism. The number of protein species in the human body is significantly larger than the number of genes because of isoforms and splicing variations. Also, proteins can be further modified by post-translational modifications (PTMs), making the proteome more complex and challenging to study. For example, a caterpillar and a butterfly are the same organism with the same genome, yet morphologically they are very different because their proteomes are completely different. The proteins in a caterpillar provide functional characteristics from the original genes that transform a caterpillar into a butterfly. Therefore, to further appreciate how a gene becomes functional, one must investigate the proteome.


Figure 4. Representation of adult and fetal tissue types. Adult and fetal tissue types and the proteomics workflow used to analyze these tissues in making the draft human proteome (Kim et al., 2014).

A mass spectrometer is an analytical instrument used to measure the mass of a molecule. More specifically, a mass spectrometer measures the mass to charge ratio of an ion. This makes a mass spectrometer very sensitive to small mass changes in a molecule, so that it can distinguish between molecules, such as proteins or peptides, that differ only slightly in mass.


What is a Mass Spectrometer?

In order to appreciate the potential of this breakthrough in global health, one must first understand what mass spectrometry is, how it works, and how it has become a central technology in proteomics, the comprehensive and systematic study of all proteins in an organism. Mass spectrometry is an analytical tool that measures the mass to charge ratio of ionized atoms or molecules (Figure 5). Since the charge can be easily determined, the mass of the atom/molecule of interest is readily calculated.
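As a worked illustration of how mass follows from the measured mass to charge ratio, the sketch below assumes positive-mode ionization in which each charge is carried by one added proton, and it infers the charge state from the spacing between neighboring isotope peaks, which is approximately 1.00335/z. The function names and the example m/z values are hypothetical and chosen only for illustration.

```python
PROTON_MASS = 1.007276      # Da, mass of a proton
ISOTOPE_SPACING = 1.00335   # Da, approximate 13C - 12C mass difference


def charge_from_isotope_spacing(mz_monoisotopic, mz_next_isotope):
    """Estimate the charge state z from the m/z gap between adjacent isotope peaks."""
    return round(ISOTOPE_SPACING / (mz_next_isotope - mz_monoisotopic))


def neutral_mass(mz, charge):
    """Convert an observed m/z of an [M + zH]z+ ion into the neutral mass M.

    m/z = (M + z * m_proton) / z, hence M = z * (m/z - m_proton).
    """
    return charge * (mz - PROTON_MASS)


# Hypothetical peptide ion: isotope peaks 0.5 m/z apart indicate a doubly charged ion.
z = charge_from_isotope_spacing(500.5, 501.0)
print(z, round(neutral_mass(500.5, z), 4))
```

The protonation assumption holds for typical electrospray experiments on peptides; other adducts or negative-mode ionization would require a different charge carrier mass.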

Figure 5. Mass Spectrometer. A Mass spectrometer is an analytical tool that measures the mass to charge ratios of atoms or molecules in a given matrix.

A mass spectrometer consists of an ion source, a mass analyzer, and a detector. The function of the ion source is to transfer analytes as ions into the gas phase in order to allow them to “fly” into the instrument. Electrospray ionization is a routinely used ionization technique for introducing non-volatile analytes such as proteins and peptides into the mass spectrometer (Steen & Mann, 2004). The mass analyzer measures the mass to charge ratio (m/z) of the ions. The detector quantifies the number of ions at a particular mass to charge ratio using, e.g., image currents in the case of Fourier transform-based instruments, or electron multiplier induced currents. A typical proteomics workflow using mass spectrometry requires sample preparation, which entails fractionating a protein sample with, e.g., an SDS-PAGE gel, then enzymatically digesting proteins into peptides. These peptides are then introduced into a mass spectrometer (Steen & Mann, 2004) (Figure 6).

Current mass spectrometers optimized for proteomics provide exceptional mass accuracy, resolution, sensitivity, dynamic range, and speed. The latest instrumentation can detect molecules with sub-ppm (parts per million) mass accuracy, at high resolution and speed, across a wide range of concentrations (Kentsis et al., 2010). Advances in MS and proteomics have allowed for unprecedented identification as well as quantitation of thousands of proteins and peptides in complex biological matrices (Kentsis et al., 2010).
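To illustrate what sub-ppm mass accuracy means in practice, the following short sketch computes the relative mass error in parts per million between an observed and a theoretical m/z value. The function name `ppm_error` and the numbers in the example are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical measurement: an ion expected at m/z 500.0000 is observed at 500.0004.
error = ppm_error(500.0004, 500.0000)
print(f"{error:.2f} ppm")  # a deviation of 0.0004 at m/z 500 is below 1 ppm
```

At m/z 500, an accuracy of 1 ppm corresponds to an absolute deviation of only 0.0005, which is why such instruments can separate candidate peptides of nearly identical mass.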


Figure 6. A typical proteomic schematic. A typical proteomic workflow, beginning with bench preparation of a sample and ending with analysis using a mass spectrometer (Steen & Mann, 2004).

To help the mass spectrometer analyze complex biological samples, it is imperative to reduce the complexity of a sample at the lab bench before introducing it into the mass spectrometer. There are many well-defined methods, but few can handle numerous biological samples at once and process them in a short period of time. In order to make clinical proteomic applications possible, the sample processing methods must be less cumbersome.

Proteomic Sample Processing

The ability to process large numbers of patient samples efficiently and effectively is rapidly moving to the forefront of biomarker discovery research platforms. Traditional sample processing methods such as SDS-PAGE are effective but laborious and time consuming. A paradigm shift in sample processing was the introduction of the filter-aided sample preparation (FASP) method, which eliminated the need to run an SDS-PAGE gel but still allowed the efficient capture and purification of proteins using a single filter (Wisniewski, Zougman, Nagaraj, & Mann, 2009). Despite being a major advancement, this method was still found to be not only slow and laborious when applied to larger numbers of samples, but also expensive, because each individual sample requires its own filtration device.

Patient sample processing, which normally involves large cohorts, needs a more robust and faster approach. The initial application of FASP in a 96-well plate format has been described, but the need for centrifugation for liquid transfer in this method significantly extended sample processing times. A novel sample processing workflow for MS-based proteomics termed MStern Blot (Figure 7) significantly improved throughput, as 96 samples (or multiples thereof) can be completely processed within a single workday. This novel proteomic sample processing method can be exploited to perform more efficiently and effectively any proteomic experiment needing a large number of samples, primarily biomarker discovery research (Sebastian T. Berger 2015).


Figure 7. MStern Blot. A representation of the time saved using the MStern Blot technique to process a large number of proteomic samples compared to a traditional FASP method. The MStern Blot technique allows between 96 and 192 samples to be processed within a single working day (Sebastian T. Berger 2015).

Mass spectrometers can be given acquisition instructions tailored to the samples of interest. Traditional mass spectrometry methods can be very useful for identifying as much of the sample as possible. Recent advances in mass spectrometry methods, however, can give even further depth in identifying and quantifying a proteomic sample.

Mass Spectrometry-based Proteomic Analysis Methods

Data Dependent Acquisition and Data Independent Acquisition. Deep proteome identification and quantitation begins with cleaving proteins into peptides using specific proteases. These peptides are then introduced into a mass spectrometer through a directly coupled liquid chromatography system. This concept is called LC/MS. The traditional and most commonly used mass spectrometry method is called “shotgun” proteomics. This type of mass spectrometry method involves analyzing peptides in a “data-dependent acquisition” (DDA) mode. This method requires the mass spectrometer to detect and select “precursor ions” in an “MS1” survey scan; the instrument is then programmed to sequentially isolate the n most intense peptides and to fragment them by collision-induced dissociation, using a collision gas such as nitrogen to break peptide bonds and reveal the amino acid sequence of the isolated and fragmented peptide. This second level of DDA, where the peptides are fragmented to reveal the sequence, is called “MS2”. In other words, the mass spectrometer takes in charged peptides from a liquid chromatography system, desolvates them into “naked ion species”, and then performs an MS1 survey scan, after which selected precursor ions are fragmented at the MS2 level. The resulting fragment ion spectra are processed using computational tools that statistically match these spectra to identifiable proteins. Traditionally, protein quantitation is performed at the MS1 level when using DDA methods (Domon & Aebersold, 2006; Muntel et al., 2015).
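The precursor selection step of DDA described above can be sketched as follows. This is a simplified illustration rather than instrument control software: the function name `select_top_n_precursors` and the peak list are hypothetical, and practical refinements such as charge-state filtering are omitted.

```python
def select_top_n_precursors(ms1_peaks, n):
    """Return the n most intense (m/z, intensity) peaks from one MS1 survey scan."""
    # Rank all detected precursor ions by intensity, most intense first.
    ranked = sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)
    # Only the top n precursors are isolated and fragmented (MS2) in this cycle.
    return ranked[:n]


# Hypothetical MS1 survey scan as (m/z, intensity) pairs.
survey_scan = [(400.2, 1.0e5), (512.3, 8.0e5), (623.8, 3.0e5), (701.4, 5.0e4)]

for mz, intensity in select_top_n_precursors(survey_scan, n=2):
    print(f"isolate and fragment precursor at m/z {mz} (intensity {intensity:.0f})")
```

Because selection depends on precursor intensity in each survey scan, low-abundance peptides may never be chosen for fragmentation, which is one reason a given peptide can be sampled in one run and missed in another.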

Nowadays, with increasing demands for higher throughput, large-scale proteomic sample sets, and faster instrumentation, there is a demand for better and more comprehensive mass spectrometry methods. Because DDA relies on the detection of precursor ion intensity, using this method for precise quantitation with large-scale data sets can be considered unreliable. Therefore, a data-independent acquisition (DIA) method is preferred. DIA was developed to avoid any need for prior knowledge of the sample in order to perform MS2 fragmentation. A DIA method instructs the mass spectrometer to select consecutive precursor ion ranges without any precondition aside from a window of mass ranges, and to fragment everything within that mass-to-charge window. DIA data sets are then analyzed computationally at the MS2 level using previously acquired DDA files (a spectral library) in order to identify and precisely quantify large protein datasets (Gillet et al., 2012; Muntel et al., 2015; Venable, Dong, Wohlschlegel, Dillin, & Yates, 2004).
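The consecutive isolation windows used in DIA can be sketched as below. The m/z range of 400 to 1000 and the window width of 25 are hypothetical values chosen for illustration and are not the settings used in this work; the function names are likewise assumptions.

```python
def dia_windows(mz_start, mz_end, width):
    """Build consecutive, non-overlapping precursor isolation windows."""
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)  # clip the last window to the range end
        windows.append((lower, upper))
        lower = upper
    return windows


def window_for_precursor(mz, windows):
    """Return the window whose range [lower, upper) contains mz, or None."""
    for lower, upper in windows:
        if lower <= mz < upper:
            return (lower, upper)
    return None


windows = dia_windows(400, 1000, 25)
print(len(windows), "windows; m/z 512.3 falls in", window_for_precursor(512.3, windows))
```

Unlike the intensity-ranked selection in DDA, every precursor inside a window is fragmented regardless of its abundance, so the same set of ions is sampled in every run.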

Using MStern blotting to process 96 samples (or multiples thereof) within a single workday greatly enhances the ability to provide a better biomarker discovery research platform. Furthermore, DIA provides an analytical depth unlike traditional MS methods. Therefore, using the powerful combination of a high-throughput, robust sample processing method (MStern blotting), the high accuracy and high sensitivity of mass spectrometry, and robust quantitation via DIA methods, a more thorough biomarker discovery for BPD can be performed.

A handful of protein biomarkers for BPD have been discovered, but with many limitations. One study found that plasma ferritin levels were higher in very low birth weight infants with BPD than in very low birth weight infants without BPD (Cifuentes, Miller, & Deinard, 1984). Another study found tracheal aspirate levels of plasminogen activator receptor to be significantly higher in BPD patients than in non-BPD patients (Didiasova, Wujak, Wygrecka, & Zakrzewicz, 2014). Another interesting finding was that plasma CC16 levels were higher in mechanically ventilated BPD patients (Sarafidis et al., 2008). CC16 was also used as a biomarker for lung injury in this subset of BPD patients. Though these biomarkers have been found to be relevant to BPD, none has been found to be a promising non-invasive prognostic marker for BPD. Serum and tracheal aspirate have been investigated using proteomics, but the difficulty of extracting patient blood and tracheal aspirate has been one confounding limitation in furthering BPD prognostic biomarkers. Larger sample sizes and data sets are necessary to validate biomarkers for disease, which was a further limitation of the previously mentioned studies.

Using a non-invasive, abundant, protein-rich body fluid such as urine in combination with high-throughput proteomic sample processing methods like MStern Blot and robust protein quantitation methods like DIA can be more effective in the investigation of prognostic biomarker discovery for BPD patients.


Chapter II

Materials and Methods

Urine samples from premature infants can easily be obtained by inserting cotton balls in diapers and then squeezing the urine from the cotton balls into a small tube. The tube is then frozen immediately and stored until further analysis.

Urine Sample Cohort

All urine samples were obtained from Dr. Stella Kourembanas, MD, Division Chief of Newborn Medicine at Boston Children’s Hospital, and were consented for proteomic analysis. A total of 97 samples were used for both discovery and validation stages (Table 2). BPD patients were all defined as severe according to the NIH severity-based diagnosis. Each case was age matched with controls. All patients were <27 weeks gestational age, except for 6 patients (3 cases, 3 controls) at 27+ weeks. Samples from each patient were taken at 2 different time points: the first between post-natal days 1-3 and the second between post-natal days 14-28. Urine was collected by placing cotton balls in diapers. Urine-soaked cotton balls were extracted from diapers and squeezed into 1 ml tubes. The pilot study entailed a total of 13 samples (all <27 weeks gestational age): 6 severe BPD cases and 7 age matched controls. For the validation, a total of 42 samples (21 severe cases and 21 age matched controls) were analyzed using urine samples taken at the first time point.

Experiment     Time Point (Post-Natal Days)   Cases   Controls
Pilot Study    14-28                          7       6
Validation     1-3                            21      21

Table 2. Patient cohort representation. Number of patient samples used at each time point during the different stages of the experiments.


Traditional proteomic sample processing methods are best suited to smaller sample sizes and low-complexity samples. High-throughput proteomic sample processing methods are necessary to save time and to support analysis in the context of a practical clinical application.

MStern Blot Sample Preparation Method

135μl of neat urine was added to 150μg urea and 30μl dithiothreitol (DTT) (100mM in 1M Tris/HCl pH 8.5). The resulting denatured and reduced urine was incubated for 20min at 27°C and 1100rpm in a thermo mixer. Reduced cysteine side chains were alkylated with 50mM iodoacetamide (IAA) and incubated for 20min at 27°C and 750rpm on a thermo mixer. Then a 96-well hydrophobic PVDF membrane plate (Millipore) was primed with 150μl of 70% ethanol, equilibrated with 300μl urea supernatant, and vacuumed through a 96-well plate adaptable vacuum manifold (Millipore). All subsequent liquid transfers were carried out using this 96-well microplate vacuum manifold. Each sample was vacuumed three times through the PVDF membrane 96-well plate. After adsorption of the proteins onto the membrane, proteins were washed 2x with 50mM ammonium bicarbonate (ABC). Protein digestion was performed using 1μg of sequencing grade trypsin (Promega). To this end, 100μl digestion buffer (5% acetonitrile (ACN), 50mM ABC and trypsin) was added to each well. After incubation for 2 hours at 37°C in a humidified incubator, the cleaved proteins, now peptides, were eluted by vacuum into a 96-well collection plate. Peptides were eluted twice by vacuum with 150μl of aqueous 40% ACN containing 0.1% formic acid (FA). Subsequently, the elution solutions were pooled and dried in a vacuum concentrator. Lyophilized samples were resuspended in 50μl of MS loading buffer (5% FA, 5% ACN). Internal retention time standard peptides HRM (Spectronaut) were added at a 1:10 ratio before introduction to the Thermo Q Exactive mass spectrometer.

All biomarker discovery research should begin with a small cohort of samples, used as a discovery-phase experiment. The purpose of this so-called fishing expedition is to evaluate whether larger experiments are feasible.

Pilot Study

A total of 13 case (6) and control (7) third time point urine samples were analyzed for the pilot study.


Mass Spec Analysis Data Dependent Acquisition and Data Independent Acquisition

Data Dependent Acquisition. A total of 13 case (6) and control (7) third time point urine samples were analyzed using an MS data-dependent TOP10 acquisition method. Peptides were first separated by liquid chromatography on a microfluidic chip system (Eksigent; trapping column: 200 μm x 0.5 mm ReproSil-Pur C18-AQ 3 μm; analytical column: 75 μm x 15 cm ReproSil-Pur C18-AQ 3 μm), followed by DDA analysis on a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (3μl) were separated by a linear gradient from 93% buffer A (0.2% FA in HPLC water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min. The mass spectrometer was operated in data-dependent TOP10 mode with the following settings: mass range 400–1000 Th; resolution for MS1 scan 70 000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17 500 @ 200 Th; isolation width 1.6 Th; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s. The resulting DDA data were used for MS1 level quantitation as well as generation of a spectral library.
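The linear gradient described above can be illustrated with a short sketch. It assumes a strictly linear ramp from 7% to 25% buffer B over 75 min with no hold or wash steps; the function names are hypothetical and are not part of any instrument software.

```python
def percent_b(t_min, start_b=7.0, end_b=25.0, duration_min=75.0):
    """Return the % buffer B at time t_min for a linear gradient.

    Times outside the run are clamped to the start or end composition.
    """
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min


def gradient_table(step_min=15.0, duration_min=75.0):
    """List (time in min, % buffer A, % buffer B) at regular time steps."""
    rows = []
    t = 0.0
    while t <= duration_min:
        b = percent_b(t, duration_min=duration_min)
        rows.append((t, 100.0 - b, b))
        t += step_min
    return rows


if __name__ == "__main__":
    for t, a, b in gradient_table():
        print(f"{t:5.1f} min  A = {a:5.2f}%  B = {b:5.2f}%")
```

Under these assumptions buffer B rises by 18 percentage points over 75 min, i.e. 0.24% per minute.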

Data Independent Acquisition. A total of 13 case (6) and control (7) urine samples were analyzed using an MS DIA acquisition method. Peptides were first separated by liquid chromatography on a microfluidic chip system (Eksigent; trapping column: 200 μm x 0.5 mm ReproSil-Pur C18-AQ 3 μm; analytical column: 75 μm x 15 cm ReproSil-Pur C18-AQ 3 μm), followed by DIA analysis on a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (3μl) were separated by a linear gradient from 93% buffer A (0.2% FA in HPLC water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min. The resulting DIA data were used for MS2 level quantitation by searching against the DDA-generated spectral library.

Data Analysis

DDA Spectra Count MS2 Quantitation. Acquired MS raw files were analyzed using ProteinPilot (version 4.5.1; Sciex) using the human UniProtKB database (version 06-2014). The ‘thorough’ search mode was used. These results were used for the spectral counting-based protein quantification. Briefly, only proteins identified at a 1% false discovery rate and with a p value below 0.05 were considered for further analysis. All spectra were normalized to the average of the sum of all spectra acquired for each MS run.
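One reading of this normalization is sketched below: each run is rescaled so that its total spectral count equals the mean total across all runs. This interpretation, the function name, and the placeholder protein and run names are assumptions for illustration; the exact procedure used in ProteinPilot or in downstream spreadsheets is not specified here.

```python
def normalize_runs(counts_by_run):
    """Scale each run so its total equals the mean total across runs.

    counts_by_run maps run name -> {protein: spectral count}.
    """
    if not counts_by_run:
        raise ValueError("at least one run is required")
    totals = {run: sum(counts.values()) for run, counts in counts_by_run.items()}
    if any(total == 0 for total in totals.values()):
        raise ValueError("every run needs at least one spectrum")
    mean_total = sum(totals.values()) / len(totals)
    return {
        run: {protein: n * mean_total / totals[run] for protein, n in counts.items()}
        for run, counts in counts_by_run.items()
    }


# Placeholder data, not measured values.
example = {
    "run_A": {"protein_1": 10, "protein_2": 30},  # total 40
    "run_B": {"protein_1": 30, "protein_2": 50},  # total 80
}
normalized = normalize_runs(example)
```

After scaling, both placeholder runs sum to the same total (60), so a protein's count can be compared across runs without being dominated by differences in how many spectra each run acquired.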

DDA MaxQuant MS1 Quantitation. For additional label-free quantification, including MS1 intensity-based absolute protein quantitation (iBAQ) for dynamic range analysis, MaxQuant (Cox et al., 2009) (version 1.5.1) was used. Briefly, the acquired raw files were loaded into MaxQuant and searched against the human UniProtKB database (version 06-2014). For quantification, the ‘iBAQ’ and ‘label-free quantification’ (LFQ) options were selected. Default settings were used for the analyses. Resulting intensities were normalized by taking the average of the sum of all spectra acquired for each MS run. Proteins below a p value of 0.05 were considered for further analysis.
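As commonly described, iBAQ divides the summed peptide intensities of a protein by the number of its theoretically observable peptides. The sketch below illustrates that idea only; the cleavage rule (after K or R, not before P, no missed cleavages), the 6 to 30 residue length window, and the toy sequence are simplifying assumptions and do not reproduce MaxQuant's implementation.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def observable_count(sequence, min_len=6, max_len=30):
    """Count peptides whose length falls inside the assumed observable window."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(sequence))


def ibaq(summed_intensity, sequence):
    """Summed peptide intensity divided by the number of observable peptides."""
    n = observable_count(sequence)
    if n == 0:
        raise ValueError("no theoretically observable peptides")
    return summed_intensity / n


# Toy sequence for illustration only (not a real protein).
toy_sequence = "AAAAAAKGGGGGGRPLLLLLKTTK"
```

For the toy sequence, the digest yields three peptides, two of which fall inside the length window, so a summed intensity of 1.0E+06 would give an iBAQ value of 5.0E+05.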


DIA Spectronaut MS2 Quantitation. The spectral library was generated by searching DDA data in Proteome Discoverer (Thermo) using only the best PSMs of all peptide groups. DIA data were searched against the Proteome Discoverer-generated spectral library in Spectronaut 6.0 (Biognosys) using a q value cut-off of 0.01. Resulting MS2 intensities were normalized by taking the average of the sum of all spectra acquired for each MS run. Proteins below a p value of 0.05 were considered for further analysis.

Prior to any further experimentation a pilot study must show sufficient data in order to proceed with the next phase of a research project. Here, the pilot study showed sufficient data in order to proceed with larger patient cohorts and larger experiments with a validation study.


Validation

Mass Spec Analysis Data Dependent Acquisition and Data Independent Acquisition

Data Dependent Acquisition. A total of 42 case (21) and control (21) first time point urine samples were analyzed using an MS data-dependent TOP10 acquisition method. Peptides were first separated by liquid chromatography on a microfluidic chip system (Eksigent; trapping column: 200 μm x 0.5 mm ReproSil-Pur C18-AQ 3 μm; analytical column: 75 μm x 15 cm ReproSil-Pur C18-AQ 3 μm), followed by DDA analysis on a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (3μl) were separated by a linear gradient from 93% buffer A (0.2% FA in HPLC water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min. The mass spectrometer was operated in data-dependent TOP10 mode with the following settings: mass range 400–1000 Th; resolution for MS1 scan 70 000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17 500 @ 200 Th; isolation width 1.6 Th; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s. The resulting first time point DDA data were used for MS1 level quantitation as well as generation of a spectral library.


Data Independent Acquisition. A total of 42 case (21) and control (21) first time point urine samples were analyzed using an MS DIA acquisition method. Peptides were first separated by liquid chromatography on a microfluidic chip system (Eksigent; trapping column: 200 μm x 0.5 mm ReproSil-Pur C18-AQ 3 μm; analytical column: 75 μm x 15 cm ReproSil-Pur C18-AQ 3 μm), followed by DIA analysis on a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (3μl) were separated by a linear gradient from 93% buffer A (0.2% FA in HPLC water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min. DIA runs used the same LC setup and gradient as the DDA method, with the following DIA settings: 1 full MS scan with 17,500 resolution at 200 Th; AGC target – 3e6; maximum injection time – 80 ms; scan range – 400 to 1,000 Th; followed by a DIA scan event at 17,500 resolution at 200 Th; AGC target – 1e6; maximum IT – auto; loop count – 15; MSX count – 1; isolation width – 20 Th; fixed first mass – 200 Th; NCE – 27; covering a mass range from 400 to 700 Th. After that, an additional full MS scan with the same parameters was used, followed by an additional DIA scan event with the same parameters covering a mass range from 700 to 1,000 Th. In total, a mass range from 400 to 1,000 Th was covered. The resulting DIA data were used for MS2 level quantitation by searching against the DDA-generated spectral library.
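The DIA settings can be checked with a little arithmetic: a 20 Th isolation width with a loop count of 15 covers 300 Th per scan event, which matches each of the two stated ranges (400 to 700 Th and 700 to 1,000 Th). The sketch below lays out such windows, assuming they are contiguous and non-overlapping, which the method description does not state explicitly. The function name is hypothetical.

```python
def dia_windows(start_mz, end_mz, width):
    """Return contiguous, non-overlapping (low, high) isolation windows in Th."""
    if width <= 0 or end_mz <= start_mz:
        raise ValueError("need width > 0 and end_mz > start_mz")
    n_windows, remainder = divmod(end_mz - start_mz, width)
    if remainder:
        raise ValueError("range is not a whole number of windows")
    return [
        (start_mz + i * width, start_mz + (i + 1) * width)
        for i in range(int(n_windows))
    ]


low_half = dia_windows(400, 700, 20)    # first DIA scan event
high_half = dia_windows(700, 1000, 20)  # second DIA scan event

if __name__ == "__main__":
    print(len(low_half), "windows from", low_half[0], "to", low_half[-1])
    print(len(high_half), "windows from", high_half[0], "to", high_half[-1])
```

Each half yields 15 windows, consistent with the loop count of 15, for 30 windows over the full 400 to 1,000 Th range.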


Data Analysis

DIA Spectronaut MS2 Quantitation. The spectral library generated for the pilot study, which was built by searching DDA data in Proteome Discoverer (Thermo) using only the best PSMs of all peptide groups, was used to analyze the DIA data for the first and third time points. DIA data were searched against the Proteome Discoverer-generated spectral library in Spectronaut 6.0 (Biognosys) using a q value cut-off of 0.01. First and third time point DIA data were also searched against the spectral library generated with pooled DDA runs, with the same parameters as mentioned above. Resulting MS2 intensities were normalized by taking the average of the sum of all spectra acquired for each MS run. Proteins below a p value of 0.05 were considered for further analysis.


Chapter III

Results

All biomarker discovery research should begin with a small cohort of samples, used as a discovery-phase experiment. The purpose of this so-called fishing expedition is to evaluate whether larger experiments are feasible.

Pilot Study

For the pilot study I hypothesized that BPD influences the urinary proteome of BPD patients compared to controls. To look for these differences, a series of proteomic experiments was performed on 135μl urine samples taken at post-natal days 14-28 from severe BPD patients and appropriate age-matched controls. The proteomic workflow consisted of processing urine samples from 6 controls and 7 severe BPD patients by MStern blotting, and analyzing the resulting peptide mixtures by LC/MS using DDA and DIA methods (Figure 8). Searching DDA results against the human UniProtKB database (version 06-2014) in ProteinPilot (Sciex) with a 1% false discovery rate and using spectral count analysis for approximate quantitation, 1098 proteins were identified. Similarly, the more robust and stringent MS1-based protein quantitation methods provided by MaxQuant (Cox et al., 2009), using both LFQ and iBAQ (Mann & Edsinger, 2014), resulted in the identification and quantitation of 991 proteins. For the DIA data, the Spectronaut software package (Biognosys) was used. This analysis resulted in the identification and quantitation of 1200 proteins (Figure 9). Overall, the initial goal of producing a comprehensive proteome analysis from sparse amounts of urine collected from cotton balls placed in diapers was accomplished. This initial analysis produced a comprehensive foundation consistent with my hypothesis that BPD induces changes in the urinary proteomes of BPD patients.

Figure 8. BPD Proteomics Workflow. Schematic representation of sample cohort and proteomic analysis.


Proteomics Reveals 5 Significant Proteins in 4 Different Quantitation Methods

The availability of various proteomic data analysis software packages gives the user an opportunity to generate various statistically significant representations of the data. All software packages gave consistent results of close to 50 significant proteins with a p value of <0.05 (without multiple testing correction). Here, four different software packages were used, each with its own algorithmic form of quantitation (MS1 or MS2 level). When comparing the results of these 4 different data analyses, a set of 5 proteins was identified that consistently showed statistically significant abundance differences independent of the analytical method used (Figure 9). These 5 proteins were among the most significantly changed proteins, as is evident from the volcano plot in Figure 10. The five proteins were i) ferritin heavy chain, ii) ferritin light chain, iii) uteroglobin (also known as CC10 or CC16), iv) plasminogen, and v) cartilage intermediate layer protein 2.
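The comparison across quantitation methods amounts to intersecting the lists of significant proteins from each analysis. A minimal sketch follows; the method labels mirror the four analyses named in this thesis, but the protein identifiers are placeholders and the function name is hypothetical.

```python
def consistent_hits(significant_by_method):
    """Return proteins that are significant in every method's result list."""
    hit_sets = [set(hits) for hits in significant_by_method.values()]
    if not hit_sets:
        return set()
    return set.intersection(*hit_sets)


# Placeholder identifiers for illustration only, not results from the study.
example_hits = {
    "ProteinPilot spectral count": ["protA", "protB", "protC", "protD"],
    "MaxQuant iBAQ": ["protA", "protB", "protE"],
    "MaxQuant LFQ": ["protA", "protB", "protC"],
    "Spectronaut DIA": ["protB", "protA", "protF"],
}
shared = consistent_hits(example_hits)
```

Requiring agreement across all methods is a conservative filter: a protein that reaches significance in only one quantitation approach is dropped, even if its p value there is small.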


Proteomics Reveals Protein Differences in Urine of BPD patients vs. non BPD patients

After the initial identification of 5 proteins consistent across 4 data sets, I decided to focus on Spectronaut-based DIA and MaxQuant-based iBAQ quantitation, because of their robustness, to perform more detailed analyses of the 5 proteins of interest. First, I looked at ferritin level differences between case and control groups. Ferritin heavy chain was higher in severe BPD patients than in control patients (Ave 8.90E+07 ± 8.33E+07). Similarly, ferritin light chain levels were higher in severe BPD patients than in control patients (Ave 6.00E+06 ± 6.98E+06). Both ferritin proteins, light and heavy chain, were thus consistently higher in severe BPD patients than in controls at the time point under investigation (Figure 11).


Figure 11. Box Plot analysis of Ferritins. Ferritin levels were consistently lower in controls than in severe BPD patients, as shown with two different proteomic data analysis methods (iBAQ and DIA). GraphPad Prism 6 (v6.07) was used to generate plots.

Similar to ferritin levels, plasminogen levels were increased in the severe BPD patients in comparison to the controls (Ave 6.05E+06 ± 3.97E+06) (Figure 12). These findings were consistent between the DDA and the DIA data sets when comparing cases vs. controls.


Figure 12. Box Plot analysis of Plasminogen. Plasminogen levels were consistently lower in controls than in severe BPD patients, as shown with two different proteomic data analysis methods (iBAQ and DIA). GraphPad Prism 6 (v6.07) was used to generate plots.

Uteroglobin levels were examined next for differences between case and control groups. With both the DIA and iBAQ methods, uteroglobin levels were lower in severe BPD patients than in controls (Ave 11.73E+08 ± 7.88E+07) (Figure 13).

Figure 13. Box Plot analysis of Uteroglobin. Uteroglobin levels were consistently lower in severe BPD patients than in control patients, as shown with two different proteomic data analysis methods (iBAQ and DIA). GraphPad Prism 6 (v6.07) was used to generate plots.

Figure 14. Box Plot analysis of Cartilage Intermediate Layer Protein 2. Cartilage intermediate layer protein 2 levels were higher in severe BPD patients than in control patients, as shown with two different proteomic data analysis methods (iBAQ and DIA). GraphPad Prism 6 (v6.07) was used to generate plots.

Finally, cartilage intermediate layer protein 2 was higher in severe BPD patients than in controls (Ave 3.15E+06 ± 1.74E+06) (Figure 14). The relative abundance levels of cartilage intermediate layer protein 2 in the severe cases showed a very narrow spread compared with the larger spread in the controls.


Overall, the initial pilot study to discover urinary biomarkers for BPD patients looked promising. The urinary proteome was comprehensively analyzed for severe BPD patients and appropriate age-matched controls from sparse amounts of urine collected from cotton balls placed in diapers. Furthermore, from as little as 135μl of urine, statistically significant differences were found in urinary protein profiles between severe BPD patients and controls. I next hypothesized that these same significantly different proteins would be identified in the same patient cohorts at the post-natal day 1-3 time point. This cohort was about 3 times as large as the pilot study (42 vs. 13 samples) and was considered a validation stage.

Detecting patients at risk for BPD before they develop the disease would indicate the potential for a prognostic marker for very low birth weight infants who are at risk for developing BPD, which is clinically more relevant than being able to diagnose after X days, at which point there are fewer treatment options. Therefore, 42 post-natal day 1-3 samples (21 severe cases, 21 controls) were analyzed.


Prior to any further experimentation a pilot study must show sufficient data in order to proceed with the next phase of a research project. Here, the pilot study showed sufficient data in order to proceed with larger patient cohorts and larger experiments with a validation study.

Validation Study

Validation Study reveals distinct proteome from pilot study

For the validation study, I used 42 urine samples from prematurely born infants, collected within 72 hours of birth. Half of these patients developed severe BPD later during their time in the NICU, while the other half did not develop any signs of BPD. All 42 samples were analyzed in DIA mode and were searched using the same spectral library that was used in the pilot study. In total, close to 1200 proteins were identified, and of these, 46 proteins were significant with a p value <0.05 (not corrected for multiple testing; Table 3). The volcano plot analysis of the data (Figure 15) shows all proteins identified as well as the top proteins identified using an in-house written random forest-based classification tool, discussed below in more detail.
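To put the uncorrected threshold in context: testing about 1200 proteins at p < 0.05 would be expected to yield roughly 60 false positives if no protein truly differed, which is more than the 46 observed. The sketch below shows this arithmetic together with the Benjamini-Hochberg procedure as one common form of multiple testing correction. Benjamini-Hochberg is an illustrative choice and was not applied in this study; the function names and example p values are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of p < alpha results if every null hypothesis is true."""
    return n_tests * alpha


def benjamini_hochberg(p_values, alpha=0.05):
    """Return True/False per p value: rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[index] = True
    return rejected


# Placeholder p values for illustration only.
example_p = [0.001, 0.02, 0.04, 0.5]
example_rejected = benjamini_hochberg(example_p)
```

In the placeholder example, the first two p values survive correction while 0.04, although below 0.05, does not, because its rank-adjusted threshold is 0.0375.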


Gene Ontology and Principal Component Analysis (PCA) reveal unique characteristics for validation study BPD proteome

I had hypothesized that the same proteins found in the pilot study would be identified in this validation analysis as potential prognostic biomarkers for BPD. However, none of the proteins identified as significant in the pilot study showed significant differences in the validation study. Thus, a deeper statistical analysis of the proteins identified as being of differential abundance was performed. Because the list of significant proteins with a p value of 0.05 was so large, various more stringent statistical and bioinformatic analyses were applied to narrow it down to a handful of biomarker candidates. First, a GO analysis using DAVID (Huang da, Sherman, & Lempicki, 2009) was performed to investigate the biological processes these proteins are involved in (Table 4). This online statistical resource is widely used as an analytical bioinformatics platform to extract biological meaning from a group of genes or proteins. The GO analysis indicated that the top significant proteins were mostly involved in immune response and cell death / apoptosis. Though this initial analysis seemed quite interesting and consistent with the possibility of proteins involved in response to disease, no solid conclusion could be made because of low p values.


Next, a principal component analysis (PCA) was performed using those proteins that showed differential abundance at a p value of ≤0.05 (Figure 16). It was interesting to note that the PCA did not provide a clear separation of the cases and controls. Instead, the cases fell into two groups: one group was clearly different from the controls and the other group overlapped with the controls. The figure shows a bi-plot, which shows the separation of the samples based on PC1 and PC2, but also shows the loadings (effect strengths) of the relevant proteins.

Top 7 BPD proteins identified using unique classification algorithm

In order to narrow down the list of significant proteins, another level of statistical analysis was performed using an in-house written random forest-based classification tool. This classification was performed 10 times using the top 46 proteins that showed the most significant abundance differences between the case and control groups. I performed this analysis 10 times in order to identify the proteins that were most consistently identified as being good classifiers. More detailed analysis of the output of these 10 classifications resulted in 7 high priority proteins (Table 5), 3 of which had been identified in all 10 classifications, 3 in 9 out of 10 classifications and 1 in 8 out of 10 classifications. I then proceeded to look further into these 7 proteins and their power to separate the cases from the controls. First, a PCA was performed with these 7 proteins. The result of this PCA is shown as a bi-plot in Figure 17 to also include the effect strength of the different proteins.
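The prioritization step, counting how often each protein is selected across repeated classification runs, can be sketched as follows. The in-house random forest tool itself is not reproduced here; the sketch assumes each run simply returns a list of selected proteins, and the identifiers, the cut-off argument, and the function names are hypothetical.

```python
from collections import Counter


def selection_frequency(runs):
    """Count in how many runs each protein was selected (once per run)."""
    counts = Counter()
    for selected in runs:
        counts.update(set(selected))
    return counts


def stable_classifiers(runs, min_runs):
    """Proteins selected in at least min_runs runs, most frequent first."""
    counts = selection_frequency(runs)
    kept = [protein for protein, n in counts.items() if n >= min_runs]
    return sorted(kept, key=lambda protein: (-counts[protein], protein))


# Ten placeholder runs for illustration only (not output of the actual tool).
example_runs = (
    [["protA", "protB", "protC", "protD"]] * 3
    + [["protA", "protB", "protC"]] * 5
    + [["protA", "protB"]]
    + [["protA"]]
)
frequencies = selection_frequency(example_runs)
top_candidates = stable_classifiers(example_runs, min_runs=8)
```

In the placeholder data, three proteins are selected in at least 8 of 10 runs and are kept, mirroring the way the 7 high priority proteins were chosen from proteins found in 8, 9 or 10 of the 10 classifications.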

The PCA of the top 7 proteins showed a clear separation between case and control groups. The most significant separation is along PC1, to which P08238, Q9Y3B3 and P11142 (heat shock protein HSP 90-beta, transmembrane emp24 domain-containing protein 7, heat shock cognate 71 kDa protein) contribute the most. Next, these 7 proteins were investigated individually using box plots to show the relative abundance differences between cases and controls for each protein.

Plastin-2 was higher in severe BPD patients than in control patients (Ave 2.85E+05 ± 6.61E+05) with a p value of 0.025 (Figure 18).

Figure 18. Box Plot Analysis of Plastin-2. Plastin-2 levels were higher in severe BPD patients than in control patients, as shown with the DIA proteomic data analysis method. GraphPad Prism 6 (v6.07) was used to generate plots.


Next, box plot analysis of transmembrane emp24 domain-containing protein 7 showed lower relative abundance in severe BPD patients than in controls (Ave 3.25E+04 ± 2.71E+04) with a p value of 0.033 (Figure 19).

Figure 19. Box Plot Analysis of Transmembrane emp24 domain-containing protein 7. Transmembrane emp24 domain-containing protein 7 levels were lower in severe BPD patients than in control patients, as shown with the DIA proteomic data analysis method. GraphPad Prism 6 (v6.07) was used to generate plots.

Then, box plot analysis of heat shock cognate 71 kDa protein indicated that relative abundance levels were lower in severe BPD patients than in control patients (Ave 1.22E+06 ± 1.29E+06) with a p value of 0.016 (Figure 20).


Figure 20. Box Plot Analysis of Heat shock cognate 71 kDa protein. Heat shock cognate 71 kDa protein levels were lower in severe BPD patients than in control patients, as shown with the DIA proteomic data analysis method. GraphPad Prism 6 (v6.07) was used to generate plots.

Heat shock protein HSP 90-beta relative abundance levels were lower in severe BPD patients, but the spread in the control patient data points was also quite wide (Ave 3.33E+04 ± 5.62E+04) (Figure 21). This protein was nevertheless still considered significant, with a p value of 0.015.


Figure 21. Box Plot Analysis of Heat shock protein HSP 90-beta. Heat shock protein HSP 90-beta levels were lower in severe BPD patients than in control patients, as shown with the DIA proteomic data analysis method. GraphPad Prism 6 (v6.07) was used to generate plots.

Carboxypeptidase M showed lower relative abundance levels in severe BPD patients than in controls (Ave 4.24E+05 ± 3.02E+05). The spread of the severe BPD data points was larger, while the distribution of urine levels in the controls was much tighter. Carboxypeptidase M differences were significant with a p value of 0.014 (Figure 22).


Figure 22. Box Plot Analysis of Carboxypeptidase M. Carboxypeptidase M levels were lower in severe BPD patients than in control patients, as shown with the DIA proteomic data analysis method. GraphPad Prism 6 (v6.07) was used to generate plots.

Carboxypeptidase A1 box plot analysis showed higher relative abundance levels in severe BPD patients than in control patients (Ave 4.32E+05 ± 4.25E+05) with a p value of 0.037 (Figure 23). Here, the control data points showed a wider spread, while the spread of the severe patients' data was much narrower.


Figure 23. Box Plot Analysis of Carboxypeptidase A1. Carboxypeptidase A1 levels were higher in severe BPD patients than in control patients, as shown with the DIA proteomic data analysis method. GraphPad Prism 6 (v6.07) was used to generate plots.

Finally, box plot analysis of HLA class I histocompatibility antigen protein showed higher relative abundance levels in severe BPD patients than in controls (Ave 1.73E+05 ± 1.73E+05) with a p value of 0.004 (Figure 24).


Figure 24. Box Plot Analysis of HLA class I histocompatibility antigen protein. HLA class I histocompatibility antigen protein levels were higher in severe BPD patients than in control patients, as shown with the DIA proteomic data analysis method. GraphPad Prism 6 (v6.07) was used to generate plots.

The final analysis performed for biomarker discovery was a scatter plot of the proteins giving the most significant separation in the PCA, P08238, Q9Y3B3 and P11142 (heat shock protein HSP 90-beta, transmembrane emp24 domain-containing protein 7, heat shock cognate 71 kDa protein) (Figure 25). A manually drawn line separating cases from controls was used to calculate a sensitivity of 0.67, a specificity of 0.90 and an accuracy of 0.79. An ROC curve analysis of the same three proteins then indicated an AUROC of 0.853 (Figure 26).
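The reported sensitivity, specificity and accuracy follow from a confusion matrix. With 21 cases and 21 controls, the counts 14 true positives, 7 false negatives, 19 true negatives and 2 false positives reproduce the reported values; these counts are inferred from the rounded figures and are not stated in the text. The sketch below also includes a simple pairwise AUROC estimate, evaluated on placeholder scores where a higher score is assumed to mean "more case-like". The function names are hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def auroc(case_scores, control_scores):
    """Fraction of case/control pairs in which the case scores higher.

    Ties count as half. This equals the area under the ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Counts inferred from the reported values (assumption, see lead-in).
sens, spec, acc = classification_metrics(tp=14, fn=7, tn=19, fp=2)

# Placeholder scores for illustration only.
toy_auroc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

if __name__ == "__main__":
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Because the three proteins were lower in cases than in controls, a real score would need to be oriented (for example, negated) so that higher values point toward the case group before computing the AUROC this way.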


Chapter IV

Discussion

The goal of this study was to investigate whether there are urinary biomarkers for BPD patients. This analysis resulted in large, overwhelming data sets, yet equally fascinating revelations of protein differences for BPD. Using the powerful combination of a robust high-throughput sample processing method, MStern blotting, with the high accuracy and high sensitivity of mass spectrometry and robust quantitation via DIA methods, the possibility of a better performing biomarker for BPD was investigated.

The analysis began with a pilot study of 6 BPD cases and 7 controls, all collected between post-natal days 14 and 28. Using only 135μl of urine collected from soaked cotton balls that had been placed in the diapers of the NICU patients, i.e. specimens collected truly non-invasively, a comprehensive urinary proteome of premature infants with BPD and premature infants without BPD was identified. Despite the small amount of urine used, close to 1200 proteins were identified with an FDR of 1%. The ability to identify and quantify so many proteins from so little urine is remarkable and especially useful when researching the physiological state of a fragile infant at the brink of a potentially lifelong debilitating disease. Furthermore, since urine is considered to be a systemic body fluid, a vast amount of information on the body’s physiological state can be mined to make clinically relevant statements about health.

In the initial pilot study, in order to make more sense of this large proteomic data set, these 1200 proteins were analyzed using various proteomic software packages. Common to each of the 4 proteomic analyses used (ProteinPilot, MaxQuant iBAQ, MaxQuant LFQ, Spectronaut DIA) was the identification of 5 proteins that showed consistent, statistically significant abundance differences: ferritin heavy chain, ferritin light chain, plasminogen, uteroglobin, and cartilage intermediate layer protein 2. A PubMed literature search was performed for each of these proteins to identify prior knowledge about any of them in the context of BPD.

A literature search for ferritin in the context of BPD identified a study performed by Cifuentes et al., which found that very low birth weight infants with BPD had higher plasma ferritin levels than patients without BPD (Cifuentes et al., 1984). Strikingly, the patient cohort mentioned in the study was also considered VLBW (<27 weeks). The same protein identified in plasma by Cifuentes et al. was found in our study, but in urine. An extensive literature search found no previous publication relating ferritin levels in urine to BPD biomarker studies, either by mass spectrometry or by any other method. These findings in urine are internally consistent in that both ferritin chains, i.e. heavy and light chain, are increased in BPD vs. non-BPD controls, strongly supporting the notion that this is not a random finding, irrespective of the fact that any multiple testing correction would dismiss this finding.


In this proteomic study plasminogen levels were higher in severe BPD patients compared with controls. Interestingly, a prospective study showed tracheal aspirate fluid plasminogen activator receptor levels to be significantly higher in BPD patients than in non-BPD patients (Tunc et al., 2014). Of note, plasminogen can be cleaved into up to 5 products (Plasmin heavy chain A, Activation peptide, Angiostatin, Plasmin heavy chain A-short form, Plasmin light chain B) (Didiasova et al., 2014). However, the identified peptides did not allow us to determine the nature of the plasminogen precursor or product present in the urine samples. Instead, the identified plasminogen peptides spanned the entire protein. Irrespective of which plasminogen species was present, it was noteworthy that the dysregulation of plasminogen and its pathways had been described before in the context of BPD (though not in urine).

The protein name uteroglobin did not return any results when PubMed was searched in combination with BPD. However, uteroglobin has several synonyms, including CC10 and CC16, which resulted in PubMed hits when searched in combination with BPD. For instance, a study by Sarafidis et al. described CC16 as a potential blood marker for lung injury in mechanically ventilated neonates (Sarafidis et al., 2008). The study had found uteroglobin (CC16) levels to be higher in mechanically ventilated BPD patients than in mechanically ventilated non-BPD patients. Interestingly, my proteomic study showed that uteroglobin levels are lower in the urine of severe BPD patients compared to controls. My finding is consistent with the fact that CC16 is currently tested as a therapeutic approach to ameliorate the effects of BPD.


Finally, cartilage intermediate layer protein 2 was found to be higher in severe BPD patients than in controls. It was noteworthy that the increased abundance levels in the severe cases showed a very tight distribution, as shown in the box plot (Figure 14). However, no previous study was found relating this protein to BPD. Still, it was interesting to see that 4 out of the 5 proteins that were consistently deregulated in BPD cases in the pilot study had all been previously described in the literature as implicated in BPD.

The next phase, the validation phase, consisted of analyzing a larger cohort comprising 42 samples (21 severe BPD cases, 21 controls) and restricting LC/MS analysis to only the robust DIA method. In contrast to the urine specimens of the pilot study cohort, the urine specimens of this validation cohort had been collected within 72 hours of birth, i.e. before any manifestation of BPD; however, we had the information on which premature infants developed severe BPD later on and which remained BPD-free. I hypothesized that detecting patients at risk for BPD before they develop the disease would indicate the potential for a prognostic marker for very low birth weight infants who are at risk for developing BPD, which is clinically more relevant than being able to diagnose after X days, at which point there are fewer treatment options. Therefore, 42 post-natal day 1-3 samples (21 severe cases, 21 controls) were analyzed.

Close to 1200 proteins were identified and quantified in this validation study. This number was expected since the same spectral library was used as in the pilot study described above. Of these 1200 proteins, 46 showed statistically significant abundance changes with a p value of <0.05 (Table 3). However, the set of significant proteins identified in the validation study differed from that identified in the pilot study. Although the initial hypothesis therefore had to be dismissed, a new hypothesis was pursued: that a significant biomarker with appropriate sensitivity and specificity could be identified. As a first step, the list of 46 proteins was reduced to a more manageable number using different statistical and bioinformatic approaches. An in-house tool based on the random forest algorithm was used to identify classifiers that separate cases from controls.

This strategy identified 7 significant proteins (HLA class I histocompatibility antigen, plastin-2, heat shock protein HSP 90-beta, heat shock cognate 71 kDa protein, transmembrane emp24 domain-containing protein 7, carboxypeptidase M, and carboxypeptidase A1; Table 5). Next, a PCA and biplot analysis of all 46 proteins was performed, which gave good separation of the case and control groups (Figure 16). A subsequent PCA and biplot analysis using only the 7 proteins identified with the random forest classifier also separated cases from controls (Figure 17). In this 7-protein analysis, heat shock protein HSP 90-beta, heat shock cognate 71 kDa protein, and transmembrane emp24 domain-containing protein 7 had the largest effect on the separation.
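To illustrate how a biplot assigns an effect strength to each protein, the following minimal sketch computes the first principal component of a small abundance matrix and reports each protein's loading on it. It is not the Dante R workflow used in this study: it is a pure-Python approximation that uses power iteration on the covariance matrix, and the abundance values and the protein labels A, B, and C are made-up placeholders rather than measured data.

```python
import math


def center(rows):
    """Subtract each column mean so every protein has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(rows):
    """Sample covariance matrix (proteins x proteins) of centered data."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of cov via power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue


# Made-up log abundances: rows are samples, columns are proteins A, B, C.
samples = [
    [10.1, 8.0, 5.2],  # hypothetical control
    [10.4, 8.3, 5.0],  # hypothetical control
    [10.2, 8.1, 5.1],  # hypothetical control
    [8.9, 7.0, 5.3],   # hypothetical case
    [9.1, 7.2, 5.1],   # hypothetical case
    [8.8, 6.9, 5.2],   # hypothetical case
]

cov = covariance(center(samples))
loadings, variance = first_component(cov)
# Share of total variance captured by PC1 (the percentage on a biplot axis).
explained = variance / sum(cov[i][i] for i in range(len(cov)))

for name, loading in zip("ABC", loadings):
    print(f"protein {name}: PC1 loading {loading:+.2f}")
print(f"PC1 explains {explained:.0%} of the variance")
```

In this toy data set, proteins A and B differ between the two groups while C does not, so A and B receive the larger loadings. In a biplot, these are the proteins drawn with the longest arrows along PC1.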

Subsequently, a PubMed literature search was performed to identify any prior knowledge of these seven proteins in the context of BPD.

For transmembrane emp24 domain-containing protein 7, heat shock protein HSP 90-beta, and heat shock cognate 71 kDa protein, all of which showed lower levels in severe BPD cases vs. controls, no publications in the context of BPD could be identified. Nevertheless, it is noteworthy that two out of the three proteins most responsible for the separation into cases and controls were heat shock proteins, which might indicate a misregulation of the misfolded protein response that precedes the development of BPD.

While carboxypeptidase A1 was higher in severe BPD patients than in controls, carboxypeptidase M was lower in severe BPD patients than in controls. No publication relating either protein to BPD was found; a search indicated that both proteins are involved in proteolysis.

Finally, HLA class I histocompatibility antigen and plastin-2 were investigated. First, a PubMed literature search for HLA class I histocompatibility antigen identified three publications relating HLA protein to BPD. The first publication examined autopsy tissue and found that tissue from BPD patients had significantly higher HLA levels than tissue from controls (Jacobson et al., 1993), which is consistent with our observed increased urinary levels of HLA protein. Another study performed a genetic test and found a genetic association between an HLA allele and BPD (Rocha et al., 2011). A third study, also genetic, found an association between HLA and chronic lung disease in neonates (Clark et al., 1982). The proteomics data suggested HLA to be higher in BPD patients than in controls. This was very promising to me because these publications reported associations between BPD patients and controls similar to those seen in my urine proteomics study. Also, finding at least three previous publications on HLA and BPD was quite reassuring.

An association of plastin-2 with BPD was not found in any previous publication. However, similar to HLA class I histocompatibility antigen, it is an immune response-related protein, and it is interesting to note that both of these proteins were upregulated in the preemies that went on to develop BPD compared with those that remained free of BPD, potentially indicating an early immune response in the infants at risk of BPD.

I also took a closer look at a randomly chosen subset of the 46 proteins identified as having statistically significant abundance differences in urine from preemies that went on to develop BPD vs. those that did not. A PubMed search for any relation of these proteins to BPD identified several connections. Interestingly, all of the randomly chosen proteins had been described in the context of BPD, namely isocitrate dehydrogenase [NADP], ceruloplasmin, chitinase-3-like protein 1, and vitamin D-binding protein (Lee et al., 2011; McCarthy, Bhogal, Nardi, & Hart, 1984; Serce Pehlevan et al., 2015; Vohwinkel et al., 2011).

Finally, a 3D scatter plot (Figure 25) and a receiver operating characteristic (ROC) analysis (Figure 26) were generated for the 3 proteins that contributed most to the separation in the PCA of the top 7 proteins: P08238, Q9Y3B3, and P11142 (heat shock protein HSP 90-beta, transmembrane emp24 domain-containing protein 7, and heat shock cognate 71 kDa protein). The potential of these three proteins as biomarkers for BPD was reflected in a sensitivity of 0.67, a specificity of 0.90, and an accuracy of 0.79. The AUROC of 0.853 is likewise a promising result for the clinical implementation of these biomarkers for BPD.
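As an illustrative cross-check of how these metrics relate to one another, the sketch below computes sensitivity, specificity, and accuracy from a confusion matrix, and an AUROC from classifier scores. The confusion-matrix counts (14 of 21 cases and 19 of 21 controls classified correctly) are an assumption, chosen only because they reproduce the reported 0.67, 0.90, and 0.79 for a cohort of 21 cases and 21 controls; they are not taken from the SPSS output. The scores passed to `auroc` are made-up numbers, and the function names are hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true cases that are detected
    specificity = tn / (tn + fp)  # fraction of true controls that are ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def auroc(case_scores, control_scores):
    """AUROC as the probability that a random case outscores a random control.

    Ties count as one half (Mann-Whitney formulation). Higher scores are
    assumed to indicate a case.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Assumed counts for 21 cases and 21 controls (not taken from the study output).
sens, spec, acc = classification_metrics(tp=14, fn=7, tn=19, fp=2)
print(f"{sens:.2f} {spec:.2f} {acc:.2f}")  # prints: 0.67 0.90 0.79

# Made-up classifier scores, for illustration only.
example_auc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(f"{example_auc:.3f}")  # prints: 0.889
```

Sensitivity and specificity describe one chosen decision threshold, whereas the AUROC summarizes performance across all thresholds, which is why the two are reported separately.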

Overall, this study sought to identify potential biomarkers for BPD in a pilot study with samples taken at postnatal day 14-28 and then proceeded to look for those same proteins in a validation study in a larger cohort with samples taken at postnatal day 1-3. Although the proteins identified in the pilot study and the validation study were not the same, numerous proteins that showed differential abundances in cases vs. controls had been described in the context of BPD, albeit never as urinary proteins in BPD patients. Even though making a definitive statement about any identified protein is well beyond the scope of this study, a solid method using this mass spectrometry-based workflow has been further established for biomarker discovery for BPD. Also, this study revealed numerous proteins that could lead towards a better understanding of BPD pathophysiology and more clinically relevant information for BPD patients.

One speculation is that the different proteins identified at postnatal day 1-3 vs. postnatal day 14-28 could very well reflect disease progression. Since the time between the first and last specimen collection amounted to up to 4 weeks, the progression from no symptoms to severe BPD could vastly change the urinary proteome within that timeframe. One argument could be that if these infants were already predisposed to BPD, then the same proteins would be identified at both time points. Such proteins would very likely be a reflection of disease, and if they could be identified very early on, then proper monitoring and intervention could prevent further progression of BPD in these patients. On the other hand, the proteins identified at postnatal day 1-3 were more reflective of an immune response. These specific proteins could be reflective of the early stages of disease, whereas proteins identified in the specimens collected 14-28 days postnatally could be reflective of late-stage disease.

Overall, there is a great need for a reliable prognostic biomarker that can accurately identify infants at risk for developing BPD. Early predictors of infants at greater risk of developing BPD could prompt early intervention to reduce disease severity and complications. This study reiterated the power of mass spectrometry-based proteomics when tailored to urine biomarker research. The ability to identify over 1000 proteins in microliter amounts of urine could only have been achieved using this technology, and it opens the door to more systematic urinary biomarker discovery studies in preemies and extremely low birth weight infants, where even available urine volumes are extremely limited. Also, the unbiased way in which these samples were acquired gave more confidence in the proteins that were identified and in how specific these proteins were in the context of BPD.

In conclusion, using a non-invasive, abundant, protein-rich body fluid such as urine in combination with high throughput proteomic sample processing methods like MStern Blot and robust LC/MS protein quantitation methods like DIA can make biomarker discovery for BPD more effective. Having a more definitive biological marker of disease, or of the risk of developing disease, can better equip clinicians with appropriate information for early prognostication and ultimately provide targeted therapies for the treatment of BPD.

Chapter V

Appendix

[Figure 9 panels: P02794 Ferritin heavy chain; P02792 Ferritin light chain; P11684 Uteroglobin; P00747 Plasminogen; Q8IUL8 Cartilage intermediate layer protein 2]

Figure 9. Proteomics reveals 5 significant proteins in 4 different quantitation methods. MaxQuant LFQ and MaxQuant iBAQ are MS1-level quantitation methods; spectral count and DIA are MS2-level quantitation methods. Protein names with UniProt accession numbers are listed.

Figure 10. Volcano plot of pilot study. Volcano plot of all proteins identified using DIA MS2 quantitation (blue). Top significant proteins consistent amongst different proteomic software (red).

Protein Accession  Protein Name                                          P value
P05534             HLA class I histocompatibility antigen                0.004
O75874             Isocitrate dehydrogenase [NADP] cytoplasmic           0.009
P00450             Ceruloplasmin                                         0.010
P07911             Uromodulin                                            0.012
Q13162             Peroxiredoxin-4                                       0.0124
P36222             Chitinase-3-like protein 1                            0.014
P14384             Carboxypeptidase M                                    0.015
P08238             Heat shock protein HSP 90-beta                        0.015
P11021             78 kDa glucose-regulated protein                      0.016
P22392             Nucleoside diphosphate kinase B                       0.016
P68371             Tubulin beta-4B chain; Tubulin beta-4A chain          0.016
P11142             Heat shock cognate 71 kDa protein                     0.017
P02774             Vitamin D-binding protein                             0.021
P02748             Complement component C9                               0.021
P52566             Rho GDP-dissociation inhibitor 2                      0.023
P04406             Glyceraldehyde-3-phosphate dehydrogenase              0.023
P07737             Profilin-1                                            0.024
Q9H6Y7             E3 ubiquitin-protein ligase RNF167                    0.0245
P13796             Plastin-2                                             0.025
Q9NUQ9             Protein FAM49B                                        0.026
P24855             Deoxyribonuclease-1                                   0.027
P60660             Myosin light polypeptide 6                            0.028
P06681             Complement C2                                         0.0326
Q9Y3B3             Transmembrane emp24 domain-containing protein 7       0.034
P60709             Actin                                                 0.035
P01042             Kininogen-1promoting factor                           0.036
P15085             Carboxypeptidase A1                                   0.037
P25774             Cathepsin S                                           0.037
Q9BYX7             Putative beta-actin-                                  0.037
Q08397             Lysyl oxidase homolog 1                               0.038
Q63HM1             Kynurenine formamidase                                0.038
cRAP_sp            crap                                                  0.038
Q15375             Ephrin type-A receptor 7                              0.038
P05937             Calbindin                                             0.038
P02533             Keratin, type I cytoskeletal 14                       0.039
P05164             Myeloperoxidase                                       0.039
P28072             Proteasome subunit beta type-6                        0.045
P01024             Complement C3                                         0.045
P07900             Heat shock protein HSP 90-alpha                       0.047
P07437             Tubulin beta chain                                    0.047
Q9NS68             Tumor necrosis factor receptor superfamily member 19  0.048
O60888             Protein CutA                                          0.048
Q9BUD6             Spondin-2                                             0.048
P06733             Alpha-enolase                                         0.050

Table 3. List of top 46 significant proteins from first time point DIA analysis with p value of <0.05.

Figure 15. Volcano plot of validation study. Volcano plot of all validation study proteins identified using DIA MS2 quantitation (blue). Top significant proteins identified with random forest classifier (red).

Biological Process                            P Value
lymphocyte mediated immunity                  0.008
leukocyte mediated immunity                   0.009
innate immune response                        0.014
positive regulation of cell death             0.017
positive regulation of apoptosis              0.017
positive regulation of programmed cell death  0.017
immune response                               0.026
cell killing                                  0.031
defense response                              0.034
oxidation reduction                           0.035
regulation of apoptosis                       0.040
regulation of programmed cell death           0.040
regulation of cell death                      0.040
immune effector process                       0.040
complement activation, alternative pathway    0.062
I-kappaB kinase/NF-kappaB cascade             0.082
natural killer cell mediated cytotoxicity     0.082
natural killer cell mediated immunity         0.082
induction of programmed cell death            0.090
induction of apoptosis                        0.090

Table 4. DAVID Gene Ontology (GO) (Huang da et al., 2009)

Figure 16. PCA analysis (Dante R) of 46 proteins. PCA analysis (Dante R) of top significant proteins with controls indicated in red and cases indicated in blue. Arrows show the effect strength for the protein indicated.

Accession Number  Protein Name
P13796            Plastin-2
Q9Y3B3            Transmembrane emp24 domain-containing protein 7
P11142            Heat shock cognate 71 kDa protein
P08238            Heat shock protein HSP 90-beta
P14384            Carboxypeptidase M
P15085            Carboxypeptidase A1
P05534            HLA class I histocompatibility antigen

Table 5. Most significant proteins resulting from the random forest classification tool.

[PCA biplot image: x-axis PC1 (37.9%), y-axis PC2 (17.9%), 55.8% of variance in total; loading arrows labeled P13796, P15085, P08238, Q9Y3B3, P11142, P05534, and P14384; points are individual case and control samples.]

Figure 17. PCA analysis (Dante R) of 7 proteins. PCA (Dante R) using the 7 significant proteins with controls indicated in red and cases indicated in blue. Arrows signify which protein accession numbers provide effect strength.

Figure 25. 3D scatter plot. 3D scatter plot of P08238, Q9Y3B3, and P11142 (heat shock protein HSP 90-beta, transmembrane emp24 domain-containing protein 7, heat shock cognate 71 kDa protein) using SPSS.

Figure 26. ROC curve. ROC curve of P08238, Q9Y3B3, and P11142 (heat shock protein HSP 90-beta, transmembrane emp24 domain-containing protein 7, heat shock cognate 71 kDa protein) using SPSS.

Endnotes

Beretov, J., Wasinger, V. C., Schwartz, P., Graham, P. H., & Li, Y. (2014). A standardized and reproducible urine preparation protocol for cancer biomarkers discovery. Biomark Cancer, 6, 21-27. doi: 10.4137/BIC.S17991

Budhraja, R., Sreevidya, M., Rathi, S., Jalali, S., Chhablani, P., Kekunaya, R., . . . Rao, C. (2014). Identification of biomarkers for risk prediction and disease progression in Retinopathy of Prematurity (ROP): a potentially blinding disorder. Paper presented at the Brainstorming Meeting and Workshop: Proteomics Present and Future, India.

Chial, H. (2008). DNA Sequencing Technologies Key to the Human Genome Project. Nature Education, 1(1).

Cifuentes, R. F., Miller, P. A., & Deinard, A. S. (1984). Estimated iron balance and plasma ferritin levels in VLBW infants. Pediatr Res, 18(S4), 238A-238A.

Clark, D. A., Pincus, L. G., Oliphant, M., Hubbell, C., Oates, R. P., & Davey, F. R. (1982). HLA-A2 and chronic lung disease in neonates. JAMA, 248(15), 1868-1869.

Cox, J., Matic, I., Hilger, M., Nagaraj, N., Selbach, M., Olsen, J. V., & Mann, M. (2009). A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Nat Protoc, 4(5), 698-705. doi: 10.1038/nprot.2009.36

Didiasova, M., Wujak, L., Wygrecka, M., & Zakrzewicz, D. (2014). From plasminogen to plasmin: role of plasminogen receptors in human cancer. Int J Mol Sci, 15(11), 21229-21252. doi: 10.3390/ijms151121229

Domon, B., & Aebersold, R. (2006). Mass spectrometry and protein analysis. Science, 312(5771), 212-217. doi: 10.1126/science.1124619

Ehrenkranz, R. A., Walsh, M. C., Vohr, B. R., Jobe, A. H., Wright, L. L., Fanaroff, A. A., . . . Human Development Neonatal Research Network. (2005). Validation of the National Institutes of Health consensus definition of bronchopulmonary dysplasia. Pediatrics, 116(6), 1353-1360. doi: 10.1542/peds.2005-0249

Gillet, L. C., Navarro, P., Tate, S., Rost, H., Selevsek, N., Reiter, L., . . . Aebersold, R. (2012). Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell Proteomics, 11(6), O111.016717. doi: 10.1074/mcp.O111.016717

Huang da, W., Sherman, B. T., & Lempicki, R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc, 4(1), 44-57. doi: 10.1038/nprot.2008.211

Jacobson, J. D., Truog, W. E., & Benjamin, D. R. (1993). Increased expression of human leukocyte antigen-DR on pulmonary macrophages in bronchopulmonary dysplasia. Pediatr Res, 34(3), 341-344.

Jedinak, A., Curatolo, A., Zurakowski, D., Dillon, S., Bhasin, M. K., Libermann, T. A., . . . Moses, M. A. (2015). Novel non-invasive biomarkers that distinguish between benign prostate hyperplasia and prostate cancer. BMC Cancer, 15(1), 259. doi: 10.1186/s12885-015-1284-z

Kennedy, M. J., Griffin, A., Su, R., Merchant, M., & Klein, J. (2009). Urine collected from diapers can be used for 2-D PAGE in infants and young children. Proteomics Clin Appl, 3(8), 989-999. doi: 10.1002/prca.200900045

Kentsis, A., Ahmed, S., Kurek, K., Brennan, E., Bradwin, G., Steen, H., & Bachur, R. (2012). Detection and diagnostic value of urine leucine-rich alpha-2-glycoprotein in children with suspected acute appendicitis. Ann Emerg Med, 60(1), 78-83 e71. doi: 10.1016/j.annemergmed.2011.12.015

Kentsis, A., Lin, Y. Y., Kurek, K., Calicchio, M., Wang, Y. Y., Monigatti, F., . . . Bachur, R. (2010). Discovery and validation of urine markers of acute pediatric appendicitis using high-accuracy mass spectrometry. Ann Emerg Med, 55(1), 62-70 e64. doi: 10.1016/j.annemergmed.2009.04.020

Kentsis, A., Shulman, A., Ahmed, S., Brennan, E., Monuteaux, M. C., Lee, Y. H., . . . Kim, S. (2013). Urine proteomics for discovery of improved diagnostic markers of Kawasaki disease. EMBO Mol Med, 5(2), 210-220. doi: 10.1002/emmm.201201494

Kim, M. S., Pinto, S. M., Getnet, D., Nirujogi, R. S., Manda, S. S., Chaerkady, R., . . . Pandey, A. (2014). A draft map of the human proteome. Nature, 509(7502), 575-581. doi: 10.1038/nature13302

Lal, C. V., & Ambalavanan, N. (2015). Biomarkers, Early Diagnosis, and Clinical Predictors of Bronchopulmonary Dysplasia. Clin Perinatol, 42(4), 739-754. doi: 10.1016/j.clp.2015.08.004

Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., . . . International Human Genome Sequencing Consortium. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921. doi: 10.1038/35057062

Lee, C. G., Da Silva, C. A., Dela Cruz, C. S., Ahangari, F., Ma, B., Kang, M. J., . . . Elias, J. A. (2011). Role of chitin and chitinase/chitinase-like proteins in inflammation, tissue remodeling, and injury. Annu Rev Physiol, 73, 479-501. doi: 10.1146/annurev-physiol-012110-142250

Mann, K., & Edsinger, E. (2014). The Lottia gigantea shell matrix proteome: re-analysis including MaxQuant iBAQ quantitation and phosphoproteome analysis. Proteome Sci, 12, 28. doi: 10.1186/1477-5956-12-28

McCarthy, K., Bhogal, M., Nardi, M., & Hart, D. (1984). Pathogenic factors in bronchopulmonary dysplasia. Pediatr Res, 18(5), 483-488.

Muntel, J., Xuan, Y., Berger, S. T., Reiter, L., Bachur, R., Kentsis, A., & Steen, H. (2015). Advancing Urinary Protein Biomarker Discovery by Data-Independent Acquisition on a Quadrupole-Orbitrap Mass Spectrometer. J Proteome Res, 14(11), 4752-4762. doi: 10.1021/acs.jproteome.5b00826

Pedroza-Diaz, J., & Rothlisberger, S. (2015). Advances in urinary protein biomarkers for urogenital and non-urogenital pathologies. Biochem Med (Zagreb), 25(1), 22-35. doi: 10.11613/BM.2015.003

Piersigilli, F., & Bhandari, V. (2015). Biomarkers in neonatology: the new "omics" of bronchopulmonary dysplasia. J Matern Fetal Neonatal Med, 1-7. doi: 10.3109/14767058.2015.1061495

Pisitkun, T., Johnstone, R., & Knepper, M. A. (2006). Discovery of urinary biomarkers. Mol Cell Proteomics, 5(10), 1760-1771. doi: 10.1074/mcp.R600004-MCP200

Rocha, G., Proenca, E., Areias, A., Freitas, F., Lima, B., Rodrigues, T., . . . Guimaraes, H. (2011). HLA and bronchopulmonary dysplasia susceptibility: a pilot study. Dis Markers, 31(4), 199-203. doi: 10.3233/DMA-2011-0811

Sarafidis, K., Stathopoulou, T., Diamanti, E., Soubasi, V., Agakidis, C., Balaska, A., & Drossou, V. (2008). Clara cell secretory protein (CC16) as a peripheral blood biomarker of lung injury in ventilated preterm neonates. Eur J Pediatr, 167(11), 1297-1303. doi: 10.1007/s00431-008-0712-3

Berger, S. T., Ahmed, S., Muntel, J., Cuevas Polo, N., Kentsis, A., Bachur, R., & Steen, H. (2015). MStern blotting - high throughput PVDF membrane-based proteomic sample preparation for 96-well plates.

Serce Pehlevan, O., Karatekin, G., Koksal, V., Benzer, D., Gursoy, T., Yavuz, T., & Ovali, F. (2015). Association of vitamin D binding protein polymorphisms with bronchopulmonary dysplasia: a case-control study of gc globulin and bronchopulmonary dysplasia. J Perinatol, 35(9), 763-767. doi: 10.1038/jp.2015.58

Sensitivity and specificity. (n.d.). In Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Sensitivity_and_specificity

Steen, H., & Mann, M. (2004). The ABC's (and XYZ's) of peptide sequencing. Nat Rev Mol Cell Biol, 5(9), 699-711. doi: 10.1038/nrm1468

Strimbu, K., & Tavel, J. A. (2010). What are biomarkers? Curr Opin HIV AIDS, 5(6), 463-466. doi: 10.1097/COH.0b013e32833ed177

Tunc, T., Cekmez, F., Yildirim, S., Bulut, O., Ince, Z., Saldir, M., . . . Coban, A. (2014). Predictive value of soluble urokinase plasminogen activator receptor, soluble ST2, and IL-33 in bronchopulmonary dysplasia. Pediatr Res, 75(6), 788-792. doi: 10.1038/pr.2014.28

Veenstra, T. D., Conrads, T. P., Hood, B. L., Avellino, A. M., Ellenbogen, R. G., & Morrison, R. S. (2005). Biomarkers: mining the biofluid proteome. Mol Cell Proteomics, 4(4), 409-418. doi: 10.1074/mcp.M500006-MCP200

Venable, J. D., Dong, M. Q., Wohlschlegel, J., Dillin, A., & Yates, J. R. (2004). Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat Methods, 1(1), 39-45. doi: 10.1038/nmeth705

Vitzthum, F., Behrens, F., Anderson, N. L., & Shaw, J. H. (2005). Proteomics: from basic research to diagnostic application. A review of requirements & needs. J Proteome Res, 4(4), 1086-1097. doi: 10.1021/pr050080b

Vohwinkel, C. U., Lecuona, E., Sun, H., Sommer, N., Vadasz, I., Chandel, N. S., & Sznajder, J. I. (2011). Elevated CO(2) levels cause mitochondrial dysfunction and impair cell proliferation. J Biol Chem, 286(43), 37067-37076. doi: 10.1074/jbc.M111.290056

Wilhelm, M., Schlegl, J., Hahne, H., Moghaddas Gholami, A., Lieberenz, M., Savitski, M. M., . . . Kuster, B. (2014). Mass-spectrometry-based draft of the human proteome. Nature, 509(7502), 582-587. doi: 10.1038/nature13319

Wisniewski, J. R., Zougman, A., Nagaraj, N., & Mann, M. (2009). Universal sample preparation method for proteome analysis. Nat Methods, 6(5), 359-362. doi: 10.1038/nmeth.1322

Yamamoto, T., Langham, R. G., Ronco, P., Knepper, M. A., & Thongboonkerd, V. (2008). Towards standard protocols and guidelines for urine proteomics: a report on the Human Kidney and Urine Proteome Project (HKUPP) symposium and workshop, 6 October 2007, Seoul, Korea and 1 November 2007, San Francisco, CA, USA. Proteomics, 8(11), 2156-2159. doi: 10.1002/pmic.200800138

Zhang, Z. Q., Huang, X. M., & Lu, H. (2014). Early biomarkers as predictors for bronchopulmonary dysplasia in preterm infants: a systematic review. Eur J Pediatr, 173(1), 15-23. doi: 10.1007/s00431-013-2148-7

Zysman-Colman, Z., Tremblay, G. M., Bandeali, S., & Landry, J. S. (2013). Bronchopulmonary dysplasia - trends over three decades. Paediatr Child Health, 18(2), 86-90.
