Proteomic Analyses of Fetal Atria and Ventricles

by

Zhen Qi Lu

A thesis submitted in conformity with the requirements for the degree of Master of Science Department of Physiology University of Toronto

© Copyright by Zhen Qi Lu 2014

Proteomic Analyses of Human Fetal Atria and Ventricles

Zhen Qi Lu

Master of Science

Department of Physiology University of Toronto

2014

Abstract

In this study we carried out a mass spectrometry-based proteome analysis of human fetal atria and ventricles. Heart lysates were analyzed on the Q-Exactive mass spectrometer in biological triplicates. Protein identification using MaxQuant yielded a total of 2,754 atrial (91%) and 2,825 ventricular protein groups (83%) in at least 2 of the 3 runs with ≥2 unique peptides.

Statistical analyses using fold-enrichment (>2) and p-values (≤0.05) selected high-confidence chamber-enriched atrial (134) and ventricular (81) protein groups. Several previously characterised chamber-enriched proteins were identified in this study. For the atria these included the atrial isoform of myosin light chain 2 (MYL7), atrial natriuretic peptide (NPPA), and connexin 40 (GJA5). For the ventricles they included the ventricular isoforms of myosin light chains (MYL2 and MYL3), myosin heavy chain 7 (MYH7), and connexin 43 (GJA1). Our data were compared with in-house generated and publicly available human microarray data, several human cardiac proteomes, and phenotype ontology databases.


Acknowledgments

I am very fortunate to have had the opportunity to study and conduct research in Dr. Gramolini’s lab.

Dr. Gramolini’s encouragement and guidance, not only in cardiovascular research but also in life lessons, have truly inspired me to work hard. You have always believed in me and given me more responsibilities throughout my degree, for which I will always be grateful.

I am also thankful to my committee members, Drs. Peter Backx and Thomas Kislinger, for taking time from your busy schedules to attend my committee meetings. Dr. Backx, you have always provided insightful ideas throughout this project with your vast knowledge of cardiovascular physiology. Dr. Kislinger, you offered great help with your expertise in bioinformatics and proteomics and allowed me to run my samples on your mass spectrometer.

There are also many other colleagues in Dr. Gramolini’s lab to whom I would like to express my appreciation. Dr. Parveen Sharma, you will always be one of my closest mentors, and I am sure we will always stay in touch. Mr. Jake Cosme, thank you for patiently teaching me all the initial laboratory skills and mass spectrometry techniques. Dr. Allen Teng, I really appreciate your scientific insights, the life talks, and the endless laughter we shared. Ms. Wenping Li, it was always great meeting you on the subway, and we shared many great conversations. I also want to thank Ms. Kathryn Lipsett, Dr. Dingyan Wang, Ms. Cynthia Abbasi, Dr. Tetsuaki Miyake, and Mr. Aaron Wilson for making the journey so enjoyable.

Lastly, I want to take this opportunity to thank my family, including my parents, grandparents, and brother, for their constant support all the way. I will always be appreciative of everything I have and happy to help others.

Table of Contents

Table of Contents

List of Tables

List of Figures

List of Appendices

Commonly used abbreviations

Chapter 1 : Introduction

1 Heart Development

1.1 In utero

1.1.1 Heart Tube Formation

1.1.2 Heart Tube Looping

1.1.3 Heart Tube Septation

1.2 Ex utero

1.3 Cardiac Cell Types

2 Atrial and Ventricular Differences

2.1 Morphological Differences

2.2 Functional Differences

2.3 Electrophysiological Differences

3 Cardiovascular Disease

3.1 Atrial Disease

3.2 Ventricular Disease

4 Systems Biology

4.1 Transcriptomics via RNA Microarray

4.2 Mass Spectrometry Based Proteomics

4.3 Liquid Chromatography – Tandem Mass Spectrometry

4.4 Mass Spectrometry Protein Identification

5 Project Overview

5.1 Rationale

5.2 Aim

Chapter 2 : Material and Methods

1 Sample Preparation for Mass Spectrometry

2 Mass Spectrometry Pipeline

3 Data Search

4 Database Comparisons

4.1 Microarray Data

4.2 Mass Spectrometry Data

Chapter 3 : Proteomic Analyses of Human Fetal Atria and Ventricles

Chapter 4 : Discussion

1 Dataset Integration and Congenital Heart Defects

2 Mass Spectrometry Comparisons

3 Transcriptomics

Chapter 5 : Limitations

Chapter 6 : Future Directions

Chapter 7 : References

Appendices


List of Tables

Table 1. Atrial enriched products with significant protein and transcript enrichment

Table 2. Ventricular enriched gene products with significant protein and transcript enrichment

Table 3. Atrial enriched gene products with previous human and mouse phenotypes

Table 4. Ventricular enriched gene products with previous human and mouse phenotypes


List of Figures

Figure 1.1. Overview of human heart development in utero

Figure 1.2. Overview of atrial and ventricular known proteins, biological processes, and action potentials

Figure 1.3. Overview of systems biology and “omics” approach

Figure 1.4. Shotgun proteomics and mass spectrometry workflow

Figure 3.1. Overall experimental design and analysis flowchart

Figure 3.2. Human cardiac chamber proteomics

Figure 3.3. Protein expression across human atrial and ventricular samples

Figure 3.4. Correlation of proteomic data with microarray and cardiac phenotype ontology

Figure 3.5. Ventricular enriched protein groups expression in human heart sections via immunohistochemistry

Figure 3.6. Abstract graphic

Supplemental Figure 1. Pearson correlation of human fetal four heart chamber transcriptomic data

Supplemental Figure 2. LTQ ion trap mass spectrometry data of human fetal biological triplicates

Supplemental Figure 3. Comparison of the proteins identified via LTQ ion trap and Q-Exactive mass spectrometer

Supplemental Figure 4. Protein interaction of atrial enriched proteins with congenital heart defects

Supplemental Figure 5. Atrial and ventricular enriched compared to human whole heart proteome


List of Appendices

Appendix A

Appendix B

Appendix C

Appendix D


Commonly used abbreviations

A Atria

AF Atrial fibrillation

APD90 Action potential duration at 90% repolarization

ARVC Arrhythmogenic right ventricular dysplasia/cardiomyopathy

CHD Congenital heart defects

DCM Dilated cardiomyopathy

GO Gene ontology

HCM Hypertrophic cardiomyopathy

HPLC High-performance liquid chromatography

LC-MS/MS Liquid chromatography – tandem mass spectrometry

LFQ Label free quantification

MudPIT Multidimensional protein identification technology

V Ventricles


Chapter 1 : Introduction

1 Heart Development

The heart is one of the most vital organs in the body and the first to develop during embryogenesis [1]. The human heart develops in a well-defined spatial and temporal fashion. This allows it to efficiently provide the nutrients and oxygen that the rest of the body requires to function after birth. The heart is a highly adaptive organ that can undergo profound changes and remodeling, both during normal development and as a compensatory mechanism during cardiac stress. This section examines the major developmental events that occur as the heart forms and matures into its final morphology and function, as illustrated in Figure 1.1 [2].

Profound structural changes also occur immediately after birth. It is important to understand normal heart development as the basis for understanding disease pathways and mechanisms.

Figure 1.1. Overview of major events in human heart development in utero, including heart tube formation, looping, and septation into different chambers. The range of days (postconception) over which each developmental event takes place is indicated above. The main chambers are labelled as ventricles (V), atria (A), right ventricle (RV), left ventricle (LV), left atrium (LA), and right atrium (RA). Figure adapted from Srivastava et al [3].


1.1 In utero

During the first eight weeks of gestation, the human embryo undergoes four main stages of development: fertilization, implantation, gastrulation, and embryogenesis. The heart starts to form at the beginning of the third week of gestation (at approximately day 15). At this point, epiblast cells migrate through the primitive streak to form the three germ layers: the ectoderm, mesoderm, and endoderm [4]. The mesoderm in particular elongates laterally and cranially, forming various structures and organs. Examples include the somites, which arise from the paraxial mesoderm and give rise to axial structures, and cardiac progenitor cells, which arise from the cranial mesoderm and give rise to the different components of the heart [2].

1.1.1 Heart Tube Formation

Cardiac progenitor cells continue to migrate, forming a crescent-shaped tubular structure inferior to the pericardial cavity [2]. As the embryonic disc continues to fold, the heart also folds into a tube, which comes to lie in the new region between the pericardial space and the foregut. The folding is also driven by growth of the neural tube and invagination of the endoderm. At the same time, the endothelial plexus forms as the circulatory system develops. The growing heart tube receives contributions from the secondary heart field. This field is a source of cardiac progenitor cells that form the myocardium, smooth muscle cells, and endothelial cells, and it also gives rise to the right ventricle [5].

On the other hand, the first heart field gives rise to the ventricular conduction cells and to the left ventricular and atrial cardiomyocytes [6]. At this stage the heart is centrally positioned as an inverted “Y”. The arms thicken and develop into the precursors of the atrial chambers, while the stem becomes the left ventricle [2]. Note, however, that not all of the definitive chambers are present when the heart tube forms. Asymmetrical events also take place, especially among the blood vessels attached to the chambers.

1.1.2 Heart Tube Looping

As the left ventricle dissociates from the mediastinum on day 21, the tube bends to the right in a process referred to as looping. Left-right asymmetry is established via leftward flow of secreted factors [2]. The outflow tract forms from the outlet part of the heart loop. Distinct chambers first appear by day 25, with the left ventricle as a spherical bulge and the right ventricle anterior to it [2]. As looping continues, the left and right ventricles come to lie side by side. The atria lie directly anterior to the ventricles and are separated from them by the atrioventricular canal.

1.1.3 Heart Tube Septation

From approximately day 24, the left and right atria become separated by two septa. The septum primum spans the ventral and posterior walls of the atria, while the septum secundum spans the dorsal and posterior walls [7]. By day 50, the septum primum has fused with the atrioventricular cushions, and the septum secundum partially overlaps the septum primum. In utero, the foramen ovale remains open. This allows oxygenated, nutrient-rich blood from the maternal circulation to pass from the right atrium to the left atrium.

At approximately the same time as the atria develop, the ventricles also begin to separate, and both chambers expand in size. This process continues until about day 46, when a thick ridge forms that separates the left and right ventricles [7]. Only later in development does the septum close completely, connecting with the endocardial cushions.

1.2 Ex utero

After birth, the lungs inflate with the first breaths and the infant is cut off from the maternal circulation. As the alveoli fill with air, pulmonary blood flow increases significantly [8]. Pressure decreases on the right side of the heart but increases on the left side. The resulting pressure difference between the two atria presses the septum primum and septum secundum tightly against each other until they fuse.

In human infants, pulmonary arterial pressure decreases to adult levels by two weeks after birth, with most of the change occurring in the first 2 to 3 days ex utero [8].

1.3 Cardiac Cell Types

The heart is composed primarily of four different cell types: fibroblasts, cardiomyocytes, smooth muscle cells, and endothelial cells [9]. Previous studies have shown that most cardiomyocytes and endocardial cells are derived from the cardiac mesoderm. Proepicardial progenitors, by contrast, give rise to the interstitial fibroblasts, smooth muscle cells, and endothelial cells, as well as to myocytes within the atrioventricular septum [10]. Identifying the proteins specific to each cardiac cell type in the different chambers is important for future experiments, as chamber-specific protein differences are not well studied in . In addition, functional differences should be studied in the heart as a whole, because the different cardiac cell types communicate with one another.


2 Atrial and Ventricular Differences

The atria and ventricles play distinct roles in the structure and function of the heart. The heart propagates electrical signals, generates contraction, and ensures proper blood flow to the lungs and the rest of the body. The morphological and electrophysiological properties of the different chambers need to be examined in detail to better understand how the heart functions as a synchronous unit (Figure 1.2).

Figure 1.2. An overview of the known atrial- and ventricular-enriched proteins, biological processes, and action potentials of stem cell-derived cardiomyocytes. The known atrial proteins, in the order shown above, are:
- connexin 40
- the atrial isoform of myosin light chain 2 (MLC-2a)
- voltage-dependent potassium channel 1.5 (enriched in but not exclusive to the atria)
- sarcolipin (SLN)
- atrial natriuretic peptide (ANF)

The ventricle-enriched proteins are:
- connexin 43
- the ventricular isoform of myosin light chain 2 (MLC-2v)
- phospholamban (PLN)
- hes-related family bHLH transcription factor with YRPW motif 2 (HEY2; its knockout in the ventricles increased the expression of atrial proteins)
- iroquois homeobox 4 (IRX4)
- four and a half LIM domains protein 2 (FHL2)

In addition, Golgi and ER proteins are enriched in the atria, and mitochondrial and cytoskeletal proteins are enriched in the ventricles [11]. The action potentials of atrial and ventricular cardiomyocytes derived from stem cells illustrate the traditional way of distinguishing cells from different chambers via electrophysiological recordings [12].


Golgi and ER proteins have previously been shown to be enriched in the atria, and these organelles are important in the secretory pathway [11]. Proteins synthesized within the ER are shipped to the Golgi for modification. The modified proteins then have multiple fates. Some are shipped to the lysosome for degradation, membrane proteins are integrated into the membrane, and soluble proteins are stored in secretory vesicles.

Correlation analysis of the transcript data generated in this study across the four chambers showed that the left and right sides of the same chamber type correlate more highly with each other than with the other chamber type (Appendix A). Overall, the chambers are relatively similar in which transcripts are expressed and at what levels. The differences between them, however, are not yet completely understood and merit further investigation in humans.
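Appendix A reports these comparisons as Pearson correlations. The Python below is a minimal sketch of that calculation: it computes pairwise Pearson coefficients between chamber expression profiles. The gene values are invented placeholders, not data from this study, and the helper name `pearson` is hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Placeholder log2 expression values for the same five genes in each chamber
# (illustrative numbers only, not measurements).
profiles = {
    "LA": [10.1, 8.2, 6.5, 12.0, 7.7],
    "RA": [10.0, 8.4, 6.3, 11.8, 7.9],
    "LV": [7.2, 9.9, 8.8, 12.1, 6.0],
    "RV": [7.5, 9.6, 8.9, 11.9, 6.2],
}

# Print every pairwise chamber correlation.
for a, b in combinations(profiles, 2):
    print(f"{a}-{b}: r = {pearson(profiles[a], profiles[b]):.3f}")
```

With these placeholder profiles, the same-type pairs (LA-RA and LV-RV) give higher coefficients than the atrial-ventricular pairs, which is the pattern described above. In Python 3.10 and later, `statistics.correlation` computes the same coefficient.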

2.1 Morphological Differences

The morphology and position of the heart chambers are extremely important for a functional heart pump. Two levels of morphology are critical to consider when discussing chamber differences: cellular and tissue. At the cellular level, atrial cardiomyocytes are more elongated and form crisscrossing patterns in bundles of various sizes. Ventricular cardiomyocytes, in contrast, are larger in area and organized in a uniform direction [13].

At the tissue level, the atria are initially smooth-walled. Atrial trabeculae develop only at approximately 16 years of age in humans and eventually form the auricles [14]. These auricles in turn determine the size and shape of the left and right atria. The trabeculae are important for cardiac contractility and for directing blood flow [15]. The ventricles, on the other hand, develop myocardial trabeculae already in utero, during ventricular ballooning, which increases both their size and their contractility. The early ventricular trabeculae are also thought to help match the hemodynamic demand of the embryo.

Each atrium is composed of a venous component, a vestibule, and an appendage [16]. The right atrium is positioned anterior to the left atrium and is dominated by its appendage. The body of the left atrium, by contrast, is much more prominent than that of the right atrium. After birth, the right ventricle becomes a thin-walled, heavily trabeculated, crescent-shaped chamber [17]. The left ventricle, on the other hand, is ellipsoid in shape with thick, smooth walls and fine trabeculations.

On the left side, the atrium and ventricle are separated by the mitral valve. On the right side, they are separated by the tricuspid valve.

2.2 Functional Differences

The atria pump blood to the ventricles, with the right side carrying deoxygenated blood and the left side carrying oxygenated blood. The right atrium receives deoxygenated blood from the rest of the body via the superior and inferior venae cavae, and the right ventricle pumps this blood to the lungs via the pulmonary artery. On the other side of the heart, the left atrium receives oxygenated blood via the pulmonary veins. This blood then enters the left ventricle, which supplies the rest of the body via the aorta. In addition, the right ventricle uses only one fifth of the energy of the left ventricle despite an equal cardiac output [17].


Cardiac and skeletal muscle contraction occurs via cross-bridge formation between the thick and thin filaments. Previous studies have examined the expression and biochemical properties of the myosin heavy and light chains of the thick filaments in the two heart chambers. These studies provide additional insight into the functional differences between the chambers. In the human heart, there are 12 different types of myosin heavy and light chains. Of these, myosin heavy chain 6 (MYH6) and the atrial isoform of myosin light chain 2 (MYL7) are expressed at higher levels in the atria. Myosin heavy chain 7 (MYH7) and the ventricular isoforms of myosin light chains 1 and 2 (MYL3 and MYL2, respectively) are expressed at higher levels in the ventricles [18]. The reason different myosin light chains are expressed in the two chambers is not well understood. MYL2 and MYL3, however, are expressed strictly in the ventricles, whereas MYL7 is also found at low levels in the adult ventricles. MYL7 knockout mice showed significantly reduced atrial contraction and died as embryos with improper cardiac morphogenesis [19]. MYH6 is expressed in the ventricles during the fetal stage, but at lower levels than in the atria, and its ventricular expression decreases during development. MYH7 shows the opposite trend, with higher expression in the ventricles.

α-Myosin heavy chain (MYH6) is the major isoform in human fetal atria (the specific percentage differs between individuals but is generally well above 50%). β-Myosin heavy chain (MYH7) is the major isoform in the ventricles (>95%) [20]. Compared with MYH6, MYH7 has lower ATPase activity and a slower filament sliding speed, which conserves intracellular energy [21]. The different myosin heavy chain compositions of the chambers suggest that the atrial isoform expends more energy per myosin heavy chain during each contraction than the ventricular isoform does. This difference may allow the atria to pass blood quickly to the ventricles using the more energy-consuming myosin. The ventricles, meanwhile, conserve as much energy as possible while generating greater pressure during ventricular systole.

2.3 Electrophysiological Differences

Under normal physiological conditions, the electrical signaling pathway is initiated at the sinoatrial node in the right atrium and that signal propagates through both atria causing them to contract and pass blood into the ventricles [22]. The signal travels to the atrioventricular node, a signal relay site, before it reaches and spreads throughout the ventricles via the Purkinje fibres.

Researchers have often used electrophysiological recordings to confirm the identity of the different types of cardiomyocytes derived from cardiac progenitor cells [12, 23]. One important electrophysiological characteristic that distinguishes atrial from ventricular cardiomyocytes is the action potential duration at 90% repolarization (APD90). APD90 is shorter in the atria than in the ventricles, meaning that atrial cells repolarize faster [24].
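The sketch below estimates APD90 from a sampled voltage trace to make the definition concrete. It rests on three simplifying assumptions of our own, not a standard protocol:
- the resting potential is taken from the first sample;
- activation is placed at the steepest upstroke step;
- the 90% repolarization crossing is found by linear interpolation between samples.

`apd90` and `synthetic_ap` are hypothetical helpers.

```python
from typing import Sequence


def apd90(times_ms: Sequence[float], voltages_mv: Sequence[float]) -> float:
    """Estimate APD90 (ms) from one action potential, uniformly sampled.

    Assumptions (illustrative only): first sample = resting potential;
    activation = end of the largest positive voltage step (max dV/dt);
    APD90 ends where voltage has fallen 90% of the way from peak to rest.
    """
    if len(times_ms) != len(voltages_mv) or len(times_ms) < 3:
        raise ValueError("need matching time/voltage arrays with >= 3 samples")
    rest = voltages_mv[0]
    # Largest positive step between consecutive samples marks the upstroke.
    steps = [voltages_mv[i + 1] - voltages_mv[i] for i in range(len(voltages_mv) - 1)]
    i_up = max(range(len(steps)), key=lambda i: steps[i])
    t_activation = times_ms[i_up + 1]
    # Peak voltage at or after activation.
    i_peak = max(range(i_up + 1, len(voltages_mv)), key=lambda i: voltages_mv[i])
    peak = voltages_mv[i_peak]
    threshold = peak - 0.9 * (peak - rest)  # voltage at 90% repolarization
    for i in range(i_peak + 1, len(voltages_mv)):
        if voltages_mv[i] <= threshold:
            # Linear interpolation between the two samples around the crossing.
            t0, t1 = times_ms[i - 1], times_ms[i]
            v0, v1 = voltages_mv[i - 1], voltages_mv[i]
            t_cross = t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
            return t_cross - t_activation
    raise ValueError("trace never repolarizes by 90%")


def synthetic_ap(t: float) -> float:
    """Idealized AP: rest -80 mV, plateau +20 mV, linear repolarization."""
    if t < 10:
        return -80.0
    if t <= 110:
        return 20.0
    return max(-80.0, 20.0 - (t - 110))


times = [float(t) for t in range(301)]
trace = [synthetic_ap(t) for t in times]
print(apd90(times, trace))  # 190.0
```

In this idealized trace the 90% level is -70 mV. It is reached at 200 ms, 190 ms after the upstroke at 10 ms. Real recordings would additionally need filtering, baseline handling, and averaging over several beats.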

Traditionally, electrophysiological recording has been a useful tool for verifying the chamber identity of stem cell-derived cardiomyocytes. It is, however, time-consuming and subject to large variations depending on experimental conditions. Novel chamber-specific proteins identified in this study could be used as markers in flow cytometry for more efficient and accurate sorting of cardiac cells. Moreover, chamber-specific cardiomyocytes derived from a patient’s own induced pluripotent stem cells offer an alternative strategy for cell-based therapies. They can also serve as cellular models for studying individual disease mechanisms [23].


3 Cardiovascular Disease

Cardiovascular disease is the leading cause of death in men and women worldwide [25]. The cellular, structural, and electrophysiological components of both heart chambers need to operate in perfect synchrony for each normal heartbeat to occur. If any one of these components fails to develop properly, normal heart function can be greatly impaired.

Congenital heart defects are common, affecting approximately 1 in 100 children [1]. Structural defects include valve defects, improper septum formation between the chambers, and inverted arteries. Electrical defects can arise either at the pacemaker or anywhere along the conduction system, and they can cause abnormal heart rates and electrocardiograms. It is important to understand chamber differences when considering diseases that target a particular chamber. Examples include atrial fibrillation in the atria and dilated and hypertrophic cardiomyopathies, which mainly occur in the ventricles. This section briefly examines atrial and ventricular cardiovascular diseases.

3.1 Atrial Disease

One of the better-studied sustained arrhythmias specific to the atria is atrial fibrillation (AF). During AF, early after-depolarizations depolarize atrial cells during the repolarization phase. This occurs because of action potential shortening and faster recovery of the inactivated L-type Ca2+ current [26]. Instead of the normal electrical impulse propagating from the atria to the ventricles, re-entry occurs, with retrograde formation of a spiral wave or electrical rotor. The wavelength of the impulse is critical and is calculated as the product of the refractory period and the conduction velocity. A shorter wavelength means that a greater number of re-entry circuits can be accommodated in the atria. As the heart's efficiency decreases, heart rate increases in order to pump enough blood to the rest of the body.
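The wavelength relation stated above (wavelength = refractory period × conduction velocity) can be checked with simple arithmetic. The sketch below uses illustrative values only, not measurements from this thesis or its references. It also makes the unit bookkeeping explicit: milliseconds multiplied by metres per second gives millimetres. The function name is hypothetical.

```python
def reentry_wavelength_mm(refractory_period_ms: float,
                          conduction_velocity_m_per_s: float) -> float:
    """Re-entry wavelength = refractory period x conduction velocity.

    Units: ms x (m/s) = 1e-3 s x m/s = 1e-3 m, i.e. the product is
    already in millimetres, so no conversion factor is needed.
    """
    if refractory_period_ms <= 0 or conduction_velocity_m_per_s <= 0:
        raise ValueError("inputs must be positive")
    return refractory_period_ms * conduction_velocity_m_per_s


# Illustrative values: shortening the refractory period (as in AF remodeling)
# at a fixed conduction velocity shortens the wavelength proportionally.
baseline = reentry_wavelength_mm(200.0, 0.5)
remodeled = reentry_wavelength_mm(120.0, 0.5)
print(f"baseline {baseline:.0f} mm, remodeled {remodeled:.0f} mm")
```

Because the wavelength scales linearly with both factors, a shorter refractory period or slower conduction each shortens it. A shorter wavelength in turn lets more re-entry circuits fit in the same atrial tissue.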

The most striking structural remodeling associated with AF is fibrosis, and fibrosis has previously been shown to lead to permanent AF [27]. Hence, the components of AF-associated fibrotic tissue could serve as therapeutic targets. This project can help to identify atrial-specific or atrial-enriched proteins that may be involved in fibrosis formation.

For instance, the chambers express different types and levels of integrin proteins, which can be identified in this study. Ventricular fibrillation has also been reported, but it is much more lethal than AF and results in sudden cardiac death [28]. Another study reported that the atria overexpress ventricular proteins during AF, such as ventricular myosin isoforms, as part of a dedifferentiation process [29]. It is therefore important to understand which proteins are expressed in the healthy atria in order to determine the differences in disease states.

3.2 Ventricular Disease

Dilated (DCM) and hypertrophic (HCM) cardiomyopathies are two major cardiomyopathies that occur primarily in the ventricles. Patients with DCM exhibit a dilated ventricular chamber and a reduced ejection fraction of less than 50% [30]. Almost half of DCM cases have unknown causes and are termed “idiopathic” DCM, and only a small proportion of DCM is familial. HCM, on the other hand, is associated with abnormally thickened walls of the right and left ventricles, and most patients are asymptomatic [31]. Inherited HCM and DCM are mainly caused by mutations in proteins from three groups:
- force generation (such as myosin heavy chains 6 and 7);
- calcium regulation (such as troponin, tropomyosin, and myosin binding protein);
- the structural framework (such as actin, vinculin, and ankyrin repeat domain-containing proteins) [30].

Genetic testing of patients with HCM or DCM is a valuable tool for understanding genetic differences among patients with idiopathic cardiomyopathy.
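Ejection fraction, used above to define reduced systolic function in DCM, is the stroke volume expressed as a percentage of end-diastolic volume: EF = (EDV − ESV) / EDV × 100. The sketch applies that formula together with the 50% cutoff mentioned in the text. The volumes in the example are illustrative, and the function names are hypothetical.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction (%) = (EDV - ESV) / EDV x 100."""
    if edv_ml <= 0 or not 0 <= esv_ml <= edv_ml:
        raise ValueError("expected EDV > 0 and 0 <= ESV <= EDV")
    # Multiply before dividing to keep simple inputs exact in floating point.
    return (edv_ml - esv_ml) * 100.0 / edv_ml


def is_reduced_ef(edv_ml: float, esv_ml: float, cutoff_pct: float = 50.0) -> bool:
    """True when EF falls below the cutoff (50% as quoted in the text)."""
    return ejection_fraction(edv_ml, esv_ml) < cutoff_pct


# Illustrative volumes (mL): a normal-sized ventricle vs. a dilated one.
print(f"{ejection_fraction(120.0, 50.0):.1f}%", is_reduced_ef(120.0, 50.0))
print(f"{ejection_fraction(200.0, 140.0):.1f}%", is_reduced_ef(200.0, 140.0))
```

The dilated example has a larger stroke volume in absolute terms relative to many normal hearts, yet its EF is low because the end-diastolic volume is so large. This is why EF, rather than stroke volume alone, is used to characterize DCM.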

Another ventricular disease is arrhythmogenic right ventricular dysplasia/cardiomyopathy (ARVC). ARVC affects desmosomal proteins and is not normally found in the atria, because the ventricles and atria differ in gap junction and myosin composition [18, 32, 33]. Defective desmosomal proteins may also increase gap junction turnover, impair action potential conduction, and promote arrhythmogenicity. Cardiomyocytes that detach because of defective desmosomes are replaced by fibrotic tissue and inflammatory cells. This is accompanied by fat deposition, the hallmark of ARVC. A close examination of the ventricular-enriched structural proteins from this study can help to identify potential therapeutic targets specific to the ventricles.

4 Systems Biology

The field of systems biology studies the entire system as a whole (Figure 1.3), using interdisciplinary approaches and complex technological equipment [34]. Systems approaches, or the study of “omics”, depart from the traditional reductionist view. They provide the technology to acquire large amounts of data at various cellular levels (genomic, transcriptomic, and proteomic) in order to study complex systems [35]. Alternative splicing and post-translational modifications add further complexity to the system. They result in a large number of possible proteins, the final functional units of the cell. In this study, we undertook both transcriptomic and proteomic approaches to better understand the biological differences between the heart chambers. The transcriptome comprises the total RNA, while the proteome provides insight into all of the proteins present at a particular time point in the heart’s development [35]. Recent advances such as next-generation RNA sequencing have enabled unbiased acquisition of the entire transcriptome as well as transcriptional modifications. This section provides an overview of the different technologies available, with specific reference to previous work characterizing chamber differences. In proteomics, the move from gel-based to off-gel (gel-free) approaches has improved the efficiency and accuracy of sample preparation, because gel-based separation is limited by the dimensions of the gel.

Figure 1.3. Systems biology involves the large-scale identification and characterization of the genome, proteome, biological pathways, and functional networks. The central dogma contains additional complexity as a result of the transcriptional differences and post-translational modifications listed above.


4.1 Transcriptomics via RNA Microarray

Transcriptomics via RNA microarray has long been a useful and relatively inexpensive systems biology tool, dating back to 1982 in cancer studies [36]. Essentially, a microarray consists of a collection of spots, each carrying complementary probes. The fluorescence intensity of each spot correlates with the number of transcripts bound to its probes, so a higher probe intensity corresponds to a greater number of transcripts in the sample. Many previous transcriptomic studies have used RNA microarrays to elucidate chamber differences in both mouse and human samples [11, 37, 38]. These studies reported the genes and gene ontology pathways enriched in the different heart chambers.

Microarrays provide useful information about the mRNA transcribed from DNA, but they also have disadvantages. These include nonspecific binding and the variety of commercial microarray chips and normalization methods, which make direct comparisons across studies difficult. In addition, previous studies show that transcript data do not perfectly match protein expression [39]. The difference between protein and transcript expression may be attributed to differences in synthesis, stability, modification, degradation, and ribosomal capacity [35, 40].
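Differing normalization methods are one reason microarray studies are hard to compare directly. Quantile normalization is one widely used method for making array intensity distributions comparable, and the sketch below illustrates it. It is shown only to illustrate why the normalization choice matters, not as the method used in this thesis or the cited studies. Ties are broken by position, a simplification, and the function name is hypothetical.

```python
def quantile_normalize(arrays: list[list[float]]) -> list[list[float]]:
    """Quantile-normalize intensity vectors that measure the same probes.

    Each array's value at rank r is replaced by the mean, across arrays,
    of the values at rank r, so all arrays end up with one shared
    distribution. Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must measure the same probes")
    # For each array, probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n), key=a.__getitem__) for a in arrays]
    # Mean intensity at each rank across arrays.
    rank_means = [
        sum(a[order[r]] for a, order in zip(arrays, orders)) / len(arrays)
        for r in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for r, probe in enumerate(order):
            out[probe] = rank_means[r]
        normalized.append(out)
    return normalized


# Two toy arrays measuring the same three probes (illustrative values).
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays share the values {1.5, 3.5, 5.5}, and only each probe's rank within its array is preserved. A study that used a different method (for example, median scaling) would produce different absolute values from the same raw data.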

Gene ontology analysis of the chamber-enriched transcript data from this study also matches the expected functional roles of the chambers. For example, the ventricles show relatively more structural and cytoskeletal proteins and the atria relatively more signal transduction proteins [11]. Recent studies have relied on proteomics, using sophisticated instruments and bioinformatic tools to profile proteins directly and efficiently.
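One common way to test whether a gene ontology term is over-represented among chamber-enriched genes is a one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test. The sketch below illustrates that technique with assumed counts. It is not necessarily the procedure used in this thesis or in [11], and the function name is hypothetical.

```python
from math import comb


def go_enrichment_p(background: int, annotated: int,
                    selected: int, hits: int) -> float:
    """One-sided hypergeometric P(X >= hits).

    background: genes measured (the universe)
    annotated:  background genes carrying the GO term
    selected:   chamber-enriched genes
    hits:       chamber-enriched genes carrying the GO term
    """
    if not (0 <= hits <= selected <= background and 0 <= annotated <= background):
        raise ValueError("inconsistent counts")
    total = comb(background, selected)
    # Sum the probabilities of observing hits, hits + 1, ... annotated genes.
    upper = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(hits, min(annotated, selected) + 1)
    )
    return upper / total


# Hypothetical counts: 2,000 genes, 100 with the term, 80 enriched, 15 hits.
# The expected number of hits by chance is 80 * 100 / 2000 = 4.
print(f"p = {go_enrichment_p(2000, 100, 80, 15):.2e}")
```

When many GO terms are tested at once, the resulting p-values would also need multiple-testing correction (for example, Benjamini-Hochberg).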


4.2 Mass Spectrometry Based Proteomics

Mass spectrometry-based proteomics allows direct profiling of the heart chambers at the protein level. The mass spectrometer has three major components: an ion source, a mass analyzer, and a detector. These are responsible, respectively, for ionizing proteins or peptides into the gas phase, separating ions by mass-to-charge ratio, and detecting the ions [41]. Earlier proteomic strategies used in-gel digestion. In this approach, the bands of interest are separated by two-dimensional gel electrophoresis, digested with trypsin, and analyzed by the mass spectrometer [42]. Its limitations include an inability to resolve proteins that are extremely basic, acidic, large, or small. In addition, when comparing gels from two conditions, it can be difficult to match proteins because their locations on the gel can shift. Proteins at low concentrations are also difficult to visualize on SDS gels. Subcellular fractionation is another sample preparation procedure. It reduces sample complexity by separating organelles via ultracentrifugation, improving protein detection [43].
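Both in-gel and gel-free workflows rely on trypsin, which cleaves C-terminal to lysine (K) and arginine (R). The sketch below performs an in silico tryptic digest. It uses the common simplified rule that cleavage is blocked when the next residue is proline (P), although real trypsin specificity has further exceptions. The function name and the `missed_cleavages` option are illustrative.

```python
import re

# Zero-width split point: after K or R, unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Return tryptic peptides, optionally allowing missed cleavages."""
    pieces = [p for p in TRYPSIN_SITE.split(sequence.upper()) if p]
    peptides = []
    for start in range(len(pieces)):
        # Join up to `missed_cleavages` extra neighbouring pieces.
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(pieces):
                break
            peptides.append("".join(pieces[start:end]))
    return peptides


print(tryptic_digest("PEPTIDEKAARPGGRLLK"))
# ['PEPTIDEK', 'AARPGGR', 'LLK']
```

The R followed by P in `AARPGGR` is not cleaved, which illustrates the proline rule. Search engines typically allow one or two missed cleavages, which the optional parameter mimics.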

4.3 Liquid Chromatography – Tandem Mass Spectrometry

Recent advances in gel-free shotgun proteomics coupled to liquid chromatography-tandem mass spectrometry (LC-MS/MS) have improved peptide separation. In one-dimensional reverse phase chromatography, peptides are injected into columns packed with C18-coated beads and separated based on hydrophobicity. Two-dimensional chromatography combines reverse phase with strong cation exchange, which further separates peptides based on charge.
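Reverse phase retention depends on hydrophobicity. A crude way to anticipate elution order is therefore to rank peptides by their grand average of hydropathy (GRAVY), the mean Kyte-Doolittle value per residue. This is only a rough proxy, assumed here for illustration; practical retention time predictors use more detailed models. The helper names are hypothetical.

```python
# Kyte-Doolittle hydropathy index per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    if not peptide:
        raise ValueError("empty peptide")
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide.upper()) / len(peptide)


def rough_elution_order(peptides: list[str]) -> list[str]:
    """Least to most hydrophobic: a crude proxy for reverse phase elution."""
    return sorted(peptides, key=gravy)


print(rough_elution_order(["LLK", "PEPTIDEK", "AARPGGR"]))
# ['PEPTIDEK', 'AARPGGR', 'LLK']
```

More hydrophilic peptides bind the C18 phase more weakly and tend to elute earlier as the organic solvent gradient increases, which is the intuition this ranking captures.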


In earlier LC-MS/MS setups, multi-dimensional protein identification technology (MudPIT) offered additional peptide separation by increasing the salt concentration of the solution in the microcapillary columns, eluting peptides with different affinities for the strong cation exchange and reverse-phase materials, as shown in Figure 1.4 [43, 44]. The parent ions are separated and analyzed in the mass spectrometer to generate MS peaks. The parent ions are then fragmented, and the mass-to-charge ratios of the fragment ions are recorded to produce tandem MS/MS peaks.

Figure 1.4. Shotgun proteomics using two-dimensional microcapillary columns. Digested peptides are eluted through the high-performance liquid chromatography column, which is coupled to a high voltage source. The ionized particles are then detected by the mass spectrometer, producing MS and MS/MS peaks based on their mass-to-charge ratios. Subsequent spectral analysis and protein identification are carried out by automated computer software. Figure adapted from Arab et al [45].

The newer Q-Exactive hybrid quadrupole-Orbitrap mass spectrometer offers higher sensitivity and mass accuracy than earlier models. The main components of the system include a quadrupole mass filter, C-trap, HCD collision cell, and Orbitrap. In addition, the EASY-Spray column in this system provides tight seals between connections, producing highly reproducible and sharp chromatographic peaks while maintaining a stable overall temperature. A comparison of data generated on the Q-Exactive with data from an LTQ ion trap coupled to a two-dimensional microcapillary column showed that the Q-Exactive data were more consistent and identified a greater number of proteins (Appendix B). In this study, label-free quantification was used; it reports relative rather than absolute protein abundance and allows the reproducibility of the data to be assessed across runs.

4.4 Mass Spectrometry Protein Identification

Protein identification and analysis software developed over the years has enabled accurate and efficient identification and interpretation of the large amounts of data generated by mass spectrometers. Many peptide identification search algorithms are available, such as X! Tandem, MyriMatch, OMSSA, Comet, SEQUEST, and Andromeda. Previous studies showed that these algorithms produce consistent output and similar numbers of identified proteins when the same search parameters are applied [46]. Nevertheless, it can be advantageous to combine multiple search engines to obtain as many matched peptides and proteins as possible. The false discovery rate and other cut-offs are important for selecting confidently matched entries against the database.

5 Project Overview

The overall objectives of this research project include:

1. Characterization of the atrial and ventricular proteome

2. Identification of chamber enriched proteins

3. Comparison of chamber enriched data against other proteome databases, previous cardiac phenotypes, and immunohistochemical staining in human cardiac tissues


5.1 Rationale

One important area that remains under-studied is the characterization of chamber enriched and chamber specific protein differences in human fetal hearts. Before pathological mechanisms can be understood, it is important to examine healthy tissue as a baseline for explaining chamber specific differences in disease mechanisms and for identifying targets for potential therapeutic treatment.

Although many of the structural, electrophysiological, and functional differences between the atria and ventricles have been worked out, as discussed throughout this chapter, the proteomic differences between the chambers are less well characterized, especially in human samples. Previous studies were carried out mainly at the transcriptome level, but transcript levels do not correlate perfectly with the proteome, that is, with the proteins actually expressed [39]. In addition, many of these studies were carried out in murine models [38, 47]. Although mice provide insights into anatomical functions common to humans, their relative protein expression and isoforms are difficult to extrapolate to human samples. Hence, it is of great interest to study protein differences between the chambers at the proteome level directly in human heart samples. In this study, we used fetal hearts as a source of healthy tissue. Fetal hearts provide valuable insights into heart development, as the morphology of the fetal heart differs from that of the adult. Clinically, healthy fetal heart samples also provide a valuable reference for understanding congenital heart defects and other cardiac diseases.

5.2 Aim

The aim of this study is to characterize the differences between the human fetal cardiac chambers via proteomic and transcriptomic analyses. We will compare the fetal microarray data generated in this study to available adult data to identify expression differences between fetal and adult hearts. We will examine the cardiac phenotypes associated with the chamber enriched proteins to identify proteins involved in cardiac diseases, which may point to disease mechanisms and potential therapeutic targets.

Lastly, we will cross-reference the chamber enriched proteins with immunohistochemical staining of human heart sections from publicly available databases to verify the bioinformatic findings.


Chapter 2 : Material and Methods

In this chapter, I outline the materials and methods that are not included in Chapter 3.

1 Sample Preparation for Mass Spectrometry

Cardiac tissues were subjected to the standard mass spectrometry protocol optimized for the Q-Exactive mass spectrometer in Dr. Thomas Kislinger’s laboratory at the Toronto Medical Discovery Tower, Toronto, Ontario. Briefly, isolated proteins were denatured, reduced, alkylated, and digested into peptides for shotgun proteomics [48, 49]. The protocol was based on previously published procedures, and each sample was analyzed with a four-hour gradient [49].

The trypsinized peptides were desalted using OMIX C18 pipette tips. The OMIX C18 cleanup protocol includes conditioning (5 cycles of 200 µL of 50% acetonitrile (ACN) with 0.1% formic acid (FA)), equilibration (10 cycles of 200 µL of HPLC-grade water with 0.1% FA), sample binding (10 cycles of the entire sample), cleanup (5 cycles of 200 µL of HPLC-grade water with 0.1% FA), and elution (7 cycles of 200 µL of 80% ACN with 0.1% FA). The samples were spun at 16,000 g for 30 minutes to remove any solid debris or undissolved peptides. The supernatant was then dried under vacuum centrifugation to 5 µL. The lyophilized peptide pellet was re-dissolved in LC–MS grade water/0.1% formic acid, and the peptide concentration was determined using a NanoDrop spectrophotometer (Thermo Scientific), after which 2 µg of peptides per sample were loaded into the mass spectrometer.
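Loading 2 µg per sample comes down to a volume calculation from the NanoDrop reading. The sketch below assumes the reading is in mg/mL (numerically equal to µg/µL); the function name and the 0.5 µg/µL example concentration are hypothetical and are not measurements from this study.

```python
def injection_volume_ul(concentration_ug_per_ul: float, target_ug: float = 2.0) -> float:
    """Volume (µL) that delivers target_ug of peptide at the measured concentration.

    Assumes the concentration is in µg/µL (equivalent to mg/mL).
    """
    if concentration_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ug / concentration_ug_per_ul


if __name__ == "__main__":
    # Hypothetical reading of 0.5 µg/µL for a 2 µg load
    print(injection_volume_ul(0.5))
```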


2 Mass Spectrometry Pipeline

The lysates were analyzed on the Q-Exactive tandem mass spectrometer with an EASY-Spray nano-electrospray ionization source and an EASY-nLC 1000 nano-flow ultra-performance liquid chromatography system (Thermo Scientific). The analytical column was packed with C18 (50 cm EASY-Spray, 2 µm particles, 100 Å pore size). The elution buffers for reverse-phase chromatography were 0.1% (v/v) formic acid (buffer A) and 100% acetonitrile with 0.1% (v/v) formic acid (buffer B). Data were acquired in top-10 data-dependent acquisition mode at a flow rate of 250 nL/minute over a gradient of 5-30% buffer B for 230 minutes. MS spectra were acquired over a scan range of 400-1600 m/z at a resolution of 70,000, and MS/MS spectra were acquired at a resolution of 17,500. The included charge states were +2, +3, +4, and +5.
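To make top-10 data-dependent acquisition concrete, the sketch below picks, from one hypothetical MS1 survey scan, the ten most intense precursors that fall within the 400-1600 m/z scan range and carry a charge of +2 to +5; these would be sent for MS/MS. It is a simplified illustration, not the instrument's logic: it ignores dynamic exclusion, isolation windows, intensity thresholds, and scheduling, and the `Precursor` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Precursor:
    """One peak in an MS1 survey scan (hypothetical representation)."""
    mz: float
    intensity: float
    charge: int


def select_top_n(peaks: Sequence[Precursor], n: int = 10,
                 mz_range: Tuple[float, float] = (400.0, 1600.0),
                 charges: Tuple[int, ...] = (2, 3, 4, 5)) -> List[Precursor]:
    """Return the n most intense eligible precursors, most intense first."""
    lo, hi = mz_range
    eligible = [p for p in peaks if lo <= p.mz <= hi and p.charge in charges]
    # Only the n most intense eligible peaks are fragmented in this cycle.
    return sorted(eligible, key=lambda p: p.intensity, reverse=True)[:n]


if __name__ == "__main__":
    scan = [Precursor(500.3, 2e6, 2), Precursor(350.1, 9e6, 2),
            Precursor(812.4, 5e6, 1), Precursor(1020.7, 3e6, 3)]
    for p in select_top_n(scan):
        print(p)
```

In the toy scan, the 350.1 peak is skipped for falling outside the scan range and the 812.4 peak for being singly charged, even though both are intense.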

3 Data Search

To select chamber enriched proteins at the proteomic level, p-values (≤0.05) and fold-changes (>2) of label-free quantification (LFQ) intensities were used. At the transcriptomic level, by contrast, significance was determined by p-value alone, because of variation between microarray chips and the differing fold-change cutoffs used in the literature.

4 Database Comparisons

Additional transcriptomic and proteomic datasets were obtained and used to validate the results generated from this experiment. Publicly available databases included the Gene Expression Omnibus [50], the Human Genetic Association Database [51], Mouse Genome Informatics [52], the Human Proteome Map [53], and ProteomicsDB [54].

4.1 Microarray Data

Raw CEL files, containing the probe intensity values, from both the in-house fetal and the publicly available adult microarrays were normalized using the RMA method in R with the “affy” library, and genes with intensities greater than three times the global median probe intensity were reported.
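The expression cutoff can be sketched as follows. The original analysis used R and the affy library; this Python version illustrates only the thresholding step, not RMA itself. It assumes one summarized intensity per gene, and because RMA reports log2 values, it converts them to the linear scale before comparing against three times the global median (equivalent to log2 value > median + log2 3). Function names are hypothetical.

```python
import statistics
from typing import Dict


def expressed_genes(linear: Dict[str, float], fold: float = 3.0) -> Dict[str, float]:
    """Keep genes whose linear-scale intensity exceeds fold x the global median."""
    cutoff = fold * statistics.median(list(linear.values()))
    return {gene: value for gene, value in linear.items() if value > cutoff}


def expressed_genes_log2(log2_values: Dict[str, float], fold: float = 3.0) -> Dict[str, float]:
    """Same filter for RMA-style log2 intensities, converted back to linear scale."""
    linear = {gene: 2.0 ** value for gene, value in log2_values.items()}
    return expressed_genes(linear, fold)


if __name__ == "__main__":
    # Hypothetical log2 intensities, not data from this study
    print(expressed_genes_log2({"GENE_A": 0.0, "GENE_B": 1.0, "GENE_C": 3.0}))
```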

4.2 Mass Spectrometry Data

Proteins were considered associated with a heart phenotype if problems in either the cardiac or the vascular system had been reported. Visual Basic and R were used to automate data tabulation and data mining. Publicly available cardiac phenotype databases did not specify the heart chamber in which previous cardiac conditions occurred, so the results from this study can be used to examine expression differences between the heart chambers for proteins known to cause cardiac diseases.


Chapter 3 : Proteomic Analyses of Human Fetal Atria and Ventricles

The manuscript in this chapter has been submitted to the Journal of Proteome Research and is under review.

Zhen Qi Lu1, Ankit Sinha2, Parveen Sharma1, Thomas Kislinger2,4, Anthony O. Gramolini1,3,4,*

1Department of Physiology or 2Medical Biophysics, Faculty of Medicine; 3Heart and Stroke/Richard Lewar Centre of Excellence for Cardiovascular Research, University of Toronto, Toronto, Ontario, Canada; 4Princess Margaret Cancer Center, Toronto, Ontario, Canada

*Address correspondence to:

Anthony Gramolini, PhD
MaRS, Toronto Medical Discovery Tower
101 College Street, 3-311
M5G 1L7
University of Toronto, Canada
Tel: 416-634-8813

Thomas Kislinger, PhD
MaRS, Toronto Medical Discovery Tower
101 College Street, 9-807
M5G 1L7
Princess Margaret Cancer Center
Tel: 416-581-7627

KEYWORDS: Bioinformatics, Chamber specificity, Fetal tissue, Mass spectrometry, Ventricle, Q-Exactive


CONTRIBUTIONS

Z. Q. Lu prepared the samples for mass spectrometry, and A. Sinha ran the samples on the mass spectrometer and searched the data in MaxQuant. Z. Q. Lu analyzed the data, interpreted the results, prepared the figures and tables, and wrote the manuscript. P. Sharma, P. Backx, T. Kislinger, and A. O. Gramolini provided essential guidance throughout this project and revised the manuscript. Human fetal samples were obtained from Dr. Robert Hamilton.

Note: Four modified supplemental tables from the manuscript are included in this chapter and the rest of the tables will be publicly available on the Journal of Proteome Research website.


ABSTRACT

In this study we carried out a mass spectrometry-based proteome analysis of human fetal atria and ventricles. Heart protein lysates were analyzed on the Q-Exactive mass spectrometer in biological triplicates. Protein identification using MaxQuant yielded a total of 2,754 atrial protein groups (91%) and 2,825 ventricular protein groups (83%) in at least 2 of the 3 runs with ≥2 unique peptides. Statistical analyses using fold-enrichment (>2) and p-values (≤0.05) selected high confidence chamber enriched atrial (134) and ventricular (81) protein groups. Several previously characterised cardiac chamber-enriched proteins were identified in this study including atrial isoform of myosin light chain 2 (MYL7), atrial natriuretic peptide (NPPA), connexin 40 (GJA5), and peptidylglycine alpha-amidating monooxygenase (PAM) for atria, and ventricular isoforms of myosin light chains (MYL2 and MYL3), myosin heavy chain 7 (MYH7), and connexin 43 (GJA1) for ventricle. Our data was compared to in-house generated and publicly available human microarrays, several human cardiac proteomes, and phenotype ontology databases.



INTRODUCTION

Cardiovascular disease is the leading cause of death worldwide 1. There is a continuing need for a greater understanding of cardiovascular physiology to improve the diagnosis and treatment of cardiovascular disease. The human heart is composed of four chambers: two upper atrial chambers and two lower ventricular chambers, which are responsible for different functions within the heart. The atria initiate electrical events through their pacemaker potential, prime the ventricles, and are enriched in Golgi and ER proteins 2, whereas the ventricles provide the muscular contraction that pumps blood throughout the body and contain an abundance of mitochondrial proteins that supply energy for contraction 2. Limited access to healthy human cardiac tissue has largely restricted previous transcriptomic and proteomic analyses to the mouse heart 3,4.

Although mouse studies have provided essential insight into chamber differences, the relative protein expression patterns and differences are difficult to extrapolate and apply to human hearts.

Initial studies of human cardiac tissue were mostly microarray analyses of the adult heart, performed on the available diseased adult tissue, primarily from the ventricles 5,6,7,8.

Detailed, high-accuracy global proteomic analyses of the heart chambers have been made possible by advances in instrumentation and computational tools. The aim of this study was to characterize the differences between the human cardiac chambers. To this end, we carried out comparative proteomic and transcriptomic analyses of human fetal atrial and ventricular samples, which identified chamber enriched and chamber specific proteins. We further compared the fetal microarray data generated in this study to available adult data to identify expression differences between fetal and adult tissues. Publicly available cardiac proteome and phenotype databases also provided further validation of the novel chamber enriched proteins.


MATERIAL AND METHODS

Tissue Collection

The human fetal samples were obtained from elective terminations of normal pregnancies via the Health Center Biobank Program (Hospital for Sick Children Research Institute, REB protocol #1000029263). Written consent was obtained from the patients, and heart tissues were sent for banking (REB protocol #1000011232). Further details of the biobanking and surgical procedures are outlined in Strandberg et al. (2013) 9. Fetal hearts of 17 to 23 weeks of age were obtained and stored frozen at -80 °C until mass spectrometry.

Sample Preparation for Mass Spectrometry

The frozen hearts were thawed and homogenized in 50% (v/v) 2,2,2-trifluoroethanol (TFE) in phosphate-buffered saline using a Dounce homogenizer for 25 strokes on ice. Total proteins from 15 mg of tissue were denatured, reduced, alkylated, and digested into peptides as previously described 10,11. Briefly, after homogenization, the lysate was incubated for 120 minutes at 60 °C, and cysteines were reduced with 5 mM dithiothreitol at 60 °C for 30 min and subsequently alkylated with 25 mM iodoacetic acid. The tissue lysate was then diluted five-fold (v/v) with 100 mM ammonium bicarbonate buffer and digested with 5 µg of mass spectrometry grade trypsin (Thermo). Digested peptides were desalted using OMIX C18 pipette tips and dried under vacuum centrifugation. The lyophilized peptide pellet was re-dissolved in LC–MS grade water/0.1% formic acid, and peptide concentration was determined using a NanoDrop spectrophotometer (Thermo Scientific). Peptide mass spectrometry was performed exactly as described previously 10. Mass spectrometer and MaxQuant parameters can be found in the Analysis and Documentation of Peptide and Protein Identifications (in the supporting information).

Mass Spectrometry Data Search

Raw files were searched using MaxQuant version 1.4.1.2 12 against the UniProt complete human proteome sequence database (version: 2012-07-19, number of sequences: 20,232). A target-decoy strategy was used to control the false discovery rate (FDR) of peptide-spectrum matches, with the FDR set to 1%. “Protein groups” consist of proteins that cannot be distinguished from one another on the basis of their shared peptides. Protein groups identified with two or more peptides (razor plus unique) were used for subsequent analysis. The unfiltered list of all identified protein groups can be found in Table S1 (in the supporting information).
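The target-decoy idea can be illustrated with a short sketch. Spectra are searched against real (target) and reversed or shuffled (decoy) sequences, and the FDR above a score threshold is estimated as the number of decoy matches divided by the number of target matches. This is a simplified illustration, not MaxQuant's implementation (which also uses posterior error probabilities); the input format and function name are hypothetical.

```python
from typing import List, Tuple


def target_decoy_cutoff(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> float:
    """Lowest score threshold whose estimated FDR is <= max_fdr.

    psms: (score, is_decoy) pairs, where a higher score is a better match.
    Estimated FDR at a threshold = decoys / targets among PSMs scoring >= it.
    Returns float('inf') if no threshold satisfies max_fdr.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = float("inf")
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last PSM sharing this score, so ties count together.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


if __name__ == "__main__":
    # Hypothetical PSMs: (score, is_decoy)
    psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    print(target_decoy_cutoff(psms), target_decoy_cutoff(psms, max_fdr=0.5))
```

Taking the lowest passing threshold keeps the largest set of matches whose estimated FDR stays within the limit.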

Statistics and Data Analysis

Gene Ontology (GO) biological process terms for each group of interest were computed using the DAVID Bioinformatics Database (http://david.abcc.ncifcrf.gov) 13. Overrepresented GO terms (p-value ≤0.05, ranked by the number of proteins in each category) were reported to characterize large-scale biological differences between the chambers.

Venn diagrams were generated using Venn Diagram Plotter v. 1.5.4798.30840, written by the Department of Energy (PNNL, Richland, WA), and VennDIS v.1.0 from Dr. Kislinger’s laboratory. Volcano plots of fold-changes and p-values, as well as all other bioinformatic analyses, were done in R with standard packages. Protein groups with fold-change >2 and p-value ≤0.05 were considered enriched in a chamber, and label-free quantification (LFQ) intensities of 0 were replaced with 1 to compute the fold-changes. Heatmaps were generated using Java TreeView v.1.1.6r2 after k-means clustering (1000 runs) in Cluster v.3.0.
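The enrichment rule can be sketched as follows. Assumptions, flagged in the code as well: the fold-change is the ratio of mean atrial to mean ventricular LFQ intensity after zeros are replaced with 1 per replicate; the p-value comes from a statistical test whose exact form is not specified in this section, so it is passed in rather than computed; and the function names are hypothetical. The original analysis was done in R; Python is used here only for illustration.

```python
from statistics import mean
from typing import List, Sequence

FC_CUTOFF = 2.0   # fold-change threshold from the text (>2)
P_CUTOFF = 0.05   # p-value threshold from the text (<=0.05)


def _fill_zeros(values: Sequence[float]) -> List[float]:
    """Replace LFQ intensities of 0 with 1 so that ratios stay finite."""
    return [v if v > 0 else 1.0 for v in values]


def lfq_fold_change(atria: Sequence[float], ventricles: Sequence[float]) -> float:
    """Mean atrial / mean ventricular LFQ intensity (assumed orientation)."""
    return mean(_fill_zeros(atria)) / mean(_fill_zeros(ventricles))


def classify(fold_change: float, p_value: float) -> str:
    """Label a protein group; p_value is supplied by the caller (test unspecified)."""
    if p_value <= P_CUTOFF and fold_change > FC_CUTOFF:
        return "atrial enriched"
    if p_value <= P_CUTOFF and fold_change < 1.0 / FC_CUTOFF:
        return "ventricular enriched"
    return "common"


if __name__ == "__main__":
    # Hypothetical LFQ intensities for one protein group in three replicates
    fc = lfq_fold_change([0.0, 0.0, 0.0], [3.0e6, 2.5e6, 3.5e6])
    print(fc, classify(fc, p_value=0.01))
```

Because a zero becomes 1 rather than a small fraction of the detection limit, a protein absent from one chamber receives a very large fold-change, which is the intent of the replacement.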

The optimal number of clusters was determined by elbow analysis in R, which maximizes the percent of variance explained with the fewest clusters. We further compared the atrial and ventricular proteomes from this study to the recently published Human Proteome Map database for the human fetal whole heart proteome (http://www.humanproteomemap.org) 8 and to ProteomicsDB for the adult ventricular proteome (https://www.proteomicsdb.org) 14. Proteins reported by other studies were compared to the protein groups from this study, and a match was reported if the protein was identified within a protein group.
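One way to apply the elbow criterion is sketched below. The percent of variance explained by k clusters is 100 × (1 - WSS_k / TSS), where WSS_k is the total within-cluster sum of squares and TSS the total sum of squares, and k is increased until one more cluster adds less than a chosen gain. The 5-percentage-point gain threshold, the function names, and the WSS values are assumptions for illustration; in practice the WSS values would come from the k-means runs.

```python
from typing import Dict


def percent_explained(wss_by_k: Dict[int, float], total_ss: float) -> Dict[int, float]:
    """Percent of total variance explained for each number of clusters k."""
    return {k: 100.0 * (1.0 - wss / total_ss) for k, wss in wss_by_k.items()}


def elbow_k(wss_by_k: Dict[int, float], total_ss: float, min_gain: float = 5.0) -> int:
    """Smallest k after which one more cluster adds < min_gain percentage points."""
    pct = percent_explained(wss_by_k, total_ss)
    ks = sorted(pct)
    for k, k_next in zip(ks, ks[1:]):
        if pct[k_next] - pct[k] < min_gain:
            return k
    return ks[-1]  # no elbow within the tested range


if __name__ == "__main__":
    # Hypothetical within-cluster sums of squares, not values from this study
    wss = {1: 100.0, 2: 50.0, 3: 30.0, 4: 27.0, 5: 25.0}
    print(elbow_k(wss, total_ss=100.0))
```

Here the explained variance rises from 50% to 70% between k = 2 and k = 3 but only to 73% at k = 4, so the sketch stops at three clusters.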

Microarray Analysis

Cardiac tissue was also analyzed by a parallel microarray analysis. Tissue was immediately stored in RNAlater Stabilization Reagent (Qiagen). Total mRNA was extracted using 1 mL of TRIzol and 200 µL of chloroform as described previously 15. Microarray analysis was carried out in triplicate by the University Health Network Microarray Facility (Toronto) using Affymetrix Human 1.0 st v1 microarray chips. The human adult atrial (samples GSM80654, GSM80655, GSM80698) and ventricular (samples GSM657, GSM658, GSM659) microarrays were obtained from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) 16 on the same microarray chip. The raw data from the in-house and publicly available (series number GSE3526) microarrays were RMA normalized, and only genes with individual probe intensities greater than three times the global median intensity were reported. The processed microarray data were then cross-referenced with the proteome data via gene mapping.


Both atrial and ventricular enriched gene products, clustered according to the fetal proteome and transcriptome, were searched for known cardiac phenotypes in humans against the Human Genetic Association Database (http://geneticassociationdb.nih.gov/cgi-bin/index.cgi) 17 and for known phenotypes in mice against Mouse Genome Informatics (http://www.informatics.jax.org) 18. Human-to-mouse orthologs were identified using the BioMart database (http://central.biomart.org).

Proteins with both high protein and high transcript expression in the ventricles were compared to the Human Proteome Map. The cardiac cell expression (in cardiomyocytes or non-cardiomyocytes) of the ventricular enriched proteins identified in both studies was confirmed by immunohistochemical staining of adult ventricular sections from the Human Protein Atlas v.12 (http://www.proteinatlas.org) 19.


RESULTS

Assessment of Proteome Data Quality

To interrogate chamber specific proteomes, human fetal cardiac chambers were obtained and prepared for mass spectrometry analyses as previously described (Figure 3.1). The samples from each chamber were run in biological triplicate to assess biological variability, and subsequent analyses used LFQ intensities to adjust for systematic biases between runs. Mass spectrometry identified a total of 32,441 atrial and 32,816 ventricular peptides. For subsequent analysis, we selected only those proteins with at least two peptides, resulting in a total of 2,754 protein groups in the atria and 2,825 in the ventricles. To select ‘chamber-enriched’ protein groups, we calculated LFQ intensity ratios as a measure of fold-abundance (>2) and assessed significance by p-value (≤0.05). Using both parameters as a prioritization strategy, we identified 134 atrial enriched protein groups (Table S2) and 81 ventricular enriched protein groups (Table S3). The key experimental steps are summarized in Figure 3.1.

A comparison of the data generated by the triplicate biological samples showed that 91% of protein groups (2,518/2,754) were identified in at least two atrial biological replicates, whereas 83% (2,343/2,825) of protein groups were identified in at least two ventricular replicates, as shown in Figures 3.2A and 3.2B, respectively.
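The replicate filter and the percentages above can be reproduced with a short sketch. It assumes each replicate run is represented as a set of protein group identifiers; the run names and function names are hypothetical. The two `percent` calls use the counts reported in the text.

```python
from collections import Counter
from typing import Dict, Set


def reproducible_groups(runs: Dict[str, Set[str]], min_runs: int = 2) -> Set[str]:
    """Protein groups identified in at least `min_runs` of the replicate runs."""
    counts = Counter(group for groups in runs.values() for group in groups)
    return {group for group, n in counts.items() if n >= min_runs}


def percent(part: int, whole: int) -> str:
    """Format a fraction as a whole-number percentage."""
    return f"{part / whole:.0%}"


if __name__ == "__main__":
    # Counts reported in the text for atria and ventricles
    print("atria:", percent(2518, 2754))
    print("ventricles:", percent(2343, 2825))
```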

We next compared the atrial and ventricular proteins identified in this study to two previously published human proteomics studies (Figure 3.2C): the data from Wilhelm et al. contain a human adult ventricular proteome 14, and the data from Kim et al. contain the proteome of the human fetal whole heart 8. Wilhelm et al. identified 4,031 unique proteins, and Kim et al. identified a total of 9,665 unique proteins. Gene names were compared between the two previously published proteomes and the 3,012 proteins identified in this study (≥2 unique peptides). We identified 898 proteins that were not previously identified in the adult ventricle and detected 29 fetal proteins not found in either of the other two large-scale studies. The 465 proteins identified only in the adult ventricular data showed enriched GO terms for immune response (n=49) and defense response (n=38). Comparison of the 2,754 fetal atrial and 2,825 ventricular protein groups from this study showed 187 protein groups uniquely identified in the atria, 258 in the ventricles, and 2,567 in common (Figure 3.2D).
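The overlaps above are set operations on gene symbols. The sketch below assumes gene names within a protein group are separated by semicolons (as in MaxQuant's protein group output) and counts a protein as matched if it appears anywhere in a group, following the matching rule in the Methods. The function and key names are hypothetical, and the toy gene lists are not data from this study.

```python
from typing import Dict, Iterable, Set


def genes_from_groups(gene_name_fields: Iterable[str]) -> Set[str]:
    """Expand protein-group gene-name fields (e.g. 'MYL2;MYL5') into gene symbols."""
    genes: Set[str] = set()
    for field in gene_name_fields:
        genes.update(g.strip().upper() for g in field.split(";") if g.strip())
    return genes


def compare_proteomes(this_study: Set[str], adult_ventricle: Set[str],
                      fetal_heart: Set[str]) -> Dict[str, Set[str]]:
    """Set differences corresponding to the comparisons described in the text."""
    return {
        "not_in_adult_ventricle": this_study - adult_ventricle,
        "not_in_either_study": this_study - adult_ventricle - fetal_heart,
        "adult_ventricle_only": adult_ventricle - this_study - fetal_heart,
    }


if __name__ == "__main__":
    ours = genes_from_groups(["MYL2;MYL5", "nppa", "PAM"])
    print(compare_proteomes(ours, {"MYL2", "TTN"}, {"NPPA"}))
```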

Enriched Atrial and Ventricular proteins

A detailed analysis of atrial and ventricular protein group expression levels is shown as a volcano plot in Figure 3.3 for the 3,012 unique protein groups (134 atrial enriched, 81 ventricular enriched, and 2,797 common). A protein group was considered enriched in a specific chamber (atria or ventricles) if it showed more than 2-fold enrichment in that chamber and the enrichment was statistically significant (p-value ≤0.05). This cut-off identified 134 atrial and 81 ventricular protein groups. Within these groups we observed enrichment of proteins previously known to be atrial enriched, such as peptidylglycine alpha-amidating monooxygenase (PAM), natriuretic peptide A (NPPA), and myosin light chain 2, atrial isoform (MYL7) 20,21,22, while the ventricular enriched proteins included myosin light chain 2, ventricular isoform (MYL2), myosin light chain 1, ventricular isoform (MYL3), and myosin heavy chain 7 (MYH7) 20.

Although connexin 40 (GJA5) and connexin 43 (GJA1) 23, which are commonly known to be enriched in the atria and ventricles, respectively, were also detected in this study, they did not meet the statistical significance cutoffs. A complete list of chamber-enriched protein groups can be found in Tables S2 and S3.

Chamber Enriched Processes

To gain insight into the biological processes enriched in each chamber, we grouped the chamber enriched and common protein groups into Gene Ontology terms (134 atrial enriched, 81 ventricular enriched, and 2,797 common). Top atrial biological processes showed a significant enrichment (p ≤0.05) of intracellular transport (n = 13). Enrichment of vesicular transport may be associated with greater cellular communication and signaling. Ventricular enriched proteins showed overrepresented biological processes of oxidation-reduction (n = 9) and muscle contraction (n = 8). Muscle contractile proteins generate the force required for systemic circulation. Oxidation-reduction GO terms mainly involved mitochondrial processes and the electron transport chain, which supply energy to the ventricles. Lastly, the common proteins showed overrepresented GO terms from both the atria and ventricles, with general biological processes of protein localization (n = 275), translation (n = 194), and other GO terms enriched in both chambers. The full list of biological processes can be found in Table S4.

Bioinformatic Validation via Transcriptomics

To gain a better understanding of human cardiac chambers at the transcript level, we compared the chamber enriched proteins to an in-house human fetal heart microarray generated as part of this study. A total of 8,853 genes were identified by microarray profiling with a probe intensity three-fold above the global median. Gene products from the transcriptome analyses were considered differentially expressed if the p-value between the biological triplicates was ≤0.05. The proteome and transcriptome expression of all 2,456 gene products are shown in Figure 3.4A, indicating different patterns of transcript and actual protein expression. We then examined the transcript and protein expression of only the chamber enriched proteins from the proteome level (134 atrial plus 81 ventricular proteins), as shown in Figure 3.4B. Transcriptomic and proteomic data were separated into three groups: high protein and transcript expression, high protein and low transcript expression, and sub-threshold transcript expression. We further ranked the high protein and transcript expression group by the sum of the log10 LFQ intensity ratio and the log2 fetal probe intensity ratio for both the atria and the ventricles, whereas the high protein and low transcript expression group and the sub-threshold transcript group were ranked by log10 LFQ intensity ratio alone. The complete ranked lists can be found in Table S5 for the atria and Table S6 for the ventricles. The 134 atrial enriched proteins were grouped into 46 gene products with high protein and transcript expression, 69 with high protein and low transcript expression, and 19 sub-threshold at the transcript level, whereas the 81 ventricular enriched proteins were divided into 45 with high protein and transcript expression, 24 with high protein and low transcript expression, and 12 sub-threshold at the transcript level. The lack of correlation between transcript and protein expression in the low and sub-threshold transcript groups is most likely attributable to differences in protein and transcript stability, modification, and rates of transcription versus translation 24. The fetal data were further compared by gene symbol to the adult microarray from the Gene Expression Omnibus database, in which 12,253 genes had a probe intensity three-fold above the global median. To better visualize the data, we clustered the atrial and ventricular enriched proteins according to the fetal proteome, fetal microarray, and adult microarray. Heatmaps and clustering of the protein and transcript data are shown in Figure 3.4C for the 134 atrial enriched proteins and in Figure 3.4D for the 81 ventricular enriched proteins.
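The ranking score can be sketched as below. Assumptions: ratios are expressed as the enriched chamber over the other chamber (so larger means more enriched), the transcript term is omitted (None) for the groups ranked by LFQ ratio alone, and the names are hypothetical. Note that the score adds a base-10 and a base-2 logarithm, as the text describes, so a given fold-change contributes about 3.3 times more through the transcript term than through the protein term.

```python
import math
from typing import List, Optional, Tuple

# (gene, LFQ intensity ratio, fetal probe intensity ratio or None)
Entry = Tuple[str, float, Optional[float]]


def rank_score(lfq_ratio: float, probe_ratio: Optional[float] = None) -> float:
    """log10 LFQ ratio, plus log2 probe ratio when transcript data are used."""
    score = math.log10(lfq_ratio)
    if probe_ratio is not None:
        score += math.log2(probe_ratio)
    return score


def rank(entries: List[Entry]) -> List[str]:
    """Gene names ordered from most to least enriched."""
    ordered = sorted(entries, key=lambda e: rank_score(e[1], e[2]), reverse=True)
    return [gene for gene, _, _ in ordered]


if __name__ == "__main__":
    # Hypothetical ratios, not values from this study
    print(rank([("X", 10.0, None), ("Y", 100.0, 4.0), ("Z", 1000.0, None)]))
```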

To determine which proteins have previously been implicated in disease, we searched our chamber-enriched proteins against the Human Genetic Association Database and the Mouse Genome Informatics database. Of the 134 atrial enriched proteins, 33 were previously associated with a human cardiovascular phenotype and 22 with a mouse cardiovascular phenotype. Similarly, 29 of the 81 ventricular enriched proteins had a previously associated human cardiovascular phenotype and 20 a mouse phenotype. Information about previous cardiac phenotypes can be found in Tables S5 and S6. Analysis of 100 randomly selected proteins from the chamber enriched proteins showed that, on average, 47% (n=3) of the proteins had previous cardiovascular phenotypes in either human or mouse. In contrast, randomly selected proteins from the 3,012 proteins identified in this study showed that, on average, only 33% of the proteins had previous cardiovascular phenotypes (p-value of 0.048 compared to chamber enriched proteins).
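The random-sampling comparison can be sketched as repeated draws of 100 proteins without replacement. The number of draws, the seed, and the function name are assumptions. This sketch only estimates the average fractions; the statistical test behind the reported p-value is not specified here and is not reproduced.

```python
import random
from typing import Sequence, Set


def mean_phenotype_fraction(pool: Sequence[str], with_phenotype: Set[str],
                            sample_size: int = 100, n_draws: int = 1000,
                            seed: int = 0) -> float:
    """Average fraction of sampled proteins that have a known phenotype."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    population = list(pool)
    total = 0.0
    for _ in range(n_draws):
        draw = rng.sample(population, sample_size)  # without replacement
        total += sum(gene in with_phenotype for gene in draw) / sample_size
    return total / n_draws


if __name__ == "__main__":
    # Hypothetical pool of 200 proteins, half with a recorded phenotype
    pool = [f"G{i}" for i in range(200)]
    annotated = set(pool[:100])
    print(round(mean_phenotype_fraction(pool, annotated), 3))
```

Running the same function on the chamber enriched pool and on the full set of identified proteins gives the two averages being compared.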

Cell-type Specific Identifications

Finally, we assessed the cellular origin of the proteomic signals by examining chamber enriched proteins from the fetal ventricular proteome dataset against stained human heart sections in the online Human Protein Atlas database. This imaging allowed us to visualize protein expression within specific cardiac cell types, namely cardiomyocytes and non-cardiomyocytes. Since this immunohistochemistry database contains ventricular biopsy sections, only ventricular proteins were considered. All 45 ventricular proteins examined had antibody staining in heart sections, and 32 showed specific staining for cardiomyocytes; immunohistochemical staining for all proteins is shown in Figure 3.5. In addition, one protein (TNS1) showed non-cardiomyocyte specific expression. Five ventricular enriched proteins (ADHFE1, ATP1A3, GPD1L, PYGL, and PYGM) showed no clear heart staining pattern, and 7 proteins (CD36, LDHD, MACROD1, PDLIM1, PFKP, STRN, and SYNPO2) showed staining in both cardiomyocytes and non-cardiomyocytes.


DISCUSSION

Systems approaches such as proteomics and transcriptomics generate large amounts of data in a relatively short period of time, providing potential insights into molecular pathways and into differences in protein abundance between multiple biological samples and tissues. The accuracy of these large-scale studies has increased tremendously with advances in instrumentation and computational tools. This study is the first to report a comprehensive characterization of the human fetal atrial and ventricular proteome and transcriptome to identify chamber enriched and specific proteins. All protein samples were run on the Q-Exactive tandem mass spectrometer, and the data showed high reproducibility between all three biological replicates (Figures 3.2A and 3.2B), giving us greater confidence in the data, with a slightly greater degree of reproducibility in the atria than in the ventricles.

This might be attributed to the fact that the ventricles have a greater abundance of structural proteins that might mask less abundant proteins.

The majority of the atrial and ventricular proteome data from this study were also detected in other fetal heart proteomes, but the relative protein expression between the chambers examined in this study helps to broaden the scope of the previously established data (Figure 3.2C).

The identification of several novel proteins in this study might be attributed to differences in the origin and handling of samples, sample preparation, and the mass spectrometer used. The adult data showed an enrichment for immune and defense responses, most likely attributable to a combination of innate body defense mechanisms and the fact that the tissue in previous experiments came from patient autopsies, highlighting the difficulty of obtaining healthy human adult cardiac samples for studying chamber differences. Finally, we identified slow skeletal troponin I (TNNI1) and the fetal form of tropomyosin (TPM1) at high levels in the mass spectrometry data; both serve as excellent hallmarks of fetal hearts 25,26.

The data generated in this study allowed us to examine chamber enriched proteins and proteins commonly shared between the two chambers (Figure 3.3). This study reported many novel chamber enriched proteins. In the atria, the most significantly enriched novel protein by p-value is WBP11 (WW domain binding protein 11), a protein with a role in nucleocytoplasmic shuttling and pre-mRNA splicing 25, 27, while the protein with the highest fold-difference in the atria is GNAO1 (G protein alpha activating activity polypeptide O). GNAO1 plays an important role in cancer progression 28 and the regulation of intracellular Ca2+ 29. In the ventricle, the most enriched protein by p-value is RPL3L (ribosomal protein L3-like); only one study has examined its expression in skeletal and cardiac muscle 30. By fold-difference, the most differentially expressed novel ventricular protein is MYL5 (myosin light chain 5), a protein that binds Ca2+ ions and is involved in muscle contraction 31. For a complete list of differentially expressed proteins in the atria and ventricles, please refer to Tables S2 and S3, respectively.

We cross-referenced publicly available adult heart chamber microarray data to assess expression differences between the fetal and adult hearts and to identify expression changes that occur during development. Of the 46 proteins in the fetal atrial group with high protein and transcript expression, 27 also showed significant enrichment in the adult atria (p-value ≤0.05). Of interest, atrial natriuretic factor (NPPA) was the most differentially expressed gene in the adult atrial data. Bone morphogenetic protein 10 (BMP10) and PAM also showed higher expression in the adult atrial data. BMP10 has previously been shown to be involved in cardiac growth and chamber maturation 32. PAM also shows higher expression during development, but its function is still unknown 22. The adult ventricular data showed greater discordance with the fetal data: only 12 of 45 proteins with high protein and transcript expression were enriched in the adult ventricles. The two proteins with the highest expression in the adult ventricles were the structural proteins MYL2 and MYL3. These are essential for generating the greater contractile force of the adult heart compared with the fetal heart. Chamber enriched protein expression was more similar between the adult and fetal atria than between the adult and fetal ventricles. This suggests a smaller degree of modification of the atria during development.
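The text compares 27 of 46 concordant atrial proteins with 12 of 45 concordant ventricular proteins only qualitatively. One way to formalize such a comparison is a two-sided Fisher's exact test on the 2×2 table of concordant and non-concordant proteins per chamber. This test is an illustration and is not reported in the study. The sketch uses only the Python standard library, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):  # P(top-left cell == x) given the fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p, 1.0)


# Illustrative only, using counts from the text: atria 27 of 46 concordant,
# ventricles 12 of 45 concordant with the adult microarray.
odds, p = fisher_exact_two_sided(27, 46 - 27, 12, 45 - 12)
print(f"odds ratio = {odds:.2f}, two-sided p = {p:.4f}")
```

A small p-value would support the statement that atrial enrichment is more conserved between fetal and adult hearts than ventricular enrichment. The test treats each protein as an independent observation, which is itself a simplification.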

Proteins with cardiac phenotypes from both chambers showed enrichment for the biological process of cell adhesion.

Atrial enriched proteins with cell adhesion GO terms include:
ATPase slow twitch 2 (ATP2A2), protein tyrosine kinase 7 (PTK7), collagen type II (COL2A1), collagen type XIV (COL14A1), fibronectin 1 (FN1), fibulin 5 (FBLN5), neural cell adhesion molecule 1 (NCAM1), and periostin, osteoblast specific factor (POSTN).

Ventricular proteins with cell adhesion GO terms include:
CD36, angiotensinogen (AGT), armadillo repeat gene deleted in velocardiofacial syndrome (ARVCF), cadherin type 2 (CDH2), catenin alpha 3 (CTNNA3), collagen type XV (COL15A1), plakophilin 2 (PKP2), sorbin and SH3 domain containing 1 (SORBS1), and thrombospondin 4 (THBS4).

These proteins and other chamber enriched proteins provide important candidates for understanding chamber differences in cardiac disease. Furthermore, seven of the atrial enriched gene products with known cardiac phenotypes (MYL7, MYH6, TBX20, NPPA, FN1, TAGLN, and BMP10) have been linked to congenital heart diseases, including defects in cardiac looping, septation, and heart morphology 32,33,34,35,36,37,38.
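Enrichment of a biological process such as cell adhesion is typically assessed with a one-sided hypergeometric (Fisher's exact) test. DAVID (reference 12) reports a modified Fisher's exact p-value, the EASE score. The sketch below shows only the unmodified hypergeometric tail probability. The function name and the counts in the example are hypothetical.

```python
from math import comb


def go_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N: background protein groups, K: background members annotated with the
    GO term, n: size of the enriched list, k: annotated members in the list.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts for illustration only (not taken from this study).
print(go_enrichment_p(k=8, n=134, K=60, N=3000))
```

Because many GO terms are tested at once, such p-values are usually followed by a multiple-testing correction.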

The hundreds of novel chamber enriched proteins identified in this study provide new avenues for chamber and disease specific research. We compared the group with high protein and transcript expression against known human and mouse cardiac phenotype databases. A significant proportion of these proteins have previously been linked to abnormal cardiac phenotypes, but their phenotypes in a specific chamber are not well explained, especially in the atria. Most proteins have no known associated cardiac phenotype, so these data open new areas of chamber specific research.

Lastly, we examined the 45 ventricular proteins with high protein and transcript expression in human heart sections stained by immunohistochemistry. The aim was to determine whether each protein was expressed exclusively in cardiomyocytes or in non-cardiomyocytes. Significantly more proteins were cardiomyocyte specific than non-cardiomyocyte specific, since 75% of the myocardium is composed of cardiomyocytes. No previous studies have reported the chamber and cell type specific expression of these proteins in the ventricles. The staining of human heart sections also confirmed that the ventricular enriched proteins are expressed under physiological conditions.


CONCLUSIONS

Using mass spectrometry-based proteomics to examine healthy human fetal atrial and ventricular tissues, we identified 134 statistically enriched atrial and 81 statistically enriched ventricular protein groups. By further incorporating transcriptomic (microarray) data, this study identified 46 atrial and 45 ventricular proteins with high confidence, based on high protein and transcript expression.

These novel chamber enriched proteins establish a reference for understanding disease mechanisms. They also help identify specific proteins that may be involved in cardiac diseases affecting a particular chamber. Analyzing comprehensive proteomic data together with clinical phenotypes and advanced genomic information will prove invaluable in understanding cardiac development and disease. Future experiments should verify the novel proteins in vivo to gain better insight into how they contribute to chamber specificity and to chamber-specific functions.


ACKNOWLEDGEMENTS

The authors thank Dr. Peter Backx for valuable discussions, Drs. Bob Hamilton and Vinayakumar Siragam for expert surgical assistance and valuable discussions, and Vladimir Ignatchenko for expert assistance with the bioinformatics analyses.

SOURCES OF FUNDING

This project was funded by the Canadian Institutes of Health Research (MOP-106538; GPG-102166), the Heart and Stroke Foundation of Ontario (T-6281 and NS-6636), and the Ontario Research Fund - Global Leadership Round in Genomics and Life Sciences (GL2–01012) to AOG and TK. AOG holds the Canada Research Chair in Cardiovascular Proteomics and Molecular Therapeutics. TK holds the Canada Research Chair in Proteomics in Cancer Research. ZQL received an Ontario Graduate Scholarship in Science and Technology. AS was supported by a Department of Medical Biophysics Excellence Award and by a Kristi Piia Callum Memorial Fellowship.


FIGURES


Figure 3.1. Overall experimental design and analysis flowchart.

Shown are the key experimental steps for sample preparation, protein identification, and statistical analysis of chamber enriched proteins. Total cardiac tissues were lysed, proteins were digested, and peptides were analyzed on the Q-Exactive mass spectrometer in three biological replicates. Protein identification was performed with MaxQuant v1.4.1.2 against the UniProt complete human proteome database, with a false discovery rate below 1% at both the peptide and protein levels. The numbers of spectra, matched peptides, and proteins identified for both chambers are shown above. The 2,754 atrial and 2,825 ventricular protein groups with at least 2 peptides were then filtered statistically by fold-enrichment of mean LFQ intensities (>2) and p-value (≤0.05). This yielded 134 atrial and 81 ventricular enriched protein groups.
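The filtering step in this legend can be sketched as follows. The fold-change and p-value cutoffs come from the text, but the choice of test is an assumption. The sketch uses a paired t-test on log2 LFQ intensities, since the atrial and ventricular samples came from the same patient. It relies on the closed-form Student's t distribution for 2 degrees of freedom, so it is valid only for exactly three pairs. Missing (zero) intensities, which would require imputation, are not handled. The original analysis was done in R and may differ. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

FOLD_CUTOFF = 2.0  # fold-enrichment of mean LFQ intensities, from the text
P_CUTOFF = 0.05    # p-value cutoff, from the text


def two_sided_p_df2(t):
    """Two-sided p-value of Student's t with 2 degrees of freedom.

    For df = 2 the CDF has the closed form 1/2 + t / (2 * sqrt(2 + t**2)).
    """
    return 1 - abs(t) / math.sqrt(2 + t * t)


def paired_p(atria, ventricles):
    """Assumed test: paired t-test on log2 LFQ of exactly three matched samples."""
    if len(atria) != 3 or len(ventricles) != 3:
        raise ValueError("the df = 2 closed form needs exactly three pairs")
    diffs = [math.log2(a) - math.log2(v) for a, v in zip(atria, ventricles)]
    sd = stdev(diffs)
    if sd == 0:  # identical differences: t is infinite (undefined if all zero)
        return 0.0 if mean(diffs) != 0 else 1.0
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return two_sided_p_df2(t)


def classify(atria, ventricles):
    """Label a protein group 'atrial', 'ventricular' or 'not enriched'."""
    ratio = mean(atria) / mean(ventricles)  # ratio of mean LFQ intensities
    p = paired_p(atria, ventricles)
    if p <= P_CUTOFF and ratio > FOLD_CUTOFF:
        return "atrial"
    if p <= P_CUTOFF and ratio < 1 / FOLD_CUTOFF:
        return "ventricular"
    return "not enriched"


# Hypothetical LFQ intensities for one protein group in three matched samples.
print(classify([8e6, 9e6, 1e7], [1e6, 1.2e6, 1.1e6]))
```

With only three pairs, a t-test has little power. A protein group therefore needs a very consistent difference across the replicates, as well as a large mean ratio, to pass both cutoffs.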


Figure 3.2. Human cardiac chamber proteomics. A) Venn diagram showing the number of overlapping protein groups with at least 2 peptides between the atrial biological triplicates (atrium 1, 2, and 3). The number of protein groups in each region is indicated by the color-matched numbers. In total, 2,754 protein groups were identified in the atria. B) The corresponding Venn diagram for the ventricles, in which a total of 2,825 protein groups were identified. The atrial and ventricular biological triplicates showed high reproducibility. C) Venn diagram comparing the cardiovascular proteomes of Wilhelm et al., Kim et al., and the fetal atrial and ventricular dataset from this study. 898 proteins from this study were not present in the adult data, and 465 proteins from the adult heart were absent from both fetal heart proteomes. This study detected 29 proteins not identified in the two other proteome studies. D) Comparison of the fetal atrial and ventricular data, showing 187 atrial unique, 258 ventricular unique, and 2,567 common protein groups with at least two peptides identified in this study.
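The counting behind these Venn diagrams is plain set arithmetic. As a consistency check, the panel D regions add up to the chamber totals in panels A and B (187 + 2,567 = 2,754 and 258 + 2,567 = 2,825). The sketch below follows the abstract in counting a protein group toward a chamber when it is identified in at least two of the three replicate runs. The identifiers and function names are hypothetical.

```python
from collections import Counter


def reproducible(replicate_runs, min_runs=2):
    """Protein groups identified in at least `min_runs` of the replicate runs."""
    counts = Counter(p for run in replicate_runs for p in set(run))
    return {p for p, c in counts.items() if c >= min_runs}


def chamber_overlap(atria, ventricles):
    """Region sizes of a two-set Venn diagram (as in panel D)."""
    return {
        "atrial_unique": len(atria - ventricles),
        "ventricular_unique": len(ventricles - atria),
        "common": len(atria & ventricles),
    }


# Hypothetical identifications, assumed already filtered to >= 2 peptides;
# assumed rule: a group counts if it is seen in >= 2 of the 3 runs.
atria = reproducible([{"A", "B", "C"}, {"A", "B"}, {"B", "C", "D"}])
ventricles = reproducible([{"B", "C", "E"}, {"C", "E"}, {"B", "E", "F"}])
print(sorted(atria), sorted(ventricles), chamber_overlap(atria, ventricles))
```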


Figure 3.3. Protein expression across the human atrial and ventricular samples.

Volcano plot of all atrial and ventricular protein groups with at least 2 peptides, plotting p-value against the ratio of LFQ intensities for atria over ventricles. The 134 protein groups enriched in the atria lie along the positive x-axis, and the 81 ventricular enriched protein groups lie along the negative x-axis. Protein groups with fold-enrichment >2 in a chamber and p-value ≤0.05 are shown in blue for the atria and in red for the ventricles. Previously known chamber enriched proteins are marked by *. The 2,797 protein groups with p-value >0.05 are indicated in grey.
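The color coding of the volcano plot reduces to two thresholds per protein group. The sketch below is illustrative. It assumes a log2 transform of the ratio on the x-axis and a -log10 transform of the p-value on the y-axis; these are common volcano-plot conventions and are not stated in the legend. The function name is hypothetical.

```python
import math


def volcano_point(lfq_ratio, p_value, fold=2.0, alpha=0.05):
    """Coordinates and color of one protein group on an atria/ventricle volcano plot.

    lfq_ratio is LFQ(atria) / LFQ(ventricles).
    Assumed axes: x = log2(ratio), y = -log10(p).
    """
    x = math.log2(lfq_ratio)
    y = -math.log10(p_value)
    if p_value <= alpha and lfq_ratio > fold:
        color = "blue"  # atrial enriched: positive x-axis
    elif p_value <= alpha and lfq_ratio < 1 / fold:
        color = "red"   # ventricular enriched: negative x-axis
    else:
        color = "grey"  # does not pass both cutoffs
    return x, y, color


print(volcano_point(8.0, 0.001))  # strongly atrial
print(volcano_point(0.25, 0.01))  # ventricular
print(volcano_point(1.5, 0.001))  # significant but below the fold cutoff
```

In this sketch, protein groups that pass the p-value cutoff but not the fold cutoff are also drawn in grey.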


Figure 3.4. Correlation of proteomic data with microarray and cardiac phenotype ontology.

A) Scatterplot of the microarray log2 probe intensity ratio versus the proteome log10 LFQ intensity ratio (atria over ventricles) for all 2,456 gene products. Each dot represents a protein identified in this study with at least two peptides. B) Scatterplot of only the proteins enriched in the atria or ventricles at the proteome level. Blue dots indicate significance at the transcript level (p-value ≤0.05) in the atria, red dots in the ventricles, and grey dots non-significance. Highly significantly enriched proteins are labelled with their gene symbols, and previously known chamber markers are indicated by *. C) Heatmap of the 134 atrial enriched proteins across the human fetal proteome, fetal microarray, and adult microarray. These comprise 46 proteins with high protein and transcript expression (high intensities in both the fetal proteome and the microarray), 69 with high protein but low transcript expression, and 19 with sub-threshold transcript levels. The presence or absence of human and mouse cardiac phenotypes is also shown for the three groups. The proteome is depicted from red to green and the transcriptome from yellow to green, for high to low expression. D) The same analysis and heatmap for the 81 ventricular enriched proteins (45 with high protein and transcript expression, 24 with high protein but low transcript expression, and 12 with sub-threshold detection in the fetal microarray).
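The three groups in panels C and D partition each chamber's enriched set (46 + 69 + 19 = 134 atrial; 45 + 24 + 12 = 81 ventricular). A minimal sketch of such a partition follows; the exact rule is an assumption. Here a transcript counts as "high" when its fold-change toward the same chamber is positive with p ≤0.05. `None` is a hypothetical encoding for probes below the microarray detection threshold. Gene labels and values are hypothetical.

```python
from collections import Counter


def transcript_group(log2_fc, p_value, alpha=0.05):
    """Assign a proteome-enriched protein to one of three transcript groups.

    Assumed rule. log2_fc: microarray fold-change toward the same chamber as
    the protein enrichment, or None when the probe is below detection.
    """
    if log2_fc is None:
        return "sub-threshold transcript"
    if p_value <= alpha and log2_fc > 0:
        return "high protein and transcript"
    return "high protein, low transcript"


# Hypothetical microarray results for four proteome-enriched proteins.
microarray = {"G1": (1.2, 0.01), "G2": (0.3, 0.40), "G3": (None, None), "G4": (-0.5, 0.03)}
groups = {gene: transcript_group(fc, p) for gene, (fc, p) in microarray.items()}
print(Counter(groups.values()))
```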


Figure 3.5. Ventricular enriched protein expression in Human Protein Atlas.

Immunohistochemical (IH) staining of ventricular proteins with high protein and gene expression, grouped by cardiomyocyte or non-cardiomyocyte expression, with each protein labelled. In total, 32 proteins showed a cardiomyocyte specific pattern and 1 showed a non-cardiomyocyte specific pattern in the heart. Human heart sections from adult ventricular biopsies were obtained from the online Human Protein Atlas database. IH images are ordered by fold-change at the proteome and transcript levels.


Figure 3.6. Abstract Graphic

Abstract graphic of the overall experimental procedure and major findings of the study.


SUPPLEMENTAL TABLE LEGENDS

Table S1. List of total protein groups identified by MaxQuant.

This table contains the complete and unfiltered list of 3,731 protein groups identified via MaxQuant (Andromeda search engine), including the total number of peptides, sequence coverage, LFQ intensities, and MS/MS counts for all 6 runs.

Table S2. List of atrial enriched protein groups by significant fold-change and p-value.

This table contains the 134 atrial enriched protein groups with fold-enrichment >2 and p-value ≤0.05.

Table S3. List of ventricular enriched protein groups by significant fold-change and p-value.

This table contains the 81 ventricular enriched protein groups with fold-enrichment >2 and p-value ≤0.05.

Table S4. Biological processes of atrial, ventricular, and commonly expressed proteins.

This table contains the complete list of biological processes (Gene Ontology terms) with p-values ≤0.05 for A) 134 atrial enriched proteins, B) 81 ventricular enriched proteins, and C) 2,797 commonly expressed proteins in both atria and ventricles.

Table S5. Ranking of atrial enriched proteins according to protein and transcript expression.


This table contains the atrial enriched proteins defined at the proteome level, divided into three groups based on protein and transcript expression:
A) 46 atrial enriched proteins with high protein and transcript expression (ranked by the sum of log10 LFQ ratios and log2 fetal probe intensities).
B) 69 atrial enriched proteins with high protein but low transcript expression (ranked by log10 LFQ ratios).
C) 19 atrial enriched proteins with sub-threshold transcript levels in the fetal microarray (ranked by log10 LFQ ratios).

Table S6. Ranking of ventricular enriched proteins according to protein and transcript expression.

This table contains the ventricular enriched proteins defined at the proteome level, divided into three groups based on protein and transcript expression:
A) 45 ventricular enriched proteins with high protein and transcript expression (ranked by the sum of log10 LFQ ratios and log2 fetal probe intensities).
B) 24 ventricular enriched proteins with high protein but low transcript expression (ranked by log10 LFQ ratios).
C) 12 ventricular enriched proteins with sub-threshold transcript levels in the fetal microarray (ranked by log10 LFQ ratios).
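The ranking rule for group A in Tables S5 and S6 (sum of the log10 proteome ratio and the log2 fetal microarray value) is simple to reproduce. The sketch below applies it to five rows taken from Table 1, listed in deliberately shuffled order, and recovers the order in which they appear in that table. The function name is hypothetical. Because one term is in log10 units and the other in log2 units, the sum weights a given fold-change differently in the two layers. A 10-fold change adds 1 from the proteome but about 3.32 from the microarray. The sketch reproduces the stated rule without rescaling.

```python
# (gene, fetal proteome log10 fold-change, fetal microarray log2 fold-change);
# values copied from Table 1, listed out of order on purpose.
rows = [
    ("DKK3", 7.098186, 1.66448),
    ("SPATS2L", 7.134508, 0.972732),
    ("BMP10", 7.699794, 2.678783),
    ("PLCB1", 7.120101, 1.918867),
    ("CPNE5/8", 7.550831, 2.821192),
]


def rank_by_combined_score(rows):
    """Sort (gene, log10 proteome, log2 transcript) rows by their sum, highest first."""
    return sorted(rows, key=lambda r: r[1] + r[2], reverse=True)


for gene, prot, tx in rank_by_combined_score(rows):
    print(f"{gene:8s} {prot + tx:.6f}")
```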

Supporting Information for Analysis and Documentation of Peptide and Protein Identifications

Documentation of mass spectrometer parameters including scanning resolution, ranges, charge states, mass tolerance, cleavage, and modifications. A table with complete MaxQuant search parameters and configurations is also included.


REFERENCES FOR PAPER

1. Gersh, B. J.; Sliwa, K.; Mayosi, B. M.; Yusuf, S., Novel therapeutic concepts: the epidemic of cardiovascular disease in the developing world: global implications. European heart journal 2010, 31 (6), 642-8.

2. Barth, A. S.; Merk, S.; Arnoldi, E.; Zwermann, L.; Kloos, P.; Gebauer, M.; Steinmeyer, K.; Bleich, M.; Kaab, S.; Pfeufer, A.; Uberfuhr, P.; Dugas, M.; Steinbeck, G.; Nabauer, M., Functional profiling of human atrial and ventricular gene expression. Pflugers Archiv : European journal of physiology 2005, 450 (4), 201-8.

3. Comunian, C.; Rusconi, F.; De Palma, A.; Brunetti, P.; Catalucci, D.; Mauri, P. L., A comparative MudPIT analysis identifies different expression profiles in heart compartments. Proteomics 2011, 11 (11), 2320-8.

4. Tabibiazar, R.; Wagner, R. A.; Liao, A.; Quertermous, T., Transcriptional profiling of the heart reveals chamber-specific gene expression patterns. Circ Res 2003, 93 (12), 1193-201.

5. Asp, J.; Synnergren, J.; Jonsson, M.; Dellgren, G.; Jeppsson, A., Comparison of human cardiac gene expression profiles in paired samples of right atrium and left ventricle collected in vivo. Physiological genomics 2012, 44 (1), 89-98.

6. Hammer, E.; Goritzka, M.; Ameling, S.; Darm, K.; Steil, L.; Klingel, K.; Trimpert, C.; Herda, L. R.; Dorr, M.; Kroemer, H. K.; Kandolf, R.; Staudt, A.; Felix, S. B.; Volker, U., Characterization of the human myocardial proteome in inflammatory dilated cardiomyopathy by label-free quantitative shotgun proteomics of heart biopsies. Journal of proteome research 2011, 10 (5), 2161-71.

7. Kline, K. G.; Frewen, B.; Bristow, M. R.; Maccoss, M. J.; Wu, C. C., High quality catalog of proteotypic peptides from human heart. J Proteome Res 2008, 7 (11), 5055-61.

8. Kim, M. S.; Pinto, S. M.; Getnet, D.; Nirujogi, R. S.; Manda, S. S.; Chaerkady, R.; Madugundu, A. K.; Kelkar, D. S.; Isserlin, R.; Jain, S.; Thomas, J. K.; Muthusamy, B.; Leal-Rojas, P.; Kumar, P.; Sahasrabuddhe, N. A.; Balakrishnan, L.; Advani, J.; George, B.; Renuse, S.; Selvan, L. D.; Patil, A. H.; Nanjappa, V.; Radhakrishnan, A.; Prasad, S.; Subbannayya, T.; Raju, R.; Kumar, M.; Sreenivasamurthy, S. K.; Marimuthu, A.; Sathe, G. J.; Chavan, S.; Datta, K. K.; Subbannayya, Y.; Sahu, A.; Yelamanchi, S. D.; Jayaram, S.; Rajagopalan, P.; Sharma, J.; Murthy, K. R.; Syed, N.; Goel, R.; Khan, A. A.; Ahmad, S.; Dey, G.; Mudgal, K.; Chatterjee, A.; Huang, T. C.; Zhong, J.; Wu, X.; Shaw, P. G.; Freed, D.; Zahari, M. S.; Mukherjee, K. K.; Shankar, S.; Mahadevan, A.; Lam, H.; Mitchell, C. J.; Shankar, S. K.; Satishchandra, P.; Schroeder, J. T.; Sirdeshmukh, R.; Maitra, A.; Leach, S. D.; Drake, C. G.; Halushka, M. K.; Prasad, T. S.; Hruban, R. H.; Kerr, C. L.; Bader, G. D.; Iacobuzio-Donahue, C. A.; Gowda, H.; Pandey, A., A draft map of the human proteome. Nature 2014, 509 (7502), 575-81.

9. Sinha, A.; Ignatchenko, V.; Ignatchenko, A.; Mejia-Guerrero, S.; Kislinger, T., In-depth proteomic analyses of ovarian cancer cell line exosomes reveals differential enrichment of functional categories compared to the NCI 60 proteome. Biochemical and biophysical research communications 2014, 445 (4), 694-701.

10. Deshusses, J. M.; Burgess, J. A.; Scherl, A.; Wenger, Y.; Walter, N.; Converset, V.; Paesano, S.; Corthals, G. L.; Hochstrasser, D. F.; Sanchez, J. C., Exploitation of specific properties of trifluoroethanol for extraction and separation of membrane proteins. Proteomics 2003, 3 (8), 1418-24.

11. Cox, J.; Mann, M., MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nature biotechnology 2008, 26 (12), 1367-72.

12. Huang da, W.; Sherman, B. T.; Lempicki, R. A., Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009, 4 (1), 44-57.

13. Wilhelm, M.; Schlegl, J.; Hahne, H.; Moghaddas Gholami, A.; Lieberenz, M.; Savitski, M. M.; Ziegler, E.; Butzmann, L.; Gessulat, S.; Marx, H.; Mathieson, T.; Lemeer, S.; Schnatbaum, K.; Reimer, U.; Wenschuh, H.; Mollenhauer, M.; Slotta-Huspenina, J.; Boese, J. H.; Bantscheff, M.; Gerstmair, A.; Faerber, F.; Kuster, B., Mass-spectrometry-based draft of the human proteome. Nature 2014, 509 (7502), 582-7.

14. Gramolini, A. O.; Burton, E. A.; Tinsley, J. M.; Ferns, M. J.; Cartaud, A.; Cartaud, J.; Davies, K. E.; Lunde, J. A.; Jasmin, B. J., Muscle and neural isoforms of agrin increase utrophin expression in cultured myotubes via a transcriptional regulatory mechanism. The Journal of biological chemistry 1998, 273 (2), 736-43.

15. Roth, R. B.; Hevezi, P.; Lee, J.; Willhite, D.; Lechner, S. M.; Foster, A. C.; Zlotnik, A., Gene expression analyses reveal molecular relationships among 20 regions of the human CNS. Neurogenetics 2006, 7 (2), 67-80.

16. Becker, K. G.; Barnes, K. C.; Bright, T. J.; Wang, S. A., The genetic association database. Nat Genet 2004, 36 (5), 431-2.

17. Blake, J. A.; Bult, C. J.; Eppig, J. T.; Kadin, J. A.; Richardson, J. E.; Mouse Genome Database Group, The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse. Nucleic Acids Res 2014, 42 (Database issue), D810-7.

18. Ponten, F.; Schwenk, J. M.; Asplund, A.; Edqvist, P. H., The Human Protein Atlas as a proteomic resource for biomarker discovery. J Intern Med 2011, 270 (5), 428-46.

19. England, J.; Loughna, S., Heavy and light roles: myosin in the morphogenesis of the heart. Cellular and molecular life sciences : CMLS 2013, 70 (7), 1221-39.

20. Maass, A. H.; De Jong, A. M.; Smit, M. D.; Gouweleeuw, L.; de Boer, R. A.; Van Gilst, W. H.; Van Gelder, I. C., Cardiac gene expression profiling - the quest for an atrium-specific biomarker. Neth Heart J 2010, 18 (12), 610-4.

21. Ouafik, L.; May, V.; Keutmann, H. T.; Eipper, B. A., Developmental regulation of peptidylglycine alpha-amidating monooxygenase (PAM) in rat heart atrium and ventricle. Tissue-specific changes in distribution of PAM activity, mRNA levels, and protein forms. J Biol Chem 1989, 264 (10), 5839-45.

22. Verheule, S.; van Kempen, M. J.; te Welscher, P. H.; Kwak, B. R.; Jongsma, H. J., Characterization of gap junction channels in adult rabbit atrial and ventricular myocardium. Circulation research 1997, 80 (5), 673-81.

23. Vogel, C.; Marcotte, E. M., Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nature reviews. Genetics 2012, 13 (4), 227-32.


24. Llorian, M.; Beullens, M.; Lesage, B.; Nicolaescu, E.; Beke, L.; Landuyt, W.; Ortiz, J. M.; Bollen, M., Nucleocytoplasmic shuttling of the splicing factor SIPP1. J Biol Chem 2005, 280 (46), 38862-9.

25. Liu, Z.; Zhang, J.; Wu, L.; Liu, J.; Zhang, M., Overexpression of GNAO1 correlates with poor prognosis in patients with gastric cancer and plays a role in gastric cancer cell proliferation and apoptosis. Int J Mol Med 2014, 33 (3), 589-96.

26. Hasdemir, C.; Aydin, H. H.; Celik, H. A.; Simsek, E.; Payzin, S.; Kayikcioglu, M.; Aydin, M.; Kultursay, H.; Can, L. H., Transcriptional profiling of septal wall of the right ventricular outflow tract in patients with idiopathic ventricular arrhythmias. Pacing Clin Electrophysiol 2010, 33 (2), 159-67.

27. Van Raay, T. J.; Connors, T. D.; Klinger, K. W.; Landes, G. M.; Burn, T. C., A novel ribosomal protein L3-like gene (RPL3L) maps to the autosomal dominant polycystic kidney disease gene region. Genomics 1996, 37 (2), 172-6.

28. Qin, H.; Morris, B. J.; Hoh, J. F., Isolation and structure of cat superfast myosin light chain-2 cDNA and evidence for the identity of its human homologue. Biochem Biophys Res Commun 1994, 200 (3), 1277-82.

29. Calderone, A.; Takahashi, N.; Izzo, N. J., Jr.; Thaik, C. M.; Colucci, W. S., Pressure- and volume-induced left ventricular hypertrophies are associated with distinct myocyte phenotypes and differential induction of peptide growth factor mRNAs. Circulation 1995, 92 (9), 2385-90.

30. Chen, H.; Shi, S.; Acosta, L.; Li, W.; Lu, J.; Bao, S.; Chen, Z.; Yang, Z.; Schneider, M. D.; Chien, K. R.; Conway, S. J.; Yoder, M. C.; Haneline, L. S.; Franco, D.; Shou, W., BMP10 is essential for maintaining cardiac growth during murine cardiogenesis. Development 2004, 131 (9), 2219-31.


Supporting Information for Analysis and Documentation of Peptide and Protein Identifications

All of the methods, programs, and detailed version numbers for mass spectrometry and database searches can be found in Sinha et al. (2014). All samples were run on the Q-Exactive tandem mass spectrometer, and data were acquired in top-10 data-dependent acquisition mode. MS1 spectra were acquired over a scan range of 400-1600 m/z at a resolution of 70,000, whereas MS/MS spectra were acquired at a resolution of 17,500. The charge states considered were +2, +3, +4, and +5.

Database searching and protein identification were carried out with MaxQuant version 1.4.1.2 (including the Andromeda search engine). Searches were run against the human UniProt complete proteome database (version 2012-07-19, 20,232 entries). The main MaxQuant settings were:
- fragment ion mass tolerance of 20 ppm;
- a maximum of 2 missed cleavages;
- carbamidomethylation of cysteine as a fixed modification;
- oxidation of methionine as a variable modification.

The isolation window for precursor ion selection was set to 3.0 m/z. The normalized collision energy (NCE) for tandem MS was 30, and the dynamic exclusion time was 20 seconds. False discovery rates were set at 1% at both the peptide and protein levels using a target/decoy approach to minimize false positives. Human fetal samples were run in biological triplicates with high data reproducibility, as shown in Figure 3.2A and B. Only proteins with at least two peptides were used. All statistical analyses were performed in R with cutoffs of fold-change >2 and p-value ≤0.05.
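The 1% FDR setting relies on the target/decoy principle. Reversed ("revert") decoy sequences are searched alongside the real database, and the fraction of decoy hits above a score cutoff estimates the fraction of false target hits. The sketch below shows the basic score-based version of that idea. MaxQuant's own implementation differs in detail; for example, it ranks identifications by posterior error probability. The function name and the scores in the example are hypothetical.

```python
def score_cutoff_at_fdr(psms, fdr=0.01):
    """Lowest score at which the decoy/target ratio is still <= fdr.

    Simplified score-based sketch, not MaxQuant's PEP-based procedure.
    psms: (score, is_decoy) pairs, higher score = better match. Target matches
    scoring at or above the returned cutoff are accepted; None means no cutoff
    reaches the requested FDR. Tied scores are not treated specially.
    """
    cutoff = None
    targets = decoys = 0
    for score, is_decoy in sorted(psms, key=lambda s: s[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            cutoff = score
    return cutoff


# Hypothetical search results: (score, is_decoy).
psms = [(100, False), (90, False), (80, True), (70, False), (60, False)]
print(score_cutoff_at_fdr(psms, fdr=0.01))  # 90
print(score_cutoff_at_fdr(psms, fdr=0.5))   # 60
```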


Complete list of MaxQuant search parameters and configurations:

Parameter | Value
Version | 1.4.1.2
Fixed modifications | Carbamidomethyl (C)
Decoy mode | revert
Special AAs | KR
Include contaminants | FALSE
MS/MS tol. (FTMS) | 20 ppm
Top MS/MS peaks per 100 Da. (FTMS) | 12
MS/MS deisotoping (FTMS) | TRUE
MS/MS tol. (ITMS) | 0.5 Da
Top MS/MS peaks per 100 Da. (ITMS) | 8
MS/MS deisotoping (ITMS) | FALSE
MS/MS tol. (TOF) | 0.1 Da
Top MS/MS peaks per 100 Da. (TOF) | 10
MS/MS deisotoping (TOF) | FALSE
MS/MS tol. (Unknown) | 0.5 Da
Top MS/MS peaks per 100 Da. (Unknown) | 10
MS/MS deisotoping (Unknown) | FALSE
PSM FDR | 0.01
Protein FDR | 0.01
Site FDR | 0.01
Use Normalized Ratios For Occupancy | TRUE
Min. peptide Length | 7
Min. score for unmodified peptides | 0
Min. score for modified peptides | 40
Min. delta score for unmodified peptides | 0
Min. delta score for modified peptides | 17
Min. unique peptides | 0
Min. razor peptides | 1
Min. peptides | 1
Use only unmodified peptides and | FALSE
Peptides used for protein quantification | Razor
Discard unmodified counterpart peptides | TRUE
Min. ratio count | 2
Site quantification | Use least modified peptide
Re-quantify | TRUE
Use delta score | FALSE
iBAQ | TRUE
iBAQ log fit | TRUE
MS/MS recalibration | FALSE
Match between runs | TRUE
Matching time window [min] | 2
Alignment time window [min] | 20
Find dependent peptides | TRUE
Dependent peptide FDR | 0.01
Mass bin size | 0.0055
Labeled amino acid filtering | TRUE
Site tables | Oxidation (M)Sites.txt
Cut peaks | TRUE
Decoy mode | revert
Special AAs | KR
Include contaminants | FALSE
RT shift | FALSE
Advanced ratios | FALSE
AIF correlation | 0.47
First pass AIF correlation | 0.8
AIF topx | 20
AIF min mass | 0
AIF SIL weight | 4
AIF ISO weight | 2
AIF iterative | TRUE
AIF threshold FDR | 0.01
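The MS/MS tolerances above mix relative units (ppm, for the high-resolution FTMS analyzer) and absolute units (Da, for ITMS and TOF). A relative tolerance scales with m/z. The small helpers below convert it to an absolute window: 20 ppm corresponds to ±0.02 at m/z 1,000 and ±0.008 at m/z 400. The function names are hypothetical.

```python
def ppm_window(mz, ppm):
    """Half-width of a ppm tolerance window, in m/z units."""
    return mz * ppm * 1e-6


def within_ppm(observed_mz, theoretical_mz, ppm=20.0):
    """True if an observed peak lies within `ppm` of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= ppm_window(theoretical_mz, ppm)


print(ppm_window(1000.0, 20))        # about 0.02
print(within_ppm(1000.015, 1000.0))  # True  (15 ppm off)
print(within_ppm(1000.03, 1000.0))   # False (30 ppm off)
```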


Gene name | Fetal proteome fold-change (log10) | Fetal microarray fold-change (log2) | Adult microarray fold-change (log2)
BMP10 | 7.699794 | 2.678783 | 3.540755
CPNE5/8 | 7.550831 | 2.821192 | 0.976432
PLCB1 | 7.120101 | 1.918867 | 1.243745
DKK3 | 7.098186 | 1.66448 | 1.969692
SPATS2L | 7.134508 | 0.972732 | 1.063522
P4HA2 | 7.263750445 | 0.457240333 | 0.764449639
SPARC | 7.264133 | 0.421932 | 0.840439
OSBPL10/11 | 6.687038208 | 0.329117167 |
MYH6 | 2.33809 | 4.237302 | 1.710231
PAM | 2.279652 | 1.870437 | 1.483964
EPHA2 | 0.914032 | 3.126312 | 1.304009
CACNA2D2 | 1.246547 | 2.124812 | 1.853011
NPPA | 1.608001 | 1.486596 | 4.500164
FAM21A/21B/21C | 0.815676433 | 1.994628167 |
FKBP10 | 1.09817063 | 1.110728333 |
GPC6 | 0.69723 | 1.255753 | 1.031549
RABGAP1L | 0.896695 | 1.050876 | 1.004214
COL14A1 | 0.836099 | 1.109996 | 1.828191
FBLN2 | 0.631384 | 1.289313 | 1.562487
TAGLN | 0.567002 | 1.179509 | 1.685268
UCHL1 | 0.70085 | 1.012015 | 0.820362
PGM5 | 0.337495 | 1.238428 | 1.278904
POSTN | 0.420772 | 1.153493 | 1.915203
OGN | 0.501043 | 0.997615 | 1.932727
GMPPA | 0.748995102 | 0.6642315 |
PTGIS | 0.385378 | 0.970133 | 0.754277
EPN2 | 0.754472 | 0.579632 | 0.171272
MYL7 | 0.824298 | 0.484424 | 2.585361
PRKAG2 | 0.468957 | 0.835339 | 1.427204
ACSF2 | 0.493281426 | 0.798508667 |
F13A1 | 0.620293 | 0.596402 | 0.466355
PRDXA4 | 0.545698 | 0.654767 | 1.492785
FBLN5 | 0.440364 | 0.655972 | 1.390101
FN1 | 0.320432 | 0.730461 | 1.338892
GSS | 0.633431 | 0.399041 | 0.183934
BGN | 0.501871 | 0.464198 | 0.739479
TBX20 | 0.647184498 | 0.307696 |
PRELP | 0.486425081 | 0.456492167 | 0.881236554
PTGFRN | 0.446068284 | 0.487812333 | 1.11810782
ATP2A2 | 0.499219 | 0.390443 | 0.677924
PPIB | 0.362708 | 0.4958 | 0.962956
DCN | 0.377227 | 0.417821 | 0.635434
PBXIP1 | 0.553367291 | 0.2380575 | 0.447444685
PRCP | 0.36541 | 0.400726 | 0.951864
EIF3I | 0.524148535 | 0.238977833 | 0.290231248
CTSB | 0.353444739 | 0.3325135 | 0.657440926

Table 1. Protein names and gene symbols of 46 atrial enriched gene products with high protein and transcript expression.


Gene name | Fetal proteome fold-change (log10) | Fetal microarray fold-change (log2) | Adult microarray fold-change (log2)
RPL3L | 7.64947441 | 3.436784 | 1.157783
PHLDB2 | 6.8249693 | 1.847035 | 1.073296
SLC16A1 | 7.67217177 | 0.897051 | 0.171548
CPPED1 | 7.16606484 | 0.638741 |
CCNY | 6.82806023 | 0.142376 | 0.008205
MYL2 | 3.32762629 | 3.429834 | 3.831358
MYL3 | 1.48625332 | 3.788749 | 3.388209
MYH7 | 0.5065159 | 2.733668 | 0.700544
NAV1 | 0.33939783 | 2.892233 | 0.879027
CPVL | 0.56785216 | 2.311099 | 0.850305
MAOB | 0.94027961 | 1.838076 | -0.124983
FHL2 | 0.93814662 | 1.691642 | 1.933831
THBS4 | 0.84214958 | 1.783598 | 1.68285
ATP1A3 | 0.90092137 | 1.524031 | -0.10301
INPP4B | 1.07648041 | 1.2683 |
PYGM | 0.70263953 | 1.524712 | 0.233671
FAM129A | 0.70289953 | 1.356068 | 0.121218
CD36 | 0.49042748 | 1.372737 | 0.726756
SLC27A6 | 0.54466395 | 1.308156 | -0.713676
TMEM65 | 0.39918803 | 1.408358 | 0.708472
SYNPO2 | 0.85223467 | 0.952244 | 0.141791
PYGL | 0.49088715 | 1.259265 |
XIRP1 | 0.33417641 | 1.398237 | -0.204513
ATP1B1 | 0.42953081 | 1.255529 | 0.114646
SMYD1 | 0.40661236 | 1.237785 | 0.19134
TNS1 | 0.67887582 | 0.927591 | 0.208457
LDHD | 0.54571687 | 1.045768 |
HSDL2 | 0.45660705 | 1.031777 | 0.194235
GPD1L | 0.40865129 | 0.994789 | -0.558361
PKP2 | 0.31470769 | 1.075643 | 0.086584
CKMT2 | 0.58453776 | 0.785009 | 0.082316
NDRG4 | 0.7089764 | 0.60006 | 1.101878
SORBS1 | 0.59981668 | 0.708092 | 0.134936
CTNNA3 | 0.47092506 | 0.769505 | 0.462437
ACAT1 | 0.3194561 | 0.877626 | 0.367404
SORBS2 | 0.37804539 | 0.795319 | 0.681785
MYOM1 | 0.34162214 | 0.816913 | 0.453794
PFKP | 0.35940101 | 0.775144 | 0.301627
ACADS | 0.45171338 | 0.637796 | 0.414042
STRN | 0.3094541 | 0.754584 | 0.13496
ADHFE1 | 0.40205243 | 0.605729 | 0.32597
PDLIM1 | 0.34020489 | 0.655458 | -0.031674
PFKM | 0.3250289 | 0.524608 | 0.376294
BCAT2 | 0.36309516 | 0.40927633 | 0.380052444
MACROD1 | 0.33342752 | 0.330904 | 0.819612

Table 2. Protein names and gene symbols of 45 ventricular enriched gene products with high protein and transcript expression.


Gene names Human cardiac Mouse cardiac phenotypes phenotypes MYH6 yes yes NPPA yes yes TAGLN yes yes PTGIS yes yes FBLN2 yes yes FN1 yes yes TBX20 yes yes ATP2A2 yes yes P4HA2 yes SPARC yes OSBPL10/11 yes PAM yes EPHA4 yes GPC6 yes COL14A1 yes PRKAG2 yes F13A1 yes GSS yes PRELP yes DCN yes BMP10 yes CACNA2D2 yes POSTN yes EPN2 yes MYL7 yes

Table 3. Atrial enriched gene products with human and mouse cardiac phenotypes.


Gene names Human cardiac Mouse cardiac phenotypes phenotypes MYL2 yes yes FHL2 yes yes THBS4 yes yes CD36 yes yes ATP1B1 yes yes PKP2 yes yes CTNNA3 yes yes MYL3 yes MYH7 yes NAV1 yes CPVL yes MAOB yes INPP4B yes SYNPO2 yes NDRG4 yes SORBS1 yes SORBS2 yes STRN yes XIRP1 yes SMYD1 yes CKMT2 yes PFKM yes

Table 4. Ventricular enriched gene products with human and mouse cardiac phenotypes.


Chapter 4: Discussion

1 Dataset Integration and Congenital Heart Defects

This study is the first proteomic study of human fetal atrial and ventricular tissue to use mass spectrometry with unbiased bioinformatic strategies. The human fetal data provide a reference for understanding cardiac disease mechanisms and the possible origins of congenital heart defects.

The fetal data were compared with two other recently published comprehensive human proteome databases [53, 54]. Compared with the fetal data, proteins identified only in the adult ventricular proteome of Wilhelm et al. were enriched for immune and defense responses. This enrichment most likely reflects innate defense responses and the source of the samples. The adult samples were biopsies from patients with heart disease, which accentuates the difficulty of obtaining a healthy adult heart proteome compared with a fetal one. In addition, this study supplements the whole fetal heart proteome recently published by Kim et al. by characterizing chamber differences.

Public databases also provide another means of validating the fetal heart proteins identified in this study. Another strength of this study is that the atrial and ventricular samples were obtained from the same patient, which allowed pairwise statistics to be performed. The atrial data also showed high reproducibility, in contrast to other studies with high variability [55].

Congenital heart disease (CHD) results from abnormal heart development and is the most common birth defect. CHD can lead to absent or delayed formation of heart structures, abnormal morphology, or failure to complete one or more cardiac developmental events. Chapter 3 reported proteins associated with cardiovascular phenotypes in humans and mice. Among them, five atrial enriched proteins are associated with abnormal cardiac development in the mouse database (BMP10, TAGLN, MYL7, FN1, and TBX20) and two in the human database (MYH6 and NPPA). Mutations in BMP10 have previously been associated with absent atrioventricular cushions [56], TAGLN with delayed heart development [57], MYL7 and FN1 with abnormal atrioventricular cushions and abnormal heart looping [19, 58], and TBX20 with abnormal heart looping, development, and morphology [59]. Genetic variants of MYH6 result in atrial septal defects and incomplete myofibril formation [60]. NPPA is expressed during heart looping and contributes to chamber-specific differentiation [61]. By contrast, among the ventricular enriched proteins only MYL2 was associated with abnormal heart development in mice, and none in the human database. Ventricular proteins are instead more often associated with hypertrophic and dilated cardiomyopathies. More atrial than ventricular enriched proteins are linked to CHD. This suggests that proper protein expression in the atria is crucial for normal cardiac development and morphology, at least in these samples and at this developmental time point.

Nevertheless, the integrity of the ventricles is also important, as ventricular defects often result in embryonic lethality.

By proteome fold-change, BMP10 was the most differentially expressed atrial enriched protein, yet previous studies have implicated BMP10 in ventricular chamber development, and little is known about its role in the atria. Specifically, BMP10 plays a role in ventricular trabeculation and maintains the cardiomyocyte population within the trabeculae [62]. Protein interactions computed by String v.9.1, together with Gene Ontology annotations, showed that the atrial enriched proteins associated with CHD (BMP10, TAGLN, MYL7, FN1, TBX20, MYH6, and NPPA) were, surprisingly, all interconnected, and MYH6 and TBX20 were already known to be involved in atrial morphology (Appendix C). Cross-referencing this study against disease databases therefore yields novel atrial enriched proteins that may be involved in CHD and contribute to atrial morphogenesis. Chamber enriched proteins that showed no CHD or other cardiac phenotype are also interesting targets, because mutations in these genes may be lethal early in development or the genes may not yet have been studied in the heart.

2 Mass Spectrometry Comparisons

Before running samples on the Q-Exactive mass spectrometer, we analyzed the human fetal atria and ventricles on the LTQ ion trap mass spectrometer. Heart tissue was subjected to silica-bead extraction as described previously [63]. Silica-bead extraction enriches for membrane proteins, which are otherwise underrepresented in proteome data because of their reduced solubility in aqueous solution and their low abundance relative to structural and soluble proteins [64]. Running the membrane and homogenate fractions separately also increases the number of proteins identified, because low-abundance membrane proteins are not masked by higher-abundance structural proteins. The numbers of proteins identified in the atria and ventricles with a 9-step MudPIT are shown in Supplemental Figure 2. The 9-step MudPIT uses different salt concentrations of ammonium acetate on a two-dimensional (Luna SCX and Magic C-18) liquid chromatography system. With the fractions run separately and each sample run three times, the membrane and cytosolic fractions together yielded 1,744 proteins (81% of the total) detected in at least 2 of the 3 atrial runs and 2,044 proteins (91% of the total) detected in at least 2 of the 3 ventricular runs.
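The "at least 2 of 3 runs" criterion and the accompanying percentage can be expressed as a short Python sketch. It assumes that "the total" is the union of all proteins detected in any of the three runs for a chamber; the protein identifiers are hypothetical placeholders, not data from this study.

```python
from collections import Counter


def reproducible_proteins(runs, min_runs=2):
    """Return proteins seen in at least `min_runs` runs and their share of all proteins seen."""
    # Count in how many runs each protein was detected (a set per run avoids double counting)
    counts = Counter(protein for run in runs for protein in set(run))
    kept = {protein for protein, n in counts.items() if n >= min_runs}
    fraction = len(kept) / len(counts) if counts else 0.0
    return kept, fraction


# Hypothetical protein IDs detected in three runs of one chamber
atrial_runs = [
    {"PROT_A", "PROT_B", "PROT_C", "PROT_D"},
    {"PROT_A", "PROT_B", "PROT_C", "PROT_E"},
    {"PROT_A", "PROT_B", "PROT_F"},
]
kept, fraction = reproducible_proteins(atrial_runs)
print(sorted(kept), f"{fraction:.0%}")  # ['PROT_A', 'PROT_B', 'PROT_C'] 50%
```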


In the Q-Exactive data, as mentioned previously, 91% of the atrial proteins and 83% of the ventricular proteins were identified in at least 2 of the 3 runs. Reproducibility on both mass spectrometry platforms could be improved by running multiple technical replicates of each biological sample and comparing the pooled results. Compared with the Q-Exactive, the LTQ ion trap identified fewer proteins and showed larger run-to-run variation; the specific numbers are shown in Supplemental Figure 3.

The smaller number of proteins identified on the LTQ ion trap may be attributable to differences in sample preparation. In particular, silica beads bind poorly to tissue in solution, and proteins may be lost during the numerous washes and centrifugation spins, compared with the tissue lysate that was run on the Q-Exactive.

One way to quantify proteins is spectral counting, in which the number of MS/MS spectra assigned to a protein provides an estimate of its abundance. The LTQ ion trap data were searched with a combination of search algorithms (X! Tandem, MyriMatch, Comet, OMSSA, and SEQUEST) and quantified by spectral counts. Variation in the amount of sample loaded therefore influences the spectral counts. MaxQuant, in contrast, reports LFQ intensities with integrated normalization at both the peptide and protein levels, which corrects for such systematic errors.
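To illustrate why loading differences matter for spectral counts, here is a minimal Python sketch of total-count scaling, in which each run's counts are rescaled so that all runs have the same total. This simple scheme is an assumption chosen for illustration; it is not MaxQuant's LFQ normalization and not necessarily what was applied to the LTQ data. Run and protein names are hypothetical.

```python
def normalize_spectral_counts(runs):
    """Scale each run's spectral counts so that every run has the same total count.

    runs: dict mapping run name -> dict of protein -> spectral count.
    """
    totals = {name: sum(counts.values()) for name, counts in runs.items()}
    target = sum(totals.values()) / len(totals)  # mean total across runs
    return {
        name: {protein: count * target / totals[name] for protein, count in counts.items()}
        for name, counts in runs.items()
    }


# Hypothetical runs: run_2 behaves as if twice as much material had been loaded
runs = {
    "run_1": {"PROT_A": 40, "PROT_B": 20, "PROT_C": 10},
    "run_2": {"PROT_A": 80, "PROT_B": 40, "PROT_C": 20},
}
normalized = normalize_spectral_counts(runs)
print(normalized)
```

Because run_2 has exactly twice the counts of run_1 in this made-up example, every protein ends up with the same normalized count in both runs after scaling.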

3 Transcriptomics

As mentioned previously, gene expression does not always correlate with protein expression, for the reasons given in Chapter 1 Section 4.1. Hence, this study primarily took a direct proteomic approach, comparing against other proteome datasets. Another approach is to compare the microarray dataset generated in this project with the human fetal heart proteome of Kim et al. (Appendix D). The comparison of chamber enriched transcripts (p-value ≤0.05) with the whole heart proteome is shown as a Venn diagram in Supplemental Figure 4. Gene products common to the transcriptome and proteome provide additional candidates for understanding chamber differences. The 399 atrial gene products in common were enriched for cellular adhesion (n = 38) and intracellular signaling (n = 37), whereas the 376 ventricular gene products in common were enriched for oxidation reduction (n = 44) and generation of metabolites and energy (n = 38). These biological processes are similar to those identified through the proteomic comparisons, but the transcriptomic approach yielded more candidate genes, because not all transcripts are translated into proteins, although such transcripts may nevertheless play important roles in the cell. To obtain a complete biological profile of the cells, transcriptomic studies therefore need to be complemented with proteomic data, given the differences between gene and protein expression.
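The overlap step described above can be sketched in Python: keep transcripts whose chamber-enrichment p-value is at most 0.05, then intersect them with the set of gene products detected in the proteome. This assumes both datasets are keyed by the same gene symbols; the gene names and p-values below are hypothetical placeholders.

```python
def enriched_in_proteome(transcripts, proteome, p_cutoff=0.05):
    """Chamber-enriched genes (p <= p_cutoff) whose products were also detected in the proteome.

    transcripts: dict gene symbol -> p-value for chamber enrichment.
    proteome: set of gene symbols detected at the protein level.
    """
    enriched = {gene for gene, p in transcripts.items() if p <= p_cutoff}
    return enriched & proteome


# Hypothetical inputs
atrial_transcripts = {"GENE_A": 0.001, "GENE_B": 0.004, "GENE_C": 0.02, "GENE_D": 0.30}
whole_heart_proteome = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
common = enriched_in_proteome(atrial_transcripts, whole_heart_proteome)
print(sorted(common))  # ['GENE_A', 'GENE_B']
```

GENE_C is enriched but absent from the proteome, and GENE_D is in the proteome but fails the p-value cutoff, so neither is counted in the overlap.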


Chapter 5: Limitations

Although the list of enriched proteins from this study provides important information about the human fetal heart chambers, the samples were limited to healthy hearts at 20 to 22 weeks in utero. As shown previously in Figure 1.1, the fetal heart and its four chambers are already distinct and developed after day 50. However, the samples in this study represent only one time point, so changes that occur during the first month, around birth, and in the adult heart under normal physiological conditions are not captured. Healthy heart tissue is difficult to obtain at these other time points, which limits our ability to follow protein changes throughout development on the same platform. Although we took great care in data selection through objective filtering and integration of multiple databases, biochemical validation could provide additional insight into protein expression under physiological conditions.

Another limitation of this study is that we had access only to healthy tissue for proteomic analysis and did not consider disease models in fetuses or adults. Comparing diseased with healthy hearts could yield novel disease biomarkers and pathways. In addition, diseased tissue from both chambers of a fetal heart is rare and difficult to obtain for proteomic studies, which makes it difficult to examine early or congenital cardiac diseases in human fetuses. We also relied heavily on bioinformatic comparisons and publicly available databases rather than our own biochemical validation, owing to the limited amount of biological sample and the limited antibodies available.


Chapter 6: Future Directions

To address the limitations of this study, future experiments should profile the proteome at additional time points to characterize protein expression from fetal stages to adulthood. Additional fetal time points worth examining include day 28, day 50, and the end of pregnancy at 9 months, when the fetal heart is most developed. By day 28, heart tube looping is complete and four discernible chambers are present, although their orientation is not yet final; by day 50, all four chambers are in their correct orientation. Another interesting comparison is the heart proteome right after birth, to observe the changes that occur once the maternal circulation is closed off.

Fetal and adult hearts express different proteins, and expression levels also change throughout development. Although we incorporated adult microarray data to gain insight into developmental changes at the transcript level, a proteome of healthy adult atria and ventricles would provide direct insight into protein expression during development. There is also great interest in detailed proteome profiling of healthy human adult atria, as previous studies have focused heavily on the ventricles. Healthy adult atria could serve as a reference for understanding atrial fibrillation, which occurs more frequently in adults. Generating proteomes of human cardiac diseases such as atrial fibrillation and dilated and hypertrophic cardiomyopathies, using the same procedures and platform, is essential to better understand disease-specific changes in protein expression.


Future experiments should also include biochemical and physiological studies of chamber enriched proteins in both fetal and adult heart tissue to examine how these proteins contribute to chamber specificity. The top three novel atrial enriched candidates for validation at both the fetal proteome and transcriptome levels are BMP10, CPNE5/8, and PLCB1, while RPL3L, PHLDB2, and SLC16A1 ranked highest among the ventricular candidates; the remaining candidate proteins are also interesting and useful for future studies.

To study the role of candidate proteins, knockout studies could be performed in mouse hearts early in cardiac development to observe changes in cardiac morphology. The proteins highlighted in Chapter 4 that are known to cause congenital heart defects are interesting candidates for future studies. With a better understanding of the proteins involved in CHD, reintroducing or rescuing the mutated proteins may eventually provide lifesaving treatments.

Overall, the dataset generated in this experiment will be useful for understanding cardiovascular physiology and chamber differences in healthy human hearts.


Chapter 7: References

1. Barnett, P., M. van den Boogaard, and V. Christoffels, Localized and temporal gene regulation in heart development. Curr Top Dev Biol, 2012. 100: p. 171-201.

2. Moorman, A., et al., Development of the heart: (1) formation of the cardiac chambers and arterial trunks. Heart, 2003. 89(7): p. 806-14.

3. Srivastava, D. and E.N. Olson, A genetic blueprint for cardiac development. Nature, 2000. 407(6801): p. 221-6.

4. Evans, S.M., et al., Myocardial lineage development. Circ Res, 2010. 107(12): p. 1428-44.

5. Kelly, R.G., The second heart field. Curr Top Dev Biol, 2012. 100: p. 33-65.

6. Spater, D., et al., A HCN4+ cardiomyogenic progenitor derived from the first heart field and human pluripotent stem cells. Nat Cell Biol, 2013. 15(9): p. 1098-106.

7. Anderson, R.H., et al., Development of the heart: (2) Septation of the atriums and ventricles. Heart, 2003. 89(8): p. 949-58.

8. Rudolph, A.M., The changes in the circulation after birth. Their importance in congenital heart disease. Circulation, 1970. 41(2): p. 343-59.

9. Souders, C.A., S.L. Bowers, and T.A. Baudino, Cardiac fibroblast: the renaissance cell. Circ Res, 2009. 105(12): p. 1164-76.

10. Buckingham, M., S. Meilhac, and S. Zaffran, Building the mammalian heart from two sources of myocardial cells. Nat Rev Genet, 2005. 6(11): p. 826-35.

11. Asp, J., et al., Comparison of human cardiac gene expression profiles in paired samples of right atrium and left ventricle collected in vivo. Physiol Genomics, 2012. 44(1): p. 89-98.

12. Moretti, A., et al., Patient-specific induced pluripotent stem-cell models for long-QT syndrome. N Engl J Med, 2010. 363(15): p. 1397-409.

13. Pelouch, V., et al., Differences between atrial and ventricular protein profiling in children with congenital heart disease. Mol Cell Biochem, 1995. 147(1-2): p. 43-9.

14. Development of endocardial, myocardial, epicardial layers and derivatives, in Steding’s and Virágh’s Scanning Electron Microscopy Atlas of the Developing Human Heart. 2007, Springer New York. p. 175-199.

15. Tian, Y. and E.E. Morrisey, Importance of myocyte-nonmyocyte interactions in cardiac development and disease. Circ Res, 2012. 110(7): p. 1023-34.


16. Anderson, R.H. and A.C. Cook, The structure and components of the atrial chambers. Europace, 2007. 9 Suppl 6: p. vi3-9.

17. Friedberg, M.K. and A.N. Redington, Right versus left ventricular failure: differences, similarities, and interactions. Circulation, 2014. 129(9): p. 1033-44.

18. England, J. and S. Loughna, Heavy and light roles: myosin in the morphogenesis of the heart. Cell Mol Life Sci, 2013. 70(7): p. 1221-39.

19. Huang, C., et al., Embryonic atrial function is essential for mouse embryogenesis, cardiac morphogenesis and angiogenesis. Development, 2003. 130(24): p. 6111-9.

20. Reiser, P.J., et al., Human cardiac myosin heavy chain isoforms in fetal and failing adult atria and ventricles. Am J Physiol Heart Circ Physiol, 2001. 280(4): p. H1814-20.

21. Krenz, M. and J. Robbins, Impact of beta-myosin heavy chain expression on cardiac function during stress. J Am Coll Cardiol, 2004. 44(12): p. 2390-7.

22. Bootman, M.D., et al., Atrial cardiomyocyte calcium signalling. Biochim Biophys Acta, 2011. 1813(5): p. 922-34.

23. Zhang, Q., et al., Direct differentiation of atrial and ventricular myocytes from human embryonic stem cells by alternating retinoid signals. Cell Res, 2011. 21(4): p. 579-87.

24. Maltsev, V.A., et al., Embryonic stem cells differentiate in vitro into cardiomyocytes representing sinusnodal, atrial and ventricular cell types. Mech Dev, 1993. 44(1): p. 41-50.

25. Go, A.S., et al., Heart disease and stroke statistics--2014 update: a report from the American Heart Association. Circulation, 2014. 129(3): p. e28-e292.

26. Iwasaki, Y.K., et al., Atrial fibrillation pathophysiology: implications for management. Circulation, 2011. 124(20): p. 2264-74.

27. Aldhoon, B., et al., Associations between cardiac fibrosis and permanent atrial fibrillation in advanced heart failure. Physiol Res, 2013. 62(3): p. 247-55.

28. Viskin, S. and B. Belhassen, Idiopathic ventricular fibrillation. Am Heart J, 1990. 120(3): p. 661-71.

29. Barth, A.S., et al., Reprogramming of the human atrial transcriptome in permanent atrial fibrillation: expression of a ventricular-like genomic signature. Circ Res, 2005. 96(9): p. 1022-9.

30. Parvari, R. and A. Levitas, The mutations associated with dilated cardiomyopathy. Biochem Res Int, 2012. 2012: p. 639250.


31. Nishimura, R.A., S.R. Ommen, and A.J. Tajik, Cardiology patient page. Hypertrophic cardiomyopathy: a patient perspective. Circulation, 2003. 108(19): p. e133-5.

32. MacRae, C.A., W. Birchmeier, and L. Thierfelder, Arrhythmogenic right ventricular cardiomyopathy: moving toward mechanism. J Clin Invest, 2006. 116(7): p. 1825-8.

33. Verheule, S., et al., Characterization of gap junction channels in adult rabbit atrial and ventricular myocardium. Circ Res, 1997. 80(5): p. 673-81.

34. Ideker, T., T. Galitski, and L. Hood, A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet, 2001. 2: p. 343-72.

35. Burniston, J.G., A.O. Gramolini, and R.J. Solaro, Cardiac proteomics. Biomed Res Int, 2014. 2014: p. 903538.

36. Augenlicht, L.H. and D. Kobrin, Cloning and screening of sequences expressed in a mouse colon tumor. Cancer Res, 1982. 42(3): p. 1088-93.

37. Barth, A.S., et al., Functional profiling of human atrial and ventricular gene expression. Pflugers Arch, 2005. 450(4): p. 201-8.

38. Tabibiazar, R., et al., Transcriptional profiling of the heart reveals chamber-specific gene expression patterns. Circ Res, 2003. 93(12): p. 1193-201.

39. Sharma, P., J. Cosme, and A.O. Gramolini, Recent advances in cardiovascular proteomics. J Proteomics, 2013. 81: p. 3-14.

40. Vogel, C. and E.M. Marcotte, Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet, 2012. 13(4): p. 227-32.

41. Chandramouli, K. and P.Y. Qian, Proteomics: challenges, techniques and possibilities to overcome biological sample complexity. Hum Genomics Proteomics, 2009. 2009.

42. Shevchenko, A., et al., In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat Protoc, 2006. 1(6): p. 2856-60.

43. Cosme, J., A. Emili, and A.O. Gramolini, Large-scale characterization of the murine cardiac proteome. Methods Mol Biol, 2013. 1005: p. 1-10.

44. Kislinger, T., et al., Multidimensional protein identification technology (MudPIT): technical overview of a profiling method optimized for the comprehensive proteomic investigation of normal and diseased heart tissue. J Am Soc Mass Spectrom, 2005. 16(8): p. 1207-20.

45. Arab, S., et al., Cardiovascular proteomics: tools to develop novel biomarkers and potential applications. J Am Coll Cardiol, 2006. 48(9): p. 1733-41.


46. Balgley, B.M., et al., Comparative evaluation of tandem MS search algorithms using a target-decoy search strategy. Mol Cell Proteomics, 2007. 6(9): p. 1599-608.

47. Comunian, C., et al., A comparative MudPIT analysis identifies different expression profiles in heart compartments. Proteomics, 2011. 11(11): p. 2320-8.

48. Deshusses, J.M., et al., Exploitation of specific properties of trifluoroethanol for extraction and separation of membrane proteins. Proteomics, 2003. 3(8): p. 1418-24.

49. Sinha, A., et al., In-depth proteomic analyses of ovarian cancer cell line exosomes reveals differential enrichment of functional categories compared to the NCI 60 proteome. Biochem Biophys Res Commun, 2014. 445(4): p. 694-701.

50. Edgar, R., M. Domrachev, and A.E. Lash, Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res, 2002. 30(1): p. 207-10.

51. Becker, K.G., et al., The genetic association database. Nat Genet, 2004. 36(5): p. 431-2.

52. Blake, J.A., et al., The Mouse Genome Database (MGD). A comprehensive public resource of genetic, phenotypic and genomic data. The Mouse Genome Informatics Group. Nucleic Acids Res, 1997. 25(1): p. 85-91.

53. Kim, M.S., et al., A draft map of the human proteome. Nature, 2014. 509(7502): p. 575-81.

54. Wilhelm, M., et al., Mass-spectrometry-based draft of the human proteome. Nature, 2014. 509(7502): p. 582-7.

55. Lin, H., et al., Gene expression and genetic variation in human atria. Heart Rhythm, 2014. 11(2): p. 266-71.

56. Chen, H., et al., BMP10 is essential for maintaining cardiac growth during murine cardiogenesis. Development, 2004. 131(9): p. 2219-31.

57. Carvalho, R.L., et al., Compensatory signalling induced in the yolk sac vasculature by deletion of TGFbeta receptors in mice. J Cell Sci, 2007. 120(Pt 24): p. 4269-77.

58. Astrof, S., D. Crowley, and R.O. Hynes, Multiple cardiovascular defects caused by the absence of alternatively spliced segments of fibronectin. Dev Biol, 2007. 311(1): p. 11-24.

59. Liu, C., et al., T-box transcription factor TBX20 mutations in Chinese patients with congenital heart disease. Eur J Med Genet, 2008. 51(6): p. 580-7.

60. Granados-Riveron, J.T., et al., Alpha-cardiac myosin heavy chain (MYH6) mutations affecting myofibril formation are associated with congenital heart defects. Hum Mol Genet, 2010. 19(20): p. 4007-16.


61. Shaw, G.M., et al., Risks of human conotruncal heart defects associated with 32 single nucleotide polymorphisms of selected cardiovascular disease-related genes. Am J Med Genet A, 2005. 138(1): p. 21-6.

62. Grego-Bessa, J., et al., Notch signaling is essential for ventricular chamber development. Dev Cell, 2007. 12(3): p. 415-29.

63. Kim, Y., et al., Use of colloidal silica-beads for the isolation of cell-surface proteins for mass spectrometry-based proteomics. Methods Mol Biol, 2011. 748: p. 227-41.

64. Savas, J.N., et al., Mass spectrometry accelerates membrane protein analysis. Trends Biochem Sci, 2011. 36(7): p. 388-96.

65. Hsu, J., et al., Whole genome expression differences in human left and right atria ascertained by RNA sequencing. Circ Cardiovasc Genet, 2012. 5(3): p. 327-35.

66. Kahr, P.C., et al., Systematic analysis of gene expression differences between left and right atria in different mouse strains and in human atrial tissue. PLoS One, 2011. 6(10): p. e26389.


Appendices

Appendix A

The atria and ventricles are each further divided into left and right sides. The two sides of the same chamber type, such as the left and right atria, differ in their susceptibility to atrial fibrillation [65].

However, our ability to distinguish cells from the left and right chambers is limited, because previous studies were carried out on diseased human heart tissue and the chambers compared pairwise were collected from different individuals [65, 66]. In this study, we carried out a transcriptome study of all four chambers of the healthy human fetal heart to provide candidate genes for separating chamber-specific cells.

The top genes expressed in the right atrium compared with the other three chambers in this study include bone morphogenetic protein 10 (BMP10), solute carrier family 5 member 12 (SLC5A12), and guanine nucleotide binding protein (GNAO1). BMP10 and hepcidin antimicrobial peptide (HAMP) were also reported to be upregulated in the right atrium by Hsu et al. [65]. The top genes expressed in the left atrium include potassium voltage-gated channel subfamily H member 7 (KCNH7), myosin binding protein H-like (MYBPHL), and dual oxidase 2 (DUOX2). These differ from the left atrial genes reported by Hsu et al., which include ankyrin repeat domain 30B-like (ANKRD30BL), paired-like homeodomain transcription factor 2 (PITX2), and LOC100144602.

The differences most likely reflect the different expression-profiling platforms (Hsu et al. used RNA sequencing [65]), different sample sources, and the fact that Hsu et al. did not include the left and right ventricles. The top genes expressed in the left ventricle include hes-related family bHLH transcription factor with YRPW motif 2 (HEY2), neuroligin 1 (NLGN1), and fibroblast growth factor 18 (FGF18), whereas the right ventricle is enriched in membrane-associated ring finger 11 (MARCH11), leucine rich repeat containing 2 (LRRC2), and pyruvate dehydrogenase lipoamide kinase isozyme 4 (PDK4). Chamber enriched genes were ranked by the average fold-change of one chamber over the other three chambers.
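The ranking rule can be sketched in Python as follows. Because "average fold-change" is not defined further, this sketch assumes the arithmetic mean of the three pairwise ratios computed on linear (non-log) intensities; averaging log2 ratios, or dividing by the mean of the other three chambers, are equally plausible readings. Gene names and intensities are hypothetical.

```python
CHAMBERS = ("LA", "RA", "LV", "RV")


def rank_chamber_enriched(expression, chamber):
    """Rank genes by the mean fold-change of `chamber` over each of the other three chambers.

    expression: dict gene -> dict chamber -> linear (non-log) intensity.
    Returns a list of (gene, score) pairs, highest score first.
    """
    others = [c for c in CHAMBERS if c != chamber]
    scores = {
        gene: sum(values[chamber] / values[o] for o in others) / len(others)
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical intensities for three genes across the four chambers
expression = {
    "GENE_A": {"LA": 100.0, "RA": 800.0, "LV": 50.0, "RV": 50.0},
    "GENE_B": {"LA": 200.0, "RA": 400.0, "LV": 200.0, "RV": 100.0},
    "GENE_C": {"LA": 100.0, "RA": 100.0, "LV": 100.0, "RV": 100.0},
}
ranking = rank_chamber_enriched(expression, "RA")
print(ranking)
```

For GENE_A the right atrium is 8, 16 and 16 times higher than the other chambers, giving a mean fold-change of about 13.3, so it ranks first for the right atrium.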

Next, we examined the correlation of gene expression between the chambers (Supplemental Figure 1), as there are no previous studies examining the expression profiles of all four heart chambers. The left and right atria, and the left and right ventricles, showed the highest correlations, likely reflecting the similar sizes, expressed transcripts, and functions of the same chamber type on either side.

The left atrium showed a slightly higher correlation with the left and right ventricles than the right atrium did. Nevertheless, the overall correlation between chambers remained very high (approximately 0.92 or greater). At the transcript level, the heart is in fact a specialized organ distinct from other organs: its correlation with skeletal muscle is approximately 0.66 (GEO study GSE33886), and the correlation drops to approximately 0.46 for other organs such as the liver and intestines (GSE20671 and GSE41269, respectively).

Supplemental Figure 1. Pearson correlation of the transcriptome data for the four fetal heart chambers (8,853 probes with intensities at least three-fold greater than the average global probe intensity). The left and right sides of the same chamber type showed the highest correlation, approximately 0.97; the left atrium (LA) showed a correlation of approximately 0.94 with the left and right ventricles, and the right atrium showed the lowest correlation, approximately 0.92, with the left and right ventricles.
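A minimal NumPy sketch of the correlation analysis summarized in Supplemental Figure 1. Two details are assumptions rather than statements from the text: a probe is kept if its intensity in at least one chamber is at least three times the mean intensity over all probes and chambers, and correlations are computed on log2-transformed intensities. The data are synthetic.

```python
import numpy as np


def chamber_correlations(intensities, fold=3.0):
    """Pearson correlation matrix between chambers, using only bright probes.

    intensities: array of shape (n_probes, n_chambers) on a linear scale.
    Assumed filter: keep a probe if, in at least one chamber, its intensity is
    at least `fold` times the mean intensity over all probes and chambers.
    Assumed transform: correlations are computed on log2 intensities.
    """
    global_mean = intensities.mean()
    keep = (intensities >= fold * global_mean).any(axis=1)
    log_filtered = np.log2(intensities[keep])
    # np.corrcoef treats each row as one variable, so transpose to chambers x probes
    return np.corrcoef(log_filtered.T), int(keep.sum())


# Synthetic data: 500 probes, 4 chambers (LA, RA, LV, RV) with small chamber differences
probes = np.geomspace(10.0, 10_000.0, num=500)
wobble = 1.0 + 0.05 * np.sin(np.arange(500)[:, None] + np.arange(4))
intensities = probes[:, None] * wobble
corr, n_kept = chamber_correlations(intensities)
print(n_kept)
print(np.round(corr, 3))
```

With real data, the same function would take the probe-by-chamber intensity matrix; whether the filter and correlation should operate on linear or log-scale values depends on how the microarray data were preprocessed.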


Appendix B

Supplemental Figure 2. Number of proteins identified in the human fetal atria and ventricles on the LTQ ion trap mass spectrometer. The Venn diagram on the left shows the overlap between the atrial triplicates, and the ventricular triplicates are shown on the right. Variability between the triplicates was high, especially in the atrial samples, and fewer proteins were identified overall than in the Q-Exactive data (the number of proteins detected per individual run on the LTQ ion trap ranged from 1,026 to 2,001).

Supplemental Figure 3. Comparison of the proteins detected in at least 2 of 3 runs on the LTQ ion trap and the Q-Exactive (QE). In both atria and ventricles, the QE detected considerably more proteins than the LTQ.


Appendix C

Supplemental Figure 4. Protein interactions (via co-expression and text mining) of atrial enriched proteins associated with congenital heart defects. Interactions were generated by String v.9.1.


Appendix D

Supplemental Figure 4. Atrial and ventricular enriched transcript data from this study were compared to human fetal whole heart proteome from Kim et al. A total of 399 atrial and 376 ventricular enriched gene products were also detected in the whole heart proteome.