CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

GENE EXPRESSION PROFILING IN A FAMILY WITH A NOVEL FORM OF

BETA-THALASSEMIA

A thesis submitted in partial fulfillment of the requirement

For the degree of Master of Science in Biology

By

Forough Taghavifar

May 2016

The thesis of Forough Taghavifar is approved:

Dr. Stan Metzenberg, Ph.D. Date

Dr. Steven B Oppenheimer, Ph.D. Date

Dr. Aida Metzenberg, Ph.D., Chair Date

California State University, Northridge


ACKNOWLEDGEMENTS

I would like to extend my utmost gratitude to my father, mother, and husband, whose emotional support and guidance have enabled me to advance in my academic journey.

Also, a heartfelt appreciation goes to Dr. Aida Metzenberg, my wise and trusting advisor. Dr. Aida Metzenberg, not only have you been my thesis advisor and mentor, you have also served as a mother figure for me ever since I emigrated to the US and started my master’s program at California State University, Northridge; for that I will always thank you! Furthermore, I would like to thank Dr. Stan Metzenberg for believing in me before I personally believed in my abilities. Dr. Stan Metzenberg, if it were not for your computer modeling class, I would not have found the spark that ignited my interest in developing my project further.

Lastly, I would like to acknowledge the Bruins in Genomics (BIG) summer program that I had the pleasure of attending in the summer of 2015. A very special thanks also goes to my mentor at the BIG program, David Casero, who more than willingly assisted me throughout all of the RNA sequencing data analysis steps.


TABLE OF CONTENTS

SIGNATURE PAGE.……………………………………………………………………..ii

ACKNOWLEDGEMENTS………………………………………………………………...... iii

LIST OF FIGURES………………………………………………...………...... vi

LIST OF TABLES…..……………………………………………...... viii

ABSTRACT…………………………………………………...... ix

CHAPTER I: INTRODUCTION

Globin genes and hemoglobinopathies………………………….………………...1

Culmination of literature review………………………………………………...... 4

The aims of this study and hypothesis…………………………………………….7

Preliminary data………………………………………………………………….10

CHAPTER II: MATERIALS AND METHODS

Splice site prediction using Alternative Splice Site Predictor (ASSP)…………..12

RNA isolation and RT-qPCR……………………………………………………12

Library construction and RNA Sequencing……………………………………...14

Mapping and RNA-seq data analysis…………………………………………….14

Analyzing effects of globin gene masking on gene expression analysis by RNA-seq……….15

Gene ontology (GO) Analysis using the DAVID annotation database………….16

CHAPTER III: RESULTS

The novel mutation was predicted to generate a cryptic donor site……………...17

The novel mutation down-regulates β-globin expression………………………..18

RNA-Seq data visualization in the HBB locus…………………………………..21


The novel mutation introduces a splice donor site……………………………….22

More than 300 genes are differentially expressed in β-thalassemic blood………25

β-Thalassemia shows important similarities and differences with at the transcriptome level………………………………………………..38

CHAPTER IV: DISCUSSION…………………………………………………………..41

REFERENCES…………………………………………………………………………..53

APPENDIX A: Supplemental Matrix 1………………………………………………….61

APPENDIX B: Supplemental Table 1……..…………………………………………….66

APPENDIX C: RNA-Seq data analysis commands…...………………………………...69


LIST OF FIGURES

Figure 1: Globin gene clusters and chromosome location………………………………...2

Figure 2: Proposed model of alternative splicing……………………...…...…………...... 9

Figure 3: Sanger DNA sequencing chromatograms ……...…...……………………...... 10

Figure 4: Subject family structure..………………………………………...………….....11

Figure 5: Results of bioinformatics sequence analysis

Figure 5A:……………………………………………….……………...………..17

Figure 5B:……………………………………………….……………...………..18

Figure 6: qPCR amplification curves and bar chart……………………………………...20

Figure 7: IGV visualization of reads mapped to the HBB

Figure 7A:……………………………………………….……………...………..21

Figure 7B:……………………………………………….……………...………..22

Figure 8: UCSC genome browser snapshots of the HBB locus

Figure 8A:……………………………………………….……………...………..23

Figure 8B:…………………………………………….……………...…………..24

Figure 8C:……………………………………………….……………...………..24

Figure 9: Comparison of the transcript read counts (FPKMs)…………………………...26

Figure 10: Hierarchical clustering of the differentially expressed genes………………..28


Figure 11: Heat-map showing the expression fold change of the DEGs………………...35

Figure 12: Diagonal (log-log) plots of expression levels of individual genes (FPKM units)

Figure 12A:……………………………………………….……………...………36

Figure 12B:……………………………………………….……………...………37

Figure 12C:……………………………………………….……………...………37

Figure 13: Venn diagram comparing the number of the DEGs……..…………………...40


LIST OF TABLES

Table 1: Cryptic splice site mutations.……………………….....………..……………….8

Table 2: Hematological parameters……………………………………………………...10

Table 3: qPCR plate display……………………………………………………………..13

Table 4: qPCR quantification data……………………………………………………….19

Table 5: analysis table…………………………………………………...29

Table 6: Comparing the list of the DEGs in anemias……………………………………39


ABSTRACT

GENE EXPRESSION PROFILING IN A FAMILY WITH β-THALASSEMIA

By

Forough Taghavifar

Master of Science in Biology

The genetic causes of β-thalassemia are largely well described. However, the disease is very heterogeneous at both the molecular and clinical levels. Studying the transcriptome profiles of β-thalassemia patients, especially individuals who carry novel mutations in the β-globin gene (HBB), may improve our understanding of the heterogeneity and molecular mechanisms of the disease and its possible treatment. Here, I characterized members of a family with β-thalassemia using whole genome expression analysis. I report a novel mutation in the exon 1 region of HBB (HBB:c.51C>T) that was associated with an unexpected phenotype of β-thalassemia in a heterozygote who also carries a typical β-thalassemia allele. I analyzed the effects of the novel mutation at the transcriptome level by RT-qPCR and high-throughput RNA sequencing using an Illumina HiSeq 2500 system. The results revealed that the novel mutation creates a cryptic donor splice site in the HBB, which causes alternative splicing from the site and down-regulates (~0.7) expression of β-globin. Gene expression profiling analysis showed that there were more than 300 differentially expressed genes (DEGs) in β-thalassemic blood. The DEGs were enriched in pathways that are directly or indirectly related to β-thalassemia, such as hemopoiesis, biosynthesis, response to oxidative stress, inflammatory responses, immune responses, control of circadian rhythms, apoptosis, and other cellular activities. It was possible to compare these findings with published results of RNA-seq analysis of sickle cell disease and KLF1-null anemia, and to recognize similarities and differences in their transcriptional expression patterns. While many DEGs involved in the response to hemolysis, iron homeostasis, and anemia were common to these three types of anemia, over 200 DEGs were unique to β-thalassemia. Although this study was limited by the small number of patients, it provides a wealth of data on β-thalassemia because it is the first broad investigation of blood cell gene expression in this disease, and it gives novel insight that can be used in drug discovery to identify new therapeutic approaches for the disease.


CHAPTER I: INTRODUCTION

Globin genes and hemoglobinopathies

Approximately 250,000 individuals are born each year with a disorder caused by a defect in hemoglobin. These disorders, which are called hemoglobinopathies, contribute more mortalities than any other group of genetically inherited disorders [1]. For this reason, scientists have extensively studied the hemoglobin molecule and the diseases caused by mutations in the genes that encode it.

Hemoglobin is a protein in red blood cells that carries oxygen from the lungs to every cell within the different tissues of the body, and transports carbon dioxide from the tissues to the lungs so that it can be excreted. The main hemoglobin molecule in humans is called adult hemoglobin, or HbA. This molecule is a tetramer made up of four subunits: two alpha-globin (α-globin) chains and two beta-globin (β-globin) chains (α2β2). Over the course of evolution, the gene that codes for α-globin and the one that codes for β-globin have been separated. In humans, the α-globin gene cluster is located on chromosome 16, whereas the β-globin gene cluster is located on chromosome 11 [2]. Human hemoglobin is heterogeneous; during development it undergoes a succession of globin chains that are differentially expressed during embryonic, fetal, and adult life, i.e. α2ε2 (embryonic hemoglobin (HbE)), α2γ2 (fetal hemoglobin (HbF)), α2δ2 (HbA2), and α2β2 (adult hemoglobin (HbA)). The globin gene clusters and their chromosomal locations are shown in Figure 1. In a normal adult, the overall hemoglobin composition is about 96%-98% HbA, 2%-3% HbA2, and <1% HbF. This high percentage of HbA reflects the effectiveness of this form of hemoglobin in carrying oxygen to adult tissues [1]. The official symbol for the hemoglobin beta or β-globin gene is HBB, and it is located at position 15.5 on the short arm of chromosome 11 (11p15.5) [2].

Figure 1. Globin gene clusters and chromosome location. Source: http://www.fastbleep.com/biology6notes/32/156/846?

While the hemoglobinopathies are endemic in many world populations, their genetic carrier rates exceed 40% in Africa and Southeast Asia [3]. These disorders include mutations in globin coding sequences that lead to structural changes in the encoded proteins, such as the sickle cell allele HbS, and mutations that alter the expression of the α- and β-globin genes, such as the thalassemias.


Beta thalassemia (β-thalassemia) is an autosomal recessive blood disorder caused by mutations in the HBB gene, which codes for the β-globin chains of the hemoglobin molecule. Mutations that completely abolish expression of the HBB, resulting in the absence of β-globin production, are designated “β0” alleles. Other mutations in HBB cause different degrees of quantitative reduction in β-globin expression and are classified as “β+” or “β++” alleles [4]. A decrease or complete lack of β-globin production results in various degrees of imbalance between α and β chain synthesis, and a resulting precipitation of excess α-globin chains within the cell. The latter problem is known to result in anemia due to hemolysis of red blood cells, compounded by a cellular stress response and ineffective erythropoiesis [5]. The disease ultimately results in organ damage and poor overall growth of the affected individuals. Patients may have to receive regular blood transfusions in order to relieve the effects of their disease [6, 7].

The severity of the phenotype, which ranges from asymptomatic (thalassemia minor) to severe anemia (thalassemia intermedia and thalassemia major), is predominantly determined by the type of β-globin mutant allele(s) (β0, β+, β++) and the degree of imbalance between α- and non-α-globin chain synthesis [8]. There are also genetic modifiers that increase γ-globin chain production, mitigating the severity of the disorder by allowing increased production of fetal hemoglobin (HbF: α2γ2). Alternatively, coinheritance of interacting α-globin gene mutations (α-thalassemia), which reduces the pool of unbound α-globin chains, may ameliorate the severity of β-thalassemia [9, 10].


It has been reported that β-thalassemia shares characteristics of protein aggregation disorders such as Alzheimer's and Huntington's diseases, and that the ubiquitin-proteasome system (UPS) and lysosome/autophagy pathways are important in reducing the accumulation of α-globin aggregates [11]. These protein quality control systems may become overwhelmed in individuals with β0-thalassemia, leading to HSP70 sequestration in the cytoplasm of developing erythroid cells and a concomitant failure to protect transcription factor GATA-1 from cleavage by caspase-3 in the nucleus, and ultimately to apoptosis [12]. Hemolysis leads to physiological iron overload, particularly in the severe form of the disease. The elevated stores of iron may be a key factor in increased inflammation and susceptibility to infection [13], and indeed, infection is among the most common causes of death in individuals with β-thalassemia [14].

Culmination of literature review

Approximately 300 β-thalassemia alleles have now been characterized and archived in the HbVar database (http://globin.cse.psu.edu/hbvar/menu.html). Unlike α-thalassemia, in which the vast majority of mutations are deletions in the α-globin gene cluster, β-thalassemia is almost always caused by point mutations involving one or a few nucleotides. These mutations are positioned within the β-globin gene or its immediate flanking regions [15].

There are many β-thalassemia mutations that interfere with processing of the primary β-globin mRNA transcript. Those that change the invariant dinucleotides GT or AG at the exon–intron splice junction completely abolish normal splicing and are classified as β0 alleles. These mutations can be nucleotide substitutions or short deletions, which change the invariant dinucleotides or completely remove them, respectively.


Sequences flanking the invariant dinucleotides are fairly well conserved, and a consensus sequence can be identified at the exon–intron boundaries. They include approximately the last three nucleotides of the exon and the first six nucleotides of the intron for the 5' donor site, and the last 10 nucleotides of the intron and the first nucleotide of the exon for the 3' acceptor site [16]. A catalogue of the sequences found at functional splice sites has identified the 5' (donor) consensus sequence “(A or C) AGGT (A or G) AGT”, where the GT dinucleotide heralds the beginning of an intervening sequence [17].

The efficiency of normal splicing can be decreased to various degrees by mutations within the consensus sequences at the splice junctions, thereby causing β-thalassemia phenotypes ranging from mild to severe [16]. A common example of this type of mutation is IVS1-5 (G→C, T or A), in which a point mutation at position 5 of intron 1 substantially reduces splicing in comparison to the normal allele. This mutation also activates three cryptic donor sites, one in intron 1 and two in exon 1, which are used in preference to the actual donor site containing the IVS1-5 mutation [18].

“Cryptic” splice sites, which are sequences very similar to the consensus sequence for a splice site but are not normally used, have been identified in exon 1 and in both introns 1 and 2 of the HBB gene. Mutations at these sites produce a sequence that more closely resembles the normal splice site and may activate the preferential use of these sites during RNA processing. These mutations can be categorized as either β+- or β0-thalassemia alleles, depending on the nature and the position of the mutation [16].


Five mutations in exon 1 of the HBB that cause alternative splicing by creating or activating cryptic 5’ splice donor sites have been previously recognized (Table 1). Three of these mutations are positioned within a cryptic splice site spanning codons 24 to 27. These mutations modify the cryptic splice site so that it more closely resembles the normal consensus splice sequence. The codon 24 mutation, CD24 (GGT→GGA), is translationally silent (Gly→Gly), whereas the CD26 (GAG→AAG) and CD27 (GCC→TCC) mutations result in the βE and βKnossos variants, respectively. The CD24 mutation reduces the level of normally processed β-globin mRNA to about 75% [19]. The CD26 and CD27 mutations are associated with the mild β+-thalassemia phenotype of the βE and βKnossos alleles because they lead to only minor use of the alternative splice sites, leaving a reasonable amount of normally spliced product [20, 21].

The two other exonic mutations, CD19 (AAC→AGC) and CD10 (GCC→GCA), are more distant from the normal splice site. An adenine to guanine substitution in codon 19 (Hb Malay variant; Asn→Ser) activates a cryptic donor site spanning codons 17 to 19, which leads to a 60% decrease in the accumulation of normally spliced β-globin mRNA [22]. In the case of the mutation in codon 10, a translationally silent mutation (Ala

Although the genetic causes of β-thalassemia are largely well described, the disease is very heterogeneous at both the molecular and clinical levels. Studying the transcriptome profiles of β-thalassemia patients, especially individuals who carry novel mutations, may improve our understanding of the heterogeneity and molecular mechanisms of the disease and its possible treatment.

The aims of this study and hypothesis

A novel mutation in the coding region of the HBB has been identified in an Iranian family (unpublished observation of Dr. Mohammad Hamid). The mother was heterozygous for the novel mutation in codon 16 of exon 1 of the HBB (Codon 16 (GGC→GGT)), a synonymous C to T transition. The father was heterozygous for a common β0 allele (CD36/37 (–T)), in which a thymidine nucleotide is deleted from codon 37 of the β-globin gene, resulting in a translational frameshift and early termination at a stop codon at codon 60. The hematological indices of the father were in agreement with a typical β-thalassemia minor (carrier) individual, while the Codon 16 (CD16) mutation in the mother looked like a polymorphism and her hematological indices were near normal (borderline). Their offspring, a compound heterozygote for the novel CD16 and the paternal CD36/37 (–T) mutations, was unexpectedly affected by a severe β-thalassemia intermedia phenotype (Table 2). Her physical examination showed anemia, paleness, jaundice, and splenomegaly. She also required periodic blood transfusions. Therefore, the novel CD16 mutation was suspected of affecting the expression of β-globin, either alone or in combination with another mutation.

To investigate the molecular basis of the observed β-thalassemia phenotype in the aforementioned family, I proposed to analyze the structural and functional effects of the novel mutation on β-globin expression at the transcriptional level. I also proposed to perform transcriptional profiling of the carrier mother and the compound heterozygous daughter in order to characterize patterns of gene expression in β-thalassemia. This is the first such broad investigation of blood cell gene expression in thalassemias, and I compare my findings with prior studies of gene expression profiling in sickle cell disease and in a patient with a severe congenital nonspherocytic hemolytic anemia.

Prior to functional analysis of the novel mutation, I analyzed the HBB sequence containing the novel mutation with bioinformatics tools. Preliminary analysis showed that the C→T substitution in codon 16 creates the sequence “GGTAAG” in the mutated region, comprising the mutated codon 16 and the unmutated codon 17. This new sequence has homology to the six middle nucleotides of the 5’ splice (donor) consensus sequence at the exon 1-intron 1 boundary. It has been reported that such a homologous sequence in exon 1 causes alternative splicing at the site, giving a β+-thalassemia phenotype [19-23]. In addition, codon 17 has previously been recognized to be involved in the cryptic splice site created by a mutation in codon 19 [22].
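The homology described above can be illustrated with a short sketch. This is a simple position-by-position match count against the published donor consensus, not the scoring used by any splice site prediction tool; the function name and the choice of consensus positions 3-8 as the "six middle nucleotides" are assumptions made for illustration.

```python
# IUPAC codes needed for the donor consensus quoted in the text:
# M = A or C, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

# 5' donor consensus "(A or C) AGGT (A or G) AGT" written with IUPAC codes.
DONOR_CONSENSUS = "MAGGTRAGT"


def count_matches(seq, pattern):
    """Count positions where seq agrees with an IUPAC pattern of equal length."""
    if len(seq) != len(pattern):
        raise ValueError("sequence and pattern must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(seq, pattern))


# ASSUMPTION: the six middle nucleotides are consensus positions 3-8 (GGTRAG).
core = DONOR_CONSENSUS[2:8]

wild_type = "GGC" + "AAG"  # codon 16 (GGC) followed by codon 17 (AAG)
mutant = "GGT" + "AAG"     # codon 16 after the C>T change, same codon 17

wt_score = count_matches(wild_type, core)
mut_score = count_matches(mutant, core)
print(f"wild type: {wt_score}/6 matches, mutant: {mut_score}/6 matches")
```

Under these assumptions the wild-type sequence matches five of the six core positions and the mutant matches all six, which is the sense in which the substitution makes the region resemble a donor site.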

Table 1. Cryptic splice site mutations. Summary of mutations creating a cryptic donor site in the exon 1 region of the HBB. Mutated nucleotides are shown in red color.

The hypothesis of this study was that the C→T mutation at codon 16 creates a cryptic donor splice site, which causes alternative splicing from the site and reduces the efficiency of the normal 5’ splice (donor) site at the exon 1-intron 1 (IVS) boundary (Figure 2). This predicted splicing abnormality was expected to cause a decrease in the accumulation of normally processed β-globin mRNAs, thereby causing the unexpected β-thalassemia phenotype in the family described above. I also proposed that gene expression profiling would identify a large number of differentially expressed genes (DEGs) in β-thalassemia that are involved in the homeostatic response to pathobiological stress. Comparison of the list of DEGs in β-thalassemia with those of the two other types of anemia was expected to reveal fundamental differences between them, and the results could potentially be used in drug discovery to identify novel therapeutic approaches.


Figure 2. Proposed model of alternative splicing from the mutated region (CD16).


Preliminary data:

Figure 3. Sanger DNA sequencing chromatograms showing the HBB mutations in the daughter with severe β-thalassemia intermedia. The top panel shows her maternally derived novel mutation in the HBB gene (HBB:c.51C>T). The bottom panel shows the paternally derived mutation (HBB:c.112delT). Mutations are named according to HGVS nomenclature.

Case      HBB genotype                 Age (yrs)  Hb (g/dL)  HGB (g/dL)  MCV (fl)  MCH (pg)  MCHC (g/dL)  RBC (10^6/µL)  HBA (%)  HBA2 (%)  HBF (%)
Mother    HBB:c.51C>T                  28         12.8       13.4        78.2      25.9      33.1         5.18           96.2     3.4       0.4
Father    HBB:c.112delT                35         10.5       11.2        60.5      19.6      32.4         5.72           94.4     5.0       0.6
Daughter  HBB:c.51C>T / HBB:c.112delT  5          7.1        7.6         56.5      17.3      30.6         4.16           93.5     4.8       1.7

Table 2. Hematological parameters and Hemoglobin profiles of the family.


Figure 4. Subject family structure (family’s pedigree).


CHAPTER II: MATERIALS AND METHODS

Splice site prediction using Alternative Splice Site Predictor (ASSP)

To investigate the possible structural and functional effects of the novel mutation, a sequence analysis tool called “Alternative Splice Site Predictor (ASSP)” was used to predict and classify the splice sites within the HBB gene sequence containing the novel mutation at codon 16 [24]. The ASSP tool is based on an analysis of constitutive, skipped, cryptic, and alternative exon isoform splice sites retrieved from the AltExtron database. ASSP identifies putative splice sites using pre-processing models (position specific score matrices) and subsequently classifies them as either constitutive or alternative isoform/cryptic splice sites using backpropagation networks, which combine several models and sequence statistics, such as position specific score matrices for the splice sites, GC content, and oligonucleotide frequency models. Once the cutoff value of the corresponding pre-processing model is surpassed, a splice site is labeled "real" and subsequently classified as "alternative isoform/cryptic" or "constitutive" by the corresponding neural network.
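The pre-processing step described above (score a candidate site with a position specific score matrix, then keep it only if the score exceeds a cutoff) can be sketched generically as follows. Every number here is hypothetical: the frequency table, the uniform background, and the cutoff are invented for illustration and are not ASSP's trained models or thresholds.

```python
import math

# HYPOTHETICAL base frequencies at four positions around a donor site.
# Illustrative values only; these are not ASSP's matrices.
FREQS = [
    {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    {"A": 0.60, "C": 0.05, "G": 0.30, "T": 0.05},
]
BACKGROUND = 0.25  # ASSUMPTION: uniform background base frequency
CUTOFF = 3.0       # HYPOTHETICAL score threshold


def pssm_score(seq):
    """Sum the log2 odds (site frequency / background) over all positions."""
    if len(seq) != len(FREQS):
        raise ValueError("sequence length must match the matrix width")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(FREQS, seq))


def is_putative_site(seq):
    """Return True when the PSSM score exceeds the cutoff."""
    return pssm_score(seq) > CUTOFF


for candidate in ("GGTA", "GGCA"):
    print(candidate, round(pssm_score(candidate), 2), is_putative_site(candidate))
```

In this toy matrix a single base change at the third position moves the candidate from below to above the cutoff; only candidates that pass this step would be handed on to a classifier.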

RNA isolation and RT-qPCR

Total RNA was isolated from the buffy coat of blood collected with EDTA as the anticoagulant, using Trizol reagent (Sigma-Aldrich, Germany) following the manufacturer's instructions. One microgram of total RNA was used for cDNA synthesis with a QuantiTect Reverse Transcription Kit (Qiagen). The concentration and quality of the cDNA from each sample were determined by absorbance at 260 and 280 nm, using a NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Wilmington, DE).


Relative quantification of β-globin cDNA was performed by real-time quantitative Polymerase Chain Reaction (qPCR) using an iTaq™ Universal SYBR Green Supermix Kit (Bio-Rad) following the manufacturer's instructions. Briefly, qPCR was performed in 10 µl reaction volumes containing 5 µl SYBR Green Supermix, 2 µl cDNA (40 ng), 1 µl nuclease-free water, and 1 µl each of forward and reverse primers (3 µM). Primer pairs for β-globin cDNA (forward 5’- GAT GAA GTT GGT GGT GAG GCC -3’ and reverse 5’- GCC CAT AAC AGC ATC AGG AGT G -3’) as the target gene, and for both ACTB (forward 5’- TGG CAC CAC ACC TTC TAC AAT G -3’ and reverse 5’- GGT CTC AAA CAT GAT CTG GGT CA -3’) and GAPDH (forward 5’- GGA AGG TGA AGG TCG GAG TC -3’ and reverse 5’- ACA TGT AAA CCA TGT AGT TGA GGT -3’) as the endogenous reference genes, were designed to give amplicon sizes between 70 and 150 bp for the best qPCR efficiency, as recommended by Bio-Rad.

     1              2              3              4              5              6              7          8
A    Unk-1/HBB/N    Unk-1/HBB/N    Unk-1/HBB/N    Unk-2/HBB/M    Unk-2/HBB/M    Unk-2/HBB/M    NTC/HBB    NTC/HBB
B    Unk-3/GAPDH/N  Unk-3/GAPDH/N  Unk-3/GAPDH/N  Unk-4/GAPDH/M  Unk-4/GAPDH/M  Unk-4/GAPDH/M  NTC/GAPDH  NTC/GAPDH
C    Unk-5/Actin/N  Unk-5/Actin/N  Unk-5/Actin/N  Unk-6/Actin/M  Unk-6/Actin/M  Unk-6/Actin/M  NTC/Actin  NTC/Actin

Table 3. qPCR plate display. Each well is listed as content/target/sample.


The qPCR was performed on a CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad) with a thermal cycling profile of 95 °C for 3 minutes, followed by 39 cycles of 95 °C for 10 seconds and 57 °C for 30 seconds. Samples were analyzed in triplicate (Table 3), and expression of the β-globin transcript relative to ACTB and GAPDH was normalized to a control sample from a healthy individual with normal hematological parameters and hemoglobin profile. The relative fold change was calculated using the method of Pfaffl [25, 26]. Bio-Rad CFX Manager 3.0 software was used for melt curve analyses and for calculating the efficiency of the primer sets and the reactions.

Library construction and RNA Sequencing

500 ng of total RNA from each sample was used for RNA-seq library preparation, and RNA quality was determined with a Bioanalyzer 2100. Libraries for RNA-seq were prepared with a KAPA Stranded RNA-seq Kit (Illumina), following the manufacturer’s instructions. Sequencing was performed on an Illumina HiSeq 2500 system, producing paired-end reads with a length of 100 bp (2 × 100 bp). Pre-processing and data quality control were performed using Illumina proprietary software.

Mapping and RNA-seq data analysis

More than 39 million 100 bp paired-end reads were generated for each sample. Reads were aligned to the human reference genome with STAR [27], using a genome index that included both the genome sequence (GRCh37.71 (hg19)) and the exon/intron structures of known RefSeq transcripts. On average, 80% of the reads (using a similarity score of 66%) were aligned successfully with the genomic sequence.


Samtools [28] was used to sort and index the alignment files for visualization with the Integrative Genomics Viewer (IGV) tool [29]. The sorted BAM files were also used to generate UCSC Browser tracks with genomeCoverageBed from BEDTools [30]. To this end, coverage files were normalized using the total signal for each sample.

The Cufflinks [31] suite of tools (version 2.2.1) was used for differential gene expression analysis; it provides and compares gene expression levels in units of FPKM (fragments per kilobase of transcript per million fragments mapped). The mapped reads were counted for RefSeq transcripts using cuffquant. The cuffquant outputs were passed to cuffdiff to test for significant differences in transcript abundance for each pair-wise comparison between the three samples. Genes were flagged as differentially expressed when the |log2 fold change| was >1 and the adjusted p-value was <0.05.
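The two-part criterion above can be sketched as a small filter. The rows below are hypothetical and only mimic the tab-separated layout of a cuffdiff `gene_exp.diff` file; the column names `log2(fold_change)` and `q_value` are assumed to match that output and should be checked against the actual files.

```python
import csv
import io

# HYPOTHETICAL rows laid out like a cuffdiff gene_exp.diff file
# (tab-separated; only the columns used here are included).
EXAMPLE = (
    "gene\tlog2(fold_change)\tq_value\n"
    "GENE_A\t2.30\t0.001\n"
    "GENE_B\t-1.75\t0.020\n"
    "GENE_C\t0.40\t0.001\n"
    "GENE_D\t3.10\t0.200\n"
)


def select_degs(handle, min_abs_log2fc=1.0, max_q=0.05):
    """Return gene names with |log2 fold change| > cutoff and q-value < cutoff."""
    degs = []
    for row in csv.DictReader(handle, delimiter="\t"):
        log2fc = float(row["log2(fold_change)"])  # float() also parses "inf"/"-inf"
        q_value = float(row["q_value"])
        if abs(log2fc) > min_abs_log2fc and q_value < max_q:
            degs.append(row["gene"])
    return degs


degs = select_degs(io.StringIO(EXAMPLE))
print(degs)
```

Taking the absolute value of the log2 fold change, rather than the log of an absolute fold change, keeps both up-regulated and down-regulated genes: GENE_B passes with a negative value, while GENE_C fails on effect size and GENE_D fails on significance.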

To handle highly expressed genes, cuffquant was run with the option --max-bundle-frags 5000000 (the default is 1000000), so that cuffdiff could compute FPKM values for genes whose fragment counts exceed the default limit and include them in the differential expression analysis.

A heatmap of the differentially expressed genes was generated for hierarchical clustering using GENE-E (http://www.broadinstitute.org/cancer/software/GENE-E/).

Analyzing effects of globin gene masking on gene expression analysis by RNA-seq

In many clinical studies where whole blood RNA is used, estimates of overall gene expression are affected by the high levels of globin transcripts in whole blood, which account for more than 70% of the reference mRNA. In order to determine whether the high levels of globin transcripts interfere with the expression estimates and downstream results obtained from RNA-seq libraries, the results were compared with those obtained after masking the globin genes from the reference annotation. For this purpose, all the reads overlapping the globin genes (HBA1, HBA2, HBB, HBD, HBG1, HBG2, HBE1, HBM, HBQ, HBZ) were ignored for one of our samples (the wild-type sample) by using the masking option in Cufflinks (-M/--mask-file). The transcript read counts (FPKM) with and without masking the globin genes were then compared using correlation analysis.
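A correlation comparison of this kind can be sketched as below. The FPKM values are hypothetical, and the log2 transform with a pseudocount of 1 and the use of Pearson's coefficient are assumptions; the text does not specify which correlation measure or transform was used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def log_fpkm(values, pseudocount=1.0):
    """log2-transform FPKM values; the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


# HYPOTHETICAL FPKM values for the same five non-globin genes,
# quantified without and with the globin mask.
unmasked = [0.5, 12.0, 150.0, 3.2, 48.0]
masked = [1.6, 39.0, 480.0, 10.5, 155.0]

r = pearson(log_fpkm(unmasked), log_fpkm(masked))
print(f"Pearson r on log2(FPKM + 1): {r:.3f}")
```

A coefficient close to 1 would indicate that masking changes the scale of the estimates without reordering genes relative to one another.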

Gene ontology (GO) Analysis using the DAVID annotation database

The Database for Annotation, Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov) was used to determine the molecular functions, ontology terms, and pathways significantly enriched in the differentially expressed genes among our samples [32].


CHAPTER III: RESULTS

The novel mutation was predicted to generate a cryptic donor site by ASSP tool

The Alternative Splice Site Predictor (ASSP) tool revealed that the novel mutation creates a putative splice site encompassing codons 16 and 17, in which splicing can occur at nucleotide position 99 (Figure 5A and 5B). ASSP also shows the two other well-known natural cryptic splice sites spanning codons 17-19 and codons 24-26 within the HBB, in which splicing occurs at nucleotide positions 104 and 126, respectively. The activation of these cryptic splice sites through mutations at codons 19, 24 and 26 has been investigated before. Further analysis showed that aberrant splicing (from nucleotide position 99) through the cryptic splice site generated by the CD16 mutation leads to a premature stop codon in exon 2 of the β-globin transcript.

Figure 5A & 5B. Results of the bioinformatics sequence analysis of the HBB harboring the novel CD16 mutation, using the Alternative Splice Site Predictor tool.

* Scores of the preprocessing models reflecting splice site strength, i.e. a PSSM for putative acceptor sites, and an MDD model for putative donor sites. Intron GC values correspond to 70 nt of the neighboring intron.

** Activations are output values of the backpropagation networks used for classification. High values for one class with low values of the other class imply a good classification. Confidence is a simple measure expressing the differences between output activations. Confidence ranges between zero (undecided) to one (perfect classification).

The novel mutation down-regulates β-globin expression

The expression level of β-globin in the mother was compared with that in a control sample from a healthy individual with normal hematological parameters and hemoglobin profile. Relative expression analysis showed that the novel mutation was associated with down-regulation of β-globin. As shown in Figure 6, the relative normalized expression of β-globin in the mother (M) was approximately 0.77 of that in the control sample (N). This slight down-regulation of β-globin, resulting from a cryptic splice site allele in combination with a wild-type allele (heterozygous form), is consistent with the functional effect of the previously known cryptic splice site mutations in the HBB, which are categorized as either β+ or β++ thalassemia alleles [19-23].


Well  Fluor  Target  Content  Sample  Cq     Cq mean  Cq Std. Dev
A01   SYBR   HBB     Unk-1    N       19.65  19.54    0.138
A02   SYBR   HBB     Unk-1    N       19.38  19.54    0.138
A03   SYBR   HBB     Unk-1    N       19.59  19.54    0.138
A04   SYBR   HBB     Unk-2    M       19.17  19.21    0.068
A05   SYBR   HBB     Unk-2    M       19.29  19.21    0.068
A06   SYBR   HBB     Unk-2    M       19.17  19.21    0.068
A07   SYBR   HBB     NTC              36.55  36.55    0.000
A08   SYBR   HBB     NTC              N/A    0.00     0.000
B01   SYBR   GAPDH   Unk-3    N       23.35  23.14    0.369
B02   SYBR   GAPDH   Unk-3    N       22.72  23.14    0.369
B03   SYBR   GAPDH   Unk-3    N       23.36  23.14    0.369
B04   SYBR   GAPDH   Unk-4    M       22.01  22.12    0.270
B05   SYBR   GAPDH   Unk-4    M       22.42  22.12    0.270
B06   SYBR   GAPDH   Unk-4    M       21.92  22.12    0.270
B07   SYBR   GAPDH   NTC              35.24  35.24    0.000
B08   SYBR   GAPDH   NTC              35.57  35.57    0.000
C01   SYBR   ACTB    Unk-5    N       19.03  18.85    0.240
C02   SYBR   ACTB    Unk-5    N       18.58  18.85    0.240
C03   SYBR   ACTB    Unk-5    N       18.94  18.85    0.240
C04   SYBR   ACTB    Unk-6    M       18.69  18.46    0.254
C05   SYBR   ACTB    Unk-6    M       18.49  18.46    0.254
C06   SYBR   ACTB    Unk-6    M       18.19  18.46    0.254
C07   SYBR   ACTB    NTC              34.45  34.45    0.000
C08   SYBR   ACTB    NTC              36.10  36.10    0.000

Table 4. qPCR quantification data
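The relative normalized expression reported for β-globin can be approximated from the Cq values in Table 4 with the common 2^(-ΔΔCq) calculation. The sketch below is an illustration only: it assumes perfect amplification efficiency (a doubling per cycle) for all three primer pairs and normalization to the mean Cq of GAPDH and ACTB, neither of which is stated in the text. The qPCR software used in the study may compute the value differently. The function name `relative_expression` is hypothetical.

```python
from statistics import mean

# Replicate Cq values transcribed from Table 4 (N = control, M = mother).
cq = {
    "HBB":   {"N": [19.65, 19.38, 19.59], "M": [19.17, 19.29, 19.17]},
    "GAPDH": {"N": [23.35, 22.72, 23.36], "M": [22.01, 22.42, 21.92]},
    "ACTB":  {"N": [19.03, 18.58, 18.94], "M": [18.69, 18.49, 18.19]},
}


def relative_expression(cq, target, references, sample, control, efficiency=2.0):
    """Expression of `target` in `sample` relative to `control` (2^-ddCq).

    Assumption of this sketch: every primer pair amplifies with the same
    efficiency (2.0 means the product doubles in each cycle).
    """
    def delta_cq(s):
        # Target Cq minus the mean Cq of the reference genes in sample s.
        reference_cq = mean(mean(cq[gene][s]) for gene in references)
        return mean(cq[target][s]) - reference_cq

    delta_delta_cq = delta_cq(sample) - delta_cq(control)
    return efficiency ** (-delta_delta_cq)


ratio = relative_expression(cq, "HBB", ["GAPDH", "ACTB"], sample="M", control="N")
print(f"HBB expression in M relative to N: {ratio:.2f}")
```

Under these assumptions the result is close to the value of 0.77 given in the text. Averaging the reference Cq values is equivalent to normalizing against the geometric mean of the reference gene quantities.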


Figure 6)

Figure 6. qPCR amplification curves and bar chart for measuring relative expression of β-globin in mother (M) as compared to control sample (N). The curves are color-coded in the figure. The relative normalized expression of β-globin in M was 0.77 (± 0.1) of that detected in N.


RNA-Seq data visualization in the HBB locus

Integrative Genomics Viewer (IGV) visualization of the RNA-Seq reads mapped to the HBB showed that, among the HBB transcripts from M, 59% originated from the wild-type allele while 41% carried the HBB:c.51C>T sequence change. Transcripts originating from the novel mutant allele were therefore approximately 0.7 times as abundant as those from the wild-type allele (41% / 59% ≈ 0.7). Results from D, a compound heterozygote, showed that 99% of the HBB transcripts carried the maternally derived HBB:c.51C>T allele and only 1% of the reads originated from the paternally derived HBB:c.112delT allele (Figure 7A & 7B). This disparity in steady-state mRNA was likely caused by decay of the mRNA carrying the frameshifted sequence.
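The allele comparison above is a simple ratio of the two allele shares. A minimal sketch follows, using only the percentages reported in the text. The function name `allele_ratio` is hypothetical, and the same function would work on raw read counts if those were available.

```python
def allele_ratio(numerator, denominator):
    """Abundance of one allele's transcripts relative to the other allele's."""
    if denominator <= 0:
        raise ValueError("the denominator share or count must be positive")
    return numerator / denominator


# Sample M: 41% of reads carry HBB:c.51C>T and 59% are wild type.
ratio_m = allele_ratio(numerator=41, denominator=59)

# Sample D: 1% of reads carry HBB:c.112delT and 99% carry HBB:c.51C>T.
ratio_d = allele_ratio(numerator=1, denominator=99)

print(f"M, c.51C>T vs wild type: {ratio_m:.2f}")
print(f"D, c.112delT vs c.51C>T: {ratio_d:.3f}")
```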

Figure 7A)

Figure 7A. Integrative Genomics Viewer (IGV) visualization of reads mapped to the HBB. The mismatches corresponding to the codon 16 mutation can be observed on the tracks of both sample M and sample D. A common SNP in codon 2 (histidine) can be seen in the reads from all three samples.


Figure 7B)

Figure 7B. Analysis of steady-state levels of β-globin transcripts. Integrative Genomics Viewer (IGV) visualization of coverage tracks of the alignments in the exon 1 region of the HBB for the three samples, D, M, and N (indicated on the left). The reference sequence (complementary sequence) of the HBB is shown at the bottom. Nucleotides and amino acids are represented with single letters, and the direction of transcription is from right to left. Each coverage track is displayed as a gray bar chart showing the depth of the reads at each position, and mismatches to the reference sequence are colored within the bar in proportion to the read count of each base. The novel codon 16 mutation (GGC→GGT) can be seen in the coverage tracks of both sample M and sample D (see the asterisks). The bar corresponding to the third position of codon 16 is highlighted in green, which is complementary to the mutated T, and in orange, which corresponds to the original C nucleotide. A common SNP in codon 2 (histidine) of the β-globin can also be seen in the coverage tracks of all three samples.

The novel mutation introduces a splice donor site

The relative expression of β-globin in M was ~70% of that detected in N, as can be seen by comparing the UCSC tracks, which are based on the normalized counts of RNA-Seq reads mapping to the HBB (285022 / 410599 ≈ 0.69) (Figure 8A). To visualize the splicing patterns at the HBB locus, RNA-Seq reads with at least one intronic gap in their alignment were analyzed. In both M and D, a group of spliced reads was detected that carried the HBB:c.51C>T mutation and spanned the novel junction of the putative splice site, confirming its use as a splice donor (Figure 8A, 8B, and 8C). These aberrantly spliced products were found at a low frequency compared to the canonically spliced reads, but without information on the degradation of alternatively spliced mRNAs it is difficult to infer the exact extent to which the cryptic splice site is used. It is noteworthy that a natural cryptic splice site spanning codons 17 to 19, which was also predicted by the ASSP tool as a cryptic donor site at nucleotide position 104, is used for splicing to a considerable extent in all three samples.
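Reads with at least one intronic gap can be recognized from their alignment records: in the SAM format, a skipped region of the reference is written as an `N` operation in the CIGAR string. The sketch below shows one way to extract such gaps and count how many reads support each one. It is not the pipeline used in this study. The function names, positions, and CIGAR strings are hypothetical and do not correspond to real HBB coordinates.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position along the reference sequence.
REF_CONSUMING = frozenset("MDN=X")


def intron_gaps(pos, cigar):
    """Return (start, end) reference intervals skipped by N operations.

    pos is the 1-based leftmost aligned position (SAM POS field);
    the returned intervals are 1-based and inclusive.
    """
    gaps = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            gaps.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return gaps


def junction_counts(alignments):
    """Count how many alignments support each intron gap."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(intron_gaps(pos, cigar))
    return counts


# Hypothetical alignments as (position, CIGAR) pairs.
toy_alignments = [
    (100, "76M"),           # unspliced read, no gap
    (100, "30M120N46M"),    # gap covering positions 130-249
    (105, "25M120N51M"),    # same gap, different read start
    (100, "5S10M140N61M"),  # gap 110-249: same gap end, different gap start
]
spliced = junction_counts(toy_alignments)
for (start, end), n_reads in sorted(spliced.items()):
    print(f"gap {start}-{end}: {n_reads} read(s)")
```

In this toy data, two gaps share one end but differ at the other, which is the pattern expected when an alternative splice site is used alongside the canonical one.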

Figure 8A)

[UCSC Genome Browser view. Window position: Human Feb. 2009 (GRCh37/hg19), chr11:5,246,696-5,248,301 (1,606 bp). Tracks: linear coverage for Sample_N (scale factor 8.3), Sample_M (scale factor 8.0), and Sample_D (scale factor 13.3), each on a y-axis from 0 to 400000; spliced-read tracks for samples N, M, and D; RefSeq Genes (HBB).]

Figure 8A. UCSC genome browser snapshots of the HBB locus. (A) Normalized, whole-genome RNA-Seq coverage tracks are shown for the three samples (linear coverage scale on the y-axis is adjusted to the normalized count of N, 410599). A dense view of the spliced reads (reads with at least one intronic gap in their alignment) observed in each sample is shown as grey/black block tracks (black: highly abundant spliced reads; grey: lowly abundant spliced reads) at the bottom of the coverage track for each sample. The exon/intron structure of the HBB gene is shown at the bottom, and the direction of transcription is shown with arrows (right to left).


Figure 8B)

[UCSC Genome Browser view. Window position: Human Feb. 2009 (GRCh37/hg19), chr11:5,248,154-5,248,255 (102 bp). Displayed sequence: ACCAACCTGCCCAGGGCCTCACCACCAACTTCATCCACGTTCACCTTGCCCCACAGGGCAGTAACGGCAGACTTCTCCTCAGGAGTCAGATGCACCATGGTG. Tracks: spliced reads for samples N, M, and D; RefSeq Genes (HBB) with the amino acid translation R G L A E G G V E D V N V K G W L A T V A S K E E P T L H V M.]

Figure 8B. A dense view of the spliced reads and splicing sites from the exon 1 region of the HBB. The strand complementary to the HBB is shown at the top. For the sample M and sample D tracks, the dark black block corresponds to codon 16 (see the asterisks), showing that the mutated codon 16 has been used as an alternative splice site (G^GT).

Figure 8C)

[UCSC Genome Browser view. Window position: Human Feb. 2009 (GRCh37/hg19), chr11:5,248,021-5,248,275 (255 bp). Tracks: spliced reads for samples N, M, and D; RefSeq Genes (HBB) with the amino acid translation V L L R R G L A E G G V E D V N V K G W L A T V A S K E E P T L H V M.]

Figure 8C. A squish view of the spliced reads spanning exon 1 and exon 2 of the HBB. Inspection of the intronic gaps shows how alternative splicing from the mutated codon 16 in M and D (see the asterisks) removes half of exon 1 along with intron 1. The blocks for the acceptor splice site, which correspond to the beginning of exon 2, are shown on the left.


More than 300 genes are differentially expressed in β-thalassemic blood

RNA-Seq analysis was used to characterize broadly the expression of alpha and other beta-like globin genes in M and N, to develop an understanding of compensatory gene expression in these individuals. Compared to a normal individual, the HBD gene encoding δ-globin was up-regulated approximately 80-fold in D, and the HBG1 and HBG2 genes encoding γ-globin were up-regulated 18- and 13-fold, respectively. It is possible that HbA2 and HbF were generated in substantially increased amounts to compensate for the defective production of HbA in D. HBA1, encoding α-globin, was represented at slightly higher FPKM levels in D than in N, perhaps as a consequence of heightened erythropoiesis rather than gene regulation at the cellular level.

The significantly differentially expressed genes in samples M and D, compared to N, were determined using the cuffdiff analysis tool with the software settings adjusted to include highly expressed genes, as described in the Materials and Methods. A separate analysis was performed to determine whether masking globin genes as targets in the assignment of reads by cuffdiff changes the FPKM values for non-globin genes. There was a high correlation of FPKM values (R > 0.997) between the two approaches (masking vs. not masking) for all but ten genes, as shown in Figure 9. The genes marked with asterisks below the diagonal are globin genes that appear to have null expression when their genomic targets are masked. The two genes marked with arrows above the diagonal are SMN2 and EIF3CL, which show an artificial increase in apparent expression when globin gene targets are masked. These may have adventitious similarity to globin reads and serve as an alternate target for cuffdiff. As the remaining genes lie closely along the diagonal, the masking (or not) of globin genes in the analysis does not appear to significantly bias the FPKM values for non-globin genes.
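The masked versus unmasked comparison can be illustrated with a short sketch that computes a Pearson correlation on log10(FPKM + 1) values and lists the genes that fall away from the diagonal. The gene names, FPKM values, and the 0.5 log-unit threshold are hypothetical and chosen only for illustration. The text does not state how the off-diagonal genes were identified.

```python
import math


def log_fpkm(value):
    """log10(FPKM + 1), the transformation used on both axes of Figure 9."""
    return math.log10(value + 1.0)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def off_diagonal(unmasked, masked, threshold=0.5):
    """Genes whose log10(FPKM + 1) differs by more than `threshold`."""
    return sorted(
        gene for gene in unmasked
        if abs(log_fpkm(unmasked[gene]) - log_fpkm(masked[gene])) > threshold
    )


# Hypothetical FPKM values from the two analyses.
unmasked = {"GENE_A": 3.2, "GENE_B": 48.0, "GENE_C": 910.0, "GLOBIN_X": 15000.0}
masked = {"GENE_A": 3.1, "GENE_B": 49.5, "GENE_C": 905.0, "GLOBIN_X": 0.0}

outliers = off_diagonal(unmasked, masked)
kept = [g for g in unmasked if g not in outliers]
r = pearson_r([log_fpkm(unmasked[g]) for g in kept],
              [log_fpkm(masked[g]) for g in kept])
print(outliers, round(r, 4))
```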

Figure 9)

[Scatter plot. x-axis: log10(FPKM + 1), no masking; y-axis: log10(FPKM + 1), masking globin genes; both axes range from 0 to 4; R = 0.9971939.]

Figure 9. Comparison of the transcript read counts (FPKMs) of all the genes with and without masking the globin genes. Log-log plot of the FPKMs from the two alternative analyses are shown (black hollow dots), along with a linear correlation curve. The genes marked with asterisks below the diagonal are globin genes. The two genes marked with arrows above the diagonal are SMN2 and EIF3CL.


In pairwise comparisons between the samples M, D, and N, there were a total of 398 differentially expressed genes (DEG), 312 of which were differentially expressed in sample D compared to N. There were only 37 DEG in the comparison of sample M with N, and 177 that were differentially expressed between D and M, in accordance with the severe hematological changes seen in the daughter compared to the slight anemia present in M. Of the 398 DEG in the overall list, 343 had expression levels with FPKM > 2 in all three samples (see Supplemental Matrix 1). The differential patterns of gene expression in samples D and M are shown in Figure 10, and depend on both changes in the cellular make-up of the blood and changes in gene expression within individual cells. An increase in erythropoiesis, for example in the daughter, who is severely anemic, would result in an increase in erythroblast-associated transcripts in the total pool of buffy-coat RNA. The overall RNA-Seq pattern is thus a snapshot of the heterogeneous blood compartment as a whole, and indicates the physiological state of that tissue.
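The overall DEG list is the union of the three pairwise lists, which is why the total (398) is smaller than the sum of the pairwise counts (312 + 37 + 177 = 526): genes called in more than one comparison are counted once. The sketch below shows the union and the FPKM > 2 filter on toy data. Gene names, FPKM values, and function names are hypothetical.

```python
def union_of_degs(pairwise_degs):
    """Genes called differentially expressed in at least one comparison."""
    return set().union(*pairwise_degs.values())


def expressed_in_all(genes, fpkm, samples, min_fpkm=2.0):
    """Keep genes whose FPKM exceeds `min_fpkm` in every listed sample."""
    return {g for g in genes if all(fpkm[g][s] > min_fpkm for s in samples)}


# Hypothetical DEG calls for the three pairwise comparisons.
pairwise_degs = {
    ("D", "N"): {"GENE_1", "GENE_2", "GENE_3"},
    ("M", "N"): {"GENE_2"},
    ("D", "M"): {"GENE_3", "GENE_4"},
}
# Hypothetical FPKM values per gene and sample.
fpkm = {
    "GENE_1": {"N": 5.0, "M": 6.0, "D": 40.0},
    "GENE_2": {"N": 0.5, "M": 3.0, "D": 9.0},
    "GENE_3": {"N": 10.0, "M": 12.0, "D": 2.5},
    "GENE_4": {"N": 3.0, "M": 3.0, "D": 1.0},
}

all_degs = union_of_degs(pairwise_degs)
well_expressed = expressed_in_all(all_degs, fpkm, samples=("N", "M", "D"))
print(len(all_degs), sorted(well_expressed))
```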

Figure 10) Hierarchical clustering of the differentially expressed genes found by pairwise comparisons of the normal (N), the mother (M) and the daughter (D) samples. A total of 336 non-globin genes showed differential expression (P value < 0.05) in at least one comparison. A red color indicates high relative expression, and a dark blue color indicates low relative expression. The scale represents Z-scores obtained from the expression estimates in units of FPKMs (fragments per kilobase of transcript per million fragments mapped).
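A per-gene Z-score of the kind used for the color scale can be sketched as follows. This is an assumption-laden illustration: the text does not say whether the FPKMs were log-transformed first or whether the sample or population standard deviation was used, so the sketch uses raw values and the sample standard deviation. The gene names and values are hypothetical.

```python
from statistics import mean, stdev


def row_z_scores(values):
    """Z-scores of one gene's expression values across the samples."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (an assumption)
    if sd == 0:
        # A gene with identical values in all samples has no relative signal.
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


# Hypothetical FPKM values in the order N, M, D.
toy_fpkm = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [8.0, 8.0, 8.0]}
z = {gene: row_z_scores(values) for gene, values in toy_fpkm.items()}
print(z)
```

Scaling each gene separately puts genes with very different absolute abundance on a common color scale, so the heat map shows the pattern across N, M, and D rather than the expression level itself.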

[Figure 10 heat map. Columns: N, M, D. Rows: the differentially expressed genes, each labeled with a gene id and an index. Color scale labeled "relative", with tick marks at -3, -2, 0, 2, and 3.]
AHSP 7 WDFY3 323 FAM83AHMBS 106142 ARHGEF12HMBSHSPB1 17142147 HLXHMBS 141142 HMBS 142 NGRNTMEM14B 215307 ANK1TMEM14BANKRD9 1130712 FCGRTTMEM14B 109307 TMEM14B 307 DNAJA4 80 TREM1 313 ERAP2 98 KLF1 171 MARCH8KLF1PTN 194171248 KLF1 171 NFATC2KLF1 212171 ERMAPRNF182 99261 USP12RNF182LTF 319261188 PPCDCRNF182 241261 RNF182 261 SLC19A1 279 ASXL2 21 HEPACAM2RUNDC3A 137264 GCLCRUNDC3AAQP1 12426414 RUNDC3A 264 RUNDC3A 264 IBA57 148 CDA 49 KCNA3 164 ARL4A 19 MOSPD1ARL4AGOLT1A 20219 129 ARL4A 19 LNPEPARL4A 18019 CLIC2 59 PBX1 230 CCNJL 46 CCRL2 48 CCRL2 48 CCRL2 48 VPS13ACCRL2 32248 SOX6 288 ELOVL6CSPG5 93 67 MMP25 200 NCAPG2 208 CA1FAM83A 44106 KAT2BFAM83AEGFR 16310689 FRAT1FAM83A 118106 FAM83A 106 INPP4B 157 IFIT1BNGRN 152215 ZNF23NGRNEPB42 33421595 SLC16A3NGRN 278215 NGRN 215 USP53 320 YPEL4DNAJA4 33080 TRIM10DNAJA4KRT1 31480 173 NCF4DNAJA4 21080 DNAJA4 80 DMXL1 79 NSUN3ERMAP 22199 CPOXERMAPPLVAP 6299 238 CDC42EP2ERMAP 5099 ERMAP 99 BPGMHEPACAM2 32137 RSAD2HEPACAM2AHSP 2631377 LST1HEPACAM2 187137 GPRIN3HEPACAM2 131137 MAP1LC3B2 191 ZFAND4HMBS 332 142 PHC2 232 LATS1 175 IBA57 148 IBA57 148 IBA57 148 NHLRC2IBA57 216148 TMOD1CLIC2 31059 SLC14A1CLIC2TMEM14B 27659 307 PLAURCLIC2 23559 CLIC2 59 BTNL8 34 SACS 266 PPME1SOX6 242288 SOX6KLF1 288171 SOX6 288 SOX6 288 DENND4A 76 SULF2 294 N4BP2 207 SLC2A1 280 RNF182 261 E2F2CA1 8844 FCRL5CA1 11144 LILRA2CA1 17744 SLC4A7CA1 28344 REXO2IFIT1B 257152 PANK3IFIT1BRUNDC3A228152264 SLC11A1IFIT1B 275152 SLFN5IFIT1B 286152 GADD45AYPEL4 122330 FCRL3YPEL4ARL4A 11033019 PANX2YPEL4 229330 IKZF3YPEL4 153330 TMEM158NSUN3 308221 ZNF192NSUN3CCRL2 33322148 DPEP2NSUN3 83221 ZDHHC21NSUN3 331221 RFESDBPGM 25832 BPGMFAM83A 32 106 NRG1BPGM 22032 BPGM 32 TAL1MAP1LC3B2 295191 IFI30MAP1LC3B2NGRN 150191215 NINJ1MAP1LC3B2 217191 MAP1LC3B2 191 ART4TMOD1 20310 IL8TMOD1DNAJA4 15531080 ADMTMOD1 5 310 TMOD1 310 NET1PPME1 211242 G0S2PPME1ERMAP 12124299 LY96PPME1 189242 PPME1 242 PTGDS 247 IFI27SLC2A1 149280 SLC2A1HEPACAM2 280137 ST3GAL4SLC2A1 290280 SLC2A1 280 RCN3 
255 SOSTDC1E2F2 28788 E2F2IBA57 88 148 E2F2 88 E2F2 88 EIF1B 91 ARHGEF40 18 28 LRG1 185 REXO2 257 OSCARREXO2CLIC2 22525759 NFIL3REXO2 213257 REXO2 257 HSPA1B 146 SOX6 288 GYPAGADD45A 133122 CLEC4EGADD45A 58122 ALPLGADD45A 10122 GADD45A 122 UBXN10TMEM158 318308 FAM212BTMEM158CA1 10530844 FRAT2TMEM158 119308 TMEM158 308 PLD6RFESD 237258 MANSC1RFESDIFIT1B 190258152 DGAT2RFESD 77258 RFESD 258 BBS12TAL1 25295 HCAR2TAL1YPEL4 135295330 WLSTAL1 324295 TAL1 295 ATG4AART4 2320 SLC45A4ART4NSUN3 28220 221 PILRAART4 23420 ART4 20 FECHNET1 112211 KRT23NET1BPGM 17421132 MBOAT7NET1 195211 NET1 211 ARAP3 16 DYRK3IFI27 86149 IFI27MAP1LC3B2149191 NLRP6IFI27 219149 IFI27 149 C1orf61 39 MRVI1 204 GLT1D1 126 SOSTDC1 287 GNG10SOSTDC1TMOD1 128287310 SOSTDC1 287 SOSTDC1 287 CISD2 54 PPME1 242 LPAR2 181 EIF1B 91 TNFRSF10CEIF1B 31191 RASGRP4EIF1B 25391 EIF1B 91 CRYABHSPA1B 65146 HSPA1BSLC2A1 146280 HSPA1B 146 HSPA1B 146 MPO 203 CSF2RA 66 PTP4A3 249 GYPA 133 HCAR3GYPAE2F2 13613388 IMPA2GYPA 156133 GYPA 133 SLC7A5 284 REXO2 257 SLC16A1UBXN10 277318 HRH2UBXN10 143318 CEACAM3UBXN10 51318 UBXN10 318 C17orf103PLD6 36237 NLRP12PLD6GADD45A218237122 TSEN34PLD6 315237 PLD6 237 MARCH3BBS12 19325 DAPK2BBS12TMEM1587325 308 PADI4BBS12 22725 BBS12 25 TBCELATG4A 29723 AATKATG4ARFESD 1 23 258 THBDATG4A 30323 ATG4A 23 YOD1FECH 329112 LPPR2FECHTAL1 183112295 FECH 112 FECH 112 S100P 265 BCL6 28 BCAMDYRK3 2686 DYRK3ART4 86 20 DYRK3 86 DYRK3 86 C17orf39 37 NTNG2NET1 222 211 HAL 134 C1orf61 39 CEP19C1orf61 5339 KCNJ15C1orf61 16539 C1orf61 39 TUBB2ACISD2 31654 CISD2IFI27 54 149 CISD2 54 CISD2 54 CITED2 55 PRAM1 244 RBM47 254 CRYAB 65 LILRB3CRYABSOSTDC1 17965 287 SIGLEC14CRYAB 27165 CRYAB 65 FEM1B 113 EIF1B 91 KANK2MPO 162203 ZNF467MPO 335203 CYP4F3MPO 72203 MPO 203 SLC7A5 284 CPT1BSLC7A5HSPA1B 63284146 UBE2D1SLC7A5 317284 SLC7A5 284 PNP 240 GYPA 133 C5SLC16A1 41277 PYGLSLC16A1 250277 QPCTSLC16A1 251277 SLC16A1 277 CTNNAL1C17orf103 6836 FPR2C17orf103UBXN10 11736 318 AGPAT9C17orf103 6 36 C17orf103 36 SPTA1MARCH3 
289193 EMR3MARCH3PLD6 94193237 SIRPB1MARCH3 273193 MARCH3 193 FHDC1TBCEL 114297 TECPR2TBCELBBS12 30029725 GKTBCEL 125297 TBCEL 297 DNAJB4 81 AQP9ATG4A 15 23 MGAM 197 YOD1 329 PLB1YOD1 236329 REPS2YOD1 256329 YOD1 329 XPO7BCAM 32726 BCAMFECH 26 112 BCAM 26 BCAM 26 NFIX 214 CKLFDYRK3 56 86 DSC2 84 C17orf39 37 PADI2C17orf39 22637 LILRA3C17orf39 17837 C17orf39 37 ITSN1TUBB2A 160316 TUBB2AC1orf61 31639 TUBB2A 316 TUBB2A 316 SFRP2 270 CEACAM4 52 STX3 293 CITED2 55 VNN1CITED2CISD2 32155 54 MMECITED2 19955 CITED2 55 TMEM57 309 CRYAB 65 ISCA1FEM1B 159113 ST6GALNAC2FEM1B 291113 TMCC3FEM1B 306113 FEM1B 113 KANK2 162 LRRC25KANK2MPO 186162203 CREB5KANK2 64162 KANK2 162 ZRANB1 336 SLC7A5 284 ATG14PNP 22240 KIAA1324PNP 169240 LPCAT2PNP 182240 PNP 240 CTSL1C5 6941 MEFVC5 SLC16A1 19641 277 TLR8C5 30541 C5 41 ELL2CTNNAL1 9268 PFKFB4CTNNAL1C17orf10323168 36 KREMEN1CTNNAL1 17268 CTNNAL1 68 KIAA1191 168 PRRG4MARCH3 246 193 PPP1R3B 243 SPTA1 289 PROK2SPTA1 245289 RNF24SPTA1 262289 SPTA1 289 ALDH5A1FHDC1 9 114 FHDC1TBCEL 114297 FHDC1 114 FHDC1 114 FBXO30 108 SIGLEC5YOD1 272 329 SIRPB2 274 DNAJB4 81 EPHB4DNAJB4 9681 STEAP4DNAJB4 29281 DNAJB4 81 SLFN14XPO7 285327 XPO7BCAM 32726 XPO7 327 XPO7 327 RNF14 260 COL18A1 61 NFIX 214 LGALS2NFIXC17orf39 17621437 BEND2NFIX 29214 NFIX 214 YES1 328 TUBB2A 316 CYP1B1 71 TRAK2ITSN1 312160 MMP9ITSN1 201160 ITSN1 160 ITSN1 160 SFRP2 270 ANPEPSFRP2CITED2 1327055 IRAK3SFRP2 158270 SFRP2 270 BIRC2 30 FEM1B 113 CXCL5 70 OSBP2TMEM57 224309 B3GNT8TMEM57 24309 TMEM57 309 TMEM57 309 CMTM2KANK2 60 162 FAM198B 104 TFDP1ISCA1 301159 ISCA1 159 ISCA1 159 ISCA1 159 LRFN1 184 HLA-DQA2 140 C14orf45ZRANB1 35336 ZRANB1PNP 336240 ZRANB1 336 ZRANB1 336 XKR8 326 DOCK5 82 XK 325 C5 41 ATG14 22 GPR97ATG14 13022 F5ATG14 10122 ATG14 22 TFRC 302 CTNNAL1 68 CTSL1 69 CCR3CTSL1 4769 KCNJ2CTSL1 16669 CTSL1 69 FAM118A 103 SPTA1 289 ELL2 92 SEPX1ELL2 26992 PLXDC2ELL2 23992 ELL2 92 TBC1D22B 296 FHDC1 114 KIAA1191 168 RFX2KIAA1191 259168 TLR6KIAA1191 304168 KIAA1191 168 TCEANC 298 DNAJB4 
81 SCARF1 267 WDFY3 323 C5orf4ALDH5A1 429 ALDH5A1 9 ALDH5A1 9 ALDH5A1 9 BCL3XPO7 27 327 NUDT4FBXO30 223108 FBXO30 108 FBXO30 108 FBXO30 108 IL1R2NFIX 154 214 ERAP2 98 ARHGEF12SLFN14 17285 SLFN14 285 SLFN14 285 SLFN14 285 HLX ITSN1 141 160 NFATC2 212 ANK1RNF14 11260 RNF14 260 RNF14 260 RNF14 260 FCGRTSFRP2 109 270 ASXL2 21 MARCH8YES1 194328 YES1 328 YES1 328 YES1 328 TREM1TMEM57 313 309 KCNA3 164 USP12TRAK2 319312 TRAK2 312 TRAK2 312 TRAK2 312 PPCDC 241 LNPEP 180 GCLC 124 ISCA1 159 BIRC2 30 SLC19A1BIRC2 27930 VPS13ABIRC2 32230 BIRC2 30 MOSPD1 202 ZRANB1 336 OSBP2 224 CDAOSBP2 49224 NCAPG2OSBP2 208224 OSBP2 224 PBX1 230 ATG14 22 TFDP1 301 CCNJLTFDP1 46301 INPP4BTFDP1 157301 TFDP1 301 ELOVL6 93 CTSL1 69 C14orf45 35 MMP25C14orf45 20035 USP53C14orf45 32035 C14orf45 35 KAT2B 163 ELL2 92 FRAT1 118 DMXL1 79 ZNF23XK 334325 XK 325 XK 325 XK 325 SLC16A3KIAA1191278 168 GPRIN3 131 TRIM10TFRC 314302 TFRC 302 TFRC 302 TFRC 302 NCF4ALDH5A1 210 9 LATS1 175 CPOXFAM118A 62103 FAM118A 103 FAM118A 103 FAM118A 103 CDC42EP2FBXO30 50 108 NHLRC2 216 RSAD2TBC1D22B 263296 TBC1D22B 296 TBC1D22B 296 TBC1D22B 296 LST1SLFN14 187 285 SACS 266 ZFAND4TCEANC 332298 TCEANC 298 TCEANC 298 TCEANC 298 PHC2RNF14 232 260 N4BP2 207 SLC14A1C5orf4 27642 C5orf4 42 C5orf4 42 C5orf4 42 PLAURYES1 235 328 SLC4A7 283 NUDT4 223 NUDT4 223 SLFN5NUDT4 286223 NUDT4 223 DENND4A 76 BTNL8TRAK2 34 312 ARHGEF12 17 SULF2ARHGEF12BIRC2 29417 30 IKZF3ARHGEF12 15317 ARHGEF12 17 FCRL5ANK1 11111 ANK1 11 ZDHHC21ANK1 33111 ANK1 11 PANK3 228 LILRA2OSBP2 177 224 FCRL3MARCH8 110194 SLC11A1MARCH8TFDP1 275194301 MARCH8 194 MARCH8 194 ZNF192USP12 333319 PANX2USP12C14orf45 22931935 USP12 319 USP12 319 GCLC 124 DPEP2GCLC 83124 GCLC 124 GCLC 124 NRG1XK 220 325 IFI30MOSPD1 150202 MOSPD1TFRC 202302 MOSPD1 202 MOSPD1 202 IL8 155 NINJ1 217 PBX1 230 PBX1FAM118A 230103 PBX1 230 PBX1 230 G0S2ELOVL6 12193 ADMELOVL6 5 93 ELOVL6 93 ELOVL6 93 PTGDS 247 LY96TBC1D22B189 296 KAT2B 163 ST3GAL4KAT2BTCEANC 290163298 KAT2B 163 KAT2B 163 ARHGEF40ZNF23 18334 
RCN3ZNF23C5orf4 25533442 ZNF23 334 ZNF23 334 OSCARTRIM10 225314 LRG1TRIM10NUDT4 185314223 TRIM10 314 TRIM10 314 CLEC4ECPOX 5862 NFIL3CPOXARHGEF1221362 17 CPOX 62 CPOX 62 FAM212BRSAD2 105263 ALPLRSAD2ANK1 1026311 RSAD2 263 RSAD2 263 MANSC1ZFAND4 190332 FRAT2ZFAND4MARCH8 119332194 ZFAND4 332 ZFAND4 332 HCAR2SLC14A1 135276 DGAT2SLC14A1USP12 77276319 SLC14A1 276 SLC14A1 276 SLC45A4 282 WLS 324 GCLC 124 KRT23DENND4A 17476 PILRADENND4A 23476 DENND4A 76 DENND4A 76 MOSPD1 202 ARAP3FCRL5 16111 MBOAT7FCRL5 195111 FCRL5 111 FCRL5 111 PBX1 230 MRVI1PANK3 204228 NLRP6PANK3 219228 PANK3 228 PANK3 228 ELOVL6 93 GNG10FCRL3 128110 GLT1D1FCRL3 126110 FCRL3 110 FCRL3 110 LPAR2KAT2B 181 163 TNFRSF10CZNF192 311333 ZNF192 333 ZNF192 333 ZNF192 333 CSF2RA 66 RASGRP4ZNF23 253 334 PTP4A3 249 HCAR3IFI30 136150 IFI30TRIM10 150314 IFI30 150 IFI30 150 HRH2 143 IMPA2 156 IL8 155 IL8 CPOX 15562 IL8 155 IL8 155 NLRP12 218 CEACAM3RSAD2 51 263 DAPK2G0S2 73121 TSEN34G0S2 315121 G0S2 121 G0S2 121 PTGDS 247 PTGDSZFAND4 247332 PTGDS 247 PTGDS 247 AATK 1 PADI4SLC14A1 227 276 LPPR2 183 THBD 303 ARHGEF40 18 ARHGEF40 18 ARHGEF40 18 ARHGEF40 18 S100P 265 DENND4A 76 NTNG2OSCAR 222225 BCL6OSCAR 28225 OSCAR 225 OSCAR 225 FCRL5 111 CEP19CLEC4E 5358 HALCLEC4E 13458 CLEC4E 58 CLEC4E 58 PRAM1FAM212B 244105 KCNJ15FAM212BPANK3 165105228 FAM212B 105 FAM212B 105 LILRB3MANSC1 179190 RBM47MANSC1FCRL3 254190110 MANSC1 190 MANSC1 190 SIGLEC14ZNF192 271 333 ZNF467HCAR2 335135 HCAR2 135 HCAR2 135 HCAR2 135 CPT1B 63 CYP4F3 72 SLC45A4 282 UBE2D1SLC45A4IFI30 317282150 SLC45A4 282 SLC45A4 282 PYGLKRT23 250174 KRT23 174 KRT23 174 KRT23 174 FPR2 117 QPCTIL8 251 155 EMR3ARAP3 9416 AGPAT9ARAP3G0S2 6 16 121 ARAP3 16 ARAP3 16 TECPR2MRVI1 300204 SIRPB1MRVI1PTGDS 273204247 MRVI1 204 MRVI1 204 AQP9GNG10 15128 GKGNG10 125128 GNG10 128 GNG10 128 PLB1TNFRSF10C 236311 MGAMTNFRSF10CARHGEF4019731118 TNFRSF10C 311 TNFRSF10C 311 CKLFCSF2RA 5666 REPS2CSF2RAOSCAR 25666 225 CSF2RA 66 CSF2RA 66 PADI2HCAR3 226136 DSC2HCAR3CLEC4E 8413658 HCAR3 136 
HCAR3 136 CEACAM4HRH2 52143 LILRA3HRH2 178143 HRH2 143 HRH2 143 STX3FAM212B 293 105 VNN1NLRP12 321218 NLRP12MANSC1 218190 NLRP12 218 NLRP12 218 ST6GALNAC2 291 MME 199 DAPK2 73 TMCC3DAPK2HCAR2 30673 135 DAPK2 73 DAPK2 73 LRRC25AATK 1861 AATK 1 AATK 1 AATK 1 KIAA1324 169 CREB5SLC45A4 64 282 MEFVLPPR2 196183 LPCAT2LPPR2KRT23 182183174 LPPR2 183 LPPR2 183 PFKFB4S100P 231265 TLR8S100PARAP3 30526516 S100P 265 S100P 265 PRRG4NTNG2 246222 KREMEN1NTNG2MRVI1 172222204 NTNG2 222 NTNG2 222 PROK2CEP19 24553 PPP1R3BCEP19GNG10 24353 128 CEP19 53 CEP19 53 SIGLEC5PRAM1 272244 RNF24PRAM1TNFRSF10C262244311 PRAM1 244 PRAM1 244 EPHB4LILRB3 96179 SIRPB2LILRB3CSF2RA 27417966 LILRB3 179 LILRB3 179 STEAP4 292 COL18A1ZNF467 61335 ZNF467HCAR3 335136 ZNF467 335 ZNF467 335 LGALS2CPT1B 17663 CPT1BHRH2 63 143 CPT1B 63 CPT1B 63 MMP9 201 BEND2 29 PYGL 250 CYP1B1PYGLNLRP12 71250218 PYGL 250 PYGL 250 ANPEPFPR2 13117 FPR2 117 FPR2 117 FPR2 117 B3GNT8 24 IRAK3DAPK2 158 73 CMTM2EMR3 6094 CXCL5EMR3AATK 7094 1 EMR3 94 EMR3 94 LRFN1TECPR2 184300 FAM198BTECPR2LPPR2 104300183 TECPR2 300 TECPR2 300 XKR8AQP9 32615 HLA-DQA2AQP9S100P 14015 265 AQP9 15 AQP9 15 GPR97PLB1 130236 DOCK5PLB1NTNG2 82236222 PLB1 236 PLB1 236 CCR3CKLF 4756 F5CKLFCEP19 10156 53 CKLF 56 CKLF 56 SEPX1PADI2 269226 KCNJ2PADI2 166226 PADI2 226 PADI2 226 PLXDC2PRAM1 239 244 RFX2CEACAM4 25952 CEACAM4LILRB3 52 179 CEACAM4 52 CEACAM4 52 SCARF1 267 TLR6 304 VNN1 321 WDFY3VNN1ZNF467 323321335 VNN1 321 VNN1 321 BCL3ST6GALNAC2 27291 ST6GALNAC2CPT1B 29163 ST6GALNAC2 291 ST6GALNAC2 291 IL1R2LRRC25 154186 ERAP2LRRC25PYGL 98186250 LRRC25 186 LRRC25 186 HLX 141 NFATC2 212 FCGRTKIAA1324 109169 KIAA1324FPR2 169117 KIAA1324 169 KIAA1324 169 MEFV 196 ASXL2MEFVEMR3 2119694 MEFV 196 MEFV 196 TREM1 313 KCNA3 164 PPCDCPFKFB4 241231 PFKFB4TECPR2 231300 PFKFB4 231 PFKFB4 231 PRRG4 246 LNPEPPRRG4AQP9 18024615 PRRG4 246 PRRG4 246 SLC19A1 279 VPS13A 322 CDAPROK2 49245 NCAPG2PROK2PLB1 208245236 PROK2 245 PROK2 245 CCNJLSIGLEC5 46272 INPP4BSIGLEC5CKLF 15727256 SIGLEC5 
272 SIGLEC5 272 MMP25EPHB4 20096 USP53EPHB4PADI2 32096 226 EPHB4 96 EPHB4 96 FRAT1COL18A1 11861 DMXL1COL18A1CEACAM47961 52 COL18A1 61 COL18A1 61 SLC16A3LGALS2 278176 GPRIN3LGALS2VNN1 131176321 LGALS2 176 LGALS2 176 NCF4MMP9 210201 LATS1MMP9ST6GALNAC2175201291 MMP9 201 MMP9 201 CDC42EP2 50 ANPEP 13 NHLRC2ANPEPLRRC25 21613 186 ANPEP 13 ANPEP 13 LST1 187 SACS 266 PHC2B3GNT8 23224 B3GNT8KIAA1324 24 169 B3GNT8 24 B3GNT8 24 CMTM2 60 N4BP2CMTM2MEFV 20760 196 CMTM2 60 CMTM2 60 PLAUR 235 SLC4A7 283 BTNL8LRFN1 34184 LRFN1PFKFB4 184231 LRFN1 184 LRFN1 184 SLFN5PRRG4 286 246 SULF2XKR8 294326 IKZF3XKR8 153326 XKR8 326 XKR8 326 LILRA2GPR97 177130 ZDHHC21GPR97PROK2 331130245 GPR97 130 GPR97 130 SLC11A1CCR3 27547 CCR3SIGLEC5 47 272 CCR3 47 CCR3 47 PANX2SEPX1 229269 SEPX1EPHB4 26996 SEPX1 269 SEPX1 269 DPEP2RFX2 83259 RFX2COL18A1 25961 RFX2 259 RFX2 259 NRG1SCARF1 220267 SCARF1LGALS2 267176 SCARF1 267 SCARF1 267 NINJ1 217 BCL3 27 BCL3MMP9 27 201 BCL3 27 BCL3 27 ADM 5 ANPEP 13 LY96IL1R2 189154 IL1R2 154 IL1R2 154 IL1R2 154 ST3GAL4HLX 290141 HLXB3GNT8 14124 HLX 141 HLX 141 RCN3FCGRT 255109 FCGRTCMTM2 10960 FCGRT 109 FCGRT 109 LRG1TREM1 185313 TREM1LRFN1 313184 TREM1 313 TREM1 313 NFIL3PPCDC 213241 PPCDCXKR8 241326 PPCDC 241 PPCDC 241 ALPLSLC19A1 10279 SLC19A1GPR97 279130 SLC19A1 279 SLC19A1 279 FRAT2CDA 11949 CDACCR3 49 47 CDA 49 CDA 49 DGAT2CCNJL 7746 CCNJLSEPX1 46 269 CCNJL 46 CCNJL 46 WLSMMP25 324200 MMP25RFX2 200259 MMP25 200 MMP25 200 PILRAFRAT1 234118 FRAT1SCARF1 118267 FRAT1 118 FRAT1 118 MBOAT7 195 SLC16A3 278 SLC16A3BCL3 27827 SLC16A3 278 SLC16A3 278 NLRP6 219 IL1R2 154 GLT1D1NCF4 126210 NCF4 210 NCF4 210 NCF4 210 LPAR2CDC42EP2 18150 CDC42EP2HLX 50 141 CDC42EP2 50 CDC42EP2 50 RASGRP4LST1 253187 LST1FCGRT 187109 LST1 187 LST1 187 PTP4A3PHC2 249232 PHC2TREM1 232313 PHC2 232 PHC2 232 IMPA2PLAUR 156235 PLAURPPCDC 235241 PLAUR 235 PLAUR 235 CEACAM3BTNL8 5134 BTNL8SLC19A1 34 279 BTNL8 34 BTNL8 34 TSEN34SULF2 315294 SULF2CDA 29449 SULF2 294 SULF2 294 PADI4LILRA2 227177 LILRA2CCNJL 
17746 LILRA2 177 LILRA2 177 THBDSLC11A1 303275 SLC11A1MMP25 275200 SLC11A1 275 SLC11A1 275 FRAT1 118 BCL6PANX2 28229 PANX2 229 PANX2 229 PANX2 229 HALDPEP2 13483 DPEP2SLC16A3 83 278 DPEP2 83 DPEP2 83 KCNJ15NRG1 165220 NRG1NCF4 220210 NRG1 220 NRG1 220 RBM47NINJ1 254217 NINJ1CDC42EP2 21750 NINJ1 217 NINJ1 217 SIGLEC14ADM 2715 ADMLST1 5 187 ADM 5 ADM 5 CYP4F3LY96 72189 LY96PHC2 189232 LY96 189 LY96 189 UBE2D1ST3GAL4 317290 ST3GAL4PLAUR 290235 ST3GAL4 290 ST3GAL4 290 QPCTRCN3 251255 RCN3BTNL8 25534 RCN3 255 RCN3 255 AGPAT9 6 LRG1 185 LRG1SULF2 185294 LRG1 185 LRG1 185 SIRPB1 273 LILRA2 177 GKNFIL3 125213 NFIL3 213 NFIL3 213 NFIL3 213 ALPL 10 ALPLSLC11A1 10 275 ALPL 10 ALPL 10 MGAM 197 PANX2 229 REPS2FRAT2 256119 FRAT2 119 FRAT2 119 FRAT2 119 DSC2DGAT2 8477 DGAT2DPEP2 77 83 DGAT2 77 DGAT2 77 LILRA3WLS 178324 WLSNRG1 324220 WLS 324 WLS 324 STX3PILRA 293234 PILRANINJ1 234217 PILRA 234 PILRA 234 MMEMBOAT7 199195 MBOAT7ADM 1955 MBOAT7 195 MBOAT7 195 TMCC3NLRP6 306219 NLRP6LY96 219189 NLRP6 219 NLRP6 219 CREB5GLT1D1 64126 GLT1D1ST3GAL4 126290 GLT1D1 126 GLT1D1 126 LPCAT2 182 LPAR2 181 LPAR2RCN3 181255 LPAR2 181 LPAR2 181 TLR8 305 LRG1 185 KREMEN1RASGRP4 172253 RASGRP4 253 RASGRP4 253 RASGRP4 253 PTP4A3 249 PTP4A3NFIL3 249213 PTP4A3 249 PTP4A3 249 PPP1R3B 243 ALPL 10 RNF24IMPA2 262156 IMPA2 156 IMPA2 156 IMPA2 156 CEACAM3 51 CEACAM3FRAT2 51 119 CEACAM3 51 CEACAM3 51 SIRPB2 274 DGAT2 77 STEAP4TSEN34 292315 TSEN34 315 TSEN34 315 TSEN34 315 WLS 324 PADI4 227 PADI4 227 PADI4 227 PADI4 227 PILRA 234 BEND2THBD 29303 THBD 303 THBD 303 THBD 303 CYP1B1 71 MBOAT7 195 IRAK3BCL6 15828 BCL6NLRP6 28 219 BCL6 28 BCL6 28 CXCL5HAL 70134 HALGLT1D1 134126 HAL 134 HAL 134 FAM198B 104 KCNJ15 165 KCNJ15LPAR2 165181 KCNJ15 165 KCNJ15 165 HLA-DQA2 140 RASGRP4 253 DOCK5RBM47 82254 RBM47 254 RBM47 254 RBM47 254 SIGLEC14 271 SIGLEC14PTP4A3 271249 SIGLEC14 271 SIGLEC14 271 F5 101 IMPA2 156 KCNJ2CYP4F3 16672 CYP4F3 72 CYP4F3 72 CYP4F3 72 PLXDC2UBE2D1 239317 UBE2D1CEACAM3 31751 UBE2D1 317 UBE2D1 317 
TLR6QPCT 304251 QPCTTSEN34 251315 QPCT 251 QPCT 251 WDFY3AGPAT9 3236 AGPAT9PADI4 6 227 AGPAT9 6 AGPAT9 6 SIRPB1 273 SIRPB1THBD 273303 SIRPB1 273 SIRPB1 273 ERAP2 98 NFATC2GK 212125 GK BCL6 12528 GK 125 GK 125 ASXL2MGAM 21197 MGAMHAL 197134 MGAM 197 MGAM 197 KCNA3REPS2 164256 REPS2KCNJ15 256165 REPS2 256 REPS2 256 LNPEPDSC2 18084 DSC2RBM47 84 254 DSC2 84 DSC2 84 VPS13ALILRA3 322178 LILRA3SIGLEC14 178271 LILRA3 178 LILRA3 178 NCAPG2STX3 208293 STX3CYP4F3 29372 STX3 293 STX3 293 INPP4B 157 MME 199 MMEUBE2D1 199317 MME 199 MME 199 USP53TMCC3 320306 TMCC3 306 TMCC3 306 TMCC3 306 DMXL1 79 QPCT 251 CREB5 64 CREB5AGPAT9 64 6 CREB5 64 CREB5 64 GPRIN3LPCAT2 131182 LPCAT2 182 LPCAT2 182 LPCAT2 182 LATS1 175 SIRPB1 273 NHLRC2TLR8 216305 TLR8GK 305125 TLR8 305 TLR8 305 SACSKREMEN1 266172 KREMEN1MGAM 172197 KREMEN1 172 KREMEN1 172 N4BP2PPP1R3B 207243 PPP1R3BREPS2 243256 PPP1R3B 243 PPP1R3B 243 SLC4A7RNF24 283262 RNF24DSC2 26284 RNF24 262 RNF24 262 SLFN5SIRPB2 286274 SIRPB2LILRA3 274178 SIRPB2 274 SIRPB2 274 IKZF3STEAP4 153292 STEAP4STX3 292293 STEAP4 292 STEAP4 292 ZDHHC21 331 MME 199 BEND2 29 BEND2 29 BEND2 29 BEND2 29 TMCC3 306 CYP1B1 71 CYP1B1 71 CYP1B1 71 CYP1B1 71 CREB5 64 IRAK3 158 IRAK3 158 IRAK3 158 IRAK3 158 LPCAT2 182 CXCL5 70 CXCL5 70 CXCL5 70 CXCL5 70 TLR8 305 FAM198B 104 FAM198B 104 FAM198B 104 FAM198B 104 KREMEN1 172 HLA-DQA2 140 HLA-DQA2PPP1R3B 140243 HLA-DQA2 140 HLA-DQA2 140 DOCK5 82 DOCK5RNF24 82 262 DOCK5 82 DOCK5 82 F5 101 F5 SIRPB2 101274 F5 101 F5 101 KCNJ2 166 KCNJ2STEAP4 166292 KCNJ2 166 KCNJ2 166 PLXDC2 239 PLXDC2 239 PLXDC2 239 PLXDC2 239 TLR6 304 TLR6BEND2 30429 TLR6 304 TLR6 304 WDFY3 323 WDFY3CYP1B1 32371 WDFY3 323 WDFY3 323 IRAK3 158 ERAP2 98 ERAP2CXCL5 98 70 ERAP2 98 ERAP2 98 NFATC2 212 NFATC2FAM198B 212104 NFATC2 212 NFATC2 212 ASXL2 21 ASXL2HLA-DQA2 21 140 ASXL2 21 ASXL2 21 KCNA3 164 KCNA3DOCK5 16482 KCNA3 164 KCNA3 164 LNPEP 180 LNPEPF5 180101 LNPEP 180 LNPEP 180 VPS13A 322 VPS13AKCNJ2 322166 VPS13A 322 VPS13A 322 NCAPG2 208 NCAPG2PLXDC2 208239 
NCAPG2 208 NCAPG2 208 INPP4B 157 INPP4BTLR6 157304 INPP4B 157 INPP4B 157 USP53 320 USP53WDFY3 320323 USP53 320 USP53 320 DMXL1 79 DMXL1 79 DMXL1 79 DMXL1 79 GPRIN3 131 GPRIN3ERAP2 13198 GPRIN3 131 GPRIN3 131 LATS1 175 LATS1NFATC2 175212 LATS1 175 LATS1 175 NHLRC2 216 NHLRC2ASXL2 21621 NHLRC2 216 NHLRC2 216 SACS 266 SACSKCNA3 266164 SACS 266 SACS 266 N4BP2 207 N4BP2LNPEP 207180 N4BP2 207 N4BP2 207 SLC4A7 283 SLC4A7VPS13A 283322 SLC4A7 283 SLC4A7 283 SLFN5 286 SLFN5NCAPG2 286208 SLFN5 286 SLFN5 286 IKZF3 153 IKZF3INPP4B 153157 IKZF3 153 IKZF3 153 ZDHHC21 331 ZDHHC21USP53 331320 ZDHHC21 331 ZDHHC21 331 DMXL1 79 GPRIN3 131 LATS1 175 NHLRC2 216 SACS 266 N4BP2 207 SLC4A7 283 SLFN5 286 IKZF3 153 ZDHHC21 331 The DAVID Bioinformatics Resources tools were used to organize the list of

DEGs under ontological terms, so that genes with common functions could be clustered. As shown in Table 5, there were highly significant clusterings of genes involved in erythrocyte development and function. The dysregulation of globin genes, the cause of β-thalassemia, is included under the ontology term Iron Ion Binding. There are also multiple categories of differentially expressed genes that are involved in hematopoiesis, response to oxidative stress, inflammation, immune response, protein modification, and apoptosis (Table 5). Although some DEGs belong to more than one ontological category, almost all of the genes in the erythrocyte homeostasis, hemopoiesis, iron ion binding, hemoglobin's chaperone, blood group antigen, and response to oxidative stress categories are up-regulated. Conversely, most of the DEGs belonging to the defense response, response to bacterium, chemotaxis, and immunoglobulin-like fold categories are down-regulated. About half of the DEGs belonging to the apoptosis category are up-regulated and the remaining ones are down-regulated. The DEGs that are involved in both innate and adaptive immune responses include some belonging to the iron ion binding and iron ion homeostasis ontological categories.
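The direction-of-change summary above can be illustrated with a short tally. The following Python sketch is only an illustration: it uses a small subset of genes from two Table 5 categories whose rounded log2 fold changes (D vs. N) can be read from Figure 11, and the variable and function names are hypothetical.

```python
# Rounded log2 fold change, D vs. N, as read from Figure 11 (subset only).
log2_fc = {
    "BPGM": 5, "SOX6": 4, "TAL1": 4, "DYRK3": 7,   # erythrocyte homeostasis
    "CMTM2": -4, "CCR3": -4, "C5": 4, "PROK2": -5,  # chemotaxis
}

# Category membership taken from Table 5 (only genes listed above are included).
categories = {
    "Erythrocyte homeostasis": ["BPGM", "SOX6", "TAL1", "DYRK3"],
    "Chemotaxis": ["CMTM2", "CCR3", "C5", "PROK2"],
}


def direction_counts(genes, values):
    """Return (number up-regulated, number down-regulated) for a gene list."""
    up = sum(1 for gene in genes if values[gene] > 0)
    down = sum(1 for gene in genes if values[gene] < 0)
    return up, down


summary = {
    term: direction_counts(genes, log2_fc) for term, genes in categories.items()
}

for term, (up, down) in summary.items():
    print(f"{term}: {up} up, {down} down")
```

Because only a few genes per category are included, the tally illustrates the pattern described in the text and is not a substitute for the full list of DEGs.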

ONTOLOGY TERMS                            COUNT  P-VALUE          EXAMPLE GENES
Erythrocyte homeostasis                   8      3.2E-5           BPGM, BCL6, KLF1, SOX6, TAL1, DYRK3, EPB42, TRIM10
Hemopoiesis / Immune system development   15     1.4E-4 / 6.8E-4  BPGM, BCL3, BCL6, KLF1, RASGRP4, SOX6, TAL1, DYRK3, EGR1, EPB42, AHSP, MMP9, PBX1, SPTA1, TRIM10
Iron ion binding                          18     8.1E-5           CISD2, RFESD, STEAP4, C5orf4, CYP1B1, CYP4F3, FECH, HBA2, HBA1, HBB, HBE1, HBG1, HBM, HBQ1, ISCA1, LTF, MPO, RSAD2, SLC11A1
Hemoglobin's Chaperone                    6      6.0E-6           GATA1, CPOX, AHSP, FECH, HBB, HMB
Inflammatory response                     14     7.8E-3           CCR3, C5, FPR2, IL8, KRT1, LY96, MEFV, MMP25, PROK2, SLC11A1, TLR6, TLR8, TFRC, VNN1
Defense response                          23     2.6E-3           BCL3, MEFV, BPI, CCR3, C5, DEFA3, FPR2, IL8, KRT1, LTF, LILRA2, LILRA3, LILRB3, LY96, MMP25, MPO, PROK2, RSAD2, SLC11A1, TLR6, TLR8, TFRC, VNN1
Response to oxidative stress              8      3.4E-2           CRYAB, EGFR, GCLC, JUN, KRT1, MPO, SELK, VNN1
Blood group antigen                       8      6.0E-7           ART4, KEL, XK, AQP1, BCAM, ERMAP, GYPA, SLC14A1
Response to bacterium                     11     3.3E-3           BCL3, ADM, BPI, DEFA3, IRAK3, JUN, LTF, LY96, SLC11A1, THBD, TLR6
Regulation of apoptosis                   23     4.6E-2           BCL3, BCL6, CITED2, NLRP12, ARHGEF12, BIRC2, COL18A1, CRYAB, DAPK2, DYNLL1, EGFR, FEM1B, GCLC, HSPB1, HSPA1B, HSPA1A, ITSN1, JUN, MMP9, MPO, NRG1, NET1, PROK2, VNN1
Gas transport                             8      1.9E-8           AQP1, CA2, HBA1, HBA2, HBB, HBE1, HBG1, HBM, HBQ1
Iron ion homeostasis                      5      4.3E-3           ABCB6, EPB42, LTF, SLC11A1, TFRC
Chemotaxis (Chemokine activity)           10     3.1E-3           CMTM2, CCR3, CCRL2, CXCL5, CKLF, C5, FPR2, IL8, PLAUR, PROK2
Regulation of cell shape                  5      1.8E-2           ARAP3, CDC42EP2, EPB42, LST1, SPTA1
Immunoglobulin-like fold                  24     2.9E-4           FCGRT, FCRL3, FCRL5, HEPACAM2, BCAM, BTNL8, CEACAM3, CEACAM4, ERMAP, EPB42, IL1R2, LRFN1, LILRA2, LILRA3, HLA-DQA2, NRG1, NFATC2, OSCAR, PILRA, SIGLEC14, SIGLEC5, SIRPB1, SIRPB2, TREM1
Cofactor metabolic process                11     3.6E-3           ACSL6, ALDH5A1, CPOX, FECH, GCLC, HMBS, ISCA1, PNP, PANK3, PPCDC, VNN1
Protein kinase cascade                    13     4.5E-2           BCL3, NLRP12, RASGRP3, C5, CRYAB, DAPK2, EGFR, LY96, LPAR2, MAP4K5, PROK2, TLR6, TLR8
Disulfide bond                            70     1.5E-3           KIAA1324, CTSL1, CCRL2, CXCL5, CLIC2, ST3GAL4, ST6GALNAC2, ALDH5A1, DLL3, DPEP2, EMR3, EMR3, IFI30, EMR3, HRH2, KREMEN1, LRG1, MGAM, MME, NTNG2, PLAUR, PTN, PRRG4, PTGDS, PTP4A3, SCARF1, SOSTDC1, SFRP2, TNFRSF10C
Carboxylic acid transport                 8      2.0E-2           XK, AQP9, CPT1B, SLC11A1, SLC16A1, SLC16A3, SLC19A1, SLC7A5
Regulation of cytokine production         8      5.3E-2           BCL3, BCL6, NLRP12, BPI, IRAK3, SLC11A1, TLR6, TLR8
Membrane fraction                         24     2.9E-2           ABCC4, RASGRP4, ACSL6, NCEH1, CEACAM4, CPT1B, CSPG5, CSPG5, CYP4F3, DGAT2, DYNLL1, GYPA, HSPA13, ITSN1, KREMEN1, LNPEP, MME, SLC16A1, SLC16A3, SLC19A1, SLC2A1, STX3, YES1

Table 5: Gene ontology analysis table. Selected gene ontology terms that were significantly enriched in our list of the differentially expressed genes.
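To make the meaning of the p-values in Table 5 more concrete, the following Python sketch computes an over-representation p-value from the hypergeometric distribution, together with a more conservative variant in which one hit is removed before testing. The second function is written under the assumption that it approximates the EASE score that DAVID describes as a modified Fisher exact test; it is not DAVID's own code. The list size, annotated-gene count, and background size are hypothetical placeholders, and only the count of 8 hits mirrors a row of Table 5.

```python
from math import comb


def hypergeom_upper_tail(hits, list_size, annotated, background):
    """P(X >= hits) when list_size genes are drawn without replacement from a
    background in which `annotated` genes carry the ontology term."""
    max_hits = min(list_size, annotated)
    total = comb(background, list_size)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def ease_like_score(hits, list_size, annotated, background):
    """Conservative variant: one hit is removed before the test
    (assumption: this mirrors the idea behind DAVID's EASE score)."""
    return hypergeom_upper_tail(max(hits - 1, 0), list_size, annotated, background)


# Hypothetical inputs: 8 DEGs carry the term, out of a list of 336 DEGs;
# 80 of 20000 background genes carry the same term.
p_standard = hypergeom_upper_tail(8, 336, 80, 20000)
p_conservative = ease_like_score(8, 336, 80, 20000)
print(f"hypergeometric p = {p_standard:.2e}")
print(f"conservative p   = {p_conservative:.2e}")
```

The conservative variant always gives a p-value at least as large as the plain hypergeometric test, which penalizes terms supported by only a few genes.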

The most significantly up-regulated gene in D was IFI27 (Figure 11), an IFN-α-inducible gene that encodes an intracellular product used as a marker of epithelial-mesenchymal transition in many forms of cancer [34, 35]. Its up-regulation in β-thalassemia may be indicative of myelofibrosis, an inflammation and scarring of the bone marrow that is also seen in Philadelphia-negative chronic myeloproliferative neoplasms, where there is an exceedingly high level of expression of IFI27 in whole blood [36]. The expression of IFI27 was up-regulated more than 1100-fold in D. Myelofibrosis has only recently been reported in a single case of β-thalassemia [37], and it may be an overlooked element of the pathology of the disease. Exhaustion of the bone marrow epithelial microenvironment and subsequent fibroblast overgrowth would be predicted to interfere with the robust rate of erythropoiesis needed to replace cells lost to hemolysis, and therapeutic approaches that minimize or prevent myelofibrotic scarring might be helpful. Osteoporosis is a significant cause of morbidity in β-thalassemia, and the down-regulation of alkaline phosphatase (ALPL) expression by more than 30-fold in D provides an important clue as to how it may develop.

IGFBP5 (insulin-like growth factor-binding protein 5) is also one of the substantially upregulated genes in D (82-fold compared to N), and it has been observed to be upregulated in the fibrotic disorders systemic sclerosis (scleroderma) and idiopathic pulmonary fibrosis. The product of the IGFBP5 gene is secreted and may play a causal role in epithelial cell senescence and fibroblast overgrowth [38]. The IGFBP5 gene is expressed at low levels in cell lines of hematopoietic lineages, and the dramatic increase in its RNA levels in the blood compartment of D may reflect a derangement of the bone marrow or an increased population of fibrocytes associated with inflammation elsewhere in the body [39].

The microenvironment of the bone marrow may be profoundly affected by β-thalassemia, and some of the changes may be mediated by hyaluronic acid (HA), a major component of the extracellular matrix in bone marrow. For example, the receptor EGFR (epidermal growth factor receptor) is 35-fold upregulated in D and is involved in fibroblast-to-myofibroblast differentiation, which is mediated by HA and occurs in response to TGF-β1 signaling [40]. The extracellular matrix proteoglycans BCAN (brevican, 66-fold upregulated in D) and NCAN (neurocan core protein, 47-fold upregulated in D) bind to HA and have roles in fibrosis, inflammation, and wound recovery [41]. Upregulation of RAP1GAP (Rap1 GTPase-activating protein) by 140-fold in D may help reduce inflammation mediated by Rap1 and NFκB [42] by returning Rap1 to an inactive (GDP-bound) state.

Osteoporosis is one of the leading causes of morbidity in patients with β-thalassemia, and the ALPL (alkaline phosphatase) gene, which is required for proper osteoblast function, was 33-fold down-regulated in D compared to N. This reduction in expression may be a regulatory consequence of iron overload [43]. DYRK3 (dual specificity tyrosine-phosphorylation-regulated kinase 3, 140-fold up-regulated in D) may also be important in bone homeostasis, as it acts as a negative regulator of osteoclastogenesis in mice [44]. SLC11A1 (solute carrier family 11A1, encoding a divalent metal ion transporter) was 8-fold down-regulated in D compared to N, which may also be a response to iron overload from hemolysis [45] (Figure 11).

We observed differential expression of genes involved in the regulation of circadian rhythms, for example the circadian sleep-wake cycle process. Sleep disturbances, and particularly sleep-disordered breathing, have been frequently observed in hemoglobinopathies such as sickle cell disease and β-thalassemia, and also in both iron overload and iron deficiency anemia [46]. In particular, PROK2, which encodes the prokineticin 2 protein and is 24-fold down-regulated in D compared to N, is expressed in the suprachiasmatic nucleus. It may act as an output component that transmits behavioral circadian rhythm and regulates other circadian rhythms such as sleep-wake cycles, feeding, and endocrine rhythms, and it has been suggested to signal circadian day [47]. In addition, EGFR (epidermal growth factor receptor), which is 35-fold upregulated in D, is indirectly involved in the inhibition of locomotor activity and the disruption of circadian sleep-wake cycles through the alteration of TGF-α [48].
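The fold changes quoted in this section and the rounded log2 values shown in Figure 11 are two views of the same measurements, related by the base-2 logarithm with a negative sign for down-regulation. A minimal Python sketch of that conversion follows. The fold changes are copied from the text, the rounded values from the D vs. N column of Figure 11, and the helper name is only illustrative; rounding to the nearest integer is an assumption about how the figure values were produced.

```python
from math import log2

# gene: (fold change quoted in the text, direction, rounded log2 in Figure 11)
reported = {
    "IFI27":   (1100, "up",   10),  # text states "more than 1100-fold"
    "RAP1GAP": (140,  "up",    7),
    "DYRK3":   (140,  "up",    7),
    "IGFBP5":  (82,   "up",    6),
    "BCAN":    (66,   "up",    6),
    "NCAN":    (47,   "up",    6),
    "EGFR":    (35,   "up",    5),
    "ALPL":    (33,   "down", -5),
    "PROK2":   (24,   "down", -5),
    "SLC11A1": (8,    "down", -3),
}


def signed_log2(fold, direction):
    """log2 of a fold change, negative when the gene is down-regulated."""
    value = log2(fold)
    return value if direction == "up" else -value


for gene, (fold, direction, heatmap_value) in reported.items():
    value = signed_log2(fold, direction)
    print(f"{gene:8s} log2 = {value:6.2f}, rounded {round(value):3d}, "
          f"Figure 11 shows {heatmap_value}")
```

Under the nearest-integer assumption, each quoted fold change is consistent with the corresponding cell of the heat-map.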

Figure 11) Heat-map generated for the significantly differentially expressed genes found by pairwise comparisons of the normal (N), the mother (M), and the daughter (D) using cuffdiff. The heat-map is divided into five parts, and in each part the first and second columns represent the data for the D vs. N and M vs. N pairwise comparisons, respectively. A red color indicates up-regulation, and a blue color indicates down-regulation. The numbers in the cells show the log base 2 of the fold change (rounded). The heat-map was generated using selected genes with differential expression greater than ~6x (log2 > 2.5) or less than ~1/6 (log2 < -2.5). Genes with a log2 (fold change) equal to inf or -inf have been omitted.
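The selection rule described in the caption can be sketched as a small filter. The Python below is an illustration only: the gene names and values are hypothetical, and it assumes that a gene with no detectable expression in one of the two samples is reported with a log2 fold change of inf or -inf, as the caption implies.

```python
from math import isinf

THRESHOLD = 2.5  # |log2 fold change| cut-off from the caption (2**2.5 is about 5.7)


def select_for_heatmap(records, threshold=THRESHOLD):
    """Keep genes whose log2 fold change is finite and exceeds the cut-off
    in absolute value."""
    return {
        gene: value
        for gene, value in records.items()
        if not isinf(value) and abs(value) > threshold
    }


# Hypothetical log2 fold changes for one pairwise comparison.
example = {
    "GENE_A": 10.1,           # strongly up-regulated: kept
    "GENE_B": -4.6,           # strongly down-regulated: kept
    "GENE_C": 1.2,            # below the cut-off: dropped
    "GENE_D": float("inf"),   # undefined ratio: dropped
    "GENE_E": float("-inf"),  # undefined ratio: dropped
}

selected = select_for_heatmap(example)
print(selected)
```

Dropping the infinite values avoids cells that cannot be placed on a finite color scale, at the cost of hiding genes that are expressed in only one of the two samples.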

Figure 11) Heat-map columns in each of the five parts: D vs. N, M vs. N.
5 1 C3orf70 4 -2 PNP 3 1 IL27 -3 -1 TMEM132B 5 1 COL4A1 4 -2 PLEKHA6 3 0 TMPRSS9 -3 -2 SCG2 5 1 TMOD1 4 0 SNCA 3 1 AMN -3 -1 RGS16 5 -1 DLG2 4 1 CD276 3 1 QPCT -3 -1 TFRC 5 2 LAMA4 4 -2 TSPAN7 3 0 NRG1 -3 -2 GALNT5 5 2 SLC4A1 4 1 AQP1 3 0 MME -3 0 PCDH17 5 -1 HBM 4 -1 BIRC5 3 2 FAM70B -3 -4 PTPLA 5 0 TROAP 4 3 CADM2 3 2 BCL6 -3 -1 HSPA1B 5 0 COL4A2 4 0 CCRL2 3 0 CYP2E1 -3 -1 PBK 5 3 COL9A1 4 0 CCNB2 3 1 NECAB2 -3 -2 MELK 5 4 TRIM10 4 2 GAS2L3 3 2 TMEM92 -3 -1 NTRK2 5 0 CAV1 4 3 KAT2B 3 1 ANPEP -3 -2 RFESD 5 1 DNAJB4 4 1 KRT1 3 0 MMP9 -3 -2 LPHN3 5 0 GLDC 4 2 HSPA4L 3 1 GREM2 -3 1 ASPM 5 4 SLC1A3 4 1 DLX2 3 2 CCDC103 -3 -1 ACHE 5 -1 MAP2 4 1 PTGR1 3 1 AOC2 -3 -1 SLC1A2 5 -2 FAT3 4 -1 KCNJ10 3 1 F12 -3 -1 C8orf4 5 0 ATP1B2 4 -1 GPM6B 3 3 MFSD6L -3 -1 POU3F2 5 1 TNS1 4 1 APLP1 3 -2 KIAA1324 -3 -1 IFIT1B 5 1 SLC25A18 4 1 HSPB1 3 -1 C19orf57 -3 -1 NLGN4X 5 3 TFPI 4 1 PEG10 3 2 ZNF467 -3 -1 HIST1H3E 5 -1 RRM2 4 2 DSCC1 3 3 FPR2 -3 -1 SCG3 5 1 NET1 4 0 CNR1 3 3 SLC22A1 -3 -1 GDF15 5 -1 ARHGEF37 4 2 ZFAND4 3 1 ZNF396 -3 0 NPR3 5 3 ELL2 4 1 ATG14 3 1 GPR77 -3 -1 FAM46C 5 1 LCA5 4 0 BNIP3L 3 0 SIGLEC8 -3 1 FZD5 5 1 HRH1 4 1 CKAP2L 3 2 CASP5 -3 -1 ARHGAP42 5 3 BCAR1 4 1 UBE2O 3 0 KCNH7 -3 0 SLC2A1 5 1 DEPDC1 4 1 GLRX5 3 0 TIGD3 -3 -2 TMEM132A 5 2 TNFRSF19 4 2 KCND3 3 4 GLT1D1 -3 -1 KIF26A 5 1 PTPRF 4 1 FAM5C 3 0 ADM -3 -2 ELOVL6 5 2 FGFR3 4 0 TTYH1 3 -1 PABPN1L -3 -2 IBA57 5 1 SEZ6 4 0 CEACAM6 3 1 PLIN4 -3 -1 XK 5 1 SLC16A2 4 2 TRIM67 3 1 TPSAB1 -3 -5

Scatter plots (log-log) were used to compare the relative expression of individual genes in samples D, M, and N over a wide range of expression (Figure 12A, 12B, and 12C). Equivalence between FPKM values in each pairwise comparison is shown as a diagonal line, and the offset of the data from this line is likely to indicate a more robust process of hematopoiesis in D and M compared to N. The comparison also indicates substantially greater numbers of genes that are up- or down-regulated in D compared to M.
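As an illustration of how the offset of a point cloud from the diagonal could be summarized numerically, the following minimal sketch computes the median log10 ratio between two samples. The function name and the FPKM values are hypothetical and are not taken from the thesis data; genes with zero expression in either sample are skipped because they cannot be placed on a log-log plot.

```python
import math
from statistics import median


def diagonal_offset(fpkm_x, fpkm_y):
    """Median log10(y / x) over genes expressed in both samples.

    A value of 0 means the points sit on the y = x diagonal; a positive
    value means expression is generally higher in the y-axis sample.
    """
    log_ratios = [
        math.log10(y / x)
        for x, y in zip(fpkm_x, fpkm_y)
        if x > 0 and y > 0  # zero values have no position on a log axis
    ]
    return median(log_ratios)


# Hypothetical FPKM values for five genes (illustration only).
n_fpkm = [0.5, 2.0, 10.0, 40.0, 0.0]
d_fpkm = [1.0, 4.0, 20.0, 80.0, 3.0]

offset = diagonal_offset(n_fpkm, d_fpkm)
print(f"median log10(D/N) = {offset:.3f}")
```

In this toy example every retained gene is two-fold higher in the y-axis sample, so the offset equals log10(2).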

Figure 12A)

[Log-log scatter plot. x-axis: Expression in N (FPKM); y-axis: Expression in D (FPKM); both axes span 1.E-09 to 1.E+00.]

Figure 12. Diagonal (log-log) plots of expression levels of individual genes (FPKM units). (A) D vs. N comparison, (B) M vs. N comparison, and (C) D vs. M comparison.

Figure 12B)

[Log-log scatter plot. x-axis: Expression in N (FPKM); y-axis: Expression in M (FPKM); both axes span 1.E-09 to 1.E+00.]

Figure 12C)

[Log-log scatter plot. x-axis: Expression in M (FPKM); y-axis: Expression in D (FPKM); both axes span 1.E-09 to 1.E+00.]

β-Thalassemia shows important similarities and differences with sickle cell disease at the transcriptome level

The list of differentially expressed genes (DEG) in β-thalassemia was compared to previously published RNA-seq results for sickle cell disease (SCD) [49]. In SCD, there were 331 DEG in whole blood compared to a normal sample, and approximately one third of them (110 genes) were in common with the 343 DEG in β-thalassemia, as shown in Figure 13 and Table 6. Many genes involved in hematopoiesis, erythrocyte homeostasis, and iron binding were differentially expressed in both β-thalassemia and SCD, which might be expected since both disorders are hemolytic anemias. Improving the ability of erythrocytes to deliver oxygen to tissues would also be a predictable response in both β-thalassemia and SCD, and up-regulation of BPGM (bisphosphoglycerate mutase) is indeed seen in both cases. One of the genes up-regulated in both β-thalassemia and SCD was KLF1 (Kruppel-like factor 1), a master regulator of gene expression. Twenty-five DEG common to β-thalassemia and SCD were also found in the published list of DEG in circulating erythroblasts from a homozygous KLF1-null neonate with hydrops fetalis [50] (Figure 13).

The genes differentially expressed in β-thalassemia but not in SCD included AHSP (alpha hemoglobin stabilizing protein) and the transcription factor GATA-1, which induces AHSP expression [51, 52]. AHSP prevents aggregation of α-globin during erythroid cell development and is predicted to moderate pathological states of α-globin excess such as β-thalassemia. SCD is not associated with α/β chain imbalance, so it is understandable that up-regulated expression of AHSP is unnecessary in that disorder. A surprising finding is that most genes differentially expressed in β-thalassemia but not in SCD are involved in inflammation and immunity (gene names highlighted in blue in supplemental Table 2). Among these were genes involved in the response to bacterial pathogens, such as ADM, BCL3, BP1, DEFA3, IRAK3, JUN, LTF, LY96, SLC11A1, THBD, and TLR6.

Table 6)

β-thalassemia & sickle-cell anemia: ABCB6, ACSL6, ALPL, ANK1, ANKRD9, AQP9, ARHGEF12, ARL4A, BCAM, BIRC2, BMP2K, BPGM, C14orf45, C5orf4, CA1, CA2, CCRL2, CISD2, CLCN3, CLEC4E, CLIC2, CTNNAL1, CYP4F3, DAPK2, DCUN1D1, DNAJB4, DOCK5, DYRK3, E2F2, ELL2, ELOVL6, EMR3, ERMAP, FAM83A, FECH, FHDC1, GCLC, GLT1D1, GPR97, GYPA, HAL, HBA1, HBA2, HBB, HBE1, HBG1, HBM, HEPACAM2, HMBS, HRH2, HSPA13, IFI27, IFI44L, IL1R2, ISCA1, ITSN1, KANK2, KCNJ15, KCNJ2, KEL, KIAA1324, KLF1, KRT1, KRT23, MANSC1, MAP4K5, MARCH3, MARCH8, MGAM, MME, MMP9, MOSPD1, NFIX, NSUN3, OSBP2, PLVAP, PPME1, REPS2, RFESD, RNF14, RNF182, RSAD2, RUNDC3A, SFRP2, SLC14A1, SLC16A1, SLC2A1, SLC45A4, SLC7A5, SOX6, SPTA1, STEAP4, TAL1, TBC1D22B, TBCEL, TCEANC, TCP11L2, TFDP1, TMEM14B, TMEM158, TNFRSF10C, TRAK2, TREM1, TRIM10, UBXN10, USP12, XK, XPO7, YPEL4, ZNF23

β-thalassemia & KLF1-null anemia: ABCB6, ABCC4, AIDA, ANK1, ANKRD9, ARL4A, ART4, B3GNT8, BMP2K, C3orf58, C5, CLCN3, CLIC2, DCUN1D1, DYRK3, ERMAP, FBXO30, FHDC1, GYPA, HIST1H2BD

Sickle-cell anemia & KLF1-null anemia: ABCB6, ABCG2, ANK1, ANKRD9, ARL4A, BMP2K, C10orf10, C17orf99, CLCN3, CLIC2, DCUN1D1, DYRK3, ERMAP, FHDC1, FOXO3, GDF15, GYPA, GYPE, HBD, IGF2, IGF2BP2, KLC3, KRT1, LNX2, MOSPD1, NSUN3, RAP1GAP, RHD, RUNDC3A, SERPINI1, SLC14A1, SLC16A1, SLC6A19, SLC6A9, SLC7A5, TAL1, TBCEL, TCEANC, TCP11L2, TFR2, TSPAN7, USP12, XK, YIPF6

β-thalassemia & sickle-cell anemia & KLF1-null anemia: ABCB6, ANK1, ANKRD9, ARL4A, BMP2K, CLCN3, CLIC2, DCUN1D1, DYRK3, ERMAP, FHDC1, GYPA, KRT1, MOSPD1, NSUN3, RUNDC3A, SLC14A1, SLC16A1, SLC7A5, TAL1, TBCEL, TCEANC, TCP11L2, USP12, XK

Table 6. Comparison of the lists of DEGs in three anemias: β-thalassemia, sickle-cell anemia, and KLF1-null anemia.

Figure 13)

Figure 13. Venn diagram comparing the DEGs in β-thalassemia with those found in sickle-cell anemia and KLF1-null anemia. The comparison shows that β-thalassemia has more DEGs in common with sickle-cell anemia than with KLF1-null neonatal anemia. Twenty-five genes were differentially expressed in all three conditions.
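The overlaps summarized in a three-way Venn diagram can be derived with ordinary set operations. The sketch below is illustrative only: the function name is hypothetical, and the three gene sets are small subsets chosen for demonstration rather than the full DEG lists, which in practice would be loaded from the respective result files.

```python
def venn_regions(a, b, c):
    """Return the gene sets in each of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b_only": (a & b) - c,
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "all_three": a & b & c,
    }


# Small illustrative subsets (assumption: not the complete DEG lists).
thalassemia = {"ANK1", "GYPA", "BPGM", "KLF1", "ABCC4", "AHSP"}
sickle_cell = {"ANK1", "GYPA", "BPGM", "KLF1", "FOXO3", "TFR2"}
klf1_null = {"ANK1", "GYPA", "ABCC4", "FOXO3", "TFR2"}

regions = venn_regions(thalassemia, sickle_cell, klf1_null)
for name, genes in regions.items():
    print(f"{name}: {len(genes)} {sorted(genes)}")
```

Each gene lands in exactly one region, so the region sizes sum to the size of the union of the three lists, which is a convenient sanity check when the full lists are used.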

CHAPTER IV: DISCUSSION

We described a novel mutation in codon 16 of exon 1 of the human β-globin gene (HBB) that introduces a cryptic 5’ splice (donor) site in the region and is associated with down-regulation of β-globin expression, as confirmed by qPCR and RNA-seq analysis. The novel mutation contributes to a severe β-thalassemia intermedia phenotype in a compound heterozygous daughter (D), but her mother (M), who carries the novel mutation in combination with a wild-type allele, is phenotypically asymptomatic and has borderline normal hematological parameters.

The extent of β-globin down-regulation resulting from the novel mutation was approximately the same when measured with different platforms or tools (qPCR, UCSC tracks, IGV, and Cuffdiff). The ratio of β-globin expression in M to N was around 0.77, 0.7, 0.69, and 0.6 for qPCR, UCSC tracks (the normalized coverage signal of the reads mapped to HBB), IGV, and Cuffdiff, respectively. The moderate down-regulation of HBB in M is consistent with the functional effect of previously known cryptic splice site mutations in the HBB gene, which are recessive and categorized as either β+ or β++ thalassemia alleles [19-23, 33]. The expression of HBB in D is more complicated, since the paternal allele bears a frameshift mutation and the maternal allele introduces a cryptic splice site leading to a truncated protein product. While the steady-state HBB transcript levels in D are approximately 93% of N, with most arising from the maternal allele, only a smaller subset of transcripts will encode a full-length β-globin protein.

This is the first study to investigate the effect of software filtering of β-globin reads after RNA nucleotide sequence determination. A few previous studies have confirmed that the presence of globin transcripts in total RNA isolated from blood does not significantly affect expression measurements or introduce bias [49, 53]. This supports the accuracy of our RNA-seq data from samples that included the highly abundant globin transcripts. Regression analysis showed that the two approaches give similar results, with R = 0.997.
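A correlation coefficient of this kind can be computed as in the following minimal sketch, which assumes that Pearson's R is calculated over paired per-gene expression values obtained with and without filtering of globin reads. The input values are hypothetical, and whether the original analysis used raw or log-transformed FPKM is not stated here, so the example makes no claim to reproduce the reported value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene FPKM values (illustration only).
unfiltered = [1.2, 5.0, 13.0, 48.0, 120.0]   # globin reads retained
filtered = [1.3, 5.4, 13.9, 50.5, 127.0]     # globin reads removed in software

r = pearson_r(unfiltered, filtered)
print(f"R = {r:.4f}")
```

A value of R close to 1 indicates that removing the globin reads rescales the remaining expression estimates almost uniformly rather than changing their relative order.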

Over 300 genes were identified as differentially expressed (DEG) in the samples from D and M in comparison with a normal sample (N). The 343 genes identified as differentially expressed in β-thalassemia were interpreted through gene ontology enrichment analysis using the DAVID tools. This revealed that the genes most affected in β-thalassemia are involved in the hematopoietic response to oxidative stress, inflammation, immune response, protein modification, and apoptosis, as shown in Table 5. Examination of these pathways suggests that the differentially expressed genes are consistent with the homeostatic response to known pathobiological stresses in β-thalassemia, including oxidative and hemolytic stress, and with a contribution to repair.

Although some genes belong to more than one category, almost all the genes in the erythrocyte homeostasis, hemopoiesis, iron ion binding, hemoglobin chaperone, blood group antigen, and response to oxidative stress categories are up-regulated. On the other hand, most of the genes belonging to the defense response, response to bacterium, chemotaxis, and immunoglobulin-like fold categories are down-regulated. About half of the genes belonging to the apoptosis category are up-regulated, and the rest are down-regulated.
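A per-category tally of up- and down-regulated genes, like the one described above, could be produced as in the sketch below. It assumes that the sign of the log2 fold change in D versus N defines the direction of regulation. The fold-change values are rounded illustrative numbers, and the assignment of genes to the two categories is a hypothetical example rather than the actual DAVID output.

```python
from collections import Counter


def direction(log2_fc):
    """Classify a log2 fold change as up, down, or unchanged by its sign."""
    if log2_fc > 0:
        return "up"
    if log2_fc < 0:
        return "down"
    return "unchanged"


def direction_counts(log2_fc_by_gene, genes_by_category):
    """Count up/down/unchanged genes within each functional category."""
    summary = {}
    for category, genes in genes_by_category.items():
        counts = Counter(
            direction(log2_fc_by_gene[g]) for g in genes if g in log2_fc_by_gene
        )
        summary[category] = dict(counts)
    return summary


# Illustrative rounded log2 fold changes (D vs. N); treat as assumptions.
log2_fc = {"GYPA": 8, "AHSP": 5, "ALAS2": 4, "CXCR1": -4, "LILRA3": -4, "SLC11A1": -3}

# Hypothetical category membership, for demonstration only.
categories = {
    "erythrocyte homeostasis": ["GYPA", "AHSP", "ALAS2"],
    "defense response": ["CXCR1", "LILRA3", "SLC11A1"],
}

summary = direction_counts(log2_fc, categories)
print(summary)
```

Because a gene may belong to several categories, it is counted once in each category it appears in, which matches the way the categories are discussed in the text.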

Multiple categories of dysregulated genes are involved in inflammation and in both innate and adaptive immune responses. We observed down-regulation of the genes LILRA2, LILRA3, and LILRB3, which belong to a family of innate immune receptors, the leukocyte immunoglobulin-like receptors (LILRs). Generally, after recognition of microbial patterns, innate immune receptors trigger a rapid innate response and activate the maturation of antigen-presenting cells to initiate adaptive immune responses. The LILRs act as immunomodulators; they exert powerful inhibitory effects on antigen-presenting cells and subsequent T-cell responses. Because individuals with β-thalassemia have high rates of morbidity and mortality due to infection and cancer, altered expression of these immune receptors is not unexpected [54, 55].

Most of the genes involved in innate immune suppression were down-regulated. Previous studies also showed that the activity of natural killer (NK) cells, which are an important part of both the innate and adaptive immune systems, is decreased in β-thalassemia patients [56, 57]. This decrease in activity can result from treatments such as splenectomy, or may be due to iron overload from other causes [58]. Generally, NK cells are involved in the first line of defense against infected cells and pathogenic microorganisms, and in the destruction of tumor cells. It is hoped that investigation of the cellular mechanisms and transcriptional regulation underlying the decreased activity of NK cells will lead to treatments that decrease the incidence of infectious and cancerous diseases in β-thalassemia major patients. Although many physiological mechanisms and microenvironmental factors affecting NK activity are still unknown, our findings suggest that increasing NK activity may reduce the risk of infection and cancer and may help to improve the quality of life for β-thalassemia patients.

Other differentially expressed genes that affect both the innate and adaptive immune systems belong to the iron ion binding or iron homeostasis categories; an example is the iron transporter-encoding gene SLC11A1, whose expression was significantly decreased in the daughter (log2 fold change = -3.05). Iron is vital for almost all living organisms and takes part in many important biological processes. However, high concentrations of free iron can be cytotoxic because free iron catalyzes the formation of toxic oxygen radicals that can cause protein degradation, lipid peroxidation, nucleic acid damage, and finally cell death. Hence, both iron deficiency and iron overload can have many negative effects on a variety of cell, tissue, and organ functions. It has also been observed that changes in systemic iron homeostasis affect immune responses. For example, iron overload conditions such as β-thalassemia and sickle cell anemia increase the risk of infections [59, 60]. On the other hand, iron deficiency decreases the inflammatory response to infection and may be associated with autoimmune and other diseases. For this reason, an optimal balance of iron is necessary for health. Iron enters the circulation from two major sources: macrophages that recycle iron from red blood cells, and duodenal epithelial cells that absorb iron from the diet. Generally, iron has both direct and indirect effects on immunity. The “direct effect” of iron refers to its function in regulating the growth and virulence of microbial pathogens. One aspect of innate anti-microbial defense is based on starving pathogens of this nutrient, that is, on withholding metals to restrict the growth of invading pathogens. This is called “nutritional immunity,” and it is controlled by metal/iron transporters.

Hence, our results highlight the importance of studying the molecules that mediate the movement of iron into and out of cells and how they are influenced by iron status. Such study of iron regulation at the molecular level has provided important insights into how the processes of iron absorption, recycling, and distribution are modulated in response to changing requirements, and it gives a conceptual framework for considering the bidirectional interactions between iron and immunity.

Two of the important iron transporters are Nramp1 and Nramp2 (natural resistance-associated macrophage proteins 1 and 2), which are divalent metal transporters encoded by the SLC11A1 and SLC11A2 genes. They work together to transport iron released from phagocytosed red blood cells from the phagosomal lumen into the cytosol [61, 62, 63]. Nramp2 acts correspondingly at the apical surface of the duodenal enterocytes to absorb iron from the intestinal lumen. Thus, these proteins are closely associated with iron homeostasis and can combat pathogens by restricting the availability of essential metals such as iron [64, 65]. Nramp1 is expressed in cells involved in the immediate innate immune response against invading microbes; it has also been found in lymphocytes, particularly in a subgroup of cells responsible for interferon-γ production [66]. Nramp1 expression is up-regulated by cytokines, and it helps produce nitric oxide along with other pro-inflammatory responses [67, 68]. Reduced expression of Nramp1 lowers the inflammatory response to infection and also inhibits iron recycling by macrophages [63]. Ferroportin (FPN) is another iron transporter, which effluxes iron into the plasma from both macrophages and duodenal enterocytes [69, 70]. Iron also indirectly controls immunity through regulation of iron-dependent transcription factors such as NF-κB (nuclear factor kappa-light-chain-enhancer of activated B cells) and HIF-1α (hypoxia-inducible factor 1-alpha). NF-κB is required for the expression of a number of genes involved in innate immunity and inflammation [71], and HIF-1α is involved in innate immune responses, particularly in the expression of inflammatory cytokines and anti-microbial peptides by macrophages [72].

Since many metal transporters and transcription factors are simultaneously involved in nutritional immunity and immunostasis, there should be a balance between their roles in resistance to infection and their immune response. Therefore, both iron deficiency and iron excess conditions can influence the performance of our innate and adaptive immune systems as iron deficiency affects the promotion of the appropriate inflammatory response and may be associated with auto-immune and other disorders.

Another group of differentially expressed genes included genes involved in the biosynthetic pathway of heme: ALAS2, HMBS, CPOX, and FECH. The ALAS2 gene encodes an erythroid-specific mitochondrial enzyme that catalyzes the rate-limiting step in heme biosynthesis; there is also an ALAS1 gene that is expressed in all tissues. These enzymes generate aminolevulinic acid (ALA), which is a precursor used in heme synthesis [73]. ALAS2 is up-regulated in the daughter, but ALAS1 is fairly steady in all samples. ALAS2 has three isoforms, and interestingly the relative expression ratios of the three isoforms were constant among the samples. Expression of the ALAD gene, which encodes the ALA dehydratase enzyme for the second step of heme synthesis, is two-fold higher in the daughter. In the next step of heme biosynthesis, the hydroxymethylbilane synthase enzyme encoded by the HMBS gene condenses four porphobilinogen molecules [73]. HMBS is also significantly up-regulated in the daughter, but not in the mother. UROS and UROD encode the fourth and fifth enzymes of this pathway. UROS encodes the enzyme for conversion to uroporphyrinogen III, and that gene is not up-regulated. UROD, which encodes uroporphyrinogen decarboxylase, the enzyme that catalyzes the conversion of uroporphyrinogen to coproporphyrinogen, is slightly up-regulated in the daughter. The CPOX and PPOX genes, which encode two mitochondrial enzymes involved in the sixth and seventh steps of heme biosynthesis, respectively [74], are also up-regulated in the daughter. During these steps, oxygen-dependent coproporphyrinogen-III oxidase generates protoporphyrinogen IX, and protoporphyrinogen IX oxidase generates protoporphyrin IX. In the last step of heme biosynthesis, protoporphyrin IX is converted to heme by the ferrochelatase enzyme encoded by the FECH gene [74], which is significantly up-regulated in the daughter.
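The heme pathway observations above can be collected in a small ordered structure, which makes it easy to see at a glance which steps are affected. This is only a sketch: the class and field names are hypothetical, the steps are listed in the standard eight-step order of the pathway, and the status column is a coarse reading of the wording used in the text ("up" covers slight, two-fold, and significant increases in the daughter).

```python
from typing import NamedTuple


class HemeStep(NamedTuple):
    gene: str
    enzyme: str
    status_in_daughter: str  # "up" or "not up", as described in the text


# Ordered from the first to the last enzymatic step of heme biosynthesis.
HEME_PATHWAY = [
    HemeStep("ALAS2", "5-aminolevulinate synthase 2 (erythroid-specific)", "up"),
    HemeStep("ALAD", "ALA dehydratase", "up"),            # two-fold higher
    HemeStep("HMBS", "hydroxymethylbilane synthase", "up"),
    HemeStep("UROS", "uroporphyrinogen III synthase", "not up"),
    HemeStep("UROD", "uroporphyrinogen decarboxylase", "up"),  # slightly
    HemeStep("CPOX", "coproporphyrinogen-III oxidase", "up"),
    HemeStep("PPOX", "protoporphyrinogen IX oxidase", "up"),
    HemeStep("FECH", "ferrochelatase", "up"),
]


def steps_with_status(pathway, status):
    """Return (step number, gene) pairs whose status matches."""
    return [
        (number, step.gene)
        for number, step in enumerate(pathway, start=1)
        if step.status_in_daughter == status
    ]


up_steps = steps_with_status(HEME_PATHWAY, "up")
not_up_steps = steps_with_status(HEME_PATHWAY, "not up")
print("up:", up_steps)
print("not up:", not_up_steps)
```

Listing the steps this way highlights that, by the description above, seven of the eight enzymatic steps show increased expression in the daughter.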

The BPGM gene, which is in the lists of differentially expressed genes in both our study and the sickle cell study, is significantly up-regulated in the daughter sample (the expression values for Normal, Mother, and Daughter were around 13, 25, and 595 FPKM, respectively, and the log2 fold change of the daughter versus normal was 5.45). The BPGM gene encodes the bisphosphoglycerate mutase (BPGM) enzyme, which is specifically expressed in placental cells and erythrocytes. Alternative names for the enzyme are 2,3-diphosphoglycerate mutase (DPGM), 2,3-bisphosphoglycerate phosphatase, and 2,3-BPG phosphatase. The BPGM enzyme has an important role in oxygen transport; it catalyzes the synthesis of 2,3-bisphosphoglycerate (2,3-BPG) from 1,3-bisphosphoglycerate. 2,3-BPG binds to hemoglobin with high affinity and induces a conformational change in hemoglobin that results in the release of oxygen to the surrounding tissues [75, 76]. Increased expression of BPGM leads to higher production of 2,3-BPG, which ultimately decreases the oxygen affinity of hemoglobin. Hence, the observed up-regulation of BPGM may enhance the efficiency of oxygen transport to the body and compensate for the lower concentration of hemoglobin in our β-thalassemia sample, and more generally in other hemoglobinopathies such as sickle cell disease. This also suggests that increasing the expression of BPGM, and consequently 2,3-BPG levels, might be used as a new therapeutic approach for β-thalassemia that may compensate for the lower percentage of HbA, perhaps even more efficiently than increasing the amount of fetal hemoglobin (HbF).
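The relationship between FPKM values and a log2 fold change can be illustrated with a short calculation. The sketch below uses the rounded BPGM values quoted in the text (around 13, 25, and 595 FPKM); because these are approximations, the result is close to, but not identical with, the reported value of 5.45, which would have been computed from unrounded estimates. The function name is hypothetical.

```python
import math


def log2_fold_change(case_fpkm, control_fpkm):
    """log2 of the expression ratio case / control."""
    if case_fpkm <= 0 or control_fpkm <= 0:
        raise ValueError("FPKM values must be positive to take a log ratio")
    return math.log2(case_fpkm / control_fpkm)


# Rounded BPGM expression values from the text (approximate).
bpgm_fpkm = {"N": 13.0, "M": 25.0, "D": 595.0}

d_vs_n = log2_fold_change(bpgm_fpkm["D"], bpgm_fpkm["N"])
m_vs_n = log2_fold_change(bpgm_fpkm["M"], bpgm_fpkm["N"])

print(f"D vs N: log2 fold change = {d_vs_n:.2f}")
print(f"M vs N: log2 fold change = {m_vs_n:.2f}")
```

On this scale each unit corresponds to a doubling, so a log2 fold change above 5 means more than a 32-fold increase in expression, while the mother shows roughly a two-fold increase.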

BPGM deficiency, a very rare autosomal recessive disorder, has previously been described in only two affected families. In both families, patients had significantly diminished 2,3-DPG levels but were clinically asymptomatic. They did not present with hemolytic anemia but displayed a moderate compensatory erythrocytosis that possibly resulted from the increased oxygen affinity of hemoglobin [77, 78]. It appears that the body makes more red blood cells in order to compensate for insufficient oxygen transport in the hemoglobinopathies.

We observed differential expression of genes involved in the regulation of circadian rhythms, for example the circadian sleep-wake cycle. The prokineticin 2 (PROK2), prostaglandin D2 synthase (PTGDS or PGD2), and nuclear factor interleukin 3 regulated (NFIL3) genes are significantly down-regulated in the daughter. On the other hand, epidermal growth factor receptor (EGFR), another regulator of circadian sleep-wake cycles, is significantly up-regulated in the daughter.

PTGDS encodes an enzyme called prostaglandin-H2 D-isomerase, which catalyzes the conversion of prostaglandin H2 (PGH2) to prostaglandin D2 (PGD2). PGD2, which is known as the most potent endogenous sleep-promoting substance, acts as an output molecule from the suprachiasmatic nucleus to transmit behavioral circadian rhythm. PGD2 is secreted as a sleep hormone and stimulates DP1 receptors, leading to the release of adenosine as a paracrine sleep-promoting molecule. Adenosine activates adenosine A2A receptor-expressing sleep-promoting neurons and inhibits adenosine A1 receptor-possessing arousal neurons. PGD2 acts as a neuromodulator in a variety of central nervous system functions, particularly the circadian regulation of non-rapid eye movement (NREM) sleep. Hence, PTGDS and its product PGD2 are essential for the maintenance of physiological sleep [79, 80].

PROK2 encodes the prokineticin 2 protein, which is expressed in the suprachiasmatic nucleus and may act as an output component to transmit behavioral circadian rhythm. Recent studies showed that, in addition to its role in generating circadian locomotor activity, PROK2 regulates other circadian rhythms controlled by the suprachiasmatic nucleus, such as sleep-wake cycles, feeding, and endocrine rhythms, and it has been suggested to signal the circadian day [81].

Nuclear factor interleukin 3 regulated (NFIL3) acts as a transcriptional regulator that suppresses the transcriptional activity of PER1 and PER2, two key circadian genes, and is thereby involved in the control of circadian rhythm. NFIL3 is a component of the circadian clock that works as a negative regulator of the oscillation of PER2 expression in the cell-autonomous core clock [82].

The epidermal growth factor receptor encoded by the EGFR gene is involved in the daily regulation of locomotor activity. Epidermal growth factor receptors on neurons in the hypothalamus mediate the inhibition of locomotion by transforming growth factor-alpha (TGF-α), which is rhythmically expressed in the suprachiasmatic nucleus. Through alteration of TGF-α signaling, the EGFR gene is thus indirectly involved in inhibited locomotor activity and disrupted circadian sleep-wake cycles [83].

Sleep disturbances, and particularly sleep-disordered breathing, have been frequently observed in hemoglobinopathies such as sickle cell disease and β-thalassemia, and also in both iron overload and iron deficiency anemia. Multiple previous studies showed a high incidence of obstructive sleep apnea (OSA) in patients with sickle cell disease. Objective measures of sleep patterns in sickle cell patients showed that, in addition to sleep-disordered breathing, other factors contribute to the sleep disruption observed in sickle cell disease, such as diminished rapid eye movement (REM) sleep time, increased arousal times, periodic limb movement syndrome (PLMS), and insomnia [84, 85, 86].

Some previous studies also characterized sleep patterns and assessed sleep disruption in β-thalassemia patients. OSA was reported in a study of children with severe β-thalassemia, in which polysomnography was performed in those who habitually snored to investigate the occurrence of OSA and its associated factors. Among the 120 children with β-thalassemia in that study, 16% had habitual snoring and 8.3% had OSA. The occurrence of OSA was associated with adenotonsillar lymphoid hyperplasia and a high serum ferritin level [87]. More importantly, another study found objective evidence of sleep disruption with an increased arousal index in β-thalassemia patients, which arose partially from excessive periodic limb movements during sleep. Overnight polysomnography showed that the number of arousals and awakenings in those patients was increased 2-fold compared to healthy controls, which may increase their daytime sleepiness. However, there was no evidence of obstructive sleep apnea, nor of a genetic link between β-thalassemia and the sleep disruption observed in that group of patients [88].

We think there might be a link between the dysregulated circadian rhythm and sleep genes found in our study and the unexplained sleep dysfunction and disturbances in hemoglobinopathies. Interestingly, sleep-disordered breathing and hemoglobinopathies share many clinical manifestations and pathophysiological pathways, such as repeated cycles of hypoxia-reoxygenation, increased generation of reactive oxygen species, reduced nitric oxide bioavailability and its associated endothelial dysfunction, increased inflammatory cascade activation, and impaired autonomic nervous system balance. There is also substantial evidence for an association between chronic pain in hemoglobinopathies (especially sickle cell disease) and poor sleep quality.

All these findings may suggest that therapeutic interventions that help improve sleep quality may lead to a decrease in pain episodes and in the severity of hemoglobinopathies.

If the dysregulated genes found in this study are part of the missing genetic basis of sleep disturbance in hemoglobinopathies, they might be used as therapeutic targets for improving the quality of life of patients with sickle cell disease and thalassemias.

It was possible to compare these findings with published RNA-seq analyses of sickle cell disease (SCD) and of erythroblasts from a KLF1-null neonate with hydrops fetalis, and to recognize similarities and differences in their transcriptional expression patterns. While many DEG involved in the response to hemolysis, iron homeostasis, and anemia were common to these disorders, over 200 DEG were unique to β-thalassemia. These point to fundamental differences between the anemias, and can be used in drug discovery to identify novel therapeutic approaches.

Although this study provides information on only two patients, a carrier mother and her severely affected daughter, it provides a wealth of data on how the disease is manifested in that family. This study is also the first such broad investigation of blood cell gene expression in β-thalassemia using high-throughput RNA-seq, and it allows novel insights into the regulatory changes and disease processes taking place in the affected individuals. Moreover, the results improve our understanding of the heterogeneity and molecular mechanisms of β-thalassemia, which can be used in drug discovery to identify novel therapeutic approaches for the disease.

REFERENCES

1. Mueller R.F, Young I.D (2001) Emery's Elements of Medical Genetics. Eleventh edition.

2. HBB hemoglobin, beta [Homo sapiens (human)] (2013) NCBI Gene. Retrieved from http://www.ncbi.nlm.nih.gov/gene/3043

3. Modell, B., & Darlison, M. (2008). Global epidemiology of haemoglobin disorders and derived service indicators. Bulletin of the World Health Organization, 86(6), 480-487.

4. Thein, S. L. (2005). Pathophysiology of β thalassemia—A guide to molecular therapies. ASH Education Program Book, 2005(1), 31-37.

5. Ribeil, J. A., Arlet, J. B., Dussiot, M., Cruz Moura, I., Courtois, G., & Hermine, O. (2013). Ineffective erythropoiesis in β-thalassemia. The Scientific World Journal, 2013, Article ID 394295, 11 pages. http://dx.doi.org/10.1155/2013/394295

6. Cao A, Galanello R, Origa R. Beta-Thalassemia. 2000 Sep 28 [Updated 2013 Jan 24]. In: Pagon RA, Bird TD, Dolan CR, et al., editors. GeneReviews™ [Internet]. Seattle (WA): University of Washington, Seattle; 1993-. Available from: http://www.ncbi.nlm.nih.gov/books/NBK1426/

7. Cao A, Galanello R (2010) Beta-thalassemia. Genetics in Medicine 12(2): 61-76. Doi: 10.1097/GIM.0b013e3181cd68ed.

8. Thein, S. L. (2005). Pathophysiology of β thalassemia—A guide to molecular therapies. ASH Education Program Book, 2005(1), 31-37.

9. Ho PJ, Hall GW, Luo LY, Weatherall DJ, Thein SL. Beta-thalassaemia intermedia: is it possible to consistently predict phenotype from genotype? Br J Haematol. 1998;100:70–78.

10. Camaschella C, Maza U, Roetto A, et al. Genetic interactions in thalassemia intermedia: analysis of beta-mutations, alpha-genotype, gamma-promoters, and beta-LCR hypersensitive sites 2 and 4 in Italian patients. Am J Hematol. 1995;48:82–87.

11. Khandros, E., & Weiss, M. J. (2010). Protein quality control during erythropoiesis and hemoglobin synthesis. Hematology/oncology clinics of North America, 24(6), 1071-1088.

12. Arlet, J. B., Ribeil, J. A., Guillem, F., Negre, O., Hazoume, A., Marcion, G., ... & de Beauchêne, I. C. (2014). HSP70 sequestration by free α-globin promotes ineffective erythropoiesis in β-thalassaemia. Nature, 514(7521), 242-246.

13. Wessling-Resnick, M. (2010). Iron homeostasis and the inflammatory response. Annual review of nutrition, 30, 105.

14. Cappellini MD, Cohen A, Eleftheriou A, et al. Guidelines for the Clinical Management of Thalassaemia [Internet]. 2nd Revised edition. Nicosia (CY): Thalassaemia International Federation; 2008. Chapter 9, Infections in Thalassaemia Major. Available from: http://www.ncbi.nlm.nih.gov/books/NBK173965/

15. Giardine B, Borg J, Higgs D. R, Peterson K. R, Philipsen S, et al. (2011) Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nature genetics 43(4), 295-301.

16. Thein SL. (2013) The molecular basis of beta-thalassemia. Cold Spring Harb Perspect Med 3(5):a011700.

17. Orkin S. H, Antonarakis S. E and Loukopoulos D. (1984). Abnormal processing of beta Knossos RNA. Blood 64(1), 311-313.

18. Treisman R, Orkin S. H and Maniatis T (1983). Specific transcription and RNA splicing defects in five cloned β-thalassaemia genes. Nature, 302, 14.

19. Goldsmith M. E, Humphries R. K, Ley T, Cline A, Kantor J. A, et al. (1983) "Silent" nucleotide substitution in a beta+-thalassemia globin gene activates splice site in coding sequence RNA. Proc Natl Acad Sci 80(8), 2318-2322.

20. Orkin, S. H., Kazazian, H. H., Antonarakis, S. E., Ostrer, H., Goff, S. C., & Sexton, J. P. (1982). Abnormal RNA processing due to the exon mutation of βE-globin gene. Nature 300:768-769.

21. Orkin, S. H., Antonarakis, S. E., & Loukopoulos, D. (1984). Abnormal processing of beta Knossos RNA. Blood, 64(1), 311-313.

22. Yang K. G, Kutlar F, George E, Wilson J. B, Kutlar A, et al. (1989) Molecular characterization of β‐globin gene mutations in Malay patients with Hb E ‐ β‐thalassaemia and thalassaemia major. British journal of haematology 72(1), 73-80.

23. Pawar A.R, Colah R.B and Mohanty D. (1997) A novel beta+-thalassemia mutation (codon 10 GCC --> GCA) and a rare transcriptional mutation (-28A --> G) in Indians. Blood 89: 3888

24. Wang, M., & Marín, A. (2006). Characterization and prediction of alternative splice sites. Gene, 366(2), 219-227.

25. Pfaffl, M. W. (2001). A new mathematical model for relative quantification in real-time RT-PCR. Nucleic acids research, 29(9), e45-e45.

26. Pfaffl, M. W., Horgan, G. W., & Dempfle, L. (2002). Relative expression software tool (REST©) for group-wise comparison and statistical analysis of relative expression results in real-time PCR. Nucleic acids research, 30(9).

27. Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., ... & Gingeras, T. R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15-21.

28. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., ... & Durbin, R. (2009). The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), 2078-2079.

29. Robinson, J. T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E. S., Getz, G., & Mesirov, J. P. (2011). Integrative genomics viewer. Nature biotechnology, 29(1), 24-26.

30. Quinlan, A. R., & Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), 841-842.

31. Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., ... & Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature protocols, 7(3), 562-578.

32. Dennis, G., Jr., Sherman, B. T., Hosack, D. A., Yang, J., & Gao, W. (2003). DAVID: database for annotation, visualization, and integrated discovery. Genome Biology, 4, P3. http://david.abcc.ncifcrf.gov/

33. Tubsuwan, A., Munkongdee, T., Jearawiriyapaisarn, N., Boonchoy, C., Winichagoon, P., Fucharoen, S., & Svasti, S. (2011). Molecular analysis of globin gene expression in different thalassaemia disorders: individual variation of βE pre-mRNA splicing determine disease severity. British journal of haematology, 154(5), 635-643.

34. Li, S., Xie, Y., Zhang, W., Gao, J., Wang, M., Zheng, G., ... & Tao, X. (2015). Interferon alpha-inducible protein 27 promotes epithelial–mesenchymal transition and induces ovarian tumorigenicity and stemness. Journal of Surgical Research, 193(1), 255-264.

35. Rasmussen UB, Wolf C, Mattei MG, Chenard MP, Bellocq JP, Chambon P, Rio MC, Basset P (1993). Identification of a new interferon-alpha-inducible gene (p27) on human chromosome 14q32 and its expression in breast carcinoma. Cancer Res, 53(17):4096–4101

36. Skov, V., Larsen, T. S., Thomassen, M., Riley, C. H., Jensen, M. K., Bjerrum, O. W., & Hasselbalch, H. C. (2011). Whole‐blood transcriptional profiling of interferon‐inducible genes identifies highly upregulated IFI27 in primary myelofibrosis. European journal of haematology, 87(1), 54-60.

37. Nirupam, N., Maheshwari, A., Rath, B., Chandra, J., Kumar, P., Basu, S., & Nangia, A. (2012). Myelofibrosis: a cause of increased transfusion requirement in a child with β-thalassemia intermedia. Journal of pediatric hematology/oncology, 34(2), 143-145.

38. Allan, G. J., Beattie, J., & Flint, D. J. (2008). Epithelial injury induces an innate repair mechanism linked to cellular senescence and fibrosis involving IGF-binding protein-5. Journal of Endocrinology, 199(2), 155-164.

39. Yu, J., Cao, J., Li, H., Liu, P., Xu, S., Zhou, R., ... & Guo, X. (2016). Bone marrow fibrosis with fibrocytic and immunoregulatory responses induced by β-catenin activation in osteoprogenitors. Bone, 84, 38-46.

40. Midgley, A. C., Rogers, M., Hallett, M. B., Clayton, A., Bowen, T., Phillips, A. O., & Steadman, R. (2013). Transforming growth factor-β1 (TGF-β1)-stimulated fibroblast to myofibroblast differentiation is mediated by hyaluronan (HA)-facilitated epidermal growth factor receptor (EGFR) and CD44 co-localization in lipid rafts. Journal of Biological Chemistry, 288(21), 14824-14838.

41. Gorden, A., Yang, R., Yerges-Armstrong, L. M., Ryan, K. A., Speliotes, E., Borecki, I. B., ... & Shuldiner, A. R. (2013). Genetic variation at NCAN locus is associated with inflammation and fibrosis in non-alcoholic fatty liver disease in morbid obesity. Human heredity, 75(1), 34-43.

42. Cai, Y., Sukhova, G. K., Wong, H. K., Xu, A., Tergaonkar, V., Vanhoutte, P. M., & Tang, E. H. C. (2015). Rap1 induces cytokine production in pro-inflammatory macrophages through NFκB signaling and is highly expressed in human atherosclerotic lesions. Cell Cycle, 14(22), 3580-3592.

43. Mukaiyama, K., Kamimura, M., Uchiyama, S., Ikegami, S., Nakamura, Y., & Kato, H. (2015). Elevation of serum alkaline phosphatase (ALP) level in postmenopausal women is caused by high bone turnover. Aging clinical and experimental research, 27(4), 413-418.

44. Lee, Y., Ha, J., Kim, H. J., Kim, Y.-S., Chang, E.-J., Song, W.-J., & Kim, H.-H. (2009). Negative Feedback Inhibition of NFATc1 by DYRK1A Regulates Bone Homeostasis. The Journal of Biological Chemistry, 284(48), 33343–33351.

45. Biggs, T. E., Baker, S. T., Botham, M. S., Dhital, A., Barton, C. H., & Perry, V. H. (2001). Nramp1 modulates iron homoeostasis in vivo and in vitro: evidence for a role in cellular iron release involving de‐acidification of intracellular vesicles. European journal of immunology, 31(7), 2060-2070.

46. Gileles-Hillel, A., Kheirandish-Gozal, L., & Gozal, D. (2015). Hemoglobinopathies and sleep–The road less traveled. Sleep medicine reviews, 24, 57-70.

47. Xi, B., He, D., Zhang, M., Xue, J., & Zhou, D. (2014). Short sleep duration predicts risk of metabolic syndrome: a systematic review and meta-analysis. Sleep medicine reviews, 18(4), 293-297.

48. Kramer, A., Yang, F. C., Snodgrass, P., Li, X., Scammell, T. E., Davis, F. C., & Weitz, C. J. (2001). Regulation of daily locomotor activity and sleep by hypothalamic EGF receptor signaling. Science, 294(5551), 2511-2515.

49. Raghavachari, N., Barb, J., Yang, Y., Liu, P., Woodhouse, K., Levy, D., ... & Kato, G. J. (2012). A systematic comparison and evaluation of high density exon arrays and RNA-seq technology used to unravel the peripheral blood transcriptome of sickle cell disease. BMC medical genomics, 5(1), 1.

50. Magor, G. W., Tallack, M. R., Gillinder, K. R., Bell, C. C., McCallum, N., Williams, B., & Perkins, A. C. (2015). KLF1 null neonates display hydrops fetalis and a deranged erythroid transcriptome. Blood, blood-2014.

51. Kihm, A. J., Kong, Y. I., Hong, W., Russell, J. E., Rouda, S., Adachi, K., ... & Weiss, M. J. (2002). An abundant erythroid protein that stabilizes free α-haemoglobin. Nature, 417(6890), 758-763.

52. Gallagher, P. G., Liem, R. I., Wong, E., Weiss, M. J., & Bodine, D. M. (2005). GATA-1 and Oct-1 are required for expression of the human α-hemoglobin-stabilizing protein gene. Journal of Biological Chemistry, 280(47), 39016-39023.

53. Shin, Heesun, et al. "Variation in RNA-Seq transcriptome profiles of peripheral whole blood from healthy individuals with and without globin depletion." PloS one 9.3 (2014): e91041.

54. Rahav G, Volach V, Shapiro M, Rund D, Rachmilewitz EA, Goldfarb A. Severe infections in thalassaemic patients: prevalence and predisposing factors. Br J Haematol. 2006; 133: 667-674.

55. Zurlo MG, De Stefano P, Borgna-Pignatti C, Di Palma A, Piga A, Melevendi C, et al. Survival and causes of death in thalassaemia major. Lancet, 1989; 2: 27-30.

56. Akbar, A. N., et al. "Decreased natural killer activity in thalassemia major: a possible consequence of iron overload." The Journal of Immunology 136.5 (1986): 1635-1640.

57. Atasever, Belkis, et al. "In vitro effects of vitamin c and selenium on nk activity of patients with β-thalassemia major." Pediatric hematology and oncology 23.3 (2006): 187-197.

58. Walker, Ernest M., and Sandra M. Walker. "Effects of iron overload on the immune system." Annals of Clinical & Laboratory Science 30.4 (2000): 354-365.

59. Onwubalili, James K. "Sickle cell disease and infection." Journal of Infection 7.1 (1983): 2-20.

60. Wanachiwanawin, Wanchai. "Infections in E-β thalassemia." Journal of pediatric hematology/oncology 22.6 (2000): 581-587.

61. Biggs, Thelma E., et al. "Nramp1 modulates iron homoeostasis in vivo and in vitro: evidence for a role in cellular iron release involving de‐acidification of intracellular vesicles." European journal of immunology 31.7 (2001): 2060-2070.

62. Soe-Lin, Shan, et al. "Nramp1 equips macrophages for efficient iron recycling." Experimental hematology 36.8 (2008): 929-937.

63. Soe-Lin, Shan, et al. "Nramp1 promotes efficient macrophage recycling of iron following erythrophagocytosis in vivo." Proceedings of the National Academy of Sciences 106.14 (2009): 5960-5965.

64. Fleming, Mark D., et al. "Microcytic anaemia mice have a mutation in Nramp2, a candidate iron transporter gene." Nature genetics 16.4 (1997): 383-386.

65. Gunshin, Hiromi, et al. "Cloning and characterization of a mammalian proton-coupled metal-ion transporter." Nature 388.6641 (1997): 482-488.

66. Hedges, Jodi F., et al. "Solute carrier 11A1 is expressed by innate lymphocytes and augments their activation." The Journal of Immunology 190.8 (2013): 4263-4273.

67. Ibrahim, Muntaser, Hiba S. Mohamed, and Jenefer M. Blackwell. "SLC11A1 (formerly NRAMP1) and disease resistance. Cell Microbiol 3: 773-784." (2015).

68. Wyllie S., Seu P., Goss J. A. (2002) The natural resistance-associated macrophage protein 1 Slc11a1 (formerly Nramp1) and iron metabolism in macrophages. Microbes Infect. 4, 351–359.

69. De Domenico, Ivana, et al. "The molecular mechanism of hepcidin-mediated ferroportin down-regulation." Molecular biology of the cell 18.7 (2007): 2569-2578.

70. De Domenico, Ivana, et al. "Hepcidin-induced internalization of ferroportin requires binding and cooperative interaction with Jak2." Proceedings of the National Academy of Sciences 106.10 (2009): 3800-3805.

71. Vallabhapurapu, Sivakumar, and Michael Karin. "Regulation and function of NF-κB transcription factors in the immune system." Annual review of immunology 27 (2009): 693-733.

72. Nizet, Victor, and Randall S. Johnson. "Interdependence of hypoxic and innate immune responses." Nature Reviews Immunology 9.9 (2009): 609-617.

73. Dailey, H. A., & Meissner, P. N. (2013). Erythroid heme biosynthesis and its disorders. Cold Spring Harbor perspectives in medicine, 3(4), a011676.

74. Chiabrando, D., Mercurio, S., & Tolosano, E. (2014). Heme and erythropoiesis: more than a structural role. Haematologica, 99(6), 973-983.

75. Benesch, Reinhold, and Ruth E. Benesch. "The effect of organic phosphates from the human erythrocyte on the allosteric properties of hemoglobin." Biochemical and biophysical research communications 26.2 (1967): 162-167.

76. Benesch, Reinhold, Ruth E. Benesch, and Chi Ing Yu. "Reciprocal binding of oxygen and diphosphoglycerate by human hemoglobin." Proceedings of the National Academy of Sciences 59.2 (1968): 526-532.

77. Rosa R, Prehu MO, Beuzard Y, Rosa J. The first case of a complete deficiency of diphosphoglyceratemutase in human erythrocytes. J Clin Invest. 1978;62:907-15.

78. Hoyer, James D., et al. "Erythrocytosis due to bisphosphoglycerate mutase deficiency with concurrent glucose‐6‐phosphate dehydrogenase (G‐6‐PD) deficiency." American journal of hematology 75.4 (2004): 205-208.

79. Ueno, Ryuji, et al. "Prostaglandin D2 induces sleep when microinjected into the preoptic area of conscious rats." Biochemical and biophysical research communications 109.2 (1982): 576-582.

80. Urade, Yoshihiro, and Osamu Hayaishi. "Crucial role of prostaglandin D2 and adenosine in sleep regulation: experimental evidence from pharmacological approaches to gene-knockout mice." Future Neurology 5.3 (2010): 363-376.

81. Allebrandt, K. V. "Sleep duration and metabolic syndrome." Somnologie-Schlafforschung und Schlafmedizin 17.1 (2013): 15-20.

82. Yan, Jun, et al. "Analysis of gene regulatory networks in the mammalian circadian rhythm." PLoS Comput Biol 4.10 (2008): e1000193.

83. Kramer, Achim, et al. "Regulation of daily locomotor activity and sleep by hypothalamic EGF receptor signaling." Science 294.5551 (2001): 2511-2515.

84. Maddern, Bruce R., et al. "Obstructive sleep apnea syndrome in sickle cell disease." Annals of Otology, Rhinology & Laryngology 98.3 (1989): 174-178.

85. Samuels, Martin P., et al. "Sleep related upper airway obstruction and hypoxaemia in sickle cell disease." Archives of Disease in Childhood 67.7 (1992): 925-929.

86. Salles, Cristina, et al. "Prevalence of obstructive sleep apnea in children and adolescents with sickle cell anemia." Jornal Brasileiro de Pneumologia 35.11 (2009): 1075-1083.

87. Sritippayawan, Suchada, et al. "Obstructive sleep apnea among children with severe beta-thalassemia." Southeast Asian Journal of Tropical Medicine and Public Health 43.1 (2012): 152.

88. Tarasiuk, Ariel, et al. "Sleep disruption and objective sleepiness in children with β-thalassemia and congenital dyserythropoietic anemia." Archives of pediatrics & adolescent medicine 157.5 (2003): 463-468.

APPENDIX A. Supplemental Matrix 1
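
As a reading aid for the matrix that follows: the listed log2 fold-change values appear consistent with the base-2 logarithm of the ratio of the second sample's FPKM to the first (for example, N_vs._M = log2(M / N)), with inf or -inf reported where one of the two FPKM values is 0. The minimal Python sketch below checks this reading against the AATK row. The function name log2_fold_change and its zero-handling rules are illustrative assumptions inferred from the tabulated values, not a description of the analysis software used in this study.

```python
import math


def log2_fold_change(fpkm_a: float, fpkm_b: float) -> float:
    """Return log2(fpkm_b / fpkm_a), handling zero FPKM values explicitly.

    The zero-handling below is an assumption inferred from rows of the
    matrix that contain 0, inf, or -inf entries.
    """
    if fpkm_a == 0 and fpkm_b == 0:
        return 0.0        # not expressed in either sample
    if fpkm_a == 0:
        return math.inf   # expressed only in the second sample
    if fpkm_b == 0:
        return -math.inf  # expressed only in the first sample
    return math.log2(fpkm_b / fpkm_a)


# FPKM values for AATK, copied from the matrix (columns N, M, D).
n, m, d = 26.4413, 13.2951, 3.38882

n_vs_m = log2_fold_change(n, m)
n_vs_d = log2_fold_change(n, d)
m_vs_d = log2_fold_change(m, d)

# Print the three fold changes for comparison with the AATK row.
print(f"N_vs._M={n_vs_m:.4f}  N_vs._D={n_vs_d:.4f}  M_vs._D={m_vs_d:.4f}")
```

Under this reading, a negative value means lower expression in the second-named sample, and a value of 1 corresponds to a doubling of FPKM.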

FPKM values (columns N, M, D) and Log2 (Fold change) (columns N_vs._M, N_vs._D, M_vs._D)

Gene_ID N M D Gene_ID N_vs._M N_vs._D M_vs._D
AATK 26.4413 13.2951 3.38882 AATK -0.991901 -2.96394 -1.97204
ABCB6 2.11679 1.51102 14.3561 ABCB6 -0.486355 2.76172 3.24807
ABCC4 0.87849 3.06414 9.18128 ABCC4 1.80238 3.3856 1.58321
ACSL6 1.17133 3.59942 7.11653 ACSL6 1.61962 2.60303 0.983412
ADM 43.8308 13.7099 4.08603 ADM -1.67672 -3.42317 -1.74645
AGPAT9 13.8026 11.2665 3.53578 AGPAT9 -0.292894 -1.96484 -1.67195
AHSP 67.875 48.8033 1590.51 AHSP -0.475903 4.55047 5.02637
AIDA 3.61268 8.09758 20.3831 AIDA 1.16442 2.49623 1.33181
ALDH5A1 2.1422 4.42309 23.2897 ALDH5A1 1.04596 3.44253 2.39656
ALPL 222.395 71.1909 6.69572 ALPL -1.64336 -5.05374 -3.41038
ANK1 4.05039 11.1594 54.2048 ANK1 1.46213 3.74229 2.28016
ANKRD9 14.4532 9.02294 165.347 ANKRD9 -0.679725 3.51603 4.19576
ANPEP 131.146 35.0884 14.3077 ANPEP -1.90211 -3.19631 -1.2942
AQP1 0.845232 0.668026 8.38077 AQP1 -0.339443 3.30966 3.64911
AQP9 182.492 97.8188 14.5839 AQP9 -0.899651 -3.64539 -2.74574
ARAP3 22.912 10.3628 2.66017 ARAP3 -1.14469 -3.10651 -1.96182
ARHGEF12 2.76818 9.09786 44.7714 ARHGEF12 1.71659 4.01557 2.29898
ARHGEF40 16.4808 7.81128 2.93927 ARHGEF40 -1.07715 -2.48725 -1.4101
ARL4A 1.7343 3.37995 111.375 ARL4A 0.962645 6.00492 5.04228
ART4 0 0.061901 16.7704 ART4 inf inf 8.08174
ASXL2 1.91894 9.64829 3.9385 ASXL2 2.32996 1.03734 -1.29263
ATG14 4.03688 7.32594 38.3008 ATG14 0.859772 3.24606 2.38629
ATG4A 2.82651 3.2358 14.6694 ATG4A 0.195103 2.37571 2.18061
B3GNT8 31.3579 9.55564 4.83464 B3GNT8 -1.7144 -2.69735 -0.982945
BBS12 0.452836 0.755908 9.14501 BBS12 0.739223 4.33593 3.5967
BCAM 0.722571 1.49065 16.4419 BCAM 1.04473 4.50809 3.46336
BCL3 90.131 31.8061 17.7017 BCL3 -1.50272 -2.34814 -0.845419
BCL6 107.589 75.0211 12.0005 BCL6 -0.520167 -3.16436 -2.6442
BEND2 1.43295 4.12063 0.764033 BEND2 1.52387 -0.907284 -2.43116
BIRC2 11.7652 19.5193 121.307 BIRC2 0.730371 3.36606 2.63569
BMP2K 5.25401 12.1987 35.7091 BMP2K 1.21524 2.7648 1.54956
BPGM 13.5803 25.1243 595.259 BPGM 0.887569 5.45393 4.56636
BPI 3.10297 6.36533 17.5398 BPI 1.03659 2.49891 1.46232
BTNL8 16.0306 5.29569 1.04741 BTNL8 -1.59793 -3.93593 -2.33799
C14orf45 1.9317 2.94713 15.3446 C14orf45 0.609436 2.98979 2.38035
C17orf103 17.0214 27.4311 270.592 C17orf103 0.688467 3.9907 3.30224
C17orf39 3.60829 4.42886 20.3643 C17orf39 0.295619 2.49666 2.20104
C19orf77 6.34586 3.35599 34.8454 C19orf77 -0.919078 2.45708 3.37616
C1orf61 0 0.13938 3.07743 C1QTNF1 -inf 2.7244 inf
C3orf58 3.49572 7.15262 18.1633 C3orf58 1.03288 2.37736 1.34448
C5 0.298607 0.612145 5.79242 C5 1.03562 4.27785 3.24222
C5orf4 11.4916 28.1344 103.21 C5orf4 1.29176 3.16693 1.87518
C7orf58 0.460261 1.40615 3.38052 C7orf58 1.61122 2.87672 1.2655
CA1 15.5718 36.6861 999.635 CA1 1.2363 6.0044 4.7681
CA2 49.0565 23.3644 156.807 CA2 -1.07013 1.67648 2.74661
CCNJL 19.6067 7.02684 2.4383 CCNJL -1.4804 -3.0074 -1.527
CCR3 22.1042 5.36902 1.91925 CCR3 -2.04159 -3.52571 -1.48412
CCRL2 1.39988 1.58991 13.7315 CCRL2 0.183649 3.29411 3.11047
CDA 118.824 48.3069 22.8686 CDA -1.29852 -2.37738 -1.07886
CDC42EP2 31.6161 12.855 5.81371 CDC42EP2 -1.29833 -2.44313 -1.14479
CEACAM3 63.0396 26.0492 8.84679 CEACAM3 -1.27502 -2.83303 -1.55801
CEACAM4 29.3912 17.1911 4.02376 CEACAM4 -0.773724 -2.86877 -2.09504
CEP19 14.7397 7.95328 3.08544 CEP19 -0.890086 -2.25616 -1.36608
CISD2 1.50708 2.04418 13.2735 CISD2 0.439772 3.13873 2.69895
CITED2 25.6087 33.4108 177.514 CITED2 0.383678 2.79323 2.40955
CKLF 99.2198 61.164 22.796 CKLF -0.697945 -2.12185 -1.4239
CLCN3 3.269 9.38627 27.3809 CLCN3 1.5217 3.06625 1.54454
CLEC4E 28.0732 13.8766 5.58058 CLEC4E -1.01654 -2.33071 -1.31417
CLIC2 0.88246 1.66741 36.2174 CLIC2 0.91801 5.35901 4.441
CMTM2 126.445 31.5359 10.5509 CMTM2 -2.00345 -3.58307 -1.57963
COL18A1 37.132 9.58413 5.35758 COL18A1 -1.95394 -2.79301 -0.839067
CPOX 5.00958 8.839 36.0394 CPOX 0.819192 2.84681 2.02762
CPT1B 19.8748 12.9044 4.86565 CPT1B -0.623076 -2.03024 -1.40716

CREB5 14.3631 18.2653 2.38259 CREB5 0.346736 -2.59176 -2.9385
CRYAB 0 0.179285 3.86134 CRYAB inf inf 4.42878
CSF2RA 38.4201 20.8958 6.58225 CSF2RA -0.878646 -2.54521 -1.66656
CSPG5 0.0811097 0 3.74833 CSPG5 -inf 5.53023 inf
CTNNAL1 1.55673 2.15062 11.6446 CTNNAL1 0.466236 2.90307 2.43683
CTSL1 3.0186 5.61309 30.8217 CTSL1 0.894916 3.352 2.45708
CXCL5 4.00349 29.106 2.21991 CXCL5 2.86199 -0.85076 -3.71275
CYP1B1 4.64648 11.8985 1.91175 CYP1B1 1.35657 -1.28125 -2.63781
CYP4F3 32.2132 21.4803 4.01647 CYP4F3 -0.58464 -3.00365 -2.41901
DAPK2 14.6045 7.41662 1.93041 DAPK2 -0.977576 -2.91943 -1.94185
DCUN1D1 4.37488 9.71474 21.9005 DCUN1D1 1.15093 2.32365 1.17272
DEFA3 61.0162 4.31706 169.605 DEFA3 -3.82107 1.47492 5.29599
DENND4A 1.79347 7.34133 11.9719 DENND4A 2.03329 2.73882 0.705535
DGAT2 69.5469 23.8336 4.63139 DGAT2 -1.54499 -3.90847 -2.36348
DLL3 0.286706 0 5.80702 DLL3 -inf 4.34016 inf
DMXL1 1.24327 5.60158 2.71589 DMXL1 2.1717 1.12729 -1.04441
DNAJA4 6.99117 8.09371 54.5591 DNAJA4 0.211268 2.96422 2.75295
DNAJB4 1.04561 1.83355 15.2331 DNAJB4 0.810293 3.86479 3.0545
DOCK5 7.44829 15.8637 2.09135 DOCK5 1.09075 -1.83247 -2.92322
DPEP2 77.1006 30.2549 16.3983 DPEP2 -1.34957 -2.2332 -0.883623
DSC2 5.37516 3.98359 0.873698 DSC2 -0.432236 -2.6211 -2.18886
DYNLL1 109.134 81.1012 469.907 DYNLL1 -0.428299 2.10628 2.53458
DYRK3 0.206182 1.27226 29.4705 DYRK3 2.6254 7.15921 4.53381
E2F1 4.87058 2.99032 18.686 E2F1 -0.703794 1.93979 2.64359
E2F2 2.43778 4.86483 93.7307 E2F2 0.996818 5.26488 4.26806
EGFR 0.409194 0.12143 14.3535 EGFR -1.75267 5.13248 6.88514
EGR1 15.4299 6.67246 37.4116 EGR1 -1.20944 1.27775 2.48719
EIF1B 47.4876 49.4353 412.825 EIF1B 0.057992 3.11991 3.06192
ELL2 1.42817 3.05824 20.2725 ELL2 1.09854 3.82728 2.72875
ELOVL6 0.481803 2.0344 12.1893 ELOVL6 2.07809 4.66103 2.58295
EMR3 46.0807 27.0006 3.14611 EMR3 -0.771175 -3.87252 -3.10135
EPB42 27.7093 25.1903 157.265 EPB42 -0.137502 2.50476 2.64226
EPHB4 8.88265 2.32968 1.70494 EPHB4 -1.93086 -2.38126 -0.450405
ERAL1 7.59804 4.67812 28.3273 ERAL1 -0.699698 1.89849 2.59819
ERAP2 4.24828 23.6484 18.6566 ERAP2 2.47679 2.13474 -0.342055
ERMAP 3.0093 5.27117 100.622 ERMAP 0.808696 5.06337 4.25467
ESPN 10.0456 1.97007 11.1054 ESPN -2.35025 0.14469 2.49494
F5 3.74056 6.92383 1.24566 F5 0.888317 -1.58634 -2.47466
FAM100A 23.9252 14.5427 100.67 FAM100A -0.718231 2.07303 2.79126
FAM118A 5.21711 6.63543 22.4853 FAM118A 0.346937 2.10766 1.76072
FAM198B 5.90659 18.4736 3.68909 FAM198B 1.64507 -0.679062 -2.32413
FAM212B 24.5338 11.3605 3.64738 FAM212B -1.11074 -2.74984 -1.6391
FAM83A 0 0.0803655 6.29262 FAM83A inf inf 6.29094
FBXL4 1.77073 4.20368 8.15285 FBXL4 1.24731 2.20296 0.955654
FBXO30 2.16297 5.47457 33.1243 FBXO30 1.33974 3.93681 2.59707
FCGRT 257.052 82.5717 36.6376 FCGRT -1.63834 -2.81067 -1.17232
FCRL3 2.87544 13.5416 15.257 FCRL3 2.23554 2.40761 0.17207
FCRL5 1.31675 5.50872 8.23579 FCRL5 2.06473 2.64492 0.580191
FECH 3.18652 5.07556 58.9261 FECH 0.671587 4.20885 3.53727
FEM1B 2.55223 9.92988 146.394 FEM1B 1.96002 5.84196 3.88194
FHDC1 0.370974 1.65838 23.708 FHDC1 2.16038 5.99791 3.83753
FKBP4 9.55137 7.11189 53.6984 FKBP4 -0.425476 2.4911 2.91657
FOSB 3.6074 0.955529 23.5578 FOSB -1.91659 2.70718 4.62376
FPR2 93.7457 55.8024 9.65157 FPR2 -0.748426 -3.27992 -2.53149
FRAT1 64.5361 26.0467 12.4658 FRAT1 -1.309 -2.37214 -1.06313
FRAT2 229.167 88.3419 28.3341 FRAT2 -1.37523 -3.01579 -1.64056
FRMD4A 1.14652 2.42684 6.20618 FRMD4A 1.08181 2.43644 1.35463
G0S2 82.2634 2.41705 15.7729 G0S2 -5.08893 -2.3828 2.70614
GADD45A 8.9171 10.6499 77.581 GADD45A 0.256198 3.12106 2.86486
GATA1 14.3009 8.76411 55.754 GATA1 -0.70643 1.96297 2.6694
GCLC 3.82902 7.19837 30.1491 GCLC 0.910697 2.97707 2.06637
GK 8.68642 7.12201 1.67546 GK -0.286477 -2.37421 -2.08773
GLT1D1 64.425 25.3699 6.09496 GLT1D1 -1.3445 -3.40193 -2.05743
GMPR 47.8543 27.7777 247.403 GMPR -0.784722 2.37014 3.15486
GNG10 98.5775 46.9276 14.511 GNG10 -1.07082 -2.76411 -1.69329
GOLT1A 0.108348 0 4.54764 GOLT1A -inf 5.39137 inf
GPR97 136.758 28.564 7.2759 GPR97 -2.25935 -4.23235 -1.973
GPRIN3 0.501912 8.49614 3.09445 GPRIN3 4.0813 2.62418 -1.45712
GSTM1 4.98184 0 4.31551 GSTM1 -inf -0.207149 inf
GYPA 0.228119 0.520831 46.1934 GYPA 1.19103 7.66176 6.47073

HAL 15.8263 11.0388 2.65518 HAL -0.519745 -2.57544 -2.0557
HBA1 95499 65921 242472 HBA1 -0.534747 1.344261 1.879008
HBA2 702234 537204 1296709 HBA2 -0.386482 0.884831 1.271313
HBB 466965 279935 436329 HBB -0.738219 -0.0978962 0.640323
HBE1 0 0 5.12433 HBE1 0 inf inf
HBG1 345.586 63.125 6369.88 HBG1 -2.45276 4.20415 6.65691
HBM 397.224 216.241 5900.02 HBM -0.877314 3.89269 4.77001
HBQ1 67.4203 21.1996 279.313 HBQ1 -1.66914 2.05063 3.71977
HCAR2 27.1943 12.613 4.1406 HCAR2 -1.10839 -2.71539 -1.607
HCAR3 35.0224 18.3417 4.79893 HCAR3 -0.933147 -2.86749 -1.93435
HEPACAM2 0.125587 0.456522 14.3809 HEPACAM2 1.862 6.83932 4.97733
HIST1H1C 45.4856 19.2217 98.1801 HIST1H1C -1.24267 1.11002 2.3527
HIST1H2BD 14.9163 10.0091 67.7829 HIST1H2BD -0.575578 2.18403 2.75961
HLA-DQA2 7.67248 36.3228 3.87836 HLA-DQA2 2.24311 -0.984246 -3.22736
HLX 24.8641 9.08693 5.01823 HLX -1.4522 -2.30882 -0.856616
HMBS 11.8986 9.95542 144.945 HMBS -0.257238 3.60664 3.86388
HRH2 26.1351 13.9094 3.8621 HRH2 -0.909926 -2.75853 -1.8486
HSPA13 1.86613 5.97156 10.6445 HSPA13 1.67806 2.51199 0.833932
HSPA1A 73.6955 33.6141 398.462 HSPA1A -1.13251 2.43479 3.56731
HSPA1B 4.29027 5.01859 131.834 HSPA1B 0.226214 4.94151 4.7153
HSPB1 45.51 20.1286 435.278 HSPB1 -1.17694 3.25768 4.43462
IBA57 1.54884 2.40596 39.1259 IBA57 0.635425 4.65887 4.02344
IFI27 4.21681 5.91013 4839.51 IFI27 0.487036 10.1645 9.67746
IFI30 549.743 45.3565 249.945 IFI30 -3.59938 -1.13714 2.46223
IFI44L 4.014 13.7649 27.1957 IFI44L 1.77788 2.76026 0.982385
IFIT1B 4.6529 7.37342 132.359 IFIT1B 0.664202 4.83018 4.16597
IKZF3 3.60303 21.3987 11.1373 IKZF3 2.57024 1.62812 -0.942121
IL1R2 44.0312 10.3695 2.08467 IL1R2 -2.08618 -4.40064 -2.31446
IL8 42.7691 1.78193 16.3057 IL8 -4.58505 -1.39119 3.19386
IMPA2 57.6586 22.7426 6.70863 IMPA2 -1.34214 -3.10344 -1.7613
INPP4B 1.45715 8.27539 2.26014 INPP4B 2.50568 0.63326 -1.87242
IRAK3 6.34188 14.7117 2.93197 IRAK3 1.21398 -1.11304 -2.32702
ISCA1 12.3906 15.9162 65.7909 ISCA1 0.361246 2.40864 2.04739
ITSN1 1.4232 2.04525 11.6468 ITSN1 0.523132 3.03272 2.50959
JUN 16.2698 3.33266 13.8748 JUN -2.28746 -0.229733 2.05772
KANK2 1.01244 1.67171 13.6381 KANK2 0.723484 3.75173 3.02825
KAT2B 10.5937 22.8643 103.06 KAT2B 1.1099 3.28222 2.17232
KCNA3 3.05142 18.7605 7.17807 KCNA3 2.62015 1.23412 -1.38603
KCNJ15 40.7159 27.2489 3.72638 KCNJ15 -0.579395 -3.44975 -2.87035
KCNJ2 5.33635 9.61303 0.794426 KCNJ2 0.849138 -2.74787 -3.59701
KEL 1.47292 1.08816 8.39216 KEL -0.436783 2.51036 2.94714
KIAA1191 10.8321 16.6805 75.8831 KIAA1191 0.622841 2.80846 2.18562
KIAA1324 8.50481 4.48372 0.907617 KIAA1324 -0.923583 -3.22812 -2.30454
KIAA1586 1.10516 3.17495 11.5155 KIAA1586 1.52248 3.38125 1.85877
KLF1 10.8501 10.5775 58.4175 KLF1 -0.036704 2.42869 2.4654
KREMEN1 6.1635 7.06739 0.910699 KREMEN1 0.197427 -2.7587 -2.95613
KRT1 8.92501 7.45223 86.7318 KRT1 -0.26018 3.28064 3.54082
KRT23 39.8303 15.0678 0.824952 KRT23 -1.4024 -5.59341 -4.19101
LATS1 0.97442 6.48117 2.75582 LATS1 2.73364 1.49987 -1.23377
LGALS2 114.543 33.755 21.4536 LGALS2 -1.76271 -2.4166 -0.653881
LILRA2 132.754 51.9093 20.5565 LILRA2 -1.35469 -2.69109 -1.3364
LILRA3 27.4825 20.328 2.09253 LILRA3 -0.435041 -3.71519 -3.28015
LILRB3 91.8651 44.814 13.0297 LILRB3 -1.03557 -2.81771 -1.78215
LNPEP 1.26031 15.1863 4.75872 LNPEP 3.59092 1.9168 -1.67413
LPAR2 111.403 47.9124 16.7604 LPAR2 -1.21731 -2.73265 -1.51534
LPCAT2 8.77183 11.0355 2.66076 LPCAT2 0.331208 -1.72104 -2.05225
LPPR2 60.6589 31.9075 10.2232 LPPR2 -0.926825 -2.56888 -1.64205
LRFN1 23.5914 8.18421 4.79521 LRFN1 -1.52735 -2.2986 -0.771249
LRG1 64.8053 21.9756 7.7845 LRG1 -1.56021 -3.05744 -1.49723
LRRC25 114.715 64.6552 21.3282 LRRC25 -0.82721 -2.42722 -1.60001
LST1 468.317 176.339 65.291 LST1 -1.40913 -2.84253 -1.4334
LTF 16.404 13.0946 92.8449 LTF -0.325074 2.50077 2.82585
LY96 88.1074 36.9381 20.6764 LY96 -1.25415 -2.09128 -0.837123
MANSC1 17.3689 7.3315 1.43366 MANSC1 -1.24433 -3.59873 -2.3544
MAP1LC3B2 17.503 18.6366 75.5438 MAP1LC3B2 0.0905355 2.10971 2.01918
MAP4K5 2.96634 6.60507 12.6305 MAP4K5 1.15489 2.09016 0.935271
MARCH3 0.767117 1.21861 11.6406 MARCH3 0.66772 3.92358 3.25586
MARCH8 18.0853 61.82 317.79 MARCH8 1.77326 4.13518 2.36193
MBOAT7 307.357 119.048 36.2346 MBOAT7 -1.36837 -3.08448 -1.71611
MEFV 76.7548 42.3919 11.2978 MEFV -0.856469 -2.76421 -1.90774

MGAM 26.6557 21.4481 2.21783 MGAM -0.313596 -3.58723 -3.27363
MKI67 0.173475 0.829219 2.26158 MKI67 2.25703 3.70453 1.4475
MME 31.2313 28.1144 3.52869 MME -0.151683 -3.14579 -2.99411
MMP25 294.67 92.898 19.691 MMP25 -1.66538 -3.90349 -2.23811
MMP9 117.951 28.4396 12.8492 MMP9 -2.05222 -3.19843 -1.14622
MOSPD1 1.41634 2.52343 9.99446 MOSPD1 0.833223 2.81897 1.98574
MPO 2.79575 3.47026 18.0164 MPO 0.311809 2.68801 2.3762
MRVI1 10.4559 4.772 1.24487 MRVI1 -1.13165 -3.07025 -1.9386
MSI2 3.9641 13.7508 25.2495 MSI2 1.79445 2.67119 0.87674
MSMO1 2.68118 5.83954 14.1762 MSMO1 1.12298 2.40253 1.27955
N4BP2 0.680079 2.98165 1.52654 N4BP2 2.13234 1.1665 -0.965843
NCAPG2 0.774864 5.48275 1.87871 NCAPG2 2.82288 1.27773 -1.54516
NCEH1 1.30759 2.51166 6.95726 NCEH1 0.941729 2.41161 1.46988
NCF4 316.271 109.486 38.1925 NCF4 -1.53042 -3.0498 -1.51938
NET1 1.80022 1.85803 25.8069 NET1 0.0456012 3.84151 3.79591
NFATC2 1.97438 12.1801 7.73637 NFATC2 2.62506 1.97026 -0.6548
NFIL3 35.5784 14.5698 7.6569 NFIL3 -1.28802 -2.21617 -0.928153
NFIX 6.61181 7.88516 30.4034 NFIX 0.254096 2.20112 1.94702
NGRN 15.6041 17.1962 133.004 NGRN 0.14017 3.09148 2.95131
NHLRC2 0.425653 3.87144 1.63867 NHLRC2 3.18512 1.94478 -1.24034
NINJ1 197.938 72.0202 34.37 NINJ1 -1.45857 -2.52582 -1.06725
NLRP12 30.6833 16.6016 4.91737 NLRP12 -0.886134 -2.64149 -1.75536
NLRP6 15.8673 6.52144 2.39456 NLRP6 -1.28279 -2.72822 -1.44543
NRG1 3.6097 1.13827 0.4087 NRG1 -1.66504 -3.14276 -1.47773
NSUN3 3.4882 6.20784 136.572 NSUN3 0.83161 5.29104 4.45943
NTNG2 18.8169 10.2756 3.7915 NTNG2 -0.872811 -2.31119 -1.43837
NUDT4 3.66733 6.55855 20.7078 NUDT4 0.838644 2.49737 1.65873
OSBP2 7.79634 12.9127 80.207 OSBP2 0.72792 3.36286 2.63494
OSCAR 73.432 35.8907 14.7759 OSCAR -1.0328 -2.31316 -1.28037
PADI2 43.0415 23.2288 2.98906 PADI2 -0.889814 -3.84796 -2.95815
PADI4 52.4562 24.5014 11.3743 PADI4 -1.09825 -2.20534 -1.10709
PANK3 1.13478 8.73294 12.4788 PANK3 2.94406 3.459 0.514944
PANX2 23.2795 7.39404 1.18821 PANX2 -1.65463 -4.2922 -2.63757
PBX1 1.0149 1.68526 6.14852 PBX1 0.731631 2.5989 1.86727
PFKFB4 22.0785 12.6413 4.19256 PFKFB4 -0.804495 -2.39674 -1.59224
PHC2 144.378 56.105 22.46 PHC2 -1.36365 -2.68442 -1.32077
PHLPP2 0.724502 1.86874 4.43395 PHLPP2 1.36701 2.61353 1.24653
PILRA 170.043 69.1597 27.559 PILRA -1.2979 -2.62531 -1.32741
PLAUR 56.3179 24.681 12.614 PLAUR -1.19019 -2.15857 -0.968381
PLB1 4.53999 2.59499 0.674926 PLB1 -0.806959 -2.74989 -1.94293
PLD6 1.90811 3.2216 174.946 PLD6 0.755637 6.51863 5.76299
PLVAP 4.54324 4.20505 24.1222 PLVAP -0.111598 2.40857 2.52016
PLXDC2 9.33243 15.6415 3.47494 PLXDC2 0.745056 -1.42526 -2.17032
PNP 16.0073 23.6572 161.927 PNP 0.563547 3.33854 2.77499
PPCDC 17.2963 6.31304 3.39356 PPCDC -1.45406 -2.34959 -0.895535
PPME1 4.49557 4.87866 24.9721 PPME1 0.117982 2.47374 2.35576
PPP1R3B 11.7643 13.1542 2.64008 PPP1R3B 0.161105 -2.15576 -2.31687
PRAM1 74.5405 40.4499 16.6744 PRAM1 -0.88189 -2.16039 -1.2785
PROK2 127.917 64.4031 5.29507 PROK2 -0.990004 -4.59441 -3.60441
PRRG4 5.73655 3.25851 1.02833 PRRG4 -0.815972 -2.47988 -1.66391
PTGDS 77.0553 14.3488 18.3205 PTGDS -2.42497 -2.07243 0.352537
PTN 0.134092 0 4.08022 PTN -inf 4.92736 inf
PTP4A3 49.1436 22.5599 9.57825 PTP4A3 -1.12325 -2.35917 -1.23592
PYGL 66.4073 40.9149 10.9568 PYGL -0.698716 -2.59951 -1.90079
QPCT 64.6399 41.386 7.32447 QPCT -0.643282 -3.14163 -2.49835
RASGRP3 1.14132 2.49736 6.19781 RASGRP3 1.1297 2.44106 1.31135
RASGRP4 82.9204 33.8544 9.83254 RASGRP4 -1.29239 -3.07609 -1.78371
RBM47 23.0162 16.3262 4.05575 RBM47 -0.495459 -2.50461 -2.00915
RCN3 37.2751 10.4771 2.12698 RCN3 -1.83097 -4.13134 -2.30037
REPS2 3.87827 3.26872 0.780688 REPS2 -0.246689 -2.3126 -2.06591
REXO2 13.349 18.6454 207.572 REXO2 0.48209 3.95881 3.47672
RFESD 0.408794 0.755432 12.0779 RFESD 0.885928 4.88485 3.99892
RFX2 9.49771 2.24735 0.761058 RFX2 -2.07936 -3.6415 -1.56215
RNF14 3.48534 7.0016 36.7247 RNF14 1.00638 3.39738 2.39099
RNF182 0.185468 0.0931893 17.0283 RNF182 -0.992935 6.52062 7.51355
RNF24 25.7432 25.6583 5.22448 RNF24 -0.00476669 -2.30083 -2.29606
RSAD2 6.02715 14.4316 74.672 RSAD2 1.25968 3.63102 2.37134
RUNDC3A 14.4412 12.988 520.511 RUNDC3A -0.153008 5.17167 5.32467
S100P 235.823 120.14 32.7636 S100P -0.97299 -2.84754 -1.87455
SACS 0.66078 3.31017 1.60416 SACS 2.32466 1.27957 -1.04509

64

SCARF1 18.5223 5.6854 3.0611 SCARF1 -1.70393 -2.59714 -0.893212 SELK 51.2081 26.9498 110.124 SELK -0.926098 1.10469 2.03079 SEPX1 296.001 95.007 53.6219 SEPX1 -1.63949 -2.46471 -0.825213 SFRP2 0.823269 2.84672 33.3301 SFRP2 1.78987 5.33932 3.54945 SIGLEC14 28.2158 19.7174 4.17552 SIGLEC14 -0.517028 -2.75647 -2.23945 SIGLEC5 28.2227 15.3139 3.22525 SIGLEC5 -0.88202 -3.12937 -2.24736 SIRPB1 49.8791 39.6884 9.16125 SIRPB1 -0.329719 -2.44482 -2.1151 SIRPB2 21.7378 20.9502 3.85745 SIRPB2 -0.0532435 -2.49449 -2.44124 SLC11A1 118.067 43.2664 14.2245 SLC11A1 -1.44828 -3.05315 -1.60486 SLC14A1 3.58399 5.96318 24.1982 SLC14A1 0.734514 2.75526 2.02074 SLC16A1 1.24778 2.5241 30.2217 SLC16A1 1.0164 4.59815 3.58175 SLC16A3 236.612 91.9118 41.1082 SLC16A3 -1.3642 -2.52503 -1.16082 SLC19A1 52.0246 16.8645 7.07691 SLC19A1 -1.62521 -2.878 -1.25279 SLC2A1 9.96286 14.6316 258.622 SLC2A1 0.554452 4.69814 4.14369 SLC30A1 2.0892 6.62932 21.6833 SLC30A1 1.66591 3.37556 1.70965 SLC45A4 20.2378 9.87558 3.85285 SLC45A4 -1.03511 -2.39305 -1.35794 SLC4A7 1.65315 8.11347 4.04352 SLC4A7 2.2951 1.2904 -1.00471 SLC7A5 3.58884 6.98138 79.273 SLC7A5 0.959996 4.46524 3.50524 SLFN14 0.220638 2.39003 20.5307 SLFN14 3.43728 6.53996 3.10268 SLFN5 5.65326 31.7892 15.1717 SLFN5 2.49138 1.42423 -1.06715 SOSTDC1 0 0 2.52567 SOSTDC1 0 inf inf SOX6 0.245318 0.33277 4.15179 SOX6 0.439875 4.08101 3.64113 SPTA1 0.166804 0.767249 10.422 SPTA1 2.20154 5.96533 3.76379 ST3GAL4 22.2063 9.14079 5.00873 ST3GAL4 -1.28058 -2.14845 -0.867874 ST6GALNAC2 13.8266 8.05014 1.96202 ST6GALNAC2 -0.780357 -2.81703 -2.03667 STEAP4 27.0644 28.3025 4.5829 STEAP4 0.0645333 -2.56206 -2.6266 STX3 22.6839 17.4468 4.02367 STX3 -0.378704 -2.49508 -2.11638 SULF2 40.8697 16.0423 6.20909 SULF2 -1.34915 -2.71858 -1.36943 TAL1 5.15191 7.59942 90.6138 TAL1 0.560784 4.13655 3.57577 TBC1D22B 3.993 7.68492 50.509 TBC1D22B 0.944558 3.661 2.71644 TBCEL 1.36114 4.54961 80.9564 TBCEL 1.74093 5.89426 4.15333 TCEANC 1.26954 2.97737 22.5425 TCEANC 1.22973 
4.15027 2.92054 TCP11L2 10.423 21.2016 64.41 TCP11L2 1.02441 2.62752 1.60311 TECPR2 10.8878 7.15734 2.50619 TECPR2 -0.605223 -2.11915 -1.51393 TFDP1 10.8631 16.5444 89.602 TFDP1 0.606904 3.04409 2.43719 TFRC 2.77059 9.10442 88.1532 TFRC 1.71637 4.99175 3.27537 THBD 18.0292 6.71385 1.38208 THBD -1.42512 -3.70542 -2.2803 TLR6 9.33015 14.0894 2.33487 TLR6 0.594641 -1.99856 -2.5932 TLR8 20.2527 26.3535 5.58272 TLR8 0.379884 -1.85907 -2.23896 TMCC3 8.02535 6.95425 1.77009 TMCC3 -0.20667 -2.18074 -1.97407 TMEM14B 28.5143 27.6708 112.602 TMEM14B -0.0433208 1.98147 2.02479 TMEM158 12.7333 15.242 114.003 TMEM158 0.259448 3.16239 2.90295 TMEM57 3.56238 5.91709 41.7537 TMEM57 0.732047 3.55099 2.81894 TMOD1 6.65773 8.45161 100.423 TMOD1 0.344196 3.91491 3.57072 TNFRSF10C 301.631 131.28 24.9465 TNFRSF10C -1.20014 -3.59588 -2.39574 TRAK2 4.90892 15.171 105.779 TRAK2 1.62784 4.4295 2.80166 TREM1 111.988 37.3607 17.8441 TREM1 -1.58375 -2.64982 -1.06607 TRIM10 1.15471 3.31581 16.877 TRIM10 1.52183 3.86946 2.34763 TSEN34 40.7411 18.9911 8.88367 TSEN34 -1.10116 -2.19726 -1.0961 TUBB2A 0.852982 4.70737 77.9746 TUBB2A 2.46433 6.51435 4.05001 UBE2D1 24.6741 17.6549 6.35614 UBE2D1 -0.482929 -1.95677 -1.47384 UBXN10 0.332256 0.46497 21.9881 UBXN10 0.484843 6.04829 5.56344 USP12 5.3459 13.772 63.1098 USP12 1.36524 3.56136 2.19612 USP53 0.923495 4.8868 1.5971 USP53 2.40372 0.790275 -1.61344 VNN1 6.24667 3.80499 1.17767 VNN1 -0.715196 -2.40715 -1.69195 VPS13A 0.798973 4.31515 1.68434 VPS13A 2.43319 1.07596 -1.35723 WDFY3 3.73341 6.11263 0.777948 WDFY3 0.711299 -2.26275 -2.97405 WLS 16.5603 6.35448 2.07359 WLS -1.38188 -2.99753 -1.61565 XK 1.3401 3.7637 33.7368 XK 1.48981 4.65391 3.1641 XKR8 40.979 13.73 7.66657 XKR8 -1.57756 -2.41823 -0.840674 XPO7 9.27203 20.559 213.182 XPO7 1.14881 4.52306 3.37424 YES1 0.784006 1.70053 9.21357 YES1 1.11705 3.55482 2.43778 YOD1 2.50728 5.80819 88.1617 YOD1 1.21196 5.13596 3.92399 YPEL4 1.17727 2.72304 74.6667 YPEL4 1.20977 5.98694 4.77718 ZDHHC21 0.705609 
3.33196 1.78308 ZDHHC21 2.23943 1.33743 -0.901997 ZFAND4 1.35066 2.76044 12.8486 ZFAND4 1.03123 3.24987 2.21864 ZNF192 0.704308 6.79495 8.40603 ZNF192 3.27019 3.57715 0.30696 ZNF23 2.45498 5.94737 28.4817 ZNF23 1.27654 3.53625 2.25971 ZNF467 76.4189 35.4913 8.02006 ZNF467 -1.10646 -3.25225 -2.14578 ZRANB1 5.88213 8.0711 38.7968 ZRANB1 0.456425 2.72153 2.2651

APPENDIX B. Supplemental Table 1

ONTOLOGY TERMS | COUNT | P-VALUE | EXAMPLE GENES

Erythrocyte homeostasis | 8 | 3.2E-5 | BPGM 4.50*, BCL6 -3.16, KLF1 2.42, SOX6 4.08, TAL1 4.13, DYRK3 7.15**, EPB42 2.50, TRIM10 3.86

Hemopoiesis | 15 | 1.4E-4 6.8E-4 | BPGM 5.45*, BCL3 -2.34, BCL6 -3.16, KLF1 2.42, RASGRP4 -3.07, SOX6 4.08, TAL1 4.13, DYRK3 7.15**, EGR1 1.27, EPB42 2.50, AHSP 4.55, MMP9 -3.19, PBX1 2.59, SPTA1 5.96*, TRIM10 3.86

Iron ion binding | 18 | 8.1E-5 | CISD2 3.13, RFESD 4.88, STEAP4 -2.56, C5orf4 3.16, CYP1B1 -1.28, CYP4F3 -3.0, FECH 4.20, HBA2, HBA1, HBB, HBE1 inf, HBG1 4.2, HBM 3.89, HBQ1 2.05, ISCA1 2.4, LTF 2.5, MPO 2.68, RSAD2 3.63, SLC11A1 -3.05

Hemoglobin's Chaperone | 6 | 6.0E-6 | GATA1 1.96, CPOX 2.84, AHSP 4.55, FECH 4.2, HBB, HMBS 3.6

Inflammatory response | 14 | 7.8E-3 | CCR3 -3.52, C5 4.27, FPR2 -3.27, IL8 -1.39, KRT1 3.28, LY96 -2.09, MEFV -2.76, MMP25 -3.9, PROK2 -4.59 ””, SLC11A1 -3.05, TLR6 -1.99, TLR8 -1.85, TFRC 4.99, VNN1 -2.4

Defense response | 23 | 2.6E-3 | BCL3 -2.34, MEFV -2.76, BPI 2.49, CCR3 -3.52, C5 4.27, DEFA3 1.47, FPR2 -3.27, IL8 -1.39, KRT1 3.28, LTF 2.5, LILRA2 -2.69, LILRA3 -3.71, LILRB3 -2.81, LY96 -2.09, MMP25 -3.9, MPO 2.68, PROK2 -4.59 ””, RSAD2 3.63, SLC11A1 -3.05, TLR6 -1.99, TLR8 -1.85, TFRC 4.99, VNN1 -2.4

Response to oxidative stress | 8 | 3.4E-2 | CRYAB inf, EGFR 5.13, GCLC 2.97, JUN -0.02, KRT1 3.28, MPO 2.68, SELK 1.1, VNN1 -2.4

Blood group antigen | 8 | 6.0E-7 | ART4 inf, KEL 2.51, XK 4.65, AQP1 3.309, BCAM 4.5, ERMAP 5.06, GYPA 7.66**, SLC11A1 -3.05

Response to bacterium | 11 | 3.3E-3 | BCL3 -2.34, ADM -3.42, BP1 2.49, DEFA3 1.47, IRAK3 -1.11, JUN -0.02, LTF 2.5, LY96 -2.09, SLC11A1 -3.05, THBD -3.7, TLR6 -1.99

Regulation of apoptosis | 23 | 4.6E-2 | BCL3 -2.34, BCL6 -3.16, CITED2 2.79, NLRP12 -2.64, ARHGEF12 4.01, BIRC2 3.36, COL18A1 -2.79, CRYAB inf, DAPK2 -2.91, DYNLL1 2.1, EGFR 5.13, FEM1B 5.84, GCLC 2.97, HSPB1 3.25, HSPA1B 4.904, HSPA1A 2.43, ITSN1 3.03, JUN -0.02, MMP9 -3.19, MPO 2.68, NRG1 -3.14, NET1 3.84, PROK2 -4.59 ””, VNN1 -2.4

Gas transport | 8 | 1.9E-8 | AQP1 3.309, CA2 1.67, HBA1, HBA2, HBB, HBE1 inf, HBG1 4.2, HBM 3.89, HBQ1 2.05

Iron ion homeostasis | 5 | 4.3E-3 | ABCB6 2.76, EPB42 2.50, LTF 2.5, SLC11A1 -3.05, TFRC 4.99

Chemotaxis (Chemokine activity) | 10 | 3.1E-3 | CMTM2 -3.58, CCR3 -3.52, CCRL2 3.29, CXCL5 -0.85, CKLF -2.12, C5 4.27, FPR2 -3.27, IL8 -1.39, PLAUR -2.15, PROK2 -4.59 ””

Regulation of cell shape | 5 | 1.8E-2 | ARAP3 -3.1, CDC42EP2 -2.44, EPB42 2.50, LST1 -2.84, SPTA1 5.96*

Immunoglobulin-like fold | 24 | 2.9E-4 | FCGRT -2.81, FCRL3 2.4, FCRL5 2.64, HEPACAM2 6.83**, BCAM 4.5, BTNL8 -3.93, CEACAM3 -2.83, CEACAM4 -2.86, ERMAP 5.06, EPB42 2.50, IL1R2 -4.4 ””, LRFN1 -2.29, LILRA2 -2.69, LILRA3 -3.71, HLA-DQA2 -0.98, NRG1 -3.14, NFATC2 1.97, OSCAR -2.31, PILRA -2.62, SIGLEC14 -2.75, SIGLEC5 -3.12, SIRPB1 -2.44, SIRPB2 -2.49, TREM1 -2.64

Cofactor metabolic process | 11 | 3.6E-3 | ACSL6 2.6, ALDH5A1 3.44, CPOX 2.84, FECH 4.2, GCLC 2.97, HMBS 3.6, ISCA1 2.4, PNP 3.33, PANK3 3.45, PPCDC -2.34, VNN1 -2.4

Protein kinase cascade | 13 | 4.5E-2 | BCL3 -2.34, NLRP12 -2.64, RASGRP3 2.44, C5 4.27, CRYAB inf, DAPK2 -2.91, EGFR 5.13, LY96 -2.09, LPAR2 -2.73, MAP4K5 2.09, PROK2 -4.59 ””, TLR6 -1.99, TLR8 -1.85

Disulfide bond | 70 | 1.5E-3 | KIAA1324 -3.22, CTSL1 3.35, CCRL2 3.29, CXCL5 -0.85, CLIC2 5.35, ST3GAL4 -2.14, ST6GALNAC2 -2.81, ALDH5A1 3.44, DLL3 4.34, DPEP2 -2.23, EMR3 -3.87, IFI30 -1.13, HRH2 -2.75, KREMEN1 -2.75, LRG1 -3.05, MGAM -3.58, MME -3.14, NTNG2 -2.31, PLAUR -2.15, PTN 4.92, PRRG4 -2.47, PTGDS -2.07, PTP4A3 -2.35, SCARF1 -2.59, SOSTDC1 inf, SFRP2 5.35, TNFRSF10C -3.59

Carboxylic acid transport | 8 | 2.0E-2 | XK 4.65, AQP9 -3.64, CPT1B -2.03, SLC11A1 -3.05, SLC16A1 4.59, SLC16A3 -2.52, SLC19A1 -2.87, SLC7A5 4.65

Regulation of cytokine production | 8 | 5.3E-2 | BCL3 -2.34, BCL6 -3.16, NLRP12 -2.64, BPI 2.49, IRAK3 -1.11, SLC11A1 -3.05, TLR6 -1.99, TLR8 -1.85

Membrane fraction | 24 | 2.9E-2 | ABCC4 3.38, RASGRP4 -3.07, ACSL6 2.6, NCEH1 2.41, CEACAM4 -2.86, CPT1B -2.03, CSPG5 5.53*, CYP4F3 -3.0, DGAT2 -3.93, DYNLL1 2.1, GYPA 7.66**, HSPA13 2.51, ITSN1 3.03, KREMEN1 -2.75, MME -3.14, SLC16A1 4.59, SLC16A3 -2.52, SLC19A1 -2.87, SLC2A1 4.69, STX3 -2.49, YES1 3.55

Supplemental Table 1. Gene ontology analysis. Selected gene ontology terms that were significantly enriched among the differentially expressed genes in β-thalassemia. Genes shown in blue have not been reported in the sickle cell paper and are therefore thalassemia-specific. The number following each gene name is the log2(fold-change) from the pairwise comparison of D vs. N. Positive numbers (up-regulation) are written in pink, whereas negative numbers (down-regulation) are written in blue.
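As a rough illustration of how a log2(fold-change) value relates to a pair of FPKM values, the sketch below recomputes it with awk for PBX1, using the FPKM values listed for N and K in the gene table that precedes this appendix. The helper name log2fc is hypothetical, and the handling of zero FPKM values (0, inf, -inf) is an assumption chosen to mirror entries such as PTN and SOSTDC1 in that table; it is not taken from the Cuffdiff documentation.

```shell
#!/bin/sh
# log2fc (hypothetical helper): read "gene<TAB>FPKM_a<TAB>FPKM_b" lines on
# stdin and print "gene<TAB>log2(FPKM_b / FPKM_a)" rounded to two decimals.
# Zero handling is an assumption that mirrors the inf / -inf / 0 table entries.
log2fc() {
    awk -F '\t' '{
        a = $2 + 0; b = $3 + 0
        if (a == 0 && b == 0)  fc = "0"
        else if (a == 0)       fc = "inf"
        else if (b == 0)       fc = "-inf"
        else                   fc = sprintf("%.2f", log(b / a) / log(2))
        print $1 "\t" fc
    }'
}

# PBX1: FPKM in N = 1.0149, FPKM in K = 6.14852
printf 'PBX1\t1.0149\t6.14852\n' | log2fc
```

For PBX1 this gives 2.60, which agrees with the 2.5989 listed for the N vs. K comparison.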

APPENDIX C.

RNA-Seq data analysis commands:

(The letters N, AH, and K abbreviate the first names of the Normal, Mother, and Daughter samples, respectively. Throughout the RNA-Seq data analysis steps, I used these abbreviations for the data from the corresponding sample.)

###Size of each file:
foroughs-MacBook-Pro:Project_FT05042015 forough$ du -h
5.7G ./Sample_AH
5.6G ./Sample_K
5.2G ./Sample_N
17G .

###compilation scripts

#!/bin/bash
# merge all L002 R1 files, then all L002 R2 files, into one file per mate
zcat Sample_AH/*L002*R1*fastq.gz | gzip --best > Sample_AH_1.fastq.gz
zcat Sample_AH/*L002*R2*fastq.gz | gzip --best > Sample_AH_2.fastq.gz

#!/bin/bash
zcat Sample_K/*L002*R1*fastq.gz | gzip --best > Sample_K_1.fastq.gz
zcat Sample_K/*L002*R2*fastq.gz | gzip --best > Sample_K_2.fastq.gz

#!/bin/bash
zcat Sample_N/*L002*R1*fastq.gz | gzip --best > Sample_N_1.fastq.gz
zcat Sample_N/*L002*R2*fastq.gz | gzip --best > Sample_N_2.fastq.gz

###submit
qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=6:00:00 ./compileAH
qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=6:00:00 ./compileK
qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=6:00:00 ./compileN

#output
Sample_AH_1.fastq.gz
Sample_AH_2.fastq.gz
Sample_K_1.fastq.gz
Sample_K_2.fastq.gz
Sample_N_1.fastq.gz
Sample_N_2.fastq.gz
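One quick sanity check after merging paired-end files is that both mates of a sample contain the same number of reads. The following is a minimal sketch of such a check, assuming standard four-line FASTQ records; count_reads and check_pair are hypothetical helpers and were not part of the original scripts.

```shell
#!/bin/sh
# count_reads (hypothetical): print the number of reads in a gzipped FASTQ
# file, assuming every record spans exactly four lines.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# check_pair (hypothetical): compare the read counts of the two mate files.
check_pair() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" = "$n2" ]; then
        echo "OK $n1"
    else
        echo "MISMATCH $n1 $n2"
    fi
}

# Example call (file names follow the output list above):
# check_pair Sample_AH_1.fastq.gz Sample_AH_2.fastq.gz
```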

### Convert fastq files into fasta files

qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=8:00:00 ./generatefastaFromFastaqz Sample_AH_1.fastq.gz Sample_AH_1
qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=8:00:00 ./generatefastaFromFastaqz Sample_AH_2.fastq.gz Sample_AH_2
qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=8:00:00 ./generatefastaFromFastaqz Sample_K_1.fastq.gz Sample_K_1
qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=8:00:00 ./generatefastaFromFastaqz Sample_K_2.fastq.gz Sample_K_2
qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=8:00:00 ./generatefastaFromFastaqz Sample_N_1.fastq.gz Sample_N_1
qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=8:00:00 ./generatefastaFromFastaqz Sample_N_2.fastq.gz Sample_N_2

#output
Sample_AH_1.fasta.gz
Sample_AH_2.fasta.gz
Sample_K_1.fasta.gz
Sample_K_2.fasta.gz
Sample_N_1.fasta.gz
Sample_N_2.fasta.gz
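The generatefastaFromFastaqz script itself is not listed in this appendix. As a hedged sketch of what such a conversion typically involves (not the script that was actually used), each four-line FASTQ record can be reduced to a two-line FASTA record with awk; fastq_to_fasta is a hypothetical name.

```shell
#!/bin/sh
# fastq_to_fasta (hypothetical): read four-line FASTQ records on stdin and
# write FASTA on stdout. Line 1 of each record (@name) becomes the header
# (>name), line 2 is the sequence, and lines 3 and 4 are dropped.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }'
}

# Example call on a gzipped file (names follow the listing above):
# gzip -dc Sample_AH_1.fastq.gz | fastq_to_fasta | gzip > Sample_AH_1.fasta.gz
```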

### Alignments with STAR

[ftaghavi@login3 PROJECT]$ mkdir Sample_AH_STAR
[ftaghavi@login3 PROJECT]$ mkdir Sample_N_STAR
[ftaghavi@login3 PROJECT]$ mkdir Sample_K_STAR

[ftaghavi@login3 PROJECT]$ mv Sample_AH*fasta.gz Sample_AH_STAR/
[ftaghavi@login3 PROJECT]$ mv Sample_K*fasta.gz Sample_K_STAR/
[ftaghavi@login3 PROJECT]$ mv Sample_N*fasta.gz Sample_N_STAR/

[ftaghavi@login3 PROJECT]$ cp tostarALLPairedGTFParam ParametersSTAR.txt Sample_AH_STAR/
[ftaghavi@login3 PROJECT]$ cp tostarALLPairedGTFParam ParametersSTAR.txt Sample_K_STAR/
[ftaghavi@login3 PROJECT]$ cp tostarALLPairedGTFParam ParametersSTAR.txt Sample_N_STAR/

cd Sample_AH_STAR/
qsub -cwd -o $PWD -e $PWD -l exclusive,h_data=4G,h_rt=6:00:00 -pe shared 8 ./tostarALLPairedGTFParam Sample_AH

[ftaghavi@login3 PROJECT]$ cd Sample_K_STAR
qsub -cwd -o $PWD -e $PWD -l exclusive,h_data=4G,h_rt=6:00:00 -pe shared 8 ./tostarALLPairedGTFParam Sample_K

[ftaghavi@login3 PROJECT]$ cd Sample_N_STAR/
qsub -cwd -o $PWD -e $PWD -l exclusive,h_data=4G,h_rt=6:00:00 -pe shared 8 ./tostarALLPairedGTFParam Sample_N

#output
Sample_AH.sam
Sample_K.sam
Sample_N.sam

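### Convert SAM files into sorted BAM files

*Script for sam to bam

#!/bin/bash
. /u/local/Modules/default/init/modules.sh
module load samtools
# convert SAM to BAM, keeping the header
samtools view ${1} -Sbh -o ${1/sam/bam}
# sort the BAM file (older samtools syntax: input file, then output prefix)
samtools sort ${1/sam/bam} ${1/sam/srt}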

[ftaghavi@login3 PROJECT]$ ls */*sam | while read line ; do qsub -cwd -o /u/scratch/f/ftaghavi/PROJECT/ -e /u/scratch/f/ftaghavi/PROJECT/ -l h_rt=6:00:00,h_data=4G ./sortALLalignment $line ; done

**************************************************************************************
### Use sorted bams to compute coverage and transform into Bwig format

Step 1. Generate coverage files

#save as "sendCov". Copy ################

#!/bin/bash

. /u/local/Modules/default/init/modules.sh
module load bedtools
genomeCoverageBed -bg -ibam $1 -split -g /u/scratch/d/dcasero/BIG/hg19samheader.txt > ${1/bam/cov}

################

#submit
ls */*srt.bam | while read line ; do qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=24:00:00 ./sendCov $line ; done

#output

Sample_AH_STAR/Sample_AH_STARAligned.out.srt.cov
Sample_K_STAR/Sample_K_STARAligned.out.srt.cov
Sample_N_STAR/Sample_N_STARAligned.out.srt.cov

Step 2. Normalization

#compute the total signal
ls */*.cov | while read line ; do echo $line ; cat $line | awk '{SUM+=$4} END {print SUM}' ; done

*ANSWER*
Sample_AH_STAR/Sample_AH_STARAligned.out.srt.cov 7,981,694,298 ~ 8.0
Sample_K_STAR/Sample_K_STARAligned.out.srt.cov 13,263,516,746 ~ 13.3
Sample_N_STAR/Sample_N_STARAligned.out.srt.cov 8,335,633,208 ~ 8.3
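The rounded values above are the total signals expressed in billions. A minimal sketch of that arithmetic follows, assuming a four-column bedGraph file (chromosome, start, end, coverage) such as the one written by genomeCoverageBed -bg; scale_factor is a hypothetical helper, and the three toy intervals are made-up values chosen only so that they add up to the Sample_AH total.

```shell
#!/bin/sh
# scale_factor (hypothetical): sum column 4 of a bedGraph stream on stdin and
# print the total, followed by the total divided by 1e9 with one decimal.
scale_factor() {
    awk '{ sum += $4 } END { printf "%.0f %.1f\n", sum, sum / 1e9 }'
}

# Toy input: three made-up intervals whose coverage sums to 7,981,694,298
printf 'chr1\t0\t10\t3000000000\nchr1\t10\t20\t4000000000\nchr2\t0\t5\t981694298\n' | scale_factor
```

On a real file the call would be, for example, scale_factor < Sample_AH_STAR/Sample_AH_STARAligned.out.srt.cov.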

###Normalize
cat Sample_AH_STAR/Sample_AH_STARAligned.out.srt.cov | /u/scratch/d/dcasero/BIG/DCswitchchrNorm.pl -N 8.0 -t -T /u/scratch/d/dcasero/BIG/hg19chrtable.txt > Sample_AH_STAR/Sample_AH_STARAligned.out.srt.norm.cov
cat Sample_K_STAR/Sample_K_STARAligned.out.srt.cov | /u/scratch/d/dcasero/BIG/DCswitchchrNorm.pl -N 13.3 -t -T /u/scratch/d/dcasero/BIG/hg19chrtable.txt > Sample_K_STAR/Sample_K_STARAligned.out.srt.norm.cov
cat Sample_N_STAR/Sample_N_STARAligned.out.srt.cov | /u/scratch/d/dcasero/BIG/DCswitchchrNorm.pl -N 8.3 -t -T /u/scratch/d/dcasero/BIG/hg19chrtable.txt > Sample_N_STAR/Sample_N_STARAligned.out.srt.norm.cov

Step 3. Conversion to Bwig files

#save script as "sendBwig"
vim sendBwig

#!/bin/bash

/u/scratch/d/dcasero/BIG/bedGraphToBigWig $1 /u/scratch/d/dcasero/BIG/hg19.chrom.sizes ${1/cov/Bwig}

############################################
ls */*norm.cov | while read line ; do
qsub -cwd -o $PWD -e $PWD -l h_data=2048M,h_rt=2:00:00 ./sendBwig $line
done

ls */*norm.cov | while read line ; do qsub -cwd -o $PWD -e $PWD -l h_data=4G,h_rt=2:00:00 ./sendBwig $line ; done

**************************************************************************************
UCSC TRACKS (the track files to be seen in the UCSC BROWSER)

track type=bigWig description="Sample_AH, linear coverage, scale factor 8.0" name="Sample_M" visibility=full color=100,149,237 graphType=bar autoScale=on viewLimits=0:25 alwaysZero=on maxHeightPixels=32:64:128 priority=1 bigDataUrl=http://danio.pellegrini.mcdb.ucla.edu/~davidc/DavidForBIG/Sample_AH_STARAligned.out.srt.norm.Bwig

track type=bigWig description="Sample_K, linear coverage, scale factor 13.3" name="Sample_D" visibility=full color=255,102,102 graphType=bar autoScale=on viewLimits=0:25 alwaysZero=on maxHeightPixels=32:64:128 priority=1 bigDataUrl=http://danio.pellegrini.mcdb.ucla.edu/~davidc/DavidForBIG/Sample_K_STARAligned.out.srt.norm.Bwig

track type=bigWig description="Sample_N, linear coverage, scale factor 8.3" name="Sample_N" visibility=full color=50,205,50 graphType=bar autoScale=on viewLimits=0:25 alwaysZero=on maxHeightPixels=32:64:128 priority=1 bigDataUrl=http://danio.pellegrini.mcdb.ucla.edu/~davidc/DavidForBIG/Sample_N_STARAligned.out.srt.norm.Bwig

http://danio.pellegrini.mcdb.ucla.edu/~davidc/DavidForBIG/Sample_AH_STARAligned.out.spliced.10.bed
http://danio.pellegrini.mcdb.ucla.edu/~davidc/DavidForBIG/Sample_K_STARAligned.out.spliced.10.bed
http://danio.pellegrini.mcdb.ucla.edu/~davidc/DavidForBIG/Sample_N_STARAligned.out.spliced.10.bed

**************************************************************************************

### MAKING BAI FILES FROM THE SRT.BAM FILES TO SEE THEM IN IGV
input = Sample_AH_STARAligned.out.srt.bam
output = Sample_AH_STARAligned.out.srt.bam.bai

[ftaghavi@n2196 PROJECT]$ module load samtools
[ftaghavi@n2196 Sample_AH_STAR]$ samtools index Sample_AH_STARAligned.out.srt.bam

[ftaghavi@login2 Sample_K_STAR]$ samtools index Sample_K_STARAligned.out.srt.bam

[ftaghavi@login1 Sample_N_STAR]$ samtools index Sample_N_STARAligned.out.srt.bam

Then, in order to view the alignments in IGV: I open the IGV app and go to File -> upload file -> ......srt.bam (the ......srt.bam.bai file must already be in the same directory on the computer where the ......srt.bam file is saved). * I have both files in my Downloads folder.

**************************************************************************************

@@@@@@@ CUFFQUANT @@@@@@

[ftaghavi@n258 SRR316665_cuff]$ cat cuffquantALL
#!/bin/bash
. /u/local/Modules/default/init/modules.sh
module load cufflinks/2.2.1
cuffquant -o $PWD -p 4 -b /u/scratch/f/ftaghavi/BIG/Homo_sapiens.GRCh37.71.dna.primary_assembly.Cleanheader.fa -u /u/scratch/f/ftaghavi/BIG/genesUCSChg19.NM.clean.gtf /u/scratch/f/ftaghavi/PROJECT/${1}/${1}Aligned.out.srt.bam

#qsub for cuffquant (with David's fa and gtf):
[ftaghavi@n258 CUFFLINKS]$ cat samples | while read line ; do mkdir ${line}_cuffquant ; cp cuffquantALL ${line}_cuffquant ; cd ${line}_cuffquant ; qsub -cwd -o $PWD -e $PWD -l exclusive,h_data=4G,h_rt=8:00:00 -pe shared 8 ./cuffquantALL $line ; cd .. ; done

****************
[ftaghavi@login1 PROJECT]$ cat cuffquantALL2
#!/bin/bash
. /u/local/Modules/default/init/modules.sh
module load cufflinks/2.2.1
cuffquant -o $PWD -p 8 -b /u/scratch/f/ftaghavi/BIG/Homo_sapiens.GRCh37.71.dna.primary_assembly.Cleanheader.fa -u /u/scratch/f/ftaghavi/BIG/genesUCSChg19.NM.clean.gtf /u/scratch/f/ftaghavi/PROJECT/${1}/${1}Aligned.out.srt.bam

[ftaghavi@login1 PROJECT]$ cat samples2 | while read line ; do mkdir ${line}_cuffquant2 ; cp cuffquantALL2 ${line}_cuffquant2 ; cd ${line}_cuffquant2 ; qsub -cwd -o $PWD -e $PWD -l h_data=4G,h_rt=16:00:00 -pe shared 8 ./cuffquantALL2 $line ; cd .. ; done

cat samples3 | while read line ; do mkdir ${line}_cuffquant3 ; cp cuffquantALL3 ${line}_cuffquant3 ; cd ${line}_cuffquant3 ; qsub -cwd -o $PWD -e $PWD -l exclusive,h_data=4G,h_rt=24:00:00 -pe shared 8 ./cuffquantALL3 $line ; cd .. ; done

******************************************** CUFFQUANT WITH MASKING HBA1 & HBA2 & UNLIMITED COUNT ABILITY *********************************************

cuffquant did not work for sample K because that sample had a large number of multi-mapping reads.

To get around this problem, I masked the HBA1 and HBA2 genes.

########
- grep all the HBA1 and HBA2 records from the main GTF file,
- save them as a new file, and
- use that file for masking, via the -M/--mask-file option, during the cuffquant run.

[ftaghavi@login3 BIG]$ grep -e HBA1 -e HBA2 genesUCSChg19.NM.clean.gtf > HBA1-HBA2.gtf
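Because grep -e HBA1 matches any line that contains the string HBA1, it would also select records of any other gene or transcript whose name merely contains that string. A stricter variant is sketched below, assuming that the GTF file stores the gene symbol in a gene_id "..." attribute (an assumption about this particular annotation file); select_genes is a hypothetical helper, and gene names are treated as regular-expression fragments.

```shell
#!/bin/sh
# select_genes (hypothetical): print the GTF records from stdin whose gene_id
# attribute is exactly one of the gene names given as arguments.
select_genes() {
    pattern=""
    for g in "$@"; do
        # builds: gene_id "NAME1";|gene_id "NAME2";
        pattern="${pattern}${pattern:+|}gene_id \"${g}\";"
    done
    grep -E "$pattern"
}

# Example call (file names follow the command above):
# select_genes HBA1 HBA2 < genesUCSChg19.NM.clean.gtf > HBA1-HBA2.gtf
```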

#!/bin/bash
. /u/local/Modules/default/init/modules.sh
module load cufflinks/2.2.1
cuffquant -o $PWD -p 8 --max-bundle-frags unlim -b /u/scratch/f/ftaghavi/BIG/Homo_sapiens.GRCh37.71.dna.primary_assembly.Cleanheader.fa -u -M HBA1-HBA2.gtf /u/scratch/f/ftaghavi/BIG/genesUCSChg19.NM.clean.gtf /u/scratch/f/ftaghavi/PROJECT/${1}/${1}Aligned.out.srt.bam

cat samples | while read line ; do mkdir ${line}_cuffquant4 ; cp cuffquantALL4 HBA1-HBA2.gtf ${line}_cuffquant4 ; cd ${line}_cuffquant4 ; qsub -cwd -o $PWD -e $PWD -l exclusive,h_data=4G,h_rt=8:00:00 -pe shared 8 ./cuffquantALL4 $line ; cd .. ; done

**************************************************************************************

@@@@@@@ CUFFDIFF @@@@@@@@@@

FOR MASKED CUFFQUANT SAMPLES

[ftaghavi@login4 PROJECT]$ cat cuffdiffALL
#!/bin/bash
. /u/local/Modules/default/init/modules.sh
module load cufflinks/2.2.1
cuffdiff --output-dir /u/scratch/f/ftaghavi/PROJECT/CUFFDIFF --labels N,AH,K --num-threads 4 --min-alignment-count 10 --FDR 0.05 --frag-bias-correct /u/scratch/f/ftaghavi/BIG/Homo_sapiens.GRCh37.71.dna.primary_assembly.Cleanheader.fa /u/scratch/f/ftaghavi/BIG/genesUCSChg19.NM.clean.gtf N-Mask.cxb AH-Mask.cxb K-Mask.cxb

qsub -cwd -o $PWD -e $PWD -M [email protected] -m bea -l exclusive,h_data=4G,h_rt=4:00:00 -pe shared 8 ./cuffdiffALL

*************************************************************************************#

### Working with the cuffdiff-generated files in order to identify each sample’s specific genes and to generate matrices.

[ftaghavi@login2 CUFFDIFF-cuffquant5000000-maskHBA-C]$ grep yes$ gene_exp.diff > SIG-DEgenes-5000000

[ftaghavi@login2 CUFFDIFF-cuffquant5000000-maskHBA-C]$ grep yes$ gene_exp.diff |sort -k 2,2 > SIG-DEgenes-sorted-5000000

[ftaghavi@login4 CUFFDIFF-cuffquant5000000-maskHBA-C]$ cut -f 2 SIG-DEgenes-sorted-5000000 | uniq > SIG-DEgenes-sorted-uniq-list-5000000

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ wc -l SIG-DEgenes-sorted-uniq-list-5000000
365 SIG-DEgenes-sorted-uniq-list-5000000
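The same kind of list can also be obtained by testing the "significant" column directly instead of matching yes at the end of the line. The sketch below assumes the usual 14-column, tab-separated gene_exp.diff layout with gene_id in column 2 and significant in column 14; sig_genes is a hypothetical helper and the rows in the example are made up.

```shell
#!/bin/sh
# sig_genes (hypothetical): read a gene_exp.diff-style table on stdin, skip
# the header line, and print the unique gene_id values (column 2) of all rows
# whose "significant" column (column 14) is "yes".
sig_genes() {
    awk -F '\t' 'NR > 1 && $14 == "yes" { print $2 }' | sort -u
}

# Toy input with made-up rows: geneB is significant in two comparisons
{
    printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2fc\ttest_stat\tp_value\tq_value\tsignificant\n'
    printf 'geneB\tgeneB\tgeneB\tc:1-2\tN\tK\tOK\t1\t8\t3\t2\t0.01\t0.02\tyes\n'
    printf 'geneA\tgeneA\tgeneA\tc:3-4\tN\tAH\tOK\t1\t1\t0\t0\t0.9\t0.9\tno\n'
    printf 'geneB\tgeneB\tgeneB\tc:1-2\tAH\tK\tOK\t2\t8\t2\t2\t0.01\t0.02\tyes\n'
} | sig_genes
```

Piping the result through wc -l gives the number of unique significant genes.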

### Get each pairwise (vs.) comparison in 4 columns

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-C]$ grep "N.AH" gene_exp.diff | awk '{print $1"\t"$5 "\t"$6 "\t"$14}' > N-vs-AH

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-C]$ grep "[[:space:]]N[[:space:]]K[[:space:]]" gene_exp.diff | awk '{print $1"\t"$5 "\t"$6 "\t"$14}' > N-vs-K

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-C]$ grep "AH.K" gene_exp.diff | awk '{print $1"\t"$5 "\t"$6 "\t"$14}' > AH-vs-K
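In patterns such as "N.AH" the dot matches any character, so a somewhat more robust alternative is to select rows on the sample columns themselves. This is a sketch under the assumption that gene_exp.diff is tab-separated with sample_1 in column 5, sample_2 in column 6, and significant in column 14; pick_comparison is a hypothetical helper and the input rows are made up.

```shell
#!/bin/sh
# pick_comparison (hypothetical): from a gene_exp.diff-style table on stdin,
# keep the rows of one pairwise comparison and print the test_id, the two
# sample labels, and the "significant" flag.
pick_comparison() {
    awk -F '\t' -v s1="$1" -v s2="$2" \
        '$5 == s1 && $6 == s2 { print $1 "\t" $5 "\t" $6 "\t" $14 }'
}

# Toy input with made-up rows
{
    printf 'geneA\tgeneA\tgeneA\tc:3-4\tN\tAH\tOK\t1\t1\t0\t0\t0.9\t0.9\tno\n'
    printf 'geneA\tgeneA\tgeneA\tc:3-4\tN\tK\tOK\t1\t8\t3\t2\t0.01\t0.02\tyes\n'
} | pick_comparison N K
```

With the real file, the call would be pick_comparison N AH < gene_exp.diff > N-vs-AH, and likewise for the other two comparisons.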

### Create Binary files

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-C]$ cat N-vs-AH | sed 's/no/0/g;s/yes/1/g' > N-vs-AH_binary
grep 1$ N-vs-AH_binary | wc -l
40
grep 0$ N-vs-AH_binary | wc -l
18989

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-C]$ cat N-vs-K | sed 's/no/0/g;s/yes/1/g' > N-vs-K_binary
grep yes$ N-vs-K | wc -l
284
grep no$ N-vs-K | wc -l
18745

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-C]$ cat AH-vs-K | sed 's/no/0/g;s/yes/1/g' > AH-vs-K_binary
grep 1$ AH-vs-K_binary | wc -l
156
grep 0$ AH-vs-K_binary | wc -l
18873

### Getting the list of $yes in each Binary file

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ grep 1$ N-vs-AH_binary | cut -f 1 > SIG-DEgenes-List-N-vs-AH-5000000

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ grep 1$ N-vs-K_binary | cut -f 1 > SIG-DEgenes-List-N-vs-K-5000000

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ grep 1$ AH-vs-K_binary | cut -f 1 > SIG-DEgenes-List-AH-vs-K-5000000

### Capture the SIG-DE genes from each binary file

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-C]$ cat SIG-DEgenes-sorted-uniq-list-5000000 | while read line ; do grep ^$line[[:space:]] N-vs-AH_binary >> SIG-N-vs-AH_binary ; done

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-C]$ cat SIG-DEgenes-sorted-uniq-list-5000000 | while read line ; do grep ^$line[[:space:]] N-vs-K_binary >> SIG-N-vs-K_binary ; done

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-C]$ cat SIG-DEgenes-sorted-uniq-list-5000000 | while read line ; do grep ^$line[[:space:]] AH-vs-K_binary >> SIG-AH-vs-K_binary ; done

### Make Binary comparison files

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-C]$ paste SIG-N-vs-AH_binary SIG-N-vs-K_binary SIG-AH-vs-K_binary | cut -f 1,4,8,12 > SIG-DEgenes-BINARY_COMP

### Find the signature of each group (columns: N vs AH, N vs K, AH vs K)

[ftaghavi@login2 CUFFDIFF-cuffquant5000000-maskHBA-R]$ cat SIG-DEgenes-BINARY_COMP | awk '{if ($2==1 && $3==1 && $4==0) print $1}' > N-SPECIFIC-GENES-5000000
[ftaghavi@login2 CUFFDIFF-cuffquant5000000-maskHBA-R]$ wc -l N-SPECIFIC-GENES-5000000
7 N-SPECIFIC-GENES

[ftaghavi@login2 CUFFDIFF-cuffquant5000000-maskHBA-R]$ cat SIG-DEgenes-BINARY_COMP | awk '{if ($2==1 && $3==0 && $4==1) print $1}' > AH-SPECIFIC-GENES-5000000
[ftaghavi@login2 CUFFDIFF-cuffquant5000000-maskHBA-R]$ wc -l AH-SPECIFIC-GENES-5000000
5 AH-SPECIFIC-GENES

[ftaghavi@login2 CUFFDIFF-cuffquant5000000-maskHBA-R]$ cat SIG-DEgenes-BINARY_COMP | awk '{if ($2==0 && $3==1 && $4==1) print $1}' > K-SPECIFIC-GENES-5000000
[ftaghavi@login2 CUFFDIFF-cuffquant5000000-maskHBA-R]$ wc -l K-SPECIFIC-GENES-5000000
103 K-SPECIFIC-GENES
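To make the logic of the three filters above concrete: a gene is assigned to a sample when it is significant in both comparisons that involve that sample and not significant in the remaining one. The toy sketch below applies the same three conditions in a single pass; classify is a hypothetical helper and the gene names are made up.

```shell
#!/bin/sh
# classify (hypothetical): read "gene  N-vs-AH  N-vs-K  AH-vs-K" rows of 0/1
# flags on stdin and label each gene with the sample whose signature it
# matches, or "-" when it matches none of the three signatures.
classify() {
    awk '{
        if      ($2 == 1 && $3 == 1 && $4 == 0) label = "N"
        else if ($2 == 1 && $3 == 0 && $4 == 1) label = "AH"
        else if ($2 == 0 && $3 == 1 && $4 == 1) label = "K"
        else                                    label = "-"
        print $1 "\t" label
    }'
}

# geneA differs from both AH and K (N-specific); geneB differs from both N
# and AH (K-specific); geneC is significant in one comparison only.
printf 'geneA\t1\t1\t0\ngeneB\t0\t1\t1\ngeneC\t0\t1\t0\n' | classify
```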

### Making gct format file --> to generate heatmap

[ftaghavi@login4 CUFFDIFF-COMPARISON]$ head TOTAL-DEgenes-EXCEPTglobins-ALLLL-index-MATRIX2.gct
#1.2
391 3
Gene index N AH K
AATK 1 26.4413 13.2951 3.38882
ABCB6 2 2.11679 1.51102 14.3561
ABCC4 3 0.87849 3.06414 9.18128
ACCN4 4 0 0.0211235 0.820067
ACSL6 5 1.17133 3.59942 7.11653
ADM 6 43.8308 13.7099 4.08603
AGPAT9 7 13.8026 11.2665 3.53578
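The header of a GCT file consists of a version line (#1.2), a line with the number of data rows and the number of samples, and a line of column names. As a minimal sketch of how such a file could be assembled from a plain FPKM matrix (not necessarily how the file above was produced), make_gct below is a hypothetical helper that assumes tab-separated rows of gene, N, AH, and K values and writes a running index into the second column, as in the listing above.

```shell
#!/bin/sh
# make_gct (hypothetical): read "gene<TAB>N<TAB>AH<TAB>K" rows on stdin and
# write GCT v1.2: version line, "<rows><TAB><samples>", column names, then
# the data rows with a running index in the second column.
make_gct() {
    awk -F '\t' '
        { name[NR] = $1; vals[NR] = $2 "\t" $3 "\t" $4 }
        END {
            print "#1.2"
            print NR "\t" 3
            print "Gene\tindex\tN\tAH\tK"
            for (i = 1; i <= NR; i++) print name[i] "\t" i "\t" vals[i]
        }'
}

# First two rows of the matrix shown above
printf 'AATK\t26.4413\t13.2951\t3.38882\nABCB6\t2.11679\t1.51102\t14.3561\n' | make_gct
```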

@@@@@@@@@@@@@@@@@ TO TEST THE EFFECT OF GLOBIN MASKING @@@@@@@@@@@@@@@@@

[ftaghavi@login1 N-NOmasking-vs-ALLglobinMasking-R]$ pwd
/u/scratch/f/ftaghavi/TEST/CUFFDIFF/CUFFDIFF-COMPARISON/NOmasking-vs-ALLglobinMasking/N-NOmasking-vs-ALLglobinMasking-R

# TO MAKE A MATRIX OF 3 COLUMNS (GENE_ID, FPKM (NOMasking), FPKM (MaskingALLGlobins))
[ftaghavi@login1 N-NOmasking-vs-ALLglobinMasking-R]$ awk '{print $2"\t"$8 "\t"$9}' gene_exp.diff | grep -v [[:space:]]0[[:space:]]0$ > FPKM-N-NOmasking-vs-MaskingALLGlobins

Generating R plot:

setwd("/Users/forough/Downloads")
fpkms <- read.table("FPKM-N-NOmasking-vs-MaskingALLGlobins-MATRIX")
plot(log10(fpkms[,1]+1),log10(fpkms[,2]+1))
lines(lowess(log10(fpkms[,1]+1),log10(fpkms[,2]+1)),col="red")

> cor(log10(fpkms[,1]+1),log10(fpkms[,2]+1))
[1] 0.9971939
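For readers who want to check such a correlation without R, the same quantity (the Pearson correlation of log10(FPKM + 1) between two columns) can be sketched in awk. log_pearson is a hypothetical helper that assumes a two-column, tab-separated input of FPKM values; the toy input is made up and is perfectly correlated by construction.

```shell
#!/bin/sh
# log_pearson (hypothetical): read two tab-separated FPKM columns on stdin and
# print the Pearson correlation of log10(x + 1) and log10(y + 1), or "NA" when
# one of the columns has zero variance.
log_pearson() {
    awk -F '\t' '{
        x = log($1 + 1) / log(10); y = log($2 + 1) / log(10)
        n++; sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
    } END {
        num = n * sxy - sx * sy
        den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
        if (den == 0) print "NA"
        else          printf "%.4f\n", num / den
    }'
}

# Toy input: identical columns, so the correlation is 1
printf '0\t0\n9\t9\n99\t99\n' | log_pearson
```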


@@@@@@@@@@@@@@@@@@@@ GENERATING MATRICES for ALL the genes @@@@@@@@@@@@@@@@@@@@
awk '{print $2"\t"$5 "\t"$6 "\t"$8"\t"$9"\t"$10"\t"$14}' gene_exp.diff | sort -k 1,1 > NEW-MATRIX1

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ ./DCperlexample.pl genes.read_group_tracking > MARTIX

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ ./DCperlexample2.pl NEW-MATRIX1 > MATRIX-FoldChange

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ awk '{print $1"\t"$3"\t"$4"\t"$2}' MATRIX-FoldChange > MATRIX-FoldChange-2

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ paste MARTIX MATRIX-FoldChange-2 > MATRIX-FPKM-FOLDCHANGE-1

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ ./DCperlexample.pl A-genes.read_group_tracking > MARTIX-FPKM
[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ awk '{print $2"\t"$5 "\t"$6 "\t"$8"\t"$9"\t"$10"\t"$14}' A-gene_exp.diff | sort -k 1,1 > MATRIX-FoldChange-000
[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ ./DCperlexample2.pl MATRIX-FoldChange-000 > MATRIX-FoldChange-000-1
[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ awk '{print $1"\t"$3"\t"$4"\t"$2}' MATRIX-FoldChange-000-1 > MATRIX-FoldChange-000-2

[ftaghavi@login3 CUFFDIFF-cuffquant5000000-maskHBA-R]$ paste MARTIX-FPKM MATRIX-FoldChange-000-2 > MATRIX-FPKM-FOLDCHANGE-000-3
