bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Cis- and Trans-Acting Expression Quantitative Trait Loci Differentially Regulate Gamma- Globin

Gene Expression

Elmutaz M. Shaikho1, 2, John J. Farrell1, David H. K. Chui1, Paola Sebastiani3, Martin H. Steinberg1

1Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA

2Bioinformatics Program, Boston University, Boston, MA 02215, USA

3Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA

Acknowledgments: Funded in part by R01 HL 068970, RC2 HL 101212, R01 87681 (MHS), T32

HL007501 (EMS), from the NIH Bethesda, MD.

Conflict-of-interest disclosure: The authors declare no competing interests.

Correspondence: Martin H. Steinberg, 72 E. Concord St., Boston, MA 02118, 617-414-1020, FAX 617-

414-1021; [email protected]

Contribution: PS, JJF, DHKC, MHS supervised the research and edited the paper, EMS conceived the

the research, performed the analysis, and wrote the paper.

Keywords: g-globin, fetal hemoglobin, eQTLs, BCL2L1, globin expression

bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

ABSTRACT

Genetic association studies have detected two trans-acting quantitative trait loci (QTL) on

2, 6 and one cis-acting QTL on 11 that were associated with fetal hemoglobin (HbF) levels.

In these studies, HbF was expressed as a percentage of total hemoglobin or the number of erythrocytes

that contain HbF (F-cells). As the g-globin chains of HbF are encoded by two non-allelic (HBG2,

HBG1) that are expressed at different levels we used normalized and genotype data from

The Genotype-Tissue Expression (GTEx)-project to study the effects of cis- and trans-acting HbF

expression or eQTL. This allowed us to examine mRNA expression of HBG2 and HBG1individually. In

addition to studying eQTL for globin genes we examined genes co-expressed with HBG1, studied

upstream regulators of HBG1 co-expressed genes and performed a correlation analysis between HBG2

and HBG1 and known HbF regulators. Our results suggest differential effect of cis and trans-acting QTL

on HBG and HBG1 expression. Trans-acting eQTLs have the same magnitude of effect on the expression

of both HBG2 and HBG1 while the sole cis-acting eQTL affected only HBG2. Furthermore, the analysis

of upstream regulators and the correlation analysis suggested that BCL2L1 might be a new potential trans-

acting HbF activator. HbF is the major modulator of the phenotype of sickle cell anemia and b

thalassemia. Depending on the effect size, modification of trans-acting elements might have a greater

impact on HbF levels than cis-acting elements alone.

bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Introduction

Genome-wide and other genetic association studies have detected many single nucleotide polymorphisms

(SNPs) in chromosomes 2, 6, and 11 that were associated with fetal hemoglobin (HbF) levels in normal

individuals and in patients with sickle cell anemia and b thalassemia (Thein et al. 1987; Craig et al. 1996;

Garner et al. 1998; Uda et al. 2008; Mtatiro et al. 2014). These studies used either HbF protein

levels-expressed as a percentage of total hemoglobin- or the percentage of all erythrocytes that had HbF

detected by immunofluorescence (F-cells) as a quantitative trait. HbF is a tetramer composed of two a

and two g-polypeptide subunits. The g-globin subunits, which characterize HbF, are encoded by two

closely linked genes-from 5’ to 3’, HBG2 and HBG1 (together, HBG). The respective polypeptides of the

g-globin genes differ by only a single amino acid in position 136; glycine in the Gg chain (HBG2) and

alanine in the Ag chain (HBG1) (Schroeder et al. 1968). In adults, these genes are expressed at a ratio of

2:3 (Terasawa et al. 1980). Expression quantitative trait loci (eQTLs) are genomic loci that lead to

variability in mRNAs expression levels and can be identified by analyzing global RNA expression using

the expression levels of genes as quantitative phenotypes. Differential expression of HBG in whole blood

can be detected by mRNA analysis that may provide an understanding of whether eQTL associated with

HBG affects the expression of one or both genes.

RNA sequence data (RNA-seq) and whole genome sequences from primary erythroid progenitors

expressing HBG would provide the ideal data set to study cis and trans-acting elements that regulate HbF

expression. However, such data are rarely available publicly. The Genotype-Tissue Expression (GTEx)-

project provides an alternative resource to study HBG gene expression, the regulatory elements of these

genes and the effects of genetic variation of these elements (GTEx_Consortium 2013). GTEx was

launched in 2010 to create a publicly available database of genotype and tissue expression. Samples come

from either deceased or surgical donor’s organ/tissues. These human tissue samples then undergo RNA-

seq and genome-wide SNP analysis. The tissues sampled include peripheral blood, which contains bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

reticulocytes that express hemoglobin genes. Reticulocytes account for about 10% of total hemoglobin

synthesis.

We performed a genome-wide association study (GWAS) to detect eQTLs associated with expression of

HBG in whole blood, other genes of the b-globin (HBB) gene cluster and with known HbF cis- and trans-

acting regulators. In addition, we examined genes co-expressed with HBG1 to discover enriched

pathways, studied upstream regulators of HbF co-expressed genes and performed correlation analysis

between HBG and known HbF regulators. A similar correlation analysis included BCL2L1, a potential

HbF activator we found in our analysis. We hypothesized that trans-acting elements that are transcription

factors affect the expression of both HBG2 and HBG1, while the known cis-acting QTL in the promoter

of HBG2 is likely to be associated with the expression of this gene only.

Results

Preliminary studies

GTEx whole blood RNA-seq data detected HbF eQTLs in BCL11A, HMIP and in the HBB gene cluster

using HbF gene expression as a surrogate for HbF protein levels, validating the utility of this data set for

examining HbF-associated eQTL (Table 1 and Fig. 1A-F).

Gene Symbol Variant Id SNP P-Value Effect Size T-Statistic Standard Error HBG1 2_60718043_T_G_b37 rs1427407 1.20E-11 -0.46 -7.1 0.065 HBG1 11_5263683_C_T_b37 rs10128556 0.89 -0.0085 -0.14 0.059 HBG1 6_135419018_T_C_b37 rs9399137 2.20E-15 0.51 8.4 0.061 HBG2 2_60718043_T_G_b37 rs1427407 1.10E-09 -0.46 -6.3 0.073 HBG2 11_5263683_C_T_b37 rs10128556 2.60E-16 0.51 8.7 0.058 HBG2 6_135419018_T_C_b37 rs9399137 1.70E-09 0.44 6.2 0.07

Table 1. SNPs from previous GWAS and their association with HBG1 and HBG2 expression in

whole blood samples available in the GTEx portal. bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Figure 1. Effect of rs1427407 (2:60718043_T/G), rs10128556 (11:5263683_C/T), and rs9399137 (6:135418632_TTAC/T) on HBG1 and HBG2 expression in GTEx data set version 6. P-values are above genome-wide significance for rs1427407 (BCL11A, chr2) and rs9399137 (MYB, chr6) association with expression of both HBG genes. Rs10128556 (chr 11) is significantly associated with only HBG2 (p- bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

value of 2.60E-16) while it has no effect on HBG1(p-value 0.89) Het denotes heterozygous, and Homo, homozygous.

Genome-wide eQTL association analysis

Twenty-seven SNPs were eQTL associated with HBG1 (p-value ≤ 5 × 10−8). Of these 27, 21 were

intergenic variants in the HMIP region on chromosome 6p, five were BCL11A variants on chromosome

2p, and one was on chromosome 11p (Fig. 2A, Table S1). The SNP rs66650371, the functional 3-bp

deletion in the MYB enhancer, is one of the most significant SNPs in the HMIP region. The most

significant SNP of the five BCL11A variants was rs1427407, which in other studies was the functional

SNP in the erythroid-specific enhancer of this gene (Bauer et al. 2013).

Forty-nine SNPs meeting genome-wide significance levels were eQTL for HBG2. Seventeen were

intergenic variants in the HMIP region on chromosome 6, five were BCL11A variants on chromosome 2p,

26 were on chromosome 11p (Fig 2B, Table S2) and one was on chromosome 1. The most significant

SNP in chromosome 11 was rs7482144 and this was associated only with the expression of HBG2;

rs16912979, which is located in hypersensitive site (HS) 4 of the locus control region (LCR) of the HBB

gene complex, was also associated solely with HBG2 expression. The SNP rs66650371 (MYB) and

rs1427407 (BCL11A) were significantly associated with both HBG2 and HBG1. The conditional analysis

of HBG1 showed that after adjusting for the effects of rs1427407, rs66650371, and rs7482144, there was

no association on chromosome 2 or chromosome 6 (Fig. 2C, Table S3).

bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

A

B

C

Figure 2. Manhattan plots for HBG eQTLs in 338 whole blood samples. Fig. 2A shows HBG1 eQTL;

Fig. 2B shows HBG2 eQTL; Fig. 2C shows HBG1 eQTL conditioned on rs66650371, rs1427407 and

rs7482144 genotype. The red line indicates genome-wide significance levels

bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Conditional analysis of HBG2 showed that no SNP was significantly associated with HBG2 expression

after adjusting for rs7482144, rs1427407, and rs66650371 (Fig. S1, Table S4). There was no SNP with

genome-wide significance associated with expression of BCL11A and ZBTB7A (Figs. S2, S3; Tables S5,

S6). One SNP on chromosome 6 was associated with MYB expression, another on chromosome 6 was

associated with HBD expression, and one on chromosome 12 was associated with KLF1 expression (Figs.

S4, S5, S6; Tables S7, S8, S9). Variants on chromosome 1, 2, 6, 7, 14, and 20 were significantly

associated with HBB expression (Fig. S7, Table S10).

Association of the erythroid-specific BCL11A enhancer variant rs1427407 with whole blood gene

expression

Linear regression analysis showed that the genotype of rs1427407 was significantly associated with

HBG1 and HBG2 expression in whole blood (Table 2) after correction for multiple testing using FDR, p-

values were 2.04E-06 and 3.63E-06 for HBG1 and HBG2, respectively. This SNP was not a eQTL for any

other gene expressed in peripheral blood.

ENS_ID Gene logFC AveExpr P.Value adj.P.Val Symbol ENSG00000213934.5 HBG1 0.502491942 -4.32E-06 8.51E-11 2.04E-06 ENSG00000196565.8 HBG2 0.507932715 -7.56E-06 3.03E-10 3.63E-06 ENSG00000187017.10 ESPN 0.307905604 -1.45E-07 1.80E-05 0.143668299 ENSG00000239405.1 TMED10P2 0.362195583 0.002468525 7.07E-05 0.423972382 ENSG00000010626.10 LRRC23 0.277170914 2.20E-07 9.30E-05 0.427249648 ENSG00000248309.1 MEF2C-AS1 0.320432287 -9.14E-08 0.000106933 0.427249648 ENSG00000160401.10 CFAP157 - -8.98E-07 0.000164101 0.561998024 0.342869883 ENSG00000135414.5 GDF11 - -3.51E-08 0.000286461 0.777799594 0.163176468 ENSG00000255318.1 - - 0.000665361 0.000316391 0.777799594 0.358828046

Table 2. Effect of rs1427407 on whole blood gene expression. ENS_ID is ensemble gene ID, LogFC is

log fold change, aveExpr is average expression, and adj.P.Val is FDR adjusted p-value.

bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

HBG1 differential co-expression analysis

Twelve genes were differentially co-expressed with HBG1 after FDR p-value adjustment (Table S11).

The pathway enrichment analysis for the top100 co-expressed genes did not include any known pathway

involved in HbF regulation among the top 10 enriched pathways. Of 452 upstream regulators that regulate

the top100 HBG1 co-expressed genes, 360 regulated BCL2L1 (Table S12). GATA1and KLF1 were also in

top five statistically significant upstream regulators.

Correlation analysis between known and potential HbF regulators and HBG

The correlation between HBG1 and HBG2 normalized gene expression and the expression of BCL11A,

KLF1, MYB, ZBTB7A, SIRT1 and BCL2L1 in the GTEx whole blood data set, showed that BCL2L1 had

correlation coefficients of 0.59 and 0.55 for HBG1 and HBG2, respectively with a p value of 2.2E-16.

(Table 3).

Regulators HBG1 Correlation P-value HBG2 Correlation P-value BCL2L1 0.59 2.20E-16 0.55 2.20E-16 KLF1 0.47 2.20E-16 0.44 2.20E-16 MYB 0.27 2.94E-07 0.27 2.8E-07 BCL11A 0.15 0.003914 0.17 0.001694 SIRT1 -0.06 0.238 -0.06 0.29 ZBTB7A -0.0045 0.9334 0. 067 0.212

Table 3. Pearson correlation between HBG and known or potential HbF regulators using RNA-seq

from 338 whole blood samples from GTEx.

In primary human fetal liver proerythroblasts, BCL2L1, KLF1, and SIRT1 were positively correlated with

HBG expression with correlation coefficients of 0.93, 0.92, and 0.62, respectively; BCL11A and MYB bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

were negatively correlated with HBG1 with correlation coefficients of -0.63, and -0.64, respectively;

ZBTB7A was weakly correlated with HBG1, and the p-value for association was not significant (Table 4).

Regulators HBG1 Correlation P-value HBG1 Correlation P-value BCL2L1 0.93 3.85E-07 0.93 3.85E-07 KLF1 0.92 9.24E-07 0.92 9.24E-07 MYB -0.64 0.01 -0.63 0.01 BCL11A -0.63 0.01 -0.63 0.01 SIRT1 0.62 0.01 0.62 0.01 ZBTB7A -0.38 0.17 -0.38 0.16

Table 4. Pearson correlation between HBG and known or potential HbF regulators using RNA-seq

primary human fetal liver proerythroblasts.

Discussion

Among eQTL significantly associated with HBG expression 21 were intergenic variants in HMIP on

chromosome 6p. The functional 3-bp deletion (rs66650371) was one of the most significant eQTL in this

MYB enhancer (Farrell et al. 2011; Stadhouders et al. 2014). The SNP rs66650371 was in strong LD with

20 other SNPs in chromosome 6p in Europeans, South and East Asians, and admixed Americans (Table

S13). A long non-coding or lncRNA containing the site of rs66650371was transcribed from this enhancer.

Its downregulation in an erythroid cell line that expressed adult hemoglobin (HbA) was associated with a

200-fold increase in HBG expression and a 20-fold increase in HbF (Morrison et al. 2017). The eQTL at

rs66650371 further supports the functional importance of this SNP. Five BCL11A SNPs in chromosome

2p were in very strong LD across populations (Table S14). The most significant, rs1427407, is a

functional erythroid-specific enhancer variant that altered the expression of BCL11A (Bauer et al. 2013).

Both rs1427407 and rs66650371are trans-acting elements and had the same magnitude of effect on HBG1

and HBG2 (Fig. 1C, 1D; Fig. 3). These observations are consistent with our hypothesis that trans-acting

elements affect expression of both g-globin genes. bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Figure 3. Effect of rs66650371 (6:135418632_TTAC/T) genotypes on HBG1 and HBG2 expression

in 338 whole blood samples. P-values of genome-wide association are 6.49E-11 and 5.86E-09 for HBG1

and HBG2, respectively.

The SNP rs10128556 in the pseudogene HBBP1 on chromosome 11p was in strong LD with rs7482144

and was reported to have an effect independent of rs7482144 in African American sickle cell disease

patients and be the likely functional variant modulating cis-acting HbF gene expression (Galarneau et al.

2010). However, rs7482144 is located within the HBG2 promoter and highly linked to rs368698783 in

HBG1 promoter. The SNP rs368698783 together with rs7482144 are associated with reduced

methylation in six CpG sites flanking the transcription start site of HBG (Chen et al. 2017). Moreover,

rs7482144 alters a putative binding motif for ZBTB7A, a silencer of HBG expression (Masuda et al.

2016; Shaikho et al. 2016).

Strong evidence supporting differential regulation of HBG by cis- and trans-acting elements comes from

clinical observations of patients with the Arab Indian (AI) and Senegal haplotypes of the sickle bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

hemoglobin (HbS) gene and from the reports of hereditary persistence of HbF (HPFH) caused by point

mutations in HBG promoters. Patients with the AI and Senegal haplotypes, which are the only HBB

haplotypes containing rs7482144, had increased levels of HbF that was predominantly of the Gg-globin

type (Nagel et al. 1985; Ballas et al. 1991; Rahimi et al. 2015). Many mutations have been described in

the promoters of HBG2 and HBG1 that cause the phenotype of HPFH. For each of these mutations,

depending on the affected gene, either HBG2 or HBG1usually comprises more than 90% of total g globin

(Wood 2001). The T-C mutations at -175 and -173 in the promoter of HBG1 reactivated Ag-globin gene

expression in adult erythrocytes of transgenic mice while altering the binding of GATA-1 and Oct-1 and

(Liu et al. 2005). Moreover, a C-T polymorphism that is identical to rs7482144 but at position -158

relative to HBG1 affected HBG1 expression (Patrinos et al. 1998).

These results suggest that rs7482144 rather than rs10128556 is either the functional cis-acting element

regulating HBG2 expression or the best surrogate for this element. In addition, rs16912979 in HS-4 of the

LCR is also solely associated with HBG2 further supporting the effect of cis-acting elements on a single

g-globin gene (Fig. 4, 5). This SNP is a member of the 3-SNP T/A/T haplotype (rs16912979, rs7482144,

rs10128556) that is exclusive to the AI haplotype and is associated with their high HbF (Vathipadiekal et

al. 2016). The T allele of rs16912979 tags a predicted binding site for runt-related 1

(RUNX1) in the palindromic region of 5′ HS-4. RUNX1 plays an important role in hematopoiesis and

electromobility shift assays (EMSA) suggested an allele-specific binding of RUNX1 to 5′ HS-4

(Dehghani et al. 2016). bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Figure 4. Effect of rs16912979 (11_5309695_T_C) on HBG2 and HBG1 expression in GTEx data

set. p-values of genome wide significance are 7.0e-14 and 0.77, respectively. Het denotes heterozygous,

and Homo, homozygous.

bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Figure 5. Effect of rs7482144 (11:5276169_G/A) genotypes on HBG1 and HBG2 expression in 338

whole blood samples. P-values of genome-wide association are 0.038593 and 9.49E-16 for HBG1 and

HBG2, respectively.

No common variants affected the expression of BCL11A. The BCL11A hypersensitive site variant

rs1427407 is erythroid specific and in whole blood samples would impact BCL11A expression only in

reticulocytes. However, BCL11A is expressed in leukocytes that are abundant in the blood. BCL11A

expression in these nucleated cells with their greater transcriptional activity than reticulocytes would

render an erythroid-specific signal impossible to detect. Our inability to find genes known to influence

HBG1 in the co-expression analysis and a lack of correlation of known HbF regulators with HBG in

GTEx data despite a strong relationship in fetal liver proerythroblasts might have a similar explanation

and illustrate the main limitation of using GTEx data where globin synthesizing reticulocytes were likely

to be <1% of all red blood cells (Means RT 2009). Although few in number, reticulocytes are the only

cells in the blood expressing globin genes, and RNA-seq provides sufficient data for most statistical

analysis.

bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

The unexpected correlation between BCL2L1 and HBG expression (r 0.58 & 0.55; p=2.30E-16) was

validated in fetal proerythroblasts (r> 0.9). BCL2L1 is a member of bcl-2 gene family involved in anti-

apoptotic activities with an important role in erythropoiesis. Induction of bcl-xL (BCL2L1) expression by

erythropoietin and GATA1 was critical for the survival of late proerythroblasts and early normoblasts

(Gregory et al. 1999).The upstream regulation analysis shows that BCL2L1 was regulated by 80 percent

of the regulators that regulate the top 100 co-expressed genes. GATA1 and KLF13 are among these

upstream regulators. These data and the correlation of BCL2L1 with HBG suggest a role for BCL2L1 in

HbF regulation. Experimental validation to confirm the relationship between BCL2L1 and HBG and to

pinpoint the mechanism through which BCL2L1 regulates HBG expression could lead to the discovery of

potential new drug targets or molecules that can be used to induce HbF in sickle cell anemia and b

thalassemia.

Cis- and trans-acting regulators have a differential effect on HBG1 and HBG2 expression. The trans-

acting eQTLs of BCL11A and MYB enhancers affect expression of both g-globin genes. In contrast, only a

single g-globin gene is affected by the cis-acting eQTL. HbF is the major modulator of the severity of

both sickle cell anemia and b thalassemia and considerable effort is being made to develop methods to

increase HBG expression for therapeutic purposes (Deng et al. 2014; Lettre and Bauer 2016). Although

clinical observations suggest that an increase of HBG2 alone has some benefit (Ballas et al. 1991;

Teixeira et al. 2003; Alsultan et al. 2014), modification of trans-acting elements that affect both HbF

genes might have a greater effect on HbF levels and help force a more pancellular distribution that could

be therapeutically important (Sankaran et al. 2011; Steinberg et al. 2014; Guda et al. 2015; Masuda et al.

2016).

bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Methods

Genome-wide eQTL association analysis

Whole blood normalized RNA-seq data, 1000 Genome imputed genotypes, and the covariates file of 338

donors were downloaded from the GTEx portal version 6. The covariates used in the original GTEx data

analysis included 3 genome-wide genotype principal components (PCs), 35 Probabilistic Estimation of

Expression Residuals (PEER) factors to be used as confounders, genotyping platform (Illumina HiSeq

2000 or HiSeq X), and sex (Consortium 2017). We performed genome-wide eQTL analysis using

Efficient and Parallelizable Association Container Toolbox (EPACTS) (EPACTS 2017) and selected

SNPs with minor allele frequency (MAF) ≥ 0.01 and imputation quality score (R^2) ≥ 0.4 to reduce the

rate of false association. We used the first 3 PCs to adjust for population substructure. We also used sex

and platform as covariates to remove any bias introduced by these two components. The 35 PEER factors

were used to adjust for batch effects and experimental confounders. This standard set of covariates were

used in the models to detect eQTLs for HBG1, HBG2, BCL11A, KLF1, MYB, HBB, and HBD. To detect

any significant association after adjusting for the most significant SNPs on chromosomes 2, 6, and 11, we

tested association of normalized expression of HBG1 and HBG2 conditioned on the genotypes of

rs7482144, rs1427407, and rs66650371. Only SNPs that reached genome-wide significance (p-value ≤ 5

× 10-8) were considered statistically significant.

Preliminary studies

To reproduce the genetic association of known QTL variants with HBG expression using GTEx RNA-seq

data from reticulocytes, three SNPs were selected based on their prior association with HbF levels in

GWAS. We chose rs1427407 in the BCL11A erythroid-specific enhancer on chromosome 2p (Bauer et al.

2013), rs10128556, which is in high linkage disequilibrium (LD) with the Xmn1 restriction

polymorphism (rs7482144) on chromosome 11p (Galarneau et al. 2010), and rs9399137 that is in perfect

LD with the 3- deletion (rs66650371) in the HBS1L-MYB intergenic polymorphism (HMIP)

region on chromosome 6q (Farrell et al. 2011). SNPs in LD with rs66650371and rs7482144 were chosen bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

because GTEx pre-calculated eQTL did not contain these SNPs. The GTEx eQTL calculation tool was

used to validate these associations in 338 whole blood samples available in the GTEx portal. We sought

to validate the effect of a BCL11A erythroid-specific-enhancer SNP on HBG2 and HBG1 to confirm that

genotype can be used to predict phenotype and vice versa.

Association of the erythroid-specific BCL11A enhancer variant rs1427407 with whole blood gene

expression

We extracted the genotype of rs1427407 form GTEx data set version 6 and used these data to predict the

impact of this SNP on global gene expression. The same covariates were used in the eQTL analysis, and

Linear Models for Microarray and RNA-seq Data (limma) were used to perform linear regression (Ritchie

et al. 2015). We used false discovery rate (FDR) to correct for multiple hypothesis testing.

HBG1 differential co-expression analysis

Since HBG2 expression is significantly associated with the genotype of rs7482144, we used HBG1

expression as a marker of expression of both g-globin genes. A gene expression matrix of normalized log

expression of RNA-seq along with covariates including 35 PEER factors and sex from 338 whole blood

samples were used as input for limma to perform differential analysis. We used FDR to correct for

multiple testing. Using Ingenuity Pathways Analysis (QIAGEN Inc.,

https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis/) we performed pathway

analysis using the top100 co-expressed genes to see whether there was any enrichment in pathways

involved in HbF regulation. In addition, we examined upstream regulators of the genes in the top100

HBG1 co-expressed gene list to identify HbF transcriptional regulators

that can explain the observed gene expression changes.

bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Correlation analysis between known and potential HbF regulators and HBG expression

We performed Pearson correlation between HBG expression and the expression of BCL11A, KLF1, MYB,

ZBTB7A, SIRT1 and BCL2L1 on 338 GTEx whole blood samples. These genes were selected because of

their role or putative role as modulators of HBG expression, except for BCL2L1 which was selected based

on our HBG1 co-expression and upstream regulators analysis (Jiang et al. 2006; Zhou et al. 2010; Masuda

et al. 2016; Dai et al. 2017).

To replicate parts of our GTEx-based analysis, the same correlation analysis was done on data from

erythroid samples acquired from Gene Expression Omnibus (GEO; accession number GSE59089) (Xu et

al. 2015). These data contained RNA-seq derived from primary human fetal liver proerythroblasts studied

at different developmental stages after shRNA-mediated knockdown of Polycomb Repressive Complex 2

(PRC2) core subunits.

Data Access

Whole blood normalized RNA-seq data, 1000 Genome imputed genotypes, and the covariates file of 338

donors were downloaded from the GTEx portal version 6.

RNA-seq data from 15 erythroid samples were acquired from Gene Expression Omnibus (GEO;

accession number GSE59089).

bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

References

Alsultan A, Alabdulaali MK, Griffin PJ, Alsuliman AM, Ghabbour HA, Sebastiani P, Albuali WH, Al- Ali AK, Chui DH, Steinberg MH. 2014. Sickle cell disease in Saudi Arabia: the phenotype in adults with the Arab-Indian haplotype is not benign. British Journal of Haematology 164: 597-604. Ballas SK, Talacki CA, Adachi K, Schwartz E, Surrey S, Rappaport E. 1991. The Xmn I site (-158, C-- --T) 5' to the G gamma gene: correlation with the Senegalese haplotype and G gamma globin expression. Hemoglobin 15: 393-405. Bauer DE, Kamran SC, Lessard S, Xu J, Fujiwara Y, Lin C, Shao Z, Canver MC, Smith EC, Pinello L et al. 2013. An erythroid enhancer of BCL11A subject to genetic variation determines fetal hemoglobin level. Science 342: 253-257. Chen D, Zuo Y, Zhang X, Ye Y, Bao X, Huang H, Tepakhan W, Wang L, Ju J, Chen G et al. 2017. A Genetic Variant Ameliorates beta-Thalassemia Severity by Epigenetic-Mediated Elevation of Human Fetal Hemoglobin Expression. American Journal of Human Genetics 101: 130-138. Consortium GT. 2017. Genetic effects on gene expression across human tissues. Nature 550: 204. Craig JE, Rochette J, Fisher CA, Weatherall DJ, Marc S, Lathrop GM, Demenais F, Thein S. 1996. Dissecting the loci controlling fetal haemoglobin production on chromosomes 11p and 6q by the regressive approach. Nature Genetics 12: 58-64. Dai Y, Chen T, Ijaz H, Cho EH, Steinberg MH. 2017. SIRT1 activates the expression of fetal hemoglobin genes. American Journal of Hematology 92: 1177-1186. Dehghani H, Ghobakhloo S, Neishabury M. 2016. Electromobility Shift Assay Reveals Evidence in Favor of Allele-Specific Binding of RUNX1 to the 5ʹ Hypersensitive Site 4-Locus Control Region. Hemoglobin 40: 236-239. Deng W, Rupon JW, Krivega I, Breda L, Motta I, Jahn KS, Reik A, Gregory PD, Rivella S, Dean A et al. 2014. Reactivation of developmentally silenced globin genes by forced chromatin looping. Cell 158: 849-860. EPACTS. 2017. EPACTS: Efficient and Parallelizable Association Container Toolbox. . In May 21. Farrell JJ, Sherva RM, Chen ZY, Luo HY, Chu BF, Ha SY, Li CK, Lee AC, Li RC, Yuen HL et al. 2011. A 3-bp deletion in the HBS1L-MYB intergenic region on chromosome 6q23 is associated with HbF expression. Blood 117: 4935-4945. Galarneau G, Palmer CD, Sankaran VG, Orkin SH, Hirschhorn JN, Lettre G. 2010. Fine-mapping at three loci known to affect fetal hemoglobin levels explains additional genetic variation. Nature Genetics 42: 1049-1051. Garner C, Mitchell J, Hatzis T, Reittie J, Farrall M, Thein SL. 1998. Haplotype mapping of a major quantitative-trait locus for fetal hemoglobin production, on chromosome 6q23. American Journal of Human Genetics 62: 1468-1474. Gregory T, Yu C, Ma A, Orkin SH, Blobel GA, Weiss MJ. 1999. GATA-1 and Erythropoietin Cooperate to Promote Erythroid Cell Survival by Regulating bcl-xL Expression. Blood 94: 87-96. bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

GTEx_Consortium. 2013. The Genotype-Tissue Expression (GTEx) project. Nature Genetics 45: 580-585. Guda S, Brendel C, Renella R, Du P, Bauer DE, Canver MC, Grenier JK, Grimson AW, Kamran SC, Thornton J et al. 2015. miRNA-embedded shRNAs for Lineage-specific BCL11A Knockdown and Hemoglobin F Induction. Molecular Therapy 23: 1465-1474. Jiang J, Best S, Menzel S, Silver N, Lai MI, Surdulescu GL, Spector TD, Thein SL. 2006. cMYB is involved in the regulation of fetal hemoglobin production in adults. Blood 108: 1077- 1083. Lettre G, Bauer DE. 2016. Fetal haemoglobin in sickle-cell disease: from genetic epidemiology to new therapeutic strategies. The Lancet 387: 2554-2564. Liu LR, Du ZW, Zhao HL, Liu XL, Huang XD, Shen J, Ju LM, Fang FD, Zhang JW. 2005. T to C substitution at -175 or -173 of the gamma-globin promoter affects GATA-1 and Oct-1 binding in vitro differently but can independently reproduce the hereditary persistence of fetal hemoglobin phenotype in transgenic mice. Journal of Biological Chemistry 280: 7452-7459. Masuda T, Wang X, Maeda M, Canver MC, Sher F, Funnell AP, Fisher C, Suciu M, Martyn GE, Norton LJ et al. 2016. Transcription factors LRF and BCL11A independently repress expression of fetal hemoglobin. Science 351: 285-289. Means RT GB, General Considerations., Greer JP, Foerster J, Rodgers GM, eds. 2009. Anemia. In Wintrobe’s Clinical Hematology, Vol 1. Lippincott Williams and Wilkins, Philadelphia, PA. Morrison TA, Wilcox I, Luo HY, Farrell JJ, Kurita R, Nakamura Y, Murphy GJ, Cui S, Steinberg MH, Chui DHK. 2017. A long noncoding RNA from the HBS1L-MYB intergenic region on chr6q23 regulates human fetal hemoglobin expression. Blood Cells, Molecules, and Diseases 69: 1-9. Mtatiro SN, Singh T, Rooks H, Mgaya J, Mariki H, Soka D, Mmbando B, Msaki E, Kolder I, Thein SL et al. 2014. Genome wide association study of fetal hemoglobin in sickle cell anemia in Tanzania. PloS One 9: e111464. Nagel RL, Fabry ME, Pagnier J, Zohoun I, Wajcman H, Baudin V, Labie D. 1985. Hematologically and genetically distinct forms of sickle cell anemia in Africa. The Senegal type and the Benin type. New England Journal of Medicine 312: 880-884. Patrinos GP, Kollia P, Loutradi-Anagnostou A, Loukopoulos D, Papadakis MN. 1998. The Cretan type of non-deletional hereditary persistence of fetal hemoglobin [A gamma-158C-->T] results from two independent gene conversion events. Human Genetics 102: 629-634. Rahimi Z, Vaisi-Raygani A, Merat A, Haghshenass M, Rezaei M. 2015. Level of Hemoglobin F and Gg Gene Expression in Sickle Cell Disease and Their Association with Haplotype and XmnI Polymorphic Site in South of Iran. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. 2015. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43: e47. Sankaran VG, Menne TF, Šćepanović D, Vergilio J-A, Ji P, Kim J, Thiru P, Orkin SH, Lander ES, Lodish HF. 2011. MicroRNA-15a and -16-1 act via MYB to elevate fetal hemoglobin expression in human trisomy 13. Proceedings of the National Academy of Sciences doi:10.1073/pnas.1018384108. bioRxiv preprint doi: https://doi.org/10.1101/304899; this version posted April 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Schroeder WA, Huisman TH, Shelton JR, Shelton JB, Kleihauer EF, Dozy AM, Robberson B. 1968. Evidence for multiple structural genes for the gamma chain of human fetal hemoglobin. Proceedings of the National Academy of Sciences of the United States of America 60: 537-544. Shaikho EM, Habara AH, Alsultan A, Al-Rubaish A, Al-Muhanna F, Naserullah Z, Alsuliman A, Qutub HO, Patra P, Sebastiani P. 2016. Variants of ZBTB7A (LRF) and its β-globin gene cluster binding motifs in sickle cell anemia. Blood cells, molecules & diseases 59: 49. Stadhouders R, Aktuna S, Thongjuea S, Aghajanirefah A, Pourfarzad F, van Ijcken W, Lenhard B, Rooks H, Best S, Menzel S et al. 2014. HBS1L-MYB intergenic variants modulate fetal hemoglobin via long-range MYB enhancers. The Journal of clinical investigation 124: 1699-1710. Steinberg MH, Chui DH, Dover GJ, Sebastiani P, Alsultan A. 2014. Fetal hemoglobin in sickle cell anemia: a glass half full? Blood 123: 481-485. Teixeira SM, Cortellazzi LC, Grotto HZ. 2003. Effect of hydroxyurea on G gamma chain fetal hemoglobin synthesis by sickle-cell disease patients. Brazilian Journal of Medical and Biological Research 36: 1289-1292. Terasawa T, Ogawa M, Porter P, Karam J. 1980. G gamma and A gamma globin-chain biosynthesis by adult and umbilical cord blood erythropoietic bursts and reticulocytes. Blood 56: 93-97. Thein SL, Wainscoat JS, Sampietro M, Old JM, Cappellini D, Fiorelli G, Modell B, Weatherall DJ. 1987. Association of thalassaemia intermedia with a beta-globin gene haplotype. British Journal of Haematology 65: 367-373. Uda M, Galanello R, Sanna S, Lettre G, Sankaran VG, Chen W, Usala G, Busonero F, Maschio A, Albai G et al. 2008. Genome-wide association study shows BCL11A associated with persistent fetal hemoglobin and amelioration of the phenotype of beta-thalassemia. Proceedings of the National Academy of Sciences of the United States of America 105: 1620-1625. Vathipadiekal V, Alsultan A, Baltrusaitis K, Farrell JJ, Al-Rubaish A, Al-Muhanna F, Naserullah Z, Alsuliman A, Patra P, Milton JN et al. 2016. Homozygosity for a Haplotype in the HBG2- OR51B4 Region is Exclusive to Arab-Indian Haplotype Sickle Cell Anemia. American Journal of Hematology 91: E308-311. Wood W. 2001. Hereditary Persistence of Fetal Hemoglobin and Thalassemia, in, Disorders of Hemoglobin: Genetics, Pathophysiology and Clinical Management, Chapter15, Steinberg MH, Forget BG, Higgs DR, Nagel RL eds. pp. 356-388. Cambridge University Press. Xu J, Shao Z, Li D, Xie H, Kim W, Huang J, Taylor JE, Pinello L, Glass K, Jaffe JD et al. 2015. Developmental control of polycomb subunit composition by GATA factors mediates a switch to non-canonical functions. Molecular Cell 57: 304-316. Zhou D, Liu K, Sun C-W, Pawlik KM, Townes TM. 2010. KLF1 regulates BCL11A expression and γ- to β-globin gene switching. Nature Genetics 42: 742.