Adventitious changes in long-range expression caused by polymorphic structural variation and promoter competition

Karen M. Lowerᵃ, Jim R. Hughesᵃ, Marco De Gobbiᵃ, Shirley Hendersonᵇ, Vip Viprakasitᶜ, Chris Fisherᵃ, Anne Gorielyᵃ, Helena Ayyubᵃ, Jackie Sloane-Stanleyᵃ, Douglas Vernimmenᵃ, Cordelia Langfordᵈ, David Garrickᵃ, Richard J. Gibbonsᵃ, and Douglas R. Higgsᵃ,¹

ᵃMedical Research Council Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, The John Radcliffe Hospital, Headington, Oxford, OX3 9DS, United Kingdom; ᵇNational Haemoglobinopathy Reference Laboratory, Oxford Radcliffe Hospitals NHS Trust, Oxford, OX3 7LJ, United Kingdom; ᶜDepartment of Paediatrics, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand; and ᵈMicroarray Facility, The Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, United Kingdom

GENETICS

Edited by Mark T. Groudine, Fred Hutchinson Cancer Research Center, Seattle, WA, and approved October 14, 2009 (received for review August 17, 2009)

Author contributions: K.M.L., D.G., R.J.G., and D.R.H. designed research; K.M.L., J.R.H., M.D.G., H.A., and D.V. performed research; J.R.H., S.H., V.V., C.F., A.G., J.S.-S., and C.L. contributed new reagents/analytic tools; K.M.L. analyzed data; and K.M.L., R.J.G., and D.R.H. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

¹To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0909331106/DCSupplemental.

It is well established that all of the cis-acting sequences required for fully regulated α-globin expression are contained within a region of ≈120 kb of conserved synteny. Here, we show that activation of this cluster in erythroid cells dramatically affects expression of apparently unrelated and noncontiguous genes in the 500 kb surrounding this domain, including a gene (NME4) located 300 kb from the α-globin cluster. Changes in NME4 expression are mediated by physical cis-interactions between this gene and the α-globin regulatory elements. Polymorphic structural variation within the globin cluster, altering the number of α-globin genes, affects the pattern of NME4 expression by altering the competition for the shared α-globin regulatory elements. These findings challenge the concept that the genome is organized into discrete, insulated regulatory domains. In addition, this work has important implications for our understanding of genome evolution, the interpretation of genome-wide expression, expression-quantitative trait loci, and copy number variant analyses.

looping | copy number variants | globin | allele-specific expression | 4C

Recent global analyses of mammalian genomes have revised our view of the relationship between genome organization and the regulation of gene expression. It was previously thought that the genome might be arranged as a series of independently regulated chromosomal domains flanked by boundary elements (1). In contrast, it is now clear that cis-acting regulatory elements (locus control regions, enhancers, silencers, enhancer blockers, and chromatin barrier elements), controlling tissue- or developmental stage-specific genes, may be dispersed over tens to thousands of kilobases (2, 3). Furthermore, we now know that in gene-rich regions such elements are commonly interspersed with widely expressed genes (2). These observations raise important general questions, such as: How does the activation of specialized regulatory elements and their cognate genes influence the expression of other apparently unrelated genes in a shared chromosomal environment? How do common structural variants that alter genome architecture affect gene expression? What, if any, are the consequences of such apparently adventitious effects on gene expression?

To investigate these issues in detail we have examined the pattern of gene expression across a large segment of the human genome and studied how polymorphic variation in this region may influence long-range patterns of gene expression. In particular, we analyzed a well-characterized, gene-dense, telomeric region of the genome (16p13.3) containing the human α-like globin genes (ζ, α2, and α1), which are activated and transcribed at very high levels only in erythroid cells (4, 5). We have previously described critical, remote regulatory elements controlling α-globin expression, MCS-R1 to -R4 (representing previously identified DNaseI hypersensitive sites HS-48, HS-40, HS-33, and HS-10, respectively), three of which lie within the introns of a widely expressed gene (C16orf35) lying 50 to 70 kb upstream of the α-like globin genes (4). In addition, we have also shown that a 120-kb region of conserved synteny containing the human α-like globin genes, together with their major upstream regulatory element (MCS-R2, also called HS-40), is sufficient to obtain optimal tissue- and developmental stage-specific expression in a mouse model (6). However, here we have asked whether globin gene activation within this region has more far-reaching consequences, by affecting the expression of apparently unrelated genes in the surrounding chromosomal neighborhood.

To investigate this hypothesis, we examined the expression of 14 genes in an extensive region (500 kb) surrounding the α-globin cluster in nonerythroid cells (when the α-globin genes are silent) and in erythroid cells (when the α-globin genes and their regulatory elements are fully active). When the α-globin genes are switched on, expression of the functionally unrelated gene (C16orf35), containing the α-globin regulatory elements, is increased by ≈30-fold. In addition, we have shown that another apparently unrelated gene (NME4), located 300 kb from the α-globin cluster (in which we have identified a potential erythroid cis-acting element), physically interacts with, and is regulated by, MCS-R2, such that its expression is also increased 10-fold in erythroid cells. All other genes lying between MCS-R2 and NME4 are unaffected. When the α-globin genes are deleted from this chromosomal region, expression of NME4 (300 kb away) is further increased by 8-fold, as a result of increased competition for the shared regulatory element (MCS-R2). Because α-globin deletions have been selected to reach high frequencies in many populations (as they cause α-thalassemia, which protects against falciparum malaria), the levels of NME4 will be expected to vary in such populations in parallel with changes in the number of cis-linked α-globin genes.

This study therefore demonstrates a common mechanism by which patterns and levels of gene expression across a large chromosomal region may radically change in an unexpected way. Common structural polymorphisms in the α-globin genes, which have been selected during evolution, have a dramatic effect on expression of an unrelated gene (NME4) lying 300 kb away in what appears to be a shared chromosomal environment. These findings have important, general implications for the evolution of the genome, and for understanding how common expression quanti-

www.pnas.org/cgi/doi/10.1073/pnas.0909331106 PNAS | December 22, 2009 | vol. 106 | no. 51 | 21771–21776

Fig. 1. Overview and expression analysis of the terminal 500 kb of human chromosome 16p, containing the α-globin cluster. (A) Representation of the genes contained within this chromosomal region. Conserved synteny with the mouse region and the MCS-R region, which is conserved and required for full expression of the α-globin genes, are shown. The minimal regions deleted in all cases of α-thalassemia affecting the MCS-R elements (ΔMCS-R), one (-α), or two (--) α genes are shown. ChIP for the activating chromatin mark H4ac, the erythroid-specific binding factors GATA1 and SCL, and RNA polymerase II (PolII) was carried out in erythroid cells and hybridized to a tiled microarray covering this region (ChIP-chip). Tracks are representative of a minimum of two biological replicates. (B) Expression of genes contained within this region, and an erythroid control gene EPOR, in hES and erythroid cells. Expression was normalized to 18S. Values represent an average of three biological replicates ± 1 standard deviation. The y axis is a log scale. (C) Schematic representation of the genomic structure of NME4 and eNME4. (Black boxes) Exons; (gray box) alternative erythroid-specific exon; (full line) introns; (dashed lines) splicing of mature transcript. Amplicons used for expression analysis are shown; further information can be found in Tables S1–S3. The highly polymorphic SNP used for allele-specific expression (rs14293) is shown in red.
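The normalization described for panel B (expression relative to 18S, averaged over three biological replicates ± 1 standard deviation) can be illustrated with a minimal Python sketch. It assumes a ΔCt calculation with perfect amplification efficiency, which the text does not specify; the function names and the Ct values are hypothetical and are not data from the study.

```python
import statistics


def relative_expression(ct_gene, ct_ref, efficiency=2.0):
    """Expression of a gene relative to a reference (here 18S) from Ct values.

    Assumes the delta-Ct model: each PCR cycle multiplies product by `efficiency`.
    """
    return efficiency ** -(ct_gene - ct_ref)


def summarize(replicates):
    """Mean and sample standard deviation across biological replicates."""
    return statistics.mean(replicates), statistics.stdev(replicates)


# Hypothetical (gene Ct, 18S Ct) pairs for three biological replicates.
hypothetical_cts = [(28.0, 10.0), (28.5, 10.2), (27.8, 9.9)]
values = [relative_expression(gene, ref) for gene, ref in hypothetical_cts]
mean, sd = summarize(values)
print(f"relative expression = {mean:.3e} +/- {sd:.1e}")
```

Because 18S is far more abundant than most mRNAs, values computed this way are very small, which is consistent with plotting them on a log scale.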

tative trait loci and copy number variants (CNVs) may influence gene expression across long segments of the human genome.

Results

Analysis of the Expression of Genes in the Telomeric Region of Chromosome 16. The expression of 14 genes contained in the terminal 500-kb region of chromosome 16 (16p13.3) (Fig. 1A) was examined in human embryonic stem cells (hES), where the α genes are largely silent, and in erythroid cells, in which they are fully activated. Most genes showed no increase in expression in erythroid cells (Fig. 1B). However, two genes within this 500-kb region (in addition to the α-globin genes) are up-regulated to levels similar to that of the erythroid-specific control gene, EPOR. C16orf35 (up-regulated by a factor of 27) is a highly conserved gene of unknown function containing the erythroid MCS-R elements, which become activated during erythropoiesis (4). A second gene specifically up-regulated in erythroid cells, NME4 [a nucleoside diphosphate kinase (7); expression increased by a factor of 12], lies ≈300 kb downstream of the α cluster, far beyond the region of conserved synteny. Erythroid-specific up-regulation of both C16orf35 and NME4 was also confirmed by comparison with another nonerythroid cell type (EBV-transformed B lymphocytes) (Fig. S1).

Although widely expressed (8), NME4 acquires an increase in activating chromatin modifications (H4ac, H3ac, H3K4me2, H3K4me3) in erythroid cells (Fig. S2). In addition, in erythroid cells, this gene is bound by GATA1 and SCL, which are components of the pentameric erythroid-specific transcription factor complex (consisting of SCL, GATA1, LDB1, E2A, and LMO2) (5). High levels of RNA polymerase II binding were also observed (see Fig. 1A and Fig. S2). Further characterization of NME4 identified a GATA1 binding site within intron 3 (Fig. S3A), which colocalizes with the observed binding of GATA1 and SCL (see Fig. 1A and Fig. S2B). This erythroid-specific transcription factor binding site lies within an internal promoter, which directs expression of a truncated, erythroid-specific transcript which we refer to as eNME4 (Fig. 1C and Fig. S3B). Both the eNME4 and the full-length NME4 transcript are up-regulated in erythroid cells. All other genes tested in this 500-kb chromosomal region, including five genes lying between the α-globin cluster and NME4, are expressed at similar levels in both erythroid and nonerythroid (hES and EBV) cells (see Fig. 1B and Fig. S1).

Expression of a Gene Located 300 kb from the α-Cluster Is Controlled by the α-Globin Regulatory Elements. To determine whether expression of eNME4 in erythroid cells is regulated by the α-globin MCS-R elements, we analyzed expression of eNME4 mRNA tran-

scripts in red blood cells obtained from rare individuals, each with a different deletion of the MCS-Rs on one allele (ΔMCS-R/αα) (9) (see Fig. 1A). We found that while the expression of control erythroid-specific genes [EPOR and β-globin (HBB)] was not significantly affected by these deletions, eNME4 was significantly reduced (n = 7, P = 0.01) to ≈50% of its normal level of expression (Fig. 2A). This suggested that enhanced expression of the NME4 erythroid-specific transcript, like α-globin, depends on the α-globin MCS-R elements located ≈300 kb upstream of this gene.

Given the relatively large distance between NME4 and the α-globin regulatory elements, it seemed possible that the effect on expression might be mediated in cis or in trans [as suggested for other long-range interactions (reviewed in ref. 10)]. To establish the effect of the MCS-R deletions on each copy of NME4, we analyzed allele-specific expression in red blood cells, using a highly polymorphic, synonymous transcribed SNP in exon 4 of NME4 (A/G, rs14293). This SNP is contained in both the full-length transcript and the truncated erythroid-specific transcript (Fig. 1C). With one exception, six individuals with MCS-R deletions, who were informative for this SNP, displayed allele-specific expression of NME4 (Fig. 2B). Using somatic cell hybrids containing a single copy of human chromosome 16 derived from these individuals, we determined which SNP was in phase with the deletion and found that in all cases examined, deletion of the α-globin MCS-R occurred in cis to the eNME4 allele that was under-expressed. The reduced level of expression specifically from one allele in cis with the MCS-R deletion demonstrates that the erythroid-specific enhanced expression of NME4/eNME4 is regulated in cis by the α-globin major regulatory element (MCS-R2).

Identification of Long-Range Interactions Across the Terminal Region of Chromosome 16. We hypothesized that this functional interaction between the MCS-Rs and NME4/eNME4 was mediated via a physical interaction. We have previously used chromosome conformation capture (3C) to demonstrate physical interactions between the α-globin genes and the MCS-Rs both in mouse (11) and human (12). Recently, we and others (13, 14) have developed this methodology into a modified circular 3C method (4C), which, after cross-linking and ligation of chromosome loops, uses an inverse PCR protocol to detect all physical interactions with a genomic fragment of interest. Our assay has been modified to give extremely high sensitivity and is analyzed using a microarray platform (see Methods). By using a DpnII fragment containing MCS-R2 (the major α-globin regulatory element) as the anchor fragment, we performed 4C analysis on nonerythroid and erythroid cells (Fig. 3A).

In nonerythroid cells there are no interactions between MCS-R2 and genes in the α-globin locus (see Fig. 3A Top). In erythroid cells, where MCS-R2 becomes active and the α-globin genes are highly transcribed, there is a strong interaction between MCS-R2 and the α-globin genes (see Fig. 3A Middle, erythroid controls 1 and 2). In these cells, although there is not a consistent interaction with NME4, we have observed rare interactions represented by small peaks of enrichment (for example, erythroid control 2) in approximately one in three 4C experiments (n = 9). Therefore, even though the functional data and some 4C data suggest an interaction does occur between MCS-R2 and NME4 in a normal, intact chromosome, it seems that when the α-globin genes are present, their interaction with the MCS-Rs may out-compete the much weaker and occasional interactions with NME4 (see below). Such infrequent interactions (MCS-R2/NME4) may often lie below the level of detection using this assay.

Deletion of the α-Globin Genes Increases Expression of NME4 via Competition for the Shared Regulatory Element (MCS-R2). The level of expression of eNME4 in erythroid cells is regulated by the α-globin MCS-Rs, and provisional evidence suggests that NME4 and MCS-R2 may physically interact, albeit rarely. Therefore, it seemed likely that NME4, even though it lies hundreds of kilobases from the α-globin cluster, may compete for the activity of the remote regulatory elements (MCS-R1 to -R4). To test this hypothesis, we examined the expression of eNME4 in the red cells of patients who have inherited chromosomes with either one (-α) or no (--) α genes rather than the normal duplicated pair of genes (αα). We observed that deletion of a single α-globin gene (-α) resulted in a small increase (factor of two) in expression of eNME4 (see Fig. 2A). However, deletions removing both α-globin genes (--) resulted in a dramatic increase (8-fold) in eNME4 expression when compared to normal controls (n = 22, P = 3.18 × 10⁻⁹) (see Fig. 2A). This group consisted of individuals each carrying mutations on one of three different chromosomes (--SEA, n = 18; --FIL, n = 3; --MED, n = 1).

Fig. 2. Allele-specific effect on the expression of eNME4 by various deletions of the α-globin cluster. (A) Expression of the three erythroid-specific genes in the terminal 500 kb of chromosome 16p, and two erythroid-specific control genes, EPOR and HBB. For each gene, the expression in control samples is set to 100%, and each group of deletions is calculated relative to controls. Student’s t test P values are calculated for each gene for each group of samples compared to controls; *, P < 0.05; **, P < 0.005. Controls, n = 15; ΔMCS-R/αα, n = 7; -α/αα, n = 14; --/αα, n = 22. (B) The proportion of the G allele of NME4 contributing to total transcription, as determined by pyrosequencing (see Tables S4 and S5 for details). All samples are heterozygous (A/G) for SNP rs14293 (controls, n = 12; ΔMCS-R/αα, n = 7; --/αα, n = 11) except for an A/A and a G/G control (indicated by *). Samples for which phase of the α-globin locus deletion and the SNP allele could be determined are shown in color; deletions in phase with the A allele are shown in red; deletions in phase with the G allele are shown in green; samples where phase could not be determined are shown in black. The P value is calculated by an F test for differences in variation.
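The comparison summarized for Fig. 2A (control mean set to 100%, each deletion group compared with controls by a two-tailed t test assuming unequal variance) can be sketched in Python as follows. This is an illustrative sketch only: the function names and the input values are hypothetical, and it stops at the Welch t statistic and degrees of freedom rather than reproducing the authors' analysis.

```python
import math
import statistics


def percent_of_control(samples, controls):
    """Express each sample relative to the control mean, which is set to 100%."""
    control_mean = statistics.mean(controls)
    return [100.0 * x / control_mean for x in samples]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical normalized eNME4 values; these are NOT data from the study.
controls = [1.0, 1.2, 0.8, 1.1, 0.9]
deletion = [7.5, 8.4, 9.0, 7.1]

ctrl_pct = percent_of_control(controls, controls)
del_pct = percent_of_control(deletion, controls)
t, df = welch_t(del_pct, ctrl_pct)
print(f"deletion mean = {statistics.mean(del_pct):.0f}% of control, "
      f"t = {t:.2f}, df = {df:.1f}")
```

A two-tailed P value would then be read from the t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same unequal-variance test and returns the P value directly.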

Fig. 3. 4C analysis from MCS-R2 identifies a physical interaction with NME4 in --/αα erythroid material. (A) 4C material hybridized to a tiled microarray. The dashed line represents the fixed fragment of MCS-R2. Shaded boxes show α-globin locus and NME4; nonerythroid, EBV-transformed B-lymphocyte cell line; erythroid, two-phase culture system for generation of erythroid cells. All tracks are representative of two biological replicates. Zoomed section shows signal from the Lower track. Actual enrichment of 4C-amplified material relative to genomic DNA (based on real-time PCR; QPCR) is shown for two amplicons (389776 and 396875). The y axis is a log scale. Arrows represent transcription of NME4 and eNME4. Primer sequences can be found in Table S6. (B) Pyrosequencing tracks from an --/αα individual informative for SNP rs14293 at NME4. (Upper) Genomic DNA; (Lower) 4C-amplified DNA. Peaks used for calculations are shaded; peak 4 corresponds to the G allele, peak 8 corresponds to the A allele; dispensation order of nucleotides is shown on the x axis; E, enzyme; S, substrate. Further information can be found in Tables S4 and S5.

[Figure 3 image not reproduced. Values legible in the panels: QPCR enrichment relative to genomic DNA, 200× and 0.8×; % of total signal in B, 52:48 for genomic DNA and 97:3 for 4C-amplified DNA.]
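The allele ratios read from the pyrosequencing tracks can be illustrated with a short Python sketch. It assumes, as a simplification not stated in the source, that the proportion of one allele is its peak height divided by the sum of the two allele-specific peak heights, with no background correction, and that duplicate runs are averaged. The function names and peak heights are hypothetical.

```python
def allele_proportion(g_peak, a_peak):
    """Fraction of signal from the G allele, given two allele-specific peak heights."""
    total = g_peak + a_peak
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return g_peak / total


def mean_of_duplicates(runs):
    """Average the G-allele proportion over duplicate pyrosequencing runs."""
    proportions = [allele_proportion(g, a) for g, a in runs]
    return sum(proportions) / len(proportions)


# Hypothetical (G-allele peak, A-allele peak) heights in arbitrary units.
genomic_runs = [(50.0, 50.0), (51.0, 49.0)]
enriched_runs = [(90.0, 10.0), (92.0, 8.0)]

print(f"genomic DNA:  G proportion = {mean_of_duplicates(genomic_runs):.2f}")
print(f"enriched DNA: G proportion = {mean_of_duplicates(enriched_runs):.2f}")
```

A heterozygous genomic DNA sample is expected to give a proportion near 0.5, so a marked departure from 0.5 in cDNA or 4C-amplified material indicates skewing toward one allele.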

As described before, we also determined whether up-regulation of eNME4 was caused by an increase in expression from one or both alleles. We found highly skewed patterns of NME4 expression in all informative individuals carrying these deletions (see Fig. 2B). Again, by generating somatic cell hybrids where material was available, we were able to link the nucleotide at the NME4 SNP to the deletion, and found that the up-regulated allele was always on the chromosome from which the α-globin gene deletion had occurred. This confirms that not only is the expression of eNME4 under the influence of the MCS-Rs lying ≈300 kb away, but the level of expression dramatically increases as the number of competing α-globin promoters in cis is reduced.

Identification of Long-Range Interactions Between NME4 and the α-Globin Regulatory Elements in the Absence of the α-Globin Genes. Because expression of NME4 increases (from the affected “--” allele) in the absence of the α-globin promoters, it seemed possible that the previously noted rare interactions between MCS-R2 and NME4 in erythroid cells from nonthalassemic individuals (see above) might be increased in frequency, and therefore more readily detectable. To test this, we carried out 4C analysis on erythroid material from individuals heterozygous for deletions of both α-globin genes (--/αα), a common cause of α-thalassemia in some regions. In this material, the interaction with the α-globin genes (from the intact [αα] chromosome) remains (see Fig. 3A Lower). However, now (observed in two independent experiments) there is clearly a more prominent interaction between MCS-R2 and NME4. This interaction is not restricted to the erythroid-specific promoter of NME4 (contained in intron 3) but is equally spread across the full length of the gene (see Fig. 3A, zoomed section). In addition to NME4, we observed a number of other interacting fragments associated with various genes along this region. Expression of these genes was analyzed in erythroid material, and they were found either not to be expressed in red blood cells (RHBDF1, MPG, MRPL28, DECR2) or not to show significantly different expression between controls and --/αα erythroid material (e.g., LUC7L) (Fig. S4). At present we do not fully understand the functional significance of these interactions; however, we hypothesize that they may represent structural interactions rather than having a functional effect on gene expression.

The interaction between MCS-R2 and NME4 identified by the 4C technique in --/αα erythroid material resulted in a ≈200-fold enrichment of NME4 DpnII-ligated DNA (see Fig. 3A, QPCR). If this interaction between MCS-R2 and NME4 occurs predominantly on the allele in cis with the deletion (as set out above), this allele should be overrepresented in the 4C-amplified material. To test this, we used the same technique as for the allele-specific expression (pyrosequencing of SNP rs14293 in exon 4 of NME4; the amplicon is contained entirely within a single DpnII fragment) on both genomic DNA and 4C-amplified DNA from an --/αα individual informative for this SNP (Fig. 3B). While the genomic DNA has equal representation of both NME4 alleles (see Fig. 3B Upper), DNA obtained from the MCS-R2 4C-amplified DNA is highly skewed toward one allele (97:3) (see Fig. 3B Lower). This is the

Fig. 4. Schematic representation of the effect of activated α-globin MCS-Rs, and structural polymorphisms, on the expression of surrounding genes. (A) In erythroid cells the α-globin MCS-Rs (black bars) up-regulate the α-globin genes (red boxes), and also C16orf35 (yellow box; mechanism unknown) and NME4 (blue box; via a physical interaction in cis). (B) Variation in the number of α-globin genes (shown in this example as deletion of both adult α-globin genes) results in variation in the expression of NME4 through competition for the shared enhancer element. Boxes represent genes as shown in Fig. 1A; gray boxes are unaffected genes. The light gray area represents the region of conserved synteny across the α-globin locus. Arrowhead lines indicate interactions; thickness of the line represents frequency of the interaction.

NME4 allele in cis to the α-globin gene deletion, confirming that removal of the α-globin genes increases the interaction between MCS-R2 and NME4, which can now be readily detected by the 4C analysis.

Discussion

Here we have addressed the general question of how activation of a highly specialized gene cluster located within a gene-rich region of the genome affects expression of other genes in the shared chromosomal environment. By studying one such locus in detail, we have shown that although a region of conserved synteny, spanning ≈120 kb of the human α-globin cluster, is sufficient to obtain optimal tissue- and developmental stage-specific expression of the α-globin genes, this does not delimit the full extent over which globin gene activation exerts an effect. The process of α-globin activation results in significant effects on other apparently unrelated genes (C16orf35 and NME4). It is interesting to note that although C16orf35 lies adjacent to the α-globin cluster and contains the major erythroid-specific regulatory elements, NME4 lies 300 kb away from these elements. The α-globin regulatory elements appear to bypass, and therefore not affect expression of, at least five other genes contained within this chromosomal region, and yet specifically up-regulate NME4. This may be related to the chance appearance of an erythroid-specific element in this gene (see below).

It is also of general interest that this 500-kb region contains numerous potential enhancer blocker or boundary elements (DNaseI hypersensitive sites associated with binding of CTCF) (Fig. S5), which clearly do not act as such, at least in erythroid cells. These observations (and others, for example refs. 15 and 16) question the models of the genome in which genes are thought to be compartmentalized and insulated from activation or repression by the activity of nearby, unrelated cis elements and genes.

The mechanism by which expression of a nearby gene (C16orf35, which contains the α-globin MCS-R elements) is up-regulated is not clear. Others have suggested that such “bystander” activation simply results from location of a gene within an active chromatin domain (17), although the details of this mechanism have not been addressed. By contrast, we have shown that activation of NME4 (300 kb away) results from a direct physical interaction between multiprotein complexes assembled at an erythroid-like element at NME4 and at the α-globin MCS-R elements, as observed for other long-range enhancer/promoter interactions (11, 18).

It is interesting that the influence of MCS-R2 on NME4 expression is modulated by the number of α-globin genes in cis. This finding was most rigorously tested by the fact that the expression of eNME4 was up-regulated in individuals, each carrying one of three independent deletions that remove both α-globin genes. As the only genetic feature these individuals share is the deletion of the α-globin genes, it is compelling evidence that this is indeed the causative factor in this up-regulation.

Previous studies have suggested that closely linked promoters may compete for the activity of a shared enhancer (19, 20). This principle also seems to apply to the α-globin cluster where, from chromosomes containing between one (-α) to five (ααααα) identical α genes, the α-globin output does not increase in a linear fashion but appears to be limited by the available interaction with a single MCS-R (or complex formed by more than one MCS element): the most proximal gene competing most efficiently and the most distal gene competing least (reviewed in ref. 21). Also consistent with this competitive model, we recently demonstrated that a regulatory SNP lying between MCS-R2 and the normal α promoters (αα), which creates a new erythroid promoter, appears to out-compete the more distal α-globin promoters for access to MCS-R2, thereby causing α-thalassemia (22). These principles also seem to apply to the interaction between the α-globin regulatory elements and NME4, the observations being most readily explained by competition between this gene and the α-globin promoters for the MCS elements. This leads to a situation in which polymorphic variation in the number of α-globin genes radically affects expression of an unrelated gene located 300 kb away.

An important question is whether the apparently unrelated genes have any significant biological function. It has been previously argued that activated bystander genes have arisen by chance, and have no known biological function (17, 23). Although the role of C16orf35 is currently unknown, a promoter knockout model of this gene has no obvious additional effects on erythropoiesis. In the case of NME4, in the mouse this gene is located on a separate chromosome from the α-globin cluster, and its expression is not up-regulated in murine erythroid cells (Fig. S3C). This suggests that NME4 does not play a general role in erythropoiesis, but is an example of a gene that (in humans) has become activated by the MCS-R elements by chance. Clearly this type of mechanism, in general, could play an important role in the recruitment of novel genes to existing biological circuits during evolution.

It seems likely that the principles established at this well-characterized locus will apply to many other regions of the genome. The adventitious activation of apparently unrelated genes clearly provides a potential pitfall in the interpretation of global gene expression studies and expression quantitative trait loci data, as changes in the expression of some genes may play no role in the processes being studied. In particular, our findings are relevant to the interpretation of CNV data. Recent genome-wide studies have shown that not only do CNVs account for a large proportion of heritable variation in gene expression, but surprisingly up to 50% of this variation is attributable to genes lying beyond the CNV interval (24). Variation in the number of α-globin genes, generated by frequent homologous recombination between duplicated sequences, provided one of the first examples of a polymorphic CNV in human populations (21). Here we have shown how such CNVs may alter the expression of coordinately regulated genes, outside the deletions or insertions, across hundreds of kilobases of a chromosome, almost certainly by altering competition between promoters for a shared regulatory element (Fig. 4). It seems likely that this mechanism will explain some elusive diseases and pheno-

types that are not currently explained by the gain or loss of genes that physically lie within associated CNVs.

Methods

Patients. The patients studied were ascertained by reduced hematological indices. Deletions were confirmed with either Multiplex Ligation-dependent Probe Amplification or Southern blotting. Controls are individuals with no evidence of hematological defects. Consent was obtained in accordance with standard ethics approval guidelines. All individuals are homozygous for the common haplotype surrounding NME4.

Cell Types. Erythroid cells were obtained using a two-phase culture system as previously described (25). Red blood cells were purified from whole blood (according to ref. 26) for expression analysis. hES cells were obtained from the H1 hES cell line, grown according to manufacturer’s instructions. Nonerythroid cells were either primary T lymphocytes (ChIP, Northern analysis) or EBV-transformed B-lymphocyte cell lines (expression analysis, 4C analysis).

Expression Analysis. For all cell types, RNA was extracted with Tri reagent as per manufacturer’s instructions (Sigma). For Northern blots, 20 μg of total RNA were assayed, using the NorthernMax-Gly kit as per manufacturer’s instructions (Ambion). For real-time expression analysis, RNA was DNaseI treated (Ambion) and cDNA was generated with SuperScript III (Invitrogen) as per manufacturer’s instructions. Real-time PCR assays were obtained from either Applied Biosystems’ Assay-On-Demand resource, or designed with Primer Express software. Both +RT and −RT templates were analyzed to detect genomic contamination. Expression in hES and erythroid cells was calculated relative to a control sequence in the 18S ribosomal RNA gene (Eurogentec RT-CKFT-18S). Expression in red blood cells was calculated relative to CD71, to correct for stage of erythropoiesis. For the latter, the mean of expression of each gene in the control samples is set to 100%, and expression in the deletion patient samples is expressed relative to this mean for each gene analyzed. For details of assays, real-time primers and probes, and Northern probes see Tables S1–S3.

Statistical Analysis. Significance of differences in expression between control and deletion samples was calculated with a two-tailed Student’s t test assuming nonequal variance. Significance in variation of allele-specific expression was calculated with an F test.

ChIP and ChIP-chip. ChIP was performed as previously described (5). Briefly, for one immunoprecipitation, 1 × 10⁷ cultured primary human erythroblasts or T lymphocytes were cross-linked with 1% formaldehyde for 10 min. DNA was sheared by sonication to fragments under 500 base pairs. Antibodies used were H3Ac (06–599, Upstate), H4ac (06–866, Upstate), H3K4me2 (07–030, Upstate),

Modified Circular Chromosome Conformation Capture. For 4C, 1 × 10⁷ cells were fixed with 2% formaldehyde in medium for 10 min at room temperature with agitation. Following quenching with glycine, cells were lysed (10 mM Tris pH 8.0, 10 mM NaCl, 0.2% Nonidet P-40, 1× proteinase inhibitor) and resuspended in 1× DpnII digestion buffer (NEB), 0.3% SDS, 2% Triton X-100, and 500 U DpnII at 37 °C overnight with shaking. The enzyme was inactivated at 65 °C for 25 min with shaking, and the total volume resuspended in 7 ml 1× ligation buffer with 1% Triton X-100. Following incubation at 37 °C for 1 h, the samples were cooled on ice for 2 min and 240 units of high-concentration T4 DNA ligase (Fermentas) was added. Following incubation overnight at 16 °C, 300 μg of proteinase K was added and cross-links reversed at 65 °C overnight with rotation. The samples were treated with 15 μg RNase (Roche) at 37 °C for 30 min. Following phenol/chloroform extraction and ethanol precipitation, DNA was resuspended in 500 μl 1× ligation buffer and 60 units high-concentration T4 DNA ligase (Fermentas) and incubated for 2 h at 16 °C with shaking. Following phenol/chloroform extraction and ethanol precipitation, DNA was resuspended in 100 μl water, of which 10 μl was used as template in Advantage-GC PCR (Clontech) as per manufacturer’s instructions. Primer sequences can be found in Table S6. The resultant amplified DNA was ethanol precipitated and resuspended in 20 μl water, of which 5 μl was hybridized to a customized α-globin tiled microarray using sonicated genomic DNA as input, as previously described (5). Enrichment of NME4 in 4C material was quantitated by real-time PCR, and normalized to an unenriched amplicon.

Pyrosequencing. The ratio of expression of allele-specific transcripts and 4C-amplified DNA of NME4 was ascertained by pyrosequencing. Primer and dispensation information is contained in Tables S4 and S5. Peak height is directly proportional to the amount of nucleotide incorporated. Analysis was performed in duplicate and an average obtained.

Sequence Information. All human sequence positions correspond to the International Human Genome Sequencing Consortium Human March 2006 (hg18) Assembly sequence. The NME4 gene corresponds to sequence position chr16:387193–390755; eNME4 corresponds to sequence position chr16:389609–390755.

ACKNOWLEDGMENTS. We thank the clinicians and the members of the families studied for their participation, particularly Dr. C. L. Harteveld (Leiden University Medical Center, The Netherlands), Dr. D. Rund (Hadassah University Hospital, Israel), Dr. D. Filon (Hadassah Medical Center, Israel), Dr. H. Frischknecht (Institute for Medical & Molecular Diagnostics Ltd, Switzerland), Dr. S. L. Thein (King’s College Hospital, United Kingdom), Dr. N. Gattermann (Heinrich-Heine University, Germany), Dr. J.
Finlayson (QEII Medical Centre, Western Australia), and Dr. R. Hutch (Cork, Ireland). We thank Dr. C. Porcher for the kind gift of the SCL H3K4me3 (ab8580, Abcam), CTCF (07–729, Upstate), GATA1 (sc1234, Santa Cruz), antibody, Dr. I. Dunham for assistance with microarrays, the Computational and SCL (gifted by C. Porcher). ChIP DNA was analyzed by real time PCR, calculated Biology Research Group, Oxford University, for bioinformatic support, and Prof. ␤ relative to input, and normalized to -actin promoter. For details of primers and W. Wood for critical reading of the manuscript. This work was supported by the probes see Table S6. ChIP-chip was performed by hybridization to a custom Medical Research Council, the Wellcome Trust and the National Institute for ␣-globin tiled microarray as previously described (5), and enrichment was vali- Health Research Biomedical Research Centre Program. K.M.L. was supported by dated with real-time PCR. an Oxford Nuffield Medical Fellowship, Oxford University.
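Illustrative note on the Expression Analysis and Statistical Analysis steps. The sketch below shows one way the normalization described in Methods could be computed: each value is rescaled so that the control mean equals 100%, and control and deletion groups are then compared with a t statistic that does not assume equal variances (Welch's form). It is a minimal sketch and not the authors' analysis code. The function names and the numeric values are hypothetical and are not data from this study.

```python
from statistics import mean, variance


def percent_of_control(control, deletion):
    """Rescale both groups so that the control mean equals 100%."""
    ref = mean(control)
    return ([100.0 * v / ref for v in control],
            [100.0 * v / ref for v in deletion])


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical CD71-normalized expression values (assumed, for illustration only).
control = [0.9, 1.1, 1.0, 1.0]
deletion = [1.8, 2.2, 2.0]

ctrl_pct, del_pct = percent_of_control(control, deletion)
t_stat, dof = welch_t(del_pct, ctrl_pct)
print(f"deletion mean = {mean(del_pct):.1f}% of control, t = {t_stat:.2f}, df = {dof:.2f}")
```

A two-tailed P value would then be read from a t distribution with the computed degrees of freedom; a statistics library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) performs the whole calculation in one call.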

1. Kim TH, et al. (2007) Analysis of the vertebrate insulator CTCF-binding sites in the human genome. Cell 128:1231–1245.
2. Kleinjan DA, van Heyningen V (2005) Long-range control of gene expression: Emerging mechanisms and disruption in disease. Am J Hum Genet 76:8–32.
3. Dean A (2006) On a chromosome far, far away: LCRs and gene expression. Trends Genet 22:38–45.
4. Hughes JR, et al. (2005) Annotation of cis-regulatory elements by identification, subclassification, and functional assessment of multispecies conserved sequences. Proc Natl Acad Sci USA 102:9830–9835.
5. De Gobbi M, et al. (2007) Tissue-specific histone modification and transcription factor binding in alpha globin gene expression. Blood 110:4503–4510.
6. Wallace HA, et al. (2007) Manipulating the mouse genome to engineer precise functional syntenic replacements with human sequence. Cell 128:197–209.
7. Milon L, et al. (2000) The human nm23-H4 gene product is a mitochondrial nucleoside diphosphate kinase. J Biol Chem 275:14264–14272.
8. Milon L, et al. (1997) nm23-H4, a new member of the family of human nm23/nucleoside diphosphate kinase genes localised on chromosome 16p13. Hum Genet 99:550–557.
9. Higgs DR, Wood WG (2008) Long-range regulation of alpha globin gene expression during erythropoiesis. Curr Opin Hematol 15:176–183.
10. Sexton T, Bantignies F, Cavalli G (2009) Genomic interactions: Chromatin loops and gene meeting points in transcriptional regulation. Semin Cell Dev Biol 20:849–855.
11. Vernimmen D, De Gobbi M, Sloane-Stanley JA, Wood WG, Higgs DR (2007) Long-range chromosomal interactions regulate the timing of the transition between poised and active gene expression. EMBO J 26:2041–2051.
12. Vernimmen D, et al. (2009) Chromosome looping at the alpha globin locus is mediated via the major upstream regulatory element (HS-40). Blood 114:4253–4260.
13. Zhao Z, et al. (2006) Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat Genet 38:1341–1347.
14. Simonis M, et al. (2006) Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat Genet 38:1348–1354.
15. Kokubu C, et al. (2009) A transposon-based chromosomal engineering method to survey a large cis-regulatory landscape in mice. Nat Genet 41:946–952.
16. Bender MA, et al. (2006) Flanking HS-62.5 and 3′ HS1, and regions upstream of the LCR, are not required for beta-globin transcription. Blood 108:1395–1401.
17. Cajiao I, Zhang A, Yoo EJ, Cooke NE, Liebhaber SA (2004) Bystander gene activation by a locus control region. EMBO J 23:3854–3863.
18. de Laat W, Grosveld F (2003) Spatial organization of gene expression: The active chromatin hub. Chromosome Res 11:447–459.
19. Choi OR, Engel JD (1988) Developmental regulation of beta-globin gene switching. Cell 55:17–26.
20. Dillon N, Trimborn T, Strouboulis J, Fraser P, Grosveld F (1997) The effect of distance on long-range chromatin interactions. Mol Cell 1:131–139.
21. Higgs DR, et al. (1989) A review of the molecular genetics of the human alpha-globin gene cluster. Blood 73:1081–1104.
22. De Gobbi M, et al. (2006) A regulatory SNP causes a human genetic disease by creating a new transcriptional promoter. Science 312:1215–1217.
23. Spitz F, Gonzalez F, Duboule D (2003) A global control region defines a chromosomal regulatory landscape containing the HoxD cluster. Cell 113:405–417.
24. Stranger BE, et al. (2007) Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315:848–853.
25. Pope SH, Fibach E, Sun J, Chin K, Rodgers GP (2000) Two-phase liquid culture system models normal human adult erythropoiesis at the molecular level. Eur J Haematol 64:292–303.
26. Beutler E, West C, Blume KG (1976) The removal of leukocytes and platelets from whole blood. J Lab Clin Med 88:328–333.
