
Chromatin conformation signatures: ideal human disease biomarkers?

Human health is related to information stored in our genetic code, which is highly variable even amongst healthy individuals. Gene expression is orchestrated by numerous control elements that may be located anywhere in the genome, and can regulate distal genes by physically interacting with them. These DNA contacts can be mapped with the chromosome conformation capture and related technologies. Several studies now demonstrate that gene expression patterns are associated with specific chromatin structures, and may therefore correlate with chromatin conformation signatures. Here, we present an overview of genome organization and its relationship with gene expression. We also summarize how chromatin conformation signatures can be identified and discuss why they might represent ideal biomarkers of human disease in such genetically diverse populations.

Jennifer L Crutchley1, Xue Qing David Wang1, Maria A Ferraiuolo1 & Josée Dostie†1
1Department of Biochemistry, McGill University, 3655 Promenade Sir-William-Osler, Room 814, Montréal, Québec, Canada
†Author for correspondence: Tel.: +1 514 398 4975; Fax: +1 514 398 7384

KEYWORDS: 3C-carbon copy, biomarker, chromatin conformation signature, chromatin structure, chromosome conformation capture, long-range regulation, transcription

Recent advances in DNA sequencing technology uncovered a tremendous diversity in the human genetic code. Indeed, several million nucleotides were found to differ between individuals, even in the healthy population [1,2]. What might be the impact of such variability on human health? Might sequence variations impart disease susceptibility or differential drug response? As human health is related to information stored in our genetic code, this enormous variability will have a significant impact by affecting the expression of genes. Sequence variation may target genes and regulatory DNA elements directly, or alter gene expression by affecting spatial genome organization. Although still poorly understood, spatial chromatin organization is emerging as an important mechanism to regulate the expression of genes. Therefore, understanding genome organization will be crucial to the development of optimal molecular targeted personalized therapies. In this article, we report on the relationship between gene expression and chromatin structure, beginning with a summary of human genome organization below.

Spatial genome organization in vivo

The ability to store, retrieve and translate instructions from the genetic code is essential to maintain life in all cells. This process is not trivial by any means in human cells given the size of our genome. In fact, understanding this process is not trivial even for much smaller genomes. The human genetic code is composed of over 3 billion nucleotides, which when pieced together, would measure almost a meter in length. Therefore, our genome must be tightly packaged and organized in order to fit within each micron-sized nucleus. Packaging of the human genome is functional rather than random, and there are three defined hierarchical levels of organization (Figure 1) [3–5]. The first level of genome organization is characterized by the linear arrangement of genes and regulatory sequences (or ‘DNA elements’) along chromosomes. This first dimension includes clusters of genes and their regulatory DNA elements. Gene clusters composed of evolutionarily duplicated genes tend to encode proteins with similar functions and with tissue-specific expression patterns defined by their regulatory elements. Examples of this level of organization include the Hox gene clusters and α/β-globin loci, both of which will be further described in later sections.

The second level of genome organization is defined by the interaction between DNA and proteins. This second dimension is dominated by the relationship between genomic DNA and histones, where DNA is wrapped around histone octamers to form the 10 nm chromatin fiber. At this level, chromatin appears as beads on a string, with beads corresponding to nucleosomes composed of two copies each of histones H2A, H2B, H3 and H4. Histones can be extensively modified post-translationally by acetylation, methylation, phosphorylation, sumoylation,

10.2217/BMM.10.68 © 2010 Josée Dostie Biomarkers Med. (2010) 4(4), 611–629 ISSN 1752-0363 611 Review Crutchley, Wang, Ferraiuolo & Dostie

[Figure 1 panel labels: 1st level; 2nd level: 10 nm; 3rd level; CT; nucleus; cytoplasm]

Figure 1. In vivo spatial genome organization. Three hierarchical levels of genome organization are illustrated from top to bottom. A gene cluster represents the first level of genome organization where the double helix is shown as yellow and blue strands. Transcriptional start sites indicated by arrows are highlighted with yellow circles. Highlighted in red is a nearby element regulating downstream genes. The 10‑nm fiber is shown as an example of second level genome organization. A double helix wrapped around histone octamers represents nucleosomes. Further coiling of nucleosomes forms the 30‑nm fiber. The third level of genome organization is represented as progressively larger and more compact chromatin fibers. Chromosomes are highlighted in orange, green, violet, red, yellow and blue clusters. Intra- and inter-chromosomal contacts mediated by proteins are represented by red spheres. CT: Chromosome territory.
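
The packaging problem described in the text (over 3 billion nucleotides measuring almost a meter, inside a micron-sized nucleus) can be checked with simple arithmetic. The sketch below is illustrative only: the 0.34 nm rise per base pair is the commonly cited value for B-form DNA, and the 10 µm nucleus diameter is an assumed round number, not a figure taken from this article.

```python
# Back-of-the-envelope check of the "almost a meter" figure quoted in the text.

BASE_PAIRS = 3.0e9           # "over 3 billion nucleotides", taken from the text
RISE_PER_BP_NM = 0.34        # assumption: axial rise per base pair of B-form DNA
NUCLEUS_DIAMETER_UM = 10.0   # assumption: illustrative "micron-sized" nucleus


def contour_length_m(base_pairs: float, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Length in meters of the fully extended double helix."""
    return base_pairs * rise_nm * 1e-9


def linear_compaction(length_m: float, diameter_um: float = NUCLEUS_DIAMETER_UM) -> float:
    """How many nucleus diameters the extended DNA would span."""
    return length_m / (diameter_um * 1e-6)


if __name__ == "__main__":
    length = contour_length_m(BASE_PAIRS)
    print(f"extended length: {length:.2f} m")
    print(f"nucleus diameters spanned: {linear_compaction(length):,.0f}")
```

Under these assumptions the extended helix comes out at roughly one meter, about five orders of magnitude longer than the nucleus is wide, which is why several hierarchical levels of folding are needed.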

ADP-ribosylation and ubiquitinylation [6,7]. These epigenetic marks are mostly added to histone amino-terminal tails, and regulate their affinity to DNA and the recruitment of regulatory chromatin binding proteins. Histone modifications can also affect formation of the 30 nm chromatin fiber, which consists of a folded basic 10 nm fiber with nucleosomes stacked on top of each other.

Very little is known about genome organization beyond the 30 nm fiber, of which the in vivo structure remains to be established [8]. Even at this level of packaging, a stretched out 30 nm chromatin fiber with the DNA content of an average chromosome would not fit into a nucleus. Therefore, additional folding and organization is essential for genome function. The third level of genome organization is defined by the packaging and spatial arrangement of chromatin in the nuclear space. This 3D organization is controlled by specialized proteins that bind and fold the 30‑nm fiber into higher levels of organization such as loops. In addition to facilitating the accurate retrieval and translation of instructions from our genetic code, the spatial chromatin architecture of our genome is also used as a mechanism to regulate gene expression [9–11]. Indeed, it was shown that DNA elements can regulate the expression of distal target genes by physically interacting with them [10]. This relatively recent discovery explains why the functional organization of the genome is not strictly linear along chromosomes and how DNA elements can regulate genes located very far away on the same or even on different chromosomes. Thus, long-range chromatin contacts in cis (intrachromosomal) or in trans (interchromosomal) can regulate gene expression by bringing regulatory elements in close physical proximity to target genes. Here, we refer to ‘long-range’ chromatin contacts from an empirical standpoint as interactions stronger than those originating from random collisions surrounding regions of interest. Long-range chromatin contacts were found to regulate genes from diverse cellular pathways, indicating that this form of control is a general regulation mechanism [12–24]. However, at least for some genomic regions, regulation through long-range DNA contacts has remained unclear [25–28]. Nonetheless, coregulated genes located far from each other or on different chromosomes can also co-localize and form foci in the nuclear space [29,30]. This type of organization likely participates in coordinating the proper timing and/or relative expression levels of various genes. 3D genome organization also includes positioning chromosomes into distinct territories within the nucleus, with gene-rich chromosomes at the center and gene-poor chromosomes near the periphery [3,31,32].

Measuring spatial genome organization

It had long been suspected that genes could be controlled over large genomic distances through physical contacts with DNA elements. However, this type of regulation mechanism has only been firmly demonstrated recently as a result of the development of various powerful technologies. Today, genome architecture can be studied with several approaches, which are usually combined for discovery (Figure 2 & Table 1). These techniques vary in resolution, throughput and cost, and together offer an unprecedented high-resolution view of our genome in vivo. Important information about spatial genome architecture has already come to light from these methodologies and includes the identification of intra- and interchromosomal contacts with roles in transcription or imprinting, the establishment of physical networks of coregulated genes, and an overall assessment of our genome architecture, including the existence of chromosome territories. Implementation of these technologies will help compile a complete list of functional chromatin contacts, which may be impaired in human disease. Importantly, these techniques will identify unique and conserved structural features amongst the highly variable genomes of healthy individuals, which may impart differential disease susceptibility and response to drug treatment.

DNA-FISH

Until recently, DNA fluorescence in situ hybridization (DNA-FISH) was the main tool to measure chromatin contacts and other genomic features such as structural variations [33]. This technique is based on homologous sequence hybridization between an artificial DNA probe and the genomic DNA of cells chemically fixed on glass slides. The artificial DNA probe contains an epitope that can be specifically recognized by fluorescently labeled antibodies. Thus, hybridization sites can be visualized by epifluorescence microscopy and the position of multiple genomic regions can be measured simultaneously when using different combinations of probes and fluorescence tags.

As with any other approach, DNA-FISH offers both advantages and disadvantages for studying genome organization. An important advantage of DNA-FISH is that it measures chromatin contacts in single cells. However, the resolution it provides is relatively low compared with newer sequencing-based methods, partly owing to the microscope’s detection limits (see later). Although not the focus of this article, it is also important to note that advancements in fluorescence microscopy can now partly overcome these limitations by allowing resolutions beyond the Abbe limit [34]. Visualization of DNA-FISH fluorescence signals in chemically fixed cells requires that superimposed signals be deconvoluted in order to accurately estimate physical distances along the z-axis. Chemical fixation is known to alter morphology, which can introduce error in distance measurements. This drawback of DNA-FISH is alleviated by measuring physical distances between fluorescent probes in a large number of individual cells. Nevertheless, DNA-FISH remains the most important method to validate long-range functional chromatin contacts in vivo. This technique is well suited to study the dynamics of a few genomic regions or overall genome architecture and represents a perfect complement to the recently developed chromosome conformation capture (3C) and 3C-related technologies (see


[Figure 2 flowchart labels: crosslinked cells; digestion or sonication; IP; biotinylated fill-in or biotinylated linker; ligation; purification; LMA; PCR, TaqMan or Tm; agarose gel, microarray, cloning or sequencing. Workflows shown: 3C, 5C, 3C-Loop/ChIP-Loop, Hi-C and ChIA-PET.]

Figure 2. Mapping spatial genome organization with chromosome conformation capture and related technologies. Five techniques used to map physical chromatin contacts at high resolution in vivo are illustrated from top to bottom. Chemical crosslinking of cells is a common first step to all approaches and is used to capture chromatin structure. Interacting DNA segments crosslinked by proteins are shown as yellow and green lines, and green and orange spheres, respectively. Fixed chromatin is either digested with a restriction enzyme (shown as scissors) or sheared by sonication (represented by concentric circles) to release crosslinked DNA fragments. Yellow and green arrows represent chromosome conformation capture (3C) primers. 5C primers used during the ligation-mediated amplification step are illustrated by yellow/black and green/gray lines, where black and gray moieties represent universal primer sequences. Y‑shaped molecules represent antibodies. Biotinylated nucleotides are shown as red dots. Streptavidin beads are shown in purple. 3C: Chromosome conformation capture; 6C: Combined chromosome conformation capture ChIP cloning; ChIA-PET: Chromatin interaction analysis using paired-end tags; ChIP: Chromatin immunoprecipitation; IP: Immunoprecipitation; LMA: Ligation-mediated amplification.
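
Whatever the detection method, the libraries produced by these workflows reduce to counts of pairwise ligation products, and the text describes the relative abundance of a product as inversely proportional to the original 3D distance between the two segments. The following sketch shows one way such counts could be tabulated and turned into a relative distance proxy. The fragment indexing, the choice of a reference contact for normalization and the function names are assumptions made for illustration, not a published analysis pipeline.

```python
from typing import Iterable, List, Tuple

# One ligation product, recorded as the indices of the two joined fragments.
Pair = Tuple[int, int]


def contact_matrix(pairs: Iterable[Pair], n_fragments: int) -> List[List[int]]:
    """Count ligation products between every pair of fragments (symmetric)."""
    matrix = [[0] * n_fragments for _ in range(n_fragments)]
    for a, b in pairs:
        matrix[a][b] += 1
        if a != b:
            matrix[b][a] += 1
    return matrix


def relative_frequency(matrix: List[List[int]], reference: Pair) -> List[List[float]]:
    """Scale all counts so that the chosen reference contact equals 1.0."""
    ref = matrix[reference[0]][reference[1]]
    if ref == 0:
        raise ValueError("reference contact was never observed")
    return [[count / ref for count in row] for row in matrix]


def relative_distance(frequency: float) -> float:
    """Inverse-proportionality proxy: more frequent contacts are closer."""
    return float("inf") if frequency == 0 else 1.0 / frequency


if __name__ == "__main__":
    # Hypothetical observations among three fragments (0, 1 and 2).
    observed = [(0, 1), (0, 1), (0, 1), (0, 2), (1, 2), (1, 2)]
    counts = contact_matrix(observed, n_fragments=3)
    freqs = relative_frequency(counts, reference=(0, 1))
    for i in range(3):
        for j in range(i + 1, 3):
            print(i, j, counts[i][j], round(relative_distance(freqs[i][j]), 2))
```

Because the data come from cell populations, any distance derived this way is an average over many nuclei and is best read as a relative ranking of contacts rather than a physical measurement.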

later). Together, these approaches will likely lead to a better understanding of genome organization and function.

RNA-TRAP

RNA tagging and recovery of associated proteins (RNA-TRAP) is a technique reminiscent

of DNA-FISH introduced by Carter et al. in 2002 [35,36]. This approach captures the local DNA environment of an actively transcribed gene by combining aspects of FISH and chromatin immunoprecipitation (ChIP). Like DNA-FISH, RNA-TRAP uses homologous sequence hybridization of an artificial probe. However, instead of hybridizing on complementary genomic DNA, the probe binds to nascent unprocessed RNA at actively transcribed genes. During RNA-TRAP, the DNA probe is labeled with an epitope recognized by an antibody coupled to the horseradish peroxidase enzyme, which attaches biotin tags onto chromatin proteins in the immediate vicinity of nascent RNA transcripts. Thus, active transcription sites can be visualized with fluorescently labeled streptavidin. Additionally, active chromatin and associated proteins can be specifically purified by affinity chromatography and analyzed by quantitative PCR.

Unlike DNA-FISH, which provides a low resolution ‘bird’s eye’ view of targeted chromatin components, RNA-TRAP can uncover in-depth information about the genomic environment of transcribed genes. However, the enzymatic step that tags proteins surrounding transcribed genes can trap proteins within a very large radius of activity. As such, RNA-TRAP captures the entire local environment of a given gene of interest rather than detecting direct physical interactions.

Table 1. Technologies employed to study spatial genome organization.

Technique | Genomic resolution | Scale | Throughput | Ref.
3C | High | Small genomic domains | Low | [37–39,59]
4C | High | Genomic environment surrounding a given region | Low | [44–46,98]
5C | High | Genome scale | High | [47–49,99]
6C | High | Genome-wide contacts associated with a given protein | Intermediate | [52,53]
3C-Loop | High | Genome-wide contacts associated with a given protein | Low | [42,43]
Hi-C | High (proportional to sequencing depth) | Genome-wide | High | [57]
ChIA-PET | High | Genome-wide contacts associated with a given protein (proportional to sequencing depth) | High | [54–56,100]
DNA-FISH | Low | Genome-wide | Low | [33,101,102]
RNA-TRAP | Intermediate | Genomic environment surrounding a given gene | Low | [35,36]

3C: Chromosome conformation capture; 4C: Circular chromosome conformation capture/chromosome conformation capture on ChIP/open-ended chromosome conformation capture; 5C: Chromosome conformation capture carbon copy; 6C: Combined chromosome conformation capture ChIP cloning; ChIA-PET: Chromatin interaction analysis using paired-end tags; FISH: Fluorescence in situ hybridization; TRAP: Tagging and recovery of associated proteins.

3C

Chromosome conformation capture was initially developed to study the complete conformation of a chromosome in yeast [37]. 3C is now used as a standard research tool to analyze the organization of complex genomic domains and investigate the relationship between genome architecture and gene expression [38,39]. 3C can be divided into five experimental steps. The first step in conventional 3C is to chemically fix cells. This step captures interactions between DNA regions by crosslinking chromatin-bound histones and other associated proteins such as transcription factors. Thus, chemical fixation produces a snapshot of the 3D chromatin architecture in vivo. Chemical fixation is a common step in all techniques currently used to study genome organization. Although unavoidable, it is important to note that this step may still introduce artifacts that will be carried over between approaches.

The second step of 3C consists of digesting the genomic DNA with restriction enzymes. Enzymatic digestion of chemically fixed chromatin releases DNA fragments that were crosslinked as a result of their physical proximity in the nuclear space. The third 3C step involves ligation of crosslinked DNA fragments. The DNA is ligated under conditions that favor intramolecular ligation of crosslinked fragments and minimize random ligation. During the fourth step of 3C, the DNA is purified to remove all proteins and other contaminants. The resulting 3C library features pair-wise ligation products between DNA segments that were close to each other in the nuclear space regardless of their linear distance along the genome. The relative abundance of these ligation products is inversely proportional to the original 3D distance separating DNA segments and can therefore be used to reconstruct the spatial organization of the genome in vivo. The final 3C step consists of measuring the relative abundance of individual ligation products in the library. 3C library products are usually quantified by PCR amplification of ligation junctions and agarose gel detection. Alternatively, ligation junctions can be measured by TaqMan quantitative PCR


or by melting curve analysis [40,41]. A major caveat of 3C and 3C-based technologies is that they generate datasets from cell populations and therefore feature averaged interaction frequencies derived from various states. Thus, these technologies yield averaged structural models rather than true structures. Although these models can be noisy, they remain useful to identify changes between cell states.

3C-Loop

An immediate extension of 3C is the 3C-Loop technique, also known as the ChIP-loop method [42,43]. Like 3C, this technique also involves fixing cells to capture a ‘snapshot’ of in vivo genome architecture. However, 3C-Loop includes an immunoprecipitation step for a specific protein prior to ligation of crosslinked DNA fragments. 3C-Loop libraries are therefore enriched in ligation products previously bound by a protein of interest. Although removing unbound DNA fragments can substantially decrease background signals, this method requires prior knowledge of the regions bound by the specific proteins since contacts are measured by PCR with specific primers. Nonetheless, an important strength of 3C-Loop is that it allows identification of target proteins contributing to local chromatin looping. However, this method is currently used mostly to validate possible cis/trans long-range interactions.

4C

The 4C techniques (circular chromosome conformation capture, chromosome conformation capture on chip, open-ended chromosome conformation capture) were developed to identify physical interactions genome-wide from any given genomic location [43–46]. Similar to 3C, 4C involves chemical fixation of cells, digestion of crosslinked DNA and ligation of crosslinked fragments to generate a library of DNA contacts. However, during 4C, genomic DNA is digested into very short fragments, which are then ligated under conditions promoting circularization of crosslinked DNA. 4C libraries consisting of short circularized ligation products are then purified as usual and amplified by reverse-PCR with primers nested at a specific genomic location. Reverse-PCR of 4C libraries thereby specifically amplifies all genomic regions physically interacting with the region of interest. Amplified DNA contacts are then identified on microarrays or by high-throughput DNA sequencing. Although DNA contacts from a fixed genomic location can be identified ab initio genome-wide at high resolution with 4C, this method is not as quantitative as conventional 3C or 5C and should therefore be used mainly to identify interactions rather than quantify them.

5C

The chromosome conformation capture carbon copy (5C) technology is also derived from 3C but allows quantitative simultaneous genome-wide detection of thousands of DNA contacts [47–51]. During 5C, a 3C library is first generated using the standard 3C protocol. However, instead of quantifying DNA contacts individually by PCR amplification and agarose gel detection, 3C libraries are first converted into 5C libraries and then analyzed on custom microarrays or by high-throughput DNA sequencing. 3C to 5C library conversion is achieved by a ligation-mediated amplification step involving annealing and ligation of primers corresponding to 3C ligation junctions. This ligation-mediated amplification step detects 3C products quantitatively and specifically, thereby creating a ‘carbon copy’ of DNA contacts, which is amplified by PCR and analyzed on microarrays or by high-throughput DNA sequencing. Although 5C is very quantitative and somewhat high throughput, this approach does not identify DNA contacts without prior knowledge of regions involved since it relies on the ligation of primers at predicted 3C junctions.

6C

The combined chromosome conformation capture ChIP cloning (6C) technique is also derived from 3C and is an immediate extension of the 3C-Loop approach [52,53]. 6C was developed to identify cis or trans long-range DNA interactions mediated by specific proteins without prior knowledge of the regions involved. As such, the 6C protocol is identical to 3C-Loop until the library purification step, but then uses a different approach to analyze libraries. During 6C, ligation products are first cloned into vectors rather than analyzed individually by PCR. Individual clones are then amplified and characterized by restriction digest analysis to identify those containing more than one DNA fragment. Clones with two or more fragments are then sequenced from both ends of the cloning vector to identify interacting sequences. Although 6C does not quantify DNA contacts like 3C-Loop, the combined cloning/sequencing shotgun approach qualitatively identifies long-range DNA interactions mediated by specific proteins.

The development of 3C technology by Dekker et al. in 2002 prompted the aggressive expansion of alternative 3C-derived approaches to study

high-resolution genome organization in vivo. These methods share similar protocols, each with advantages and limitations, but none is at once genome-wide, quantitative, high throughput and applicable for ab initio contact identification. However, two state-of-the-art technologies developed over the past year fulfill these criteria. These techniques, called chromatin interaction analysis with paired-end tags (ChIA-PET) and Hi-C, are described below [54–57].

ChIA-PET

ChIA-PET was developed by the Ruan laboratory at the Genome Institute in Singapore [54,55]. This high-throughput technology represents a significant improvement upon the 3C-Loop and 6C technologies, as it quantitatively identifies chromatin contacts mediated by specific proteins across entire genomes simultaneously. ChIA-PET was first used to map the chromatin interaction network of estrogen receptor α (ER-α) in a breast cancer cell line. ChIA-PET combines two techniques: chromatin interaction analysis and high-throughput paired-end tag sequencing [56,58]. As with 3C, cells are first chemically fixed to capture in vivo chromatin contacts (Figure 2). However, instead of digesting the genomic DNA with a restriction enzyme, chromatin is sheared into small fragments by sonication. The fragmented chromatin is then immunoprecipitated with an antibody against any protein of interest. The DNA fragment ends are then repaired and ligated to epitope-labeled DNA linkers containing restriction sites. These products are further ligated under conditions favoring intermolecular ligation of crosslinked DNA, isolated by affinity with the epitope tag of linkers, and digested a second time into very short ChIA-PET fragments. These fragments are finally ligated to sequencing linkers, and the resulting ChIA-PET libraries are sequenced to map all intra- and interchromosomal contacts mediated by a given protein in the genome.

Hi-C

Whereas ChIA-PET was designed to identify the complete interactome of given proteins, Hi-C was developed jointly by the Dekker and Lander laboratories to measure all long-range genome-wide DNA contacts simultaneously [57]. As with any other 3C-derived approach, the first steps of Hi-C involve fixing cells and digesting the crosslinked chromatin with a restriction enzyme. However, instead of immediately ligating the DNA, the overhangs produced by the enzyme are first filled with nucleotides including one that is labeled with an epitope tag. These products are then ligated under conditions favoring intermolecular ligation of crosslinked DNA fragments and sonicated to reduce the size of Hi-C products. These Hi-C products are next isolated by affinity chromatography through the epitope tags marking the junctions, and ligated to sequencing linkers. The Hi-C library is finally analyzed by high-throughput DNA sequencing, and all intra- and interchromosomal contacts are mapped to the reference genome.

Chromatin conformation signatures

The development of high-resolution techniques to study spatial chromatin organization offers an unprecedented view of our genome in vivo. Implementation of the more recently developed genome-wide technologies such as ChIA-PET and Hi-C will eventually yield complete reconstructions of highly variable human genome architectures in vivo. Regardless, the earlier 3C-related techniques have already uncovered new structure-based mechanisms of gene regulation. These control mechanisms involve different types of physical contacts, such as contacts between enhancers and promoters or other DNA elements. Irrespective of the kind of contacts involved, the expression state of genes regulated by this type of control mechanism can be identified by its spatial chromatin organization. Furthermore, general gene expression patterns may also be associated with specific chromatin structures. We term the collection of DNA contacts associated with specific gene expression profiles chromatin conformation signatures (CCSs). We classify CCSs into four distinct categories to simplify discussion (Figure 3).

Local chromatin organization

Local chromatin organization refers to the chromatin compaction state of specific genomic locations at distances below looping detection range. This type of signature may vary significantly around promoter and enhancer sequences depending on their activity, and is a measure of the levels of random collisions surrounding elements. For example, active elements can be more open and yield fewer DNA contacts while silent ones can appear more compact and produce stronger interactions. This ‘opening’ of the chromatin was previously observed at the locus control region (LCR) in the β-globin locus [47]. There are instances, however, where local chromatin organization may remain constant regardless of cell type or cellular conditions. Such is the case of gene deserts devoid of transcription


[Figure 3 panel titles: (A) Local chromatin organization; (B) Intrachromosomal contact (cis); (C) Interchromosomal contact (trans); (D) Genomic environment]

Figure 3. Chromatin conformation signatures. Collections of DNA contacts representing chromatin conformation signatures are separated into four distinct categories for discussion. (A) Local chromatin organization refers to compaction levels at genomic sites. Organization of the 10‑nm fiber at a transcribed gene is shown as described in Figure 1. Transcription machinery is represented by orange, green and yellow spheres. (B) Long-range intrachromosomal (cis) contacts. (C) Interchromosomal (trans) contacts. (D) Genomic environment. DNA from three chromosomes converges in the nuclear space to share regulatory factors and/or control elements and form contacts. A transcription factory is highlighted in yellow. The double helices from different chromosomes are shown as blue/yellow, red/yellow or purple/yellow strands. Rod-like folding of the double helix represents the 30‑nm fiber. Dashed red arrows represent active transcription and blue spheres indicate enhancer-binding proteins.
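
The four categories in Figure 3 can be expressed as a simple decision rule on the coordinates of two contacting loci, with the genomic environment of a locus summarized as the mix of contact types it takes part in. The sketch below is a minimal illustration: the `Locus` type, the 10 kb `LOOPING_RANGE_BP` cutoff and the example coordinates are hypothetical, since the article gives no numeric threshold for the looping detection range.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Locus:
    chromosome: str
    position: int  # base-pair coordinate


# Hypothetical cutoff: contacts closer than this are treated as local
# chromatin organization rather than as long-range looping.
LOOPING_RANGE_BP = 10_000


def classify_contact(a: Locus, b: Locus, looping_range_bp: int = LOOPING_RANGE_BP) -> str:
    """Assign a pairwise DNA contact to one of the pairwise CCS categories."""
    if a.chromosome != b.chromosome:
        return "interchromosomal (trans)"
    if abs(a.position - b.position) < looping_range_bp:
        return "local chromatin organization"
    return "intrachromosomal (cis)"


def environment_profile(anchor: Locus, contacts: Iterable[Tuple[Locus, Locus]]) -> Counter:
    """Tally contact categories for every contact that involves `anchor`."""
    profile: Counter = Counter()
    for a, b in contacts:
        if anchor in (a, b):
            profile[classify_contact(a, b)] += 1
    return profile


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    element = Locus("chrA", 100_000)
    contacts = [
        (element, Locus("chrA", 102_000)),   # nearby on the same chromosome
        (element, Locus("chrA", 180_000)),   # distant on the same chromosome
        (element, Locus("chrB", 5_000)),     # on another chromosome
    ]
    print(environment_profile(element, contacts))
```

A real analysis would also need the contact strengths, because the article defines long-range contacts empirically as interactions stronger than the surrounding random collisions rather than by distance alone.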

activity, which are actually used as reference CCSs for sample comparison in conformation studies [59].

Intrachromosomal contacts

The functional organization of genes and DNA elements is not strictly linear along chromosomes. While transcription factor binding sites tend to localize at promoters, a given element may regulate distant genes without affecting the ones adjacent to it. Also, the relationship between genes and regulatory DNA elements is complex in that each gene may be controlled by multiple elements and each element may control more than one gene. Furthermore, multiple DNA elements might function simultaneously or independently, depending on cellular conditions. Studies using 3C or 3C-related technologies have now shown that DNA elements can control distal genes by forming long-range intrachromosomal physical contacts with them. This type of signature was first identified in the β-globin locus between the LCR and actively transcribed globin genes [13]. Intrachromosomal contacts were found

Biomarkers Med. (2010) 4(4), future science group

to be essential and mediated by hematopoietic transcription factors. Although the β-globin locus was the first region shown to be regulated by this type of mechanism, functional intrachromosomal contacts have since been found to play important roles in the regulation of numerous other genes [12,13,15–24,55,60].

Interchromosomal contacts
In addition to intrachromosomal contacts, there are instances when genomic loci on separate chromosomes interact with each other in the nuclear space to control gene expression. The functional significance of these interchromosomal contacts is somewhat debated. Whereas functionally significant long-range contacts may occur between regulatory elements and target genes, other long-range interactions may simply be a consequence of genome compaction and bear no immediately apparent functional significance. Nonetheless, the first functional example of this type of CCS was identified between the promoter region of the IFN-γ gene on and the regulatory regions of the Th2 cytokine locus on [61]. This contact was confirmed by DNA-FISH and is thought to maintain both loci in an active state and allow for a rapid response upon T-cell activation to differentiate into Th1 and Th2 cell lineages by expression of either gene locus. Interchromosomal contacts have since been found between other genomic regions [14,27,55,62].

Genomic environment
Genomic environment refers to the concentration and composition of DNA sequences in the nuclear space surrounding a given genomic position. It was shown that coregulated genes often cluster together to share similar transcription factories irrespective of their linear genomic positions [30,55,63,64]. Altogether, these studies indicate that genomes are likely organized into dynamic networks of physical contacts bringing genes and regulatory elements into close proximity to orchestrate gene expression. This model is further supported by a recent Hi-C study confirming the spatial proximity of small, gene-rich chromosomes [57]. Hi-C analysis has also generated unbiased long-range interaction maps of the human genome. These maps confirmed the existence of distinct chromosome territories and revealed that the genome is further divided into two types of spatial compartments, or genomic environments. The first environment contains active chromatin, which is typically gene rich and structurally open. The second environment contains inactive gene-poor segments of chromatin that are structurally closed. Thus, the human genome appears to be generally organized into knot-free 'fractal globule' conformations (or environments) maximizing dense packaging and the ability to remain structurally dynamic.

CCSs as markers of gene expression states
Over the last few years it has become apparent that patterns of gene expression can be associated with distinct chromatin structures. These CCSs can be summarized by collections of DNA contacts, which may be complex and tissue specific. The study of gene clusters has been instrumental in deciphering this information. Gene clusters are ideal models for genome organization studies since they tend to encode highly regulated tissue-specific genes, which are sometimes controlled during development. Thus, these genomic regions can facilitate the identification of both transcription-dependent and tissue-specific CCSs.

The β-globin locus
The first, and by far best characterized, gene cluster remains the β-globin locus. In humans, this locus contains a set of five developmentally regulated genes (HBE, HBG2, HBG1, HBD and HBB) that encode variants of the hemoglobin β chain (Figure 4A). These genes are almost exclusively expressed in erythrocytes and follow a very specific developmental pattern with HBE expressed during embryogenesis, HBG1 and HBG2 in the fetal phase, and HBD and HBB in adults [65]. The β-globin genes are controlled by an element, the LCR, which is situated approximately 25 kb upstream of the most proximal gene (HBE), and over 80 kb away from the farthest one (HBB). Although it was known for some time that the LCR is required to specifically activate each β-globin gene sequentially during development, the mechanism of this long-distance regulation remained unknown until 3C was applied to examine the cluster [13,66–68]. 3C analysis revealed that the LCR physically interacts specifically with actively transcribed genes and not with silent ones [69,70]. It was found that chromatin looping was mediated by erythroid-specific transcription factors, presumably by bridging enhancer sequences of the LCR to the promoters of transcribed genes. Indeed, looping between the LCR and adult β-globin genes was observed in definitive erythroid cells where the genes are expressed, but not in progenitor erythroid cells where they remain transcriptionally


[Figure 4 panels: (A) β-globin locus, off and on states; (B) HoxA cluster, off and on states]

Figure 4. Paradigm chromatin conformation signatures at transcriptionally regulated genomic loci. Gain of contacts does not equate to transcription activation. (A) Regulation of the β-globin locus in erythroid precursor cells (left) and definitive erythrocytes (right). The transcriptionally silent cluster (off) adopts a poised hub conformation maintained by proteins (red spheres). Adult β-globin gene transcription (on) is associated with contact formation between the locus control region (highlighted in red) and β-globin genes (yellow circles). (B) Regulation of the HoxA gene cluster. The transcriptionally silent locus features several chromatin loops (off). Transcription induction is associated with loss of contacts and unfolding of the cluster (on). Gray arrows marked with an X (red) indicate silent transcription start sites. A dashed red arrow represents active transcription. Transcription machinery is represented by orange, green and yellow spheres.

silent. Interestingly, the LCR was also found to bind and form a loop with sequences downstream of the cluster in both progenitor and definitive erythroid cells, but not in the brain where the cluster is always inactive and where no looping is ever detected. These results led to the active chromatin hub (ACH) model, whereby the β-globin cluster is suggested to adopt a basic conformation, which primes the cluster for activation specifically in erythroid cells. Thus, analysis of the β-globin cluster revealed the existence of both transcription-dependent and tissue-specific CCSs, both of which can be important for transcription regulation.

The HoxA cluster
In the β-globin locus, the presence of contacts between the LCR and genes correlates with transcription activation. Similarly, clustering of the α- and β-globin genes in the nuclear space occurs when the genes are transcribed [30]. However, transcriptional activity does not necessarily


associate with the establishment of contacts. In fact, loss of contacts is sometimes correlated with transcription activation. A good example of this type of CCS was found in the HoxA cluster, which is also regulated in a tissue-specific manner during development (Figure 4B) [50].

The Hox genes are members of the evolutionarily conserved homeobox superfamily that encode developmentally regulated transcription factors [71]. In humans, there are 39 Hox genes, which are organized into four clusters of 13 paralogue groups located on separate chromosomes. The HoxA cluster is located on and encodes 11 transcription factors. During development, the spatial and temporal expression of these genes follows the order of their position along the chromosome. For example, HoxA genes located at the 3´ end of the cluster are expressed more anteriorly in the embryo and earlier during development [72,73]. This colinearity and previous in situ hybridization studies strongly suggest that chromatin structure plays a central role in Hox regulation [74,75]. Perhaps even more interesting is the observation that transcriptional silencing is key to proper Hox function, since ectopic expression can lead to human disease. We have recently found that the HoxA cluster is organized into multiple discrete chromatin loops when transcriptionally silent and that DNA looping is absent when genes are actively transcribed [50,76]. Thus, gene clustering through chromatin looping appears to be a CCS of the transcriptionally silent HoxA gene cluster. Importantly, specific and sequential unfolding of these chromatin loops allowing access to the transcription machinery may hold the key to the developmental colinearity of Hox gene clusters.

Regulation of CCSs
While it is clear that DNA contacts are essential to regulate the expression of some genes, the factors forming and maintaining these contacts, and the pathways regulating them, are very poorly understood. The recent identification of factors capable of forming chromatin loops genome-wide suggests that general CCS regulation pathways may also exist in addition to gene-specific control mechanisms. The CTCF and SATB1 proteins described below are two DNA looping factors known to integrate higher-order chromatin architecture with gene regulation.

CTCF
Strong evidence supports a role for the CCCTC-binding factor (CTCF) as a genome-wide CCS and central regulator of spatial genome organization [77,78]. First, CTCF is an insulator-binding protein shown to form loops at boundaries separating active and inactive chromatin domains, thereby maintaining different transcription activities in distinct nuclear compartments [79]. Second, CTCF was found to mediate functional chromatin loops within specific gene loci. For example, CTCF was shown to bind and bridge multiple regulatory elements in the mouse β-globin locus and mediate formation of the so-called ACH [68]. Indeed, conditional deletion of CTCF or disruption of regulatory elements with CTCF binding sites destabilized the ACH.

Probably the most characterized intrachromosomal CTCF contacts are the ones that regulate imprinting at the Igf2/H19 locus [80–82]. The H19 and Igf2 genes are located approximately 100 kb away from each other on human chromosome 11. Between the genes and proximal to H19, a DNA sequence called the imprinting control region (ICR) is a well-known CTCF-binding site. On the maternal allele, CTCF can bind to the ICR and form multiple loops with regions along the locus that prevent Igf2 from physically interacting with its enhancer sequence. By contrast, on the paternal allele, CTCF binding to the ICR is abolished by DNA methylation and the Igf2 gene is able to form a long-range DNA contact with its enhancer. Thus, CTCF binding and loop formation are essential to regulate the entire locus and ensure that the Igf2 gene is expressed only from the paternal allele.

In addition to mediating intrachromosomal contacts, CTCF can also form functional interchromosomal interactions between coregulated genomic regions. For example, the Igf2/H19 and Wsb1/Nf1 gene loci were shown to interact in a CTCF-dependent manner [14]. Indeed, CTCF depletion or deletion of the regulatory element required for CTCF binding abolished the interaction and altered gene expression at the Wsb1/Nf1 locus. CTCF might therefore direct distant genomic segments to common transcription factories by mediating interchromosomal contacts.

SATB1
Special AT-rich sequence binding protein-1 (SATB1) is another relatively well-characterized master organizer of in vivo chromatin structure [83–85]. This protein binds to specific DNA sequences termed matrix attachment regions, and anchors the genome to the nuclear matrix through a series of distinct loops. The


chromatin loops formed by SATB1 participate in establishing the overall genome architecture and are known to function in gene regulation. SATB1 regulates gene expression by at least two types of mechanisms. First, it is known to form distinct chromatin structures that selectively tether DNA elements to specific nuclear compartments. For example, SATB1 was shown to promote enhancer activity over long distances by forming cage-like DNA networks that move distal elements and target genes into close proximity. The SATB1 networks also appear to activate transcription by segregating to other nuclear compartments. Second, SATB1 can inhibit transcription by recruiting chromatin modifiers, such as histone deacetylases, and remodelers, such as ATP-dependent chromatin-assembly factor and imitation switch [86].

Special AT-rich sequence binding protein-1 is cell-type specific, and its role in regulating the Th2 cytokine locus is well documented in thymocytes. The mouse Th2 cytokine locus located on chromosome 11 measures approximately 200 kb, and encodes the IL-5, IL-4 and IL-13 genes. Expression of these genes is coordinated upon Th2 cell activation and it was shown that following activation, the locus folds into numerous small DNA loops anchored at the base by SATB1. RNAi knockdown experiments demonstrated that SATB1 does not simply organize the locus into distinct loops. Instead, SATB1 was also found to be required for the expression of the interleukin genes themselves and of the c-Maf transcription factor regulating the locus [87].

Genome variability, CCSs & human health
Variations in the human genetic code are very abundant. Although some variations may already be linked to human disease, the full impact of this diversity on human health is currently unknown. Human genome sequence variations can take many forms, ranging in size from single nucleotides to large chromosomal segments. Variants are classified into several distinct groups according to their size, and include single nucleotide polymorphisms (SNPs), insertion/deletions (indels), copy number variations (CNVs) and structural variations (SVs; insertions, deletions and inversions) (Table 2). Any type of sequence variation may change gene expression patterns and result in altered CCSs (Figure 5). Thus, variations of the human genome may bear distinct CCSs and identify specific gene expression profiles.

Single nucleotide polymorphisms are single nucleotide variations that may be substituted, deleted or inserted in the genome. Although SNPs are generally not pathogenic, a number have been associated with disease. They may be present within coding or noncoding gene sequences, or in intergenic regions. SNPs in gene coding regions may be synonymous and have no effect on protein function. Alternatively, SNPs can be nonsynonymous (nsSNPs) and produce protein variants. The effect of nsSNPs on protein function can range from none to the production of nonfunctional proteins. Likewise, intergenic SNPs may also bear multiple consequences, ranging from none to disrupting genome function. For example, SNPs at enhancer elements may have a serious impact on genome regulation (Figure 5B). SNPs may disrupt the binding sites of transcription factors or of structural proteins, and alter transcription patterns and the formation of proper CCSs. Such is the case for the inherited rs6983267 SNP variant associated with colorectal cancer pathogenesis. This variant was shown to modify an enhancer sequence found to physically interact with the MYC proto-oncogene [88]. This intergenic SNP increased binding of the transcription factor 7-like 2 (TCF7L2) and enhanced MYC expression in colorectal cancer cells.

Structural variations (deletions, insertions and inversions) of large DNA segments could be particularly detrimental to proper genome function (Figure 5C, D & E). Like SNPs, structural variants can be found anywhere in the genome of healthy individuals. They can affect genes directly or alter regulatory DNA elements. For example, intergenic deletions may eliminate enhancer sequences and affect the expression pattern of one or multiple constitutive and/or regulated genes. Such SVs may induce local chromatin changes at gene promoters, alter intra- or interchromosomal contacts, and even modify the genomic environment of genes. Likewise, large intergenic insertions may change gene expression by disrupting regulatory elements or chromatin structures essential for transcription. Chromosomal inversions, either balanced or unbalanced, may also affect gene expression, particularly by altering genomic environments of genes. For example, a large balanced inversion might displace enhancer elements from shared transcription factories and affect regulation of multiple genes under specific cellular conditions. Thus, CCSs associated with even a single SV may be very complex.

Copy number variations are of particular interest to disease pathogenesis because of their large size and effect on the overall genomic


environment. CNVs are classified as segments of DNA spanning 1 kb to several megabases in size, for which copy-number differences were observed between two or more genomes of the same species. These segments can be copy-number gains or losses of gene coding or intergenic genomic DNA. Intergenic CNV gain of DNA elements may directly modify gene expression and manifest complex CCSs of altered local chromatin structure and chromosomal contacts (Figure 5F). Alternatively, a CNV may contribute to the pathogenesis of a disease by altering the location of genes with respect to regulatory elements. For example, a CNV loss in close proximity to an enhancer might displace elements from their target genes and prevent formation of the correct CCS required for gene activation or silencing. An interesting example of this type of chromatin structure-induced altered transcription was found at the 4q35 locus in patients suffering from facioscapulohumeral muscular dystrophy [22,89]. This dominant neuromuscular disorder is linked to the partial deletion of a polymorphic repeat region known as D4Z4 located in the subtelomeric region of chromosome 4q. Whereas the 3.3 kb D4Z4 repeat is present at up to 200 copies in healthy individuals, fewer than 10 are usually found in facioscapulohumeral muscular dystrophy patients. Partial D4Z4 deletion was shown to prevent anchoring of a nearby matrix attachment region sequence to the nuclear matrix, which usually forms distinct looped domains restricting 3D contacts between genes and enhancer sequences. Thus, neighboring genes are aberrantly overexpressed as a consequence of sequence variation-induced altered spatial chromatin organization. Chromatin architecture-induced aberrant transcription is not likely linked to just a few human disorders. With the recent influx of evidence indicating that multiple rare de novo and inherited CNVs contribute to the genetic component of vulnerability to neuropsychiatric disorders such as autism spectrum disorder and schizophrenia [90–96], it will be interesting to see whether chromatin organization plays a role in these complex disorders.

Table 2. Human genome sequence variations.
Variation | Size | Ref.
Single nucleotide polymorphism | Single base pair | [103–106]
Insertion/deletions (indels) | Up to 1 kb | [107,108]
Copy number variation | 1 kb or larger | [90]
Mapped copy number variations | From 443 bp and larger |
Structural variations (insertions, deletions, inversions) | kb to Mb | [91,108,109]

In addition to changes in genome sequence, deregulation of factors regulating chromatin architecture may also play a significant role in disease pathogenesis. For example, the genome organizer SATB1 was found to be involved in breast cancer. Han and colleagues showed that reducing SATB1 protein levels by RNAi altered the expression of over 1000 genes and reversed the process of tumorigenesis [97]. Indeed, SATB1 knockdown was found to restore breast-like acinar polarity, and to inhibit tumor growth and metastasis. Furthermore, SATB1 was found to delineate specific chromatin modifications at target gene loci, which directly upregulated metastasis-associated genes and downregulated tumor-suppressor genes. SATB1 might therefore play a role in tumorigenesis by reprogramming chromatin organization and transcription profiles of cancer cells to promote growth and metastasis. Thus, regulation of chromatin structure and organization might contribute to disease pathogenesis and represent valuable biomarkers. The continual application of 3C-related technologies will likely uncover the mechanisms regulating CCS formation and reveal their impact on genome function.

CCSs as 'ideal' biomarkers?
The recent developments in gene sequencing, targeted therapies and molecular diagnostics are leading to a more personalized approach for the treatment of human disease. The identification of biomarkers with these technologies will continually improve personalized therapy as it will identify the threat of disease, whether a disease is present, its severity and its response to drug treatment. For instance, biomarkers can provide the confirmation of a disease status necessary for correct treatment. In the case of chronic disease where individuals may require treatment for a long period of time, biomarkers may be critical for the timely identification and classification of a disease. Also, in the event of an early symptom-free phase, such as in Alzheimer's disease, biomarkers may allow preventive treatment.

The current need for more accurate disease diagnosis and prognosis increases the priority to identify new biomarkers. Markers that can predict response or resistance to drug therapies are desirable and will help single out patients susceptible to severe adverse drug reactions and reduce the risk of treatment failure. The


[Figure 5 panels: (A) DNA contact; (B) SNP; (C) Deletion; (D) Insertion; (E) Inversion; (F) CNV]

Figure 5. Sequence variations can alter spatial genome organization and change gene expression. Any type of sequence variation can alter chromatin structure and either induce or inhibit transcription. A hypothetical enhancer/gene interchromosomal contact required for transcription is used as an example. Transcription inhibition is shown as the outcome of chromatin structure changes for all variants except CNVs. (A) DNA contact and gene expression of a healthy individual with a linear reference genome structure. (B) SNP can alter DNA binding site of a transcription factor and impair transcription. (C) DNA deletion can remove transcription factor binding sites. (D & E) DNA insertions or inversions can displace enhancer elements from their genomic environment. (F) CNVs can multiply the number of transcription factor binding sites and stimulate transcription. Enhancer-binding protein is shown as a blue sphere. Transcription machinery is illustrated by orange, green and yellow spheres. Red spheres are structural proteins added to indicate double helix orientation. Gray arrows marked with an X (red) indicate silent transcription start sites. A dashed red arrow represents active transcription.
CNV: Copy number variation; SNP: Single nucleotide polymorphism.
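The size ranges listed in Table 2 overlap, so length alone does not assign a variant to a single class. The short Python sketch below makes that overlap explicit. It is an illustration only: the function name `size_classes` and the cut-offs as coded are assumptions taken from the sizes quoted in Table 2, the open-ended "kb to Mb" range is treated as "1 kb or larger", and a real classification would also need the variant type (for example, copy-number change versus inversion).

```python
from typing import List

KB = 1_000  # base pairs per kilobase


def size_classes(length_bp: int) -> List[str]:
    """Return every Table 2 class whose quoted size range includes length_bp.

    Assumed cut-offs: SNP = 1 bp, indel = up to 1 kb,
    CNV = 1 kb or larger, SV = kb to Mb (coded here as 1 kb or larger).
    """
    if length_bp < 1:
        raise ValueError("variant length must be at least 1 bp")
    classes = []
    if length_bp == 1:
        classes.append("SNP")
    if length_bp <= KB:
        classes.append("indel")
    if length_bp >= KB:
        classes.append("CNV")
        classes.append("SV")
    return classes


# 3300 bp is the length of one D4Z4 repeat unit mentioned in the text.
for length in (1, 300, 1_000, 3_300):
    print(length, size_classes(length))
```

A variant of exactly 1 kb falls into three classes under these cut-offs, which is one reason size-based labels are best read as rough bins rather than exclusive categories.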

variability of the human genome will likely hamper identification of the genetic components of diseases or render disease classification problematic. One of the advantages of using CCSs as biomarkers is that multiple linear variations may be integrated into unique spatial signatures. For example, combinations of SNPs, apparently nonpathogenic SVs and CNVs may together induce alternative genome conformations affecting an individual's ability to optimally regulate gene expression and cope with environmental stresses. Thus, CCSs can potentially capture this 'genomic behavior' that may not be otherwise apparent in 'static' patient samples where cells were not subjected to the relevant environment conditions prior to analysis. Moreover, gene expression profiles such as steady-state mRNA levels or protein output can be uninformative when slight changes in multiple gene expression levels contribute to a given pathology. Small gene expression changes could be discarded in routine transcriptome analysis but could still have important effects on cellular function. Alternative mRNA splicing patterns could also contribute to specific cell states. These changes would not be detected in regular mRNA profiling. Similarly, changes in the expression of regulatory noncoding RNAs such as miRNAs would not be detected in regular transcriptome analysis. Classifying components of gene regulation mechanisms as biomarkers is appropriate since they can be the underlying cause of human disease. As chromatin structure can regulate gene expression, CCSs may simply represent structural signatures of gene expression. Also, since CCSs are manifestations of gene expression profiles, they offer the advantage of identifying gene expression states regardless of the nature of the mechanisms involved. Finally, CCSs uniquely identify 3D gene regulation mechanisms. Thus, aberrant gene expression resulting from changes in the positioning of genes in space may only be identified through CCSs.

Conclusion & future perspective
Linear genome structures vary significantly between individuals, even in the healthy population [92,93]. Since genomes appear organized into dynamic 3D networks of physical contacts, spatial chromatin organization is likely to be shaped by the linear arrangement of genes and regulatory DNA elements. Consequently, the nuclear environment of individual genes may be affected by linear sequence variations and lead to improper gene expression. For this reason, understanding how the human genome is

Executive summary

Overview
• Recent advances in genomics revealed tremendous human genome sequence diversity in the healthy population.
• Genome variability may impart disease complexity and differential drug response, or disease susceptibility.
• Sequence variation can alter gene expression either by targeting genes and regulatory DNA elements directly or by affecting genome organization.
• Understanding genome organization is crucial for optimal personalized therapies.

Summary of genome organization
• There are three hierarchical levels of genome organization:
  - The linear arrangement of genes and DNA elements along chromosomes
  - The association of genomic DNA with proteins and formation of chromatin
  - The packaging and organization of chromatin in the nuclear space
• Genome organization in the nuclear space is not random – it is functional.
• Spatial genome organization is an important mechanism to regulate gene expression.

Measuring spatial genome organization
• Genome architecture can be studied with several approaches.
• Available techniques vary in resolution, throughput, genomic coverage and cost.
• Current methodologies complement each other and should be combined for discovery.
• Chromosome conformation capture (3C)-related techniques offer an unprecedented high-resolution view of our genome.
• Specific DNA contacts can be measured with 3C-derived approaches.

Chromatin conformation signatures
• Chromatin conformation signatures (CCSs) are collections of DNA contacts associated with specific gene expression states.
• There are four types of CCSs:
  - Local chromatin organization
  - Intrachromosomal contacts
  - Interchromosomal contacts
  - Genomic environment
• CCSs can be complex and include several types of DNA contacts.

Genome variability, CCSs & human health
• Variations in the human genetic code are abundant and can be linked to disease.
• Gene expression patterns may be altered by sequence variations.
• Any type of variation may affect gene expression by altering genome organization.
• Sequence variations may bear distinct CCSs.
• CCSs help explain disease complexity and susceptibility, or differential drug response.

CCSs as ideal biomarkers
• CCSs may integrate multiple variations into single signatures.
• CCSs may identify gene expression states regardless of mechanisms.
• CCSs uniquely identify 3D mechanisms of gene regulation.
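Since a CCS is described here as a collection of DNA contacts tied to a gene expression state, one simple way to picture the comparison of two states is as a set difference. The Python sketch below is a minimal illustration under that assumption, not a method from the review: the names `contact` and `compare_signatures` are hypothetical, and the contact sets are toy values loosely modeled on the β-globin and HoxA descriptions in the text (the individual HoxA pairs are invented for illustration, not measured data).

```python
from typing import FrozenSet, Set, Tuple

Contact = FrozenSet[str]


def contact(a: str, b: str) -> Contact:
    """An unordered pair of genomic elements found in physical proximity."""
    return frozenset((a, b))


def compare_signatures(
    reference: Set[Contact], sample: Set[Contact]
) -> Tuple[Set[Contact], Set[Contact]]:
    """Return (gained, lost) contacts in sample relative to reference."""
    return sample - reference, reference - sample


# Toy beta-globin signatures: the LCR contacts a downstream site in both
# states and additionally contacts the adult genes when they are transcribed.
globin_off = {contact("LCR", "downstream")}
globin_on = globin_off | {contact("LCR", "HBD"), contact("LCR", "HBB")}

# Toy HoxA signatures: loops in the silent cluster, none when transcribed.
# These specific pairs are hypothetical placeholders.
hoxa_off = {contact("A9", "A10"), contact("A10", "A11"), contact("A11", "A13")}
hoxa_on: Set[Contact] = set()

globin_gained, globin_lost = compare_signatures(globin_off, globin_on)
hoxa_gained, hoxa_lost = compare_signatures(hoxa_off, hoxa_on)

print(len(globin_gained), len(globin_lost))  # prints: 2 0
print(len(hoxa_gained), len(hoxa_lost))      # prints: 0 3
```

The two toy loci show opposite behavior, gain of contacts in one case and loss in the other, both accompanying transcription activation. Real 3C-type data are quantitative interaction frequencies rather than present/absent pairs, so a practical comparison would need thresholds or statistics that this sketch leaves out.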


structured in vivo is key to understanding transcription regulation and other processes such as imprinting and DNA replication.

Chromosome conformation capture-related techniques offer an unprecedented view of genome organization at the ultrastructural level. The recent development of the ChIA-PET and Hi-C technologies has raised chromosome conformation research to an entirely new level. Long-range interaction maps of human genomes have already begun to emerge from these technologies. For the first time, a more 'top-down' approach to studying the role of genome architecture in the regulation of genes appears feasible. Currently, the main issue with the Hi-C and ChIA-PET technologies is the high cost of high-throughput sequencing. As such, high-throughput spatial reconstruction of human genome libraries is not immediately possible. However, as with any technology, the cost will decrease and sequencing will eventually be affordable enough for CCS screening in large cohorts. Nonetheless, mapping physical networks will unquestionably continue to uncover the relationship between CCSs and genome function, and provide a better understanding of human disease pathogenesis.

Financial & competing interests disclosure
The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.

Bibliography
Papers of special note have been highlighted as:
▪ of interest
▪▪ of considerable interest

1 Pinto D, Marshall C, Feuk L, Scherer SW: Copy-number variation in control population cohorts. Hum. Mol. Genet. 16(Spec No. 2), R168–R173 (2007).
2 Lee C, Scherer SW: The clinical context of copy number variation in the human genome. Expert Rev. Mol. Med. 12, E8 (2010).
3 Fraser P, Bickmore W: Nuclear organization of the genome and the potential for gene regulation. Nature 447(7143), 413–417 (2007).
4 Babu MM, Janga SC, De Santiago I, Pombo A: Eukaryotic gene regulation in three dimensions and its impact on genome evolution. Curr. Opin. Genet. Dev. 18(6), 571–582 (2008).
5 Cook PR: A model for all genomes: the role of transcription factories. J. Mol. Biol. 395(1), 1–10 (2010).
6 Berger SL: The complex language of chromatin regulation during transcription. Nature 447(7143), 407–412 (2007).
7 Kouzarides T: Chromatin modifications and their function. Cell 128(4), 693–705 (2007).
8 Tremethick DJ: Higher-order structures of chromatin: the elusive 30 nm fiber. Cell 128(4), 651–654 (2007).
9 Woodcock CL: Chromatin architecture. Curr. Opin. Struct. Biol. 16(2), 213–220 (2006).
10 West AG, Fraser P: Remote control of gene transcription. Hum. Mol. Genet. 14(Spec No. 1), R101–R111 (2005).
11 Gondor A, Ohlsson R: Chromosome crosstalk in three dimensions. Nature 461(7261), 212–217 (2009).
12 Spilianakis CG, Flavell RA: Long-range intrachromosomal interactions in the T helper type 2 cytokine locus. Nat. Immunol. 5(10), 1017–1027 (2004).
13 Tolhuis B, Palstra RJ, Splinter E, Grosveld F, De Laat W: Looping and interaction between hypersensitive sites in the active β-globin locus. Mol. Cell 10(6), 1453–1465 (2002).
14 Ling JQ, Li T, Hu JF et al.: CTCF mediates interchromosomal colocalization between IGF2/H19 and WSB1/NF1. Science 312(5771), 269–272 (2006).
15 Liu Z, Garrard WT: Long-range interactions between three transcriptional enhancers, active Vκ gene promoters, and a 3´ boundary sequence spanning 46 kilobases. Mol. Cell Biol. 25(8), 3220–3231 (2005).
16 Murrell A, Heeson S, Reik W: Interaction between differentially methylated regions partitions the imprinted genes IGF2 and H19 into parent-specific chromatin loops. Nat. Genet. 36(8), 889–893 (2004).
17 Lanzuolo C, Roure V, Dekker J, Bantignies F, Orlando V: Polycomb response elements mediate the formation of chromosome higher-order structures in the bithorax complex. Nat. Cell Biol. 9(10), 1167–1174 (2007).
18 Jiang H, Peterlin BM: Differential chromatin looping regulates CD4 expression in immature thymocytes. Mol. Cell Biol. 28(3), 907–912 (2008).
19 Tsytsykova AV, Rajsbaum R, Falvo JV, Ligeiro F, Neely SR, Goldfeld AE: Activation-dependent intrachromosomal interactions formed by the TNF gene promoter and two distal enhancers. Proc. Natl Acad. Sci. USA 104(43), 16850–16855 (2007).
20 Ju Z, Volpi SA, Hassan R et al.: Evidence for physical interaction between the immunoglobulin heavy chain variable region and the 3´ regulatory region. J. Biol. Chem. 282(48), 35169–35178 (2007).
21 D'haene B, Attanasio C, Beysen D et al.: Disease-causing 7.4 kb cis-regulatory deletion disrupting conserved non-coding sequences and their interaction with the foxl2 promotor: implications for mutation screening. PLoS Genet. 5(6), e1000522 (2009).
22 Petrov A, Allinne J, Pirozhkova I, Laoudj D, Lipinski M, Vassetzky YS: A nuclear matrix attachment site in the 4q35 locus has an enhancer-blocking activity in vivo: implications for the facio-scapulo-humeral dystrophy. Genome Res. 18(1), 39–45 (2008).
23 Dmitriev P, Lipinski M, Vassetzky YS: Pearls in the junk: dissecting the molecular pathogenesis of facioscapulohumeral muscular dystrophy. Neuromuscul. Disord. 19(1), 17–20 (2009).
24 Sexton T, Bantignies F, Cavalli G: Genomic interactions: Chromatin loops and gene meeting points in transcriptional regulation. Semin. Cell. Dev. Biol. 20(7), 849–855 (2009).
25 Lasalle JM, Lalande M: Homologous association of oppositely imprinted chromosomal domains. Science 272(5262), 725–728 (1996).
26 Teller K, Solovei I, Buiting K, Horsthemke B, Cremer T: Maintenance of imprinting and nuclear architecture in cycling cells. Proc. Natl Acad. Sci. USA 104(38), 14970–14975 (2007).

626 Biomarkers Med. (2010) 4(4) future science group Chromatin conformation signatures: ideal human disease biomarkers? Review

27 Hu Q, Kwon YS, Nunez E et al.: Enhancing nuclear receptor-induced transcription requires nuclear motor and LSD1-dependent gene networking in interchromatin granules. Proc. Natl Acad. Sci. USA 105(49), 19199–19204 (2008).
28 Kocanova S, Kerr EA, Rafique S et al.: Activation of estrogen-responsive genes does not require their nuclear co-localization. PLoS Genet. 6(4), E1000922 (2010).
29 Osborne CS, Chakalova L, Brown KE et al.: Active genes dynamically colocalize to shared sites of ongoing transcription. Nat. Genet. 36(10), 1065–1071 (2004).
30 Schoenfelder S, Sexton T, Chakalova L et al.: Preferential associations between co-regulated genes reveal a transcriptional interactome in erythroid cells. Nat. Genet. 42(1), 53–61 (2010).
▪ Describes the RNA-TRAP technique for the first time.
31 Cremer T, Cremer C: Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat. Rev. Genet. 2(4), 292–301 (2001).
32 Bolzer A, Kreth G, Solovei I et al.: Three-dimensional maps of all chromosomes in human male fibroblast nuclei and prometaphase rosettes. PLoS Biol. 3(5), E157 (2005).
▪▪ Describes the 3C technique for the first time.
33 Simonis M, De Laat W: FISH-eyed and genome-wide views on the spatial organisation of gene expression. Biochim. Biophys. Acta. 1783(11), 2052–2060 (2008).
34 Fernandez-Suarez M, Ting AY: Fluorescent probes for super-resolution imaging in living cells. Nat. Rev. Mol. Cell. Biol. 9(12), 929–943 (2008).
35 Carter D, Chakalova L, Osborne CS, Dai Y-F, Fraser P: Long-range chromatin regulatory interactions in vivo. Nat. Genet. 32(4), 623–626 (2002).
36 Bulger M, Groudine M: Trapping enhancer function. Nat. Genet. 32(4), 555–556 (2002).
37 Dekker J, Rippe K, Dekker M, Kleckner N: Capturing chromosome conformation. Science 295(5558), 1306–1311 (2002).
38 Miele A, Dekker J: Mapping cis- and trans-chromatin interaction networks using chromosome conformation capture (3C). Methods Mol. Biol. 464, 105–121 (2009).
39 Miele A, Gheldof N, Tabuchi TM, Dostie J, Dekker J: Mapping chromatin interactions by chromosome conformation capture (3C). In: Current Protocols in Molecular Biology. Ausubel FM, Brent R, Kingston RE et al. (Eds). John Wiley & Sons, Hoboken, NJ, USA (2006).
▪▪ Along with references [40] and [41] describes a variation of the 4C approach for the first time.
40 Hagege H, Klous P, Braem C et al.: Quantitative analysis of chromosome conformation capture assays (3C-qPCR). Nat. Protoc. 2(7), 1722–1733 (2007).
▪▪ Along with references [39] and [41] describes a variation of the 4C approach for the first time.
41 Abou El Hassan M, Bremner R: A rapid simple approach to quantify chromosome conformation capture. Nucleic Acids Res. 37(5), E35 (2009).
▪▪ Along with references [39] and [40] describes a variation of the 4C approach for the first time.
42 Horike S-I, Cai S, Miyano M, Cheng J-F, Kohwi-Shigematsu T: Loss of silent-chromatin looping and impaired imprinting of DLX5 in Rett syndrome. Nat. Genet. 37(1), 31–40 (2005).
▪▪ Describes the 5C technique for the first time.
43 Simonis M, Kooren J, De Laat W: An evaluation of 3C-based methods to capture DNA interactions. Nat. Meth. 4(11), 895–901 (2007).
44 Zhao Z, Tavoosidana G, Sjolinder M et al.: Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 38(11), 1341–1347 (2006).
45 Würtele H, Chartrand P: Genome-wide scanning of HOXB1-associated loci in mouse ES cells using an open-ended chromosome conformation capture methodology. Chromosome Res. 14(5), 477–495 (2006).
46 Simonis M, Klous P, Splinter E et al.: Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38(11), 1348–1354 (2006).
47 Dostie J, Richmond TA, Arnaout RA et al.: Chromosome conformation capture carbon copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16(10), 1299–1309 (2006).
48 Dostie J, Zhan Y, Dekker J: High-throughput mapping of chromatin interactions using 5C technology. In: Current Protocols in Molecular Biology. Ausubel FM, Brent R, Kingston RE et al. (Eds). John Wiley & Sons, Hoboken, NJ, USA (2007).
49 Dostie J, Dekker J: Mapping networks of physical interactions between genomic elements using 5C technology. Nat. Protoc. 2(4), 988–1002 (2007).
50 Fraser J, Rousseau M, Shenker S et al.: Chromatin conformation signatures of cellular differentiation. Genome Biol. 10(4), R37 (2009).
▪▪ First study using the ChIA-PET approach.
51 Van Berkum NL, Dekker J: Determining spatial chromatin organization of large genomic regions using 5C technology. Methods Mol. Biol. 567, 189–213 (2009).
52 Tiwari VK, Cope L, McGarvey KM, Ohm JE, Baylin SB: A novel 6C assay uncovers polycomb-mediated higher order chromatin conformations. Genome Res. 18(7), 1171–1179 (2008).
▪▪ Describes the Hi-C technique for the first time.
53 Tiwari VK, Baylin SB: Combined 3C-chip-cloning (6C) assay: a tool to unravel protein-mediated genome architecture. Cold Spring Harb. Protoc. 2009(3), pdb.prot5168 (2009).
54 Fullwood MJ, Ruan Y: Chip-based methods for the identification of long-range chromatin interactions. J. Cell. Biochem. 107(1), 30–39 (2009).
55 Fullwood MJ, Liu MH, Pan YF et al.: An oestrogen-receptor-α-bound human chromatin interactome. Nature 462(7269), 58–64 (2009).
56 Fullwood MJ, Han Y, Wei CL, Ruan X, Ruan Y: Chromatin interaction analysis using paired-end tag sequencing. Curr. Protoc. Mol. Biol. 15, 21–25 (2010).
▪▪ Identifies functional interchromosomal DNA contacts at high resolution for the first time.
57 Lieberman-Aiden E, Van Berkum NL, Williams L et al.: Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326(5950), 289–293 (2009).
58 Fullwood MJ, Wei C-L, Liu ET, Ruan Y: Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res. 19(4), 521–532 (2009).
59 Dekker J: The 3C’s of chromosome conformation capture: controls, controls, controls. Nat. Meth. 3(1), 17–21 (2006).
60 Hakim O, John S, Ling JQ, Biddie SC, Hoffman AR, Hager GL: Glucocorticoid receptor activation of the CIZ1-LCN2 locus by long range interactions. J. Biol. Chem. 284(10), 6048–6052 (2009).
61 Spilianakis CG, Lalioti MD, Town T, Lee GR, Flavell RA: Interchromosomal associations between alternatively expressed loci. Nature 435(7042), 637–645 (2005).

future science group www.futuremedicine.com 627 Review Crutchley, Wang, Ferraiuolo & Dostie

62 Lomvardas S, Barnea G, Pisapia DJ, Mendelsohn M, Kirkland J, Axel R: Interchromosomal interactions and olfactory receptor choice. Cell 126(2), 403–413 (2006).
63 Sexton T, Schober H, Fraser P, Gasser SM: Gene regulation through nuclear organization. Nat. Struct. Mol. Biol. 14(11), 1049–1055 (2007).
64 Osborne CS, Chakalova L, Mitchell JA et al.: Myc dynamically and preferentially relocates to a transcription factory occupied by Igh. PLoS Biol. 5(8), e192 (2007).
65 Trimborn T, Gribnau J, Grosveld F, Fraser P: Mechanisms of developmental control of transcription in the murine α- and β-globin loci. Genes Dev. 13(1), 112–124 (1999).
66 Palstra RJ, Tolhuis B, Splinter E, Nijmeijer R, Grosveld F, De Laat W: The β-globin nuclear compartment in development and erythroid differentiation. Nat. Genet. 35(2), 190–194 (2003).
67 Lewis EB: A gene complex controlling segmentation in Drosophila. Nature 276(5688), 565–570 (1978).
68 Splinter E, Heath H, Kooren J et al.: CTCF mediates long-range chromatin looping and local histone modification in the β-globin locus. Genes Dev. 20(17), 2349–2354 (2006).
69 De Laat W, Grosveld F: Spatial organization of gene expression: the active chromatin hub. Chromosome Res. 11(5), 447–459 (2003).
70 Vakoc C, Letting DL, Gheldof N et al.: Proximity among distant regulatory elements at the β-globin locus requires GATA-1 and FOG-1. Mol. Cell 17(3), 453–462 (2005).
71 Krumlauf R: Hox genes in vertebrate development. Cell 78(2), 191–201 (1994).
72 Kmita M, Duboule D: Organizing axes in time and space; 25 years of colinear tinkering. Science 301(5631), 331–333 (2003).
73 Duboule D, Morata G: Colinearity and functional hierarchy among genes of the homeotic complexes. Trends Genet. 10(10), 358–364 (1994).
74 Morey C, Da Silva NR, Perry P, Bickmore WA: Nuclear reorganisation and chromatin decondensation are conserved, but distinct, mechanisms linked to HOX gene activation. Development 134(5), 909–919 (2007).
75 Chambeyron S, Bickmore WA: Chromatin decondensation and nuclear reorganization of the HOXB locus upon induction of transcription. Genes Dev. 18(10), 1119–1130 (2004).
76 Ferraiuolo MA, Rousseau M, Miyamoto C et al.: The three-dimensional architecture of Hox cluster silencing. Nucleic Acids Res. (2010) (In Press).
77 Filippova GN, Fagerlie S, Klenova EM et al.: An exceptionally conserved transcriptional repressor, CTCF, employs different combinations of zinc fingers to bind diverged promoter sequences of avian and mammalian c-myc oncogenes. Mol. Cell Biol. 16(6), 2802–2813 (1996).
78 Phillips JE, Corces VG: CTCF: master weaver of the genome. Cell 137(7), 1194–1211 (2009).
79 Szabo PE, Tang SH, Silva FJ, Tsark WM, Mann JR: Role of CTCF binding sites in the IGF2/H19 imprinting control region. Mol. Cell Biol. 24(11), 4791–4800 (2004).
80 Kanduri C, Pant V, Loukinov D et al.: Functional association of CTCF with the insulator upstream of the H19 gene is parent of origin-specific and methylation-sensitive. Curr. Biol. 10(14), 853–856 (2000).
81 Hark AT, Schoenherr CJ, Katz DJ, Ingram RS, Levorse JM, Tilghman SM: CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/IGF2 locus. Nature 405(6785), 486–489 (2000).
82 Bell AC, Felsenfeld G: Methylation of a CTCF-dependent boundary controls imprinted expression of the IGF2 gene. Nature 405(6785), 482–485 (2000).
83 Galande S, Purbey PK, Notani D, Kumar PP: The third dimension of gene regulation: organization of dynamic chromatin loopscape by SATB1. Curr. Opin. Genet. Dev. 17(5), 408–414 (2007).
84 Cai S, Lee CC, Kohwi-Shigematsu T: SATB1 packages densely looped, transcriptionally active chromatin for coordinated expression of cytokine genes. Nat. Genet. 38(11), 1278–1288 (2006).
▪▪ Demonstrates how a single nucleotide polymorphism can change chromatin structure and alter gene expression patterns in a human disease.
85 Cai S, Han HJ, Kohwi-Shigematsu T: Tissue-specific nuclear architecture and gene expression regulated by SATB1. Nat. Genet. 34(1), 42–51 (2003).
▪▪ Demonstrates how deletion of polymorphic DNA repeats can change chromatin structure and alter gene expression patterns in a human disease.
86 Yasui D, Miyano M, Cai S, Varga-Weisz P, Kohwi-Shigematsu T: SATB1 targets chromatin remodelling to regulate genes over long distances. Nature 419(6907), 641–645 (2002).
87 Notani D, Gottimukkala KP, Jayani RS et al.: Global regulator SATB1 recruits β-catenin and regulates T(H)2 differentiation in Wnt-dependent manner. PLoS Biol. 8(1), E1000296 (2010).
88 Pomerantz MM, Ahmadiyeh N, Jia L et al.: The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 41(8), 882–884 (2009).
89 Pirozhkova I, Petrov A, Dmitriev P, Laoudj D, Lipinski M, Vassetzky Y: A functional role for 4qA/B in the structural rearrangement of the 4q35 region and in the regulation of FRG1 and ANT1 in facioscapulohumeral dystrophy. PLoS One 3(10), E3389 (2008).
90 Conrad DF, Pinto D, Redon R et al.: Origins and functional impact of copy number variation in the human genome. Nature 464(7289), 704–712 (2010).
91 Feuk L, Carson AR, Scherer SW: Structural variation in the human genome. Nat. Rev. Genet. 7(2), 85–97 (2006).
92 Feuk L, Marshall CR, Wintle RF, Scherer SW: Structural variants: changing the landscape of chromosomes and design of disease studies. Hum. Mol. Genet. 15(1), R57–R66 (2006).
93 Freeman JL, Perry GH, Feuk L et al.: Copy number variation: new insights in genome diversity. Genome Res. 16(8), 949–961 (2006).
94 Glessner JT, Wang K, Cai G et al.: Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature 459(7246), 569–573 (2009).
95 Merikangas AK, Corvin AP, Gallagher L: Copy-number variants in neurodevelopmental disorders: promises and challenges. Trends Genet. 25(12), 536–544 (2009).
96 Bassett AS, Marshall CR, Lionel AC, Chow EW, Scherer SW: Copy number variations and risk for schizophrenia in 22q11.2 deletion syndrome. Hum. Mol. Genet. 17(24), 4045–4053 (2008).
97 Han HJ, Russo J, Kohwi Y, Kohwi-Shigematsu T: SATB1 reprogrammes gene expression to promote breast tumour growth and metastasis. Nature 452(7184), 187–193 (2008).
98 Simonis M, Klous P, Homminga I et al.: High-resolution identification of balanced and complex chromosomal rearrangements by 4C technology. Nat. Meth. 6(11), 837–842 (2009).
99 Dostie J, Zhan Y, Dekker J: Chromosome conformation capture carbon copy technology. Curr. Protoc. Mol. Biol. Chapter 21(Unit 21), 14 (2007).


100 Li G, Fullwood M, Xu H et al.: ChIA-PET tool for comprehensive chromatin interaction analysis with paired-end tag sequencing. Genome Biol. 11(2), R22 (2010).
101 Chambeyron S, Bickmore WA: Chromatin decondensation and nuclear reorganization of the HOXB locus upon induction of transcription. Genes Dev. 18(10), 1119–1130 (2004).
102 Sieben VJ, Marun CSD, Pilarski PM, Kaigala GV, Pilarski LM, Backhouse CJ: FISH and chips: chromosomal analysis on microfluidic platforms. IET Nanobiotechnol. 1(3), 27–35 (2007).
103 Yue P, Moult J: Identification and analysis of deleterious human SNPs. J. Mol. Biol. 356(5), 1263–1274 (2006).
104 Kruglyak L, Nickerson DA: Variation is the spice of life. Nat. Genet. 27(3), 234–236 (2001).
105 International HapMap Consortium et al.: A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164), 851–861 (2007).
106 Wain LV, Armour JAL, Tobin MD: Genomic copy number variation, human health, and disease. Lancet 374(9686), 340–350 (2009).
107 Mills RE, Luttig CT, Larkins CE et al.: An initial map of insertion and deletion (indel) variation in the human genome. Genome Res. 16(9), 1182–1190 (2006).
108 De La Chaux N, Messer P, Arndt P: DNA indels in coding regions reveal selective constraints on protein evolution in the human lineage. BMC Evolutionary Biology 7(1), 191 (2007).
109 Tuzun E, Sharp AJ, Bailey JA et al.: Fine-scale structural variation of the human genome. Nat. Genet. 37(7), 727–732 (2005).
