TECHNOLOGY FEATURE THE DARK SIDE OF THE Scientists are uncovering the hidden switches in our genome that dial expression up and down, but much work lies ahead to peel back the many layers of regulation. MEHAU KULYK/SPL MEHAU

The human genome is not packed with ‘junk’ as previously thought, but with regulatory regions that modulate gene activity.

BY KELLY RAE CHI expression in ways that scientists are only of non-coding portions, that drive when and starting to unravel. By uncovering regulatory where are expressed. Scientists have ifteen years ago, scientists celebrated instructions in the genome beyond protein- generated a list of such elements by using bio- the first draft of the sequenced human coding genes, scientists are hoping to yield chemical assays to probe DNA sequences, RNA genome. At the time, they predicted that new ways to understand and treat disease. “It’s transcripts, regulatory proteins bound to DNA Fhumans had between 25,000 and 40,000 genes not overstating to say that ENCODE is as sig- and RNA and epigenetic signatures — the chem- that code for proteins. That estimate has con- nificant for our understanding of the human ical tags on DNA and the proteins packaging it tinued to fall. Humans actually seem to have genome as the original DNA sequencing of the — that also affect . as few as 19,000 such genes1 — a mere 1–2% of human genome,” says cell biologist Bing Ren of So far, the data suggest that there are the genome. The key to our complexity lies in the University of California, San Diego, Insti- hundreds of thousands of functional regions how these genes are regulated by the remain- tute for Genomic Medicine in La Jolla, who is in the human genome whose task is to control ing 99% of our DNA, known as the genome’s a member of the ENCODE team. gene expression: it turns out that much more ‘d a r k m a t t e r ’. Ren is also part of a subsequent consortium space in the human genome is devoted to From efforts such as the massive Encyclo- called the Roadmap Epigenomics Project3. regulating genes than to the genes themselves. pedia of DNA Elements (ENCODE) project2, These two initiatives — both funded by the Scientists are now trying to validate each launched in 2003 by the US National Human US National Institutes of Health (NIH) — aim predicted element experimentally to ascertain Genome Research Institute, it’s clear that copi- to map and predict the existence of elements its function — a mammoth task, but one for ous regulatory elements are at play, tuning gene in the genome, including in the vast stretches which they now have a powerful new tool.

©2016 Mac millan Publishers Li mited, part of Spri nger Nature. All r13i gh tOCTOBERs reserved. 2016 | VOL 538 | NATURE | 275

TECHNOLOGY GENE REGULATION

Since the gene-editing technique at a time and observing the consequences in a Sanjana, who conducted the research as a for- CRISPR–Cas9 entered the scientific arena, cell assay or animal model. This is less easy to mer postdoc in Zhang’s group and now works the speed at which researchers can test func- do for the non-coding genome because many at the New York Genome Center at New York tional elements in the non-coding regions has of the elements are redundant, and so deleting University. ramped up. But it is still a daunting endeavour: just one might not alter gene expression or pro- Other CRISPR–Cas9 screening data have more than 3 million regulatory DNA regions, duce an obvious change. “It’s a huge challenge challenged ENCODE predictions. Richard thought to contain some 15 million binding that we have at the moment to really distin- Sherwood of Harvard Medical School in sites for regulatory proteins called transcription guish between functional and non-functional Boston, Massachusetts, and his collaborators factors, control gene expression in the human elements detected by ENCODE,” says geneticist created an approach called a multiplexed edit- cell types studied thus far. About 150,000 may Ran Elkon of Tel Aviv University in Israel. ing regulatory assay6 to screen for non-coding be active in any given cell type. CRISPR–Cas9 is particularly accelerating regions that might influence gene expression in These could be crucial to understanding scientists’ exploration of enhancers. The tech- well-known mouse embryonic stem-cell lines. disease, because most single-nucleotide nology enables scientists to alter large numbers Using this technique, they obtained quantita- changes associated with common diseases fall of regulatory elements in a high-throughput tive information about the extent to which in regions outside protein-coding genes, and way, using libraries of RNA guide fragments these regulatory regions might contribute to they often overlap with DNA sites highlighted that target and disrupt different regions in the gene expression. Some of their results are dis- by ENCODE as having regulatory function. genome, to observe the outcome. Not only is the cordant with regions flagged by ENCODE as Certain regulatory elements that normally method relatively fast, but researchers can also potential enhancers because, when mutated, drive gene expression are thought to underpin run the assays directly in human cells. these areas did not affect gene expression. the mechanism of cancer, for example. Dis- Experiments of this type have already Moreover, the researchers also discovered rupting a gene’s regulatory elements, the data turned up some unexpected findings. While mysterious sections that they dubbed suggest, could thus have as drastic an impact a postdoc working with cancer biologist ‘unmarked regulatory elements’, or UREs, that on cell function as disrupting the gene itself. Reuven Agami at the Netherlands Cancer do not fit into any category of functional ele- Using CRISPR–Cas9, scientists now have an Institute in Amsterdam, Elkon was involved ments. The team is currently exploring how opportunity to test that premise by introducing in performing the first screen of regulatory widespread these UREs might be in the genome. targeted mutations into non-coding sequences elements using the advanced editing system4. This new type of assay, along with other gene- and observing the consequences. The CRISPR–Cas9 approach enabled them to editing-based screens, will play an increasingly test individually those enhancers predicted by important part in the validation of ENCODE DECODING A COMPLEX WORLD ENCODE to bind a transcription factor called candidates, says Sherwood. How much of DNA’s dark matter has a function p53. Interest in p53 is high because the protein in gene control is still up for debate. In 2012, is a known tumour suppressor that is mutated TECHNIQUE TWEAKS ENCODE scientists proposed on the basis of in more than 50% of human tumours. The Investigators working on the ENCODE and biochemical-assay predictions that 80% of the researchers were able to pinpoint two enhanc- Roadmap projects have relied mostly on a bio- non-coding genome has a function2. But this ers from more than a thousand genomic sites chemical technique called DNase-seq, which figure soon proved to be an overestimate as that affect p53’s tumour-suppressing func- sequences and maps all exposed regions of the researchers narrowed the definition of ‘func- tion, located near the p53-encoding gene. A genome. In these sections, the DNA is relaxed tion’ and devised experimental methods, such predicted third enhancer has yet to be located instead of tightly coiled around histones, and as reporter assays, to test these functions. “The because it is far from any gene, let alone one thus is more likely to facilitate transcription- number still isn’t fully known”, in part because related to p53. factor binding that drives gene activation. the mapping isn’t complete, says Michael In a separate screen, the group targeted bind- By mapping these areas, investigators can Snyder, a geneticist at Stanford University in ing sites for oestrogen receptor-α — which is pinpoint candidate enhancers, promoters, California and a member of ENCODE. “Most implicated in breast cancer — and identi- silencers, insulators and other regulatory ele- people would say between 10% and 20% of the fied three enhancer sequences that influence ments in the non-coding genome (see ‘Spot the [non-coding] genome is likely to have a func- tumour growth; these elements could thus have regulators’). tion where, if you disrupt it, you will affect a role in the develop- Another method, ATAC-seq, detects and something.” ment of resistance to “It’s a huge sequences sites in the chromatin that are acces- But regulatory elements have a bewildering breast-cancer therapy. challenge to sible to the transposase used for the array of functions and forms, which makes At the Broad Insti- distinguish assay. Both DNase-seq and ATAC-seq produce tackling them a formidable challenge. Even the tute of MIT and Har- between a genome-wide view of regions of open chro- best-known types, such as spots in the genome vard in Cambridge, functional and matin. According to the researchers, because known as promoters, which lie next to a gene Massachusetts, bio- non-functional such epigenomic profiles can map the extent to where transcription begins, and enhancers engineer Feng Zhang which genes are activated in certain cell types, — regions that when bound by specific and his group also elements.” they could be useful for clinical decision- transcription factors alter the likelihood used CRISPR–Cas9 to making, and ATAC-seq is fast enough for this of a gene being read — are hard to study. In identify genes essential to the survival of can- purpose7. Many, however, consider a tech- addition to the sheer number of these sites, cer cells. Using a melanoma model, they first nique known as chromatin immunoprecipi- estimated at 15 million, enhancers may be screened around 18,000 genes in human cells tation (ChIP)-seq to be the most reliable for positioned thousands of base pairs away from to pinpoint ones that might underlie resistance this purpose because it is the only one that can the gene that they control. This makes it tough to the melanoma drug vemurafenib. Then, in identify all potential binding sites for a given to predict where their target genes are located a follow-up study published last month5, they transcription factor. and what they do. described a new screen that identified regula- Even so, biochemical assays can only hint at Thus far, ENCODE and Roadmap have tory regions on either side of several resistance function. CRISPR–Cas9 cell screens, by con- offered up important clues, but the real proof genes. Their findings fit well with ENCODE trast, are more concrete because scientists can that these predicted regulatory elements actu- data that predict regulatory regions at these introduce a mutation or deletion at a particular ally do something comes from a functional test. locations — and they also reveal new functional site in the genome and observe how it influences For genes, this mostly entails deleting them one elements, says molecular biologist Neville gene expression. The disadvantage is that these

276 | NATURE | VOL 538 | 13 OCTOBER©2016 2016Mac mil l|a nCORRECTEDPublishers Li mited ,24par tOCTOBERof Spri nger N a2016ture. All ri ghts reserved.

ADAPTED FROM E. KHURANA ET AL. NATURE REV. GENET. 17, 93–108 (2016). specific networksspecific of gene-regulatory elements model computational Baltimore, Maryland. ofMichael Johns Beer Hopkins University in biologist computational says genomes, worm is harder inhuman genomes than inyeast or which enhancers are active inagiven context tion. But even with algorithms, predicting which researchers can probe then for func can predict transcription-factor binding sites, pret biochemical mapping data. Algorithms providing scientists with smart ways to inter tive power, says. Sherwood machine-learning tool to improve its predic have done, been data their fed could into be a OnceRen. enough of kinds of these screens bydicted biochemical those signatures,” says mainly as atool to validate functions pre gene-editing approaches up. this scale will jana says —although he is optimistic that future screens would barely cover asingle page, San stoy’s classic novel sented, for example, by three copies Tol of Leo genomefull of 3billion pairs were base repre tests cover portions smaller of genome. the If the They trainedThey their open-source algorithm, extent incomplex are they perturbed diseases. are operating in a given and cell type to what Beer and his collaboratorsBeer have developed a New computational tools are already “In CRISPR short serve the term, Ithink will Human and thusmusthavearole ingeneexpression. di erent speciestoreveal regulatory elementsthatare conserved Evolutionary conservation (yellowpeaks)alignssequencesfrom histone-modication ortranscription-factor (TF)-bindingassays. peaks) orcommontoall(blue)are identiedbyopen-chromatin, Regulatory elementsactiveindi erent tissues(red, brown andgreen SPOT THEREGULATORS between species. expression bycombiningresults from biochemicalassayswithevolutionarycomparisons Scientists canidentifyfunctionalregions intheDNAthatare activeinmodulatinggene A Chimp Mouse C Heart Brain Liver T T A War and Peace G G T 8 to predict which tissue- TT GG TG A G G (1869), such (1869), such ©

2 ABT1 0 T A A 1 6

------EP300

M a c essentially a regulatory or epigenetic program approaches to cancer treatment. “Cancer is it remains unknown. It open up could also new cancertrace provenance in patients for whom ers possibility inthe of using epigenomic data to informative about tumour origin, which ush Consequently, DNA the sequence can be genomic of profile to each cell. type specific density in atumour by is defined epi the DNA-repair enzymes. open regions the because by accessed can be rather ones than exposed in the —possibly clustertype ininaccessible chromatin regions showed Washington inSeattle and his collaborators Stamatoyannopoulos of University the of A study published last year by geneticist John scientists’ about thinking how cancers arise. genome Roadmap consortium are shifting datathe that have streamed infrom Epi the cell multiplication, death or senescence. But reveal simple-to-measure outcomes, such as neuropscyhiatric disorder —cancer lines cell condition to study at level cell than, the say, a ments cancer because and is asimpler disease to probe ele functional between links the followed by mouse ENCODE data in2014. lines using gene data from ENCODE in 2012, deltaSVM, oncalled human lymphoblastoid cell m i A2M l l The scientists found also that mutation Scientists have on cancer initially focused a n

ABHD11 P u b 9 l that mutations inagiven cancer cell i s h ESR1 e r s

L i B m TF i Compilation ofallregulatory for eachtissue. interactions thatisunique network ofregulatory their targetgenesbuildsa elements’ connectionsto t di erent assays. their targetgenebyusing control canbematchedto far from thegenesthat they Regulatory elementslocated e d ,

Target gene TF p a r t

o TF f

S p r i n g TF e r

N a t TF u r e .

A l l

r 13 OCTOBER 2016 | VOL 2016 | NATURE 538 | OCTOBER 13 | 277 i g - - - ­ h

t s

r e important for memory formation memory for important RNA influences transcription the of agene showed inrat neurons that an extracoding of Alabama at Birmingham and his colleagues neuroscientist Jeremy Day of University the have therapeutic potential. year, this Earlier can influence silencing way, inagene-specific dubbed ‘extracoding RNAs’, and they because groups genes. elements to silence These are methyltransferase 1, which adds methyl turned on or off by blocking DNAthe tors that to control seem agene whether is discovered Medical immunologist School Daniel Tenen screen. And ateam of scientists by led Harvard pectedly, such inSherwood’s as UREs the After regulatory crop all, still signals up unex genomethe that existing assays have missed. were previously not imagined to exist.” patternsthese now are starting to come out that “As we’ve analysed lots of cancer genomes, of all genomic instability,” says Stamatoyannopoulos. that program is development the of genetic and that is superimposed on acell, and result the of 11. 10. 9. 8. 7. 6. 5. 4. 3. 2. 1. based in Cary, North Carolina. Chi Kelly Rae more,” says Stamatoyannopoulos. ter microscope, we wouldn’t sequencing be any disruptive. couldtechnology be “If we had abet change inreal markers. using time specific This a large to watch scale state the of genome the roll out high-resolution live-cell imaging on But looking ahead, researchers might able be to is technological —the still engine of ENCODE. influences and health disease. picturethe of how agiven regulatory element element. That knowledge help in will to fill to predict target the genes regulatory for every ‘4D Nucleome’ project, for instance, which aims genes. The NIHCommon Fund the has begun key to predicting be will an element’s target close contact with their regulatory elements, and of 3Dthe folding that positions genes in standing of how DNA is packaged into a cell, Stamatoyannopoulos says. Aspatial under to cover most of regulatory the DNA by 2020, non-coding genome space inthe and expects s e

r It’s possible that there are elements still in Next-generation sequencing —and has been The ENCODE continue team will to map the

(2013). Y. &Greenleaf, W.J. (2016). (2016). (2016). 1045–1048 (2010). (2012). (2014). Savell, K.E. Di Ruscio,A. Polak, P. Lee, D. Buenrostro, J.D.,Giresi, P. G.,Zaba,L.C.,Chang,H. Rajagopal, N. Sanjana, N.E. Korkmaz, G. Bernstein, B.E. ENCODE Project Consortium Ezkurdia, I. v e d . GENE REGULATION et al.NatureGenet. et al.Nature 10 et al. et al.NatureCommun. apotential class new of regula et al.NatureBiotechnol. et al.Nature et al.NatureBiotechnol. is afreelance is science writer et al.Science et al.NatureBiotechnol.

Hum. Mol.Genet. Nature Meth.

518,

503, 360–364(2015).

47,

353, Nature 955–961(2015). 371–376(2013). 1545–1549 1545–1549

TECHNOLOGY 10,

23, 7,

12091(2016).

34, 1213–1218 1213–1218 489, ■

5866–5878 5866–5878 11 34, . 28, 192–198 192–198 167–174 167–174 57–74 57–74

- - - - -