<<

Research summary The problem repeating structure) is methylated, except Mining for a small (73 kb) region that coincides with The first draft sequence of the the location of containing was released more than 20 years the centromeric CENP-A. This the gaps of ago. However, the limitations of available region probably represents the site of the technologies meant that parts of our genetic kinetochore, a macromolecular structure 8 code — including regions corresponding that ensures the accurate segregation of to , and duplicated during division. families — could not be fully We also reconstructed the evolution of sequenced. This information gap has the ape over limited our understanding of the regions’ the past 25 million years by sequencing and Glennis A. Logsdon1 & organization, genetic variation, evolution assembling the orthologous centromeres in Evan E. Eichler1,2,† and roles in complex biological processes the , orangutan and macaque. that are required for life. It has also impeded Our data show that α-satellite HOR advances in the discovery, diagnosis and structures evolved after great apes diverged The first gapless, -to- treatment of human disease. Scientists have from Old World monkeys. In addition, telomere sequence of a human therefore been working towards filling in our comparison of human centromere the gaps in the current draft of the human chromosome 8 haplotypes (that is, of those , chromosome 8, sequence (GRCh38). inherited from a single parent) shows that is complete. Sequencing and there is greater variation in the centromere assembly of the corresponding than in other regions of the genome, The solution and that the centromere mutates two- to centromere in the chimpanzee, fourfold faster than the rest of the genome. orangutan and macaque reveals Advances in long-read sequencing details of its rapid evolution over technologies have enabled the resolution of complex, repeat-rich regions from native Future directions the past 25 million years. DNA1. For example, ultra-long sequence reads from Oxford Nanopore Technologies The first telomere-to-telomere assembly (ONT) and high-fidelity (HiFi) data from of a human autosome sets the stage for Pacific Biosciences (PacBio) have facilitated a complete map of a haploid human the accurate assembly of complex structural genome. The Telomere-to-Telomere (T2T) variants2–4, segmental duplications5–7, Consortium is currently leading an effort centromeres6,8 and the complete human X to do this and is likely to reveal hundreds chromosome9. Many of these assemblies of millions of new bases, aiding the study used DNA from complete hydatidiform of sequence composition, variation, mole (CHM) cell lines (for example, CHM13 function and evolution in the human and CHM1). These have no maternal genome. The next step will be to generate genome (but a duplicated paternal one), complete from normal diploid which avoids the need to assemble both cells. This will require the development of haplotypes of a diploid genome. technologies and computational tools that We developed an assembly method can completely phase and assemble both that leverages the strengths of ultra-long maternal and paternal haplotypes across reads (>100 kilobases) from ONT and HiFi complex structural variant regions of the long reads (~15–25 kb) from PacBio to genome. Such an effort is being led by the resolve all five gaps in the DNA sequence Human Pangenome Reference Consortium of human chromosome 8 from CHM13 (HPRC) and the Structural cells (Fig. 1). Specifically, the ONT reads Variation Consortium (HGSVC)4. The ability were used to build a sequence scaffold that to sequence and assemble diploid genomes This is a summary of: was subsequently replaced with PacBio could pave the way to the production of Logsdon, G. A. et al. The structure, continuous DNA sequences. The result complete genomic maps for patients, and to function and evolution of a complete is a complete, highly accurate (>99.99%) the use of such maps in tailoring treatments human chromosome 8. Nature https://doi. telomere-to-telomere sequence assembly to the genetic variant underlying a disease. org/10.1038/s41586-021-03420-7 (2021). of human chromosome 8. Our assembly Summary author affiliation: contains more than 3.5 million bases that 1Department of Genome Sciences, University were missing from GRCh38 and map to the of Washington School of Medicine, Seattle, centromere, two segmental duplications WA, USA. 2Howard Hughes Medical Institute, and both telomeres. We also identified University of Washington, Seattle, WA, USA. 12 new -coding , and generated †e-mail: [email protected] a chromosome-wide DNA methylation map. The completion of the chromosome 8 Cite this as: centromere sequence provides insight into Nature https://doi.org/10.1038/d41586-021- the biology of the region. We found that the 01095-8 (2021) centromeric α-satellite higher-order repeat Competing interests (HOR) array (that is, units of repeating DNA The authors declare no competing interests. that are arranged in tandem to form a larger

Nature | Published online 14 May 2021 ©2021 Spri nger Nature Li mited. All rights reserved. ©2021 Spri nger Nature Li mited. All rights reserved.

EXPERT OPINION REFERENCES

As well as presenting a blueprint chromosome 8 centromere, including the 1. Logsdon, G. A., Vollger, M. R. & for a method to produce finished methylation status of the α-satellite HOR Eichler, E. E. Nature. Rev. Genet. human chromosome assemblies, arrays, because this predicts the likely position 21, 597–614 (2020). the authors demonstrate the importance of of the functional kinetochore.” Nils Stein, gap-free sequences. The exciting key example Leibniz Institute of Plant Genetics and Crop 2. Chaisson, M. J. P. et al. Nature 517, of this study is the presentation of the entire Plant Research, Seeland, Germany. 608–611 (2015).

3. Jain, M. et al. Nature Biotechnol. 36, FIGURE 338–345 (2018). 4. Ebert, P. et al. Science 372, eabf7117 5 gaps (2021).

5. Vollger, M. R. et al. Ann. Hum. Genet. Chromosome 8 GRCh38 84, 125–140 (2020).

6. Nurk, S. et al. Genome Res. https://doi. org/10.1101/gr.263566.120 (2020). Chromosome 8 CHM13 7. Cheng, H., Concepcion, G. T., Feng, X., No gaps Zhang, H. & Li, H. Nature Methods 18, 170–175 (2021). Strategy to resolve gaps

ONT ONT 8. Bzikadze, A. V. & Pevzner, P. A. Nature

HiFi Biotechnol. 38, 1309–1316 (2020).

9. Miga, K. H. et al. Nature 585, 79–84

ONT HiFi HiFi (2020). Sca old ultra-long ONT reads Replace with PacBio HiFi contigs Integrate and validate

Findings in gapped regions

mRNA Me Me Me Ancient Young Ancient

New sequence New sequence New sequence New genes New epigenetic patterns New evolutionary features

Figure 1 | Resolving the gaps in human chromosome 8. The five gaps present in chromosome 8 in the current human reference genome sequence (GRCh38) were resolved using CHM13 cells, which have a single haploid equivalent of the human genome. Ultra-long reads from Oxford Nanopore Technologies (ONT) were assembled into a sequence scaffold. High-fidelity (HiFi) reads from Pacific Biosciences (PacBio) were then used to generate continuous DNA sequences that replaced the ONT-based scaffold. The HiFi assemblies were integrated into a previously generated assembly of CHM13 chromosome 8 (ref. 4) and validated. Closure of these gaps allowed new genes, epigenetic patterns (such as DNA methylation (red diamonds)) and evolutionary features to be identified in previously unseen regions of the human chromosome 8 sequence.

BEHIND THE PAPER

We have been fortunate to work with an has been key. When we first visualized the amazing team of scientists as part of the sequence organization and evolutionary layers T2T Consortium, HPRC and HGSVC. These of the chromosome 8 centromere, we were consortia bring together some of the most excited to realize that there are many features creative minds in genome sequencing and of the centromere, such as DNA methylation assembly. In addition to advances in long- patterns and organization, that read sequencing technology, the proposal were previously unknown. This has opened up 15 years ago that complete hydatidiform mole new areas in the study of centromere biology DNA (which is devoid of allelic variation2) be and evolution. used to finish sequencing the human genome

Nature | Published online 14 May 2021 ©2021 Spri nger Nature Li mited. All rights reserved. ©2021 Spri nger Nature Li mited. All rights reserved.