Genome Sequence of the Brown Norway Rat Yields Insights Into Mammalian Evolution
articles

Rat Genome Sequencing Project Consortium*
*Lists of participants and affiliations appear at the end of the paper

493 NATURE | VOL 428 | 1 APRIL 2004 | www.nature.com/nature © 2004 Nature Publishing Group

The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality ‘draft’ covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.

Darwin believed that “natural selection will always act very slowly, often only at long intervals of time”1. The consequences of evolution over timescales of approximately 1,000 millions of years (Myr) and 75 Myr were investigated in publications comparing the human with invertebrate and mouse genomes, respectively2,3. Here we describe changes in mammalian genomes that occurred in a shorter time interval, approximately 12–24 Myr (refs 4, 5) since the common ancestor of rat and mouse.

The comparison of these genomes has produced a number of insights:

• The rat genome (2.75 gigabases, Gb) is smaller than the human (2.9 Gb) but appears larger than the mouse (initially 2.5 Gb (ref. 3) but given as 2.6 Gb in NCBI build 32, see http://www.ncbi.nlm.nih.gov/genome/seq/NCBIContigInfo.html).

• The rat, mouse and human genomes encode similar numbers of genes. The majority have persisted without deletion or duplication since the last common ancestor. Intronic structures are well conserved.

• Some genes found in rat, but not mouse, arose through expansion of gene families. These include genes producing pheromones, or involved in immunity, chemosensation, detoxification or proteolysis.

• Almost all human genes known to be associated with disease have orthologues in the rat genome but their rates of synonymous substitution are significantly different from the remaining genes.

• About 3% of the rat genome is in large segmental duplications, a fraction intermediate between mouse (1–2%) and human (5–6%). These occur predominantly in pericentromeric regions. Recent expansions of major gene families are due to these genomic duplications.

• The eutherian core of the rat genome (that is, bases that align orthologously to mouse and human) comprises a billion nucleotides (~40% of the euchromatic rat genome) and contains the vast majority of exons and known regulatory elements (1–2% of the genome). A portion of this core constituting 5–6% of the genome appears to be under selective constraint in rodents and primates, while the remainder appears to be evolving neutrally.

• Approximately 30% of the rat genome aligns only with mouse, a considerable portion of which is rodent-specific repeats. Of the non-aligning portion, at least half is rat-specific repeats.

• More genomic changes occurred in the rodent lineages than the primate: (1) These rodent genomic changes include approximately 250 large rearrangements between a hypothetical murid ancestor and human, approximately 50 from the murid ancestor to rat, and about the same from the murid ancestor to mouse. (2) A threefold-higher rate of base substitution in neutral DNA is found along the rodent lineage when compared with the human lineage, with the rate on the rat branch 5–10% higher than along the mouse branch. (3) Microdeletions occur at an approximately twofold-higher rate than microinsertions in both rat and mouse branches.

• A strong correlation exists between local rates of microinsertions and microdeletions, transposable element insertion, and nucleotide substitutions since divergence of rat and mouse, even though these events occurred independently in the two lineages.

Background

History of the rat

The rat, hated and loved at once, is both scourge and servant to mankind. The “Devil’s Lapdog” is the first sign in the Chinese zodiac and traditionally carries the Hindu god Ganesh6. Rats are a reservoir of pathogens, known to carry over 70 diseases. They are involved in the transmission of infectious diseases to man, including cholera, bubonic plague, typhus, leptospirosis, cowpox and hantavirus infections. The rat remains a major pest, contributing to famine with other rodents by eating around one-fifth of the world’s food harvest.

Paradoxically, the rat’s contribution to human health cannot be overestimated, from testing new drugs, to understanding essential nutrients, to increasing knowledge of the pathobiology of human disease. In many parts of the world the rat remains a source of meat.

The laboratory rat (R. norvegicus) originated in central Asia and its success at spreading throughout the world can be directly attributed to its relationship with humans7. J. Berkenhout, in his 1769 treatise Outline of the Natural History of Great Britain, mistakenly took it to be from Norway and used R. norvegicus Berkenhout in the first formal Linnaean description of the species. Whereas the black rat (Rattus rattus) was part of the European landscape from at least the third century AD and is the species associated with the spread of bubonic plague, R. norvegicus probably originated in northern China and migrated to Europe somewhere around the eighteenth century8. They may have entered Europe after an earthquake in 1727 by swimming the Volga river.

The rat in research

R. norvegicus was the first mammalian species to be domesticated for scientific research, with work dating to before 1828 (ref. 9). The first recorded breeding colony for rats was established in 1856 (ref. 9). Rat genetics had a surprisingly early start. The first studies by Crampe from 1877 to 1885 focused on the inheritance of coat colour10. Following the rediscovery of Mendel’s laws at the turn of the century, Bateson used these concepts in 1903 to demonstrate that rat coat colour is a mendelian trait10. The first inbred rat strain, PA, was established by King in 1909, the same year that systematic inbreeding began for the mouse10. Despite this, the mouse became the dominant model for mammalian geneticists, while the rat became the model of choice for physiologists, nutritionists and other biomedical researchers. Nevertheless, there are over 234 inbred strains of R. norvegicus developed by selective breeding, which ‘fixes’ natural disease alleles in particular strains or colonies11.

Over the past century, the role of the rat in medicine has transformed from carrier of contagious diseases to indispensable tool in experimental medicine and drug development. Current examples of use of the rat in human medical research include surgery12, transplantation13–15, cancer16,17, diabetes18,19, psychiatric disorders20 including behavioural intervention21 and addiction22, neural regeneration23,24, wound25,26 and bone healing27, space motion sickness28, and cardiovascular disease29–31. In drug development, the rat is routinely employed both to demonstrate therapeutic efficacy15,32,33 and to assess toxicity of novel therapeutic compounds before

BCM-HGSC, analysis was performed by an international team, representing over 20 groups in six countries and relying largely on gene and protein predictions produced by Ensembl.

Determination of the genome sequence

Atlas and the ‘combined’ sequencing strategy

Despite progress in assembling draft sequences2,3,39–44 the question of which method produces the highest-quality products is unresolved. A significant issue is the choice between logistically simpler WGS approaches versus more complex strategies employing bacterial artificial chromosome (BAC) clones45–48. In the Public Human Genome Project2 a BAC by BAC hierarchical approach was used and provided advantages in assembling difficult parts of the genome. The draft mouse sequence was a pure WGS approach using the ARACHNE assembler3,49,50 but underrepresented duplicated regions owing to ‘collapses’ in the assembly3,51–53. This limitation of the mouse draft sequence was tolerable owing to the planned full use of BAC clones in constructing the final finished sequence.

The RGSPC opted to develop a ‘combined’ approach using both WGS and BAC sequencing (Fig. 1). In the combined approach, WGS data are progressively melded with light sequence coverage of individual BACs (BAC skims) to yield intermediate products called ‘enriched BACs’ (eBACs). eBACs covering the whole genome are then joined into longer structures (bactigs). Bactigs are joined to form larger structures: superbactigs, then ultrabactigs. During this process other data are introduced, including BAC end sequences, DNA fingerprints and other long-range information (genetic markers, syntenic information), but the process is constrained by eBAC structures.