The Draft Genome Sequence of the Ferret (Mustela Putorius Furo) Facilitates Study of Human Respiratory Disease
Total Page:16
File Type:pdf, Size:1020Kb
LETTERS OPEN The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory disease Xinxia Peng1, Jessica Alföldi2, Kevin Gori3, Amie J Eisfeld4, Scott R Tyler5,6, Jennifer Tisoncik-Go1, David Brawand2,22, G Lynn Law1, Nives Skunca7,8, Masato Hatta4, David J Gasper4, Sara M Kelly1, Jean Chang1, Matthew J Thomas1, Jeremy Johnson2, Aaron M Berlin2, Marcia Lara2,22, Pamela Russell2,22, Ross Swofford2, Jason Turner-Maier2, Sarah Young2, Thibaut Hourlier9, Bronwen Aken9, Steve Searle9, Xingshen Sun5, Yaling Yi5, M Suresh4, Terrence M Tumpey10, Adam Siepel11, Samantha M Wisely12, Christophe Dessimoz3,13,14, Yoshihiro Kawaoka4,15–18, Bruce W Birren2, Kerstin Lindblad-Toh2,19, Federica Di Palma2,22, John F Engelhardt5,20, Robert E Palermo1 & Michael G Katze1,21 The domestic ferret (Mustela putorius furo) is an important gaps, has a contig N50 size of 44.8 kb, a scaffold N50 size of 9.3 Mb animal model for multiple human respiratory diseases. It is and quality metrics comparable to other genomes sequenced using considered the ‘gold standard’ for modeling human influenza Illumina technology (Table 1 and Supplementary Note). RNA-seq virus infection and transmission1–4. Here we describe the data for annotation were obtained from polyadenylated transcripts 2.41 Gb draft genome assembly of the domestic ferret, using RNA from 24 samples of 21 tissues from male and female fer- constituting 2.28 Gb of sequence plus gaps. We annotated rets, including both developmental and adult tissues (Supplementary 19,910 protein-coding genes on this assembly using RNA- Table 1). The genome assembly was annotated using the Ensembl seq data from 21 ferret tissues. We characterized the ferret gene annotation system6 (Ensembl release 70). Protein-coding gene 7 Nature America, Inc. All rights reserved. America, Inc. Nature host response to two influenza virus infections by RNA-seq models were annotated by combining alignments of Uniprot mam- 4 analysis of 42 ferret samples from influenza time-course data mal and other vertebrate protein sequences and the aforementioned and showed distinct signatures in ferret trachea and lung RNA-seq data. The ferret genome can be viewed as unanchored © 201 tissues specific to 1918 or 2009 human pandemic influenza scaffolds along with the Ensembl genome models in both the virus infections. Using microarray data from 16 ferret samples University of California Santa Cruz and Ensembl genome browser reflecting cystic fibrosis disease progression, we showed that interfaces. We also used the tool liftOver to map the coordinates from npg transcriptional changes in the CFTR-knockout ferret lung the ferret assembly onto the well-finished genome sequence of its reflect pathways of early disease that cannot be readily studied phylogenetic neighbor, the domestic dog (Canis familiaris; V3.1); in human infants with cystic fibrosis disease. this mapping is a useful resource for a genome based on short-read sequencing and facilitates browsing the ferret genome in the surrogate We performed whole-genome sequencing with DNA from an individ- context of dog chromosomes (Supplementary Fig. 1). ual female sable ferret (M. putorius furo) and created a genome assem- Using the annotated ferret protein sequences, we constructed a bly using ALLPATHS-LG5. The draft assembly is 2.41 Gb including highly resolved phylogenetic tree (Supplementary Fig. 2). As expected, 1Department of Microbiology, University of Washington, Seattle, Washington, USA. 2Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA. 3European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK. 4Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin, Madison, Wisconsin, USA. 5Department of Anatomy and Cell Biology, Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA. 6Molecular and Cellular Biology Program, Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA. 7Department of Computer Science, Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland. 8Swiss Institute of Bioinformatics, Zurich, Switzerland. 9Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK. 10Centers for Disease Control and Prevention, Atlanta, Georgia, USA. 11Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA. 12Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, Florida, USA. 13Department of Genetics, Evolution and Environment, University College London, London, UK. 14Department of Computer Science, University College London, London, UK. 15ERATO Infection-Induced Host Responses Project, Japan Science and Technology Agency, Saitama, Japan. 16Division of Virology, Department of Microbiology and Immunology, Institute of Medical Science, University of Tokyo, Tokyo, Japan. 17Department of Special Pathogens, International Research Center for Infectious Diseases, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan. 18Laboratory of Bioresponses Regulation, Department of Biological Responses, Institute for Virus Research, Kyoto University, Kyoto, Japan. 19Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden. 20Center for Gene Therapy, Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA. 21Washington National Primate Research Center, Seattle, Washington, USA. 22Present addresses: MRC Functional Genomics Unit, University of Oxford, Oxford, UK (D.B.); Biogen Idec, Cambridge, Massachusetts, USA (M.L.); Division of Biology, California Institute of Technology, Pasadena, California, USA (P.R.); Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK (F.D.P.). Correspondence should be addressed to M.G.K. ([email protected]) or R.E.P. ([email protected]). Received 20 May; accepted 22 October; published online 17 November 2014; doi:10.1038/nbt.3079 1250 VOLUME 32 NUMBER 12 DECEMBER 2014 NATURE BIOTECHNOLOGY LETTERS Table 1 Summary details of the ferret genome assembly and sets between human and ferret was highly significant (chi-squared associated Ensembl annotation test P values < 10−186; Supplementary Table 7). To refine the sets of Mustela putorius furo genome tissue-specific genes, we clustered genes with similar expression Assembly MusPutFur1.0 patterns across the 14 tissue samples (7 from ferret and 7 from human) Date June 2011 into 7 disjoint clusters (Fig. 1c and Supplementary Table 8). This clus- Total assembly length 2.41 Gb Total sequence 2.28 Gb tering analysis revealed that many ferret and human genes exhibited Short read coverage 162 × highly concordant, tissue-specific expression patterns. The assignment Organization 7,783 unplaced scaffolds of a gene cluster to a specific tissue was evident by its significantly Scaffold N50 9.3 Mb increased expression in that tissue relative to the rest of the tissues of Ensembl annotation the same species for all comparisons except between skeletal muscle 70.1 Database version and heart (Supplementary Table 9). The similarity between skeletal Date August 2012 Coding genes 19,910 muscle and heart may be attributed to the presence of striated muscle Noncoding genes 3,614 cells in both tissues. The clusters include transcription factors (TFs) Pseudogenes 287 with known tissue specificities in human samples, such as OLIG2 and Gene transcripts 23,963 NEUROD2 (brain), OVOL1 (testis), MYF6 (heart, skeletal muscle), and 9 Supporting evidence: 1.1 × 10 mRNA-seq reads (Paired end 2 × 100; 220 Gb) from lung-specific Iroquois-class homeodomain TFs IRX2, IRX3 and IRX59. 21 individual tissues including developmental stages, and respiratory tissues from 2009 SOIV infected ferrets. The same specificity was seen for the ferret tissues. Sequence com- parisons showed that ferret TFs in brain, skeletal muscle/heart, lung and kidney gene clusters exhibited even greater similarity to human ferret falls within the Caniformia suborder of the Carnivores, as repre- orthologs than the rest of the ferret genome, suggesting strong con- sented by the domestic dog, cat, giant panda and walrus, and the sup- servation of functional regulation (Supplementary Tables 10 and 11). port values are high for most clades (Supplementary Tables 2 and 3 Some genes related to immune and inflammatory functions, including and Methods). Although the clade containing the ferret diverged from TFs associated with Th17 cells (BATF, IRF4, AHR)10, showed increased a common ancestor before the divergence of the rodent and human/ expression in ferret and human lungs, which is likely a consequence of primate lineages, branch lengths in the tree indicate rapid evolution the greater proportion of immune cells in this compartment and the in the rodent clade, which has resulted in less genetic divergence possible presence of bronchus-associated lymphoid tissue. The broad between humans and ferrets than between humans and mice. Indeed, similarity between ferret and human tissue-specific gene expression in comparing protein sequences between the species, we found that for suggests the regulation of gene expression in tissue compartments is 75% of all orthologous triplets, ferret proteins are closer than mouse also highly conserved. proteins to human proteins (Fig. 1a and Supplementary Table 4). For Ferrets are frequently used as a model for human influenza virus example, the ferret cystic fibrosis transmembrane conductance regu- infection,