Where does biological order come from?
Gavin Conant
Bioinformatics Research Center
Biological Sciences
Program in Genetics
[email protected]
conantlab.org

Figure (panels): tracking of gene fixation and preservation rates after a whole-genome duplication (WGD), including a fractional WGD in "Genome 2"; metagenomics; gene-order comparison across Arabidopsis thaliana, Arabidopsis lyrata, Capsella rubella, Thellungiella halophila, Eutrema parvulum and Aethionema arabicum, contrasting the more fractionated and less fractionated parental genomes (categories: single-copy in all genomes from parental genome 1 or 2, fully duplicated in all genomes, other).

• Origins of complexity
• Gene and genome duplications
• …

Open Project
• Non-traditional hardware for bioinformatics

Approaches in the lab
• Linux-based scripting for WGD and metagenomic analysis
• Perl/Python programming
• C/C++ tool development
• Parallel computing: MPI, OpenMP, CUDA

GenomeHistory: a software tool and its application to fully sequenced genomes
Gavin C. Conant* and Andreas Wagner
Department of Biology, 167 Castetter Hall, The University of New Mexico, Albuquerque, NM 87131, USA
Nucleic Acids Research, 2002, Vol. 30, No. 15, 3378–3386. © 2002 Oxford University Press
Received April 8, 2002; Revised and Accepted June 4, 2002

ABSTRACT
We present a publicly available software tool (http://www.unm.edu/~compbio/software/GenomeHistory) that identifies all pairs of duplicate genes in a genome and then determines the degree of synonymous and non-synonymous divergence between each duplicate pair. Using this tool, we analyze the relations between (i) gene function and the propensity of a gene to duplicate and (ii) the number of genes in a gene family and the family's rate of sequence evolution. We do so for the complete genomes of four eukaryotes (fission and budding yeast, fruit fly and nematode) and one prokaryote (Escherichia coli). For some classes of genes we observe a strong relationship between gene function and a gene's propensity to undergo duplication. Most notably, ribosomal genes …

… extract the number of non-synonymous nucleotide substitutions per nucleotide site (Ka) and the number of synonymous nucleotide substitutions per nucleotide site (Ks) for all gene duplicates in a genome from information on coding regions contained in FASTA files. With suitable precautions, Ks can be used to estimate the time that has elapsed since a gene duplication. The ratio Ka/Ks is an enormously useful quantity in gauging the selective constraint a given sequence pair is subject to (16). We have named our tool GenomeHistory. It relies on existing algorithms, but uses user-configurable parameters to automate the analysis of large datasets with minimal user input. Below, we use GenomeHistory to examine patterns of gene duplication in five fully sequenced genomes. Several genome sequencing consortia have begun this task in their original reports published with the genome sequences (17,18). Extending this and other work (12,19), we here address three questions: (i) do genes of different functions differ in their …

Parallel Genehunter: implementation of a linkage analysis package for distributed-memory architectures
Gavin C. Conant, Steven J. Plimpton, William Old, Andreas Wagner, Pamela R. Fain, Theresa R. Pacheco, and Grant Heffelfinger
Department of Biology, 167 Castetter Hall, The University of New Mexico, Albuquerque, NM 87131-1091, USA; Computation, Computers, and Mathematics Center, Sandia National Laboratories, Albuquerque, NM, USA; Agilent Laboratories, Fort Collins, CO, USA; Health Sciences Center, The University of Colorado, CO, USA
J. Parallel Distrib. Comput. 63 (2003) 674–682
Received 4 December …

… Created by Whole-Genome Duplication in Yeast
Gavin C. Conant and Kenneth H. Wolfe
Smurfit Institute of Genetics, Trinity College, Dublin 2, Ireland
Genetics 179: 1681–1692 (July 2008). © 2008 by the Genetics Society of America
Manuscript received April 12, 2007; Accepted for publication April 21, 2008

Identification of orthologous genes across species becomes challenging in the presence of a whole-genome duplication (WGD). We present a probabilistic method for identifying orthologs that considers all possible orthology/paralogy assignments for a set of … Two inferences produced by this model are indicative of purifying selection acting to prevent duplicate gene loss. …

The discovery of an ancient whole-genome duplication (WGD) in an ancestor of the baker's yeast … while studies of individual duplicate gene pairs suggest that these pairs can be preserved over long periods … These two observations indicate the existence of selective forces that preserve duplicate genes. Among the forces that have been suggested to maintain duplicate genes are functional divergence and requirements for high dosages of a gene … Generally speaking, functional divergence occurs either through neofunctionalization (the appearance of a novel function in one duplicate; …) or through subfunctionalization (the partitioning of ancestral functions between the duplicate pair; …). … Duplicate gene loss itself can drive other evolutionary processes. An analysis of the timings of duplicate gene loss in four post-WGD yeast species (S. cerevisiae, …) … When comparing such relatively distantly related species that nonetheless share a WGD, duplicate gene loss also complicates inferences regarding molecular …
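The Genetics (2008) abstract describes a method that "considers all possible orthology/paralogy assignments" for regions created by a shared WGD. The Python sketch below illustrates only that enumeration idea, under loudly stated assumptions: each genome contributes two post-WGD copies (A, B) of a region, encoded as hypothetical 0/1 gene-presence tuples over ancestral loci, and the scoring function `toy_log_score` is invented for illustration. It is not the published model. Every assignment is scored, and the scores are normalized into probabilities over all assignments.

```python
import itertools
import math


def enumerate_assignments(n_genomes):
    """Yield every way to map each genome's two WGD copies (A, B) onto two pillars.

    Genome 0 is fixed (A -> pillar 1) because swapping every genome at once
    yields the same grouping, leaving 2 ** (n_genomes - 1) assignments.
    """
    for flips in itertools.product((False, True), repeat=n_genomes - 1):
        yield (False,) + flips


def build_pillars(copies, flips):
    """copies[g] = (presence_A, presence_B): 0/1 tuples over ancestral loci."""
    pillar1, pillar2 = [], []
    for (copy_a, copy_b), flip in zip(copies, flips):
        pillar1.append(copy_b if flip else copy_a)
        pillar2.append(copy_a if flip else copy_b)
    return pillar1, pillar2


def toy_log_score(pillars, mismatch_penalty=1.0):
    """HYPOTHETICAL log-score: penalize loci where genomes in one pillar disagree.

    Intuition only: losses that predate speciation should be shared by
    orthologous copies, so a good assignment groups matching loss patterns.
    """
    mismatches = 0
    for pillar in pillars:
        for locus in zip(*pillar):  # presence of this locus in each genome
            if 0 < sum(locus) < len(locus):
                mismatches += 1
    return -mismatch_penalty * mismatches


def assignment_probabilities(copies, mismatch_penalty=1.0):
    """Score all assignments and normalize them with a log-sum-exp."""
    assignments = list(enumerate_assignments(len(copies)))
    scores = [
        toy_log_score(build_pillars(copies, flips), mismatch_penalty)
        for flips in assignments
    ]
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return {flips: math.exp(s - log_norm) for flips, s in zip(assignments, scores)}


if __name__ == "__main__":
    # Two genomes whose copies are listed in opposite orders.
    example = [((1, 1, 0, 1), (1, 0, 1, 1)), ((1, 0, 1, 1), (1, 1, 0, 1))]
    for flips, prob in assignment_probabilities(example).items():
        print(flips, round(prob, 3))
```

Fixing genome 0 avoids counting each mirror-image assignment twice. Because the number of assignments grows as 2^(k−1) for k genomes, exhaustive enumeration stays cheap for a handful of species such as the five yeasts mentioned in the source.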
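The GenomeHistory excerpt defines Ka and Ks for each duplicate pair and notes that Ks can be used to date a duplication. As a minimal sketch, assuming two already-aligned, gap-free coding sequences without internal stop codons, the Python code below estimates both quantities with the Nei–Gojobori (1986) counting method plus a Jukes–Cantor correction. This is a simpler, swapped-in technique, not necessarily the algorithm GenomeHistory uses. The helper names (`ka_ks`, `duplication_age`) and the synonymous rate passed to `duplication_age` are illustrative assumptions.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def codon_sites(codon):
    """(synonymous, non-synonymous) sites; changes to a stop count as non-synonymous."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    syn += 1.0 / 3.0
    return syn, 3.0 - syn


def codon_differences(c1, c2):
    """Average (syn, non-syn) differences over all mutational pathways c1 -> c2,
    skipping pathways that pass through an intermediate stop codon."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    total_syn = total_non = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*" and nxt != c2:
                valid = False
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            total_syn += syn
            total_non += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return total_syn / n_paths, total_non / n_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits; saturates at p >= 0.75."""
    if p <= 0:
        return 0.0
    if p >= 0.75:
        return float("inf")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(seq1, seq2):
    """Estimate (Ka, Ks) for two aligned, gap-free coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    sites_s = sites_n = diffs_s = diffs_n = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if CODON_TABLE[c1] == "*" or CODON_TABLE[c2] == "*":
            raise ValueError("stop codons are not allowed inside the alignment")
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        sites_s += (s1 + s2) / 2
        sites_n += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        diffs_s += sd
        diffs_n += nd
    if sites_s == 0 or sites_n == 0:
        raise ValueError("no synonymous or non-synonymous sites to compare")
    return jukes_cantor(diffs_n / sites_n), jukes_cantor(diffs_s / sites_s)


def duplication_age(ks, synonymous_rate_per_year):
    """Rough age T = Ks / (2 * lambda): both copies diverge after duplication,
    assuming a constant, user-supplied synonymous rate lambda."""
    return ks / (2.0 * synonymous_rate_per_year)


if __name__ == "__main__":
    # One synonymous change (TTT -> TTC, both Phe): Ka is 0.0, Ks is about 0.52.
    ka, ks = ka_ks("ATGAAACCCGGGTTT", "ATGAAACCCGGGTTC")
    print(ka, ks, ka / ks if ks else float("nan"))
```

The saturation branch in `jukes_cantor` is one reason Ks needs "suitable precautions": once about three quarters of synonymous sites differ, the corrected distance is undefined, so very old duplicates cannot be dated this way.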