Automated Identification of Conserved Synteny After Whole Genome Duplication

Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press Automated identification of conserved synteny after whole genome duplication Julian M. Catchen1,2, John S. Conery1, John H. Postlethwait2,∗ May 6, 2009 1 Department of Computer and Information Science, University of Oregon, USA 2 Institute of Neuroscience, University of Oregon, USA ∗ Corresponding author. Phone: 541-346-4538; Fax: 541-346-4548; Email: [email protected] Running Title: Automated identification of conserved synteny Keywords: genome duplication, conserved synteny, ohnologs, gene family evolution 1 Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press Abstract An important objective for inferring the evolutionary history of gene families is the determination of orthologies and paralogies. Lineage specific paralog loss following whole genome duplication events can cause anciently related homologs to appear in some assays as orthologs. Conserved synteny – the tendency of neighboring genes to retain their relative positions and orders on chromosomes over evolutionary time – can help resolve such errors. Several previous studies examined genome-wide syntenic conservation to infer the contents of ancestral chromosomes and provided insights into the architecture of ancestral genomes, but did not provide methods or tools applicable to the study of the evolution of individual gene families. We developed an automated system to identify conserved syntenic regions in a primary genome using as outgroup a genome that diverged from the investigated lineage before a whole genome duplication event. The product of this automated analysis, the Synteny Database, allows a user to examine fully or partially assembled genomes. The Synteny Database is optimized for the investigation of individual gene families in multiple lineages and can detect chromosomal inversions and translocations as well as ohnologs (paralogs derived by whole-genome duplication) gone missing. To demonstrate the utility of the system, we present a case study of gene family evolution, investigating the ARNTL gene family in the genomes of Ciona intestinalis, amphioxus, zebrafish, and human. 2 Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press 1 Introduction An important objective for inferring the evolutionary history of gene families and chromosome segments is the determination of orthology and paralogy. A stepwise approach gener- ally uses BLAST (basic local alignment search tool) (Altschul et al., 1997) to define coarse relationships among genes followed by phylogenetic reconstruction to suggest more detailed hypotheses of descent. Events such as gene duplications or whole genome duplications (WGD), with associated differential gene loss, introduce noise into these analyses. Anoma- lies, such as lineage specific paralog loss, can cause anciently related homologs to appear to be orthologs, thereby confusing sequence similarity with functional homology (Postlethwait, 2006). Such errors can confound attempts to create non-human animal disease models and can obscure recent, species-specific evolutionary change among sister lineages. Orthologs are two genes, one in each of two species, that descended from a single gene in the last common ancestor of those two species. Paralogs are a set of genes derived by duplication within a lineage, and together, a group of paralogs can be co-orthologous to their unduplicated ortholog in a related species. Ohnologs are a special subset of paralogs that result from a whole genome duplication event (Wolfe, 2000). The differential loss of genes that follows a duplication event can create ohnologs gone missing when different ohnologs are lost reciprocally in different lineages. Understanding and distinguishing ohnologs gone missing from orthologs is a perva- sive problem in vertebrate genomics due to multiple genome duplication events. Two rounds of whole-genome duplication events called R1 and R2 likely occurred at the base of the vertebrate lineage after the divergence of non-vertebrate chordates and prior to the appearance of jawed vertebrates (Garcia-Fernàndez and Holland, 1994; Spring, 1997; Dehal and Boore, 2005). A third duplication called R3 likely occurred in the teleost lineage after the divergence of ray-finned and lobe-finned fishes (Amores et al., 1998; Taylor et al., 2003; Jaillon et al., 2004) but before the radiation of the teleosts. Additional genome duplications punctuated the evolution of other lineages, like salmonids, catastomids, goldfish, 3 Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press Xenopus laevis, and even a rodent (Allendorf and Thorgaard, 1984; Mungpakdee et al., 2008a,b; Uyeno and Smith, 1972; Risinger and Larhammar, 1993; Larhammar and Risinger, 1994; David et al., 2003; Schmid and Steinlein, 1991; Gallardo et al., 1999). Given the per- vasive nature of genome duplication in chordates, and the importance of teleost fish and Xenopus laevis as model organisms, it is important to develop automated methods to identify true orthologs among groups of paralogs and to distinguish them from more ancient, non-orthologous homologs. Figure 1 illustrates the problem of distinguishing orthologs following duplication and lineage-specific loss of a gene g and some of its neighboring genes after WGD (R1), speciation (S) and a second WGD event (R2) in one of the descendant lineages. In an ide- alized case, chromosomes would experience few changes in gene order or gene content, as illustrated by genes of the same color in Figure 1. The most common fate of genes created by a WGD event, however, is pseudogenization and nonfunctionalization (Li, 1980; Watterson, 1983). Surviving duplicates can develop new functions (Ohno, 1970) or partition or lose their existing functions (Force et al., 1999; Lynch and Force, 2000; Winkler et al., 2003; Postlethwait et al., 2004; Jovelin et al., 2007; Jarinova et al., 2008; Chain et al., 2008; Conant and Wolfe, 2008). From the time of the duplication event to the present, duplicated genes can alter their expression patterns (Force et al., 1999) or their exon structure (Altschmied et al., 2002), or their activities (Zhang et al., 2002; Zhang, 2003), and such changes can alter protein-protein interactions or subsequent developmental or physiological functions. In the case of differential gene loss and gene rearrangements in lineages S1 and S2, most reciprocal best hit BLAST algorithms (Wall et al., 2003) would associate gene g2 with g1a and g1b and most phylogenetic methods, due to a lack of data, would find that the most likely hypothesis of descent was that genes g2, g1a, and g1b shared their most recent common ancestor; in other words, these methods would incorrectly infer that g1a and g1b were orthologs of g2. The erroneous assignment of orthology presents a problem because it implies that the last common ancestor at time S had a single gene with a set of functions 4 Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press that evolved to g1 (and its subsequent duplicates, g1a and g1b) in S2 and g2 in S1, but in fact, no such gene actually existed. To address this problem, and to better infer orthologies and paralogies, we can take advantage of conserved synteny – the tendency of neighboring genes to retain their relative positions and orders on chromosomes over evolutionary time. In a WGD event, duplicated chromosomes (homeologs) initially have gene orders identical to each other and to their immediate ancestor. Between the time of duplication and speciation events, however, genes can be lost from one homeolog or the other (unless preserved by structures such as em- bedded regulatory elements (Kikuta et al., 2007)), and inversions and other chromosome rearrangements can occur independently on the two duplicated homeologs. These events occurring in the chromosomal vicinity of the gene in question give an identity to all of the genes in the neighborhood. In the example given in Figure 1, we could test the hypothesis that genes g1a and g1b are co-orthologous to gene g2 by first examining the neighbors of g1a and g1b – ensuring that a sufficient number of gene neighbors are also paralogous – and then by checking those neighboring paralogs to ensure that they are orthologous to the neighbors of g2. The conserved syntenic region defined by such genes would confirm (or in this case, reject) the co-orthology of genes g1a and g1b to g2. This approach complements the use of BLAST and phylogenetic reconstruction and provides additional evidence to infer the evolutionary history of gene families independent of sequence identities. Several previous studies examined syntenic conservation at a genomic level to deter- mine the nature of the ancestral chromosomes for that organism’s lineage. Evidence for two rounds of genome duplication in stem vertebrates came from a whole-genome analysis of human, mouse, and fugu pufferfish using the urochordate Ciona intestinalis as an outgroup (Dehal and Boore, 2005). Analysis of the Tetraodon nigroviridis (green spotted pufferfish) genome and the construction of a dense meiotic map for medaka, supported earlier conclu- sions (Amores et al., 1998; Postlethwait et al., 1998; Woods et al., 2000; Postlethwait et al., 2002; Taylor et al., 2003; Van de Peer et al., 2003) that a third genome duplication had occurred in the teleost fish. Analysis of Tetraodon

Automated Identification of Conserved Synteny After Whole Genome Duplication

A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus

Testing for Parallel Genomic and Epigenomic Footprints of Adaptation to Urban Life in a Passerine Bird

Epigenetic Regulation of Neurogenesis by Citron Kinase Matthew Irg Genti University of Connecticut - Storrs, [email protected]

Rna-Sequencing Applications: Gene Expression Quantification and Methylator Phenotype Identification

Calcium Control of Neurotransmitter Release

Supplementary Table 1

The Inverse Agonist DG172 Triggers a Pparβ/Δ-Independent Myeloid Lineage Shift and Promotes GM-CSF/IL-4-Induced Dendritic Cell Differentiation

Gangliosides Interact with Synaptotagmin to Form the High-Affinity Receptor Complex for Botulinum Neurotoxin B

The Molecular Machinery of Neurotransmitter Release Nobel Lecture, 7 December 2013

1 1 2 3 Cell Type-Specific Transcriptomics of Hypothalamic

Graded Co-Expression of Ion Channel, Neurofilament, and Synaptic Genes in Fast- Spiking Vestibular Nucleus Neurons

Neuroscience and Biobehavioral Reviews a Novel NMDA Receptor