A Pluralistic Account of Homology: Adapting the Models to the Data

Citation: Haggerty, Leanne S., Pierre-Alain Jachiet, William P. Hanage, David A. Fitzpatrick, Philippe Lopez, Mary J. O’Connell, Davide Pisani, Mark Wilkinson, Eric Bapteste, and James O. McInerney. 2013. “A Pluralistic Account of Homology: Adapting the Models to the Data.” Molecular Biology and Evolution 31 (3): 501-516. doi:10.1093/molbev/mst228. http://dx.doi.org/10.1093/molbev/mst228.
Published Version: doi:10.1093/molbev/mst228
Accessed: February 19, 2015 3:23:04 PM EST
Citable Link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:11879796
Terms of Use: This article was downloaded from Harvard University's DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Leanne S. Haggerty,1 Pierre-Alain Jachiet,2 William P. Hanage,3 David A. Fitzpatrick,1 Philippe Lopez,2 Mary J. O’Connell,4 Davide Pisani,1,5 Mark Wilkinson,6 Eric Bapteste,2 and James O. McInerney*,1,3

1 Bioinformatics and Molecular Evolution Unit, Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
2 Unité Mixte de Recherche 7138 Systématique, Adaptation, Evolution, Université Pierre et Marie Curie, Paris, France
3 Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA
4 Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Glasnevin, Dublin, Ireland
5 School of Biological Sciences and School of Earth Sciences, University of Bristol, Bristol, United Kingdom
6 Department of Life Sciences, The Natural History Museum, Cromwell Road, London, United Kingdom
* Corresponding author: E-mail: [email protected].
Associate editor: David Irwin

Abstract

Defining homologous genes is important in many evolutionary studies but raises obvious issues. Some of these issues are conceptual and stem from our assumptions of how a gene evolves, others are practical, and depend on the algorithmic decisions implemented in existing software. Therefore, to make progress in the study of homology, both ontological and epistemological questions must be considered. In particular, defining homologous genes cannot be solely addressed under the classic assumptions of strong tree thinking, according to which genes evolve in a strictly tree-like fashion of vertical descent and divergence and the problems of homology detection are primarily methodological. Gene homology could also be considered under a different perspective where genes evolve as “public goods,” subjected to various introgressive processes. In this latter case, defining homologous genes becomes a matter of designing models suited to the actual complexity of the data and how such complexity arises, rather than trying to fit genetic data to some a priori tree-like evolutionary model, a practice that inevitably results in the loss of much information.
Here we show how important aspects of the problems raised by homology detection methods can be overcome when even more fundamental roots of these problems are addressed by analyzing public goods thinking evolutionary processes through which genes have frequently originated. This kind of thinking acknowledges distinct types of homologs, characterized by distinct patterns, in phylogenetic and nonphylogenetic unrooted or multirooted networks. In addition, we define “family resemblances” to include genes that are related through intermediate relatives, thereby placing notions of homology in the broader context of evolutionary relationships. We conclude by presenting some payoffs of adopting such a pluralistic account of homology and family relationship, which expands the scope of evolutionary analyses beyond the traditional, yet relatively narrow focus allowed by a strong tree-thinking view on gene evolution.

Key words: homology, network, comparative genomics, epaktolog, ortholog, paralog.

© The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
Mol. Biol. Evol. 31(3):501-516 doi:10.1093/molbev/mst228 Advance Access publication November 22, 2013

“The meaning of scientific terms cannot and should not remain fixed forever by the priority of the original definition. This is simply because our experience constantly outruns our terminology.”
Theodosius Dobzhansky (Dobzhansky 1955)

Defining Gene Families: A Central Complex Task in Evolutionary Studies

Homology is acknowledged as an elusive concept, and yet it is central to comparative evolutionary biology, underpins phylogeny reconstruction (Felsenstein 2004) and developmental biology (Brigandt 2003), and is used extensively in ethology and psychology (Ereshefsky 2007). On the one hand, we have ontological concepts of homology, and on the other hand, practical homology definitions, and the relationship between these theoretical and operational issues is a neglected area of evolutionary biology. In this manuscript, we explore a plurality of ontological bases for understanding homology in macromolecular sequences, and by extension, we explore concepts and definitions of gene family. The ontology (the study of what objects exist and how they relate to one another) is an important aspect of enquiry that is generally addressed before any practical effort to apply this ontology. We contend that a tree-thinking perspective has strongly influenced consideration of what the ontological basis of homology might be and has needlessly and unhelpfully constrained understanding through the notion that homologs fit into neat genealogical families that have evolved their differences according to some underlying phylogenetic tree.

It has long been recognized that sequence evolution is not tree-like, in particular because of domain shuffling (Enright et al. 1999; Marcotte et al. 1999; Portugaly et al. 2006). It has also long been recognized that this non-tree-like evolution results in a network of sequence relationships (Sonnhammer and Kahn 1994; Park et al. 1997; Enright and Ouzounis 2000; Heger and Holm 2003; Ingolfsson and Yona 2008; Song et al. 2008). However, for an almost equally long period of time, it has been assumed that the right way to process this network was to carve it into homologous parts by clustering (Tatusov et al. 1997; Enright and Ouzounis 2000; Yona et al. 2000). Relevant clusters have generally been considered to be gene families with all members presenting full homology with one another. Smaller relevant clusters have also been proposed by identifying homologous domains, for example, families of sequences presenting homology over their entire length but frequently of smaller size than entire genes (Sonnhammer and Kahn 1994; Park and Teichmann 1998; Apic, Gough, and Teichmann 2001b; Wuchty 2001; Enright et al. 2003; Song et al. 2008). Both of these relatively local perspectives on sequence relationships are familiar to most biologists. Consequently, the task of defining gene families has been generally delegated to software programs that search for clusters or communities of phylogenetically related sequences. Increasingly, with genomic data sets of genuinely enormous sizes, the problem is considered best handled by such pro- [...]

[...] a general understanding of gene family at that time); therefore, the “definition” of a family was an operational one, based on a setting in a software programme instead of exploring evolutionary history and whether it might be simple or complex. In this case, family definition is a uniformly applied rule where one software option fits all. Here, we suggest that alternatives to such simple approaches are desirable, though perhaps more difficult to achieve. Similarly, while we stress that the TribeMCL approach has proved to be of enormous benefit, we argue that many important evolutionary events and types of family relationship can be missed if this kind of approach is the only one that is taken.

A number of points should be made at this stage before getting to the main argument of the article. In this article, we specifically wish to discuss homology in the context of genes and other genetic components, such as promoters and subgene elements, which we term genetic goods (McInerney et al. 2011). For such data, the notions of “homolog” and “gene family” have been written about extensively, but there is still no universally agreed consensus on what either of these terms means (Duret et al. 1994; Natale et al. 2000; Perriere et al. 2000; Tatusov et al. 2000; Dessimoz et al. 2012; Miele et al. 2012). Additionally, there are significant technical limitations for the detection of homologies. Certain cutoffs are imposed on any analysis, which leads to de facto homologies being missed because the sequences no longer
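
The dependence of an operational family definition on a software cutoff can be illustrated with a toy similarity network. The following is a minimal sketch under stated assumptions, not the authors' method and not TribeMCL (which relies on Markov clustering): the gene names, the pairwise scores, and the function names are all hypothetical and invented for illustration. Families are approximated here as connected components, which corresponds loosely to grouping by "family resemblance" through intermediate relatives, and a clique test checks whether every member is directly similar to every other member.

```python
from itertools import combinations

# Hypothetical pairwise similarity scores (for example, percent identity).
# All names and values are invented for illustration only.
SCORES = {
    ("A1", "A2"): 80, ("A1", "A3"): 75, ("A2", "A3"): 70,
    ("B1", "B2"): 85,
    ("A3", "AB"): 40,  # composite gene AB resembles family A in one region
    ("B1", "AB"): 45,  # and resembles family B in another region
}


def build_network(scores, cutoff):
    """Return an adjacency map keeping only edges with score >= cutoff."""
    graph = {}
    for (u, v), score in scores.items():
        graph.setdefault(u, set())
        graph.setdefault(v, set())
        if score >= cutoff:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Group sequences linked directly or through intermediate relatives."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


def is_clique(graph, members):
    """True when every pair of members is directly connected."""
    return all(v in graph[u] for u, v in combinations(members, 2))


if __name__ == "__main__":
    for cutoff in (30, 50):
        graph = build_network(SCORES, cutoff)
        for family in connected_components(graph):
            print(cutoff, family, "clique" if is_clique(graph, family) else "not a clique")
```

In this toy data set, a permissive cutoff joins both families into one component through the composite gene, although that component is not a clique, whereas a stricter cutoff separates the families and leaves the composite gene on its own. The choice of cutoff, rather than the evolutionary history, decides which picture is reported.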
