Two Influential Primate Classifications Logically Aligned
Syst. Biol. 65(4):561–582, 2016
© The Author(s) 2016. Published by Oxford University Press on behalf of the Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
DOI:10.1093/sysbio/syw023
Advance Access publication March 22, 2016

NICO M. FRANZ1,∗, NAOMI M. PIER1, DEEANN M. REEDER2, MINGMIN CHEN3, SHIZHUO YU3, PARISA KIANMAJD3, SHAWN BOWERS4, AND BERTRAM LUDÄSCHER5

1School of Life Sciences, PO Box 874501, Arizona State University, Tempe, AZ 85287, USA; 2Department of Biology, Bucknell University, 1 Dent Drive, Lewisburg, PA 17837, USA; 3Department of Computer Science, 2063 Kemper Hall, 1 Shields Avenue, University of California at Davis, Davis, CA 95616, USA; 4Department of Computer Science, 502 East Boone Avenue, AD Box 26, Gonzaga University, Spokane, WA 99258, USA; 5Graduate School of Library and Information Science, 510 East Daniel Street, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA

∗Correspondence to be sent to: School of Life Sciences, PO Box 874501, Arizona State University, Tempe, AZ 85287, USA; E-mail: [email protected].

Received 16 February 2015; reviews returned 11 March 2016; accepted 17 March 2016
Benoit Dayrat

Abstract: Classifications and phylogenies of perceived natural entities change in the light of new evidence. Taxonomic changes, translated into Code-compliant names, frequently lead to name:meaning dissociations across succeeding treatments. Classification standards such as the Mammal Species of the World (MSW) may experience significant levels of taxonomic change from one edition to the next, with potential costs to long-term, large-scale information integration.
This circumstance challenges the biodiversity and phylogenetic data communities to express taxonomic congruence and incongruence in ways that both humans and machines can process, that is, to logically represent taxonomic alignments across multiple classifications. We demonstrate that such alignments are feasible for two classifications of primates corresponding to the second and third MSW editions. Our approach has three main components: (i) use of taxonomic concept labels, that is, name sec. author (where sec. means according to), to assemble each concept hierarchy separately via parent/child relationships; (ii) articulation of select concepts across the two hierarchies with user-provided Region Connection Calculus (RCC-5) relationships; and (iii) the use of an Answer Set Programming toolkit to infer and visualize logically consistent alignments of these input constraints. Our use case entails the Primates sec. Groves (1993; MSW2: 317 taxonomic concepts; 233 at the species level) and Primates sec. Groves (2005; MSW3: 483 taxonomic concepts; 376 at the species level). Using 402 RCC-5 input articulations, the reasoning process yields a single, consistent alignment and 153,111 Maximally Informative Relations that constitute a comprehensive meaning resolution map for every concept pair in the Primates sec. MSW2/MSW3. The complete alignment, and various partitions thereof, facilitate quantitative analyses of name:meaning dissociation, revealing that nearly one in three taxonomic names are not reliable across treatments, in the sense of the same name identifying congruent taxonomic meanings. The RCC-5 alignment approach is potentially widely applicable in systematics and can achieve scalable, precise resolution of semantically evolving name usages in synthetic, next-generation biodiversity and phylogeny data platforms. [Alignment; classification; concept taxonomy; logic; ontology; Primates; reasoning; Region Connection Calculus.]
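The first two components described in the abstract can be illustrated with a minimal Python sketch. It builds "name sec. author" concept labels and derives one of the five RCC-5 relations for each concept pair by comparing explicit member sets. This is a simplification for illustration only: the study infers alignments from expert-provided articulations with an Answer Set Programming toolkit, not from set comparison. The function names `concept_label` and `rcc5`, the placeholder members `m1` to `m5`, and the concept "Taxon X" are all hypothetical and carry no taxonomic content.

```python
from enum import Enum
from itertools import product


class RCC5(Enum):
    """The five RCC-5 relations, with the symbols used for articulations."""
    CONGRUENT = "=="
    INCLUDES = ">"      # proper inclusion
    INCLUDED_IN = "<"   # inverse proper inclusion
    OVERLAPS = "><"
    EXCLUDES = "!"


def concept_label(name: str, source: str) -> str:
    """Build a taxonomic concept label of the form 'name sec. author'."""
    return f"{name} sec. {source}"


def rcc5(a: frozenset, b: frozenset) -> RCC5:
    """Return the RCC-5 relation between two non-empty member sets."""
    if not a or not b:
        raise ValueError("each concept needs at least one member")
    if a == b:
        return RCC5.CONGRUENT
    if a > b:           # a is a proper superset of b
        return RCC5.INCLUDES
    if a < b:           # a is a proper subset of b
        return RCC5.INCLUDED_IN
    if a & b:           # shared members, but neither contains the other
        return RCC5.OVERLAPS
    return RCC5.EXCLUDES


# Hypothetical placeholder memberships (assumptions, not real circumscriptions).
t1 = {
    concept_label("Cebidae", "Groves (1993)"): frozenset({"m1", "m2", "m3"}),
    concept_label("Taxon X", "Groves (1993)"): frozenset({"m4"}),
}
t2 = {
    concept_label("Cebidae", "Groves (2005)"): frozenset({"m2", "m3", "m5"}),
    concept_label("Taxon X", "Groves (2005)"): frozenset({"m4"}),
}

# One relation per concept pair across the two hierarchies.
alignment = {
    (l1, l2): rcc5(s1, s2)
    for (l1, s1), (l2, s2) in product(t1.items(), t2.items())
}

for (l1, l2), rel in alignment.items():
    print(f"{l1} {rel.value} {l2}")

# With 317 concepts in MSW2 and 483 in MSW3, the number of concept pairs
# equals the 153,111 relations reported in the abstract.
n_pairs = 317 * 483
```

Keeping the two Cebidae labels as distinct dictionary keys reflects the point that the same name can identify noncongruent meanings in different treatments; in this toy data the two concepts overlap.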
Primatologists on the whole don’t understand taxonomy, but they need to, and, in the main, they want to. (Groves 2001a:vii)

Human classifications of perceived natural groups change in the light of new evidence. Over time these changes can affect the validity of taxonomic names and stability of their meanings. Users must keep track of name:meaning (read: “name-to-meaning”) relationship updates to communicate reliably about perceived organismal groups and retain the ability to integrate information across multiple incongruent classifications. Due to a general trend in biology toward generating synthetic data sets that may reflect heterogeneous taxonomic perspectives (e.g., Hinchcliff et al. 2015), the challenge of reconciling evolving taxonomies is becoming increasingly relevant (Franz et al. 2008). Few tools are designed specifically to model conflicts and ambiguities in name:meaning relationships that result from taxonomic and phylogenetic advancement.

More than 250 years since Linnaeus’ Systema Naturae, mammal classifications continue to change at both lower and higher taxonomic levels (e.g., Asher and Helgen 2010; Heller et al. 2013; Zachos et al. 2013; Cotterill et al. 2014). Reasons for such change are manifold, including the application of alternative species concepts or recognition of new phylogenetic information. Although a great number of changes in the mammal tree of life are also mirrored in emendations of mammalian nomenclature, the tracking of taxonomic changes through the type-centric Linnaean naming system is not perfect (Franz 2005; Kennedy et al. 2005; Witteveen 2015; Franz et al. 2016; Remsen 2016).

Why model evolving name:meaning relationships in systematics? Many systematists find it desirable that taxonomic name usages are not in conflict with contemporary phylogenetic inferences. We rely on the former (names) to integrate the latter (classifications, phylogenies) over time, and thereby build a long-term semantic foundation for refining evolutionary knowledge. The many-to-many relationships that frequently develop between taxonomic names and their meanings challenge this objective. At least two apparent solutions exist.

The first is to promote stability and unity in classification, for instance through the adoption of community-wide standards that endorse specific configurations of valid names, synonyms, and taxonomic circumscriptions (Scoble 2004). In the present context, the Mammal Species of the World (henceforth: MSW) editions are standard references for mammalian classifications that aim to unify name:meaning usages at the global scale (Honacki et al. 1982; Wilson and Reeder 1993, 2005). However, even standards experience an evolution of changing taxonomic names and/or meanings from one edition to the next (Patterson 1994; Reeder et al. 2007; Solari and Baker 2007). This is exemplified by the family-level name Cebidae Bonaparte 1831, whose circumscription varies significantly across MSW editions and other treatments (Groves 1993, 2001a, 2005; Rylands and Mittermeier 2009). In short, standards can mitigate the challenges inherent in name-based data integration, up to a point. They cannot eliminate systemic limitations incurred by using names and nomenclatural relationships as identifiers of exceedingly granular taxonomic incongruences.

The second solution complements the use of singular, synthetic classifications or phylogenies (see Hinchcliff et al. 2015). This solution does not favor one “best” taxonomic inference scheme over another. Instead, the goal is to be logically explicit about the similarities and differences between taxonomies, that is, to resolve the evolution of human taxonomic inference making more formally and precisely than is feasible with names and nomenclatural relationships alone. In the terminology of computer science, an individual taxonomy can be modeled as an ontology (Franz and Thau 2010; Midford et al. 2013), and therefore the process of reconciling taxonomic meanings across multiple classifications is a special case of ontology matching (Euzenat and

the use of distinct taxonomic concept labels in which the same taxonomic name can participate(s). For instance, the labels Cebidae sec. Groves (1993) versus Cebidae sec. Groves (2005) are managed as nonidentical symbols and may therefore symbolize noncongruent meanings. Properly individuated concepts can be assembled into entire concept hierarchies (ontologies) via parent/child (is_a) relationships (Thau and Ludäscher 2007). This way each separately published perspective, Primates sec. Groves (1993) versus Primates sec. Groves (2005), can be represented from the ordinal to the species level.

The second component involves providing an initial, limited set of Region Connection Calculus (RCC-5) articulations (Randell et al. 1992) that express the extent of taxonomic equivalence among concepts pertaining to distinct hierarchies. The available RCC-5 articulations are: congruence (==), proper inclusion (>), inverse proper inclusion (<), overlap (><), and exclusion (| or !) (Koperski et al. 2000; Franz et al. 2008; Franz and Peet 2009; Franz and Cardona-Duque 2013; Weakley 2015). Such input articulations are provided by humans with expertise in the corresponding groups, and reflect their understanding of taxonomic meaning relationships in light of the available evidence (e.g., subsumed concepts, homonymy/synonymy, circumscriptions of the phenotype and genotype, distributional data, etc.). For instance, the articulation Cebidae sec. Groves (1993) >< Cebidae sec. Groves (2005) recognizes that each concept entails certain congruent subcomponents, yet each also contains additional subcomponents that are