Names Are Not Good Enough: Reasoning over Taxonomic Change in the Andropogon Complex¹

Editor(s): Krzysztof Janowicz, University of California, Santa Barbara, USA; Benjamin Adams, The University of Auckland, New Zealand; Birgitta König-Ries, Friedrich-Schiller University Jena, Germany; Deborah McGuinness, Rensselaer Polytechnic Institute, USA; Mark Schildhauer, NCEAS, University of California, Santa Barbara, USA
Solicited review(s): Martin Pullan, Royal Botanic Garden Edinburgh, Scotland; 5 anonymous reviewers
Open review(s): Albert Meroño, Vrije University Amsterdam, The Netherlands

Nico M. Franzᵃ,*, Mingmin Chenᵇ, Parisa Kianmajdᵇ, Shizhuo Yuᵇ, Shawn Bowersᶜ, Alan S. Weakleyᵈ, and Bertram Ludäscherᵉ
ᵃ School of Life Sciences, PO Box 874501, Arizona State University, Tempe, AZ 85287, USA
ᵇ Department of Computer Science, 2063 Kemper Hall, 1 Shields Avenue, University of California at Davis, Davis CA 95616, USA
ᶜ Department of Computer Science, 502 East Boone Avenue, AD Box 26, Gonzaga University, Spokane, WA 99258 USA
ᵈ Department of Biology, Coker Hall, 120 South Road, University of North Carolina, Chapel Hill, NC 27599 USA
ᵉ Graduate School of Library and Information Science, 510 East Daniel Street, University of Illinois at Urbana-Champaign, Champaign, IL 61820 USA

Abstract. We present a novel, logic-based solution to the challenge of reconciling the meanings of taxonomic names across multiple biological taxonomies. The challenge arises due to limitations inherent in using type-anchored taxonomic names as identifiers of granular semantic similarities and differences being expressed in original and revised taxonomic classifications. We address this challenge through: (1) the use of taxonomic concept labels – thereby individuating name usages according to particular sources and allowing each to be recognized separately; (2) sets of user-provided Region Connection Calculus articulations among concepts (RCC-5: congruence, proper inclusion, inverse proper inclusion, overlap, exclusion); and (3) the use of an Answer Set Programming-based reasoning toolkit that ingests these constraints to infer and visualize consistent multi-taxonomy alignments. The feasibility of this approach is demonstrated with a use case involving pairwise alignments of 11 non-congruent classifications of Eastern United States grass entities variously assigned to the Andropogon glomeratus-virginicus 'complex' over an interval of 126 years. Analyses of name:meaning identity reveal that, on average, taxonomic names are reliable identifiers of taxonomic non-/congruence for approximately 60% of the 127 merge regions obtained in 12 pairwise alignments. The name:meaning cardinality over the entire time interval ranges from 1:6 to 4:1, with only 1:36 names attaining the semantically ideal 1:1 ratio. We discuss the applicability of the RCC-5 alignment approach in the context of achieving logic-based integration of non-/congruent taxonomic concept hierarchies in dynamic biodiversity data environments.

Keywords: answer set programming, biodiversity data integration, biological nomenclature, knowledge representation and reasoning, name:meaning cardinality, ontology alignment, region connection calculus, systematics, taxonomic concepts, visualization

1 The authors acknowledge research support from the National Science Foundation, awards DEB–1155984, IIS–118088, DBI–1147273, and DBI–1342595.

*Corresponding author. E-mail: [email protected].

1. Introduction

We present a novel, logic-based solution to the challenge of integrating the meanings of taxonomic names across multiple biological taxonomies. The challenge arises due to limitations inherent in using taxonomic names as identifiers of granular semantic similarities and differences being expressed in succeeding classifications. We address this challenge through the combined use of taxonomic concepts [5,37], Region Connection Calculus (RCC-5) articulations [36,71], and an Answer Set Programming-based reasoning toolkit that infers consistent multi-taxonomy alignments [15,16]. The feasibility of this approach is demonstrated with a use case involving 11 classifications of Eastern United States grass entities variously assigned to the Andropogon glomeratus-virginicus 'complex' over an interval of 126 years [37,94]. The RCC-5 alignment approach is relevant to integrating taxonomically referenced information in dynamic biodiversity data environments [37,70,74], and generally as a means of tracking concept non-/congruence across semantic hierarchies with RCC-5 articulations [19,25,92]. Our contribution reflects this balance by providing sufficient detail for biodiversity scientists while making connections to research in knowledge representation and reasoning [90].

2. Names as Identifiers of Taxonomic Meanings – Challenges and Solutions

Why are names not good enough? We adopt the view that taxonomic names and nomenclatural relationships are necessary but not sufficient for integrating biodiversity data for semantic Web information environments [5,37,59,73]. The reasons for this insufficiency are systemic and well known to taxonomy contributors and users [3,10,66,74]. Ultimately they are rooted in the way in which identity is established according to the rules of nomenclature that guide the application of names to perceived taxonomic groups [30,52,64,96].

Biological classifications strive to reflect natural, phylogenetic relationships. They are therefore subject to adjustments whenever new evidence regarding the identity of taxonomic entities or relationships among these is brought forth by the latest systematic research [38]. For many organismal groups in the tree of life, systematists are not close to completing this process of adjustment. For instance, in the past 20 years the number of validly recognized species of primates has increased from 233 to 488 [76]. While such necessary taxonomic changes accumulate over time, the Codes of nomenclature stipulate (inter alia) that name identity is grounded in the Principle of Typification [29,83]. This means that originally proposed and subsequently revised taxonomic groups receive the same nomenclaturally valid name, or different names, based on the recurrently verifiable identity of individual type specimens (e.g., for the species rank) and individual type taxa (e.g., for the genus rank). According to the rules of nomenclature, types are to be designated at the respective earliest moments of baptizing names, and thus 'anchor' the latter. Typically both a type and a feature-based circumscription are provided when anchoring the meaning (referential extension) of a taxonomic name [30,36,38,96]. However, the former arbiter – i.e., the type identity – has special weight when dealing with alternative name:meaning (read: "name-to-meaning") assignments that become necessary when taxonomies undergo revisions. Another relevant, Code-mandated naming rule is the Principle of Priority [67], which states that in case of (again, type-grounded) synonymy, the oldest available name remains the valid one. The vast majority of the 250+ year-old names of Linnaeus [78] are 'eternally validated' by this important Principle.

Application of the rules of nomenclature to changing classifications can create semantically complex networks of many-to-many relationships among valid and invalid names on one side, and associated circumscriptions on the other side [37,44,74]. Thus, in spite of the central role of Code-compliant names in interconnecting biodiversity data [69,70,78], these names have shortcomings as identifiers of granular differences between taxonomic perspectives that biodiversity data communities create and apply at any given time. Sound knowledge representation in the biodiversity data realm requires recognition of, and compensation for, these systemic insufficiencies [33,36,59].

Solutions to overcome taxonomic name:meaning dissociations may take two major pathways. One option is to assemble single, comprehensive taxonomies for particular groups, with periodically updated versions [10,66,79]. This approach offers an immediate and valuable service to users. However, in the longer term it often leads to multiple distinct perspectives being represented by earlier and later versions of the 'same' standard [5,38,89]. Thus in effect the unitary taxonomy turns into an open-ended temporal chain of partially incongruent taxonomies. Overlapping sets of names are reused from version to version, with varying circumscriptions and no explicit tracking of taxonomic alignment [19]. In the end, unitary systems are likely to promote the proliferation of ambiguous name:meaning relationships.

Truly alternative – though also complementary – options to unitary classifications are being developed under the term taxonomic concept approach. They share the convention, established in [5], to annotate taxonomic name usages according to (sec.) particular authors. An example of this convention is: Andropogon virginicus Linnaeus 1753 (name, name author, year) sec. Weakley 2015 (concept author, year) [94]. We refer to these combined name sec. author strings as taxonomic concept labels.

The resolution gained by using such labels is critical. They permit the assembly of multiple alternative, internally coherent hierarchies where all concepts derived from one hierarchy can be connected via parent/child (is_a) relationships [37,84,87]. In a subsequent step, the hierarchies' entities can be aligned in reference to a variety of similarity indicators, including nomenclatural relationships, member composition, or diagnostic features [16,23,36,44,88].

Here we integrate concept-level annotations of alternative taxonomic perspectives with two additional workflow components: (1) user provision of an initial set of Region Connection Calculus (RCC-5) articulations among related concepts in each taxonomy, and (2) reasoner inference of additional articulations that are consistent with, and implied by, the given input constraints. These logically augmented sets of constraints are then translated into visualizations of merge taxonomies or alignments. The alignments resolve taxonomic in-/congruence with greater granularity than is possible with names and nomenclatural relationships alone. The alignment products allow human users and computers to understand and integrate information accordingly [24,44,54,85,86].

Here we apply the taxonomy alignment approach to the 11-classification Andropogon glomeratus-virginicus 'complex' use case (henceforth Andro-UC), using the novel Euler/X toolkit [14,15,16,17,18,24,34,54] to infer and visualize merge taxonomies. Before introducing the use case specifics, we first review the basic properties of the toolkit and draw parallels to related efforts.

3. Reasoning about Multi-Taxonomy Alignments with RCC-5 Articulations

The Euler/X toolkit is a successor of the CleanTax software [84,85,86]. The CleanTax prototype was built on top of a traditional First-Order Logic (FOL) reasoner [63]. Euler/X advancements include interactive workflow support, inconsistency and ambiguity analysis functions [17,18,84], and the use of Answer Set Programming (ASP) reasoners, based on Stable Model Semantics [40,41,61].

Taxonomy alignment problems are modeled as sets of constraints (T1, T2, A, C), where: T1 and T2 are the two input taxonomies in need of alignment; A is the initial set of user-provided articulations; and C are additional relevant constraints (Figure 1A). Each input taxonomy (T1, T2) is separately represented from root to leaves through hierarchical parent/child (is_a) concept relationships [87]. An example of the parent/child relationship is: Andropogon virginicus sec. Weakley (2015) is a parent of Andropogon virginicus "old-field variant" sec. Weakley (2015). The RCC-5 articulations vocabulary (A) consists of five basic set relationships² which are used to compare the referential extensions of taxonomic concept pairs; viz. congruence (==), proper inclusion (>), inverse proper inclusion (<), overlap (><), and exclusion (|) [4,35,55,70]. For instance, Andropogon capillipes sec. Weakley (2015) < Andropogon capillipes sec. Weakley (2006) indicates that the later (2015) concept has a narrower taxonomic circumscription than the earlier (2006) concept with the identical taxonomic name. Ambiguity due to incomplete information can be expressed in RCC-5 using the disjunction "or". Increased use of disjunctions yields a lattice of 32 possible articulations (R32) that starts with the empty disjunction (∅, or "False"), denoting an inconsistent articulation at the bottom, and ends with the tautological articulation of all five disjoint base relations (== or > or < or >< or |), indicating maximal ambiguity or lack of any knowledge, at the top [44,84,85,87].

The set (C) of constraints applicable to taxonomy alignments comprises [87]: (1) non-emptiness – each concept has at least one representing instance; (2) sibling disjointness – children concepts of a parent concept are reciprocally disjoint, i.e., taxonomically exclusive of each other; and (3) parent coverage – parent concepts are completely circumscribed by (included in) the union of their children. For the present use case, all constraints apply by default, but in the toolkit each is relaxable where appropriate [14].

The toolkit functions with relevance to the Andro-UC are as follows (Figure 1). (1) Visualization of each input taxonomy in the format of an is_a hierarchy, and including the set of user-provided articulations (Figure 1B). (2) Analysis of logical consistency. If the input constraints are jointly inconsistent (constraint over-specification), then no alignments are obtained. (3) Inference and representation of one or more consistent alignments, grounded in the consistent user-provided articulations and additional, logically implied articulations. Alignments are generated in two data formats: (a) as the set of Maximally Informative Relations (MIR [84]) interpretable by humans and machines, and (b) as alignment visualizations that aid human comprehension of taxonomic in-/congruence across the input classifications and their constituent elements (Figure 1C).
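The three default constraints in C (non-emptiness, sibling disjointness, parent coverage) can be illustrated with a minimal Python sketch over toy extension sets. This is not how Euler/X encodes them (the toolkit hands the constraints to ASP reasoners); the function name `check_constraints`, the specimen identifiers, and the second child concept are hypothetical and serve only to make the three conditions concrete.

```python
from itertools import combinations


def check_constraints(children, extension):
    """Return the constraints of C that a toy is_a hierarchy violates.

    children:  dict mapping a parent concept to the list of its child concepts
    extension: dict mapping every concept to a set of (invented) instances
    """
    violations = []
    # (1) Non-emptiness: every concept has at least one instance.
    for concept, members in extension.items():
        if not members:
            violations.append(("non-emptiness", concept))
    for parent, kids in children.items():
        # (2) Sibling disjointness: children of one parent share no instance.
        for a, b in combinations(kids, 2):
            if extension[a] & extension[b]:
                violations.append(("sibling disjointness", (a, b)))
        # (3) Parent coverage: the parent lies within the union of its children.
        union = set().union(*(extension[k] for k in kids))
        if not extension[parent] <= union:
            violations.append(("parent coverage", parent))
    return violations


# Toy hierarchy; instance identifiers and the second child are hypothetical.
children = {"2015.A_virginicus": ["2015.old_field_variant", "2015.other_variant"]}
extension = {
    "2015.A_virginicus": {"s1", "s2", "s3"},
    "2015.old_field_variant": {"s1", "s2"},
    "2015.other_variant": {"s3"},
}
print(check_constraints(children, extension))
```

An empty result means the toy hierarchy satisfies all three constraints. Relaxing a constraint, as the toolkit permits, would correspond to skipping the respective check.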

2 In the qualitative reasoning domain [55], the basic RCC-5 relationships are known as EQ (equals), PP (proper part), PP-1 (inverse proper part), PO (partial overlap), and DR (disjoint region).
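As a minimal sketch of the five base relations and the R32 lattice described in Section 3, the following Python fragment derives the RCC-5 symbol for two concepts from toy extension sets and enumerates all disjunctions of base relations. The function names `rcc5` and `r32` and the specimen identifiers are hypothetical; in practice articulations are asserted by experts and inferred by the reasoner rather than computed from enumerated instances.

```python
from itertools import combinations

# The five base relations, using the symbols from the text.
BASE = ["==", ">", "<", "><", "|"]


def rcc5(a, b):
    """Return the RCC-5 symbol relating two non-empty extension sets."""
    if a == b:
        return "=="  # congruence
    if a > b:
        return ">"   # proper inclusion: a properly includes b
    if a < b:
        return "<"   # inverse proper inclusion: a is properly included in b
    if a & b:
        return "><"  # overlap: shared and unshared members on both sides
    return "|"       # exclusion: no shared members


def r32():
    """Enumerate every disjunction of base relations, from empty to all five."""
    return [frozenset(c)
            for k in range(len(BASE) + 1)
            for c in combinations(BASE, k)]


# Toy extensions (invented) for the example from the text: the 2015 concept
# of Andropogon capillipes is narrower than the 2006 concept.
capillipes_2015 = {"s1"}
capillipes_2006 = {"s1", "s2"}
print(rcc5(capillipes_2015, capillipes_2006))
print(len(r32()))
```

The lattice size follows from counting subsets of the five base relations, 2^5 = 32, with the empty set at the bottom and the full set at the top.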

Fig. 1. Overview of input/output information for processing with the Euler/X taxonomy alignment toolkit, using the example of the Blomquist (1948) / Small (1933) alignment. (A) Input data format, showing the two input taxonomies and the set of six user-provided input articulations (Appendix 1). (B) Input visualization, with legend (left) providing information on numbers of input concepts per taxonomy, is_a relationships, and RCC-5 articulations. (C) Alignment visualization, with legend (left) providing information on non-/congruent concepts and properly including/overlapping edges. Visualization conventions, including annotations of name/meaning identity (=, ≠, +, –), are reused in Figures 4–6.

Additional toolkit functions include logic-based diagnosis and repair options in the case of inconsistent input (= constraint over-specification), and visualizations of multiple alignments as aggregate and cluster views in the case of ambiguous input (= constraint under-specification) [15,16,17,18,24]. The latter visualizations can inform interactive decision tree routines, where the user is repeatedly prompted to resolve ambiguous (i.e., disjunctive) articulations, thereby reducing the number of possible world alignments. Both sets of functions are intended to aid the user in achieving consistent, well-specified alignments [34]. However, neither set of functions is needed to properly align the Andro-UC input, which by virtue of the unambiguously specified user input displayed in Figure 2 already satisfies the criteria of consistency and sufficiency. We refer readers to other contributions where these issues are discussed in more detail [17,18,34,35,54].

4. Relationship of the RCC-5 Multi-Taxonomy Alignment Approach to Other Methods

To our knowledge, the specific combination of generating reasoner-inferred alignments between multiple biological taxonomies with RCC-5 articulations (and ASP reasoners) has no immediate precedent in the broader semantics domain. The logic foundations for this particular approach were developed in [16,44,86]. The step of modeling an input taxonomy as an is_a hierarchy is well established [38,66,90]. However, the remaining steps in our toolkit workflow diverge from existing ontology matching or provenance-tracking applications [19,22,25,53,80,90,92,97]. The use of RCC-5 articulations is the most significant difference, reflecting the preference of domain scientists for expressing concept relationships with these five basic set constraints [35,37,94].

Biodiversity scientists are often faced with use cases where sets of taxonomic occurrence records or entities can either be relevantly merged, or not, for information ingestion into subsequent analyses. This requirement, together with the notion that taxonomic boundaries are natural and empirically accessible [38], may motivate using RCC-5 over alternatives that express similarity ratios among individual concepts and concept hierarchies [92]. The latter are most appropriate for expressing "how semantically close?" two concepts are. However, for the biodiversity scientist this raises an additional question [36]: "are the differences significant, or negligible, for the purpose of merging data?" In this context, RCC-5 provides direct, actionable, set theory-based information for multi-taxonomy integration. The specific representation needs for biological taxonomies and derivations of FOL constraints are further discussed in [87].

Use of the RCC-5 articulations means that ambiguities due to incomplete knowledge in alignments are modeled through disjunctive articulations, which may be present in the input articulations, output MIR³, or both [34]. Disjunctive articulations of the R32 lattice such as "A == B or A > B" are readily modeled in ASP or RCC reasoners but are more difficult to represent in OWL-DL. In particular, the toolkit-affiliated reasoners [26,40] represent the RCC-5 input articulations using only the set operations of union and subset relation, thereby subdividing the input concepts into polynomially many Euler regions [72]. The reasoning process yields the subset relations of all Euler regions, then reassembles these to reconstitute the input concepts, and finally infers their respective RCC-5 relations, as detailed in [14].

3 A MIR is the unique node in a given R32 lattice that implies all other true articulations in the lattice.

Parallel efforts to derive taxonomic concept alignments 'directly' from textual descriptions through the application of Natural Language Processing methods and phenotype ontologies are introduced in [23]. Other taxonomically focused integration projects that do not utilize RCC-5 include [10,13,66,73,88]. The degree to which the RCC-5 alignment approach is relevant to other fields that model semantic drift requires further exploration.

5. Input and Alignment Conditions for the Andropogon Use Case

The Andro-UC has been selected to demonstrate the multi-taxonomy alignment approach for several reasons. First among these is the availability of preexisting concept circumscriptions and articulations through co-author Alan S. Weakley, an expert on the Flora (and floristic legacy) of the Southern and Mid-Atlantic States [93,94]. An earlier version of the use case was published in [37] and included eight classifications. Three recent classifications are herein added to the Andro-UC. The use case is furthermore suitable because it illustrates the considerable extent to which names and meanings may dissociate over time as Code-compliant names are applied to incongruent taxonomic classifications. The implications for integrating biodiversity data are thereby made clear. Moreover, with only 100 concepts, the Andro-UC is relatively small. Its outer taxonomic boundaries are well defined and stable throughout the 126-year time interval (1889-2015). These properties allow us to present the alignment visualizations within the confines of this contribution. Additional comments on the relevance of this use case and applicability of our approach to other alignment challenges are offered in the Discussion.

5.1. Taxonomic particulars

The history of the Andro-UC is reviewed in [37,93,94]. The 11 input classifications T1, …, T11 are each reproduced according to the source publications (Figure 2). All input articulations were provided by the user in tabular format (Figure 3), which readily facilitates translation into RCC-5 relations. Strictly speaking, the Andro-UC concerns the "A. virginicus-A. glomeratus complex" as circumscribed in [94]. The use case is thus much narrower in scope than the entire genus-level concept Andropogon sec. Clayton et al. (2013) [20], which includes more than 100 species-level concepts worldwide.

Fig. 2. Tabular representation of the input alignment of taxonomic names and concepts used in the 11 succeeding classifications of the Andro-UC, as provided by Weakley [37,93,94]. Columns represent classifications whereas rows contain information on taxonomic name and concept identity (via taxonomic concept labels, see column headers). Cell shadings indicate congruent multi-concept lineages. Consecutive concept numbers (1-100) are reused in Figure 3 for the purpose of comparison. See text for further details.

The classifications of the Andro-UC include, in chronological sequence (Figures 2 and 3): Hackel (1889) [50], Small (1933) [81], Blomquist (1948) [7], Hitchcock & Chase (1950) [51], Radford et al. (1968) [71], abbreviated as "RAB (1968)", Godfrey & Wooten (1979) [46], Campbell (1983) [11], Campbell (2003) [12], Weakley (2006) [93], Kartesz (2014) [57], referred to as "BONAP (2014)", and Weakley (2015) [94].

The tabular representation of Figure 2 encodes taxonomic congruence as a function of occupying the same row (width). For instance, A. capillipes var. capillipes sec. Weakley (2015) (concept 90) == A. virginicus var. glaucus subvar. glaucus sec. Hackel (1889) (concept 4). Hence articulations between concept pairs pertaining to different classifications can be 'read off' the table by tracing their cell positions and vertical extensions – encoded with unique shadings – across the respective columns. The consistent column depth across all classifications indicates that the taxonomic boundaries of the 'complex' are congruent throughout. This means that all taxonomic incongruences of the Andro-UC are due to differential subdivisions of entities recognized by various authors within jointly agreed-upon outer taxonomic confines.

Fig. 3. Hierarchical, multi-level representations of the 11 input classifications of the Andro-UC (see also Appendix 1). Taxonomic name and concept identities (numbered from 1-100) as in Figure 2. guide the source taxonomies, and are usually Another noteworthy aspect of the input rep- made explicit therein. For instance, the ac- resentation are higher-ranked entities (com- ceptance of two variety-level concepts A. pare Figures 2 and 3). These entities are not glomeratus var. hirsutior sec. Weakley (2006) depicted in Figure 2, because the table pro- (concept 76) and A. glomeratus var. glomera- vided by Weakley emphasizes congruence tus sec. Weakley (2006) (concept 77) (Figure among the narrowest concepts recognized in 2) implies recognition of the species-level each classification. However, these higher- concept A. glomeratus sec. Weakley (2006) level entities are implied by conventions that (concept 75) (Figure 3). Our representations fully account for the im- 2015/1889 (Figure 6C), representing the plied higher-level taxonomic concepts, yield- largest possible time interval. ing comprehensive alignments with up to In configuring the pairwise alignments, we four levels (Figure 3). Where necessary, we represent the later (younger) taxonomy as T2 have added nominal (type) taxonomic names and the earlier (older) taxonomy as T1 [34]. and concepts to represent comparable ranked Accordingly, the visualizations (Figures 1, 4- entities at all levels; e.g., Andropogon virgin- 6) show concepts unique to T2 as green rec- icus var. viridis sec. Hackel (1889) was added tangles, and concepts unique to T1 as yellow (concept 6) and is comparable to Andropogon octagons. Aligned regions with multiple con- virginicus var. glaucus sec. Hackel (1889) gruent concepts are shown as grey rectangles (concept 3) of the same rank and source clas- with rounded corners (Figure 1C). We use the sification. shorthand of [35] for taxonomic concept la- bels, where (e.g.) Andropogon virginicus var. 
5.2. Input configuration, workflow execution, decipiens sec. Weakley (2015) becomes and reproducibility "2015.A_virginicus_var_decipiens". The 12 input files (.txt format) for the Andro-UC are The Euler/X toolkit is open source and avail- provided in Appendix 1. able at [15]. The software can be cloned and All alignments were obtained using "polyno- then deployed on a desktop using the com- mial encoding/possible world/reduced con- mand-line interface. An overview of the tainment graph" commands, which show toolkit's reasoning and visualization options overlapping articulations among input con- is available through the "help" command. Ad- cepts as blue dashed lines in the output visu- ditional software dependencies include Py- alizations [14,15,16]. The commands gener- thon, the Answer Set Programming reasoners ate the set of output MIR (.csv format) and DLV [26] and Potassco (Gringo, claspD) [40], GraphViz-rendered alignment visualizations and GraphViz [39]. (.pdf format). The input conventions for labeling concepts The sets of Maximally Informative Relations and representing parent/child (is_a) relation- (MIR) for each of the 12 alignments are pro- ships and articulations are in accordance with vided in Appendix 2. To ensure complete re- [33,34,35,36]. They are exemplified in Figure producibility, we have also prepared the 1A for the 1948/1933 alignment. We limit Andro-UC use case as an experiment our study to showing pairwise taxonomy at http://recomputation.org/ [42,43]. alignments (see also Discussion), and there- fore show outcomes for the following ten in- 6. Analyses of Name:Meaning Dissociation put configurations (Figures 4-6): 1933/1889 (Figure 4A), 1948/1933 (Figure 4B), Quantitative analyses of evolving taxonomic 1950/1948 (Figure 4C), 1968/1950 (Figure name:meaning identity are central to this use 4D), 1979/1968 (Figure 4E), 1983/1979 case. To this end we provide three comple- (Figure 5B), 2003/1983 (Figure 5C), mentary groups of results [35,37,44]. 
First, 2006/2003 (Figure 5D), 2014/2006 (Figure we add annotations to the alignment regions 6A), and 2015/2014 (Figure 6B). This strict (Figures 1C, 4-6), as follows. (1) For regions chronological sequence is supplemented with with multiple congruent concepts (==), we two alignments; i.e., (1) 1979/1950 (Figure add either the symbol "=" or "≠" to express 5A), which overcomes the gap in resolution that the corresponding taxonomic names are generated by the intermediate, coarse RAB identical (=), or not (≠). (2) For unique, non- (1968) classification that contains only one congruent regions (not ==), we supply a "+" if species-level concept (Figure 3); and (2) the corresponding taxonomic names are also unique in the alignment, or a "–" if the names <, ><, |] : ≠ combinations as instances of reli- occur redundantly (and thus must have mul- able names [R]: identical names denote con- tiple incongruent meanings). Accordingly, an gruence, and non-identical names denote in- abundance of "=" and "+" symbols indicates congruence. In contrast, the remaining com- that shared names symbolize congruent con- binations, either == : ≠ or [>, <, ><, |] : =, in- cepts, whereas unique names symbolize con- dicate unreliable names [UR]. cepts particular to only one classification. Third, we reinterpret the input displayed in Both types of symbols (=, +) may be viewed as Figure 2 to evaluate the performance of indicators of names performing reliably as names as concept identifiers over the entire identifiers of taxonomic concepts. We will 1889-2015 interval. We adopt Remsen's [74] refer to these as 'reliable names'. On the other notion of cardinality to address two ques- hand, "≠" and "–" symbols are indicative of tions. First, how many usages and meanings 'unreliable names', where nomenclatural sta- are associated with each of the 36 unique bility or change are dissociated from taxo- taxonomic names in the Andro-UC (Table 4)? nomic congruence or incongruence. 
We record the frequency of each symbol for the 12 alignments to obtain the corresponding totals, percentages, and ratios of reliable:unreliable names (Table 1). The top-level concept (root) of the complex (sec. auctorum) is excluded from the counts because of its underspecified nomenclatural identity.

Second, we compute simple name:meaning identity analyses based on the output MIR data (Tables 2 and 3). For each alignment, we record the numbers of input concepts (T2, T1), input articulations (A), and MIR. The latter are partitioned according to each RCC-5 articulation (Table 2). As above, MIR articulating root concepts are excluded. The quotient of (1) the number of congruent articulations (==) in the alignment and (2) the number of input concepts in the concept-poorer taxonomy (either T2 or T1) reflects the relative congruence between two taxonomies [35]. If the ratio approaches 100% then relative congruence is high, possibly in spite of differences in name usages. We furthermore resolve name:meaning identity of articulated concept pairs, based on the following ten combinatorial categories (Table 3): (1) two articulated concepts in T2 and T1 are either taxonomically congruent (==) or not (>, <, ><, |) [five options]; and (2) their corresponding names are either identical (=) or not (≠) [two options]. We regard both the == : = and the [>,

For instance, a name:meaning cardinality of 1:3 indicates that an identical name was used in (at least) three classifications, and associated with three reciprocally incongruent taxonomic meanings. Second, how many (non-identical) names are associated with each of the 21 congruent sets (or lineages) of taxonomic meanings in the Andro-UC (Table 5)? For instance, a name:meaning cardinality of 3:1 indicates that three non-identical names were used to identify meanings (or meaning chains [60]) across classifications that are taxonomically congruent. For the purpose of labeling the chains, we select the most recent (youngest) taxonomic concept label that anchors an instance of the chain, which extends to congruent concepts in one or more preceding classifications. An example is 2014.A_capillipes (youngest concept label, used to label the chain) == 1889.A_virginicus_var_glaucus (oldest concept label used for an entity in the chain).

In addition to showing the dynamics of name:meaning cardinality, Tables 4 and 5 indicate how often certain names or meanings re-/appear in the Andro-UC, and whether their occurrences are continuous or interrupted by intermediate classifications.

7. Results

7.1. Extent and origins of taxonomic incongruence

Each of the 12 input configurations yields a single, consistent, and unambiguously resolved alignment (Figures 4-6). The 12 visualizations clearly illustrate that none of the paired input taxonomies are entirely congruent, instead showing 2-12 unique regions (compare Figures 5B and 5C), and an overall ratio of 56 congruent to 71 non-congruent regions (Table 1). While we cannot examine each alignment in fine detail, we highlight select phenomena that capture the extent and causes of taxonomic incongruence in the Andro-UC.

One cause for incongruence is unequal granularity across classifications. For instance, at the lowest taxonomic level, classifications authored from 1933 to 1979 recognize 1-5 concepts, whereas taxonomies published outside of this interval accept 7-9 concepts (Figures 2 and 3). Such differences cause the more finely resolving taxonomy to have one or more non-congruent (properly included) low-level concepts in comparison to its counterpart (e.g., Figures 4A and 5B). For instance, alignments of any taxonomy to that of the most coarse-grained RAB (1968) classification are only congruent with regards to the root-level concepts (Figures 4D and 4E), given that RAB (1968) recognize no additional taxonomic subdivisions within the complex. In the context of its immediate predecessor and successor (Figures 2, 4D, 4E, and 5A), the 1968 classification appears disruptive because the chain of taxonomic resolution between Hitchcock & Chase (1950) and Godfrey & Wooten (1979) is not propagated in RAB (1968).

Taxonomies produced in 1983 or later show higher levels of congruence between their finest-degree entities (Figures 5C, 5D, and 6). By and large, taxonomists publishing in the past 30 years have adopted Campbell's (1983) perspective on how finely one should differentiate units within the complex. Incongruences among these recent perspectives are rooted mainly in disagreements on how to name and integrate low-level entities into parent concepts. Interestingly, Hackel (1889) already recognized seven low-level entities, and in that sense his classification is more congruent with contemporary perspectives (Figure 6C) than with those published in 1933-1979.

In addition to unequal granularity, five alignments show overlapping (><) articulations, most frequently between species-level concepts. These relations are challenging to represent without recourse to RCC-5, because they fracture the hierarchical pattern of inverse/proper concept inclusion from tip- to root-level concepts [34,35,36,37]. The resulting alignments become directed acyclic graphs.

The 1950/1948 alignment represents an interesting case of overlap (Figure 4C). Both Hitchcock & Chase (1950) and Blomquist (1948) recognize three identically named species-level concepts within the complex, one of which is also taxonomically congruent (1950.A_capillipes == 1948.A_capillipes). The authors' other two species-level concepts have four reciprocally overlapping articulations.

Of particular note is the articulation 1950.A_glomeratus >< 1948.A_glomeratus. Figure 2 illustrates that the overlap of these two concepts creates three alignment regions: (1) the region congruent with 1950.A_virginicus_var_hirsutior, (2) the region congruent with 1948.A_virginicus_var_tenuispatheus, and (3) the region that actually constitutes the overlap. This latter region is not uniquely recognized in either input classification, and therefore has no input name assigned to it. Yet other authors, including Hackel (1889), have recognized and named the region (e.g., 1889.A_macrourus_var_abbreviatus; see Figure 2).
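The overlap case discussed for the 1950/1948 alignment can be made concrete with a small set-based sketch. This is not the Euler/X reasoner; it only assumes that each concept can be modeled as a set of member entities, and the labels `e1` to `e3` are hypothetical placeholders rather than data from the study. Under that assumption, the RCC-5 articulation follows from ordinary set comparisons, and an overlap yields three merge regions.

```python
def rcc5(a: frozenset, b: frozenset) -> str:
    """Return the RCC-5 articulation of two non-empty concepts given as member sets."""
    if a == b:
        return "=="   # congruence
    if a > b:
        return ">"    # a properly includes b
    if a < b:
        return "<"    # a is properly included in b
    if a & b:
        return "><"   # overlap
    return "|"        # exclusion


def merge_regions(a: frozenset, b: frozenset) -> dict:
    """Split two concepts into the non-empty regions of their merge."""
    regions = {"only_a": a - b, "shared": a & b, "only_b": b - a}
    return {label: members for label, members in regions.items() if members}


# Hypothetical memberships standing in for 1950.A_glomeratus and 1948.A_glomeratus.
glomeratus_1950 = frozenset({"e1", "e2"})
glomeratus_1948 = frozenset({"e2", "e3"})

print(rcc5(glomeratus_1950, glomeratus_1948))                # ><
print(len(merge_regions(glomeratus_1950, glomeratus_1948)))  # 3
```

The "shared" region is the one that carries no input name in either classification, which is why a separate label is needed for it.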

Fig. 4. Visualizations for alignments 1-5 of the Andro-UC, 1889-1979. Representation conventions and annotations as in Figure 1C. (A) Small (1933) / Hackel (1889) alignment; (B) Blomquist (1948) / Small (1933) alignment; (C) Hitchcock & Chase (1950) / Blomquist (1948) alignment; (D) RAB (1968) / Hitchcock & Chase (1950) alignment; (E) Godfrey & Wooten (1979) / RAB (1968) alignment.

Generalizing the phenomenon exemplified in the 1950/1948 alignment, we observe that overlap of two (or more) concepts creates combined concept regions for which there are no unique names in the respective input taxonomies. Nevertheless, identifiers for these alignment regions are required to express the extent to which the input concepts can be aligned with each other. The toolkit's "combined concepts" command uniquely labels these regions (see [34]).

Overall, occurrences of differential resolution and overlapping concepts in the Andro-UC result in pairwise alignments with 5-15 regions (Table 1). Taking the 12 alignments in conjunction, 44.1% of the 127 inferred alignment regions are taxonomically congruent (range: 0.0-85.7%), leaving the remaining 55.9% incongruent. This ratio of in-/congruence between paired taxonomies is the semantic basis of the dis-/agreements that taxonomic names are suited to identify and track, though only up to a point, as we analyze in the next section.

7.2. Quantification of name:meaning dissociation

Taxonomic names are reliable identifiers of taxonomic in-/congruence for 77/127 (60.6%) of the regions present in the 12 pairwise alignments of the Andro-UC (range: 38.5-83.3%) (Table 1). The highest ratios are obtained for the 1968/1950 and 1979/1968 alignments. The latter include no congruent regions, since every unique name also symbolizes a unique alignment region (Figures 4D and 4E). The 5:8 ratio (38.5%) for the 2015/1889 alignment (Figure 6C) is low as expected. In particular, 0/7 congruent concept regions in this 126 year-spanning alignment have reliable names; i.e., each of these regions is labeled by two non-identical names. However, taxonomic names in the Andro-UC do not necessarily perform better over short time intervals, or in alignments whose input taxonomies are closer to the present (2015). One example is the 2006/2003 alignment (Figure 5D), which has an undesirable 6:9 ratio (40.0%) of reliable:unreliable names.

The 824 output MIR permit finer assessments of name:meaning dissociation (Table 3). Accordingly, among all 60 instances of pairwise taxonomic concept congruence (==) in the MIR, 33 also carry identical names (== : =), whereas 27 pairings have non-identical name strings (== : ≠) (ratio 1.22:1). The majority (603/764 = 78.9%) of the remaining name:meaning relationships are in the | : ≠ category. Indeed, any | : = combination would constitute erroneous nomenclatural practice, where two reciprocally exclusive concepts have identical names and therefore presumably refer to the same type, which in turn would mean that they are not mutually exclusive (unless homonymy is involved). No such errors are evident in the Andro-UC.

Fig. 5. Visualizations for alignments 6-9 of the Andro-UC, 1950-2006. Representation conventions and annotations as in Figure 1C. (A) Godfrey & Wooten (1979) / Hitchcock & Chase (1950); (B) Campbell (1983) / Godfrey & Wooten (1979); (C) Campbell (2003) / Campbell (1983); (D) Weakley (2006) / Campbell (2003).

Among the remaining 161 non-congruent articulations (>, <, ><) in the set of output MIR, 16 refer to concept pairings with identical names (2 > : =; 9 < : =; and 5 >< : =), and 145 have non-identical names. These types of name:meaning categorization yield an overall ratio of 781:43 reliable:unreliable names for the entire Andro-UC. The highest occurrences of unreliable names appear in the 1933/1889 (5 MIR), 2006/2003 (7 MIR), and 2015/1889 (10 MIR) alignments.

Fig. 6. Visualizations for alignments 10-12 of the Andro-UC, 1889-2015. Representation conventions and annotations as in Figure 1C. (A) BONAP (2014) / Weakley (2006); (B) Weakley (2015) / BONAP (2014); (C) Weakley (2015) / Hackel (1889).

Quantification of name:meaning cardinality over the 126-year period of the Andro-UC reveals that 18/36 taxonomic names (50.0%) have been used in multiple treatments, whereas the other 18 names are particular to single treatments (Table 4). Cumulatively, the use case entails 88 taxonomic name usages and 46 unique name:meaning combinations (ratio: 1.91:1). Only one name – A. virginicus – is used in every classification. Three additional names – i.e., A. capillipes, A. glomeratus, and A. virginicus var. virginicus – are used in 6-8 of the 11 input classifications. The other (32/36) names occur in less than half of them. Most, though not all, taxonomic names with 2-4 usages appear in temporally consecutive taxonomies (relation: 9/13).

Table 1. Summary of taxonomic and nomenclatural identities of Euler regions across 12 alignment visualizations for the Andro-UC (see Figures 4-6). Columns show the number of aligned regions (excluding the congruent parent region), ratio of congruent (==) versus (not ==) unique regions, percentage of congruent regions (% ==), ratio of identical (=) versus different (≠) names occupying the congruent regions (= : ≠), ratio of unique (+) versus non-unique (–) names occupying unique regions (+ : –), and ratio and percentage of reliable versus unreliable names (see text for explanation). Totals and percentages are provided for the cumulative values across all alignments.

Alignment  T2 / T1      Figure  Regions¹  == : not ==  % ==   = : ≠    + : –    Reliable : Not  % Reliable
1          1933 / 1889  4A      10        3 : 7        30.0%  0 : 3    6 : 1    6 : 4           60.0%
2          1948 / 1933  4B      7         2 : 5        28.6%  1 : 1    2 : 3    3 : 4           42.9%
3          1950 / 1948  4C      9         3 : 6        33.3%  3 : 0    2 : 4    5 : 4           55.6%
4          1968 / 1950  4D      6         0 : 6        0.0%   0 : 0    5 : 1    5 : 1           83.3%
5          1979 / 1968  4E      5         0 : 5        0.0%   0 : 0    4 : 1    4 : 1           80.0%
6          1979 / 1950  5A      8         3 : 5        37.5%  2 : 1    3 : 2    5 : 3           62.5%
7          1983 / 1979  5B      15        3 : 12       20.0%  1 : 2    10 : 2   11 : 4          73.3%
8          2003 / 1983  5C      14        12 : 2       85.7%  11 : 1   0 : 2    11 : 3          78.6%
9          2006 / 2003  5D      15        9 : 6        60.0%  4 : 5    2 : 4    6 : 9           40.0%
10         2014 / 2006  6A      12        8 : 4        66.7%  6 : 2    2 : 2    8 : 4           66.7%
11         2015 / 2014  6B      13        6 : 7        46.2%  4 : 2    4 : 3    8 : 5           61.5%
12         2015 / 1889  6C      13        7 : 6        53.8%  0 : 7    5 : 1    5 : 8           38.5%
Totals     –            –       127       56 : 71      44.1%  32 : 24  45 : 26  77 : 50         60.6%

1 Number of aligned regions excludes the root/parent region ("Andropogon" sec. auctorum) whose name is held constant throughout.
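The reliability columns of Table 1 can be recomputed from the other columns. The sketch below assumes, as the tabulated values consistently suggest, that a region's name counts as reliable when it is either an identical name on a congruent region (=) or a unique name on a unique region (+), and as unreliable otherwise (≠ or –). The function and variable names are illustrative only.

```python
# (alignment, identical "=", different "≠", unique "+", non-unique "–"), copied from Table 1.
TABLE_1 = [
    ("1933/1889", 0, 3, 6, 1),
    ("1948/1933", 1, 1, 2, 3),
    ("1950/1948", 3, 0, 2, 4),
    ("1968/1950", 0, 0, 5, 1),
    ("1979/1968", 0, 0, 4, 1),
    ("1979/1950", 2, 1, 3, 2),
    ("1983/1979", 1, 2, 10, 2),
    ("2003/1983", 11, 1, 0, 2),
    ("2006/2003", 4, 5, 2, 4),
    ("2014/2006", 6, 2, 2, 2),
    ("2015/2014", 4, 2, 4, 3),
    ("2015/1889", 0, 7, 5, 1),
]


def reliability(identical, different, unique, non_unique):
    """Return (reliable, unreliable, percent reliable) for one alignment."""
    reliable = identical + unique
    unreliable = different + non_unique
    return reliable, unreliable, 100 * reliable / (reliable + unreliable)


total_reliable = sum(reliability(*row[1:])[0] for row in TABLE_1)
total_unreliable = sum(reliability(*row[1:])[1] for row in TABLE_1)
overall = 100 * total_reliable / (total_reliable + total_unreliable)

for name, *counts in TABLE_1:
    reliable, unreliable, percent = reliability(*counts)
    print(f"{name}: {reliable}:{unreliable} ({percent:.1f}%)")
print(f"Totals: {total_reliable}:{total_unreliable} ({overall:.1f}%)")
```

Run as written, the totals line reproduces the 77:50 ratio (60.6%) reported in the last row of the table.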

Table 2. Summary of numbers of input concepts (T2 / T1) and input articulations (A) for the 12 alignments of the Andro-UC, and of the Maximally Informative Relations (MIR), including totals and partitions according to each type of RCC-5 articulation. Legend: Rel. == signifies relative congruence, i.e., the ratio of the number of == articulations in the alignment divided by the number of input concepts in the concept-poorer taxonomy (either T2 or T1). The total for the relative congruence column shows the average percentage.

Alignment  Concepts T2  Concepts T1  Articulations  MIR¹        ==  >   <   ><  |    Rel. ==
1          4            12           8              33 (48)     4   6   2   0   21   100%
2          7            4            6              18 (28)     2   1   3   1   11   50.0%
3          7            7            7              36 (49)     3   3   3   4   23   42.9%
4          2            7            1              6 (14)      0   6   0   0   0    0.0%
5          6            2            1              5 (12)      0   0   5   0   0    0.0%
6          6            7            6              30 (42)     3   5   2   2   18   50.0%
7          14           6            10             65 (84)     3   4   15  2   41   50.0%
8          14           14           10             169 (196)   12  15  17  0   125  85.7%
9          12           14           10             143 (168)   9   10  13  0   111  75.0%
10         10           12           9              99 (120)    8   6   4   1   80   80.0%
11         12           10           10             99 (120)    7   2   10  0   80   70.0%
12         12           12           10             121 (144)   9   0   19  0   93   75.0%
Totals     106          107          88             824 (1025)  60  58  93  10  603  56.6%

1 Number in parentheses includes all MIR that articulate the root/parent region ("Andropogon" sec. auctorum) which are otherwise excluded from the counts.
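The relative congruence measure defined in the legend of Table 2 is a simple quotient. As a minimal sketch (the function name `relative_congruence` is an illustrative choice, not part of the toolkit), it can be recomputed from the table's counts:

```python
def relative_congruence(n_congruent: int, n_concepts_t2: int, n_concepts_t1: int) -> float:
    """Percentage of == articulations relative to the concept-poorer input taxonomy."""
    return 100 * n_congruent / min(n_concepts_t2, n_concepts_t1)


# Values copied from Table 2: (== articulations, concepts in T2, concepts in T1).
print(round(relative_congruence(2, 7, 4), 1))     # alignment 2
print(round(relative_congruence(3, 7, 7), 1))     # alignment 3
print(round(relative_congruence(12, 14, 14), 1))  # alignment 8
```

Dividing by the smaller concept count means that a taxonomy whose every concept finds a congruent partner scores 100%, regardless of how many additional concepts the richer taxonomy recognizes.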

The most frequently used name – A. virginicus – also has the highest number of incongruent taxonomic meanings, with a name:meaning cardinality of 1:6 (Table 4). Six consecutive classifications authored in 1933-1983 all propagate incongruent meanings of "A. virginicus" (Figure 2). Only three additional names have more than one meaning in the Andro-UC; viz. A. capillipes (name:meaning cardinality = 1:2), A. glomeratus (1:4) and A. virginicus var. virginicus (1:2).

The 88 name usages in the Andro-UC correspond to 21 chains of taxonomically congruent concepts (Table 5). Of these, the chain symbolized by 2015.A_glaucopsis (most recent member) is the longest, with elements appearing in 9/11 classifications and under four non-identical names. Other long chains include 2015.A_virginicus (8 usages/4 non-identical names), 2014.A_capillipes (8/2), 2015.A_hirsutior (7/4), and 2015.A_tenuispatheus (7/4). At the other end of the length spectrum, there are five concepts whose meanings are unique to one classification (two authored in 1979; and one in 1968, 1950, and 1948, respectively).

The least favorable name:meaning cardinality among the 21 chains is 4:1, meaning that four non-identical names are used to identify sets of taxonomically congruent concepts. This ratio applies to six concept chains: 2015.A_capillipes, 2015.A_dealbatus, 2015.A_virginicus, 2015.A_glaucopsis, 2015.A_hirsutior, and 2015.A_tenuispatheus. Conversely, a cardinality of 1:1 is obtained in 9/21 chains, of which only four have more than one usage (Table 5).

Table 3. Analysis of taxonomic name:meaning relationships in the 12 alignments of the Andro-UC, based on the 824 Maximally Informative Relations (MIR), and including assessments of reliable names [R] and unreliable names [UR]. Legend: == : = congruent concepts, identical name(s) [R]; == : ≠ congruent concepts, non-identical names [UR]; > : = T2 concept more inclusive, identical name(s) [UR]; > : ≠ T2 concept more inclusive, non-identical names [R]; < : = T2 concept less inclusive, identical name(s) [UR]; < : ≠ T2 concept less inclusive, non-identical names [R]; >< : = T2 and T1 concepts overlapping, identical name(s) [UR]; >< : ≠ T2 and T1 concepts overlapping, non-identical names [R]; | : = T2 and T1 exclusive of each other, identical name(s) [UR]; | : ≠ T2 and T1 exclusive of each other, non-identical names [R]; Total R = total of all reliable [R] name pairings per alignment; Total UR = total of all unreliable [UR] name pairings per alignment.

Alignment  == : =  == : ≠  > : =  > : ≠  < : =  < : ≠  >< : =  >< : ≠  | : =  | : ≠  Total R  Total UR
1          0       4       0      6      1      1      0       0       0      21     28       5
2          1       1       1      0      1      2      0       1       0      11     15       3
3          3       0       0      3      0      3      2       2       0      23     34       2
4          0       0       1      5      0      0      0       0       0      0      5        1
5          0       0       0      0      1      4      0       0       0      0      4        1
6          2       1       0      5      0      2      1       1       0      18     28       2
7          1       2       0      4      0      15     1       1       0      41     62       3
8          11      1       0      15     1      16     0       0       0      125    167      2
9          4       5       0      10     2      11     0       0       0      111    136      7
10         6       2       0      6      0      4      1       0       0      80     96       3
11         5       2       0      2      2      8      0       0       0      80     95       4
12         0       9       0      0      1      18     0       0       0      93     111      10
Totals     33      27      2      56     9      84     5       5       0      603    781      43
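The ten categories of Table 3 reduce to one rule: a name pairing is reliable when the names agree exactly if and only if the concepts are congruent. The following sketch applies that rule to the totals row of Table 3; the function name `is_reliable` and the dictionary layout are illustrative assumptions.

```python
def is_reliable(articulation: str, same_name: bool) -> bool:
    """Reliable [R] if identical names mark congruence and non-identical names mark anything else."""
    return (articulation == "==") == same_name


# Totals row of Table 3: (RCC-5 articulation, names identical?) -> number of MIR.
MIR_TOTALS = {
    ("==", True): 33, ("==", False): 27,
    (">", True): 2,   (">", False): 56,
    ("<", True): 9,   ("<", False): 84,
    ("><", True): 5,  ("><", False): 5,
    ("|", True): 0,   ("|", False): 603,
}

total_r = sum(n for (art, same), n in MIR_TOTALS.items() if is_reliable(art, same))
total_ur = sum(n for (art, same), n in MIR_TOTALS.items() if not is_reliable(art, same))
print(total_r, total_ur)  # 781 43
```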

Table 4. Analysis of name:meaning cardinality for the entire Andro-UC, based on 88 name usages of 36 taxonomic names corresponding to 46 unique (sets of) taxonomic meanings. Cell values indicate (1) that the name is used and (2) which of the 1-n meanings is symbolized by the name in the corresponding classification. Names are ordered according to their frequency of use in the 11 classifications. Non-congruent (sets of) meanings associated with each name are numbered in reverse chronological order, i.e., starting with the 2015 taxonomy. See also Figure 1.

Classification columns: 1889, 1933, 1948, 1950, 1968, 1979, 1983, 2003, 2006, 2014, 2015.

#   Taxonomic name                   Meaning numbers (classifications using the name, in chronological order)  Usages  Meanings
1   A. virginicus                    2 1 6 5 4 3 2 2 1 1 1                                                     11      6
2   A. glomeratus                    4 3 2 4 4 3 2 1                                                           8       4
3   A. capillipes                    2 2 2 2 2 2 1                                                             7       2
4   A. virginicus var. virginicus    2 2 2 2 1 1 1                                                             6       2
5   A. glomeratus var. glomeratus    1 1 1 1 1                                                                 5       1
6   A. glaucopsis                    1 1 1 1                                                                   4       1
7   A. virginicus var. decipiens     1 1 1 1                                                                   4       1
8   A. glomeratus var. hirsutior     1 1 1                                                                     3       1
9   A. glomeratus var. pumilus       1 1 1                                                                     3       1
10  A. virginicus var. glaucus       1 1 1                                                                     3       1
11  A. glomeratus var. glaucopsis    1 1                                                                       2       1
12  A. hirsutior                     1 1                                                                       2       1
13  A. tenuispatheus                 1 1                                                                       2       1

Table 4. Continued.

Classification columns: 1889, 1933, 1948, 1950, 1968, 1979, 1983, 2003, 2006, 2014, 2015.

#   Taxonomic name                                       Meaning numbers (classifications using the name, in chronological order)  Usages  Meanings
14  A. virginicus var. glaucopsis                        1 1                                                                       2       1
15  A. virginicus var. glaucus "drylands variant"        1 1                                                                       2       1
16  A. virginicus var. glaucus "wetlands variant"        1 1                                                                       2       1
17  A. virginicus var. virginicus "old-field variant"    1 1                                                                       2       1
18  A. virginicus var. virginicus "smooth variant"       1 1                                                                       2       1
19  A. capillipes "drylands variant"                     1                                                                         1       1
20  A. capillipes "wetlands variant"                     1                                                                         1       1
21  A. dealbatus                                         1                                                                         1       1
22  A. macrourus                                         1                                                                         1       1
23  A. macrourus var. abbreviatus                        1                                                                         1       1
24  A. macrourus var. genuinus                           1                                                                         1       1
25  A. macrourus var. glaucopsis                         1                                                                         1       1
26  A. macrourus var. hirsutior                          1                                                                         1       1
27  A. virginicus "old-field variant"                    1                                                                         1       1
28  A. virginicus "smooth variant"                       1                                                                         1       1
29  A. virginicus var. abbreviatus                       1                                                                         1       1
30  A. virginicus var. glaucus subvar. dealbatus         1                                                                         1       1
31  A. virginicus var. glaucus subvar. glaucus           1                                                                         1       1
32  A. virginicus var. hirsutior                         1                                                                         1       1
33  A. virginicus var. tenuispatheus                     1                                                                         1       1
34  A. virginicus var. virginicus "deceptive variant"    1                                                                         1       1
35  A. virginicus var. viridis                           1                                                                         1       1
36  A. virginicus var. viridis subvar. genuinus          1                                                                         1       1

Concepts per taxonomy (1889 to 2015) / Cumulative totals: 11 3 6 6 1 5 13 13 11 9 11 / 88 usages, 46 meanings

The information shown in Tables 4 and 5 provides an intuitive sense of how taxonomic names fare in the longer term as identifiers of taxonomic meanings in the Andro-UC. The performance of names should be evaluated in the context of taxonomic stability. High taxonomic stability would be reflected in an abundance of occupied cells in Table 5, because early-authored concepts would have congruent successors – with either identical or non-identical names – in the 1889-2015 time interval. This is not the case: only 85/231 cells (36.8%) have values, and 14/16 chains (87.5%) with multiple elements are 'interrupted'.

In spite of persistent taxonomic meaning evolution, identifiers could nevertheless (in principle) be designed to achieve a name:meaning cardinality of 1:1. In that case taxonomic names would simultaneously show a score of 1 in the "Meanings" column of Table 4 and a score of 1 in the "Names" column of Table 5. Thirty-two names meet the former condition, and nine names meet the latter condition. However, the intersection of these two sets includes only one name – A. virginicus var. abbreviatus. This name, used exclusively in Godfrey & Wooten (1979), is the only identifier that requires neither the "sec." annotation nor an articulation with RCC-5 to reliably identify its associated meaning in the entire Andro-UC. The remaining 35/36 names are syntactically and/or semantically 'compromised', showing name:meaning cardinality relations other than 1:1. This outcome is also reflected in the accumulation of numbers greater than 1 across the columns of Tables 4 and 5.

In summary, even though names used in the Andro-UC act as identifiers of meanings with reliability ratios of 56.6% or higher in the local, pairwise alignments (Tables 1-3), their global reliability is such that > 97.2% diverge from an ideal name:meaning cardinality of 1:1. This assessment remains adequate even if taxonomic change is taken into account.
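The name:meaning cardinalities discussed above amount to counting distinct partners per key. The sketch below is an illustration under stated assumptions: the helper `cardinality` is hypothetical, and the meaning numbers are the per-classification cell values listed for three names in Table 4. Applied to (chain, name) pairs instead, the same helper would give the names-per-chain counts of Table 5.

```python
from collections import defaultdict


def cardinality(pairs):
    """Map each key to the number of distinct values it is paired with."""
    seen = defaultdict(set)
    for key, value in pairs:
        seen[key].add(value)
    return {key: len(values) for key, values in seen.items()}


# Meaning numbers per usage, copied from Table 4 (one entry per classification using the name).
USAGES = {
    "A. virginicus": [2, 1, 6, 5, 4, 3, 2, 2, 1, 1, 1],
    "A. glomeratus": [4, 3, 2, 4, 4, 3, 2, 1],
    "A. capillipes": [2, 2, 2, 2, 2, 2, 1],
}

pairs = [(name, meaning) for name, meanings in USAGES.items() for meaning in meanings]
meanings_per_name = cardinality(pairs)

for name, count in meanings_per_name.items():
    print(f"{name}: name:meaning cardinality 1:{count}")
```

A name reaches the ideal 1:1 ratio only if it scores 1 here and its chain also scores 1 when the roles of name and meaning are swapped.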

Table 5. Analysis of taxonomic name:meaning cardinality for the entire Andro-UC, based on 85 occurrences of concepts ("members") that participate in 21 congruent concept chains, where individual chains are labeled with 1-4 taxonomic names. Cell values indicate (1) that the concept is an element of the chain and (2) which of the 1-n names is used to symbolize the member in the corresponding classification. Each of the 21 chains is labeled by its most recent member, and concept lineages are ordered accordingly. Non-identical (sets of) names associated with each chain are numbered in reverse chronological order, i.e., starting with their name in the 2015 taxonomy. See also Figure 1 and Table 4.

# Concept chain label 188 193 194 195 196 197 198 200 200 201 201 Mem- Names 9 3 8 0 8 9 3 3 6 4 5 bers 1 2015.A_capillipes 4 3 3 2 1 5 4 2 2015.A_dealbatus 4 3 3 2 1 5 4 3 2015.A_virginicus 3 (4) 1 2 2 2 2 1 1 1 8 3 (4) 4 2015.A_virginicus 2 2 1 3 2 _OldFieldVariant 5 2015.A_virginicus 2 2 1 3 2 _SmoothVariant 6 2015.A_virginicus 2 1 1 1 1 5 2 _var_decipiens 7 2015.A_glaucopsis 4 3 3 1 2 2 1 1 1 9 4 8 2015.A_hirsutior 4 3 2 2 2 1 1 7 4 9 2015.A_glomeratus 1 2 1 1 1 1 6 2 (3) _var_glomeratus (2) 10 2015.A_tenuispathe 4 3 2 2 1 2 1 7 4 us 11 2014.A_capillipes 2 1 1 1 1 2 2 1 1 8 2 12 2014.A_virginicus 1 1 1 3 1 _var_virginicus 13 2014.A_glomeratus 1 1 2 1 14 2006.A_glomeratus 1 1 2 1 15 2003.A_virginicus 1 1 1 3 1 16 2003.A_glomeratus 2 1 1 1 4 2 17 1979.A_virginicus 1 1 1 18 1979.A_virginicus 1 1 1 _var_abbreviatus 19 1968.A_virginicus 1 1 1 20 1950.A_virginicus 1 1 1 21 1948.A_virginicus 1 1 1 Concepts per taxonomy / 11 3 6 6 1 5 13 13 11 9 11 85 44 (46) Cumulative totals

8. Discussion

We focus our discussion on the performance of names as identifiers of taxonomic concepts, emphasizing new insights gained from our representation and reasoning approach. We also assess the relevance of the RCC-5 multi-taxonomy alignment approach for wider application (with scalability implications) in the biodiversity data realm, and potential applications to other semantic integration tasks.

8.1. New knowledge products

What aspects of our approach are new and valuable? The Andro-UC illustrates the unique ability of RCC-5 multi-taxonomic alignments to resolve taxonomic meaning evolution at more granular levels than is possible using taxonomic names and nomenclatural relationships [70]. This follows directly from the input information – we can only represent and align the entities shown in Figure 1 if taxonomic concept labels and RCC-5 articulations are used. Critically, the approach requires an initial set of articulations provided by human (expert) users, and grounded in their assessments of pertinent taxonomic evidence, that satisfy criteria of consistency and (lack of) ambiguity to yield well-specified alignments. Compliance with these criteria is achieved by the interactive toolkit workflow [34].

New knowledge products for the Andro-UC include the output MIR, alignment visualizations, and name:meaning cardinality analyses. Through the reasoning process, the set of 88 user-provided input articulations is logically tested and augmented to yield 824 Maximally Informative Relations (1025 MIR if the root concepts are included). The MIR derived for each alignment can be queried to determine whether any concept pairs (and ancillary biological data) are suitable for integration, or not [35,44,54,85,86]. In particular, articulations of congruence ("yes, integrate") and exclusion ("no") between two concepts are reciprocally actionable in this context. Proper inclusion and inverse proper inclusion are at least unilaterally actionable without ambiguity ("add data assigned to the less inclusive concept to those of the more inclusive one"). Overlap is the most challenging articulation for the purpose of merging ancillary information. However, in some instances overlap at higher levels in an alignment can be resolved into proper inclusion at lower levels. For instance, the 2014/2006 alignment (Figure 6A) shows the articulation 2014.A_glomeratus >< 2006.A_glomeratus at the level directly following the root. At the next lower level, the overlap is resolved: both concepts share congruent subelements, and each also has additional subelements that are unique to it as a parent-level concept, but nevertheless congruent with another child-level concept in the alternative classification (and with non-identical names and ranks). Hence biological data identified at this lower level can be fully integrated across the two very disparate taxonomic perspectives. In summary, the reasoner-inferred MIR provide new, human- and machine-interpretable information that can drive the integration of biological data linked to different, aligned concept taxonomies with more precision and reliability than taxonomic names alone [6,30,34,35,44].

The alignment visualizations are logically congruent with the output MIR [14,16,24,87]. Their unique value lies in aiding human users to understand multi-concept relationships through tree-like representations. Visualization tools for multi-taxonomy relationships have advanced significantly over the past 20 years [3,5,16,47,48,98]. Nevertheless, the Euler/X toolkit is the first platform to leverage RCC-5 relationships and logic reasoning to yield comprehensive, tree-like alignment visualizations.

The visualizations communicate uniquely valuable information. For instance, Figures 1-3 all show information related to the 1948/1933 alignment. Figure 2 effectively visualizes the lowest-level entities and articulations of the entire Andro-UC, but is not well suited for input taxonomies nested into three or more ranks (or phylogenetic levels). Such tables are 'flattened' into two dimensions. Figure 3, in turn, can show all nested entities for the individual 1948/1933 taxonomies, but does not provide accurate multi-concept alignment information. Using names to navigate across these trees may lead to erroneous conclusions such as 1948.A_virginicus | 1933.A_glomeratus, when the proper articulation is ><. Although they are commonly used to convey information about taxonomic stability and change [8,82,95], collections of unaligned tree graphs have limited resolution power in absence of RCC-5 alignment information.

In contrast, the alignment visualizations (Figure 1) simultaneously communicate information about nomenclatural identity, multi-level tree hierarchy, and multi-tree in-/congruence. Their interpretation is intuitive; for instance, the proportion and position of grey squares versus green rectangles or yellow octagons communicate the extent and localization of taxonomic in-/congruence in an alignment (compare, e.g., Figures 5B and 5C). The relative occurrences of (=, ≠, +, –) annotations show the degree to which taxonomic names can reliably integrate taxonomic meanings.

8.2. Building better identifiers for biodiversity data

How relevant is our representation approach to the broader, semantics-facilitated biodiversity data realm? Multiple reviewers raised this important question. We think that it is too early to attempt a comprehensive answer. Technical, scientific, cognitive-evolutionary, and socio-cultural constraints affect how identifier granularity is managed in the biological domain. Predicting how the RCC-5 alignment approach will fare in light of these trade-offs is beyond the scope of this analysis. We can, however, assess the particulars and generalities of the Andro-UC, and what it teaches us about identifying and linking taxonomically identified information in open-ended biodiversity data environments.

The scale of the Andro-UC is small. Weakley's (2015) classification recognizes seven species-level concepts – more than any other taxonomy. The taxonomic history is evidently complex, but no more complicated than that of many other continuously revised groups [1,21,33,37,44,68,77,89]. The poor performance of names as identifiers of divergent taxonomic meanings is not exceptional for the field (herein broadly defined to include phylogenetics).

The problem of name:meaning dissociation in biological taxonomy is systemic. It is rooted in Code-mandated principles that promote stability and change in naming (largely) as a function of nomenclatural type identity and priority. To some degree the inadequacies are manageable through social processes, including conservative re-/naming practices or 'standardized' taxonomies [6,45,56,79,91]. In practice, the long-term drawbacks of using taxonomic names as concept identifiers are frequently mitigated by the ability of well-trained human scientists to contextualize name usages and thereby infer the intended meanings [31,33,69]. However, no counteracting human practice can alter the insight that taxonomic names and nomenclatural relationships are fundamentally not designed to track granular similarities and differences in taxonomic meaning of the sort exemplified in the Andro-UC. Computer algorithms in particular struggle with inferring what "Andropogon virginicus" means 'in the currently intended context' [62], i.e., when the relevant context of the name usage is not made explicit. Something beyond the type-anchored name identity is needed if taxonomic perspectives are to be translated into entities fit for use in open-ended, semantics-enabled information environments.

Specifying the referential extension of taxonomic names for reliable reuse requires more than ostension (the act of pointing) to exemplars (types). Ostensive definitions of taxonomic meanings are bound to under-specify the intended meanings in many applied contexts, such as those of the Andro-UC. Instead it is more appropriate to model the name-to-(currently-perceived-)taxon linkage as a matter of theory construction [32,75]. The challenge of integrating biodiversity data then becomes one of aligning multiple taxonomic theories, which can be modeled with the RCC-5 approach.

The aforementioned insufficiencies are most apparent in cases of multi-concept overlap. Such cases are frequent in taxonomy, and they cannot be reduced to the differences in degree of resolution [27,65,68,77]. As an example, the 1950/1948 classifications of the Andro-UC concur that there are three identically named species-level concepts entailed in the complex (Figure 4C). They also concur that 1950/1948.A_virginicus has three variety-level child concepts. However, they disagree on the extent to which the available, type-anchored names reach out to perceived, and necessarily more inclusive, taxa presumed (more precisely: theorized) to constitute natural, evolutionary entities [9]. As a consequence of this differential inference of 'extra-typical' taxonomic boundaries, the four 1950/1948 species-level concepts overlap in complex ways (Figure 4C). Such multi-theory overlap is more frequent at higher taxonomic levels, where the performance of names as identifiers of taxonomic meanings becomes increasingly poor [34,35,37].

The herein demonstrated alignment approach paves the way for building better taxonomic concept identifiers and multi-taxonomy resolution services.

8.3. Scalability of the RCC-5 alignment approach

How widely applicable (or scalable) is the RCC-5 multi-taxonomy alignment approach within the field of biological taxonomy? Generally speaking, reasoning about taxonomies with RCC-5 remains in its infancy [16,34,35,44,87]. At present, the Euler/X toolkit can effectively process consistent, well-specified, pairwise input taxonomies with up to 200-400 concepts each [14,15,16]. While this scale is sufficient for small- to medium-sized alignment use cases, future toolkit development should concentrate on modularizing the reasoning process, specifically by using a divide-and-conquer approach that better leverages the hierarchical structure of the input constraints and dynamic user/reasoner interactions. Demonstrating the practical value of the approach requires making the toolkit accessible to larger use cases and biodiversity data environments where taxonomy evolution is an important variable to identify and control.

The analysis of the Andro-UC demonstrates the potential of reasoning about taxonomies and at the same time leaves much room for further work. In particular, the 11 input classifications allow for 55 pairwise comparisons, of which only 12 are presented here. This omission is deliberate. New toolkit releases will have the ability to align more than two input taxonomies simultaneously (but remain in development). Such an approach entails new reasoning challenges and products. For instance, we could ask to what extent the 12 alignments produced in the current study are sufficient for recovering the full set of 55 pairwise alignments, based on transitive reasoning. Solutions to such challenges are relevant to the issue of scalability, and can inform the users' practice of engaging with the toolkit.

Pathways to broader implementation should focus on directly integrating the use of taxonomic concept labels, parent/child relationships, RCC-5 articulations, and reasoning and visualization services into prominent biodiversity data platforms [2,36,37,49,58,60]. We envision information environments where identifications of organismal occurrence records are augmented to the level of carrying taxonomic concept labels [54]. The circumscriptions of the respective concepts are also managed in the platform, and consistent, well-specified RCC-5 alignments are provided. Building such an infrastructure would permit biologically significant queries of the following types. (1) Return all records identified to the name Andropogon virginicus (optionally, with synonyms or algorithmically matched alternative spellings). This query type corresponds to the current capability of many environments [10,70,73]. (2) Return all records identified to the taxonomic concept label Andropogon virginicus sec. Weakley (2015) and, alternatively, Andropogon virginicus sec. RAB (1968). Show the corresponding occurrence-based distribution maps (less inclusive concept [2015] – small set of records; more inclusive concept [1968] – large set of records). (3) Translate all occurrence records originally identified to Weakley (2015)-endorsed concept labels into the corresponding BONAP (2014)-endorsed concept labels (see Figure 6B). (4) Highlight 'problem records' identifiable to multiple non-congruent concepts in the set of aligned classification standards used for identifications. (5) Show records in this target region as identified according to the most, or least, granular concept-level taxonomy. (6) For any set of records (and related biological data) identified to any pair of taxonomic concept labels, assess whether the records and data can be retrieved and integrated based on the reasoner-inferred MIR.

The above queries (2)-(6) are biologically significant and depend on utilizing the RCC-5 alignment approach to achieve the desired degree of resolution. Such logic-enabled integration services are urgently needed in our assessment to build open-ended biodiversity data environments that can manage the complexities of evolving taxonomic knowledge. Strengths of the RCC-5 approach in this context include explicitness, consistency, machine-interpretability, and flexibility in processing diverse forms of taxonomic concept input ranging from minimally structured lists of taxonomic concept labels to phylogenies and monographic revisions [8,34,35,37,54].

8.4. Non-taxonomic alignment challenges

The RCC-5 alignment approach has so far been limited to use cases in biological taxonomy. Explorations of the toolkit's performance in relation to other integration challenges are generally recommendable if the new focal domain shares several of the toolkit's critical (taxonomic) input/output constraints [87]. This means that other semantic integration challenges that need to consistently align and visualize multiple, hierarchically structured sets of concepts with coverage and/or disjoint siblings constraints may benefit from exploring the RCC-5 alignment approach.

Our approach can be complemented by Semantic Web methods that reason over concept similarity and drift by leveraging Natural Language Processing techniques and relationships defined in OWL-DL ontologies [13,23,28,38,73,80,92]. Such complementary analyses of concept identity and semantic evolution are now possible.

Appendices

Appendix 1. Set of 13 Euler/X toolkit input data files for all alignments produced in the Andro-UC (Figures 1, 4-6). Each file is saved in .txt format and contains annotations and instructions for run commands to yield the alignments and visualizations shown in the corresponding figures.

Appendix 2. Set of 13 Euler/X toolkit output Maximally Informative Relations (MIR) for the input data files provided in Appendix 1. Each output file is saved in .csv format. The MIR files form the basis for analyses of name:meaning relations (Tables 2 and 3).

References

[1] J. Alroy. How many named species are valid? Proceedings of the National Academy of Sciences of the United States of America, 99(6), 3706–3711, 2002. doi: 10.1073/pnas.062691099.
[2] S.J. Baskauf, and C.O. Webb. Darwin-SW: Darwin Core-based terms for expressing biodiversity data as RDF. Semantic Web – Interoperability, Usability, Applicability - Special Issue on Semantics for Biodiversity, 2015. (in press)
[3] J.H. Beach, S. Pramanik, and J.H. Beaman. Hierarchic taxonomic databases. In: R. Fortuner, editor, Advances in Computer Methods for Systematic Biology: Artificial Intelligence, Databases, Computer Vision, pages 241–256. Johns Hopkins University Press, Baltimore, 1993.
[4] B. Bennett. Spatial reasoning with propositional logics. In Principles of Knowledge Representation and Reasoning: Proceedings of the 4th International Conference (KR94), pages 51–62. Morgan Kaufmann, San Francisco, 1994.
[5] W.G. Berendsohn. The concept of "potential taxa" in databases. Taxon, 44(2), 207–212, 1995. doi: 10.2307/1222443.
[6] W.G. Berendsohn, and M. Geoffroy. Networking taxonomic concepts – uniting without 'unitary-ism'. In G. Curry, and C. Humphries, editors, Biodiversity Databases – Techniques, Politics, and Applications.
in Databases, 1(4), 379–474, 2009. doi: 10.1561/1900000006.
[20] W.D. Clayton, M.S. Vorontsova, K.T. Harman, and H. Williamson. GrassBase – The Online World Grass Flora. Royal Botanic Gardens, Kew, 2013. http://www.kew.org/data/grasses-db/
Systematics Association Special [21] M.J. Costello, P. Bouchet, G. Boxshall, K. Fauchald, D. Volume 73, pages 13–22. CRC Taylor & Francis, Baton Gordon, B.W. Hoeksema, G.C.B. Poore, R.W.M. van Soest, S. Rouge, 2007. doi: 10.1201/9781439832547.ch3. Stöhr, T.C. Walter, B. Vanhoorne, W. Decock, and W. Appel- [7] H.L. Blomquist. The Grasses of North Carolina. Duke Uni- tans. Global coordination and standardisation in marine versity Press, Durham, 1948. biodiversity through the World Register of Marine Species [8] L. Bocak, C. Barton, A. Crampton-Platt, D. Chesters, D. (WoRMS) and related databases. PLoS ONE, 8(1), e51629, Ahrens, and A.P. Vogler. Building the Coleoptera tree-of-life 2013. doi: 10.1371/journal.pone.0051629. for >8000 species: composition of public DNA data and fit [22] B. Cuenca Grau, Z. Dragisic, K. Eckert, J. Euzenat, A. with Linnaean classification. Systematic Entomology, Ferrara, R. Granada, V. Ivanova, E. Jiménez-Ruiz, A.O. 39(1), 97–110, 2014. doi: 10.1111/syen.12037. Kempf, P. Lambrix, A. Nikolov, H. Paulheim, D. Ritze, F [9] R. Boyd. Homeostasis, species, and higher taxa. In R.A. Scharffe, P. Shvaiko, C. Trojahn, and O. Zamazal. Results of Wilson, editor, Species: New Interdisciplinary Essays, pages the Ontology Alignment Evaluation Initiative 2013. In Pro- 141–185. Bradford Book, MIT Press, Cambridge, 1999. ceedings of the 8th ISWC workshop on Ontology Matching [10] B. Boyle, N. Hopkins, Z. Lu, J.A. Raygoza Garay, D. (OM), October 2013, Sydney, Australia, pages 61–100, 2013. Mozzherin, T. Rees, N. Matasci, M.L. Narro, W.H. Piel, S.J. [23] H. Cui. CharaParser for fine-grained semantic anno- Mckay, S. Lowry, C. Freeland, R.K. Peet, and B.J. Enquist. tation of organism morphological descriptions. Journal of The taxonomic name resolution service: an online tool for the American Society for Information Science and Tech- automated standardization of names. BMC Bioinfor- nology, 63(4), 738–754, 2012. doi: 10.1002/asi.22618. matics, 14:16, 2013. 
doi: 10.1186/1471-2105-14-16. [24] T.N. Dang, N.M. Franz, B. Ludäscher and A.G. Forbes. [11] C.S. Campbell. Systematics of the Andropogon virgin- ProvenanceMatrix: a visualization tool for multi-taxonomy icus complex (Gramineae). Journal of the Arnold Arbore- alignments. In: V. Ivanova, P. Lambrix, S. Lohmann, and C. tum, 64(2), 171–254, 1983. Pesquita, editors, Voila!2015. Proceedings of the Interna- [12] C.S. Campbell. Andropogon. In M.E. Barkworth, K.M. tional Workshop on Visualizations and User Interfaces for Capels, S.Long, and M.B. Piep. editors, Flora of North Ameri- Ontologies and Linked Data, Bethlehem, Pennsylvania, USA, ca, Volume 25, 2015. October 11, 2015. CEUR Workshop Proceedings, 1456, 13– http://herbarium.usu.edu/webmanual 24, 2015. [13] R. Chawuthai, H. Takeda, V. Wuwongse, and U. Jinbo. [25] S.B. Davidson, S.C. Boulakia, A. Eyal, B. Ludäscher, A logical model for taxonomic concepts for expanding T.M. McPhillips, S. Bowers, M.K. Anand, and J. Freire. Prov- knowledge using Linked Open Data. In P. Larmande, E. Ar- enance in scientific workflow systems. IEEE Data Engineer- naud, I. Mougenot, C. Jonquet, T. Libourel, and M. Ruiz, edi- ing Bulletin, 30(4), 44–50, 2007. doi: tors, S4BioDiv 2013, Semantics for Biodiversity – Proceed- 10.1145/1376616.1376772. ings of the First International Workshop on Semantics for [26] DLVSYSTEM, 2014. http://www.dlvsystem.com/ Biodiversity, Montpellier, France, May 27, 2013, pages 1–8. [27] J. Endersby. Lumpers and splitters: Darwin, Hooker, CEUR Workshop Proceedings, Vol-797, 2013. and the search for order. Science, 326(5959), 1496–1499, [14] M. Chen. Query Optimization and Taxonomy Integra- 2009. doi: 10.1126/science.1165915. tion. Ph.D. Dissertation, University of California at Davis, [28] J. Euzenat, and P. Shvaiko. Ontology Matching, Sec- Davis, CA, 2014. ond Edition. Springer, New York, 2013. doi: 10.1007/978- [15] B. Ludäscher, M. Chen, S. Yu, P. Kianmajd, S. Bowers, 3-642-38721-0. N. Franz, and T. McPhillips. 
Euler Project – Reasoning Over [29] P.L. Farber. The type concept in zoology in the first Taxonomies, 2015. https://github.com/EulerProject half of the [16] M. Chen, S. Yu, N. Franz, S. Bowers, and B. Ludäscher. nineteenth century. Journal of the History of Biology, 9(1), Euler/X: a toolkit for logic-based taxonomy integration. 93–119, 1976. arXiv:1402.1992 [cs.LO], 2014. [30] N.M. Franz. On the lack of good scientific reasons for [17] M. Chen, S. Yu, N. Franz, S. Bowers, and B. Ludäscher. the growing phylogeny/classification gap. Cladistics, 21(5), A hybrid diagnosis approach combining black-box and 495–500, 2005. doi: 10.1111/j.1096-0031.2005.00080.x. white-box reasoning. In: A. Bikakis, P. Fodor, and D. Ro- [31] N.M. Franz. Letter to Linnaeus. In S. Knapp, and Q.D. man, editors, Rules on the Web: from Theory to Applications, Wheeler, editors, Letters to Linnaeus, pages 63–74. Linnean Proceedings of the 8th International Sympoium, RuleML Society of London, London, 2009. 2014. Lecture Notes in Computer Science 8620: 127–141, [32] N.M. Franz. Anatomy of a cladistic analysis. Cladis- 2014. doi: 10.1007/978-3-319-09870-8_9. tics, 30(3), 294–321, 2014. doi: 10.1111/cla.12042. [18] M. Chen, M., S. Yu., P. Kianmajd, N. Franz, S. Bowers, [33] N.M. Franz, and J. Cardona-Duque. Description of and B. Ludäscher. Provenance for explaining taxonomy two new species and phylogenetic reassessment of alignments. In 5th International Provenance and Annotation Perelleschus Wibmer & O'Brien, 1986 (Coleoptera: Curculi- Workshop, June 09-13, 2014, Cologne, IPAW '14. Lecture onidae), with a complete taxonomic concept history of Notes in Computer Science 8628: 258–260, 2015. doi: Perelleschus sec. Franz & Cardona-Duque, 2013. Systemat- 10.1007/978-3-319-16462-5_27. ics and Biodiversity, 11(2), 209–236, 2013. doi: [19] J. Cheney, L., Chiticariu, and W.-C. Tan. Provenance in 10.1080/14772000.2013.806371. databases: why, how, and where. Foundations and Trends [34] N.M. Franz, N.M., M. 
Chen, S. Yu, P. Kianmajd, S. Bow- 10.1057/ivs.2009.29. ers, and B. Ludäscher. Reasoning over taxonomic change: [50] Hackel, E. Andropogoneae. In A.L.P.P. de Candolle, exploring alignments for the Perelleschus use case. PLoS and C. de Candolle, editors, Monographiae Phanerogama- ONE 10(2): e0118247, 2015. doi: rum, 6, 1-716, 1889. 10.1371/journal.pone.0118247. [51] A.S. Hitchcock, and A. Chase. Manual of the Grasses of [35] N.M. Franz, N.M. Pier, D.M. Reeder, M. Chen, S. Yu, P. the United States, Second Edition. United States Department Kianmajd, S. Bowers, and B. Ludäscher. Two influential of Agriculture Miscellaneous Publication No. 200, 1950. primate classifications logically aligned. Systematic Biolo- [52] ICZN – International Commission on Zoological No- gy, 2016. (in press) arXiv:1412.1025. menclature. International Code of Zoological Nomenclature. [36] N.M. Franz, and R.K. Peet. Towards a language for Fourth Edition. International Trust for Zoological Nomen- mapping relationships among taxonomic concepts. Sys- clature, London, 1999. tematics and Biodiversity, 7(1), 5–20, 2009. doi: [53] V. Ivanova, and P. Lambrix. A system for aligning 10.1017/S147720000800282X. taxonomies and debugging taxonomies and their align- [37] N.M. Franz, R.K. Peet, and A.S. Weakley. On the ments. In P. Cimiano, M. Fernández, V. Lopez, S. Schlobach, use of taxonomic concepts in support of biodiversity re- and J. Völker, editors. The Semantic Web: ESWC 2013 Satel- search and taxonomy. In Q.D. Wheeler, editor, The New lite Events, ESWC 2013 Satellite Events, Montpellier, France, Taxonomy, Systematics Association Special Volume Series May 26-30, 2013. Lecture Notes in Computer Science, 7955, 74, pages 63–86. Taylor & Francis, Boca Raton, 2008. doi: 152–156, 2013. doi: 10.1007/978-3-642-41242-4_15. 10.1201/9781420008562.ch5. [54] M.A. Jansen, and N.M. Franz. Phylogenetic revision of [38] N.M. Franz, and D. Thau. Biological taxonomy and Minyomerus Horn, 1876 sec. 
Jansen & Franz, 2015 (Coleop- ontology development: scope and limitations. Biodiversity tera, Curculionidae) using taxonomic concept annotations Informatics, 7(1), 45–66, 2010. doi: and alignments. ZooKeys, 52, 1–133, 2015. doi: 10.17161/bi.v7i1.3927. 10.3897/zookeys.528.6001 [39] E.R. Gansner, and S.C. North. An open graph visuali- [55] P. Jonsson, and T. Drakengren. A complete classifica- zation system and its applications to software engineering, tion of tractability in RCC-5. Journal of Artificial Intelli- Software – Practice and Experience, 30(11), 1203–1233, gence Research, 6(1), 211–221, 1997. 2000. doi: 10.1002/1097-024X(200009)30:11<1203::AID- [56] H. Kaiser, B.I. Crother, C.M.R. Kelly, L. Luiselli, M. SPE338>3.0.CO;2-N. O'Shea, H. Ota, P. Passos, W.D. Schhleip, and W. Wüster. [40] M. Gebser, B. Kaufmann, R. Kaminski, M. Ostrowski, Best practices: in the 21st century, taxonomic decisions in T. Schaub, and M.T. Schneider. Potassco: the Potsdam An- herpetology are acceptable only when supported by a body swer Set Solving Collection. AI Communications, 24(2), of evidence and published via peer-review. Herpetological 107–124, 2011. doi: 10.3233/AIC-2011-0491. Review, 44(1), 8–23, 2013. [41] M. Gelfond. Answer sets. In F.van Harmelen, V. [57] J.T. Kartesz. The Biota of North America Program Lifschitz, and B. Porter, editors, Handbook of Knowledge (BONAP). 2014. North American Plant Atlas. Chapel Hill, NC, Representation, pages 285–316. Elsevier, Amsterdam, 2014. http://bonap.net/napa 2008. [58] J. Kennedy, R. Hyam, R. Kukla, and T. Paterson. [42] I.P. Gent. The recomputation manifesto. Standard data model representation for taxonomic infor- arXiv:1304.3674 [cs.GL] mation. OMICS, 10(2), 220–230, 2006. doi: [43] I.P. Gent, L. Kothoff. Recomputation.org: experienc- 10.1089/omi.2006.10.220. es of its first year and lessons learned. In 2014 IEEE/ACM [59] J. Kennedy, R. Kukla, and T. Paterson. 
Scientific 7th International Conference on Utility and Cloud Compu- names are ambiguous as identifiers for biological taxa: ting (UCC), pages 968–973, 2014. doi: their context and definition are required for accurate data 10.1109/UCC.2014.158. integration. In B. Ludäscher, and L. Raschid, editors, Data [44] M. Geoffroy, and W.G. Berendsohn. The concept Integration in the Life Sciences: Proceedings of the Second problem in taxonomy: importance, components, approach- International Workshop, San Diego, CA, USA, July 20–22, es. Schriftenreihe für Vegetationskunde, 39, 5–14, 2003. pages 80–95. DILS 2005, LNBI 3615, 2005. doi: [45] M. Geoffroy, and A. Güntsch. Assembling and navi- 10.1007/11530084_8 gating the potential taxon graph. Schriftehreihe für Vegeta- [60] D. Lepage, G. Vaidya, and R. Guralnick. Avibase – a tionskunde, 39, 71–82, 2003. database system for managing and organizing taxonomic [46] H.C.J. Godfrey. Challenges for taxonomy. Nature, concepts. ZooKeys, 420, 117–135, 2014. doi: 417(6884), 17–19, 2002. doi: 10.1038/417017a. 10.3897/zookeys.420.7089 [47] R.K. Godfrey, and J.W. Wooten. Aquatic and Wetland [61] V. Lifschitz. Twelve definitions of a stable model. In of Southeastern United States, . Uni- M.G. de la Banda, and E. Pontelli, editors, Logic Program- versity of Georgia Press, Athens, 1979. ming. Lecture Notes in Computer Science, 5366, 37–51, [48] M. Graham, P. Craig, and J. Kennedy. Visualisation to 2008. doi: 10.1007/978-3-540-89982-2_8 aid biodiversity studies through accurate taxonomic rec- [62] J. McCarthy. Programs with common sense. In Mech- onciliation. In A. Gray, K. Jeffery, and J. Shao, editors, Pro- anisation of Thought Processes, Proceedings of the Symposi- ceedings of British National Conference on Database Sys- um of the National Physics Laboratory, London, U.K.: Her tems: Sharing Data, Information and Knowledge, Cardiff, Majesty's Stationery Office, pages 77–84, 1959. United Kingdom. Lecture Notes in Computer Science, 5071, [63] W. 
McCune. Prover9 and Mace4, 2005-2010. 280–291, 2008. doi: 10.1007/978-3-540-70504-8_29. www.cs.unm.edu/~mccune/prover9 [49] M. Graham, and J. Kennedy. A survey of multiple tree [64] J. McNeill, N.J. Turland, F.R. Barrie, W.R. Buck, W. visualisation. Information Visualization, 9(4), 235–252, Greuter, and J.H. Wiersema. International Code of Nomen- 2010. doi: clature for Algae, Fungi, and Plants (Melbourne Code). Ko- changes in the Basic Formal Ontology. In P. Garbacz, and eltz Scientific Books, Königstein, 2012. O. Kutz, editors, Formal Ontology in Information Systems, [65] S. Meiri, and G.M. Mace. New taxonomy and the FOIS 2014, IOS Press, pages 227–240, 2014. doi: origin of species. PLoS Biology, 5(7), e194, 2007. doi: 10.3233/978-1-61499-438-1-227. 10.1371/journal.pbio.0050194. [81] J.K. Small. Manual of the Southeastern Flora, Being [66] P.E. Midford, T.A. Dececchi, J.P. Balhoff, W.M. Dahdul, Descriptions of the Seed Plants Growing Naturally in Florida, N. Ibrahim, H. Lapp, J.G. Lundberg, P.M. Mabee, P.C. Sereno, Alabama, Mississippi, Eastern Louisiana, Tennessee, North M. Westerfield, T.J. Vision, and D.C. Blackburn. The verte- Carolina, South Carolina, and Georgia. University of North brate taxonomy ontology: a framework for reasoning Carolina Press, Chapel Hill, 1933. across model organism and species phenotypes. Journal of [82] S.A. Smith, C.E. Hinchcliff, J.F. Allman, G. Burleigh, R. Biomedical Semantics, 4: 34, 2013. doi: 10.1186/2041- Chaudhary, L.M. Coghill, K.A. Crandall, J. Deng, B.T. Drew, R. 1480-4-34. Gazis, K. Gude, D.S. Hibbett, L.A. Katz, H.D. Laughinghouse [67] A. Minelli. Publications in taxonomy as scientific IV, E.J. McTavish, P.E. Midford, C.L. Owen, R.H. Ree, J.A. papers and legal documents. Proceedings of the California Rees, D.E. Soltis, T. Williams, and K.A. Cranston. Synthesis Academy of Sciences, 56, Supplement I(20), 225–231, of phylogeny and taxonomy into a comprehensive tree of 2005. life. 
Proceedings of the National Academy of Sciences of the [68] J.M. Padial, and I. De La Riva. Taxonomic inflation United States of America, 112(41), 12764–12769, 2015. and the stability of species lists: the perils of ostrich's be- doi: 10.1073/pnas.1423041112. havior. Systematic Biology, 55(5), 859–867. doi: [83] P.F. Stevens. Metaphors and typology in the devel- 10.1080/1063515060081588. opment of botanical systematics 1690–1960, or the art of [69] R.D.M. Page. BioNames: linking taxonomy, texts, and putting new wine in old bottles. Taxon, 33(2), 169–211, trees. PeerJ, 1:e190, 2013. doi: 10.7717/peerj.190. 1984. [70] D.J. Patterson, J. Cooper, P.M. Kirk, R.L. Pyle, and D.P. [84] D. Thau, S. Bowers, and B. Ludäscher. Merging tax- Remsen. Names are key to the big new biology. Trends in onomies under RCC-5 algebraic articulations. In Conference Ecology and Evolution, 25(12), 686–691, 2010. doi: on Information and Knowledge Management. Proceeding of 10.1016/j.tree.2010.09.004. the 2nd International Workshop on Ontologies and Infor- [71] A.E. Radford, H.E. Ahles, and C.R. Bell. Manual of the mation systems for the Semantic Web, pages 47–54. ACM, Vascular Flora of the Carolinas. University of North Caroli- New York, 2008. doi: 10.1145/1458484.1458492. na Press, Chapel Hill, 1968. [85] D. Thau, S. Bowers, and B. Ludäscher. Merging sets [72] D.A. Randell, Z. Cui, and A.G. Cohn. A spatial of taxonomically organized data using concept mappings logic based on regions and connection. In B. Nebel, W. under uncertainty. In Proceedings of the 8th International Swartout, and C. Rich, editors, Proceedings of the Third In- Conference on Ontologies, Databases, and the Applications of ternational Conference on the Principles of Knowledge Rep- Semantics (ODBASE 2009). OTM 2009. Lecture Notes in resentation and Reasoning, pages 165–176, Morgan Kauf- Computer Science, 5871, 1103–1120, 2009. doi: mann, Los Altos, 1992. 10.1007/978-3-642-05151-7_26. [73] T. Rees. 
Taxamatch, an algorithm for near ('fuzzy') [86] D. Thau, S. Bowers, and B. Ludäscher. Towards best- matching of scientific names in taxonomic databases. PLoS effort merge of taxonomically organized data. In Proceed- ONE, 9(9), e107510, 2014. doi: ings of the 2010 IEEE 26th International Conference on Data 10.1371/journal.pone.0107510. Engineering Workshops (ICDEW), 1-6 March 2010. IEEE [74] D. Remsen. The use and limits of scientific names in Xplore, pages 151–154, 2010. doi: biological informatics. In E. Michel, editor, Anchoring Bio- 10.1109/ICDEW.2010.5452756. diversity Information from Sherborn to the 21st Century and [87] D. Thau, and B. Ludäscher. Reasoning about taxon- Beyond. ZooKeys, 550, 207–223, 2016. doi: omies in first-order logic. Ecological Informatics, 2(3), 10.3897/zookeys.550.9546. 195–209, 2007. doi: 10.1016/j.ecoinf.2007.07.005. [75] O. Rieppel. The performance of morphological char- [88] J. Tuominen, N. Laurenne, and E. Hyvönen. Biological acters in broad-scale phylogenetic analyses. Biological names and taxonomies on the Semantic Web – managing Journal of the Linnean Society, 92(2), 297–308, 2007. doi: the change in scientific conception. In G. Antoniou, M. Gro- 10.1111/j.1095-8312.2007.00847.x. belnik, E. Simperl, B. Parsia, D. Plexousakis, P. Leenheer, [76] A.B. Rylands, and R.A. Mittermeier. Primate taxono- and J. Pan, editors. The Semantic Web: Research and Appli- my: species and conservation. Evolutionary Anthropology, cations. Lecture Notes in Computer Science, 6644, 255– 23(1), 8–10, 2014. doi: 10.1002/evan.21387. 269, 2011. doi: 10.1007/978-3-642-21064-8_18. [77] G. Sangster. The application of species criteria in [89] R.I. Vane-Wright. Indifferent philosophy versus avian taxonomy and its implications for the debate over almighty authority: on the taxonomic status of Amauris hy- species concepts. Biological Reviews, 89(1), 199–214, alites Butler (Lepidoptera: Nymphalidae). Systematics and 2014. doi: 10.1111/brv.12051. 
Biodiversity, 1(1), 3–11, 2003. doi: [78] R.T. Schuh. The Linnaean system and its 250-year 10.1017/S1477200003001063. persistence. The Botanical Review, 69(1), 59–78, 2003. [90] F. van Harmelen, V. Lifschitz, and B. Porter, editors. doi: 10.1663/0006- The Handbook of Knowledge Representation. Elsevier, Ox- 8101(2003)069[0059:TLSAIY]2.0.CO;2. ford, 2008. [79] M.J. Scoble. Unitary or unified taxonomy? Philosoph- [91] M. Vences, J.M. Guayasamin, A. Miralles, and I. De La ical Transactions of the Royal Society of London, Series B, Riva. To name or not to name: criteria to promote economy 359, 699–710, 2004. doi: 10.1098/rstb.2003.1456. of change in Linnaean classification schemes. Zootaxa, [80] S. Seppälä, B. Smith, and W. Ceusters. Applying the 3636(2), 201–244, 2013. doi: 10.11646/zootaxa.3636.2.1. realism-based ontology-versioning method for tracking [92] S. Wang, S. Schlobach, and M. Klein. Concept drift and how to identify it. Web Semantics: Science, Services and Agents on the World Wide Web, 9(3), 247–265, 2011. doi: 10.1016/j.websem.2011.05.003. [93] A.S. Weakley. Flora of the Carolinas, Virginia, Geor- gia, and Surrounding Areas. Working Draft of 6 January 2006. University of North Carolina Herbarium, Chapel Hill, 2006. http://www.herbarium.unc.edu/FloraArchives/Wea kleyFlora_2006-Jan.pdf [94] Weakley, A.S. Flora of the Southern and Mid-Atlantic States. Working Draft of 29 Mary, 2015. University of North Carolina Herbarium, Chapel Hill, 2015. http://www.herbarium.unc.edu/FloraArchives/Wea kleyFlora_2015-05-29.pdf [95] W.C. Wheeler, M. Whiting, Q.D. Wheeler, and J.M. Carpenter. The phylogeny of the extant hexapod orders. Cladistics, 17(2), 113–169, 2001. doi: 10.1111/j.1096- 0031.2001.tb00115.x. [96] J. Witteveen. Naming and contingency: the type method of biological taxonomy. Biology & Philosophy, 30, 569–586, 2015. doi: 10.1007/s10539-014-9459-6. [97] J. Zhao, A. Miles, G. Klyne, and D. Shotton. Linked data and provenance in biological data Webs. 
Briefings in Bioinformatics, 10(2), 139–152, 2009. doi: 10.1093/bib/bbn044. [98] Y. Zhong, S. Jung, S. Pramanik, and J.H. Beaman. Data model and comparison and query methods for interacting classifications in a taxonomic database. Taxon, 45(2), 223– 241, 1996.
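
Illustrative note on transitive reasoning (Section 8.3). Section 8.3 asks to what extent 12 pairwise alignments could suffice for recovering all 55 pairwise alignments among 11 classifications by transitive reasoning. The Python below is a minimal sketch of the underlying idea under stated assumptions; it is not the Answer Set Programming encoding used by the Euler/X toolkit. It assumes that taxonomic concepts can be modelled as non-empty sets of members, derives the RCC-5 composition table by brute-force enumeration over a four-element universe, and then lists which relations between concepts A and C remain possible given articulations A-B and B-C. The function names (rcc5, composition_table, compose) and the relation labels are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product
from math import comb


def rcc5(a: frozenset, b: frozenset) -> str:
    """Classify the RCC-5 relation between two non-empty sets (assumed model)."""
    if a == b:
        return "congruence"
    if a < b:  # a is a proper subset of b
        return "proper_inclusion"
    if a > b:  # a is a proper superset of b
        return "inverse_proper_inclusion"
    if a & b:  # shared members, but neither contains the other
        return "overlap"
    return "exclusion"


def composition_table(universe_size: int = 4) -> dict:
    """Derive the RCC-5 composition table by enumerating all set triples.

    Assumption: a universe of four elements is large enough to realize
    every RCC-5 composition at least once.
    """
    elements = range(universe_size)
    regions = [
        frozenset(members)
        for size in range(1, universe_size + 1)
        for members in combinations(elements, size)
    ]
    table = {}
    for a, b, c in product(regions, repeat=3):
        key = (rcc5(a, b), rcc5(b, c))
        table.setdefault(key, set()).add(rcc5(a, c))
    return table


TABLE = composition_table()


def compose(r_ab: str, r_bc: str) -> set:
    """Relations possible between A and C, given A-B and B-C articulations."""
    return TABLE[(r_ab, r_bc)]


if __name__ == "__main__":
    # Number of pairwise comparisons among 11 input classifications.
    print(comb(11, 2))
    # A is properly included in B, and B excludes C.
    print(sorted(compose("proper_inclusion", "exclusion")))
    # A overlaps B, and B overlaps C: the A-C relation is left open.
    print(sorted(compose("overlap", "overlap")))
```

When compose returns a single relation, the A-C articulation is fully determined by the two inputs; when it returns several, further user-provided articulations or parent/child constraints would be needed to narrow the result.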