
G Model

JCZ-25325; No. of Pages 7 ARTICLE IN PRESS

Zoologischer Anzeiger xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Zoologischer Anzeiger

journal homepage: www.elsevier.com/locate/jcz

Peeking behind the page: using natural language processing to identify and explore the characters used to classify sea anemones

Marymegan Daly a,∗, Lorena A. Endara b, John Gordon Burleigh b

a Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Aronoff Lab, 318 West 12th Avenue, Columbus, OH 43210, USA

b Department of Biology, The University of Florida, Bartram Hall, 876 Newell Dr., Gainesville, FL 32611, USA

Article info

Article history:
Received 20 October 2014
Received in revised form 16 March 2015
Accepted 17 March 2015
Available online xxx

Keywords:
Actiniaria
Cnidaria
Systematics
Matrix

Abstract

Although most phylogenetic investigations are motivated by questions about the evolution of morphological attributes, morphological data are increasingly rare as a source of characters for reconstructing phylogeny, in part because these attributes are time consuming to collect. Here we describe methods to mine the information contained in classifications as a source of phylogenetic characters, using the classification of actiniarian sea anemones (Cnidaria: Actiniaria) as our exemplar system. Our natural language processing pipeline recovers more than 400 characters in the most widely used classification of sea anemones. However, the majority of these are problematic: they reflect semantic or logical inconsistencies, or they are scored for only a single taxon and are thus inappropriate for phylogenetic reconstruction. Although the classification cannot be directly translated into a phylogenetic matrix, exposing the characters that underlie a classification provides important perspective on the basis and limits of a classification system and offers a valuable starting point for the creation of a phylogenetic matrix.

© 2015 Published by Elsevier GmbH.

∗ Corresponding author. Tel.: +1 614 247 8412; fax: +1 614 292 7774.
E-mail addresses: [email protected] (M. Daly), lendara@flmnh.ufl.edu (L.A. Endara), gburleigh@ufl.edu (J.G. Burleigh).

http://dx.doi.org/10.1016/j.jcz.2015.03.004
0044-5231/© 2015 Published by Elsevier GmbH.

Please cite this article in press as: Daly, M., et al., Peeking behind the page: using natural language processing to identify and explore the characters used to classify sea anemones. Zool. Anz. (2015), http://dx.doi.org/10.1016/j.jcz.2015.03.004

1. Introduction

Actiniarian sea anemones are conspicuous members of marine habitats, dominating some shallow-water and polar communities and playing significant roles in reef, hydrothermal, and shelf systems (Fautin, 1989; Fautin et al., 2013). Because the actiniarian communities of most habitats and on most continents comprise diverse lineages, this ecological breadth is probably not the result of in situ radiations. Instead, it reflects ancient diversity, a pattern also seen in their close relatives, the scleractinian corals (Barbeitos et al., 2010; Stolarski et al., 2011).

Like most cnidarians, actiniarians have relatively simple bodies: an actiniarian is a tubular, tentaculate polyp whose body consists of highly folded and extruded sheets of tissue one to three cell layers thick. Although simple in anatomy compared to triploblastic animals, actiniarians show the greatest polyp-level diversity within Cnidaria, with complex interior anatomy, several unique anatomical structures, and diversity in the morphology of the column and tentacles (Daly et al., 2007).

Synthesizing this diversity into a coherent framework posed a challenge for 19th- and early 20th-century systematists, who disagreed about whether the group was monophyletic and about how to interpret and link the taxa within the order (reviewed in Daly et al., 2007; Rodríguez et al., 2014). A stable classification arose through collaboration between the two most prolific scholars of actiniarian biology, Oskar Carlgren (Swedish, 1865–1954) and Thomas A. Stephenson (British, 1898–1961), when Carlgren (1949) codified and revised the system initially proposed by Stephenson (1920, 1921, 1922). Stephenson’s (1949) preface to Carlgren’s classification presents it as a consensus system largely agreed upon by both of them. Carlgren (1949) divides the 1200 known species of Actiniaria into three suborders. The largest suborder encompasses the vast majority of species and is further subdivided into superfamilies (mistakenly referred to as “tribes”). Carlgren (1949) also synthesized the diversity of Ptychodactiaria, a group he recognized as an order but that is now classified as a family within Actiniaria (reviewed by Rodríguez et al., 2014).

Carlgren’s (1949) classification has been challenged by the discovery of new taxa (e.g., Fautin and Hessler, 1989; Daly and Goodwill, 2009; Rodríguez et al., 2009), consideration of new character systems (e.g., Schmidt, 1969, 1974), reexamination of characters in detail (e.g., Cappola and Fautin, 2000), and phylogenetic analyses (e.g., Daly et al., 2002, 2008; Gusmão and Daly, 2010; Rodríguez et al., 2008, 2012, 2014). Although these challenges have empirical backing, they are more limited in taxonomic scope and in the breadth of morphological features considered than is the
classification of Carlgren (1949). However broad it is in terms of data and scope, Carlgren’s (1949) system is replete with contradictions and arbitrariness in the implied hierarchy of characters (reviewed in Daly et al., 2008; Rodríguez et al., 2012, 2014). Carlgren’s system is not phylogenetic in the modern sense. There are indications, however, that he viewed some of the higher taxa as having phylogenetic cohesiveness (Carlgren, 1942), and he explicitly recognized that the system was built upon imperfect information (Carlgren, 1949: 7). Resolving the conflict between Carlgren’s classification, phylogenetic analyses of anemones, and the information embodied in character systems not considered by Carlgren (1949) requires careful study of diverse character systems in a phylogenetic context.

This seemingly daunting task is made easier by new tools that facilitate the extraction of character information from monographs. Semi-automated text mining and natural language processing (NLP) programs designed for the concise and technical format of formal taxonomic descriptions can extract the information embodied in monographs and text-based descriptions of species (e.g., Cui, 2012; reviewed by Burleigh et al., 2013). These tools make centuries’ worth of biodiversity information accessible, allowing for explicit examination of characters and data. Parsing descriptive narratives into characters exposes the data that underlie classifications, proposals of synonymy, or hypotheses of relationship, and thus allows these individual studies to be synthesized and compared.

We parse Carlgren’s (1949) classification using a series of NLP and phylogenetic character discovery tools that were developed as part of the AVAToL Next Generation Phenomics project (Burleigh et al., 2013). The functions of these programs are all implemented online as part of the ETC website (http://etc-dev.cs.umb.edu/etcsite/). The parsing pipeline first uses CharaParser (Cui, 2012) to identify the characters and character-state data in the diagnoses in Carlgren (1949). It then uses MatrixGenerator and MatrixConverter (Liu et al., 2015) to evaluate the characters and character states. These tools expose the character data contained in the classification, allowing us to evaluate the number, representativeness, and nature of the features that implicitly underlie Carlgren’s groupings.

2. Methods

We first obtained a text-only version of Carlgren’s (1949) classification from the Tree of Life web portal (Fautin et al., 2000). This included 168 taxonomic descriptions written in telegraphic format, including descriptions of 6 superfamilial taxa, 42 families, and 133 genera. We used the CharaParser NLP software to semantically parse the taxonomic descriptions, identifying phenomic characters and specific character states (Cui, 2012). During the initial CharaParser steps, the words in the text are analyzed based on their position in the text and compared to a built-in glossary. CharaParser classifies the words into three classes: structures, characters, and other terms. The user must review this machine-made classification, and words that were not recognized must be categorized to provide the system with additional information for the parsing step. For the classification, words were loaded into the Ontology Term Organizer (OTO; Huang et al., 2012). OTO provides an interface in which the user can categorize terms by dragging and dropping them into predefined categories (Fig. 1). After the terms are categorized, CharaParser is run one more time.

The output of CharaParser is an annotated XML file that describes the semantic mark-up of the text descriptions. The MatrixGenerator software transforms the XML output file from CharaParser into a tab-delimited file of phenomic character descriptions. Each column in the tab-delimited file represents a character identified by CharaParser, with the character states listed for each taxon. In its default setting, which we used, MatrixGenerator uses the hierarchical classification to fill in character states for taxa of lower taxonomic ranks. For example, if a family description contained a specific character state, then all of the genera within that family would also be coded for that character state. Currently, the “Text Capture” option of the ETC website (Fig. 1B) includes software that enables users to perform all of these steps to obtain character datasets from taxonomic descriptions (http://etc-dev.cs.umb.edu/etcsite/).

Finally, we used the MatrixConverter software to evaluate the character data output by MatrixGenerator. MatrixConverter is a freely available, platform-independent software program designed to facilitate the transformation of raw phenomic character data into discrete character matrices (Liu et al., 2015). It takes as input the tab-delimited character files output from MatrixGenerator (or from the “Text Capture” option of the ETC website). It provides an easy-to-use interface that enables users to evaluate the characters and ultimately code them as discrete character states for evolutionary inference. For example, in our analysis the initial list of characters contained duplicate and nonsense characters. Duplicates are features that clearly refer to the same structure or character concept but use different words (e.g., shape of “base,” “basal disc,” or “pedal disc”). Nonsense characters are features that are logically inconsistent. Examples include “count” and “presence” treated as two independent characters for a structure that can only occur singly, like the column, or a position-based attribute of a feature that is itself defined by its position, such as “position of the proximal end”. Identifying duplicates and nonsensical features requires expertise with the organismal system. We identified them by grouping the putative characters by system of origin (e.g., tentacle, actinopharynx) and then scrutinizing each feature. We also compared characters across character systems to identify logically or structurally related features. This process is referred to as “collapsing and editing” in the Results and discussion.

3. Results and discussion

NLP processing of Carlgren (1949) recovered 418 raw characters; collapsing and editing of these results reduced the total to 259 putative characters (Appendix A). These span anatomical systems (Fig. 2) but are weighted towards aspects of external anatomy (including features of the column and tentacles) and the arrangement and morphology of the mesenteries. It is not surprising that these are the dominant sources of information, because these features are easily accessible and, in most cases, require no special equipment to observe and describe. The majority of these putative characters are presence/absence attributes or quantitative features such as the number of mesenteries per cycle. Attributes of the mesenterial and column musculature are the fourth- and fifth-most abundant sources of characters (Fig. 2). Even if aggregated together as “muscle histology,” these systems account for fewer characters than do the tentacles. The anatomy and arrangement of the aboral end is a richer source of characters than is the anatomy of the oral end, even if attributes of the actinopharynx are combined with those of the oral disc proper (Fig. 2).

Although Carlgren (1949) included a list of technical terms, his classification nonetheless relied on characters that are ambiguously defined (e.g., “marginal spherules”, see England, 1987; Daly, 2003; “vesicles”, see Häussermann, 2004). This problem complicates any direct translation of the putative features he used into characters for phylogenetic analysis or subsequent classification. Other attributes used by Carlgren may reflect preservation artifacts (England, 1987), which would reduce either the number of states (e.g., for marginal sphincter muscles) or the number of putative characters (e.g., for tissue layer thickness). Although we removed obvious duplicates, at least some of the characters that remain are not independent, and therefore
potentially problematic for phylogenetic analyses. For example, the number of mesenteries and the number of tentacles are related across Actiniaria (Stephenson, 1928), but in a way that defies one-to-one correspondence. Other putative characters embody or conflate multiple variables. Disparate attributes may be lumped under the rubric of “architecture” or “arrangement” characters. For example, Carlgren (1949) treats the physa as distinct from a pedal disc, but these are manifestations of the same body part. This coding obscures the shared elements, as well as the many variables that contribute to defining a physa (see Daly and Ljubenkov, 2008).

Fig. 1. Diagram of the semi-automated NLP parsing method used to transform taxonomic descriptions into taxon/character matrices for evolutionary inference. (A) Software pipeline using CharaParser, Ontology Term Organizer, MatrixGenerator, and MatrixConverter software. (B) Web-based method using the Explorer of Taxon Concepts (ETC) website. Arrows indicate the corresponding functions of the tabs in the ETC pipeline.

Carlgren’s (1949) classification, like that of his predecessors, is based on authoritative interpretation of those features that had been used in species descriptions and superspecific classifications. However, the hierarchy of the classification conflicts with the hierarchy of features inferred from phylogenetic analyses. For example, Carlgren (1949) considers the anatomy and musculature of the aboral end to be highly significant and uses them to circumscribe the superfamilial group Athenaria. In contrast, phylogenetic analysis suggests that these features have had a complex evolutionary history, with multiple independent losses (Daly et al., 2002, 2008; Rodríguez et al., 2012, 2014). This hierarchy clearly guided the reporting of character information in Carlgren (1949): for example, relatively few features of the aboral end are reported for taxa outside of Athenaria. The granularity of the classification is heterogeneous, at least with respect to group size. The vast majority of families contain fewer than six genera, and only three have ten or more genera. This heterogeneity significantly affects the distribution of character information, because smaller families (those with fewer genera or species) tend to be defined by one to a few autapomorphic features that are then scored for all subordinate taxa.

As acknowledged by Stephenson in his preface, attributes of cnidae are relatively under-sampled in Carlgren (1949); they comprise 3.5% of the characters identified via NLP. All of the diagnoses of genera and families in Carlgren (1949) include some information on cnidae, but the information is typically imprecise and indicates only that some kind of cnidae is present somewhere within the body. This broad information does not capture finer-scale differences in the tissue-level distribution of nematocysts and may ignore phylogenetic information (Fautin, 1988; Rodríguez et al., 2009). Furthermore, the characters of cnidae in Carlgren (1949) clearly require subdivision to reflect the diversity of capsules: Schmidt (1969), Mariscal (1974), and England and Robson (1991) have highlighted the diversity of forms subsumed under the categories used by Carlgren (1949) (e.g., “mastigophore” or “basitrich”).

The collapsed and edited data set includes 87 putative characters scored for only a single taxon, which represents 34% of the collapsed and edited data. Singleton characters make up a relatively greater proportion of the characters for external anatomy and organization of the column (61%), column musculature (58%), ciliated filaments (54%), and the oral end (50%). In contrast, they make up proportionally fewer of the characters for the aboral end (24%), mesenterial musculature (18%), and actinopharynx (9%). Because the input data were hierarchical, the information in higher-level descriptions was applied to all of the subordinate taxa. The features used in familial and subordinal classification are therefore less likely to be among the singleton features, except in the case of monogeneric higher taxa. Thus, we infer that Carlgren (1949) relies relatively more on attributes of the systems in which singletons are under-represented for subordinal and familial classification, and that his genus-level classification relies more heavily on those systems in which singletons are relatively over-represented. The systems (or particular characters) for which only one taxon has been scored are not necessarily inapplicable or inappropriate for more inclusive groups. Instead, we interpret them as a relatively large and untapped set of features ripe for further study.

The value of character matrices derived from legacy data for evolutionary inference ultimately may be limited more by the lack of phylogenetic perspective in the original classification than by the NLP software. The large percentage of single-taxon characters in Carlgren’s (1949) text is not unusual among legacy taxonomic classifications (Endara and Burleigh, personal observation). These characters may represent autapomorphies, which may be useful for diagnosing a taxon but not for classifying it relative to other taxa. In other cases, the failure to note the absence of characters in taxonomic descriptions, or more generally the lack of parallel descriptions, also often produces single-taxon characters that could be evolutionarily informative presence/absence characters. Finally, the hierarchical structure of the descriptions poses a challenge for classification, especially if the higher-level taxa do not represent clades or if the character-state descriptions for higher taxa do not precisely represent the character states of all of the lower-level taxa. It is straightforward to set up the MatrixGenerator software so that it does not ascribe characters to lower-level taxa based on descriptions of higher-level taxa. For a description as extremely hierarchical as that of Carlgren (1949), however, this would greatly reduce the number of characters and increase the amount of missing data in the matrices. At a minimum, we recommend careful evaluation of the higher-level characters that result from the NLP pipeline.

Our results demonstrate the promise of new NLP methods to rapidly extract character data from legacy taxonomic literature. Creating the initial dataset of 418 characters took only a few hours on a laptop, and this time could be greatly reduced if the characters were mapped against an existing comprehensive glossary of characters for the group. These analyses expose the information in legacy taxonomic treatments to scrutiny and make these data computable, allowing us to quantify the basis of classification systems and to undertake evolutionary inference. Yet our analyses also highlight the critical importance of expert evaluation of the results of the semi-automated NLP pipeline. Simply using the raw output can be problematic. Nearly 40% of the raw characters were duplicate or nonsense characters, and others may still be non-independent or contain non-homologous or otherwise incomparable character states. Incorporating detailed ontological information for the characters and character states into the NLP pipeline may reduce the number of problematic characters, but an expert must still review the results carefully. Thus, the NLP pipeline does not eliminate the need for morphological expertise in obtaining quality phenotypic datasets, but it does greatly expedite the process of obtaining these data for scientists with this knowledge.

Fig. 2. Diagrammatic longitudinal section through a sea anemone, showing the anatomical distribution of characters used in Carlgren’s (1949) classification. Numbers in the circle refer to the number of collapsed and edited characters. Cnidae apply to or reflect the whole organism, not only the tentacles as might be inferred from the placement of the circle. Refer to Appendix A for a list of collapsed and edited characters.

Acknowledgements

We thank Hong Cui, Jing Liu, Maureen O’Leary, Thomas Rodenhausen, and Elvis H. Wu for their support and feedback in the development of the NLP pipeline, and Carsten Leuter, Lars Vogt and Gerhard Scholtz for their support in organizing the “e-morphology” symposium and symposium volume for ICIM-3. The software was developed with support from the US National Science Foundation.

Appendix A.

Characters distilled from Carlgren’s (1949) classification using the NLP pipeline described in the text. Duplicate and nonsense characters have been culled, but this list likely contains features that are not logically independent.

Tentacles
Architecture of arm
Architecture of marginal tentacle
Architecture of tentacle
Arrangement of arm
Arrangement of inner tentacle
Arrangement of knob
Arrangement of outer (marginal) tentacle
Arrangement of outgrowth
Arrangement of tentacle
Count of aperture
Count of endocoelic tentacle
Count of exocoelic tentacle
Count of marginal tentacle
Count/presence of circlet
Count/presence of arm
Fragility of tentacle
Function of tentacle
Growth order or position of inner tentacle
Growth order or position of outer tentacle
Length of endocoelic tentacle
Length of exocoelic tentacle
Orientation of endocoelic tentacle
Orientation of exocoelic tentacle
Orientation of longitudinal muscle
Position of discal tentacle
Presence of aboral thickening
Presence of tentacle
Presence/count of knob
Prominence of exocoelic tentacle
Prominence of discal tentacle
Prominence of endocoelic tentacle
Relief of discal tentacle
Reproduction of tentacle
Shape of arm
Shape of discal tentacle
Shape of endocoelic tentacle
Shape of exocoelic tentacle
Shape of knob

Oral end
Architecture of ectodermal longitudinal muscles
Architecture of oral disc
Count of conchula
Presence/count of lobe
Presence/shape of conchula
Prominence of lobe
Prominence of mouth
Shape of lobe
Shape of oral disc

Column organization and external anatomy
Architecture of band
Architecture of capitulum
Architecture of edge
Architecture of longitudinal band
Architecture of marginal pseudospherule
Architecture of row
Arrangement of adhesive verrucae
Arrangement of battery
Arrangement of lower part of column
Arrangement of nemathybome
Arrangement of tenaculus
Count of battery
Count of longitudinal row
Count of marginal spherules
Count of mesogloeal papillae
Count of outgrowth
Count of papillae
Count of pseudospherule
Count of row
Count of weaker papillae
Development of cuticle
Development of fosse
Development of protuberance
Development of pseudospherule
Development of verruca
Orientation or position of row
Position of band
Position of longitudinal row
Presence of scapus
Presence of fosse
Presence of marginal spherule
Presence of parapet
Presence of pseudospherule
Prominence of battery
Prominence of differentiation
Prominence of fold
Prominence of longitudinal furrow
Prominence of longitudinal row
Prominence of parapet
Prominence of periderm
Prominence of region
Prominence of verrucae
Prominence of vertical row
Relief of capitulum
Relief of scapus
Relief of scapulus
Relief of vesicle
Relief or texture of scapulus
Shape of column
Shape of margin
Shape of papilla
Shape of parapet
Shape of scapus
Shape of tubercle
Shape of vesicle
Relative size of column
Size of adherent area
Size of band
Size of body
Size of fold
Size of nematocyst battery
Size of protuberance
Size of scapus
Size of tubercle
Texture of column
Width of band

Aboral end
Architecture of aboral end
Architecture of physa
Count of annulus
Count of pore
Development of aboral body-end
Development of basal disc
Development of physa
Function of aboral end
Prominence of pedal disc
Prominence of physa
Reproduction of base
(Relative) width of body
Shape of base
Texture of margin
Width of pedal disc

Column muscles and wall
Architecture of mesogloeal sphincter
Arrangement of longitudinal muscle
Depth of ectoderm
Development of longitudinal muscle
Variability of sphincter
Development of mesogloeal sphincter
Development of endodermal sphincter
Position of sphincter
Presence/nature of sphincter
Prominence of longitudinal muscle
Texture of longitudinal muscle
Architecture of endodermal sphincter
Texture of endodermal sphincter
Shape of mesogloea
Relative thickness of ectodermal layer
Relative thickness of endodermal layer
Relative thickness of endodermal muscle
Relative thickness of longitudinal muscle
Relative thickness of mesogloeal layer

Mesentery arrangement and structure
Architecture of ventrolateral mesentery
Architecture of lateral mesentery
Architecture of couple
Architecture of dorsal directive
Architecture of gonad (gametogenic tissue)
Architecture of macrocneme
Architecture of mesentery
Architecture of microcneme
Architecture of stronger mesentery
Architecture of stronger partner
Architecture of ventral directive
Architecture of weaker partner
Arrangement of cycles
Arrangement of directive
Arrangement of gonad (gametogenic tissue)
Arrangement of macrocnemes
Arrangement of mesentery
Arrangement of youngest pair
Count of cycles
Count of lateral mesentery
Count of macrocnemes
Count of mesentery
Count of microcneme
Count of stronger mesentery
Count of youngest cycle
Development of mesentery
Development of partner
Count of couple
Development of youngest cycle
Development of youngest mesentery
Growth order of cycles
Growth order of mesentery
Growth order of lateral endocoel
Orientation of mesentery
Orientation of partner
Position of oldest cycle
Position of stronger mesentery
Position or prominence of genital mesentery
Presence/count of pedicel
Presence/count of directive
Prominence of cycles
Prominence of endocoel
Prominence of mesentery
Reproductivity of mesentery
Reproductivity of directive
Reproductivity of microcneme
Reproductivity of primary mesentery
Reproduction of ventral directive
Size of adjacent mesenterial
Size of directive mesentery
Size of partner mesenteries
Size of mesentery
Size of smaller mesentery
Variability of directive
Variability of members of a pair
Width of mesentery
Reproduction of stronger mesentery
Reproduction of lateral couple

Mesenterial muscles
Architecture of retractor
Count of basilar muscles
Count of folds
Count of parietobasilar muscles
Count of retractor muscles
Development of basilar muscles
Development of parietal muscles
Development of parietobasilar muscles
Development of retractor
Length of muscles
Position of parietobasilar muscles
Position of retractor muscles
Prominence of pennon
Shape of parietal muscle
Shape of parietobasilar muscle
Shape of retractor muscle
Size of basilar muscle
Size of parietal muscle
Size of parietobasilar muscle
Size of retractor muscle
Width of parietobasilar muscle
Width of retractor muscle

Siphonoglyph and associated structures
Architecture of actinopharynx
Architecture of siphonoglyph
Architecture of ventral siphonoglyph
Arrangement of siphonoglyph
Presence and count of siphonoglyph
Development of siphonoglyph
Length of siphonoglyph
Reproduction of siphonoglyph
Size of siphonoglyph

References

Barbeitos, M.S., Romano, S.L., Lasker, H.R., 2010. Repeated loss of coloniality and symbiosis in scleractinian corals. Proc. Natl. Acad. Sci. U.S.A. 107 (26), 11877–11882.

Burleigh, J.G., Alphonse, K., Alverson, A.J., Bik, H.M., Blank, C., Cirranello, A.L., Cui, H., Daly, M., Dietterich, T.G., Gasparich, G., Irvine, J., Julius, M., Kaufman, S., Law, E., Liu, J., Moore, L., O’Leary, M.A., Passarotti, M., Ranade, S., Simmons, N.B., Stevenson, D.W., Thacker, R.W., Theriot, E.C., Todorovic, S., Velazco, P.M., Walls, R.L., Wolfe, J.M., Yu, M., 2013. Next-generation phenomics for the Tree of Life. PLoS Curr. Tree Life 2013 (June 26 Edition 1), http://dx.doi.org/10.1371/currents.tol.085c713acafc8711b2ff7010a4b03733.

Cappola, V.A., Fautin, D.G., 2000. All three species of Ptychodactiaria belong to order Actiniaria (Cnidaria: Anthozoa). J. Mar. Biol. Assoc. UK 80, 995–1005.

Carlgren, O., 1949. A survey of the Ptychodactiaria, Corallimorpharia and Actiniaria. Kung. Svenska Vetensk.-Akad. Handling 1, 1–121.

Cui, H., 2012. CharaParser for fine-grained semantic annotation of organism morphological descriptions. J. Am. Soc. Inf. Sci. Technol. 63, 738–754.

Daly, M., 2003. The anatomy, terminology, and homology of acrorhagi and pseudoacrorhagi in sea anemones. Zool. Verhand. 342, 89–101.

Daly, M., Brugler, M.R., Cartwright, P., Collins, A.G., Dawson, M.N., Fautin, D.G., France, S.C., McFadden, C.S., Opresko, D.M., Rodríguez, E., Romano, S.L., Stake, J.L., 2007. The phylum Cnidaria: a review of phylogenetic patterns and diversity 300 years after Linnaeus. Zootaxa 1668, 127–182.

Daly, M., Chaudhuri, A., Gusmão, L., Rodríguez, E., 2008. Phylogenetic relationships among sea anemones (Cnidaria: Anthozoa: Actiniaria). Mol. Phylogenet. Evol. 48, 292–301.

Daly, M., Goodwill, R.H., 2009. Andvakia discipulorum, a new species of burrowing sea anemone from Hawaii, with a revision of Andvakia Danielssen, 1890. Pac. Sci. 63, 265–267.

Daly, M., Lipscomb, D.L., Allard, M.W., 2002. A simple test: evaluating explanations for the relative simplicity of the Edwardsiidae (Cnidaria: Anthozoa). Evolution 56, 502–510.

Daly, M., Ljubenkov, J.C., 2008. Edwardsiid sea anemones of California (Cnidaria: Actiniaria: Edwardsiidae), with descriptions of eight new species. Zootaxa 1860, 1–27.

England, K.W., 1987. Certain Actiniaria (Cnidaria: Anthozoa) from the Red Sea and tropical Indo-Pacific Ocean. Bull. Brit. Mus. Nat. Hist. (Zool.) 53, 205–292.

England, K.W., Robson, E.A., 1991. Nematocysts of sea anemones (Actiniaria, Ceriantharia and Corallimorpharia: Cnidaria): nomenclature. Hydrobiologia 216–217, 691–697.

Fautin, D.G., 1988. Importance of nematocysts to actinian taxonomy. In: Hessinger, D.A., Lenhoff, H.M. (Eds.), The Biology of Nematocysts. Academic Press, San Diego, pp. 487–500.

Fautin, D.G., 1989. Anthozoan dominated benthic environments. Proc. 6th Int. Coral Reef Symp. 3, 231–236.

Fautin, D.G., Hessler, R.R., 1989. Marianactis bythios, a new genus and species of actinostolid sea anemone (Coelenterata: Actiniaria) from the Mariana vents. Proc. Biol. Soc. Wash. 102 (4), 815–825.

Fautin, D.G., Malarky, L., Soberón, J., 2013. Latitudinal diversity of sea anemones (Cnidaria: Actiniaria). Biol. Bull. 224 (2), 89–98.

Fautin, D.G., Romano, S.L., Oliver Jr., W.A., 2000. Zoantharia. Sea Anemones and Corals. Version 04 October 2000. http://tolweb.org/Zoantharia/17643/2000.10.04 in The Tree of Life Web Project, http://tolweb.org/

Gusmão, L.C., Daly, M., 2010. Evolution of sea anemones (Cnidaria: Actiniaria: Hormathiidae) symbiotic with hermit crabs. Mol. Phylogenet. Evol. 56,

Variability of siphonoglyph 868–877.

Width of siphonoglyph

Häussermann, V., 2004. Identification and taxonomy of soft-bodied hexacorals

exemplified by Chilean sea anemones; including guidelines for sampling,

Ciliated filaments and Arrangement of ciliated tract

preservation and examination. J. Mar. Biol. Assoc. UK 84, 931–936.

associated structures

Huang, F., Macklin, J., Morris, P.J., Sanyal, P.P., Morris, R.A., Cui, H., 2012. OTO:

Presence of nematosome

ontology term organizer. J. Am. Inf. Sci. Tech. 49, 1–3.

Architecture of filament

Liu, J., Endara, L., Cui, H., Burleigh, J.G., 2015. MatrixConverter: facilitating

Development of filament

construction of phenomic character matrices. App. Plant Sci. 3, 2.

Prominence of ciliated tract

Mariscal, R.N., 1974. Nematocysts. In: Muscatine, L., Lenhoff, H.M. (Eds.),

Structure of ciliated tract Coelenterate Biology: Reviews and New Perspectives. Academic Press, San

Presence of acontia Diego, pp. 129–178.

Count of acontia Rodríguez, E., Barbeitos, M.S., Brugler, M.R., Crowley, L.M., Grajales, A., Gusmão, L.,

Development of acontium Häussermann, V., Reft, A., Daly, M., 2014. Hidden among sea anemones: the

first comprehensive phylogenetic reconstruction of the order Actiniaria

Width of acontium

(Cnidaria, Anthozoa, ) reveals a novel group of hexacorals. PloS

Arrangement of acontia

One 9, e96998.

Size (length) of acontium

Rodríguez, E., Barbeitos, M., Daly, M., Gusmão, L.C., Häussermann, V., 2012. Toward

Cnidae Arrangement of microbasic b-mastigophores a natural classification: phylogeny of acontiate sea anemones (Cnidaria,

Arrangement of nematocysts Anthozoa, Actiniaria). Cladistics 28, 375–392.

Rodríguez, E., Castorani, C.N., Daly, M., 2008. Morphological phylogeny of family

Presence of atrichs

Actinostolidae (Anthozoa: Actiniaria) with a description of a new genus and

Presence basitrichs

species of hydrothermal vent sea anemone redefining family Actinoscyphiidae.

Presence of microbasic amastigophores

Invert. Syst. 22, 439–452.

Abundance of nematocysts

Rodríguez, E., López-González, P.J., Daly, M., 2009. New family of sea anemones

Presence of spirocysts

(Actiniaria: Acontiaria) from deep polar seas. Polar Biol. 32, 703–717.

Prominence of microbasic b-mastigophore

Schmidt, H., 1969. Die Nesselkapseln der Aktinien und ihre

Shape of microbasic macrobasic amastigophores

differentialdiagnostishce Bedeutung. Helgol. Wiss. Meeresunters. 19,

284–317.

Schmidt, H., 1974. On evolution in the Anthozoa. Proc. 2nd Int. Coral Reef Symp. 1,

533–560.



Stephenson, T.A., 1920. On the classification of Actiniaria. Part I. Forms with acontia and forms with a mesogleal sphincter. Q. J. Microsc. Sci. 64, 425–574.

Stephenson, T.A., 1921. On the classification of Actiniaria. Part II. Consideration of the whole group and its relationships, with special reference to forms not treated in Part I. Q. J. Microsc. Sci. 65, 493–576.

Stephenson, T.A., 1922. On the classification of Actiniaria. Part III. Definitions connected with the forms dealt with in Part II. Q. J. Microsc. Sci. 66, 247–319.

Stephenson, T.A., 1928. The British Sea Anemones. The Ray Society, London.

Stolarski, J., Kitahara, M.V., Miller, D.J., Cairns, S.D., Mazur, M., Meibom, A., 2011. The ancient evolutionary origins of Scleractinia revealed by azooxanthellate corals. BMC Evol. Biol. 11 (1), 316.
