G Model
JCZ-25325; No. of Pages 7 ARTICLE IN PRESS
Zoologischer Anzeiger xxx (2015) xxx–xxx
Contents lists available at ScienceDirect
Zoologischer Anzeiger
journal homepage: www.elsevier.com/locate/jcz
Peeking behind the page: using natural language processing to
identify and explore the characters used to classify sea anemones
Marymegan Daly a,∗, Lorena A. Endara b, John Gordon Burleigh b
a Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Aronoff Lab, 318 West 12th Avenue, Columbus, OH 43210, USA
b Department of Biology, The University of Florida, Bartram Hall, 876 Newell Dr., Gainesville, FL 32611, USA
Article info
Article history:
Received 20 October 2014
Received in revised form 16 March 2015
Accepted 17 March 2015
Available online xxx
Keywords:
Actiniaria
Cnidaria
Systematics
Matrix
Abstract
Although most phylogenetic investigations are motivated by questions about the evolution of morphological attributes, morphological data are increasingly rare as a source of characters for reconstructing phylogeny, in part because these attributes are time consuming to collect. Here we describe methods to mine the information contained in classifications as a source of phylogenetic characters, using the classification of actiniarian sea anemones (Cnidaria: Anthozoa) as our exemplar system. Our natural language processing pipeline recovers more than 400 characters in the most widely used classification of sea anemones. However, the majority of these are problematic, reflecting semantic or logical inconsistencies or being scored for only a single taxon and thus inappropriate for phylogenetic reconstruction. Although the classification cannot be directly translated into a phylogenetic matrix, the exposure of the characters that underlie a classification provides important perspective into the basis and limits of a classification system and offers a valuable starting point for the creation of a phylogenetic matrix.
© 2015 Published by Elsevier GmbH.
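The point about characters scored for only a single taxon can be made concrete with a small sketch. The Python below is illustrative only: the taxon names, character states, and function names are hypothetical and are not part of CharaParser, MatrixGenerator, or MatrixConverter. For a toy taxon-by-character table, it counts how many taxa are scored for each character and flags the characters scored for exactly one taxon, which cannot group taxa in a phylogenetic analysis.

```python
from collections import Counter

# Hypothetical toy table: taxon -> {character: state}.
# A character missing from a taxon's dict is treated as "not scored".
matrix = {
    "GenusA": {"Count of siphonoglyph": "2", "Shape of base": "flat",
               "Presence of fosse": "present"},
    "GenusB": {"Count of siphonoglyph": "1", "Shape of base": "flat"},
    "GenusC": {"Count of siphonoglyph": "2", "Shape of base": "rounded"},
    "GenusD": {"Count of siphonoglyph": "1", "Shape of base": "flat"},
}


def scored_taxa_per_character(table):
    """Return a Counter mapping each character to the number of taxa scored for it."""
    counts = Counter()
    for states in table.values():
        for character, state in states.items():
            if state is not None:
                counts[character] += 1
    return counts


def singleton_characters(table):
    """Return the sorted list of characters scored for exactly one taxon."""
    counts = scored_taxa_per_character(table)
    return sorted(c for c, n in counts.items() if n == 1)


def singleton_fraction(table):
    """Return the fraction of all characters that are singletons (0.0 if none)."""
    counts = scored_taxa_per_character(table)
    if not counts:
        return 0.0
    return len(singleton_characters(table)) / len(counts)


if __name__ == "__main__":
    print(singleton_characters(matrix))
    print(round(singleton_fraction(matrix), 3))
```

In this toy table only "Presence of fosse" is flagged, that is, one of the three characters. Applied to a real matrix, a tally of this kind would be a starting point for expert review rather than an automatic filter.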
∗ Corresponding author. Tel.: +1 614 247 8412; fax: +1 6142927774.
E-mail addresses: [email protected] (M. Daly), lendara@flmnh.ufl.edu (L.A. Endara), gburleigh@ufl.edu (J.G. Burleigh).
http://dx.doi.org/10.1016/j.jcz.2015.03.004
0044-5231/© 2015 Published by Elsevier GmbH.
Please cite this article in press as: Daly, M., et al., Peeking behind the page: using natural language processing to identify and explore the characters used to classify sea anemones. Zool. Anz. (2015), http://dx.doi.org/10.1016/j.jcz.2015.03.004
1. Introduction
Actiniarian sea anemones are conspicuous members of marine habitats, dominating some shallow water and polar communities and playing significant roles in reef, hydrothermal, and shelf systems (Fautin, 1989; Fautin et al., 2013). Because the actiniarian communities of most habitats and on most continents comprise diverse lineages, this ecological breadth is probably not the result of in situ radiations, but instead reflects ancient diversity, a pattern also seen in their close relatives, scleractinian corals (Barbeitos et al., 2010; Stolarski et al., 2011).
Like most cnidarians, actiniarians have relatively simple bodies: an actiniarian is a tubular, tentaculate polyp whose body consists of highly folded and extruded sheets of one-to-three cell layers of tissue. Although simple in anatomy compared to triploblastic animals, actiniarians show the greatest polyp-level diversity of Cnidaria, with complex interior anatomy, several unique anatomical structures, and diversity in the morphology of the column and tentacles (Daly et al., 2007).
The synthesis of this diversity into a coherent framework posed a challenge for 19th and early 20th century systematists, who disagreed about whether the group was monophyletic and about how to interpret and link the taxa within the order (reviewed in Daly et al., 2007; Rodríguez et al., 2014). A stable classification arose through collaboration between the two most prolific scholars of actiniarian biology, Oskar Carlgren (Swedish, 1865–1954) and Thomas A. Stephenson (British, 1898–1961), when Carlgren (1949) codified and revised the system initially proposed by Stephenson (1920, 1921, 1922). Stephenson’s (1949) contribution of the preface to Carlgren’s classification highlights this as a consensus system largely agreed upon by both of them. Carlgren (1949) divides the ∼1200 species of Actiniaria known into three suborders; the largest encompasses the vast majority of species and is further subdivided into superfamilies (mistakenly referred to as “tribes”). Carlgren (1949) also synthesized the diversity of Ptychodactiaria, a group he recognized as an order but that is now classified as a family within Actiniaria (reviewed by Rodríguez et al., 2014).
Carlgren’s (1949) classification has been challenged by the discovery of new taxa (e.g., Fautin and Hessler, 1989; Daly and Goodwill, 2009; Rodríguez et al., 2009), consideration of new character systems (e.g., Schmidt, 1969, 1974), reexamination of characters in detail (e.g., Cappola and Fautin, 2000), and phylogenetic analyses (e.g., Daly et al., 2002, 2008; Gusmão and Daly, 2010; Rodríguez et al., 2008, 2012, 2014). Although these challenges have empirical backing, they are more limited in taxonomic scope and in the breadth of morphological features considered than is the
classification of Carlgren (1949). However broad it is in terms of data and scope, Carlgren’s (1949) system is replete with contradictions and arbitrariness in terms of the implied hierarchy of characters (reviewed in Daly et al., 2008; Rodríguez et al., 2012, 2014). Carlgren’s system is not phylogenetic in the modern sense, although there are indications that he viewed some of the higher taxa as having phylogenetic cohesiveness (Carlgren, 1942), and he explicitly recognized (Carlgren, 1949: 7) that the system was built upon imperfect information. Resolving the conflict between Carlgren’s classification, phylogenetic analyses of anemones, and the information embodied in character systems not considered by Carlgren (1949) requires careful study of diverse character systems in a phylogenetic context.
This seemingly daunting task is made easier by new tools that facilitate the extraction of character information from monographs. Semi-automated text mining and natural language processing (NLP) programs designed for the concise and technical format of formal taxonomic descriptions enable extraction of the information embodied in monographs and text-based descriptions of species (e.g., Cui, 2012; reviewed by Burleigh et al., 2013). These tools render accessible centuries’ worth of biodiversity information, allowing for explicit examination of characters and data. The parsing of descriptive narratives into characters exposes data that underlie classifications, proposals of synonymy, or hypotheses of relationship and thus allows these individual studies to be synthesized and compared.
We parse Carlgren’s (1949) classification using a series of NLP and phylogenetic character discovery tools that were developed as part of the AVAToL Next Generation Phenomics project (Burleigh et al., 2013). The functions of these programs are all implemented online as part of the ETC website (http://etc-dev.cs.umb.edu/etcsite/). The parsing pipeline first uses CharaParser (Cui, 2012) to identify the characters and character state data from the diagnoses in Carlgren (1949) and then uses MatrixGenerator and MatrixConverter (Liu et al., 2015) to evaluate the characters and character states. These tools expose the character data contained in the classification, allowing us to evaluate the number, representativeness, and nature of the features that implicitly underlie Carlgren’s groupings.
2. Methods
We first obtained a text-only version of Carlgren’s (1949) classification from the Tree of Life web portal (Fautin et al., 2000). This included 168 taxonomic descriptions written in telegraphic format, including descriptions from 6 superfamilial taxa, 42 families, and 133 genera. We used the CharaParser NLP software to semantically parse the taxonomic descriptions, identifying phenomic characters and specific character states (Cui, 2012). During the initial CharaParser steps, the words of the text are analyzed based on their position in the text and compared to a built-in glossary. CharaParser classifies the words into three classes: structures, characters, and other terms. This machine-made classification needs to be reviewed by the user, and words that were not recognized need to be categorized to provide the system with additional information for the parsing step. For the classification, words were loaded into the Ontology Term Organizer (OTO; Huang et al., 2012). OTO provides an interface in which the user can categorize terms by dragging and dropping them into predefined categories (Fig. 1). After the terms are categorized, CharaParser is run one more time.
The output of CharaParser is an annotated XML file that describes the semantic mark-up of the text descriptions. The MatrixGenerator software transforms the XML output file from CharaParser into a tab-delimited file of phenomic character descriptions. Each column in the tab-delimited file represents a character identified by CharaParser, with the character states listed for each taxon. In its default setting, which we used, MatrixGenerator uses the hierarchical classification to fill in character states for taxa of the lower taxonomic ranks. For example, if a family description contained a specific character state, then all of the genera within that family would also be coded for that character state. Currently, the ETC website, specifically the “Text Capture” option (Fig. 1B), includes software that enables users to perform all of these steps to obtain character datasets from taxonomic descriptions (http://etc-dev.cs.umb.edu/etcsite/).
Finally, we used “MatrixConverter” software to evaluate the character data output by MatrixGenerator. MatrixConverter is a freely available, platform-independent software program designed to facilitate the transformation of raw phenomic character data into discrete character matrices (Liu et al., 2015). It takes as input the tab-delimited character files output from MatrixGenerator (or the “Text Capture” option of the ETC website) and provides an easy-to-use interface that enables users to evaluate the characters and ultimately code them as discrete character states for evolutionary inference. For example, in our analysis the initial list of characters contained duplicate and nonsense characters. Duplicates are those features clearly referring to the same structure or character concept but using different words (e.g., shape of “base,” “basal disc,” or “pedal disc”); nonsense characters are those features that are logically inconsistent (e.g., “count” and “presence” as two independent characters for a structure that can only occur singly, like the column, or a position-based attribute of a feature defined by its position, such as “position of the proximal end”). Identification of duplicates and nonsensical features requires expertise with the organismal system. We identified these by grouping the putative characters based on the system of origin (e.g., tentacle, actinopharynx) and then scrutinizing each feature. We also made comparisons across character systems to identify logically or structurally related features. This process is referred to as “collapsing and editing” in the results and discussion.
3. Results and discussion
NLP processing of Carlgren (1949) recovered 418 raw characters; collapsing and editing of these results reduced the total to 259 putative characters (Appendix A). These span anatomical systems (Fig. 2), but are weighted towards aspects of external anatomy (including features of the column and tentacles) and the arrangement and morphology of the mesenteries. It is not surprising that these are the dominant sources of information; these features are easily accessible, requiring (in most cases) no special equipment to observe and describe them. The majority of these putative characters are presence/absence attributes or quantitative features like number of mesenteries per cycle. Attributes of the mesenterial and column musculature are the fourth- and fifth-most abundant sources of characters (Fig. 2); even if aggregated together as “muscle histology,” these systems account for fewer characters than do the tentacles. The anatomy and arrangement of the aboral end is a richer source of characters than is the anatomy of the oral end, even if attributes of the actinopharynx are combined with those of the oral disc proper (Fig. 2).
Although Carlgren (1949) included a list of technical terms, his classification nonetheless relied on characters that are ambiguously defined (e.g., “marginal spherules”, see England, 1987; Daly, 2003; “vesicles”, Häussermann, 2004), a problem that complicates any direct translation of the putative features he used into characters for phylogenetic analysis or subsequent classification. Other attributes used by Carlgren may reflect preservation artifacts (England, 1987), reducing either the number of states (e.g., for marginal sphincter muscles) or the number of putative characters (e.g., for tissue layer thickness). Although we removed obvious duplicates, at least some of the characters that remain are not independent, and therefore
potentially problematic for phylogenetic analyses. For example, the number of mesenteries and the number of tentacles are related across Actiniaria (Stephenson, 1928), but in a way that defies one-to-one correspondence. Other putative characters embody or conflate multiple variables. Disparate attributes may be lumped under the rubric of “architecture” or “arrangement” characters. For example, the physa is treated by Carlgren (1949) as distinct from a pedal disc, but these are manifestations of the same body part; the shared elements are obscured by this coding, as are the many variables that contribute to defining a physa (see Daly and Ljubenkov, 2008).
Fig. 1. Diagram of the semi-automated NLP parsing method used to transform taxonomic descriptions into taxon/character matrices for evolutionary inference. (A) Software pipeline using CharaParser, Ontology Term Organizer, MatrixGenerator, and MatrixConverter software. (B) Web-based method using the explorer of taxon concepts (ETC) website. Arrows indicate the corresponding functions of the tabs in the ETC pipeline.
Carlgren’s (1949) classification, like that of his predecessors, is based on authoritative interpretation of those features that had been used in species descriptions and superspecific classifications. However, the hierarchy of the classification conflicts with the hierarchy of features inferred from phylogenetic analyses. For example, Carlgren (1949) considers the anatomy and musculature of the aboral end to be highly significant and uses this to circumscribe the superfamilial group Athenaria; phylogenetic analysis suggests that these features have had a complex evolutionary history, with multiple independent losses (Daly et al., 2002, 2008; Rodríguez et al., 2012, 2014). This hierarchy clearly guided the reporting of character information in Carlgren (1949): for example, relatively few features of the aboral end are reported for taxa outside of Athenaria. The granularity of classification is heterogeneous, at least with respect to group size: the vast majority of families contain fewer than six genera and only three have ten or more genera. This heterogeneity significantly impacts the distribution of character information because smaller (fewer genera or species) families tend to be defined by one to a few autapomorphic features that are thus scored for all subordinate taxa.
As acknowledged by Stephenson in his preface, attributes of cnidae are relatively under-sampled in Carlgren (1949). These comprise 3.5% of the characters identified via NLP. All of the diagnoses of genera and families in Carlgren (1949) include some information on cnidae, but the information is typically not very precise, indicating only the presence of some kind of cnidae someplace within the body. This broad information does not capture finer-scale differences in terms of the tissue-level distribution of nematocysts and may ignore phylogenetic information (Fautin, 1988; Rodríguez et al., 2009). Furthermore, the characters of cnidae in Carlgren (1949) clearly require subdivision in terms of the diversity of capsules: Schmidt (1969), Mariscal (1974), and England and Robson (1991) have highlighted the diversity of forms subsumed under the categories used by Carlgren (1949) (e.g., “mastigophore” or “basitrich”).
The collapsed and edited data set includes 87 putative characters scored only for a single taxon; this represents 34% of the collapsed and edited data. External anatomy and organization of the column (61%), column musculature (58%), ciliated filaments (54%), and oral end (50%) represent a relatively greater proportion of singleton characters. In contrast, aboral end (24%), mesenterial musculature (18%), and actinopharynx (9%) contribute proportionally fewer singleton characters. Because the input data were hierarchical, with the information in higher-level descriptions being applied to all of the subordinate taxa, the features used in familial and subordinal classification are less likely to be among the singleton features, except in the case of monogeneric higher taxa. Thus, we infer that Carlgren (1949) relies relatively more on attributes of the systems in which singletons are under-represented for subordinal and familial classification, and that his genus-level taxonomy relies more heavily on those systems in which singletons are relatively over-represented. The systems (or particular characters) for which only one taxon has been scored are not necessarily inapplicable or inappropriate for more inclusive groups. Instead, we interpret these to represent a relatively large and untapped set of features ripe for further study.
Fig. 2. Diagrammatic longitudinal section through a sea anemone, showing the anatomical distribution of characters used in Carlgren’s (1949) classification. Numbers in the circle refer to the number of collapsed and edited characters. Cnidae apply to or reflect the whole organism, not only the tentacles as might be inferred from the placement of the circle. Refer to Appendix A for list of collapsed and edited characters.
The value of character matrices derived from legacy data for evolutionary inference ultimately may be limited by the lack of phylogenetic perspective in the original classification rather than the NLP software. The large percentage of single-taxon characters from Carlgren’s (1949) text is not uncommon for legacy taxonomic classifications (Endara and Burleigh, personal observation). These may represent autapomorphies, which may be useful to diagnose a taxon but not classify it relative to other taxa. In other cases, the failure to note the absence of characters in taxonomic descriptions, or more generally the lack of parallel descriptions, also often produces a number of single-taxon characters that could be evolutionarily informative presence/absence characters. Finally, the hierarchical structure of the descriptions poses a challenge for classification, especially if the higher-level taxa do not represent clades or if the character-state descriptions for higher taxa do not precisely represent the character states for all of the lower-level taxa. It is straightforward to implement the MatrixGenerator software so that it does not ascribe characters to lower-level taxa based on descriptions of higher-level taxa, but in the case of the extremely hierarchical description of Carlgren (1949), this would greatly reduce the number of characters and increase the amount of missing data in the matrices. We minimally recommend a careful evaluation of these higher-level characters that result from the NLP pipeline.
Our results demonstrate the promise of new NLP methods to rapidly extract character data from legacy taxonomic literature. Creating the initial dataset of 418 characters took only a matter of a few hours on a laptop, and this time could be greatly reduced if the characters were mapped against an existing comprehensive glossary of characters for the group. These analyses expose the information in legacy taxonomic treatments to scrutiny and make these data computable, allowing us to quantify the basis of classification systems and to undertake evolutionary inference. Yet our analyses also highlight the critical importance of expert evaluations of the results of the semi-automated NLP pipeline. Simply using this raw output can be problematic; nearly 40% of the raw characters were duplicate or nonsense characters, and others may still be non-independent or contain non-homologous or otherwise incomparable character states. Incorporating detailed ontological information for the characters and character states in the NLP pipeline may reduce the number of problematic characters, but the results still must be carefully reviewed by an expert. Thus, the NLP pipeline does not eliminate the necessity of morphological expertise to obtain quality phenotypic datasets, but it does greatly expedite the process of obtaining these data for scientists with this knowledge.
Acknowledgements
We thank Hong Cui, Jing Liu, Maureen O’Leary, Thomas Rodenhausen, and Elvis H. Wu for their support and feedback in the development of the NLP pipeline and Carsten Leuter, Lars Vogt and Gerhard Scholtz for their support in organizing the “e-morphology” symposium and symposium volume for ICIM-3. The software was developed with support from the US National Science Foundation.
Appendix A.
Characters distilled from Carlgren’s (1949) classification using the NLP pipeline described in the text. Duplicate and nonsense characters have been culled, but this list likely contains features that are not logically independent.
Tentacles
  Architecture of arm
  Architecture of marginal tentacle
  Architecture of tentacle
  Arrangement of arm
  Arrangement of inner tentacle
  Arrangement of knob
  Arrangement of outer (marginal) tentacle
  Arrangement of outgrowth
  Arrangement of tentacle
  Count of aperture
  Count of endocoelic tentacle
  Count of exocoelic tentacle
  Count of marginal tentacle
  Count/presence of circlet
  Count/presence of arm
  Fragility of tentacle
  Function of tentacle
  Growth order or position of inner tentacle
  Growth order or position of outer tentacle
  Length of endocoelic tentacle
  Length of exocoelic tentacle
  Orientation of endocoelic tentacle
  Orientation of exocoelic tentacle
  Orientation of longitudinal muscle
  Position of discal tentacle
  Presence of aboral thickening
  Presence of tentacle
  Presence/count of knob
  Prominence of exocoelic tentacle
  Prominence of discal tentacle
  Prominence of endocoelic tentacle
  Relief of discal tentacle
  Reproduction of tentacle
  Shape of arm
  Shape of discal tentacle
  Shape of endocoelic tentacle
  Shape of exocoelic tentacle
  Shape of knob
Oral end
  Architecture of ectodermal longitudinal muscles
  Architecture of oral disc
  Count of conchula
  Presence/count of lobe
  Presence/shape of conchula
  Prominence of lobe
  Prominence of mouth
  Shape of lobe
  Shape of oral disc
Column organization and external anatomy
  (Relative) width of body
  Architecture of band
  Architecture of capitulum
  Architecture of edge
  Architecture of longitudinal band
  Architecture of marginal pseudospherule
  Architecture of row
  Arrangement of adhesive verrucae
  Arrangement of battery
  Arrangement of lower part of column
  Arrangement of nemathybome
  Arrangement of tenaculus
  Count of battery
  Count of longitudinal row
  Count of marginal spherules
  Count of mesogloeal papillae
  Count of outgrowth
  Count of papillae
  Count of pseudospherule
  Count of row
  Count of weaker papillae
  Development of cuticle
  Development of fosse
  Development of protuberance
  Development of pseudospherule
  Development of verruca
  Orientation or position of row
  Position of band
  Position of longitudinal row
  Presence of scapus
  Presence of fosse
  Presence of marginal spherule
  Presence of parapet
  Presence of pseudospherule
  Prominence of battery
  Prominence of differentiation
  Prominence of fold
  Prominence of longitudinal furrow
  Prominence of longitudinal row
  Prominence of parapet
  Prominence of periderm
  Prominence of region
  Prominence of verrucae
  Prominence of vertical row
  Relief of capitulum
  Relief of scapus
  Relief of scapulus
  Relief of vesicle
  Relief or texture of scapulus
  Shape of column
  Shape of margin
  Shape of papilla
  Shape of parapet
  Shape of scapus
  Shape of tubercle
  Shape of vesicle
  Relative size of column
  Size of adherent area
  Size of band
  Size of body
  Size of fold
  Size of nematocyst battery
  Size of protuberance
  Size of scapus
  Size of tubercle
  Texture of column
  Width of band
Aboral end
  Architecture of aboral end
  Architecture of physa
  Count of annulus
  Count of pore
  Development of aboral body-end
  Development of basal disc
  Development of physa
  Function of aboral end
  Prominence of pedal disc
  Prominence of physa
  Reproduction of base
  Shape of base
  Texture of margin
  Width of pedal disc
Column muscles and wall
  Architecture of mesogloeal sphincter
  Arrangement of longitudinal muscle
  Depth of ectoderm
  Development of longitudinal muscle
  Variability of sphincter
  Development of mesogloeal sphincter
  Development of endodermal sphincter
  Position of sphincter
  Presence/nature of sphincter
  Prominence of longitudinal muscle
  Texture of longitudinal muscle
  Architecture of endodermal sphincter
  Texture of endodermal sphincter
  Shape of mesogloea
  Relative thickness of ectodermal layer
  Relative thickness of endodermal layer
  Relative thickness of endodermal muscle
  Relative thickness of longitudinal muscle
  Relative thickness of mesogloeal layer
Mesentery arrangement and structure
  Architecture of ventrolateral mesentery
  Architecture of lateral mesentery
  Architecture of couple
  Architecture of dorsal directive
  Architecture of gonad (gametogenic tissue)
  Architecture of macrocneme
  Architecture of mesentery
  Architecture of microcneme
  Architecture of stronger mesentery
  Architecture of stronger partner
  Architecture of ventral directive
  Architecture of weaker partner
  Arrangement of cycles
  Arrangement of directive
  Arrangement of gonad (gametogenic tissue)
  Arrangement of macrocnemes
  Arrangement of mesentery
  Arrangement of youngest pair
  Count of cycles
  Count of lateral mesentery
  Count of macrocnemes
  Count of mesentery
  Count of microcneme
  Count of stronger mesentery
  Count of youngest cycle
  Development of mesentery
  Development of partner
  Count of couple
  Development of youngest cycle
  Development of youngest mesentery
  Growth order of cycles
  Growth order of mesentery
  Growth order of lateral endocoel
  Orientation of mesentery
  Orientation of partner
  Position of oldest cycle
  Position of stronger mesentery
  Position or prominence of genital mesentery
  Presence/count of pedicel
  Presence/count of directive
  Prominence of cycles
  Prominence of endocoel
  Prominence of mesentery
  Reproductivity of mesentery
  Reproductivity of directive
  Reproductivity of microcneme
  Reproductivity of primary mesentery
  Reproduction of ventral directive
  Size of adjacent mesenterial
  Size of directive mesentery
  Size of partner mesenteries
  Size of mesentery
  Size of smaller mesentery
  Variability of directive
  Variability of members of a pair
  Width of mesentery
  Reproduction of stronger mesentery
  Reproduction of lateral couple
Mesenterial muscles
  Architecture of retractor
  Count of basilar muscles
  Count of folds
  Count of parietobasilar muscles
  Count of retractor muscles
  Development of basilar muscles
  Development of parietal muscles
  Development of parietobasilar muscles
  Development of retractor
  Length of muscles
  Position of parietobasilar muscles
  Position of retractor muscles
  Prominence of pennon
  Shape of parietal muscle
  Shape of parietobasilar muscle
  Shape of retractor muscle
  Size of basilar muscle
  Size of parietal muscle
  Size of parietobasilar muscle
  Size of retractor muscle
  Width of parietobasilar muscle
  Width of retractor muscle
Siphonoglyph and associated structures
  Architecture of actinopharynx
  Architecture of siphonoglyph
  Architecture of ventral siphonoglyph
  Arrangement of siphonoglyph
  Presence and count of siphonoglyph
  Development of siphonoglyph
  Length of siphonoglyph
  Reproduction of siphonoglyph
  Size of siphonoglyph
  Variability of siphonoglyph
  Width of siphonoglyph
Ciliated filaments and associated structures
  Arrangement of ciliated tract
References
Barbeitos, M.S., Romano, S.L., Lasker, H.R., 2010. Repeated loss of coloniality and symbiosis in scleractinian corals. Proc. Nat. Acad. Sci. 107 (26), 11877–11882.
Burleigh, J.G., Alphonse, K., Alverson, A.J., Bik, H.M., Blank, C., Cirranello, A.L., Cui, H., Daly, M., Dietterich, T.G., Gasparich, G., Irvine, J., Julius, M., Kaufman, S., Law, E., Liu, J., Moore, L., O’Leary, M.A., Passarotti, M., Ranade, S., Simmons, N.B., Stevenson, D.W., Thacker, R.W., Theriot, E.C., Todorovic, S., Velazco, P.M., Walls, R.L., Wolfe, J.M., Yu, M., 2013. Next-generation phenomics for the Tree of Life. PLoS Curr. Tree Life 2013 (June 26 Edition 1), http://dx.doi.org/10.1371/currents.tol.085c713acafc8711b2ff7010a4b03733.
Cappola, V.A., Fautin, D.G., 2000. All three species of Ptychodactiaria belong to order Actiniaria (Cnidaria: Anthozoa). J. Mar. Biol. Assoc. UK 80, 995–1005.
Carlgren, O., 1949. A survey of the Ptychodactiaria, Corallimorpharia and Actiniaria. Kung. Svenska Vetensk.-Akad. Handling 1, 1–121.
Cui, H., 2012. Charaparser for fine-grained semantic annotation of organism morphological descriptions. J. Am. Inf. Sci. Tech. 63, 738–754.
Daly, M., 2003. The anatomy, terminology, and homology of acrorhagi and pseudoacrorhagi in sea anemones. Zool. Verhand. 342, 89–101.
Daly, M., Brugler, M.R., Cartwright, P., Collins, A.G., Dawson, M.N., Fautin, D.G., France, S.C., McFadden, C.S., Opresko, D.M., Rodríguez, E., Romano, S.L., Stake, J.L., 2007. The phylum Cnidaria: a review of phylogenetic patterns and diversity 300 years after Linnaeus. Zootaxa 1668, 127–182.
Daly, M., Chaudhuri, A., Gusmão, L., Rodríguez, E., 2008. Phylogenetic relationships among sea anemones (Cnidaria: Anthozoa: Actiniaria). Mol. Phyl. Evol. 48, 292–301.
Daly, M., Goodwill, R.H., 2009. Andvakia discipulorum a new species of burrowing sea anemone from Hawaii with a revision of Andvakia Danielssen 1890. Pac. Sci. 63, 265–267.
Daly, M., Lipscomb, D.L., Allard, M.W., 2002. A simple test: evaluating explanations for the relative simplicity of the Edwardsiidae (Cnidaria: Anthozoa). Evolution 56, 502–510.
Daly, M., Ljubenkov, J.C., 2008. Edwardsiid sea anemones of California (Cnidaria: Actiniaria: Edwardsiidae), with descriptions of eight new species. Zootaxa 1860, 1–27.
England, K.W., Robson, E.A., 1991. Nematocysts of sea anemones (Actiniaria, Ceriantharia and Corallimorpharia: Cnidaria): nomenclature. Hydrobiologia 216–217, 691–697.
England, K.W., 1987. Certain Actiniaria (Cnidaria: Anthozoa) from the Red Sea and tropical Indo-Pacific Ocean. Bull. Brit. Mus. Nat. Hist. (Zoology) 53, 205–292.
Fautin, D.G., 1988. Importance of nematocysts to actinian taxonomy. In: Hessinger, D.A., Lenhoff, H.M. (Eds.), The Biology of Nematocysts. Academic Press, San Diego, pp. 487–500.
Fautin, D.G., 1989. Anthozoan dominated benthic environments. Proc. 6th Int. Coral. Reef. Symp. 3, 231–236.
Fautin, D.G., Hessler, R.R., 1989. Marianactis bythios, a new genus and species of actinostolid sea anemone (Coelenterata: Actiniaria) from the Mariana vents. Proc. Biol. Soc. Washington 102 (4), 815–825.
Fautin, D.G., Malarky, L., Soberón, J., 2013. Latitudinal diversity of sea anemones (Cnidaria: Actiniaria). Biol. Bull. 224 (2), 89–98.
Fautin, D.G., Romano, S.L., Oliver, Jr. W.A., 2000. Zoantharia. Sea Anemones and Corals. Version 04 October 2000. http://tolweb.org/Zoantharia/17643/2000.10.04 in The Tree of Life Web Project, http://tolweb.org/
Gusmão, L.C., Daly, M., 2010. Evolution of sea anemones (Cnidaria: Actiniaria: Hormathiidae) symbiotic with hermit crabs. Mol. Phylo. Evol. 56, 868–877.
Häussermann, V., 2004. Identification and taxonomy of soft-bodied hexacorals exemplified by Chilean sea anemones; including guidelines for sampling, preservation and examination. J. Mar. Biol. Assoc. UK 84, 931–936.
Huang, F., Macklin, J., Morris, P.J., Sanyal, P.P., Morris, R.A., Cui, H., 2012. OTO:
Presence of nematosome
ontology term organizer. J. Am. Inf. Sci. Tech. 49, 1–3.
Architecture of filament
Liu, J., Endara, L., Cui, H., Burleigh, J.G., 2015. MatrixConverter: facilitating
Development of filament
construction of phenomic character matrices. App. Plant Sci. 3, 2.
Prominence of ciliated tract
Mariscal, R.N., 1974. Nematocysts. In: Muscatine, L., Lenhoff, H.M. (Eds.),
Structure of ciliated tract Coelenterate Biology: Reviews and New Perspectives. Academic Press, San
Presence of acontia Diego, pp. 129–178.
Count of acontia Rodríguez, E., Barbeitos, M.S., Brugler, M.R., Crowley, L.M., Grajales, A., Gusmão, L.,
Development of acontium Häussermann, V., Reft, A., Daly, M., 2014. Hidden among sea anemones: the
first comprehensive phylogenetic reconstruction of the order Actiniaria
Width of acontium
(Cnidaria, Anthozoa, Hexacorallia) reveals a novel group of hexacorals. PloS
Arrangement of acontia
One 9, e96998.
Size (length) of acontium
Rodríguez, E., Barbeitos, M., Daly, M., Gusmão, L.C., Häussermann, V., 2012. Toward
Cnidae Arrangement of microbasic b-mastigophores a natural classification: phylogeny of acontiate sea anemones (Cnidaria,
Arrangement of nematocysts Anthozoa, Actiniaria). Cladistics 28, 375–392.
Rodríguez, E., Castorani, C.N., Daly, M., 2008. Morphological phylogeny of family
Presence of atrichs
Actinostolidae (Anthozoa: Actiniaria) with a description of a new genus and
Presence basitrichs
species of hydrothermal vent sea anemone redefining family Actinoscyphiidae.
Presence of microbasic amastigophores
Invert. Syst. 22, 439–452.
Abundance of nematocysts
Rodríguez, E., López-González, P.J., Daly, M., 2009. New family of sea anemones
Presence of spirocysts
(Actiniaria: Acontiaria) from deep polar seas. Polar Biol. 32, 703–717.
Prominence of microbasic b-mastigophore
Schmidt, H., 1969. Die Nesselkapseln der Aktinien und ihre
Shape of microbasic macrobasic amastigophores
differentialdiagnostishce Bedeutung. Helgol. Wiss. Meeresunters. 19,
284–317.
Schmidt, H., 1974. On evolution in the Anthozoa. Proc. 2nd Int. Coral Reef Symp. 1,
533–560.
Stephenson, T.A., 1920. On the classification of Actiniaria. Part I. Forms with acontia and forms with a mesogleal sphincter. Q. J. Microsc. Sci. 64, 425–574.
Stephenson, T.A., 1921. On the classification of Actiniaria. Part II. Consideration of the whole group and its relationships, with special reference to forms not treated in Part I. Q. J. Microsc. Sci. 65, 493–576.
Stephenson, T.A., 1922. On the classification of Actiniaria: Part III – definitions connected with the forms dealt with in Part II. Q. J. Microsc. Sci. 66, 247–319.
Stephenson, T.A., 1928. The British Sea Anemones. The Ray Society, London.
Stolarski, J., Kitahara, M.V., Miller, D.J., Cairns, S.D., Mazur, M., Meibom, A., 2011. The ancient evolutionary origins of Scleractinia revealed by azooxanthellate corals. BMC Evol. Biol. 11 (1), 316.