<<

Reconstruction of Belemnite Evolution Using Formal Concept Analysis

Radim Belohlavek,1,2∗ Martin Koˇst’´ak3†, and Petr Osicka2 1State University of New York at Binghamton, Binghamton, NY [email protected] 2Palacky University, Olomouc, Czech Republic 3Charles University, Prague, Czech Republic

Abstract The paper presents results on using for- mal concept analysis in the problem of identification of taxa and reconstructing evolution from paleontological data. We present results of experiments performed with belemnites—a group of extinct marine cephalophods which seems particularly suit- Figure 1: Schematic sketch of belemnite body with able for such a purpose. We demonstrate position of the internal shell. Apical grey part repre- that the methods of formal concept analysis sents the rostrum. are capable of revealing taxa and relation- ships among them which are relevant from a paleobiological point of view. Coleoid played an important role in the ecosystem in both the Northern and Southern hemispheres. Inside this diverse group, es- 1 Introduction and Problem Setting pecially belemnites were a common part of nectonic An important system-theoretic concept is that of a assemblages in marine shallower water ecosystems. taxonomy, i.e. a classification scheme arranged in They belong to coleoids with internal shell. The ex- a hierarchical structure. Biological taxonomies and ternal morphology of belemnites resembles partly Re- the methods of devising them are perhaps the best cent squids and their behavior—larger sets of spec- known and most widely studied. There exist sev- imens concentrated probably during spawning acts. eral approaches to biological classification, with phy- However, the internal shell characteristic is strongly logenetics (cladistics) and phenetics (numerical taxon- different in both, non related groups. In omy) being perhaps the two most important methods belemnites, the internal shell is composed of three ma- [Dunn and Everitt, 2004; Jardin and Sibson, 1971; jor components—the most commonly preserved cal- Kitching et al., 1998; Sneath and Sokal, 1973]. The citic rostrum, aragonitic phragmoconus, and dorsal aim of this paper is to explore the idea of utilizing pro-ostracum. The belemnite taxonomy and system- formal concept analysis in identification of taxa and atics are based on rostrum characterstics. This hy- devising taxonomies in paleobiology. The basic idea is drodynamic and also hydrostatic organ shows gradual to identify taxa with formal concepts which are par- changes in their evolutionary history. A schematic ticular groupings of objects characterized by sharing sketch of a belemnite body is shown in Fig. 2. For [ certain properties (attributes). detailed morphologic features of rostrum see Chris- tensen, 1997; Koˇst’´aket al., 2004]. The overall goal of our research is to find out The most important morphologic features in ros- whether and how formal concept analysis may help trum morphology are size, shape and especially the in identification of taxa and reconstructing evolution part called the alveolar end and/or fracture. From from paleobiological data. In the paper, we re- quite a large systematic group (order ), port some of our first results which we obtained from we chose a part of family Belemnitellidae—a domi- data about a particular group of extinct cephalophods nant Upper Cretaceous belemnite group. called belemnites, which are similar to modern squid. As no soft body parts are preserved for taxonomic ∗The research of R. Belohlavek was supported by grant distinction, belemnite taxonomy is mainly based on no. MSM 6198959214 and by grant no. P103/10/1056 of the following morphological characteristics of the the Czech Science Foundation. belemnite rostrum: † The research of M. Koˇst’´akwas supported by grant • Shape and size of the rostrum. no. 205/07/1365 of the Czech Science Foundation and by grant no. MSM 0021620855. • Structure of the alveolar end. • Internal structures at alveolar end. The thus introduced arrow operators form a Galois • External characteristics of the rostrum. connection between X and Y and play an important role in FCA. The set of all formal concepts of hX,Y,Ii • Structure of apex. is denoted by B(X,Y,I). That is, Belemnites are extinct cephalopod group with no ↑ ↓ descendents. Their systematic, palaeoecology, palaeo- B(X,Y,I) = {hA, Bi | A = B,B = A}. biogeography and stratigraphy is based on palaeonto- Under a partial order ≤, defined for logical research. However, the simulation of belemnite hA1,B1i, hA2,B2i ∈ B(X,Y,I) by evolution opens new options for using various math- ematical models, for which the presented cephalopod hA1,B1i ≤ hA2,B2i iff A1 ⊆ A2 (iff B2 ⊆ B1), group is particularly suitable. B(X,Y,I) happens to be a complete lattice, so- called concept lattice associated to hX,Y,Ii [Gan- 2 Formal Concept Analysis ter and Wille, 1999]. The partial order ≤ rep- Formal concept analysis (FCA) is a method for resents the subconcept-superconcept (generalization- data analysis with applications in various domains specialization) relationship between formal concepts. (see [Ganter and Wille, 1999] for foundations and That is, hA1,B1i ≤ hA2,B2i means that formal con- [Carpineto and Romano, 2004] for applications). The cept hA2,B2i (representing e.g. mammal) is more gen- input to FCA consists of a data table describing a eral than hA1,B1i (e.g. dog). Efficient algorithms for set X = {1, . . . , n} objects and a set Y = {1, . . . , m} computing B(X,Y,I) exist. of attributes. The table specifies attribute values of The concept lattice is visalized by a so-called la- the objects. The main aim in FCA is to reveal from belled line diagram (or Hasse diagram) [Ganter and the data a hierarchically organized set of particular Wille, 1999]. The diagram consists of nodes, some of clusters, called formal concepts, and a small set of which are connected by lines. The formal concepts are particular attribute dependencies, called attribute im- represented by nodes, the subconcept-superconcept plications. FCA aims at formalizing and utilizing a relationship is represented by the lines, and a particu- traditional theory of concepts, in which a concept is lar way of labeling the nodes by object and attribute understood as an entity consisting of its extent (col- names is used. In particular every node (formal con- lection of all objects to which the concept applies) cept) is connected to all of its direct predecessors (for- and its intent (collection of all attributes to which are mal concepts which are more specific) and direct suc- characteristic for the concept). For example, the ex- cessors (formal concepts which are more general). The tent of the concept dog consists of all dogs while its diagram is easily understood by users and provides a intent consists of attributes such as barks, has tail, has hierarchical view on the data. four limbs, etc. The information extracted from data In several situations, the expert finds certain for- in FCA is well-comprehensible by users because FCA mal concepts in the concept lattice interesting and works with notions which humans are used to reason relevant while others not. One reason for this is with in the ordinary life. This is an important feature that in addition to the input data (object, attributes, of FCA which makes it appealing for users. their relationship), the expert may have further knowl- In the basic setting of FCA, attributes are assumed edge regarding the objects and attributes. Accord- to be binary, i.e. a given object either has or does not ing to this knowledge, the expert may regard some have a given attribute. A data table with binary at- formal concepts interesting (relevant) and the others tributes is represented by a triplet hX,Y,Ii, called a not. In [Belohlavek and Sklenar, 2005; Belohlavek formal context, which consists of the above-mentioned and Vychodil, 2009], we developed an approach to sets X and Y of objects and attributes, and a binary handling one particular type of a background knowl- relation I between X and Y (incidence relation, to- edge which we utilize in this paper. The idea is have relation). Thus, I ⊆ X × Y , hx, yi ∈ I indicates that the attributes may not be equally important to that object x has attribute y, hx, yi 6∈ I indicates that form concepts. To give a simple example, consider x does not have y. Objects x ∈ X correspond to table the following attributes of books: hardbound, paper- rows, attributes y ∈ Y correspond to table columns, back, engineering, science, philosophy. In a reason- and I is represented by 0s and 1s in the table en- able taxonomy of books, we naturally consider the tries (i.e., 1 indicates that the corresponding object attributes engineering, science, and philosophy more does have the corresponding attribute and 0 indicates important than hardbound or paperback. As a conse- the opposite). A formal concept of hX,Y,Ii is any quence, we would consider a formal concept character- pair hA, Bi of sets A ⊆ X (extent, set of objects to ized by (i.e. with its intent consisting of) hardbound which the concept applies) and B ⊆ Y (intent, set of (all books which are hardbound) not natural (relevant, attributes characterizing the concept) such that B is interesting). One the other hand, the formal con- just the set of attributes shared by all objects from A, cept characterized by engineering (books on engineer- and A is the set of all objects sharing all attributes ing), the one characterized by science, and perhaps from B. In symbols, this can be written as A↑ = B also the one characterized by engineering and paper- and B↓ = A, where back (paperback books on engineering) would be con- ↑ sidered natural. Such a background knowledge may A = {y ∈ Y | for each x ∈ A: hx, yi ∈ I}, be represented by so-called attribute-dependency for- B↓ = {x ∈ X | for each y ∈ B : hx, yi ∈ I}. mulas (AD-formulas) [Belohlavek and Sklenar, 2005; Belohlavek and Vychodil, 2009]. In our example, the AD-formula representing our background knowledge Table 1: Objects: selected belemnite species is belemnite species label Actinocamax verus antefragilis 0 hbound t pback v engineering t science t philosophy, Praeactinocamax primus 1 P. plenus 2 but in general, there background knowledge is repre- P. plenus cf. strehlensis 3 sented by a set of AD-formulas. An AD-formula over P. triangulus 4 a set Y of attributes is an expression D1 v D2 where P. aff triangulus 5 D1,D2 ⊆ Y . A formal concept hA, Bi of hX,Y,Ii is P. sozhensis 6 considered compatible with a set T of AD-formulas P. contractus 7 (hA, Bi satisfies T ) if for each D1 v D2 from T , if P. planus 8 D1 ∩ B 6= ∅ then D2 ∩ B 6= ∅. That is, a formal P. coronatus 9 concept characterized by hardbound is not compati- P. matesovae 10 ble with the above background knowledge because in P. medwedicicus 11 this case, B = {hardbound}, and thus the for above P. sp. 1 12 AD-formula D1 v D2 with D1 = {hbound, pback} P. sp. 2 13 and D2 = {engineering, science, philosophy}, we have P. strehlensis 14 D1 ∩ B 6= ∅ but D2 ∩ B = ∅. Intuitively, hA, Bi con- P. bohemicus 15 tains a less important attribute but does not contain P. aff. bohemicus 16 any of the prescribed more important attributes. The P. cobbani 17 set of all formal concepts of hX,Y,Ii compatible with P. manitobensis 18 T is denoted by BT (X,Y,I) and is considered the set P. cf. manitobensis 19 of all natural (interesting, relevant) formal concept in P. sternbergi 20 the data hX,Y,Ii given the background knowledge T . P. walkeri 21 Goniocamax intermedius 22 3 Belemnite Evolution in Time, Space G. surensis 23 The morphologic changes in time and space are well G. christenseni 24 documented especially in early belemnitellids [Koˇst’´ak, G.lundgreni 25 2004]. Relatively few morphologic features in belem- nite rostrum, clear taxonomy, and new systematic and • Recognized migration patterns and spreading stratigraphic revision (see below) make it possible to succession in time and space use formal methods for identification of taxa. Belemnites show an interesting model of migration • Quite simple morphology of preserved parts, with patterns and provincialism in and Cretaceous clear changes in time. coleoids. We have used the Upper Cretaceous belem- We have analyzed 26 species belonging to three gen- nites (Belemnitellidae). Their evolution centre lied in era in the Cenomanian–Coniacian interval (i.e., 97–87 the SE parts of the Russian Platform and mass migra- Ma). This phase of belemnite evolution shows the tions affected shallow seas in Central and NW Europe, highest freqency in morphologic changes. Thus, we Siberian areas and North America during the Cenoma- suppose this to be a very suitable for our purpose. nian through Coniacian. They show marked provin- cialism in the North European, the North American 4 Experiments and the East European Provinces [Christensen, 1997; Koˇst’´aket al., 2004]. The last belemnitellids became Dataset The data (formal context hX,Y,Ii) used in extinct at the K/T boundary (although, it concerns our experiment consists 26 belemnite species (objects only last 1-2 species; note: K/T boundary is the Cre- of X) depicted in Table 4 and 38 rostrum character- taceous/Tertiary boundary with mass extinction 65.5 istics (attributes of Y ) depicted in Table 2. Due to Ma). The fall in belemnitellid diversity is related to lack of space, we do not display the table representing the reduction of shallow sea areas, eustasy, and the which species have which attributes. radiation of new vertebrate groups (i.e. Neoselachii, Teleostei). The analysis selected results of which are Attribute priorities and AD-formulas The cor- provided below focuses on their evolution, particularly responding concept concept lattice B(X,Y,I) consists on the frequency of morphologic changes, endemicism, of 616 formal concepts is is thus quite large. From the and species and generic relationships. paleontological point of view, it contains both formal We chose the Upper Cretaceous belemnites from the formal concepts which seem natural and interesting Cenomanian–Coniacian interval for the folowing rea- as well as those which are not natural. To eliminate sons the latter, we employed AD-formulas (see Section 2). • Systemetic revision has been finished recently. Namely, it is indeed the case that a paleontologist considers some attributes more important than oth- • Clear stratigraphic position. ers. The use of AD-formulas to make his background • The evolutionary lineage from one ancestor is well knowledge explicit and to take it into account in the documented. subsequent analysis is therefore a natural solution. Table 2: Attributes and their groups Group/name label Size large guards (more 65mm) a medium guards (65-80mm) b small guards (less 65mm) c cigar shape in dorsovenral view d lanceolate in DV view e slightly lanceolate in DV view f Figure 2: Hasse diagram of the order set of attribute subcylindrical in DV view g subsets used for the AD formula generation conical in DV view h cigar shape in lateral view i lanceolate in L view j According to expert opinion, we partitioned the set slightly lanceolate in L view k Y of attributes into the seven groups shown in Ta- subcylindrical in L view l ble 2 and partially ordered them as depicted in Fig. 4. conical in L view m Then, for every two groups D1,D2 ⊆ Y of the seven Flattening groups such Y1 is a direct predecessor of Y2, we add laterally flattened n to the set T of our AD-formulas the AD-formula dorsally flattened o D1 v D2. For example, since the group Others B ventrally flattened p is a direct predecessor of Others A, we add Alveolar fracture, pseudoalveolus high conical alveolar fracture q β t γ t δ t  t η v y t z t α low conical alveolar fracture r to T . shallow pseudoalveolus (less 3mm) s deep pseudoalveolus (more 3mm) t Cross section The concept lattice The constrained concept lat- oval cross section of alveolar fracture u tice BT (X,Y,I) (containing only the formal concepts oval to triangular cross section of AF v which are compatible with the AD-formulas from T , triangular cross section of AF w see Section 2) contains 121 formal concepts only and is circular cross section of AF x depicted in Fig. 4. The concepts are labeled by num- Others A bers which we use below when interpreting the formal bottom of ventral fissure y concepts and taxa and their relationships. Due to lack dorsolateral furrows z of space, we postpone the description of all the formal venral furrow α concepts to a full version of this paper. Others B dorsolateral depressions β Interpretation The Upper Cretaceous belemnites granulation of the whole guard γ (family Belemnitellidae = belemnitellids in this pa- striation of the whole guard δ per) are represented by 7 genera: Actinocamax, Prae- mucro  actinocamax, Goniocamax, Gonioteuthis, Belemnella, vascular imprints frequent η Belemnellocamax, Belemnitella, and problematic gen- Others C era Belemnocamax and Fusiteuthis. While the phylo- conellae θ genic lineage going from Goniocamax to younger Go- granulation of a part of the guard ι nioteuthis, Belemnella, and Belemnitella is quite clear striation of a part of the guard κ and is connected with progressive calcification of the vascular imprints rare λ aleveolar part [Christensen, 1997], the relationships between earlier genera like the origin of the Late Cre- taceous belemnites are still unclear. The concept analysis strictly derived and separated genus Actinocamax from other belemnitellids (con- cept No. 3). We have used one species (subspecies) of Actinocamax verus antefragilis (the stratigraphic range of this species is very long and falls into the period of rapid evolution of another belemnite taxa). Table 3: Interesting formal concepts Concept Extent Intent 3 0 c d i n q x z 2 22 23 c e f l p t v z β λ  4 23 c d e f l p s t v z β κ λ  103 22 b c e f k l p t v z β 0 δ λ  1 3 8 11 79 104 116 8 2 3 7 14 15 16 17 18 b p r v α 5 2 107 24 9 14 26 12 73 105 20 21 11518 17 6 4 39 25 20 10 41 3115 7680 27 22 13 47 74 89 106 19 7 579634 49 21 42 43 23 488116 35 97 30 9128 54 75 90 79 10 b e l p s u z β δ λ  32 56 58 10851 66 62 44 4550 55 98 8277 37 36 29 68 11 1 2 3 7 8 14 16 17 b p r v β κ α 38 78 101 102 53 6311310946 6752 69 33 85 868392 20 21 88 104 1 2 19 20 a f p r u β κ α 114 70 40 103 59 60 64 116 6 a b c e k l p q u z β 61 65 71 72 94 93 δ 95 17 11 13 b l p t v β α 84 115 4 a b c e k l p t w z β 99 λ  α 110 100 18 11 22 b l p t v z β 78 13 b e l p t u v β κ α

40 9 b g m p t u z α 111 87 117 112 119 114 18 a b f k l p r s v w z β γ ι α 70 15 b f g l m p r v w β

γ ι κ α 118 95 17 b e f k l p r u v z β 120 ι κ  α 112 20 a b f k p r u v z β γ Figure 3: Hasse diagram of ordered set of 121 conceps ι κ λ  α resulting from the size reduction via AD formulas. 117 1 2 a b c e f g k l p r u z β δ κ α This and later species of Actinocamax are so similar 119 1 a b c d e f g j k l p each other that their distinguishing is possible just at r u z β δ κ α subspecies level. The concept analysis of this genus 118 2 a b c e f g k l p r u showed no relationship to other belemnite taxa. This v z β δ κ λ  α should be interpreted like an extreme derivation from 87 2 8 b e k l p r u z β δ λ belemnitellids (size, shape of the rostrum and alveolar  α fracture) and/or by presence of another belemnoid an- 84 2 7 8 b e k l p r u z β λ  cestors. In this respect, the Actinocamax evolutionary α lineage should be excluded from belemnitellids as their 102 14 b c e j k p r u v α ancestor probably does not belong to taxa related 101 1 14 b c e j k p r u α to earlier genus Praeactinocamax. Species of Prae- 90 1 3 b e j p r u β α actinocamax were commonly distributed in Euroasia 5 4 9 11 12 13 22 24 b p t and North America from the Cenomanian through 25 Coniacian (Santonian in Greenland). This genus oc- 56 24 25 b f l o p t u θ y z β curred a few milions years before the earliest Actinoca- δ λ  α max and the first species of this genus—P. primus is generaly considered [Naidin, 1964; Christensen, 1997] to be also the earliest species of belemnitellids. If this opinion is true, P. primus and its stright descendent P. plenus (primus/plenus group) are initial taxa for all younger belemnitellids (i.e. monophyletic group) and this must be clearly detected in concept analysis. If we look at concept No. 11, we can see attributes common to another taxa of this genus. Original mor- process. phologic features in primus/plenus group (also con- cepts No. 117,118, etc.) are present especially in the References East European species. [Belohlavek and Sklenar, 2005] R. Belohlavek and V. Of a high importance is concept No. 8. showing Sklenar. Formal concept analysis constrained attributes common for both conservative morphologi- by attribute-dependency formulas. LNCS (Proc. cal features in the East European species (group No. ICFCA 2005), 3403(2005), pp. 176-191. 14 (related to No. 11)) but also to relatively distinct Turonian North American (P. manitobensis, concept [Belohlavek and Vychodil, 2009] R. Belohlavek and No. 114, P. cobbani and P. sternbergi—concepts No. V. Vychodil. Formal Concept Analysis With 61, 95, 112) and the Late Turonian species from Cen- Background Knowledge: Attribute Priorities. tral and NW Europe (P. bohemicus, concept No. 70). IEEE Transactions on Systems, Man, and Cy- In this respect, we observe at least two different evo- bernetics, Part C. 39(4)(2009), 399–409. lutionary lineage, i.e. the expression of allopatric spe- [Carpineto and Romano, 2004] C. Carpineto and G. ciation in different geographic areas. Romano. Concept Data Analysis. Theory and Ap- High endemic species with unknown or poorly doc- plications. J. Wiley, 2004. umented phylogenetic relationship are P. sozhensis [Christensen, 1997] W. K. Christensen. The Late Cre- (concept No. 116) and P. matesovae (concept No. taceous belemnite family Belemnitellidae: Tax- 79). onomy and evolutionary history. Bulletin of the Another belemnite lineage is evolved also from Geological Society of Denmark. 44(1997), 59–88. the earliest species of Praeactinocamax by form- [Dunn and Everitt, 2004] G. Dunn and B. S. Everitt. ing a deeper pseudoalveolus (a space surrounding An Introduction to Mathematical Taxonomy. the phragmoconus). This evolutionary innovation is Dover, 2004. partly observable already in some specimens of P. [Ganter and Wille, 1999] B. Ganter and R. Wille. plenus population. So, the origin of this lineage took Formal Concept Analysis. Mathematical Founda- place probably in the Late Cenomanian. A typical tions. Springer, Berlin, 1999. morphologic features inside this morphologic lineage [Jardin and Sibson, 1971] N. Jardin and R. Sibson. are summarized in the concept No. 1. (see attributes Mathematical Taxonomy (Probability & Mathe- in concepts No. 1, 5). This evolutionary lineage shows matical Statistics). J. Wiley, 1971. markedly higher number of rare and endemic species (P. medwedicicus, P. coronatus, P. aff. triangulus, [Kitching et al., 1998] I. Kitching et al. Cladistics: P. sp. 1 and 2, etc.). The iteration could be under- Theory and Practice of Parsimony Analysis. Ox- stood also as an expression of allopatric speciation in ford University Press, 1998. the East European Cretaceous sea. This lineage is [Koˇst’´ak,2004] M. Koˇst’´ak.Cenomanian through the also important by the origin of the earliest species of lowermost Coniacian Belemnitellidae Pavlow genus Goniocamax (G. intermedius and G. surensis— (Belemnitida, Coleoidea) of the East European concepts No. 2,4,103) and advanced G. christenseni Province. Geolines. 18(2004), 59–109. and G. lundgreni (concept No. 56). The last men- [Koˇst’´aket al., 2004] M. Koˇst’´ak, S. Cech,ˇ B. Ekrt, tioned species couple is also considered to be an ances- M. Mazuch, F. Wiese, S. Voigt, C. J. Wood. tor of younger Belemnitella stock (including also genus Belemnites of the Bohemian Cretaceous Basin in Gonioteuthis). The transition between Goniocamax a global context. Acta Geologica Polonica. 54, and Belemnitella is gradual and relatively clear. 4(2004), 511–533. [Naidin, 1964] D. P. Naidin. Upper Cretaceous belem- 5 Conclusions nites from the Russian Platform and adjacent areas. Actinocamax, Gonioteuthis, Belemnelloca- The results demonstrate that natural taxa and inter- max (In Russian). Izdatelstvo Moskovskogo Go- esting relationships have been revealed using this ap- sudarstvennogo Universiteta, Moskva, 1964, 190 proach. In particular, the analysis confirmed: sepa- pp. rated Actinocamax lineage which is probably derived [Priss, 2003] U. Priss. Formalizing Botanical Tax- from another ancestor than in genus Praeactinoca- onomies. LNAI (Proc. ICCS 2003) 2746(2003), max; three different morphologic trends in genus pp. 309–322. Praeactinocamax, explained by allopatric speciation; clearly separated North American and rare NW and [Sneath and Sokal, 1973] P. H. A. Sneath, R. R. Central European species from the East Europaen Sokal. Numerical Taxonomy: The Principles and Praeactinocamax taxa; iteration with endemic taxa; Practice of Numerical Classification. W. H. Free- the genus origin and its derivation (Goniocamax from man & Co., 1973. Praeactinocamax). In future research, we plan to fo- cus on the following topics: further analyses of the present data (employ further methods of FCA); appli- cation of the method to further fossil groups; a long- term goal is to study how expert paleontologist form taxa and taxonomies and possible formalization of this