Expert Systems with Applications 38 (2011) 3009–3014


Precise classification within genus level based on simulated annealing aided cloud classifier

Erxu Pi a,b, Hongfei Lu a,*, Bo Jiang a,c, Jian Huang a, Qiufa Peng a, Xiuyan Lin a

a College of Chemistry and Life Science, Zhejiang Normal University, Jinhua 321004, China
b Department of Biology and State (China) Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong, PR China
c Changshu Institute of Technology, Changshu 215500, Jiangsu Province, China

Keywords: Plant numerical taxonomy; Quantitative attributes; Shortest path based simulated annealing; Cloud model; Genus

Abstract

This work is part of a series of studies on plant numerical taxonomy. It provides a precise classification method for describing and discriminating plant species, and for reviewing proposals for new or revised species to be recognized as taxon units within the genus level.

We first used all available quantitative attributes to build cloud models for different sections. We then applied the shortest path based simulated annealing algorithm (SPSA) to optimize these models. Finally, we validated the optimized models against the previously used quantitative attribute data. Before optimization, the accuracy rates of the cloud models for Sect. Tuberculata, Sect. Oleifera and Sect. Paracamellia were 85.00%, 60.00% and 80.00%, respectively. After SPSA optimization, they rose to 92.86%, 87.50% and 87.50%.

We also found interesting overlaps between the type species and the ‘expected species’: the selected expected species Camellia oleifera and Camellia brevistyla are also the type species of Sect. Oleifera and Sect. Paracamellia, respectively. We therefore suggest that the expected species serve as an illustration in plant numerical taxonomy. Based on the simulated annealing aided cloud classifier, we set taxon hedges associated with the ‘expected species’. These hedges advance our common understanding of sections and improve our ability to recognize and discriminate plant species. Together, these procedures provide a dynamic and practical way to publish new or revised descriptions of species and sections.

© 2010 Elsevier Ltd. All rights reserved.

* Corresponding author.
doi:10.1016/j.eswa.2010.08.090

1. Introduction

Cloud theory is now a popular framework for handling uncertainty. It is based on the uncertain transition between a qualitative concept and its quantitative description (Li, Di, Li, & Shi, 1998; Li, Han, Chan, & Shi, 1997; Li, Han, Shi, & Chan, 1998). Based on this theory, a cloud classifier with adaptive linguistic hedges has been developed in recent years (Lu, Pi, Peng, Wang, & Zhang, 2009). This classifier represents a qualitative concept with three numerical characteristics: the expected value Ex, the entropy r and the deviation D (Di, Li, & Li, 1998a, 1998b). Together they integrate the fuzziness and randomness of a linguistic term in a unified way.

Our previous work (Lu et al., 2009) applied this classifier to plant numerical taxonomy and used the particle swarm optimization algorithm (PSO) to optimize the sections' hedges. However, the accuracy rates obtained there were not high enough, which leaves room for improvement. In this work, we apply the shortest path based simulated annealing algorithm (SPSA) to optimize the cloud classifier.

SPSA was developed for the dynamic multi-stage facility layout problem under a dynamic business environment. The analogous setting here is one in which new species may be added to, or old species removed from, their original taxa. Since every section (or genus) has its own range, the distances of species to their ‘expected species’ (Lu et al., 2009) in each section (or genus) should reach their global minimum before the best classification is obtained (Pi et al., 2009). The hedge problem can then be converted into a shortest path problem by studying its distance function and the heuristic rules for adding and removing species. The corresponding mathematical model for this problem has already been established (Dong, Wu, & Hou, 2009). Hence, SPSA may perform well in optimizing the cloud classifier.

In this research, we use the shortest path based simulated annealing aided cloud classifier (SPSACM) method for plant classification. It analyzes leaf morphology and anatomy data, part of which comes from our previous work (Lin, Peng, Lv, Du, & Tang, 2008; Lu et al., 2008). Our purpose is to provide a basic tool for plant taxonomy.

2. Materials and methods

2.1. Materials

Adult leaves fully exposed to sunlight were collected from 36 species of the genus Camellia in the International Camellia Species Garden in Jinhua city. The sampled species were as follows.

15 species in Section Paracamellia Sealy: Camellia grijsii Hance, Camellia confuse Craib, Camellia kissi Wall., Camellia fluviatilis Hand.-Mazz., Camellia brevistyla Coh. St., Camellia hiemalis Nakai, Camellia obtusifolia Chang, Camellia maliflora Lindl., Camellia shensiensis Chang, Camellia puniceiflora Chang, Camellia tenii Sealy, Camellia microphylla (Merr.) Chien, Camellia miyagii, Camellia odorata and Camellia phaeoclada Chang.

6 species in Section Oleifera Chang: Camellia gauchowensis Chang, Camellia lanceoleosa Chang, Camellia sasanqua Thunb., Camellia vietnamensis Huang ex Hu, C. oleifera Abel and Camellia yuhsienensis.

15 species in Section Tuberculata Chang: Camellia tuberculata Chien, Camellia lipingensis Chang, Camellia rhytidocarpa Chang and Liang, Camellia rhytidophylla Y. K. Li and M.Z. Yang, Camellia leyeensis Chang, Camellia anlungensis Chang, Camellia rubituberculata Chang, Camellia acutiperulata Chang and Ye, Camellia acuticalyx Chang, Camellia atuberculata Chang, Camellia obovatifolia Chang, Camellia rubimuricata Chang and Z.R. Xu, Camellia parvimuricata Chang, Camellia hupehensis Chang and Camellia zengii H.T. Chang.

The specimens are deposited in the Zhejiang Normal University (ZJNU) herbarium.

2.2. Data collection

For convenience of analysis in each attribute model, the sample size of each section is assigned artificially (Fig. 1). Let $s_i$ be the sample size assigned to the $i$th section and $d_i$ the number of species in that section. The replicate number $r_i$ ($r_i \geq 3$) of each species in the $i$th section is then:

$$r_i = \frac{s_i}{d_i}$$

2.3. The shortest path based simulated annealing algorithm

The result of the dynamic species selection and mergence (classification) problem is a series of selections. Let $L_0$ be the initial selection. $(L_{11}, L_{12}, \ldots, L_{1m})$ are the $m$ feasible selections chosen by some heuristic rules in planning stage 1. Likewise, $(L_{21}, L_{22}, \ldots, L_{2m})$ and $(L_{n1}, L_{n2}, \ldots, L_{nm})$ are the $m$ feasible selections chosen by these rules in planning stage 2 and stage $n$, respectively.

To construct a shortest path optimization problem, a virtual node $L_e$ is added after the $n$th period. Every selection in the $n$th period is assumed to be transferable into this virtual mergence. The dynamic species selection and mergence problem then becomes a shortest path problem from $L_0$ to $L_e$. Define the 0–1 variable $N_{ij}$ as follows:

$$N_{ij} = \begin{cases} 1, & \text{if nodes } i \text{ and } j \text{ are directly connected} \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

The mathematical model for the shortest path based dynamic species selection and mergence problem then becomes:

$$\min D_t = \sum_{i,j} N_{ij} E_{ij} \tag{2}$$

where $D_t$ is the total distance and $N_{ij}$ is a 0–1 variable indicating whether nodes $i$ and $j$ are directly connected. $E_{ij}$ is the length of the edge between nodes $i$ and $j$, and $m$ is the number of feasible selections chosen by the heuristics in each stage.

Let $E$ and $X$ be the sets of all edges and all nodes, respectively. Let $P$ be the set of all nodes except the virtual node and the nodes of the last period. Let $Q$ be the set of all nodes except the initial node and the nodes of the first period. Let $M$ be the set of all nodes from the second stage to the $(n-1)$th stage. Then $P \cup Q = X$ and $P \cap Q = M$. The model is subject to the following constraints:

$$N_{ij} \in \{0, 1\}, \quad \forall (i, j) \in E \tag{3}$$

Constraint (3) means that there is at most one connecting edge between any two nodes.

$$\sum_{j} N_{ij} = m \quad \forall i \in P, \qquad \sum_{i} N_{ij} = m \quad \forall j \in Q \tag{4}$$

Constraint (4) states that every node $i$ in $P$ has $m$ outgoing edges and every node $j$ in $Q$ has $m$ incoming edges.

$$E_{ij} \geq 0, \quad \forall (i, j) \in E \tag{5}$$

$$\sum_{i} E_{ij} = 0, \quad \text{if node } j \text{ is the virtual node } L_e \tag{6}$$

Constraint (5) means that edge lengths are not less than 0. Constraint (6) says that the selections in the last stage can be transformed into the virtual selection $L_e$. Together with (5), this implies that every edge entering $L_e$ has zero length.
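The constraints above describe a layered graph: $L_0$, then $m$ candidate selections per stage, then the virtual node $L_e$ reached through zero-length edges. Because every edge runs from one stage to the next, the minimum of (2) can be found exactly by dynamic programming over the stages.

The sketch below makes several assumptions:

- the candidate selections and edge lengths are already given;
- all node names and edge lengths are hypothetical;
- the simulated annealing step that SPSA uses to generate candidates (Dong, Wu, & Hou, 2009) is not shown.

It is an illustration of the shortest path part of the model only, not the authors' program.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def check_degrees(stages: List[List[str]], edges: Dict[Edge, float], m: int,
                  start: str = "L0", end: str = "Le") -> bool:
    """Check constraint (4): m outgoing edges per node in P, m incoming per node in Q."""
    P = [start] + [v for layer in stages[:-1] for v in layer]  # all but last stage and Le
    Q = [v for layer in stages[1:] for v in layer] + [end]     # all but L0 and first stage
    out_ok = all(sum(1 for (i, _) in edges if i == p) == m for p in P)
    in_ok = all(sum(1 for (_, j) in edges if j == q) == m for q in Q)
    return out_ok and in_ok


def layered_shortest_path(stages: List[List[str]], edges: Dict[Edge, float],
                          start: str = "L0", end: str = "Le") -> Tuple[float, List[str]]:
    """Minimise D_t = sum(N_ij * E_ij) from `start` to `end` on a staged graph.

    N_ij = 1 exactly for the consecutive node pairs on the returned path.
    Assumes node names are unique and `end` is reachable.
    """
    layers = [[start]] + stages + [[end]]
    best: Dict[str, Tuple[float, List[str]]] = {start: (0.0, [start])}
    for prev_layer, next_layer in zip(layers, layers[1:]):
        for j in next_layer:
            # Extend every reachable node of the previous stage by the edge (i, j).
            candidates = [(best[i][0] + edges[(i, j)], best[i][1] + [j])
                          for i in prev_layer if i in best and (i, j) in edges]
            if candidates:
                best[j] = min(candidates, key=lambda c: c[0])
    return best[end]


# Hypothetical example: n = 2 planning stages with m = 2 selections each.
STAGES = [["L11", "L12"], ["L21", "L22"]]
EDGES: Dict[Edge, float] = {
    ("L0", "L11"): 3.0, ("L0", "L12"): 1.0,
    ("L11", "L21"): 2.0, ("L11", "L22"): 4.0,
    ("L12", "L21"): 5.0, ("L12", "L22"): 1.0,
    ("L21", "Le"): 0.0, ("L22", "Le"): 0.0,  # constraint (6): zero length into Le
}

if __name__ == "__main__":
    print(check_degrees(STAGES, EDGES, m=2))     # True
    print(layered_shortest_path(STAGES, EDGES))  # (2.0, ['L0', 'L12', 'L22', 'Le'])
```

A layer-by-layer recursion is used instead of a general graph search because the stage structure guarantees an acyclic graph. Each node is therefore settled once, after all its predecessors.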

2.4. Expected species in three sections of genus Camellia

In traditional plant taxonomy, the type of the name of a genus (or of a subdivision of a genus, such as a section) is usually the type of an included species. In that case it can be indicated by the name of this species, the type species (International Code of Botanical Nomenclature, 2000).

For convenience of analysis, we tentatively define a new term for numerical plant taxonomy, the ‘expected species’: the included species that has the minimum sum of weighted squared deviations from the expected values of all the attributes. The expected species should therefore best exemplify the essential characteristics of the section (or genus) to which it belongs. Accordingly, the expected species can be selected with Formula (7) as a first step (Lu et al., 2009):

$$C(x) = \sum_{k=1}^{n} w(X_k)\,\left(x_{kl} - Ex_k\right)^2 \tag{7}$$

For the $l$th species in the $i$th section:

- $x_{kl}$ is the average value of its $k$th attribute;
- $Ex_k$ is the expected value of this attribute;
- $w(X_k)$ is the weight of the $k$th attribute;
- $n$ is the number of attributes.

The expected species of the $i$th section is the species whose $C(x)$ is the minimum in that section.

Fig. 1. The whole data structure used in the SPSACM. $S_i(x)$ represents the $i$th section, which contains $s/r_i$ species.

2.5. Programs

After pretreatment, all data were analyzed with the SPSACM method in Matlab (version 7.0).

The shortest path based simulated annealing algorithm programs are available from http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95.

3. Results

3.1. Different classification results based on different attributes

Fig. 2A shows the 3-D distribution of the original data, based on three selected attributes. After the classification procedure, Fig. 2B and C show visible differences between cloud models built on different attribute bases. Fig. 2D displays all available linguistic atoms generated by the series of linguistic atom generators used in this research. These generators apply different rules to quantitative and qualitative data.

The cloud models in Fig. 2B are based on attributes with small weights (CMSW), including WAB and AAB, and they produce faint hedges (Table 1). In these cloud models, many species of one section are placed into other sections' models (Table 2). For example, 30.00% of the species belonging to Sect. Paracamellia are placed into Sect. Oleifera's model S2(x).

By comparison, the cloud models in Fig. 2C are based on attributes with large weights (CMLW), including AAD and WAD, and they produce sharper hedges. Their classification accuracy is also much higher: only 16.70% of the species belonging to Sect. Paracamellia are misplaced in Sect. Oleifera's model S2(x).

The cloud models based on all weighted attributes (CMAW) give the highest accuracy rate in Sect. Tuberculata (85.00%). In Sect. Oleifera (60.00%) they share the highest rate with CMSW. In Sect. Paracamellia their accuracy rate (80.00%) is slightly lower than that of CMLW (81.70%).

3.2. Selecting the expected species of each section

Among the 13 attributes (Table 1), AAD, LAD, WAD, TAB and TPP contribute most to classification at the section level within genus Camellia (78.22% of the total weight). An expected species should contain as many of the expected characters of its section as possible.
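Formula (7) is a weighted sum of squared deviations, so selecting the expected species amounts to a weighted nearest-to-centre search within each section.

The sketch below makes these simplifications:

- For brevity, it uses only the five heaviest attributes and the Sect. Tuberculata Ex values from Table 1, whereas Formula (7) sums over all $n$ attributes.
- The two species records are hypothetical, not measurements from this study.

```python
from typing import Dict

Profile = Dict[str, float]

# Weights w(X_k) and Sect. Tuberculata expected values Ex_k from Table 1,
# restricted to the five heaviest attributes for brevity.
TOP5_WEIGHTS: Profile = {"AAD": 0.1816, "LAD": 0.1131, "WAD": 0.2258,
                         "TAB": 0.1307, "TPP": 0.1310}
EX_TUBERCULATA: Profile = {"AAD": 1157.1, "LAD": 45.3, "WAD": 37.8,
                           "TAB": 18.1, "TPP": 87.0}


def c_value(means: Profile, weights: Profile, ex: Profile) -> float:
    """Formula (7): C(x) = sum_k w(X_k) * (x_kl - Ex_k)^2."""
    return sum(w * (means[k] - ex[k]) ** 2 for k, w in weights.items())


def expected_species(section: Dict[str, Profile], weights: Profile, ex: Profile) -> str:
    """Return the species whose C(x) is smallest within the section."""
    return min(section, key=lambda name: c_value(section[name], weights, ex))


# Hypothetical per-species attribute means (not data from this study).
HYPOTHETICAL_SECTION: Dict[str, Profile] = {
    "species_A": {"AAD": 1150.0, "LAD": 45.0, "WAD": 37.0, "TAB": 18.0, "TPP": 88.0},
    "species_B": {"AAD": 1300.0, "LAD": 50.0, "WAD": 30.0, "TAB": 20.0, "TPP": 80.0},
}

if __name__ == "__main__":
    for name, means in HYPOTHETICAL_SECTION.items():
        print(name, round(c_value(means, TOP5_WEIGHTS, EX_TUBERCULATA), 2))
    print(expected_species(HYPOTHETICAL_SECTION, TOP5_WEIGHTS, EX_TUBERCULATA))  # species_A
```

AAD is measured in μm² while the other attributes are in μm, so the raw squared deviations differ greatly in scale. The text does not state whether the attributes were standardized before Formula (7) was applied.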

Fig. 2. (A) Distribution of the original data; (B) and (C) cloud models based on parameters with small weights and large weights, respectively; (D) the base cloud.
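The paper does not list its cloud generator, so the following is only a minimal sketch of the standard normal cloud model from the cloud-model literature. It uses the parameters Ex, entropy En and hyper-entropy He.

The sketch rests on these assumptions:

- Mapping the classifier's entropy and deviation onto En and He is an assumption.
- Assigning a value to the section cloud in which it has the highest membership mirrors the role of the models S1(x) to S3(x), but the authors' exact decision rule is not given.
- The Ex values are the WAD expected values from Table 1; the En and He values are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Cloud = Tuple[float, float, float]  # (Ex, En, He)


def _spread(en: float, he: float, rng: random.Random) -> float:
    """Draw En' ~ N(En, He^2); keep it positive so the Gaussian below is defined."""
    return max(abs(rng.gauss(en, he)), 1e-12)


def cloud_drops(cloud: Cloud, n: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Forward normal cloud generator: return n drops as (x, membership) pairs."""
    ex, en, he = cloud
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        en_prime = _spread(en, he, rng)
        x = rng.gauss(ex, en_prime)
        drops.append((x, math.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))))
    return drops


def membership(x: float, cloud: Cloud, rng: random.Random) -> float:
    """X-conditional cloud generator: membership degree of a given value x."""
    ex, en, he = cloud
    en_prime = _spread(en, he, rng)
    return math.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))


def classify(x: float, clouds: Dict[str, Cloud], rng: random.Random) -> str:
    """Assign x to the section cloud with the highest membership."""
    return max(clouds, key=lambda name: membership(x, clouds[name], rng))


# Ex = WAD expected values from Table 1; En = 4.0 and He = 0.5 are hypothetical.
WAD_CLOUDS: Dict[str, Cloud] = {
    "S1(x)": (37.8, 4.0, 0.5),  # Sect. Tuberculata
    "S2(x)": (22.6, 4.0, 0.5),  # Sect. Oleifera
    "S3(x)": (17.9, 4.0, 0.5),  # Sect. Paracamellia
}

if __name__ == "__main__":
    print(classify(21.0, WAD_CLOUDS, random.Random(0)))
```

With He > 0 the membership of a value is itself random, which is how a cloud model expresses randomness on top of fuzziness. Setting He = 0 reduces each cloud to an ordinary Gaussian membership function.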

Table 1 Weights and expected values of the available 13 parameters.

| Parameterᵃ | Weight | Ex, Sect. Tuberculata | Ex, Sect. Oleifera | Ex, Sect. Paracamellia |
|---|---|---|---|---|
| Adaxial epidermal cell | | | | |
| AAD (μm²) | 0.1816 | 1157.1 | 941.9 | 744.7 |
| PAD (μm) | 0.0415 | 83.2 | 88.1 | 81.1 |
| LAD (μm) | 0.1131 | 45.3 | 43.1 | 39.6 |
| WAD (μm) | 0.2258 | 37.8 | 22.6 | 17.9 |
| Abaxial epidermal cell | | | | |
| AAB (μm²) | 0.0260 | 1285.9 | 1265.1 | 856.7 |
| PAB (μm) | 0.0276 | 91.7 | 106.4 | 90.3 |
| LAB (μm) | 0.0021 | 48.4 | 64.7 | 48.4 |
| WAB (μm) | 0.0159 | 38.9 | 19.6 | 19.8 |
| TAD (μm) | 0.0560 | 27.2 | 18.1 | 17.2 |
| TAB (μm) | 0.1307 | 18.1 | 14.9 | 13.7 |
| TPP (μm) | 0.1310 | 87.0 | 99.4 | 110.3 |
| TSP (μm) | 0.0132 | 147.5 | 157.8 | 158.9 |
| Paraffin sections | | | | |
| TLF (μm) | 0.0355 | 292.3 | 299.9 | 311.2 |

ᵃ Parameter abbreviations:

- AAD: area of adaxial epidermal cell
- PAD: perimeter of adaxial epidermal cell
- LAD: maximum length of adaxial epidermal cell
- WAD: minimum width of adaxial epidermal cell
- AAB: area of abaxial epidermal cell
- PAB: perimeter of abaxial epidermal cell
- LAB: maximum length of abaxial epidermal cell
- WAB: minimum width of abaxial epidermal cell
- TAD: thickness of adaxial epidermal cell
- TAB: thickness of abaxial epidermal cell
- TPP: thickness of palisade parenchyma
- TSP: thickness of spongy parenchyma
- TLF: thickness of leaf
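The weights in Table 1 sum to 1. Sorting them and accumulating running totals therefore reproduces the shares quoted in the text: 0.6691 for WAD, AAD, TAB and TPP, and 78.22% once LAD is added.

The sketch below is a simple illustration, not the authors' code. It prints such a weight-ordered attribute list, which is the kind of sorted list a weight-based flora key would present.

```python
from typing import Dict, List, Tuple

# Attribute weights transcribed from Table 1.
TABLE1_WEIGHTS: Dict[str, float] = {
    "AAD": 0.1816, "PAD": 0.0415, "LAD": 0.1131, "WAD": 0.2258,
    "AAB": 0.0260, "PAB": 0.0276, "LAB": 0.0021, "WAB": 0.0159,
    "TAD": 0.0560, "TAB": 0.1307, "TPP": 0.1310, "TSP": 0.0132,
    "TLF": 0.0355,
}


def weight_ordered_key(weights: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Sort attributes by decreasing weight and attach the cumulative weight."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0.0
    for name, w in ranked:
        running += w
        rows.append((name, w, round(running, 4)))  # rounded to Table 1 precision
    return rows


if __name__ == "__main__":
    for rank, (name, w, cum) in enumerate(weight_ordered_key(TABLE1_WEIGHTS), start=1):
        print(f"{rank:2d}. {name}  weight={w:.4f}  cumulative={cum:.4f}")
```

The first four rows are WAD, AAD, TPP and TAB, with a cumulative weight of 0.6691. Adding LAD as the fifth row brings the total to 0.7822.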

Table 2 Different cloud modelsᵃ based on different parameters.

| Cloud model | CMAW Sect. Tuberculata (%) | CMAW Sect. Oleifera (%) | CMAW Sect. Paracamellia (%) | CMSW Sect. Tuberculata (%) | CMSW Sect. Oleifera (%) | CMSW Sect. Paracamellia (%) | CMLW Sect. Tuberculata (%) | CMLW Sect. Oleifera (%) | CMLW Sect. Paracamellia (%) |
|---|---|---|---|---|---|---|---|---|---|
| S1(x) | 85.00 | 5.00 | 1.70 | 60.00 | 3.30 | 8.30 | 83.30 | 6.70 | 1.60 |
| S2(x) | 10.00 | 60.00 | 18.30 | 21.70 | 60.00 | 30.00 | 8.30 | 53.30 | 16.70 |
| S3(x) | 5.00 | 35.00 | 80.00 | 18.30 | 36.70 | 61.70 | 8.40 | 40.00 | 81.70 |

ᵃ Based on different parameters, S1(x), S2(x) and S3(x) represent the cloud models of Section Tuberculata, Section Oleifera and Section Paracamellia, respectively. Data in this table show the percentage of species in each section that are merged into S1(x), S2(x) and S3(x), respectively. For each model, the three percentages of a section sum to 100%, and the entry in the section's own cloud model is its accuracy rate. CMAW: cloud models based on all available attributes; CMSW: cloud models based on attributes with small weights; CMLW: cloud models based on attributes with large weights.
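Each section's percentages in Table 2 (and in the SPSACM results of Table 4) sum to 100%. A section's accuracy rate is therefore the diagonal entry of the corresponding model matrix.

The sketch below was written only to illustrate this reading of the tables. It checks the column sums, recovers the per-section accuracies, and lists the best model for each section, reporting ties explicitly.

```python
from typing import Dict, List

Matrix = List[List[float]]  # rows: models S1(x)..S3(x); columns: true sections

SECTIONS = ["Tuberculata", "Oleifera", "Paracamellia"]

# Percentages transcribed from Table 2 (CMAW, CMSW, CMLW) and Table 4 (SPSACM).
RESULTS: Dict[str, Matrix] = {
    "CMAW": [[85.00, 5.00, 1.70], [10.00, 60.00, 18.30], [5.00, 35.00, 80.00]],
    "CMSW": [[60.00, 3.30, 8.30], [21.70, 60.00, 30.00], [18.30, 36.70, 61.70]],
    "CMLW": [[83.30, 6.70, 1.60], [8.30, 53.30, 16.70], [8.40, 40.00, 81.70]],
    "SPSACM": [[92.86, 7.14, 0.00], [0.00, 87.50, 12.50], [7.14, 5.36, 87.50]],
}


def accuracies(matrix: Matrix) -> Dict[str, float]:
    """Return the diagonal as per-section accuracy, after checking column sums."""
    for col in range(len(SECTIONS)):
        total = sum(row[col] for row in matrix)
        if abs(total - 100.0) > 0.05:
            raise ValueError(f"column {col} sums to {total}, not 100")
    return {sec: matrix[i][i] for i, sec in enumerate(SECTIONS)}


def best_models(results: Dict[str, Matrix]) -> Dict[str, List[str]]:
    """For each section, list every model that attains the top accuracy."""
    acc = {model: accuracies(m) for model, m in results.items()}
    out: Dict[str, List[str]] = {}
    for sec in SECTIONS:
        top = max(a[sec] for a in acc.values())
        out[sec] = [model for model, a in acc.items() if a[sec] == top]
    return out


if __name__ == "__main__":
    table2_only = {k: v for k, v in RESULTS.items() if k != "SPSACM"}
    print(best_models(table2_only))
    # {'Tuberculata': ['CMAW'], 'Oleifera': ['CMAW', 'CMSW'], 'Paracamellia': ['CMLW']}
    print(best_models(RESULTS))
```

Among the Table 2 models, CMAW and CMSW tie in Sect. Oleifera at 60.00%. Once the SPSACM results are included, SPSACM is best in all three sections.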

Table 3 Results of expected species selecting in three sections of genus Camellia.

| Sect. Tuberculata | C(x)ᵃ | Sect. Oleifera | C(x) | Sect. Paracamellia | C(x) |
|---|---|---|---|---|---|
| C. tuberculata | 18857.95 | C. gauchowensis | 3029.43 | C. grijsii | 8146.63 |
| C. lipingensis | 11638.94 | C. lanceoleosa | 55670.44 | C. confuse | 5825.60 |
| C. rhytidocarpa | 648.12 | C. sasanqua | 8623.32 | C. kissi | 2603.99 |
| C. rhytidophylla | 3707.03 | C. vietnamensis | 21755.15 | C. fluviatilis | 13904.80 |
| C. leyeensis | 46153.04 | C. oleifera | 1725.89 | C. brevistyla | 585.12 |
| C. anlungensis | 1335.96 | C. yuhsienensis | 3812.56 | C. hiemalis | 13965.50 |
| C. rubituberculata | 1232.12 | | | C. obtusifolia | 29335.75 |
| C. acutiperulata | 55847.40 | | | C. maliflora | 31076.10 |
| C. acuticalyx | 27271.28 | | | C. shensiensis | 9029.30 |
| C. atuberculata | 6257.15 | | | C. puniceiflora | 894.76 |
| C. obovatifolia | 2377.21 | | | C. tenii | 264855.29 |
| C. rubimuricata | 12954.09 | | | C. microphylla | 26552.68 |
| C. parvimuricata | 19120.58 | | | C. miyagii | 17238.59 |
| C. hupehensis | 10217.42 | | | C. odorata | 53747.68 |
| C. zengii | 8331.86 | | | C. phaeoclada | 98531.74 |

ᵃ C(x) values of all species are calculated with Formula (7). C. oleifera, C. brevistyla and C. rhytidocarpa, which have the minimum C(x) in their sections, are the expected species of Section Oleifera, Section Paracamellia and Section Tuberculata, respectively.
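Given the C(x) values in Table 3, picking each section's expected species is a per-section arg-min. The sketch below is illustrative only. It transcribes Table 3 and compares the result with the type species named in the text: C. tuberculata, C. oleifera and C. brevistyla.

```python
from typing import Dict

# C(x) values transcribed from Table 3.
TABLE3: Dict[str, Dict[str, float]] = {
    "Tuberculata": {
        "C. tuberculata": 18857.95, "C. lipingensis": 11638.94,
        "C. rhytidocarpa": 648.12, "C. rhytidophylla": 3707.03,
        "C. leyeensis": 46153.04, "C. anlungensis": 1335.96,
        "C. rubituberculata": 1232.12, "C. acutiperulata": 55847.40,
        "C. acuticalyx": 27271.28, "C. atuberculata": 6257.15,
        "C. obovatifolia": 2377.21, "C. rubimuricata": 12954.09,
        "C. parvimuricata": 19120.58, "C. hupehensis": 10217.42,
        "C. zengii": 8331.86,
    },
    "Oleifera": {
        "C. gauchowensis": 3029.43, "C. lanceoleosa": 55670.44,
        "C. sasanqua": 8623.32, "C. vietnamensis": 21755.15,
        "C. oleifera": 1725.89, "C. yuhsienensis": 3812.56,
    },
    "Paracamellia": {
        "C. grijsii": 8146.63, "C. confuse": 5825.60, "C. kissi": 2603.99,
        "C. fluviatilis": 13904.80, "C. brevistyla": 585.12,
        "C. hiemalis": 13965.50, "C. obtusifolia": 29335.75,
        "C. maliflora": 31076.10, "C. shensiensis": 9029.30,
        "C. puniceiflora": 894.76, "C. tenii": 264855.29,
        "C. microphylla": 26552.68, "C. miyagii": 17238.59,
        "C. odorata": 53747.68, "C. phaeoclada": 98531.74,
    },
}

# Type species as stated in the text.
TYPE_SPECIES = {"Tuberculata": "C. tuberculata", "Oleifera": "C. oleifera",
                "Paracamellia": "C. brevistyla"}


def expected_species_from_table(table: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Return the species with the minimum C(x) in each section."""
    return {sec: min(vals, key=vals.get) for sec, vals in table.items()}


if __name__ == "__main__":
    for sec, sp in expected_species_from_table(TABLE3).items():
        same = sp == TYPE_SPECIES[sec]
        note = "same as type species" if same else f"type species is {TYPE_SPECIES[sec]}"
        print(f"Sect. {sec}: {sp} ({note})")
```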

Taking Sect. Tuberculata as an example, the five attributes above of its expected species should be as close as possible to the Ex values 1157.1 μm² (AAD), 45.3 μm (LAD), 37.8 μm (WAD), 18.1 μm (TAB) and 87.0 μm (TPP). Moreover, the expected species should carry as much information about the section as possible.

Some interesting overlaps are found between the type species and the expected species (Table 3): the selected expected species C. oleifera and C. brevistyla are also the type species of Sect. Oleifera and Sect. Paracamellia (Chang, 1998), respectively.

3.3. Optimization of sections' hedges by SPSA

To optimize the classification, SPSA is applied to divide the hedges of the three sections. In this method, the two attributes with the largest weights (WAD and AAD; Table 1) of the 36 Camellia species are used as 2-D coordinates (Fig. 3A). When the total distance reaches the shortest solution, the hedges of the three sections are established as shown in Fig. 3B.

Fig. 3. The shortest path based simulated annealing algorithm on 36 Camellia species. (A) The two attributes with the largest weights of the 36 species, used as 2-D coordinates. (B) The hedges of the three sections, established when the total distance reaches the shortest solution.

For comparison with the common cloud classifier, the SPSACM models are reconstructed based on the optimized hedges (Table 4). The classification results improve markedly. The accuracy rates of Section Tuberculata, Section Oleifera and Section Paracamellia are 92.86%, 87.50% and 87.50%, respectively, compared with 85.00%, 60.00% and 80.00% for CMAW (Table 2).

Table 4 The classification results of the SPSACM method on Camellia plants (SPSACM method based on all parameters).

| Cloud model | Sect. Tuberculata (%) | Sect. Oleifera (%) | Sect. Paracamellia (%) |
|---|---|---|---|
| S1(x) | 92.86 | 7.14 | 0.00 |
| S2(x) | 0.00 | 87.50 | 12.50 |
| S3(x) | 7.14 | 5.36 | 87.50 |

S1(x), S2(x) and S3(x) represent the SPSACM models of Section Tuberculata, Section Oleifera and Section Paracamellia, respectively. Data in this table show the percentage of species in each section that are merged into S1(x), S2(x) and S3(x), respectively.

4. Discussion

In plant numerical taxonomy, floras are the most comprehensive tools for identifying and distinguishing plants (Brach & Song, 2005). In recent years, floras of large scope have been written collaboratively by many authors. Together, these authors examined thousands of plant samples and evaluated and incorporated information from dozens, or even hundreds, of publications (Wen, 1994).

The botanists editing these floras must consider two primary issues:

I. How can the “keys” included in floras enable users to identify an unknown plant validly and simply, using only a small number of plant characters/attributes (Brach & Song, 2006; Kuoh & Song, 2005)?

II. How can the editors make a section (or genus) recognizable to ordinary users through a single representative species of that section (or genus)? This species is usually, but not always, the type species.

In this paper, we seek good solutions to these two issues based on the SPSACM method. The significance of attribute weights and of the ‘expected species’ is discussed below.

4.1. The attribute weight in sort order of flora keys

Flora keys are excellent identification tools for systematic botany (Dallwitz, 1980; Dallwitz, Paine, & Zurcher, 2000; Heidorn, 2001). In printed publications, indented dichotomous keys are the most common form of identification key. In these keys, the identification (ID) process must follow a predefined path, at each step asking: does the plant have trait “A” or “B”? Each lead of a couplet (e.g., 1a, 1b) provides contrasting, diagnostic characters (Wu & Raven, 1994). The identification process involves using available or easily observed characters, and then selecting the appropriate character/attribute state or numerical value (Brach & Song, 2005).

Previous research on taxonomic keys has commonly relied on centralized databases (Brach & Song, 2005; Dallwitz, 1992; Pankhurst, 1991). It has followed the primary principle to “identify plants and other organisms by decision tree based on plentiful attributes” (Jarvie & Stevens, 1998). However, our main purpose in this work is to identify plants and other organisms using as few attributes as possible. Hence, the attributes used here must carry enough discriminative information to separate the organisms being identified.

The more similar an attribute is across taxa, the smaller its contribution (weight value) to classification (Lu et al., 2009). A flora key based on weight values can therefore present users with a list of attributes sorted by weight. Usually, the first few attributes in such a list carry most of the discriminative information; for example, WAD, AAD, TAB and TPP together have a weight of 0.6691. Such keys offer potential advantages over other types of keys when users' knowledge of taxonomy is limited and the species are difficult to differentiate, especially within the genus level.

4.2. The ‘expected species’ as an illustration of a section

Plants are commonly listed in floras within a classification system that indicates which plants are most similar or are thought to be related (Wen, 1994). For identification, the type species, which is the first discovered in a genus (or a section, etc.), is usually kept as a type specimen. However, type specimens raise several problems in practical use.

First, field investigations commonly collect specimens whose morphology does not correspond satisfactorily to any of the species currently included in the relevant genera. For example, the type specimen and Thunberg's original description of Cleyera japonica include elements of both Cleyera and Ternstroemia. Thunberg subsequently transferred C. japonica to Ternstroemia, but without resolving the problem of the name being based on two different elements (Chang, 1998).

Another factor that greatly complicates taxonomic matters is that several species were described by early authors who provided only very vague descriptions (Linnaeus, 1753; Linnæus, 1759). For some of them, no type specimens or other authentic specimens are available (Rindi, Lam, & López-Bautista, 2009). For these reasons, we need an evolving illustration similar to the type specimen.

The SPSACM method presented here is a comprehensive approach to three tasks in plant numerical taxonomy: determining attribute weights, setting section hedges, and selecting ‘expected species’. Once the ‘expected species’ is selected, the assignment of a divergent species is no longer an endless dispute. Its membership degree can be calculated with the SPSACM method and the new illustration. This implies that the selected expected species can also indicate their sections. In other words, the SPSACM method could be a good tool for resolving disagreements at the section level in the numerical taxonomy of genus Camellia.

For example, consider the distance between two sections, derived from $D_t$ (i.e., from $N_{ij}$ and $E_{ij}$) in Formula (2). On this basis, we suggest that C. puniceiflora of Sect. Paracamellia be merged into Sect. Oleifera, since it is assigned to S2(x) and lies far from the Ex value of S3(x). For the same reason, C. lanceoleosa and C. vietnamensis are merged into Sect. Paracamellia. What's more, this method could also be used

Here we suggest again that the ‘expected species’ serve as an illustration in plant numerical taxonomy. Although it may seem similar to the type species of traditional taxonomy, the two differ in significant ways, and sometimes they are not even the same species within a section. In Sect. Tuberculata, for example, the expected species selected by SPSACM is C. rhytidocarpa, whereas the type species is C. tuberculata. Our purpose is to exploit the advantages of its expected characters in numerical taxonomy.

5. Conclusion

The proposed SPSACM method, based on attribute similarity, combines the cloud model with the simulated annealing algorithm. Our experiments show that SPSACM gives higher classification accuracy than the common cloud classifier (Tables 2 and 4).

Second, we strongly recommend using attribute weight values when establishing flora keys.

Third, we again propose the new term ‘expected species’: the included species that has the minimum sum of weighted squared deviations from the expected values of all the attributes. Given its good performance in this study, we suggest it could be used as an illustration in plant numerical taxonomy.

Finally, the SPSACM models include most of the available essential attributes of Camellia plants, and the results of these applications extend our general understanding of many classification problems.

Acknowledgements

The authors would like to thank Y.F. Huang and L.J. Ma for substantial help in data collection. Funding from the Innovation Fund for the Master's Academe of Zhejiang Normal University is also gratefully acknowledged.

References

Brach, A. R., & Song, H. (2005). ActKey: A Web-based interactive identification key program. Taxon, 54(4), 1041–1046.

Brach, A. R., & Song, H. (2006). eFloras: New directions for online floras exemplified by the Project. Taxon, 55(1), 188–192.

Chang, H. T. (1998). In Delectis Florae Reipublicae Popularis Sinicae Agendae Academiae Sinicae Edita (Ed.), Flora Reipublicae Popularis Sinica (Vol. 49(3), pp. 11–46). Beijing: Science Press.

Dallwitz, M. J. (1980). A general system for coding taxonomic information. Taxon, 29, 41–46.

Dallwitz, M. J. (1992). A comparison of matrix-based taxonomic identification systems with rule-based systems. In F. L. Xiong (Ed.), Proceedings of IFAC workshop on expert systems in agriculture (pp. 21–58). Beijing: International Academic Publishers. http://www.deltaintkey.co.

Dallwitz, M. J., Paine, T. A., & Zurcher, E. J. (2000). Principles of interactive keys.

Di, K. C., Li, D. Y., & Li, D. R. (1998a). Knowledge representation and discovery in spatial databases based on cloud theory. International Archives of Photogrammetry and Remote Sensing, 32, 544–551.

Di, K. C., Li, D. R., & Li, D. Y. (1998b). Intelligent query in spatial databases based on cloud model. In Li Deren et al. (Eds.), Spatial information science, technology and its applications (pp. 437–445). Wuhan, China: WTU SM Press.

Dong, M., Wu, C., & Hou, F. (2009). Shortest path based simulated annealing algorithm for dynamic facility layout problem under dynamic business environment. Expert Systems with Applications, 36, 11221–11232.

Heidorn, P. B. (2001). A tool for multipurpose use of online flora and fauna: The biological information browsing environment (BIBE).

International Code of Botanical Nomenclature (2000). The 16th international botanical congress, St Louis, Missouri, July–August 1999. International Association for Plant Taxonomy, Articles 10.1, 8.1 and 10.4 (Electronic version of the original English text).

Jarvie, J. K., & Stevens, P. F. (1998). Interactive keys, inventory, and conservation. Conservation Biology, 12, 222–224.

Kuoh, C. S., & Song, H. (2005). Interactive key to Taiwan grasses using characters of leaf anatomy – The ActKey approach. Taiwania, 50, 261–271.

Li, D. Y., Di, K. C., Li, D. R., & Shi, X. M. (1998). Mining association rules with linguistic cloud models. In PAKDD-98, the 2nd Pacific-Asia conference on knowledge discovery and data mining, Melbourne, Australia, 15–17 April 1998 (pp. 392–394). Heidelberg: Springer-Verlag.

Li, D. Y., Han, J., Chan, E., & Shi, X. M. (1997). Knowledge representation and discovery based on linguistic atoms. In H. J. Lu & H. Motoda (Eds.), KDD: Techniques and applications (pp. 3–20). London: World Scientific Press.

Li, D. Y., Han, J. W., Shi, X. M., & Chan, M. C. (1998). Knowledge representation and discovery based on linguistic atoms. Knowledge-based Systems, 10, 431–440.

Lin, X. Y., Peng, Q. F., Lv, H. F., Du, Y. Q., & Tang, B. Y. (2008). Leaf anatomy of Sect. Oleifera Chang and Sect. Paracamellia Sealy and its taxonomic significance. Journal of Systematics and Evolution, 46(2), 183–193 (formerly Acta Phytotaxon. Sin.).

Linnaeus, C. (1753). Species plantarum (Vol. 2, pp. 561–1200). Holmiae (Stockholm): Impensis Laurentii Salvii.

Linnæus, C. (1759). Systema naturæ per regna tria naturæ, secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis. Tomus II. Editio decima, reformata (pp. 825–1384). Holmiæ: Salvius.

Lu, H. F., Jiang, B., Shen, Z. G., Shen, J. B., Peng, Q. F., & Cheng, C. G. (2008). Comparative leaf anatomy, FTIR discrimination and biogeographical analysis of Camellia section Tuberculata (Theaceae) with a discussion of its taxonomic treatments. Plant Systematics and Evolution, 274, 223–235.

Lu, H. F., Pi, E. X., Peng, Q. F., Wang, L. L., & Zhang, C. J. (2009). A particle swarm optimization-aided fuzzy cloud classifier applied for plant numerical taxonomy based on attribute similarity. Expert Systems with Applications, 36(5), 9388–9397.

Pankhurst, R. J. (1991). Practical taxonomic computing. Cambridge: Cambridge University Press.

Pi, E. X., Peng, Q. F., Lu, H. F., Shen, J. B., Du, Y. Q., Huang, F. L., et al. (2009). Leaf morphology and anatomy of section Camellia (Theaceae). Botanical Journal of the Linnean Society, 159, 456–476.

Rindi, F., Lam, W. D., & López-Bautista, M. J. (2009). Phylogenetic relationships and species circumscription in Trentepohlia and Printzina (Trentepohliales, Chlorophyta). Molecular Phylogenetics and Evolution, 52, 329–339.

Wen, X. Y. (1994). Interactive key for families of Chinese angiosperms. Flora of China (online ed.). St. Louis, USA: Missouri Botanical Garden Press; Beijing, China: Science Press.

Wu, Z. Y., & Raven, P. H. (1994). Flora of China. Beijing: Science Press; St. Louis: Missouri Botanical Garden Press.