Expert Systems with Applications 38 (2011) 3009–3014
journal homepage: www.elsevier.com/locate/eswa

Precise plant classification within genus level based on simulated annealing aided cloud classifier

Erxu Pi a,b, Hongfei Lu a,⇑, Bo Jiang a,c, Jian Huang a, Qiufa Peng a, Xiuyan Lin a

a College of Chemistry and Life Science, Zhejiang Normal University, Jinhua 321004, China
b Department of Biology and State (China) Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong, PR China
c Changshu Institute of Technology, Changshu 215500, Jiangsu Province, China

Keywords: Plant numerical taxonomy; Quantitative attributes; Shortest path based simulated annealing algorithm; Cloud model; Genus Camellia

Abstract

This is one of a series of studies on plant numerical taxonomy. It provides a precise classification method for the description, discrimination, and review of proposals for new or revised plant species to be recognized as taxon units within the genus level. We first used all the available quantitative attributes to build cloud models for different sections. Then, the shortest path based simulated annealing algorithm (SPSA) was applied to optimize these models. After that, the optimized models were validated with the previously used quantitative attribute data. Results showed that the accuracy rates of the cloud models for Sect. Tuberculata, Sect. Oleifera and Sect. Paracamellia were 85.00%, 60.00% and 80.00%, respectively. We also found interesting overlaps between the type species and the 'expected species': the selected expected species Camellia oleifera and Camellia brevistyla are also the type species of Sect. Oleifera and Sect. Paracamellia, respectively. We suggest that the expected species serve as an illustration in plant numerical taxonomy.
Based on the simulated annealing aided cloud classifier, the taxon hedges associated with 'expected species' were set to advance our common understanding of sections and to improve our capability to recognize and discriminate plant species. These procedures provide a dynamic and practical way to publish new or revised descriptions of species and sections.

© 2010 Elsevier Ltd. All rights reserved.

⇑ Corresponding author.
doi:10.1016/j.eswa.2010.08.090

1. Introduction

Cloud theory is now a popular theory for handling uncertainty, based on the uncertain transition between qualitative concepts and quantitative descriptions (Li, Di, Li, & Shi, 1998; Li, Han, Chan, & Shi, 1997; Li, Han, Shi, & Chan, 1998). Based on this theory, the cloud classifier has been developed in recent years for adaptive linguistic hedges (Lu, Pi, Peng, Wang, & Zhang, 2009). This classifier represents a qualitative concept with three digital characteristics, expected value Ex, entropy r and deviation D (Di, Li, & Li, 1998a, 1998b), which integrate the fuzziness and randomness of a linguistic term in a unified way. Our previous work (Lu et al., 2009) applied this classifier to plant numerical taxonomy, using the particle swarm optimization algorithm (PSO) to optimize the sections' hedges. However, there is still room for improvement, as the accuracy rates are not high enough. In this work, we apply the shortest path based simulated annealing algorithm (SPSA) to optimize the cloud classifier.

SPSA addresses a new kind of dynamic multi-stage facility layout problem under a dynamic business environment; in the taxonomic setting, new species may be added to, or old species removed from, their original taxa. Since every section (or genus) has its own range, the distances of species to their 'expected species' (Lu et al., 2009) in each section (or genus) should reach their global minimum before the best classification is obtained (Pi et al., 2009). The hedges problem can then be converted into a shortest path problem by studying its distance function and the heuristic rules for adding and removing species, and a corresponding mathematical model can be established for it (Dong, Wu, & Hou, 2009). Hence, SPSA may perform well in optimizing the cloud classifier.

In this research, the shortest path based simulated annealing aided cloud classifier (SPSACM) method is used for plant classification by analyzing leaf morphology and anatomy data, part of which come from our previous work (Lin, Peng, Lv, Du, & Tang, 2008; Lu et al., 2008). Our purpose is to provide a basic tool for plant taxonomy.

2. Materials and methods

2.1. Materials

Adult leaves fully exposed to sunlight were collected from plants of the genus Camellia in the International Camellia Species Garden in Jinhua city. The material includes 15 species in Section Paracamellia Sealy: Camellia grijsii Hance, Camellia confuse Craib, Camellia kissi Wall., Camellia fluviatilis Hand.-Mazz., Camellia brevistyla Coh. St., Camellia hiemalis Nakai, Camellia obtusifolia Chang, Camellia maliflora Lindl, Camellia shensiensis Chang, Camellia puniceiflora Chang, Camellia tenii Sealy, Camellia microphylla (Merr.) Chien, Camellia miyagii, Camellia odorata, Camellia phaeoclada Chang; 6 species in Section Oleifera Chang: Camellia gauchowensis Chang, Camellia lanceoleosa Chang, Camellia sasanqua Thunb., Camellia vietnamensis Huang ex Hu, C. oleifera Abel, Camellia yuhsienensis; and 15 species in Section Tuberculata Chang: Camellia tuberculata Chien, Camellia lipingensis Chang, Camellia rhytidocarpa Chang and Liang, Camellia rhytidophylla Y. K. Li and M.Z. Yang, Camellia leyeensis Chang, Camellia anlungensis Chang, Camellia rubituberculata Chang, Camellia acutiperulata Chang and Ye, Camellia acuticalyx Chang, Camellia atuberculata Chang, Camellia obovatifolia Chang, Camellia rubimuricata Chang and Z.R. Xu, Camellia parvimuricata Chang, Camellia hupehensis Chang, Camellia zengii H.T. Chang. The specimens are deposited in the Zhejiang Normal University (ZJNU) herbarium.

2.2. Data collection

For expedient analysis in each attribute model, the sample size of one section is assigned artificially (Fig. 1). For example, when $s_i$ is the assigned sample size of one section and $d_i$ is the number of species in the $i$th section, the replicate number $r_i$ ($r_i \ge 3$) of each species in the $i$th section is

$$r_i = \frac{s_i}{d_i}.$$

Fig. 1. The whole data structure used in the SPSACM. $S_i(x)$ represents the $i$th section, which contains $(s/r_i)$ species.

2.3. The shortest path based simulated annealing algorithm

The result of the dynamic species selection and mergence (classification) problem is a series of selections. Let $L_0$ be the initial selection. $(L_{11}, L_{12}, \ldots, L_{1m})$ represents the $m$ feasible selections chosen by some heuristic rules in planning stage 1, and $(L_{21}, L_{22}, \ldots, L_{2m})$ and $(L_{n1}, L_{n2}, \ldots, L_{nm})$ are the $m$ feasible selections chosen by some heuristic rules in planning stage 2 and stage $n$, respectively. In order to construct a shortest path optimization problem, a virtual node $L_e$ is added at the rear of the $n$th period. Assume that selections in the $n$th period can be transferred into the virtual mergence. The dynamic species selection and mergence problem then becomes an $L_0 \to L_e$ shortest path problem. Define the 0–1 variable $N_{ij}$ as follows:

$$N_{ij} = \begin{cases} 0, & \text{if nodes } i \text{ and } j \text{ are directly connected} \\ 1, & \text{else} \end{cases} \tag{1}$$

Then the mathematical model for the shortest path based dynamic species selection and mergence problem becomes:

$$\min D_t = \sum_{ij} N_{ij} E_{ij} \tag{2}$$

where $D_t$ is the total distance and $N_{ij}$ is a 0–1 variable indicating the direct connection relationship between nodes $i$ and $j$. $E_{ij}$ represents the distance of the edge between nodes $i$ and $j$, and $m$ is the number of feasible selections chosen by some heuristics in each stage. Let $E$ and $X$ be the sets of all the edges and nodes, respectively. Let $P$ be the set of all the nodes except the virtual node and the nodes at the last period, $Q$ be the set of all the nodes except the initial node and the nodes at the first period, and $M$ be the set of all the nodes from the second stage to the $(n-1)$th stage. Then we have $P \cup Q = X$ and $P \cap Q = M$.

$$N_{ij} \in \{0, 1\}, \quad \forall (i, j) \in E \tag{3}$$

This means that there is at most one connecting edge between any two nodes.

$$\sum_{j} N_{ij} = \sum_{i} N_{ij} = m, \quad \forall (i, j) \in E,\ \text{node } i \in P,\ \text{node } j \in Q \tag{4}$$

This constraint indicates that, for every node $i$ in $P$, there are $m$ outgoing edges. Similarly, for every node $j$ in $Q$, there are $m$ incoming edges.

$$E_{ij} \ge 0, \quad \forall (i, j) \in E \tag{5}$$

$$\sum_{j} E_{ij} = 0, \quad \text{if node } j \text{ is the virtual node } L_e \tag{6}$$

Constraint (5) means that the values of edges are not less than 0. Constraint (6) says that the selections in the last stage can be transformed into a virtual selection $L_e$.

2.4. Expected species in three sections of genus Camellia

In traditional plant taxonomy, the type species is usually the type of an included species, in which case it can be indicated by the name of this species (International Code of Botanical Nomenclature, 2000). For convenience of analysis, we temporarily define a new term in numerical plant taxonomy, the 'expected species': an included species that has the minimum sum of weighted squared deviations from the expected values of all the attributes. The expected species could thus best exemplify the essential characteristics of the genus to which it belongs. Accordingly, the expected species can be selected by Formula (7) in the first step (Lu et al., 2009):

$$C(x) = \sum_{k=1}^{n} w(X_k)\,(x_{kl} - Ex_k)^2 \tag{7}$$

and C shows visible differences in the cloud models with different attribute base. Fig. 2D displays all available linguistic atoms generated by a series of linguistic atom generators in this research. These generators have different rules for the analysis of quantitative and qualitative data.

Cloud models in Fig. 2B, which are based on attributes with small weights (CMSW), combined with WAB and AAB, get faint hedges (Table 1).
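As a rough illustration of how Formula (7) in Section 2.4 could be evaluated, the following Python sketch scores each species in a section and picks the one with the smallest weighted squared deviation. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `expected_species`, the toy species labels and attribute values, the equal weights, and the choice to estimate each expected value Ex_k as the plain mean of attribute k over the section are all hypothetical.

```python
def expected_species(data, weights):
    """Score species with Formula (7) and return the minimizer.

    data:    dict mapping a species label to its list of n attribute values
    weights: list of n attribute weights w(X_k)
    """
    names = list(data)
    n = len(weights)
    # Assumption: Ex_k is estimated as the mean of attribute k over the section.
    ex = [sum(data[s][k] for s in names) / len(names) for k in range(n)]
    # C(x) = sum_k w(X_k) * (x_kl - Ex_k)^2 for each species l.
    scores = {
        s: sum(weights[k] * (data[s][k] - ex[k]) ** 2 for k in range(n))
        for s in names
    }
    # The expected species is the one with the minimum score.
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical toy section with three species and two attributes.
section = {
    "sp_a": [1.0, 10.0],
    "sp_b": [2.0, 20.0],
    "sp_c": [6.0, 30.0],
}
weights = [0.5, 0.5]  # hypothetical equal weights
best, scores = expected_species(section, weights)
print(best, scores)
```

With these toy values the attribute means are 3.0 and 20.0, so `sp_b` lies closest to the expected values and is returned as the expected species. In practice the weights and expected values would come from the fitted cloud models rather than from a simple mean.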