An Automatic Leaf-Based Identification Pipeline for Plant
Zhang et al. Horticulture Research (2021) 8:172
https://doi.org/10.1038/s41438-021-00608-w
www.nature.com/hortres
ARTICLE Open Access

MFCIS: an automatic leaf-based identification pipeline for plant cultivars using deep learning and persistent homology

Yanping Zhang1, Jing Peng1, Xiaohui Yuan1,2, Lisi Zhang3, Dongzi Zhu3, Po Hong3, Jiawei Wang3, Qingzhong Liu3 and Weizhen Liu1,2 ✉

Abstract
Recognizing plant cultivars reliably and efficiently can benefit plant breeders in terms of property rights protection and innovation of germplasm resources. Although leaf image-based methods have been widely adopted in plant species identification, they seldom have been applied in cultivar identification due to the high similarity of leaves among cultivars. Here, we propose an automatic leaf image-based cultivar identification pipeline called MFCIS (Multi-feature Combined Cultivar Identification System), which combines multiple leaf morphological features collected by persistent homology and a convolutional neural network (CNN). Persistent homology, a multiscale and robust method, was employed to extract the topological signatures of leaf shape, texture, and venation details. A CNN-based algorithm, the Xception network, was fine-tuned for extracting high-level leaf image features. For fruit species, we benchmarked the MFCIS pipeline on a sweet cherry (Prunus avium L.) leaf dataset with >5000 leaf images from 88 varieties or unreleased selections and achieved a mean accuracy of 83.52%. For annual crop species, we applied the MFCIS pipeline to a soybean (Glycine max L. Merr.) leaf dataset with 5000 leaf images of 100 cultivars or elite breeding lines collected at five growth periods. The identification models for each growth period were trained independently, and their results were combined using a score-level fusion strategy.
The classification accuracy after score-level fusion was 91.4%, which is much higher than the accuracy when utilizing each growth period independently or mixing all growth periods. To facilitate the adoption of the proposed pipelines, we constructed a user-friendly web service, which is freely available at http://www.mfcis.online.

Correspondence: Weizhen Liu ([email protected])
1School of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei, China
2Chongqing Research Institute, Wuhan University of Technology, Chongqing, China
Full list of author information is available at the end of the article

© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Introduction
Plant cultivar identification is a fundamental field of interest for plant breeding in terms of cultivar rights protection and the continuous breeding of new cultivars. The conventional method for cultivar identification usually relies on the expertise and empirical knowledge of crop breeders and has unpredictable accuracy and low efficiency. In recent decades, diverse molecular markers have been employed for plant variety identification [1–4]. Nevertheless, the accuracies of these methods mainly depend on the amount of genetic differentiation across the cultivars evaluated and the number and type of molecular markers selected in the analysis system [2]. These methods involve complicated, time-consuming laboratory procedures and are not well suited for use by ordinary farmers. Therefore, rapid, stable, and accurate methods for plant cultivar identification are required.

A computer vision pipeline based on morphological features extracted from leaf images is an ideal option. Compared with the other organs, plant leaves have more general and lasting characteristics in terms of shape, texture, and venation and therefore have been successfully used for plant species identification/plant taxonomy [5,6]. To date, many image processing-based algorithms and software, such as Leaf GUI [7], Rosette Tracker [8], Leaf-GP [9], MorphoLeaf [10], and Phenotiki [11], that measure leaf morphological phenotypes have been reported. However, a majority of the phenotypes extracted using these algorithms are not descriptive enough to discriminate leaves from different cultivars. Moreover, there are some leaf image-based plant species identification pipelines that use features of leaf shape, texture, and venation independently [12–15] or combine them [16–19]. Unfortunately, when transferring these methods from species identification to cultivar identification, most of them failed due to the high similarity of leaves among different cultivars. Recently, a deep convolutional neural network (DCNN)-based identification method for apple cultivars was reported and tested on an apple leaf image dataset consisting of more than 10,000 images from 14 cultivars [20]. The unprecedented accuracy demonstrated the effectiveness of CNNs in cultivar recognition. However, the DCNN-based method required a large number of leaf images for each cultivar to train the model. A backpropagation neural network-based cultivar identification method was proposed for oleander identification. Using 18 global leaf characters (morphology and color) extracted from 22 cultivars, it only achieved an average accuracy of approximately 54.55% [21]. Its accuracy might be improved by adding more local leaf features into the model, but the computation speed would be sacrificed. The multi-orientation region transform method was developed to characterize leaf contour and vein structure features simultaneously for soybean cultivar identification [22]. It achieved an accuracy of ~54.2% in the recognition of 100 soybean cultivars. This study was the first to investigate large-scale cultivar classification with leaf images and proved the effectiveness of leaf venation features in cultivar classification. However, the drawback is that it requires vein images with primary and secondary vein annotations, which is quite labor intensive.

These challenges of leaf image-based cultivar identification models motivated us to develop a cultivar identification pipeline with the following characteristics: (1) Automation: the leaf images can be fed directly into the pipeline as the input, and the cultivar identification results are the output. (2) Multi-feature combination: the leaf morphological features are comprehensively mea-

[…] of leaf shape, texture, and venation both locally and globally. Topology is a branch of mathematics that concerns the spatial properties preserved under deformations that disallow tearing apart or reattaching [23,24]. The core idea of PH is to track the appearance and disappearance of holes in topological space at multiple scales. The obtained features are a set of points in the plane, called the persistence diagram [25] (PD), which can characterize structural properties and reflect spatial information in an invariant way [26]. Compared to the traditional feature extractors and CNN-based methods, PH is more resilient to local perturbations and scale variance [25,27]. This is because although the shapes of connected components and holes may change under geometric transformations, the number of components and holes remains the same [28]. Therefore, it is recognized that the incorporation of morphological features extracted by PH into CNN-based models can provide a significant performance improvement. These ideas have been validated in a wide range of computer vision and biomedical image processing studies, such as automated tumor segmentation [29,30], 3D surface texture analysis [31], and image classification [27].

In this study, we propose an automatic, leaf image-based plant cultivar classification pipeline, called MFCIS (Multi-feature Combined Cultivar Identification System), that combines morphological features of the leaf shape, texture, and venation extracted by PH and the high-level image features extracted by the fine-tuned Xception [32] model. We primarily tested the proposed pipeline on a sweet cherry leaf cultivar dataset with >5000 leaf images at the same growth stage derived from 88 commercial cultivars or unreleased selections. To assess the generalization ability of the proposed pipeline, we also applied MFCIS to a soybean leaf cultivar dataset consisting of 5000 leaf images of 100 cultivars. Unlike the sweet cherry dataset, leaves in the soybean dataset were collected from five different growth periods. Therefore, we attempted cultivar identification in three ways: using leaves from each period independently, using mixed leaves from all growth periods without taking the growth period into account, and using leaves from each period independently followed by a fusion of the prediction results.

Materials and methods
Plant materials
Sweet cherry cultivar dataset
Sweet cherry leaves