Automated Processing and Identification of Benthic Invertebrate Samples

J. N. Am. Benthol. Soc., 2010, 29(3):867–874
© 2010 by The North American Benthological Society
DOI: 10.1899/09-080.1
Published online: 8 June 2010

David A. Lytle1, Gonzalo Martínez-Muñoz2, Wei Zhang2, Natalia Larios3, Linda Shapiro3,4, Robert Paasch5, Andrew Moldenke6, Eric N. Mortensen2, Sinisa Todorovic2, AND Thomas G. Dietterich2

1 Department of Zoology, Oregon State University, Corvallis, Oregon 97331 USA
2 School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon 97331 USA
3 Department of Electrical Engineering, University of Washington, Seattle, Washington 98195 USA
4 Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195 USA
5 Department of Mechanical Engineering, Oregon State University, Corvallis, Oregon 97331 USA
6 Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331 USA

Abstract. We present a visually based method for the taxonomic identification of benthic invertebrates that automates image capture, image processing, and specimen classification. The BugID system automatically positions and images specimens with minimal user input. Images are then processed with interest operators (machine-learning algorithms for locating informative visual regions) to identify informative pattern features, and this information is used to train a classifier algorithm. Naïve Bayes modeling of stacked decision trees is used to determine whether a specimen is an unknown distractor (taxon not in the training data set) or one of the species in the training set. When tested on images from 9 larval stonefly taxa, BugID correctly identified 94.5% of images, even though small or damaged specimens were included in testing. When distractor taxa (10 common invertebrates not present in the training set) were included to make classification more challenging, overall accuracy decreased but generally was close to 90%. At the equal error rate (EER), 89.5% of stonefly images were correctly classified and the accuracy of nonrejected stoneflies increased to 96.4%, a result suggesting that many difficult-to-identify or poorly imaged stonefly specimens had been rejected prior to classification. BugID is the first system of its kind that allows users to select thresholds for rejection depending on the required use. Rejected images of distractor taxa or difficult specimens can be identified later by a taxonomic expert, and new taxa ultimately can be incorporated into the training set of known taxa. BugID has several advantages over other automated insect classification systems, including automated handling of specimens, the ability to isolate nontarget and novel species, and the ability to identify specimens across different stages of larval development.

Key words: bioassessment, water quality, automated insect identification, bioindicators, Plecoptera.

Automated taxonomic identification is an area of rapid innovation, and computer vision methods are being developed for visual classification of spiders (Do et al. 1999), butterflies (Watson et al. 2004, Bhanu et al. 2008), plants (Clark 2007), plankton (Rodenacker et al. 2006), and other groups (reviewed in MacLeod 2007). Many of these approaches achieve high levels of accuracy with specific data sets, but considerable challenges remain before automated classification can be deployed on a large scale. First, automated methods for specimen handling and image processing are needed in addition to automated image classification. Most approaches currently rely on significant input from human users, such as landmarking of morphological features on each image by hand, which considerably slows the classification process. This step requires too much time to be practical for insect studies that might have hundreds of specimens per sample. Second, new methods must be able to recognize and reject novel species that lie outside their training set of known species. With current approaches, samples must be prescreened to ensure that nontarget specimens are removed, a requirement that necessitates some amount of a priori specimen classification and that undercuts the utility of automated classification systems in the first place. Last, methods must be robust to variability arising from ontogenetic changes within species, damaged specimens, and variable imaging conditions.

These challenges are especially apparent in benthic invertebrate samples collected as part of bioassessment projects. A benthic invertebrate sample can be collected in a matter of minutes, but sample processing might require hours or even days of laboratory work. Thus, an automated method for benthic invertebrate sample processing and classification could greatly increase the number of samples used for monitoring and conservation efforts. Sample processing usually involves separating target taxa from detritus (fragments of leaves, sand, and other debris) and nontarget species and then classification of specimens to an appropriate taxonomic level (usually genus or species), so the ability to isolate unknown specimens is important. Last, invertebrate specimens span a large range of ontogenetic diversity because most taxa are collected during the larval stage, and specimens are often bent, broken, and discolored with sediment.

Our BugID approach automates specimen handling, image capture, and image classification into a single process (Sarpola et al. 2008). The method is not fully integrated from sample jar to identified specimens, but we have automated several key steps. We begin with an apparatus that automatically positions specimens under a microscope and captures images. Images are then rescaled and processed to remove background noise. The classification method uses feature-based classification techniques developed by our group (Larios et al. 2007, Mortensen et al. 2007, Martínez-Muñoz 2009). Rather than focus on specific diagnostic features typically used by human taxonomists, this feature-based approach applies machine learning techniques to find multiple regions of interest within images that are informative for species classification. These regions are analyzed simultaneously to provide a classification; the method is similar to an experienced taxonomist integrating many subtle visual cues to sight-identify a specimen by gestalt, or overall appearance. This approach minimizes reliance on specific diagnostic characters that might require manipulation, dissection, or other labor-intensive handling.

Here we describe the general BugID approach. We trained it using images from 9 larval stonefly taxa commonly found in Pacific Northwest streams and rivers, and then tested its ability to classify images from novel specimens of these 9 taxa. We then tested BugID's accuracy in the context of distractors, images from 10 common invertebrates that were not in the training set.

Methods

Sample collection and processing

Benthic macroinvertebrates were collected from Oregon rivers and streams with standard Oregon Department of Environmental Quality protocols (Hafele and Mulvey 1998). Collection sites were distributed across several biotic provinces, including temperate coastal rainforest, alpine forest, high desert, and valley floodplain. In the laboratory, specimens were separated and identified to genus or species. All larval stages and damaged specimens (missing legs, antennae) were included, and each specimen was identified independently by 2 taxonomists. All specimens were kept in individual vials, assigned unique numbers, and accessioned into the Oregon State Arthropod Collection after imaging (OSAC lot #0278).

Our main focus was on the larval stage of 9 stonefly taxa spanning 7 families. Each taxon was common in streams in the Pacific Northwest: Calineuria californica, Doroneuria baumanni, Hesperoperla pacifica (Perlidae), Isoperla sp. (Perlodidae), Moselia infuscata (Leuctridae), Pteronarcys sp. (Pteronarcyidae), Sweltsa sp. (Chloroperlidae), Yoraperla sp. (Peltoperlidae), and Zapada sp. (Nemouridae). These names were abbreviated in our analyses as CAL, DOR, HES, ISO, MOS, PTE, SWE, YOR, and ZAP, respectively. We obtained ~100 specimens for most species, except Moselia (24 specimens) and Pteronarcys (45 specimens).

To test the ability of the algorithms to discern known stonefly species from unknown or novel specimens, we used larval specimens of 10 distractor taxa that are commonly found in benthic invertebrate samples (10 specimens per taxon): the ephemeropterans Baetis (Baetidae), Ameletus (Ameletidae), Ephemerella (Ephemerellidae), Caudatella (Ephemerellidae), Ironodes (Heptageniidae), and Rhithrogena (Heptageniidae); the trichopterans Brachycentrus (Brachycentridae) and Neophylax (Uenoidae); the plecopteran Taenionema (Taeniopterygidae); and the amphipod Hyalella (Hyalellidae). These specimens were considered distractors in the sense that we did not include [...]

[...] region as a vector of 128 numerical values that are approximately viewpoint-invariant and illumination-invariant. The classification system was built by [...]
