Machine Vision and Applications manuscript No. (will be inserted by the editor)

Automated Insect Identification through Concatenated Histograms of Local Appearance Features

Feature Vector Generation and Region Detection for Deformable Objects

Enrique Larios1, Hongli Deng3, Wei Zhang3, Matt Sarpola4, Jenny Yuen7, Robert Paasch4, Andrew Moldenke5, David Lytle6, Salvador Ruiz Correa8, Eric Mortensen3, Linda Shapiro2, Tom Dietterich3

1 University of Washington, Department of Electrical Engineering
2 University of Washington, Department of Computer Science and Engineering
3 Oregon State University, School of Electrical Engineering and Computer Science
4 Oregon State University, Department of Mechanical Engineering
5 Oregon State University, Department of Botany and Plant Pathology
6 Oregon State University, Department of Zoology
7 Massachusetts Institute of Technology, Computer Science and AI Laboratory
8 Children's National Medical Center, Department of Diagnostic Imaging and Radiology

Received: date / Revised: date

Abstract This paper describes a computer vision approach to automated rapid-throughput taxonomic identification of stonefly larvae. The long-term goal of this research is to develop a cost-effective method for environmental monitoring based on automated identification of indicator species. Recognition of stonefly larvae is challenging because they are highly articulated, they exhibit a high degree of intraspecies variation in size and color, and some species are difficult to distinguish visually, despite prominent dorsal patterning. The stoneflies are imaged via an apparatus that manipulates the specimens into the field of view of a microscope so that images are obtained under highly repeatable conditions. The images are then classified through a process that involves (a) identification of regions of interest, (b) representation of those regions as SIFT vectors [1], (c) classification of the SIFT vectors into learned "features" to form a histogram of detected features, and (d) classification of the feature histogram via state-of-the-art ensemble classification algorithms. Steps (a) to (c) compose the concatenated feature histogram (CFH) method. We apply three region detectors for part (a), including a newly developed principal curvature-based region (PCBR) detector. This detector finds stable regions of high curvature via a watershed segmentation algorithm. We compute a separate dictionary of learned features for each region detector, and then concatenate the histograms prior to the final classification step.

We evaluate this classification methodology on the task of discriminating among four stonefly taxa, two of which, Calineuria and Doroneuria, are difficult even for experts to discriminate. The results show that the combination of all three detectors gives four-class accuracy of 82% and three-class accuracy (pooling Calineuria and Doroneuria) of 95%. Each region detector makes a valuable contribution. In particular, our new PCBR detector discriminates Calineuria and Doroneuria much better than the other detectors.

Key words classification, object recognition, interest operators, region detectors, SIFT descriptor

Send offprint requests to: Tom Dietterich, 1145 Kelley Engineering Center, Corvallis, OR 97331-5501, USA, [email protected]

1 Introduction

There are many environmental science applications that could benefit from inexpensive computer vision methods for automated population counting of insects and other small arthropods. At present, only a handful of projects can justify the expense of having expert entomologists manually classify field-collected specimens to obtain measurements of arthropod populations. The goal of our research is to develop general-purpose computer vision methods, and associated mechanical hardware, for rapid-throughput image capture, classification, and sorting of small arthropod specimens. If such methods can be made sufficiently accurate and inexpensive, they could have a positive impact on environmental monitoring and ecological science [2–4].
The focus of our initial effort is the automated recognition of stonefly (Plecoptera) larvae for the biomonitoring of freshwater stream health. Stream quality measurement could be significantly advanced if an economically practical method were available for monitoring insect populations in stream substrates. Population counts of stonefly larvae and other aquatic insects inhabiting stream substrates are known to be a sensitive and robust indicator of stream health and water quality [5]. Because these insects live in the stream, they integrate water quality over time. Hence, they provide a more reliable measure of stream health than single-time-point chemical measurements. Aquatic insects are especially useful as biomonitors because (a) they are found in nearly all running-water habitats, (b) their large species diversity offers a wide range of responses to water quality change, (c) the taxonomy of most groups is well known and identification keys are available, (d) responses of many species to different types of pollution have been established, and (e) data analysis methods for aquatic insect communities are available [6]. Because of these advantages, biomonitoring using aquatic insects has been employed by federal, state, local, tribal, and private resource managers to track changes in river and stream health and to establish baseline criteria for water quality standards. Collection of aquatic insect samples for biomonitoring is inexpensive and requires relatively little technical training. However, the sorting and identification of insect specimens can be extremely time consuming and requires substantial technical expertise. Thus, aquatic insect identification is a major technical bottleneck for large-scale implementation of biomonitoring.

Larval stoneflies are especially important for biomonitoring because they are sensitive to reductions in water quality caused by thermal pollution, eutrophication, sedimentation, and chemical pollution. On a scale of organic pollution tolerance from 0 to 10, with 10 being the most tolerant, most stonefly taxa have a value of 0, 1, or 2 [5]. Because of their low tolerance to pollution, change in stonefly abundance or taxonomic composition is often the first indication of water quality degradation. Most biomonitoring programs identify stoneflies to the taxonomic resolution of Family, although when expertise is available Genus-level (and occasionally Species-level) identification is possible. Unfortunately, because of constraints on time, budgets, and availability of expertise, some biomonitoring programs fail to resolve stoneflies (as well as other taxa) below the level of Order. This results in a considerable loss of information and, potentially, in the failure to detect changes in water quality.

In addition to its practical importance, the automated recognition of stoneflies raises many fundamental computer vision challenges. Stonefly larvae are highly-articulated objects with many sub-parts (legs, antennae, tails, wing pads, etc.) and many degrees of freedom. Some taxa exhibit interesting patterns on their dorsal sides, but others are not patterned. Some taxa are distinctive, others are very difficult to identify. Finally, as the larvae repeatedly molt, their size and color change. Immediately after molting, they are light colored, and then they gradually darken. This variation in size, color, and pose means that simple computer vision methods that rely on placing all objects in a standard pose cannot be applied here. Instead, we need methods that can handle significant variation in pose, size, and coloration.
To address these challenges, we have adopted the bag-of-features approach [7–9]. This approach extracts a bag of region-based "features" from the image without regard to their relative spatial arrangement. These features are then summarized as a feature vector and classified via state-of-the-art machine learning methods. The primary advantage of this approach is that it is invariant to changes in pose and scale as long as the features can be reliably detected. Furthermore, with an appropriate choice of classifier, not all features need to be detected in order to achieve high classification accuracy. Hence, even if some features are occluded or fail to be detected, the method can still succeed. An additional advantage is that only weak supervision (at the level of entire images) is necessary during training.

A potential drawback of this approach is that it ignores some parts of the image, and hence loses some potentially useful information. In addition, it does not capture the spatial relationships among the detected regions. We believe that this loss of spatial information is unimportant in this application, because all stoneflies share the same body plan and, hence, the spatial layout of the detected features provides very little discriminative information.

The bag-of-features approach involves five phases: (a) region detection, (b) region description, (c) region classification into features, (d) combination of detected features into a feature vector, and (e) final classification of the feature vector. For region detection, we employ three different interest operators: (a) the Hessian-affine detector [10], (b) the Kadir entropy detector [11], and (c) a new detector that we have developed called the principal curvature-based region (PCBR) detector. The combination of these three detectors gives better performance than any single detector or pair of detectors. The combination was critical to achieving good classification rates.
All detected regions are described using Lowe's SIFT representation [1]. At training time, a Gaussian mixture model (GMM) is fit to the set of SIFT vectors, and each mixture component is taken to define a feature. The GMM can be interpreted as a classifier that, given a new SIFT vector, can compute the mixture component most likely to have generated that vector. Hence, at classification time, each SIFT vector is assigned to the most likely feature (i.e., mixture component). A histogram consisting of the number of SIFT vectors assigned to each feature is formed. A separate GMM, set of features, and feature vector is created for each of the three region detectors and each of the stonefly taxa. These feature vectors are then concatenated prior to classification. The steps mentioned above form the concatenated feature histogram (CFH) method, which allows the use of general classifiers from the machine learning literature. The final labeling of the specimens is performed by an ensemble of logistic model trees [12], where each tree has one vote.

The rest of the paper is organized as follows. Section 2 discusses existing systems for insect recognition as well as relevant work in generic object recognition in computer vision. Section 3 introduces our PCBR detector and its underlying algorithms. In Section 4, we describe our insect recognition system, including the apparatus for manipulating and photographing the specimens and the algorithms for feature extraction, learning, and classification. Section 5 presents a series of experiments to evaluate the effectiveness of our classification system and discusses the results of those experiments. Finally, Section 6 draws some conclusions about the overall performance of our system and the prospects for rapid-throughput insect population counting.

2 Related Work

We divide our discussion of related work into two parts. First, we review related work in insect identification methods. Then we discuss work in generic object recognition.

2.1 Automated Insect Identification Systems

A few other research groups have developed systems that apply computer vision methods to discriminate among a defined set of insect species.

2.1.1 Automated Bee Identification System (ABIS). The ABIS system [13] performs identification of bees based on features extracted from their forewings. Each bee is manually positioned and a photograph of its forewing is obtained in a standard pose. From this image, the wing venation is identified, and a set of key wing cells (areas between veins) are determined. These are used to align and scale the images. Then geometric features (lengths, angles, and areas) are computed. In addition, appearance features are computed from small image patches that have been smoothed and normalized. Classification is performed using Support Vector Machines and Kernel Discriminant Analysis.

This project has obtained very good results, even when discriminating between bee species that are known to be hard to classify. It has also overcome its initial requirement of expert interaction with the image for feature extraction, although it still has the restriction of complex user interaction to manipulate the specimen for the capture of the wing image. The ABIS feature extraction algorithm incorporates prior expert knowledge about wing venation. This facilitates the bee classification task but makes it very specialized. This specialization precludes a straightforward application to other insect identification tasks.
2.1.2 Digital Automated Identification SYstem (DAISY). DAISY [14] is a general-purpose identification system that has been applied to several arthropod identification tasks including mosquitoes (Culex p. molestus vs. Culex p. pipiens), palaearctic ceratopogonid biting midges, ophionines (parasites of lepidoptera), parasitic wasps in the genus Enicospilus, and hawk-moths (Sphingidae) of the genus Xylophanes. Unlike our system, DAISY requires user interaction for image capture and segmentation, because specimens must be aligned in the images. This might hamper DAISY's throughput and make its application infeasible in some monitoring tasks where the identification of large samples is required.

In its first version, DAISY built on the progress made in human face detection and recognition via eigen-images [15]. Identification proceeded by determining how well a specimen correlated with an optimal linear combination of the principal components of each class. This approach was shown to be too computationally expensive and error-prone.

In its second version, the core classification engine is based on a random n-tuple classifier (NNC) [16] and plastic self-organizing maps (PSOM). It employs a pattern-to-pattern correlation algorithm called the normalized vector difference (NVD) algorithm. DAISY is capable of handling hundreds of taxa and delivering the identifications in seconds. It also makes possible the addition of new species with only a small computational cost. On the other hand, the use of NNC imposes the requirement of adding enough instances of each species. Species with high intra-class variability require many training instances to cover their whole appearance range.

2.1.3 SPecies IDentification, Automated and web accessible (SPIDA-web). SPIDA-web [4] is an automated identification system that applies neural networks for species classification from wavelet-encoded images. The prototype SPIDA-web system has been tested on the spider family Trochanteriidae (consisting of 119 species in 15 genera) using images of the external genitalia.

SPIDA-web's feature vector is built from a subset of the components of the wavelet transform using the Daubechies 4 function. The spider specimen has to be manipulated by hand, and the image capture, preprocessing, and region selection also require direct user interaction. The images are oriented, normalized, and scaled into a 128 × 128 square prior to analysis. The specimens are classified in a hierarchical manner, first to genus and then to species. The classification engine is composed of a trained neural network for each species in the group. Preliminary results for females indicate that SPIDA is able to classify images to genus level with 95–100% accuracy. The results of species-level classification still have room for improvement, most likely due to the lack of enough training samples.

2.1.4 Summary of previous insect identification work. This brief review shows that existing approaches rely on manual manipulation and image capture of each specimen. Some systems also require the user to manually identify key image features. To our knowledge, no system exists that identifies insects in a completely automated way, from the manipulation of the specimens to the final labeling. The goal of our research is to achieve full rapid-throughput automation, which we believe is essential to supporting routine bio-monitoring activities. One key to doing this is to exploit recent developments in generic object recognition, which we now discuss.
2.2 Generic Object Recognition

The past decade has seen the emergence of new approaches to object-class recognition based on region detectors, local features, and machine learning. These methods are able to recognize objects from images taken in non-controlled environments with variability in the position and orientation of the objects, with cluttered backgrounds, and with some degree of occlusion. Furthermore, these methods only require supervision at the level of whole images—the position and orientation of the object in each training image does not need to be specified. These approaches compare favorably with previous global-feature approaches, for example [17,18].

The local feature approaches begin by applying an interest operator to identify "interesting regions". These regions must be reliably detected in the sense that the same region can be found in images taken under different lighting conditions, viewing angles, and object poses. Further, for generic object recognition, these detected regions must be robust to variation from one object to another within the same generic class. Additionally, the regions must be informative—that is, they must capture properties that allow objects in different object classes to be discriminated from one another. Special effort has been put into the development of affine-invariant region detectors to achieve robustness to moderate changes in viewing angle. Current affine-invariant region detectors can be divided into two categories: intensity-based detectors and structure-based detectors. The intensity-based region detectors include the Harris-corner detector [19], the Hessian-affine detector [20,10], the maximally stable extremal region detector (MSER) [21], the intensity extrema-based region detector (IBR) [22], and the entropy-based region detector [11]. Structure-based detectors include the edge-based region detector (EBR) [23] and the scale-invariant shape feature (SISF) detector [24].

Upon detection, each region must then be characterized as a vector of features. Several methods have been employed for this purpose, but by far the most widely-used region representation is Lowe's 128-dimensional SIFT descriptor [1], which is based on histograms of local intensity gradients. Other region descriptors can be computed, including image patches (possibly after smoothing and down-sampling), photometric invariants, and various intensity statistics (mean, variance, skewness, kurtosis).

Once the image has been converted into a collection of vectors—where each vector is associated with a particular region in the image—two general classes of methods have been developed for predicting the object class from this information. The first approach is known as the "bag of features" approach, because it disregards the spatial relationships among the SIFT vectors and treats them as an un-ordered bag of feature vectors. The second approach is known as the "constellation method", because it attempts to capture and exploit the spatial relationships among the detected regions. (Strictly speaking, the term constellation model refers to the series of models developed by Burl, Weber and Perona [25].)

In the bag-of-features approach, the standard method is to take all of the SIFT vectors from the training data and cluster them (possibly preceded by a dimensionality-reduction step such as PCA). Each resulting cluster is taken to define a "keyword", and these keywords are collected into a codebook or dictionary [26–28]. The dictionary can then be applied to map each SIFT vector into a keyword, and therefore, to map the bag of SIFT features into a bag of keywords.

The final step is to train a classifier to assign the correct class label to the bag of keywords. The most direct way to do this is to convert the bag into a feature vector and apply standard machine learning methods such as AdaBoost [29]. One simple method is to compute a histogram where the i-th element corresponds to the number of occurrences in the image of the i-th keyword; a code sketch of this codebook-plus-histogram pipeline is given at the end of this subsection.

Another classification strategy is to employ distance-based learning algorithms such as the nearest-neighbor method. This involves defining a distance measure between two bags of keywords, such as the minimum distance between all keywords from one bag and all keywords from the other bag.

Given a new image to classify, the process of finding interesting regions, representing them as SIFT vectors, mapping those to keywords, and classifying the resulting bags of keywords is repeated.

In the constellation method, several techniques have been applied for exploiting the spatial layout of the detected regions. The star-shaped model [30,31] is a common choice, because it is easy to train and evaluate. Fergus et al. [32] employ a generative model of the (x, y) distribution of the regions as a 2-dimensional Gaussian distribution. More complex methods apply discriminative graphical models to capture the relations between the detected regions [33–35].
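To make the bag-of-features variant concrete, the following minimal sketch builds a k-means codebook from training SIFT vectors and converts one image's descriptors into a keyword histogram. It illustrates only the generic approach of [26–28]; the function names, array shapes, and parameter values are our own assumptions, and our actual system instead uses the class-specific GMM dictionaries described in Section 4.2.

```python
# Minimal bag-of-features sketch: k-means codebook plus keyword histogram.
# `train_sift` and `image_sift` are (n, 128) arrays of SIFT descriptors;
# names and parameter values are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(train_sift, n_keywords=100):
    """Cluster all training SIFT vectors; each cluster center is a keyword."""
    return KMeans(n_clusters=n_keywords, n_init=10).fit(train_sift)

def keyword_histogram(codebook, image_sift):
    """Map each SIFT vector to its nearest keyword and count occurrences."""
    words = codebook.predict(image_sift)
    return np.bincount(words, minlength=codebook.n_clusters).astype(float)
```

One histogram per image then serves as the feature vector consumed by a standard classifier such as AdaBoost.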
3 Principal Curvature-Based Region Detector

Before describing our stonefly recognition system, we first introduce our new Principal Curvature-Based Region (PCBR) detector. This detector is of independent interest, and we have demonstrated elsewhere that it can be applied to a wide range of object recognition problems [36].

The PCBR detector grew out of earlier experiments that apply Steger's "curvilinear" detector [37] to the stonefly images. The curvilinear detector finds line structures (either curved or straight) such as roads in aerial/satellite images or blood vessels in medical scans. When applied to stonefly images, the detector provides a kind of sketch of the characteristic patterning that appears on the insects' dorsal side. Further, these curvilinear structures can be detected over a range of viewpoints, scales, and illumination changes.

However, in order to produce features that readily map to image regions, which can then be used to build a descriptor (such as SIFT), our PCBR detector ultimately uses only the first step of Steger's curvilinear detector process—that of computing the principal eigenvalue of the Hessian matrix at each pixel. We note that since both the Hessian matrix and the related second moment matrix quantify a pixel's local image geometry, they have also been applied in several other interest operators such as the Harris [19], Harris-affine [38], and Hessian-affine [10] detectors to find image positions where the local image geometry is changing in more than one direction. Likewise, Lowe's maximal difference-of-Gaussian (DoG) detector [1] also uses components of the Hessian matrix (or at least approximates the sum of the diagonal elements) to find points of interest. However, our PCBR detector is quite different from these other methods. Rather than finding interest "points", our method applies a watershed segmentation to the principal curvature image to find "regions" that are robust to various image transformations. As such, our PCBR detector combines differential geometry—as used by the Harris- and Hessian-affine interest point detectors—with concepts found in region-based structure detectors such as EBR [23] or SISF [24].
3.1 A Curvature-Based Region Detector

Given an input image (Figure 1a), our PCBR region detector can be summarized as follows:

1. Compute the Hessian matrix image describing each pixel's local image curvature.
2. Form the principal curvature image by extracting the largest positive eigenvalue from each pixel's Hessian matrix (Figure 1b).
3. Apply a grayscale morphological close on the principal curvature image to remove noise and threshold the resulting image to obtain a "clean" binary principal curvature image (Figure 1c).
4. Segment the clean image into regions using the watershed transform (Figures 1d and 1e).
5. Fit an ellipse to each watershed region to produce the detected interest regions (Figure 1f).

Each of these steps is detailed in the following paragraphs.

3.2 Principal Curvature Image

There are two types of structures that have high curvature in one direction: edges and curvilinear structures. Viewing an image as an intensity surface, the curvilinear structure detector looks for ridges and valleys of this surface. These correspond to white lines on black backgrounds or black lines on white backgrounds. The width of the detected line is determined by the Gaussian scale used to smooth the image (see Eq. 1 below). Ridges and valleys have large curvature in one direction, edges have high curvature in one direction and low curvature in the orthogonal direction, and corners (or highly curved ridges and valleys) have high curvature in two directions. The shape characteristics of the surface can be described by the Hessian matrix, which is given by

    H(x, σ_D) = [ I_xx(x, σ_D)   I_xy(x, σ_D) ]
                [ I_xy(x, σ_D)   I_yy(x, σ_D) ]        (1)

where I_xx, I_xy and I_yy are the second-order partial derivatives of the image and σ_D is the Gaussian scale at which the second partial derivatives of the image are computed.

The interest point detectors mentioned previously [19,38,10] apply the Harris measure (or a similar metric [1]) to determine a point's saliency. The Harris measure is given by

    det(A) − k · tr²(A) > threshold        (2)

where det is the determinant, tr is the trace, and the matrix A is either the Hessian matrix, H (for the Hessian-affine detector), or the second moment matrix,

    M = [ I_x²      I_x I_y ]
        [ I_x I_y   I_y²    ]        (3)

for the Harris or Harris-affine detectors. The constant k is typically between 0.03 and 0.06, with 0.04 being very common. The Harris measure penalizes (i.e., produces low values for) "long" structures for which the first or second derivative in one particular orientation is very small. One advantage of the Harris metric is that it does not require explicit computation of the eigenvalues or eigenvectors. However, computing the eigenvalues and eigenvectors for a 2 × 2 matrix requires only a single Jacobi rotation to eliminate the off-diagonal term, I_xy, as noted by Steger [37].
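The Hessian of Eq. 1 and its eigenvalues are straightforward to compute with Gaussian derivative filters; since the Hessian is a symmetric 2 × 2 matrix, its eigenvalues have a closed form and no iterative routine is needed. The sketch below is our illustration, not the implementation used in the experiments; it assumes a grayscale floating-point image and also realizes the principal curvature image of Eq. 4 below.

```python
# Sketch of Eq. 1 plus the principal curvature image P(x) = max(lambda_1(x), 0)
# of Eq. 4 below. `img` is assumed to be a grayscale float array; this code is
# illustrative, not the paper's implementation.
import numpy as np
from scipy.ndimage import gaussian_filter

def principal_curvature(img, sigma_d=2.0):
    # Second-order Gaussian derivatives give the Hessian entries at scale
    # sigma_D (axis 0 is y, axis 1 is x).
    Ixx = gaussian_filter(img, sigma_d, order=(0, 2))
    Iyy = gaussian_filter(img, sigma_d, order=(2, 0))
    Ixy = gaussian_filter(img, sigma_d, order=(1, 1))
    # Closed-form eigenvalues of the symmetric 2x2 Hessian at every pixel.
    mean = 0.5 * (Ixx + Iyy)
    root = np.sqrt(0.25 * (Ixx - Iyy) ** 2 + Ixy ** 2)
    lambda1 = mean + root            # maximum eigenvalue
    return np.maximum(lambda1, 0.0)  # high response for dark lines on light background
```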



Fig. 1 Regions defined by principal curvature. (a) The original, (b) principal curvature, and (c) cleaned binary images. The resulting (d) boundaries and (e) regions that result by applying the watershed transform to (c). (f) The final detected regions created by fitting an ellipse to each region.

Fig. 2 (a) Watershed segmentation of the original eigenvalue image (Fig. 1b). (b) Detection results using the "clean" principal curvature image (Fig. 1c).

Our PCBR detector complements the previous interest point detectors. We abandon the Harris measure and exploit those very long structures as detection cues. The principal curvature image is given by either

    P(x) = max(λ_1(x), 0)        (4)

or

    P(x) = min(λ_2(x), 0)        (5)

where λ_1(x) and λ_2(x) are the maximum and minimum eigenvalues, respectively, of H at x. Eq. 4 provides a high response only for dark lines on a light background (or on the dark side of edges), while Eq. 5 is used to detect light lines against a darker background. We do not take the largest absolute eigenvalue, since that would produce two responses for each edge. For our stonefly project, we have found that the patterning on the stonefly dorsal side is better characterized by the dark lines, and as such we apply Eq. 4. Figure 1(b) shows the eigenvalue image that results from applying Eq. 4 to the grayscale image of Fig. 1(a). We utilize the principal curvature image to find the stable regions via watershed segmentation [39].
3.3 Watershed Segmentation

Our detector depends on a robust watershed segmentation. A main problem with segmentation via the watershed transform is its sensitivity to noise and image variations. Figure 2(a) shows the result of applying the watershed algorithm directly to the eigenvalue image (shown in Fig. 1(b)). Many of the small regions are due to noise or other small, unstable image variations. To achieve a more stable watershed segmentation, we first apply a grayscale morphological close followed by hysteresis thresholding. The grayscale morphological closing operation is defined as

    f • b = (f ⊕ b) ⊖ b        (6)

where f is the image (P from Eq. 4 for our application), b is a disk-shaped structuring element, and ⊕ and ⊖ are the dilate and erode operators, respectively. The close operation removes the small "potholes" in the principal curvature terrain, thus eliminating many local minima that result from noise and would otherwise produce watershed catchment basins.

However, beyond the small (in terms of area of influence) local minima, there are other minima that have larger zones of influence and are not reclaimed by the morphological close. Some of these minima should indeed be minima, since they have a very low principal curvature response. However, other minima have a high response but are surrounded by even higher peaks in the principal curvature terrain. A primary cause for these high "dips" between ridges is that the Gaussian scale used to compute the Hessian matrix is not large enough to match the thickness of the line structure; hence the second derivative operator produces principal curvature responses that tend toward the center of the thick line but don't quite meet up. One solution to this problem is to use a multiscale approach and try to estimate the best scale to apply at each pixel. Unfortunately, this would require that the Hessian be applied at many scales to find the single characteristic scale for each pixel. Instead, we choose to compute the Hessian at just a few scales (σ_D = 1, 2, 4) and then use eigenvector-flow hysteresis thresholding to fill in the gaps between scales.

In eigenvector-flow hysteresis thresholding, there is both a high and a low threshold—just as in traditional hysteresis thresholding. For this application, we have set the high threshold at 0.04 to indicate strong principal curvature response. Pixels with a strong response act as seeds that expand out to include connected pixels that are above the low threshold. Unlike traditional hysteresis thresholding, our low threshold is a function of the support each pixel's major eigenvector receives from neighboring pixels. Of course, we want the low threshold to be high enough to avoid over-segmentation and low enough to prevent ridge lines from fracturing. As such, we choose our low threshold on a per-pixel basis by comparing the direction of the major (or minor) eigenvector to the direction of the adjacent pixels' major (or minor) eigenvectors. This can be done by simply taking the absolute value of the inner (or dot) product of a pixel's normalized eigenvector with that of each neighbor. The inner product is 1 for vectors pointing in the same direction and 0 for orthogonal vectors. If the average dot product over all neighbors is high enough, we set the low-to-high threshold ratio to 0.2 (giving an absolute low threshold of 0.04 · 0.2 = 0.008); otherwise the low-to-high ratio is 0.7 (for an absolute low threshold of 0.028). These ratios were chosen based on experiments with hundreds of stonefly images.

Fig. 3 Illustration of how the eigenvector flow is used to support weak principal curvature response.

Figure 3 illustrates how the eigenvector flow supports an otherwise weak region. The red arrows are the major eigenvectors and the yellow arrows are the minor eigenvectors. To improve visibility, we draw them at every 4 pixels. At the point indicated by the large white arrow, we see that the eigenvalue magnitudes are small and the ridge there is almost invisible. Nonetheless, the direction of the eigenvectors is quite uniform. This eigenvector-based active thresholding process yields better performance in building continuous ridges and in filling in scale gaps between ridges, which results in more stable regions (Fig. 2(b)).

The final step is to perform the watershed transform on the clean binary image. Since the image is binary, all black (or 0-valued) pixels become catchment basins, and the midline of the thresholded white ridge pixels potentially becomes a watershed line if it separates two distinct catchment basins. After performing the watershed transform, the resulting segmented regions are fit with ellipses, via PCA, that have the same second moments as these watershed regions. These ellipses then define the final interest regions of the PCBR detector (Fig. 1(f)).
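A simplified sketch of steps 3 to 5 is given below, assuming scipy/scikit-image building blocks. For brevity it replaces the per-pixel eigenvector-flow rule with a single global low threshold, so it is a loose approximation of the procedure described above rather than a faithful reimplementation.

```python
# Simplified sketch of PCBR steps 3-5: grayscale close (Eq. 6), hysteresis
# thresholding, watershed segmentation, and ellipse fitting. A single global
# low threshold stands in for the eigenvector-flow rule described above.
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import apply_hysteresis_threshold
from skimage.measure import label, regionprops
from skimage.morphology import disk
from skimage.segmentation import watershed

def pcbr_regions(P, high=0.04, low_ratio=0.2, close_radius=2):
    P = ndi.grey_closing(P, footprint=disk(close_radius))  # remove "potholes"
    ridges = apply_hysteresis_threshold(P, high * low_ratio, high)
    # Watershed on the binary image: background components seed the catchment
    # basins, and ridge midlines become the separating watershed lines.
    basins = label(~ridges)
    segments = watershed(ridges.astype(np.uint8), markers=basins)
    # Ellipses with the same second moments as the regions, via regionprops.
    return [(r.centroid, r.orientation, r.major_axis_length, r.minor_axis_length)
            for r in regionprops(segments)]
```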
The stone- major eigenvectors and the yellow arrows are the mi- flies are kept in alcohol (70% ethanol) at all times, and nor eigenvectors. To improve visibility, we draw them therefore, the apparatus consists of two alcohol reservoirs at every 4 pixels. At the point indicated by the large connected by an alcohol-filled tube (having a diamond white arrow, we see that the eigenvalue magnitudes are cross-section). To photograph a specimen, it is manually small and the ridge there is almost invisible. Nonethe- inserted into the arcylic well shown at the right edge of less, the direction of the eigenvectors are quite uniform. the figure and then pumped through the tube. Infrared This eigenvector-based active thresholding process yields detectors positioned part way along the tube detect the better performance in building continuous ridges and in passage of the specimen and cut off the pumps. Then filling in scale gaps between ridges, which results in more a side fluid jet “captures” the specimen in the field of stable regions (Fig. 2(b)). view of the microscope. When power to this jet is cut off, The final step is to perform the watershed transform the specimen settles to the bottom of the tube where it on the clean binary image. Since the image is binary, can be photographed. The side jet can be activated re- all black (or 0-valued) pixels become catchment basins peatedly to spin the specimen to obtain different views. and the midline of the thresholded white ridge pixels po- Once a suitable image has been obtained (a decision cur- tentially become watershed lines if it separates two dis- rently made by the human operator), the specimen is tinct catchment basins. After performing the watershed then pumped out of the tube and into the plexiglass well transform, the resulting segmented regions are fit with at the left edge of the figure. For this project, a “suitable ellipses, via PCA, that have the same second-moment as image” is one that gives a good back (dorsal side) view these watershed regions. These ellipses then define the of the specimen. In future work, we plan to construct a final interest regions of the PCBR detector (Fig. 1(f)). “dorsal view detector” to automatically determine when 8 Enrique Larios et al. a good dorsal image has been obtained. In addition, fu- ture versions of the apparatus will physically sort each specimen into an appropriate bin based on the output of the recognizer. Figure 4(b) shows the apparatus in place under the microscope. Each photograph taken by the camera cap- tures two images at a 90 degree separation via a set of mirrors. The original purpose of this was to support 3D reconstruction of the specimens, but for the work described in this paper, it doubles the probability of ob- taining a good dorsal view in each shot. All images are captured using a QImaging MicroPub- lisher 5.0 RTV 5 megapixel color digital camera. The dig- ital camera is attached to a Leica MZ9.5 high-performan- ce stereo microscope at 0.63x magnification. We use a 0.32 objective on the microscope to increase the field of view, depth of field, and working distance. Illumination is provided by gooseneck light guides powered by Volpi V-Lux 1000 cold light sources. Diffusers installed on the guides reduce glare, specular reflections, and hard shad- ows. Care was taken in the design of the apparatus to minimize the creation of bubbles in the alcohol, as these could confuse the recognizer. With this apparatus, we can image a few tens of spec- imens per hour. 
Figure 6 shows some example images obtained using this stonefly imaging assembly.

4.2 Training and Classification

Our approach to classification of stonefly larvae follows closely the "bag of features" approach, but with several modifications and extensions. Figure 5 gives an overall picture of the data flow during training and classification, and Tables 1, 2, and 3 provide pseudo-code for our method. We now provide a detailed description.

Fig. 5 Object recognition system overview: feature generation and classification components.

The training process requires two sets of images, one for defining the dictionaries and one for training the classifier. In addition, to assess the accuracy of the learned classifier, we need a holdout test data set, as usual. Therefore, we begin by partitioning the data at random into three subsets: clustering, training, and testing.

As mentioned previously, we apply three region detectors to each image: (a) the Hessian-affine detector [10], (b) the Kadir entropy detector [11], and (c) our PCBR detector. We use the Hessian-affine detector implementation available from Mikolajczyk^1 with a detection threshold of 1000. For the Kadir entropy detector, we use the binary code made available by the author^2 and set the scale search range between 25–45 pixels with the saliency threshold at 58. All the parameters for the two detectors mentioned above are obtained empirically by modifying the default values in order to obtain reasonable regions. For the PCBR detector, we detect in three scales with σ_D = 1, 2, 4. The higher value in hysteresis thresholding is 0.04. The two ratios applied to get the lower thresholds are 0.2 and 0.7—producing low thresholds of 0.008 and 0.028, respectively. Each detected region is represented by a SIFT vector using Mikolajczyk's modification to the binary code distributed by David Lowe [1].

1 www.robots.ox.ac.uk/~vgg/research/affine/
2 www.robots.ox.ac.uk/~timork/salscale.html

We then construct a separate dictionary for each region detector d and each class k. Let S_{d,k} be the SIFT descriptors for the regions found by detector d in all cluster-set images from class k. We fit a Gaussian mixture model (GMM) to S_{d,k} via the Expectation-Maximization (EM) algorithm. A GMM with L components has the form

    p(s) = Σ_{ℓ=1}^{L} C_{d,k,ℓ}(s | μ_{d,k,ℓ}, Σ_{d,k,ℓ}) P(ℓ)        (7)

where s denotes a SIFT vector and the component probability distribution C_{d,k,ℓ} is a multivariate Gaussian density function with mean μ_{d,k,ℓ} and covariance matrix Σ_{d,k,ℓ} (constrained to be diagonal). Each fitted component of the GMM defines one of L keywords. Given a new SIFT vector s, we compute the corresponding keyword ℓ = key_{d,k}(s) by finding the ℓ that maximizes p(s | μ_{d,k,ℓ}, Σ_{d,k,ℓ}). Note that we disregard the mixture probabilities P(ℓ). This is equivalent to mapping s to the nearest cluster center μ under the Mahalanobis distance defined by Σ.
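The sketch below fits one such dictionary and implements the keyword mapping, assuming scikit-learn's GaussianMixture with diagonal covariances. Because the mapping disregards the priors P(ℓ), the per-component densities are evaluated directly rather than through the library's predict method (which would include P(ℓ)); all names are ours.

```python
# Sketch of one dictionary (detector d, class k): a diagonal-covariance GMM
# fit to the SIFT set S_{d,k}, and the keyword map argmax_l C(s | mu_l, Sigma_l).
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_dictionary(S_dk, L=90):
    # init_params='kmeans' mirrors the k-means initialization described below.
    return GaussianMixture(n_components=L, covariance_type='diag',
                           init_params='kmeans', max_iter=100).fit(S_dk)

def keyword(gmm, s):
    """key_{d,k}(s): index of the component with the largest density at s,
    disregarding the mixture probabilities P(l)."""
    diff2 = (s - gmm.means_) ** 2 / gmm.covariances_              # (L, 128)
    log_dens = -0.5 * np.sum(np.log(2 * np.pi * gmm.covariances_) + diff2, axis=1)
    return int(np.argmax(log_dens))   # gmm.predict(s) would also weight by P(l)
```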


Fig. 4 (a) Prototype mirror and transportation apparatus. (b) Entire stonefly transportation and imaging setup (with microscope and attached digital camera, light boxes, and computer-controlled pumps for transporting and rotating the specimen).

Table 1 Dictionary Construction. D is the number of region detectors (3 in our case), and K is the number of stonefly taxa to be recognized (4 in our case).

Dictionary Construction
For each detector d = 1, ..., D
  For each class k = 1, ..., K
    Let S_{d,k} be the set of SIFT vectors that results from applying detector d to all cluster images from class k.
    Fit a Gaussian mixture model to S_{d,k} to obtain a set of mixture components {C_{d,k,ℓ}}, ℓ = 1, ..., L. The GMM estimates the probability of each SIFT vector s ∈ S_{d,k} as
        P(s) = Σ_{ℓ=1}^{L} C_{d,k,ℓ}(s | μ_{d,k,ℓ}, Σ_{d,k,ℓ}) P(ℓ),
    where C_{d,k,ℓ} is a multivariate Gaussian distribution with mean μ_{d,k,ℓ} and diagonal covariance matrix Σ_{d,k,ℓ}.
    Define the keyword mapping function key_{d,k}(s) = argmax_ℓ C_{d,k,ℓ}(s | μ_{d,k,ℓ}, Σ_{d,k,ℓ}).

Table 2 Feature Vector Construction

Feature Vector Construction
To construct a feature vector for an image:
For each detector d = 1, ..., D
  For each class k = 1, ..., K
    Let H_{d,k} be the keyword histogram for detector d and class k.
    Initialize H_{d,k}[ℓ] = 0 for ℓ = 1, ..., L.
    For each SIFT vector s detected by detector d: increment H_{d,k}[key_{d,k}(s)].
Let H be the concatenation of the H_{d,k} histograms for all d and k.

We initialize EM by fitting each GMM component to each cluster obtained by the k-means algorithm. The k-means algorithm is initialized by picking random elements. The EM algorithm iterates until the change in the fitted GMM error from the previous iteration is less than 0.05% or until a defined number of iterations is reached. In practice, learning of the mixture almost always reaches the first stopping criterion (the change in error is less than 0.05%).

After building the keyword dictionaries, we next construct a set of training examples by applying the three region detectors to each training image. We characterize each region found by detector d with a SIFT descriptor and then map the SIFT vector to the nearest keyword (as described above) for each class k using key_{d,k}. We accumulate the keywords to form a histogram H_{d,k} and concatenate these histograms to produce the final feature vector, as in Table 2 (a code sketch follows below). With D detectors, K classes, and L mixture components, the number of attributes A in the final feature vector (i.e., the concatenated histogram) is D · K · L.
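Table 2 translates directly into a few lines of code. In the sketch below, `dictionaries[(d, k)]` is a fitted GMM from the dictionary step and `sift_by_detector[d]` holds the image's SIFT vectors for detector d; it reuses the keyword function sketched above, and a single fixed L is assumed for readability even though the fitted values of L actually differ by class (Section 5.1).

```python
# Sketch of Table 2: the concatenated feature vector H for one image.
# `dictionaries[(d, k)]` is the fitted GMM for detector d and class k;
# `sift_by_detector[d]` is an (n_d, 128) array of that detector's descriptors.
# Reuses keyword() from the dictionary sketch above.
import numpy as np

def feature_vector(sift_by_detector, dictionaries, detectors, classes, L):
    parts = []
    for d in detectors:
        for k in classes:
            H_dk = np.zeros(L)                       # keyword histogram H_{d,k}
            for s in sift_by_detector[d]:
                H_dk[keyword(dictionaries[(d, k)], s)] += 1
            parts.append(H_dk)
    return np.concatenate(parts)                     # A = D * K * L attributes
```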

Upon constructing the set of training examples, we next learn the classifier. We employ a state-of-the-art ensemble classification method: bagged logistic model trees. Bagging [40] is a general method for constructing an ensemble of classifiers. Given a set of labeled training examples T and a desired ensemble size B, it constructs B bootstrap replicate training sets T_b, b = 1, ..., B. Each bootstrap replicate is a training set of size |T| constructed by sampling uniformly with replacement from T. The learning algorithm is then applied to each of these replicate training sets T_b to produce a classifier LMT_b. To predict the class of a new image, each LMT_b is applied to the new image and the predictions vote to determine the overall classification. The ensemble of LMTs classifier only interacts with the feature vectors generated by the CFH method. Table 3 gives the pseudo-code, and a code sketch follows below.

Our chosen learning algorithm is the logistic model tree (LMT) method of Landwehr, Hall, and Frank [12]. An LMT has the structure of a decision tree where each leaf node contains a logistic regression classifier. Each internal node tests the value of one chosen feature from the feature vector against a threshold and branches to the left child if the value is less than the threshold and to the right child if the value is greater than or equal to the threshold. LMTs are fit by the standard top-down divide-and-conquer method employed by CART [41] and C4.5 [42]. At each node in the decision tree, the algorithm must decide whether to introduce a split at that point or make the node into a leaf (and fit a logistic regression model). This choice is made by a one-step lookahead search in which all possible features and thresholds are evaluated to see which one will result in the best improvement in the fit to the training data. In standard decision trees, efficient purity measures such as the GINI index or the information gain can be employed to predict the quality of the split. In LMTs, it is instead necessary to fit a logistic regression model to the training examples that belong to each branch of the proposed split. This is computationally expensive, although the expense is substantially reduced via a clever incremental algorithm based on logit-boost [43]. Thorough benchmarking experiments show that LMTs give robust state-of-the-art performance [12].

Table 3 Training and Classification. B is the number of bootstrap iterations (i.e., the size of the classifier ensemble).

Training
Let T = {(H_i, y_i)}, i = 1, ..., N, be the set of N training examples, where H_i is the concatenated histogram for training image i and y_i is the corresponding class label (i.e., stonefly species).
For bootstrap replicate b = 1, ..., B
  Construct training set T_b by sampling N training examples randomly with replacement from T.
  Let LMT_b be the logistic model tree fitted to T_b.

Classification
Given a test image, let H be the concatenated histogram resulting from feature vector construction.
Let votes[k] = 0 be the number of votes for class k.
For b = 1, ..., B
  Let ŷ_b be the class predicted by LMT_b applied to H.
  Increment votes[ŷ_b].
Let ŷ = argmax_k votes[k] be the class with the most votes.
Predict ŷ.
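The bagging-and-voting procedure of Table 3 is sketched below. Logistic model trees are not available in standard Python libraries (the implementation referenced in the paper is Weka's), so plain logistic regression stands in here for the base learner; everything else follows the pseudo-code.

```python
# Sketch of Table 3 with logistic regression standing in for the LMT base
# learner. X is an (N, A) array of concatenated histograms, y the labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_bagged_ensemble(X, y, B=20, seed=0):
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))  # sample N with replacement
        ensemble.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))
    return ensemble

def classify(ensemble, H):
    votes = [clf.predict(H.reshape(1, -1))[0] for clf in ensemble]  # one vote each
    return max(set(votes), key=votes.count)         # class with the most votes
```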

5 Experiments and Results

We now describe the series of experiments carried out to evaluate our system. We first discuss the data set and show some example images to demonstrate the difficulty of the task. Then we present the series of experiments and discuss the results.

5.1 Stonefly Dataset

We collected 263 specimens of four stonefly taxa from freshwater streams in the mid-Willamette Valley and Cascade Range of Oregon: the species Calineuria californica (Banks), the species Doroneuria baumanni Stark & Baumann, the species Hesperoperla pacifica (Banks), and the genus Yoraperla. Each specimen was independently classified by two experts, and only specimens that were classified identically by both experts were considered in the study. Each specimen was placed in its own vial with an assigned control number. Each specimen was then photographed using the apparatus described in Section 4. Approximately ten photos were obtained of each specimen, which yields 20 individual images (each photo captures two views via the mirrors). These were then manually examined, and all images that gave a dorsal view within 30 degrees of vertical were selected for analysis. Table 4 summarizes the number of specimens and dorsal images obtained.

Table 4 Specimens and images employed in the study.

Taxon          Specimens  Images
Calineuria            85     400
Doroneuria            91     463
Hesperoperla          58     253
Yoraperla             29     124

A potential flaw in our procedure is that the specimen vials tended to be grouped together by taxon (i.e., several Calineurias together, then several Doroneurias, etc.), so that in any given photo session, most of the specimens being photographed belong to a single taxon. This could introduce some implicit cues (e.g., lighting, bubbles, scratches) that might permit the learning algorithm to "cheat". The apparatus constrains the lighting so that it is very consistent in all sessions. We did detect some bubbles in the images. In cases where the region detectors found those bubbles, we manually removed the detections to ensure that they are not influencing the results.

Figure 6 shows some of the images collected for the study.
Note the variety of colors, sizes, and poses.

Note also that Yoraperla, which is in the family Peltoperlidae, is quite distinctive in color and shape. The other three taxa, which are all in the family Perlidae, are quite similar to each other, and the first two (Calineuria and Doroneuria) are exceedingly difficult to distinguish. This is emphasized in Figure 7, which shows closeup dorsal views. To verify the difficulty of discriminating these two taxa, we conducted an informal study that tested the ability of humans to separate Calineuria and Doroneuria. A total of 26 students and faculty from Oregon State University were allowed to train on 50 randomly-selected images of Calineuria and Doroneuria, and were subsequently tested with another 50 images. Most of the subjects (21) had some prior entomological experience. The mean score was 78.6% correctly identified (std. dev. = 8.4). There was no statistical difference between the performance of entomologists and non-entomologists (Wilcoxon two-sample test [44], W = 57.5, p ≤ 0.5365).

Given the characteristics of the taxa, we defined three discrimination tasks, which we term CDHY, JtHY, and CD, as follows:

CDHY: Discriminate among all four taxa.

JtHY: Merge Calineuria and Doroneuria to define a single class, and then discriminate among the resulting three classes.

CD: Focus on discriminating only between Calineuria and Doroneuria.

The CDHY task assesses the overall performance of the system. The JtHY task is most relevant to bio-monitoring, since Calineuria and Doroneuria have identical pollution tolerance levels.
Hence, discriminating between them is not critical for our application. Finally, the CD task presents a very challenging object recognition problem, so it is interesting to see how well our method can do when it focuses only on this two-class problem.

Performance on all three tasks is evaluated via three-fold cross-validation. The images are randomly partitioned into three equal-sized sets under the constraint that all images of any given specimen were required to be placed in the same partition. In addition, to the extent possible, the partitions are stratified so that the class frequencies are the same across the three partitions. Table 5 gives the number of specimens and images in each partition.

Table 5 Partitions for 3-fold cross-validation.

Partition  # Specimens  # Images
1                   87       413
2                   97       411
3                   79       416

In each "fold" of the cross-validation, one partition serves as the clustering data set for defining the dictionaries, a second partition serves as the training data set, and the third partition serves as the test set.

Our approach requires specification of the following parameters:

– the number L of mixture components in the Gaussian mixture model for each dictionary,
– the number B of bootstrap replicates for bagging,
– the minimum number M of training examples in the leaves of the logistic model trees, and
– the number I of iterations of logit boost employed for training the logistic model trees.

These parameters are set as follows. L is determined through a series of EM fitting procedures for each species. We increment the number of mixture components until the GMM is capable of modeling the data distribution—when the GMM achieves a relative fitting error below 5% in less than 100 EM iterations. The resulting values of L are 90, 90, 85 and 65 for Calineuria, Doroneuria, Hesperoperla, and Yoraperla, respectively. Likewise, B is determined by evaluating a series of bagging ensembles with different numbers of classifiers on the same training set. The number of classifiers in each ensemble is incremented by two until the training error starts to increase, at which point B is simply assigned to be five less than that number. The reason we assign B to be 5 less than the number that causes the training error to increase—rather than simply assigning the largest number that produces the lowest error—is that the smaller number of bootstrap replicates helps to avoid overfitting. Table 6 shows the value of B for each of the three tasks. The minimum number M of instances that each leaf in the LMT requires to avoid pruning is set to 15, which is the default value recommended by the authors of the LMT implementation. The number I of logit boost iterations is set by internal cross-validation within the training set while the LMT is being induced.

Table 6 Number of bagging iterations B for each experiment.

Experiment       Bagging Iterations B
4-species: CDHY                    20
3-species: JtHY                    20
2-species: CD                      18
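The specimen-level constraint in the partitioning above is the kind of grouping that scikit-learn's GroupKFold enforces; the sketch below is an approximation under that assumption (GroupKFold does not perform the class-frequency stratification described above, which would need extra handling).

```python
# Sketch of the 3-fold partitioning: every image of a given specimen lands in
# the same partition. GroupKFold enforces the grouping; the stratification by
# class frequency described above would require additional handling.
from sklearn.model_selection import GroupKFold

def three_partitions(features, labels, specimen_ids):
    splitter = GroupKFold(n_splits=3)
    return [test_idx for _, test_idx in
            splitter.split(features, labels, groups=specimen_ids)]
# Each fold of the cross-validation then rotates these three partitions
# through the clustering, training, and test roles.
```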

Fig. 6 Example images of different stonefly larvae species. (a) Calineuria, (b) Doroneuria, (c) Hesperoperla, and (d) Yoraperla.

Fig. 7 Images visually comparing Calineuria and Doroneuria. (a) Calineuria, (b) Doroneuria, (c) Calineuria detail, and (d) Doroneuria detail.

5.2 Results

We designed our experiments to achieve two goals. First, we wanted to determine how well the CFH method (with three region detectors) coupled with an ensemble of LMTs performs on the three recognition tasks. To establish a basis for evaluation, we also apply the method of Opelt et al. [45], which is currently one of the best object recognition systems. Second, we wanted to evaluate how each of the three region detectors affects the performance of the system. To achieve this second goal, we train our system using 7 different configurations corresponding to training with all three detectors, all pairs of detectors, and all individual detectors.

5.2.1 Overall Results. Table 7 shows the classification rates achieved by the CFH method on the three discrimination tasks. Tables 8, 9, and 10 show the confusion matrices for the three tasks. On the CDHY task, our system achieves 82% correct classifications. The confusion matrix shows that it achieves near-perfect recognition of Yoraperla. It also recognizes Hesperoperla very well, with only a few images misclassified as Calineuria or Doroneuria. As expected, the main difficulty is to discriminate Calineuria and Doroneuria. When these two classes are pooled in the JtHY task, performance reaches 95% correct, which is excellent. It is interesting to note that if we had applied the four-way classifier and then pooled the predictions of the classifiers, the 3-class performance would have been slightly better (95.48% versus 95.08%). The difference is that in the JtHY task, we learn a combined dictionary for the merged Calineuria and Doroneuria (CD) class, whereas in the 4-class task, each taxon has its own dictionaries.

A similar phenomenon occurs in the 2-class CD task. Our method attains 79% correct classification when trained on only these two classes. If instead we applied the CDHY classifiers and treated predictions for Hesperoperla and Yoraperla as errors, the performance would be slightly better (79.61% versus 79.37%). These differences are not statistically significant, but they do suggest that in future work it might be useful to build separate dictionaries and classifiers for groups within each taxon (e.g., first cluster by size and color) and then map the resulting predictions back to the 4-class task. On this binary classification task, our method attains 79% correct classification, which is approximately equal to the mean for human subjects with some prior experience.

Table 7 Percentage of images correctly classified by our system with all three region detectors, along with a 95% confidence interval.

Task   Accuracy [%]
CDHY   82.42 ± 2.12
JtHY   95.40 ± 1.16
CD     79.37 ± 2.70

Table 8 CDHY confusion matrix of the combined Kadir, Hessian-affine and PCBR detectors.

predicted as ⇒  Cal.  Dor.  Hes.  Yor.
Calineuria       315    79     6     0
Doroneuria        80   381     2     0
Hesperoperla      24    22   203     4
Yoraperla          1     0     0   123

Table 9 JtHY confusion matrix of the combined Kadir, Hessian-affine and PCBR detectors.

predicted as ⇒  Joint CD  Hes.  Yor.
Joint CD              857     5     1
Hesperoperla           46   203     4
Yoraperla               0     1   123

Table 10 CD confusion matrix of the combined Kadir, Hessian-affine and PCBR detectors.

predicted as ⇒  Calineuria  Doroneuria
Calineuria             304          96
Doroneuria              82         381

Our system is capable of giving a confidence measure for each of the existing categories. We performed a series of experiments where the species assignment is thresholded by the difference between the two highest confidence measures. In this series, we vary the threshold from 0 to 1. If the difference was higher than the defined threshold, the label with the highest confidence was assigned; otherwise the specimen was declared "uncertain". Figure 8 plots accuracy against the rejection rate. The curves show that if we reject around 30% of the specimens, all the tasks reach an accuracy higher than 90%, even the CD task. A code sketch of this rejection rule follows below.

We also evaluated the performance of our classification methodology relative to a competing method [45] on the most difficult CD task using the same image features. Opelt's method is similar to our method in that it is also based on ensemble learning principles (AdaBoost), and it is also capable of combining multiple feature types for classification. We adapted Opelt's Matlab implementation to our features and used the default parameter settings given in the paper. The Euclidean distance metric was used for the SIFT features, and the number of iterations I was set to 100. Table 11 summarizes the classification rates. Our system provides 8 to 12% better accuracy than Opelt's method for all four combinations of detectors. In addition, training Opelt's classifier is more computationally expensive than is training our system. In particular, the complexity of computing Opelt's feature-to-image distance matrix is O(T²R²D), where T is the number of training images, R is the maximum number of detected image regions in a single image, and D = 128 is the SIFT vector dimension. The total number of detected training regions, T · R, is easily greater than 20,000 in this application. On the other hand, training our system is much faster. The complexity of building the LMT ensemble classifier (which dominates the training computation) is O(T · A · I), where A is the number of histogram attributes and I is the number of LMT induction iterations (typically in the hundreds).

Table 11 Comparison of CD classification rates using Opelt's method and our system with different combinations of detectors. A √ indicates the detector(s) used.

Hessian-Affine  Kadir Entropy  PCBR    Opelt [45]  CFH & LMTs
      √                                     60.59       70.10
                      √                     62.63       70.34
                                     √      67.86       79.03
      √               √              √      70.10       79.37
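As referenced above, the confidence-difference rejection rule behind Figure 8 can be expressed in a few lines. The sketch below assumes per-class confidences given by the ensemble's vote fractions, and uses −1 as an illustrative marker for "uncertain"; the names are ours.

```python
# Sketch of the confidence-difference rejection rule plotted in Figure 8.
# `confidences` is an (n_images, n_classes) array, e.g. ensemble vote fractions.
import numpy as np

def predict_with_rejection(confidences, threshold):
    ordered = np.sort(confidences, axis=1)
    margin = ordered[:, -1] - ordered[:, -2]   # top confidence minus runner-up
    labels = np.argmax(confidences, axis=1)
    return np.where(margin > threshold, labels, -1)  # -1 marks "uncertain"
```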

5.2.2 Results for Multiple Region Detectors

Table 12 summarizes the results of applying all combinations of one, two, and three detectors to the CDHY, JtHY, and CD tasks. The first three lines show that each detector has unique strengths when applied alone: the Hessian-affine detector works best on the 4-class CDHY task; the Kadir detector is best on the 3-class JtHY task; and the PCBR detector gives the best 2-class CD results. In the pairwise experiments, the Hessian-affine and PCBR detectors appear to complement each other well. The best pairwise result on the JtHY task is obtained by the Kadir-Hessian pair, which appears to be better suited to tasks that require an overall assessment of shape. Finally, the combination of all three detectors gives the best results on each task.

To understand the region detector results, it is helpful to look at their behaviors. Figures 9 and 10 show the regions found by each detector on selected Calineuria and Doroneuria specimens. The detectors behave in quite different ways. The PCBR detector is very stable, although it does not always identify all of the relevant regions. The Kadir detector is also stable, but it finds a very large number of regions, most of which are not relevant. The Hessian-affine detector finds very good small-scale regions, but its larger-scale detections are not useful for classification. The PCBR detector focuses on the interior of the specimens, whereas the other detectors (especially Kadir) tend to find points on the edges between the specimens and the background. In addition to concentrating on the interior, the regions found by the PCBR detector are more "meaningful" in that they correspond better to body parts. This may explain why the PCBR detector did a better job on the CD task.
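Since each detector contributes its own learned dictionary, combining detectors amounts to concatenating the per-detector histograms into a single attribute vector before classification. The following minimal sketch illustrates this step; the array shapes, function names, and nearest-neighbor assignment rule are our own illustrative assumptions, not the authors' implementation:

  import numpy as np

  def detector_histogram(sift_vectors, dictionary):
      # sift_vectors: (n_regions, 128) descriptors from one detector;
      # dictionary: (n_words, 128) learned feature vectors for that detector.
      # Assign each descriptor to its nearest dictionary entry and count hits.
      dists = np.linalg.norm(sift_vectors[:, None, :] - dictionary[None, :, :], axis=2)
      return np.bincount(dists.argmin(axis=1), minlength=len(dictionary))

  def concatenated_feature_histogram(per_detector_sift, per_detector_dicts):
      # One histogram per detector (e.g., Hessian-affine, Kadir, PCBR),
      # concatenated into the attribute vector fed to the LMT ensemble.
      return np.concatenate([detector_histogram(s, d)
                             for s, d in zip(per_detector_sift, per_detector_dicts)])

  # Toy usage with random stand-in data for three detectors:
  rng = np.random.default_rng(0)
  sift = [rng.normal(size=(40, 128)) for _ in range(3)]
  dicts = [rng.normal(size=(100, 128)) for _ in range(3)]
  print(concatenated_feature_histogram(sift, dicts).shape)  # (300,)

Under this view, dropping a detector simply removes its block of histogram attributes, which is how the detector-combination experiments in Table 12 can be organized.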

[Figure 8 here: three accuracy/rejection curves, one per task (CD, CDHY, JtHY); x-axis: Rejected [%], y-axis: Correctly Classified [%].]

Fig. 8 Accuracy/rejection curves for the three experiments with all detectors combined, as the confidence-difference threshold is varied
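The curves in Fig. 8 can be reproduced from per-specimen confidence scores by sweeping the threshold, as in this minimal sketch (the function name and array layout are our assumptions, and the data below are random stand-ins, not the stonefly results):

  import numpy as np

  def accuracy_rejection_curve(confidences, true_labels, thresholds):
      # For each threshold, reject specimens whose two highest class
      # confidences differ by less than the threshold, then report the
      # rejection rate and the accuracy on the accepted specimens.
      curve = []
      for t in thresholds:
          top2 = np.sort(confidences, axis=1)[:, -2:]   # two highest scores
          margin = top2[:, 1] - top2[:, 0]              # confidence difference
          accepted = margin > t
          predicted = confidences.argmax(axis=1)
          if accepted.any():
              acc = (predicted[accepted] == true_labels[accepted]).mean()
          else:
              acc = float("nan")                        # everything rejected
          curve.append((1.0 - accepted.mean(), acc))    # (rejection, accuracy)
      return curve

  rng = np.random.default_rng(0)
  conf = rng.dirichlet(np.ones(4), size=100)            # 100 specimens, 4 classes
  labels = rng.integers(0, 4, size=100)
  for rej, acc in accuracy_rejection_curve(conf, labels, np.linspace(0, 1, 5)):
      print(f"rejected {rej:5.1%}  accuracy {acc:5.1%}")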

Fig. 9 Visual comparison of the regions output by the three detectors on three Calineuria specimens. (a) Hessian-affine, (b) Kadir Entropy, (c) PCBR

Table 12 Classification rates using our system with different combinations of detectors. A √ indicates the detector(s) used.

  Hessian-affine  Kadir Entropy  PCBR   CDHY   JtHY    CD
        √                               73.14  90.32  70.10
                        √               70.64  90.56  70.34
                                    √   71.69  86.21  79.03
        √               √               78.14  94.19  74.16
        √                           √   80.48  93.79  78.68
                        √           √   78.31  92.09  68.83
        √               √           √   82.42  95.40  79.37

6 Conclusions and Future Work

This paper has presented a combined hardware-software system for rapid-throughput classification of stonefly larvae. The goal of the system is to perform cost-effective bio-monitoring of freshwater streams. To this end, the mechanical apparatus is capable of nearly unassisted manipulation and imaging of stonefly specimens while also obtaining consistently high quality images. The generic object recognition algorithms attain classification accuracy that is sufficiently good (82% for 4 classes; 95% for 3 classes) to support the application. By rejecting for manual classification those specimens for which the confidence level is not high enough, only about 30% of the samples would require further processing, while the remaining specimens are identified with an accuracy above 90% on all the defined tasks.

Fig. 10 Visual comparison of the regions output by the three detectors on four Doroneuria specimens. (a) Hessian-affine, (b) Kadir Entropy, (c) PCBR

We compared our CFH method to Opelt's related state-of-the-art method on the most difficult task, discriminating Calineuria from Doroneuria. The CFH method always achieved better performance. It is also worth noting that human subjects with some prior experience, using the same images, reached an accuracy equal to that of our method. Finally, we described a new region detector, the principal curvature-based region (PCBR) detector. Our experiments demonstrated that PCBR is particularly useful for discriminating between the two visually similar species and, as such, provides an important contribution toward attaining greater accuracy.

There are a few details that must be addressed before the system is ready for field testing. First, the mechanical apparatus must be modified to include mechanisms for sorting the specimens into bins after they have been photographed and classified. Second, we need to develop an algorithm for determining whether a good dorsal image of the specimen has been obtained; we are currently exploring several methods for this, including training the classifier described in this paper for that task. Third, we need to evaluate the performance of the system on a broader range of taxa. A practical bio-monitoring system for the Willamette Valley will need to recognize around 8 stonefly taxa. Finally, we need to develop methods for dealing with specimens that are not stoneflies or that do not belong to any of the taxa the system is trained to recognize. We are studying SIFT-based density estimation techniques for this purpose.

Beyond freshwater stream bio-monitoring, there are many other potential applications for rapid-throughput arthropod recognition systems. One area that we are studying involves automated population counts of soil mesofauna for soil biodiversity studies. Soil mesofauna are small arthropods (mites, spiders, pseudo-scorpions, etc.) that live in soils. There are upwards of 2000 species, and the study of their interactions and population dynamics is critical for understanding soil ecology and soil responses to different land uses. In our future work, we will test the hypothesis that the methods described in this paper, when combined with additional techniques for shape analysis and classification, will be sufficient to build a useful system for classifying soil mesofauna.

Acknowledgments

We wish to thank Andreas Opelt for providing the Matlab code of his PAMI'06 method for the comparison experiment. We also wish to thank Asako Yamamuro and Justin Miles for their assistance with identifying the stoneflies in the dataset.

References

1. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60 (2004) 91–110
2. Gaston, K.J., O'Neill, M.A.: Automated species identification: why not? Philosophical Transactions of the Royal Society B: Biological Sciences 359 (2004) 655–667

3. Hopkins, G.W., Freckleton, R.P.: Declines in the numbers of amateur and professional taxonomists: implications for conservation. Animal Conservation 5 (2002) 245–249
4. Do, M., Harp, J., Norris, K.: A test of a pattern recognition system for identification of spiders. Bulletin of Entomological Research 89 (1999) 217–224
5. Hilsenhoff, W.L.: Rapid field assessment of organic pollution with a family level biotic index. Journal of the North American Benthological Society 7 (1988) 65–68
6. Carter, J., Resh, V., Hannaford, M., Myers, M.: Macroinvertebrates as biotic indicators of environmental quality. In: F. Hauer and G. Lamberti, editors. Methods in Stream Ecology. Academic Press, San Diego (2006)
7. Csurka, G., Bray, C., Fan, C.L.: Visual categorization with bags of keypoints. ECCV workshop (2004)
8. Dorko, G., Schmid, C.: Object class recognition using discriminative local features. PAMI submitted (2004)
9. Opelt, A., Fussenegger, M., Pinz, A., Auer, P.: Weak hypotheses and boosting for generic object detection and recognition. In: 8th European Conference on Computer Vision. Volume 2., Prague, Czech Republic (2004) 71–84
10. Mikolajczyk, K., Schmid, C.: Scale and affine invariant interest point detectors. IJCV 60 (2004) 63–86
11. Kadir, T., Zisserman, A., Brady, M.: An affine invariant salient region detector. In: European Conference on Computer Vision (ECCV04). (2004) 228–241
12. Landwehr, N., Hall, M., Frank, E.: Logistic model trees. Mach. Learn. 59 (2005) 161–205
13. Arbuckle, T., Schroder, S., Steinhage, V., Wittmann, D.: Biodiversity informatics in action: identification and monitoring of bee species using ABIS. In: Proc. 15th Int. Symp. Informatics for Environmental Protection. Volume 1., Zurich (2001) 425–430
14. O'Neill, M.A., Gauld, I.D., Gaston, K.J., Weeks, P.: Daisy: an automated invertebrate identification system using holistic vision techniques. In: Proc. Inaugural Meeting BioNET-INTERNATIONAL Group for Computer-Aided Taxonomy (BIGCAT), Egham (2000) 13–22
15. Turk, M.A., Pentland, A.P.: Face recognition using eigenfaces. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition. (1991) 586–591
16. Lucas, S.: Face recognition with continuous n-tuple classifier. In: Proc. British Machine Vision Conference, Essex (1997) 222–231
17. Papageorgiou, C., Poggio, T.: A trainable system for object detection. Int. J. Comput. Vision 38 (2000) 15–33
18. Sung, K.K., Poggio, T.: Example-based learning for view-based human face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 39–51
19. Harris, C., Stephens, M.: A combined corner and edge detector. Alvey Vision Conference (1988) 147–151
20. Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. ECCV 1 (2002) 128–142
21. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing 22 (2004) 761–767
22. Tuytelaars, T., Gool, L.V.: Wide baseline stereo matching based on local, affinely invariant regions. BMVC (2000) 412–425
23. Tuytelaars, T., Gool, L.V.: Matching widely separated views based on affine invariant regions. IJCV 59 (2004) 61–85
24. Jurie, F., Schmid, C.: Scale-invariant shape features for recognition of object categories. CVPR 2 (2004) 90–96
25. Burl, M., Weber, M., Perona, P.: A probabilistic approach to object recognition using local photometry and global geometry. In: Proc. ECCV. (1998) 628–641
26. Csurka, G., Dance, C., Fan, L., Williamowski, J., Bray, C.: Visual categorization with bags of keypoints. ECCV'04 workshop on Statistical Learning in Computer Vision (2004) 59–74
27. Dorkó, G., Schmid, C.: Object class recognition using discriminative local features. Accepted under major revisions to IEEE Transactions on Pattern Analysis and Machine Intelligence, updated 13 September (2005)
28. Jurie, F., Triggs, B.: Creating efficient codebooks for visual recognition. In: ICCV '05: Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, Washington, DC, USA, IEEE Computer Society (2005) 604–610
29. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: International Conference on Machine Learning. (1996) 148–156
30. Leibe, B., Seemann, E., Schiele, B.: Pedestrian detection in crowded scenes. In: CVPR '05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) Volume 1, Washington, DC, USA, IEEE Computer Society (2005) 878–885
31. Shotton, J., Blake, A., Cipolla, R.: Contour-based learning for object detection. In: ICCV '05: Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, Washington, DC, USA, IEEE Computer Society (2005) 503–510
32. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Volume 2., Madison, Wisconsin (2003) 264–271
33. Bouchard, G., Triggs, B.: Hierarchical part-based visual object categorization. In: IEEE Conference on Computer Vision & Pattern Recognition. (2005) I 710–715
34. Kumar, S., August, J., Hebert, M.: Exploiting inference for approximate parameter learning in discriminative fields: An empirical study. In: 5th International Workshop, EMMCVPR 2005, St. Augustine, Florida, Springer-Verlag (2005) 153–168
35. Quattoni, A., Collins, M., Darrell, T.: Conditional random fields for object recognition. In: Proc. NIPS 2004, Cambridge, MA, MIT Press (2005)
36. Zhang, W., Deng, H., Dietterich, T.G., Mortensen, E.N.: A hierarchical object recognition system based on multi-scale principal curvature regions. International Conference on Pattern Recognition (2006) 1475–1490
37. Steger, C.: An unbiased detector of curvilinear structures. PAMI 20 (1998) 113–125
38. Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Gool, L.V.: A comparison of affine region detectors. IJCV (2005)
39. Vincent, L., Soille, P.: Watersheds in digital spaces: An efficient algorithm based on immersion simulations. PAMI 13 (1991) 583–598
40. Breiman, L.: Bagging predictors. Machine Learning 24 (1996) 123–140
41. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Chapman & Hall, New York, NY, USA (1984)
42. Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1993)
43. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting (1998)
44. Sokal, R.R., Rohlf, F.J.: Biometry. 3rd edn. W. H. Freeman & Co. (1995)
45. Opelt, A., Pinz, A., Fussenegger, M., Auer, P.: Generic object recognition with boosting. IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 416–431