
Applied Sciences

Article

Vision-Based Classification of Mosquito Species: Comparison of Conventional and Deep Learning Methods

Kazushige Okayasu, Kota Yoshida, Masataka Fuchida and Akio Nakamura *

Department of Robotics and Mechatronics, Tokyo Denki University, Tokyo 120-8551, Japan; [email protected] (K.O.); [email protected] (K.Y.); [email protected] (M.F.) * Correspondence: [email protected]; Tel.: +81-3-5284-5604

 Received: 23 July 2019; Accepted: 10 September 2019; Published: 19 September 2019 

Abstract: This study aims to propose a vision-based method to classify mosquito species. To investigate the efficiency of the method, we compared two different classification methods: the handcraft feature-based conventional method and the convolutional neural network-based deep learning method. For the conventional method, 12 types of features were adopted for handcraft feature extraction, while a support vector machine method was adopted for classification. For the deep learning method, three types of architectures were adopted for classification. We built a mosquito image dataset, which included 14,400 images of three mosquito species. The dataset comprised 12,000 images for training, 1500 images for testing, and 900 images for validation. Experimental results revealed that the accuracy of the conventional method using the scale-invariant feature transform algorithm was 82.4% at maximum, whereas the accuracy of the deep learning method was 95.5% at maximum, achieved by a residual network with data augmentation. From the experimental results, deep learning can be considered effective for classifying the mosquito species of the proposed dataset. Furthermore, data augmentation improves the accuracy of mosquito species classification.

Keywords: computer vision; mosquito; convolutional neural network

1. Introduction

Mosquitoes are the most important disease vectors, responsible for a large number of deaths among children and adults. A study conducted in 2015 revealed that more than 430,000 people die of malaria annually [1]. Experts in tropical medicine affirm that human blood-sucking mosquitoes comprise the Aedes, Anopheles, and Culex genera. These mosquitoes differ in their active hours, behavioral habits, and the infectious diseases they transmit. The classification of mosquito species is important because the prevention and extermination measures against infectious diseases differ for each species. The identification of mosquito species is difficult for a layperson. However, if vision-based classification of mosquito species from pictures captured by a smartphone or mobile device becomes available, the classification results can be utilized to educate people with inadequate knowledge of mosquitoes about the infectious diseases they transmit, especially in Africa, Southeast Asia, and Central and South America, where several such infectious diseases are epidemic.

To date, several vision-based studies have been conducted to identify insect species. Fuchida et al. [2] presented the design and experimental validation of an automated vision-based mosquito classification module. The module can distinguish mosquitoes from other insects, such as bees and flies, by extracting morphological features, followed by support vector machine (SVM)-based classification. Using multiple classification strategies, experimental results involving the classification between mosquitoes and a predefined set of other insects demonstrated the efficacy and validity of the proposed approach,

with a maximum recall of 98%. However, their work did not involve the classification of mosquito species. Utilizing the random forest algorithm, Minakshi et al. [3] identified 7 species among 60 mosquitoes by applying classifiers to pixel values, with an accuracy of 83.3%. However, the verification was insufficient because the number of images was only approximately 60. Although convolutional neural network (CNN)-based deep learning methods have been applied to numerous vision-based classification tasks, as of 2019, CNN-based methods are yet to be applied to classify mosquito species.

For fine-grained image recognition with a CNN-based deep learning framework, various datasets, such as flowers [4], birds [5], fruits and vegetables [6], and animals and plants [7], have been proposed. The dataset of animals and plants [7] contains mosquito images; however, because this dataset does not specifically focus on mosquitoes, only a small number of mosquito images are available. Hence, a dataset needs to be constructed specifically for identifying mosquito species.

In this study, we considered the mosquito species classification problem. This study aimed to compare a handcraft feature-based conventional classification method and a CNN-based deep learning method. By comparing the accuracy of the two aforementioned methods, we identified an effective method for classifying mosquito species, thereby improving recognition accuracy.

The rest of this study is organized as follows. Section 2 describes the construction of the mosquito dataset, and Section 3 explains the conventional and deep learning classification methods. The experimental verification and additional experimental validation are presented in Sections 4 and 5, respectively. Section 6 summarizes the main contributions of our study.

2. Dataset Construction

This section describes the preconditions and the dataset construction. As discussed in the Introduction, Aedes, Anopheles, and Culex are the human blood-sucking mosquito genera. The most infectious species of the Aedes genus are Aedes albopictus and Aedes aegypti. The most infectious species of the Anopheles genus is Anopheles stephensi, and the most infectious species of the Culex genus is Culex pipiens pallens. The active hours, behavioral habits, and transmitted infectious diseases of Aedes albopictus and Aedes aegypti are similar, and even experts may find it challenging to distinguish their appearance, even with the aid of a microscope. Therefore, we targeted Aedes albopictus out of the two species because of its large global population.

We used the images obtained with a single-lens reflex camera for training, while the images obtained with a smartphone were used for testing and further validation. In addition, we assumed that the illuminance varied from 380 to 1340 lx and that the background was white. Table 1 presents the details of the shooting equipment. Figure 1 depicts examples of the captured images. The region of the mosquito was manually clipped from each captured image. Figures 2 and 3 show the mosquitoes clipped from the images captured using a single-lens reflex camera and a smartphone, respectively. Each clipped image was rotated in 90° increments, and all the rotated images were added to the dataset, so the number of images increased four-fold. Consequently, we obtained, for each mosquito species, 4000 images for training, 500 images for testing, and 300 images for validation. In total, the dataset with the three types of mosquitoes comprised 12,000 images for training, 1500 images for testing, and 900 images for validation.
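As a minimal illustration of this four-fold expansion, each clipped image can be rotated in 90° steps and every rotated copy saved alongside the original. The per-species folder names and the OpenCV-based implementation below are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch of the four-fold dataset expansion by 90-degree rotations described above.
# Folder layout and file naming are illustrative assumptions.
import glob
import os

import cv2

SPECIES = ["aedes_albopictus", "anopheles_stephensi", "culex_pipiens_pallens"]  # hypothetical folder names

ROTATIONS = {
    0: None,
    90: cv2.ROTATE_90_CLOCKWISE,
    180: cv2.ROTATE_180,
    270: cv2.ROTATE_90_COUNTERCLOCKWISE,
}

def expand_by_rotation(src_root: str, dst_root: str) -> None:
    """Save each clipped image together with its 90, 180, and 270 degree rotations."""
    for species in SPECIES:
        out_dir = os.path.join(dst_root, species)
        os.makedirs(out_dir, exist_ok=True)
        for path in glob.glob(os.path.join(src_root, species, "*.jpg")):
            image = cv2.imread(path)
            stem = os.path.splitext(os.path.basename(path))[0]
            for angle, code in ROTATIONS.items():
                rotated = image if code is None else cv2.rotate(image, code)
                cv2.imwrite(os.path.join(out_dir, f"{stem}_rot{angle}.jpg"), rotated)

# Example with hypothetical paths: expand_by_rotation("clipped/train", "dataset/train")
```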

Table 1. Shooting equipment characteristics.

                        Single-Lens Reflex Camera                Smartphone
Product name            EOS Kiss X8i (Canon)                     iPhone 6S (Apple)
Picture size [pixel]    6000 × 4000                              4032 × 3024
Resolution [dpi]        72 × 72                                  72 × 72
Lens model              EF-S18-55 mm F3.5-5.6 IS STM (Canon)     –
Focal length [mm]       3.5–5.6                                  2.2


Figure 1. Examples of the captured images: (a) mosquitoes captured using a single-lens reflex camera and (b) mosquitoes captured using a smartphone.

Figure 2. Clipped mosquitoes from the images captured using a single-lens reflex camera: (a) Aedes albopictus, (b) Anopheles stephensi, and (c) Culex pipiens pallens.

Figure 3. Clipped mosquitoes from the images captured using a smartphone: (a) Aedes albopictus, (b) Anopheles stephensi, and (c) Culex pipiens pallens.

3. Classification Method

Figure 4 illustrates the system flow. We constructed a dataset of mosquito species and conducted classification using handcraft features or deep learning. The classification using handcraft features and the classification using deep learning are hereinafter referred to as conventional classification and deep classification, respectively.

Figure 4. System flow.

3.1. Conventional Classification

In this section, we present the details of the conventional classification. Twelve types of handcraft features, covering shape, color, texture, and frequency, were adopted for handcraft feature extraction, and the SVM method was adopted for classification.

For the shape features, we extracted speeded-up robust features (SURF) [8], the scale-invariant feature transform (SIFT) [9], dense SIFT [10], the histogram of oriented gradients (HOG) [11], co-occurrence HOG (CoHOG) [12], extended CoHOG (ECoHOG) [13], and the local binary pattern (LBP) [14]. SURF and SIFT comprise two algorithms: feature point detection and feature description. Feature point detection is robust to changes in scale and to noise, and feature description is robust to illumination change and rotation. Dense SIFT uses a grid pattern for the feature points and the same feature description as SIFT. An example of the detected keypoints is depicted in Figure 5. The features extracted with SURF, SIFT, and dense SIFT were each clustered using k-means, and vector quantization was conducted using a bag-of-features representation [15]. HOG is a histogram of the luminance gradient, CoHOG is a histogram of the appearance frequency of combinations of luminance gradients, and ECoHOG is a histogram of the intensity of combinations of luminance gradients. LBP transforms the image into an LBP image, which is robust to illumination change, and uses its luminance histogram.

For the color features, we calculated color histograms in the red, green, blue (RGB); hue, saturation, value (HSV); and L*a*b* [16] colorimetric systems. The RGB color feature uses a simple concatenation of the R, G, and B channel histograms. The HSV feature uses the HSV color specification system; when the saturation is low in the HSV color system, the hue value becomes unstable, so the histogram of each channel excludes pixels whose saturation value is 10% or less. The L*a*b* feature uses the L*a*b* color system and a three-dimensional histogram of L*, a*, and b*.

For the texture features, we calculated gray level co-occurrence matrix (GLCM) features. We used contrast, dissimilarity, homogeneity, energy, correlation, and the angular second moment as the statistical values of the GLCM [17].

For the frequency features, we utilized GIST [18], which is characterized by imitating human perception. GIST divides the image into small 4 × 4 blocks and extracts the structure of the entire image using filters with different frequencies and scales.

Classification of the mosquito species was conducted using SVM with the aforementioned features. The SVM considered a soft margin, and multiclass classification was performed using a one-versus-rest method.
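As an illustration of the shape-feature pipeline above, the sketch below combines SIFT descriptors, a k-means codebook with bag-of-features encoding, and a soft-margin linear SVM with C = 1 trained in a one-versus-rest fashion. The codebook size and the use of scikit-learn alongside OpenCV are assumptions, not the authors' exact implementation.

```python
# Sketch of the SIFT bag-of-features + linear SVM pipeline described above.
# The codebook size and the scikit-learn/OpenCV combination are illustrative assumptions.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

N_WORDS = 256  # assumed codebook size


def sift_descriptors(image_bgr):
    """Detect SIFT keypoints and return their 128-D descriptors (or None if none are found)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, descriptors = cv2.SIFT_create().detectAndCompute(gray, None)
    return descriptors


def bof_histogram(descriptors, codebook):
    """Quantize descriptors against the codebook and return a normalized word histogram."""
    hist = np.zeros(N_WORDS, dtype=np.float32)
    if descriptors is not None:
        for word in codebook.predict(descriptors.astype(np.float32)):
            hist[word] += 1.0
        hist /= max(hist.sum(), 1.0)
    return hist


def train_conventional(train_images, train_labels):
    """Build the k-means codebook, encode the images, and fit a soft-margin linear SVM (C = 1)."""
    per_image_desc = [sift_descriptors(img) for img in train_images]
    all_desc = np.vstack([d for d in per_image_desc if d is not None])
    codebook = KMeans(n_clusters=N_WORDS, n_init=4, random_state=0).fit(all_desc)
    features = np.array([bof_histogram(d, codebook) for d in per_image_desc])
    classifier = LinearSVC(C=1.0)  # one-versus-rest multiclass by default
    classifier.fit(features, train_labels)
    return codebook, classifier
```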

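Similarly, the GLCM texture statistics listed above (contrast, dissimilarity, homogeneity, energy, correlation, and angular second moment) can be computed, for example, with scikit-image; the pixel distance, the four directions, and the 256 gray levels used below are illustrative assumptions rather than the authors' exact settings.

```python
# Illustrative GLCM texture feature, assuming scikit-image; the distance, angles,
# and gray-level count are assumptions rather than the authors' exact settings.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

GLCM_PROPS = ("contrast", "dissimilarity", "homogeneity", "energy", "correlation", "ASM")


def glcm_features(gray_uint8):
    """Return the six GLCM statistics used above, averaged over four directions."""
    glcm = graycomatrix(
        gray_uint8,
        distances=[1],
        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
        levels=256,
        symmetric=True,
        normed=True,
    )
    return np.array([graycoprops(glcm, prop).mean() for prop in GLCM_PROPS])
```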

Figure 5. Visualization of keypoint detection: (a) speeded-up robust features (SURF), (b) scale-invariant feature transform (SIFT), and (c) dense SIFT.

3.2. Deep Classification

In this section, we give the details of the CNN-based deep classification. Three types of architectures, namely AlexNet [19], the Visual Geometry Group network (VGGNet) [20], and the deep residual network (ResNet) [21], were adopted for classification. These architectures are often used for object recognition. AlexNet was used as originally proposed, whereas the VGGNet used here has a 16-layer structure and the ResNet an 18-layer structure. The CNNs were trained using stochastic gradient descent, and the learning rate schedule was obtained using simulated annealing. ImageNet pre-trained weights [22] were used as initial weights to accelerate the convergence of learning.

As explained in Section 2, we captured the mosquito images against a simple white background. Training images with such a simple, low-noise background can lead to overfitting and reduced generality. To suppress overfitting, the training images were transformed by the following data augmentation steps (code sketches follow at the end of this subsection).

• Rotation: rotate the image by angle θ. Angle θ varies randomly in the range from −θr to θr; the limit value θr is set to 45°.
• Brightness change: randomly change the brightness of the image. Change rate α is chosen in the range from −αr to αr; the limit value αr is set to 40%.
• Contrast change: randomly change the contrast of the image. Change rate α is chosen in the range from −αr to αr; the limit value αr is set to 40%.
• Saturation change: randomly change the saturation of the image. Change rate α is chosen in the range from −αr to αr; the limit value αr is set to 40%.
• Hue change: randomly change the hue of the image. Change rate α is chosen in the range from −αr to αr; the limit value αr is set to 25%.

The limit values θr and αr were determined a priori by reference to the default values used in the iNaturalist Competition 2018 training code. It is noteworthy that the number of training images was constant, since each transformed image replaced the original one. An example of the data augmentation is depicted in Figure 6. The variation of the training images increases in terms of color, brightness, and rotation fluctuation through the aforementioned processing.
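A minimal sketch of these transformations, assuming an on-the-fly torchvision implementation, is given below; the mapping of the ±40% and ±25% limits to the ColorJitter arguments and the 224 × 224 input size are assumptions rather than the authors' exact training code.

```python
# One plausible realization of the augmentation steps above using torchvision.
# The argument values mirror the stated limits; their exact mapping is an assumption.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=45),       # rotation angle drawn from [-45 deg, +45 deg]
    transforms.ColorJitter(brightness=0.4,       # brightness change within roughly +/-40%
                           contrast=0.4,         # contrast change within roughly +/-40%
                           saturation=0.4,       # saturation change within roughly +/-40%
                           hue=0.25),            # hue shift within +/-25% of the full hue range
    transforms.Resize((224, 224)),               # input size expected by the ImageNet-pretrained CNNs
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet normalization statistics
                         std=[0.229, 0.224, 0.225]),
])
```

Because such transforms are applied on the fly, each epoch sees a transformed image in place of the original, which keeps the number of training images constant, as noted above.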

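The training procedure itself could then look like the following sketch of fine-tuning an ImageNet-pretrained ResNet-18 with stochastic gradient descent; interpreting the annealed learning-rate schedule as cosine annealing, the optimizer hyperparameters, and the folder-based data loading are assumptions, not the authors' exact configuration.

```python
# Minimal fine-tuning sketch for the deep classification setting described above.
# Optimizer hyperparameters, the cosine-annealing interpretation of the learning-rate
# schedule, and the folder layout are assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models

def train_resnet18(train_dir, train_transform, num_classes=3, epochs=30):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dataset = datasets.ImageFolder(train_dir, transform=train_transform)
    loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)  # ImageNet pre-trained weights
    model.fc = nn.Linear(model.fc.in_features, num_classes)                 # three mosquito species
    model = model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    for _ in range(epochs):
        model.train()
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```

In the experiments of Section 4, such a training and testing phase is repeated five times and the resulting accuracies are averaged.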

Figure 6. Examples of data augmentation: (a) hue −25%, (b) hue 0%, (c) hue +25%, (d) contrast −40%, (e) contrast 0%, and (f) contrast +40%.

4. Mosquito Species Classification Experiment

4.1. Experimental Conditions

The mosquitoes were classified using the dataset presented in Section 2. We used Aedes albopictus, Anopheles stephensi, and Culex pipiens pallens as the species classification targets. The images captured using a single-lens reflex camera were used for training, and the images captured using a smartphone were used for testing. We used the dataset with the three types of mosquitoes, consisting of 12,000 images for training and 1500 images for testing. In the case of the deep classification, the training and testing phases were performed five times, and the average classification accuracy was calculated.

4.2. Experimental Results

Tables 2 and 3 show the accuracies aggregated over the three considered species for the conventional and deep classifications, respectively.

Table 2. Result of conventional classification. Speeded-up robust features (SURF); histogram of oriented gradients (HOG); co-occurrence HOG (CoHOG); extended CoHOG (ECoHOG); local binary pattern (LBP); red, green, blue (RGB); hue, saturation, value (HSV); gray level co-occurrence matrix (GLCM); GIST; L*a*b*.

Features      Accuracy [%]    Features    Accuracy [%]
SURF          81.6            LBP         57.3
SIFT          82.4            RGB         30.7
Dense SIFT    55.9            HSV         37.4
HOG           54.7            L*a*b*      33.7
CoHOG         57.0            GLCM        53.6
ECoHOG        61.1            GIST        71.5

Table 3. Result of deep classification. AlexNet; Visual Geometry Group network (VGGNet); deep residual network (ResNet).

                  Accuracy [%]
Architecture      Without Data Augmentation    With Data Augmentation
AlexNet           69.8                         92.3
VGGNet            81.2                         91.0
ResNet            74.4                         95.5

In the conventional classification, the parameters of the features were set to the default values adopted in the OpenCV libraries for Python. The hyperparameters of the SVM were determined based on preliminary experiments. The error-term penalty parameter, C, was set to one, and the kernel function was defined as a linear kernel. The SVM classifier with these parameters had the highest accuracy in the grid search.

The classification accuracies with SIFT and SURF are high. Given that the classification with dense SIFT exhibits low accuracy, an algorithm that both detects feature points and describes local features turns out to be effective.

The deep classification has a lower classification accuracy than the conventional classification unless data augmentation is applied. With data augmentation, the deep classification has a higher accuracy than the conventional classification. Therefore, data augmentation is effective. ResNet has the highest discrimination accuracy of 95.5%, indicating that deep classification is effective for mosquito species classification.

Figure 7 shows the confusion matrices. Figure 7a shows the matrix from the conventional classification with SIFT, which achieved the highest accuracy in Table 2. Each species is misclassified as the others more often than in the deep classification result.

Figure 7. Example results of the confusion matrices: (a) conventional classification with SIFT and (b) deep classification with ResNet.

Figure 7b shows the matrix from the deep classification with ResNet, which achieved the highest accuracy in Table 3. This matrix was calculated from the result of one training and testing phase among the five. Aedes albopictus and Culex pipiens pallens are relatively often misclassified as each other, whereas Anopheles stephensi is classified relatively accurately. Figure 8 shows an image visualizing the target spots using gradient-weighted class activation mapping (Grad-CAM), a visualization method [23] for the deep classification. The heatmap covers the body of the mosquito.
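A minimal sketch of how such a heatmap can be produced for a ResNet-style classifier is given below; it follows the general Grad-CAM recipe [23] using PyTorch hooks, and the hooked layer name is an assumption rather than the authors' exact visualization code.

```python
# Minimal Grad-CAM sketch [23] for a PyTorch ResNet-style classifier, using hooks on the
# last convolutional block; this illustrates the general recipe, not the authors' exact code.
import torch
import torch.nn.functional as F

def grad_cam(model, image_tensor, target_class):
    """Return a heatmap in [0, 1] with the spatial size of the preprocessed input image."""
    activations, gradients = [], []
    layer = model.layer4  # last convolutional block of ResNet-18 (assumed attribute name)
    h_fwd = layer.register_forward_hook(lambda m, inp, out: activations.append(out))
    h_bwd = layer.register_full_backward_hook(lambda m, g_in, g_out: gradients.append(g_out[0]))

    model.eval()
    scores = model(image_tensor.unsqueeze(0))   # shape (1, num_classes)
    model.zero_grad()
    scores[0, target_class].backward()          # gradient of the target-class score
    h_fwd.remove()
    h_bwd.remove()

    acts, grads = activations[0], gradients[0]              # shape (1, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)          # channel weights: pooled gradients
    cam = F.relu((weights * acts).sum(dim=1)).detach()      # weighted sum over channels + ReLU
    cam = F.interpolate(cam.unsqueeze(0), size=image_tensor.shape[1:],
                        mode="bilinear", align_corners=False)[0, 0]
    cam = cam - cam.min()
    return cam / cam.max().clamp(min=1e-8)

# Example (hypothetical): heatmap = grad_cam(trained_model, preprocessed_image, target_class=0)
```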


Figure 8. Activation visualization by gradient-weighted class activation mapping (Grad-CAM): (a) Aedes albopictus, (b) Anopheles stephensi, and (c) Culex pipiens pallens.

Aedes albopictus and Culex pipiens pallens have a similar shape, but their body colors are different. Nevertheless, the visualization result shows that Aedes albopictus and Culex pipiens pallens are classified based on similar body parts. The images captured using a smartphone do not show the color differences when the shooting environment is dark, because a smartphone camera cannot offer the same image quality as a single-lens reflex camera. These facts lead to misclassification.

5. Effective Data Augmentation Investigation Experiment

We showed in Section 4 that the deep classification achieves a higher classification accuracy with data augmentation. In this section, as further validation, we investigated which transformation step of the data augmentation is effective for the deep classification.

5.1.5. E Experimentalffective Data Conditions Augmentation Investigation Experiment AsWe explained showed thatin Section the deep 3.2, classificationwe applied transformation achieves a higher steps classificationto the training accuracy images for by data augmentation.augmentation inHowever, Section4. it In is this not section, clear aswhich further step validation, contributes we investigatedthe most to whichthe classification. transformation In addition,step of the the data range augmentation values, 𝜃 isand eff ective𝛼, were for determined deep classification. a priori. By the data augmentation described in Section 3.2, we verified the step which contributes the most5.1. Experimental to the classification Conditions accuracy. InAs this explained experiment, in Section we chose 3.2 ,one we transformation applied transformation step from stepsthe steps to the explained training in images Section for 3.2 dataand appliedaugmentation. the chosen However, transformation it is not clear separately which step to the contributes training theimages. most to the classification. In addition, •the rangeRotation: values, rotateθr andthe imageαr, were by determined angle 𝜃. a priori. By the data augmentation described in Section 3.2, we verified the step which contributes the Angle 𝜃 varies randomly in the range from −𝜃 to 𝜃. •mostBrightness to the classification change: Randomly accuracy. change the brightness of the image. In this experiment, we chose one transformation step from the steps explained in Section 3.2 and Change rate 𝛼 is chosen in the range from −𝛼 to 𝛼. •appliedContrast the chosen change: transformation Randomly change separately the contrast to the training of the image. images.

Change rate 𝛼 is chosen in the range from −𝛼 to 𝛼. Rotation: rotate the image by angle θ. Angle θ varies randomly in the range from θr to θr. •• Saturation change: Randomly change the saturation of the image. − Brightness change: Randomly change the brightness of the image. Change rate α is chosen in the Cha• nge rate 𝛼 is chosen in the range from −𝛼 to 𝛼. range from αr to αr. • Hue change:− Randomly change the hue of the image. Contrast change: Randomly change the contrast of the image. Change rate α is chosen in the Cha• nge rate 𝛼 is chosen in the range from −𝛼 to 𝛼. range from αr to αr. − Saturation change: Randomly change the saturation of the image. Change rate α is chosen in the • range from αr to αr. − Appl. Sci. 2019, 9, x FOR PEER REVIEW 9 of 11

Regarding rotation, the limit value 𝜃 varied at 5° intervals from 0° to 45°. Therefore, the 10 conditions were prepared. Conditions A to J denote the ranges of angle 𝜃; [0 (no rotation,𝜃 = 0)], [−5 to 5 (𝜃 = 5)], …, [−45 to 45 (𝜃 = 45)], respectively. Regarding the brightness, contrast, saturation, or hue change, the limit value 𝛼 varied at five intervals from 0% to 50%. Therefore, the 11 by 4 conditions were prepared. Conditions A to K denote
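The resulting condition grid can be enumerated compactly as sketched below; reusing torchvision transforms to realize each single-step condition mirrors the sketch in Section 3.2 and is an assumption, not the authors' exact experimental code.

```python
# Sketch of the 54 single-transformation conditions described above:
# 10 rotation conditions (A-J) plus 11 conditions (A-K) for each of the four photometric changes.
import string

from torchvision import transforms

conditions = []

# Rotation: conditions A-J with theta_r = 0, 5, ..., 45 degrees.
for label, theta_r in zip(string.ascii_uppercase, range(0, 50, 5)):
    conditions.append(("rotation", label, transforms.RandomRotation(degrees=theta_r)))

# Brightness, contrast, saturation, and hue: conditions A-K with alpha_r = 0, 5, ..., 50 percent.
for step in ("brightness", "contrast", "saturation", "hue"):
    for label, alpha_r in zip(string.ascii_uppercase, range(0, 55, 5)):
        conditions.append((step, label, transforms.ColorJitter(**{step: alpha_r / 100.0})))

assert len(conditions) == 54  # each condition is trained and validated five times
```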

We utilized the dataset with the three types of mosquitoes, comprising 12,000 images for training and 900 images for validation. Training and validation were performed five times for each condition, and the average classification accuracy was calculated. In total, 270 trials (54 conditions by five trials) of training and validation were performed.

5.2. Experimental Results

Figure 9 shows the experimental results. The error bars show the standard deviation.

Figure 9. Classification accuracy due to variations in the data augmentation steps: (a) rotation, (b) brightness change, (c) contrast change, (d) saturation change, and (e) hue change.

Regarding rotation, the classification accuracy is almost the same under every condition. As mentioned in Section 2, each clipped training image was rotated in 90° increments, and all the rotated images were added to the dataset. The rotation variation therefore appears to be covered in advance; hence, rotation has only a small effect on the accuracy in this experiment.

Regarding the brightness, saturation, and hue changes, the accuracy improves up to condition E (αr = 25) or G (αr = 30), with little change afterward.

Regarding the contrast change, the accuracy improves as the change rate increases. Under condition K (αr = 50) of contrast, the highest classification accuracy of 89.1% is achieved. From the above, the fluctuation of contrast offers the largest improvement in the accuracy of mosquito species classification.

6. Conclusions

This study compared a handcraft feature-based conventional classification method and a CNN-based deep classification method. We constructed a dataset for classifying mosquito species. For the conventional classification, shape, color, texture, and frequency features were adopted for handcraft feature extraction, and the SVM method was adopted for classification. For the deep classification, three types of architectures were adopted and compared. The deep classification had a lower classification accuracy than the conventional classification unless data augmentation was applied. With data augmentation, however, the deep classification had a higher accuracy than the conventional classification. ResNet achieved the highest discrimination accuracy of 95.5%, indicating that deep classification is effective for mosquito species classification. Furthermore, we verified that data augmentation with contrast fluctuation contributes the most to the improvement.

Author Contributions: K.O. and K.Y. conceived and designed the experiments; K.O. and K.Y. performed the experiments; K.O., K.Y., M.F., and A.N. analyzed the data; K.O., K.Y., M.F., and A.N. wrote the paper.

Funding: This work was partially supported by the Research Institute for Science and Technology of Tokyo Denki University, Grant Number Q19J-03/Japan.

Acknowledgments: We wish to acknowledge Hirotaka Kanuka, Chisako Sakuma, and Hidetoshi Ichimura, Department of Tropical Medicine, the Jikei University School of Medicine, for their valuable suggestions, informative advice, and dedicated help in collecting mosquito images.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. World Health Organization: Mosquito-Borne Diseases. Available online: http://www.who.int/neglected_diseases/vector_ecology/mosquito-borne-diseases/en/ (accessed on 1 April 2017).
2. Fuchida, M.; Pathmakumar, T.; Mohan, R.E.; Tan, N.; Nakamura, A. Vision-based perception and classification of mosquitoes using support vector machine. Appl. Sci. 2017, 7, 51. [CrossRef]
3. Minakshi, M.; Bharti, P.; Chellappan, S. Identifying mosquito species using smart-phone camera. In Proceedings of the 2017 European Conference on Networks and Communications (EuCNC), Oulu, Finland, 12–15 June 2017.
4. Nilsback, M.E.; Zisserman, A. Automated flower classification over a large number of classes. In Proceedings of the 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP), Bhubaneswar, India, 16–19 December 2008.
5. Welinder, P.; Branson, S.; Mita, T.; Wah, C.; Schroff, F.; Belongie, S.; Perona, P. Caltech-UCSD Birds 200; Technical Report CNS-TR-2010-001; California Institute of Technology: Pasadena, CA, USA, 2010.
6. Hou, S.; Feng, Y.; Wang, Z. VegFru: A domain-specific dataset for fine-grained visual categorization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
7. Horn, G.V.; Aodha, O.M.; Song, Y.; Cui, Y.; Sun, C.; Shepard, A.; Adam, H.; Perona, P.; Belongie, S. The iNaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
8. Bay, H.; Tuytelaars, T.; Gool, L.V. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2006.
9. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]
10. Liu, C.; Yuen, J.; Torralba, A.; Sivic, J.; Freeman, W.T. SIFT flow: Dense correspondence across different scenes. In Proceedings of the European Conference on Computer Vision (ECCV), Marseille, France, 12–18 October 2008.
11. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 20–25 June 2005.

12. Watanabe, T.; Ito, S.; Yokoi, K. Co-occurrence histograms of oriented gradients for pedestrian detection. In Proceedings of the Pacific-Rim Symposium on Image and Video Technology (PSIVT), Tokyo, Japan, 13–16 January 2009.
13. Kataoka, H.; Aoki, Y. Symmetrical judgment and improvement of CoHOG feature descriptor for pedestrian detection. In Proceedings of the Machine Vision Applications (MVA), Nara, Japan, 13–15 June 2011.
14. Ojala, T.; Pietikainen, M.; Harwood, D. A comparative study of texture measures with classification based on feature distributions. Pattern Recognit. 1996, 29, 51–59. [CrossRef]
15. Csurka, G.; Dance, C.R.; Fan, L.; Willamowski, J.; Bray, C. Visual categorization with bags of keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Prague, Czech Republic, 11–14 May 2004.
16. Palermo, F.; Hays, J.; Efros, A.A. Dating historical color images. In Proceedings of the European Conference on Computer Vision (ECCV), Florence, Italy, 7–13 October 2012.
17. Haralick, R.M.; Shanmugam, K. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, 610–621. [CrossRef]
18. Oliva, A.; Torralba, A. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001, 3, 145–175. [CrossRef]
19. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012.
20. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
22. He, K.; Girshick, R.; Dollár, P. Rethinking ImageNet pre-training. arXiv 2018, arXiv:1811.08883.
23. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).