
Article A Convolutional Neural Network Based Auto Features Extraction Method for Classification with Electronic Tongue

Yuanhong Zhong 1,*, Shun Zhang 1, Rongbu He 2, Jingyi Zhang 1, Zhaokun Zhou 1, Xinyu Cheng 1, Guan Huang 1 and Jing Zhang 1
1 School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China; [email protected] (S.Z.); [email protected] (J.Z.); [email protected] (Z.Z.); [email protected] (X.C.); [email protected] (G.H.); [email protected] (J.Z.)
2 Electric Power Research Institute of Guizhou Power Grid Co., Ltd., Guizhou 550007, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-1375-2908-255

 Received: 11 May 2019; Accepted: 18 June 2019; Published: 20 June 2019 

Abstract: Feature extraction is a key part of the electronic tongue system. Almost all of the existing features extraction methods are “hand-crafted”, which makes features selection difficult and stability poor. The lack of automatic, efficient and accurate features extraction methods has limited the application and development of electronic tongue systems. In this work, a convolutional neural network-based auto features extraction strategy (CNN-AFE) in an electronic tongue (e-tongue) system for tea classification was proposed. First, the sensor response of the e-tongue was converted to time-frequency maps by short-time Fourier transform (STFT). Second, features were extracted by a convolutional neural network (CNN) with the time-frequency maps as input. Finally, features extraction and classification were carried out under a general shallow CNN architecture. To evaluate the performance of the proposed strategy, experiments were conducted on a tea database containing 5100 samples of five kinds of tea. Compared with other features extraction methods, including features of the raw response, peak-inflection point, discrete cosine transform (DCT), discrete wavelet transform (DWT) and singular value decomposition (SVD), the proposed model showed superior performance. Nearly 99.9% classification accuracy was obtained, and the proposed method is an approximate end-to-end features extraction and pattern recognition model, which reduces manual operation and improves efficiency.

Keywords: electronic tongue; tea classification; auto features extraction; convolutional neural network

1. Introduction

Tea is one of the most prevailing beverages across the world. The practice of drinking tea has a long history in China. Tea contains theine, cholestenone, inose, folic acid and other components, which can improve human health. In the actual tasting process, the mellow and fragrant taste of tea stimulates people’s taste buds. Many external conditions affect the taste of tea, for example, the number of tea leaves added, the tea-making utensils, the tea-making time, the water quality and the way the tea is stored. The synergistic effects of these factors create the unique taste of tea. Organic compounds with different chemical structures and concentrations play a significant role in the quality of tea. The ingredients in tea are very complicated, the most important of which are tea polyphenols, amino acids, alkaloids and other aromatic substances.

Tea classification has a wide range of application scenarios. It plays an important role in tea quality estimation, for example, new or old, true or fake judgment of tea. Moreover, the classification

Appl. Sci. 2019, 9, 2518; doi:10.3390/app9122518 www.mdpi.com/journal/applsci

and identification of tea provide a reference for healthy tea drinking. What’s more, the classification and recognition of tea are likely to provide the basis for the design of smart teapots in the future.

Traditional tea quality assessment methods are based on different analytical instruments, such as high performance liquid chromatography [1], gas chromatography [2] and plasma atomic emission spectrometry [3]. However, these methods require a lot of technical personnel, material and financial support, which leads to low efficiency and larger overhead [4]. With the development of sensor technology, the advantages of sensor-based methods have become more distinguished. The typical advantages are high accuracy, simple operation and fast detection, which obviously improve the efficiency of tea quality inspection. At the same time, the electronic nose for gas analysis has also made technological breakthroughs [5,6]. Arrays of electrochemical sensors and devices have been designed for the analysis of complex liquid samples, such as taste sensors and electronic tongues [7,8]. As a modern intelligent sensory instrument, the electronic tongue is skillful in monitoring the production cycle of beverages, and has the advantages of being simple, fast and low cost, which shows huge potential in beverage quality evaluation [9].

Figure 1 shows the basic flow of the electronic tongue system for beverage detection and quality evaluation. First, the response signals from the e-tongue hardware system are collected. Then, features of the sampled raw data are extracted for pattern recognition.

[Figure 1: Electronic tongue hardware system → Sensor response → Feature extraction → Pattern recognition]

Figure 1. The basic flow of the electronic tongue system for liquid detection.

Pattern recognition and features extraction are the two most important parts of the electronic tongue system for liquid classification. Many scholars have contributed to tea pattern recognition. For instance, principal component analysis (PCA) [10], artificial neural network (ANN) [11], support vector machine (SVM) [9,12], k-nearest neighbor algorithm (k-NN) [13], random forest (RF) [14] and autoregressive (AR) model [4] have been put forward for classification analysis in the e-tongue system. Features extraction is another critical step of the e-tongue, as the quality of features selection will directly affect the quality of pattern recognition. Generally speaking, feature extraction serves three main purposes, namely: (1) Reduction of random noise, (2) reduction of unwanted systematic variations which are often due to experimental conditions, (3) data compression in order to capture the most relevant information from signals. There are differences in the implementation
details of feature extraction methods in different types of electronic tongue systems. However, as far as we know, these methods are similar in principle and can be divided into the following three categories: features with physical meaning, features acquired by mathematical transformation, and features in the frequency domain. In terms of the voltammetry e-tongue, in the early stage the features extraction methods were relatively simple; for example, raw data collected at a fixed frequency from samples were treated as features [15]. Then, the peak value and inflection points (the maximum value, the minimum value, and two inflection values in a cycle from samples) were extracted as features [10,16]. Unfortunately, these methods had low accuracy and feature redundancy. Therefore, to extract features more effectively, many professional researchers tried to compress the sampling data by various mathematical transformation methods. For instance, Pradip Saha et al. used discrete wavelet transform
(DWT) with sliding windows to extract energy in different frequency bands as features [17]. Andrea Scozzari et al. used discrete cosine transform (DCT) to extract features, and selected some coefficients as eigenvalues for tea classification [18]. In addition, Santanu Ghorai et al. transformed the sampling data into matrices, decomposed the matrices into singular values (SVD), and selected several singular values as features for tea classification [19]. Furthermore, SVD and DCT were fused for feature extraction and then applied in the prediction of theaflavin and thearubigin in tea [20]. Although these mathematical transformation methods made some improvements in features selection, they still need to be set manually in the selection of features data. Specifically, for DWT, both the number of selected layers in the wavelet transform and the strategy of characteristic coefficients (mean, variance, etc.) are determined according to experience.
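As an illustration of how such hand-crafted features are typically obtained, the sketch below extracts DCT coefficients and singular values from a simulated sensor response. The signal itself, the number of retained coefficients, and the matrix shape used before SVD are hypothetical choices for illustration, not the settings used in the cited works.

```python
import numpy as np
from scipy.fft import dct

# Simulated 1-D sensor response (a stand-in for a real e-tongue curve)
rng = np.random.default_rng(0)
response = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.05 * rng.standard_normal(1000)

# DCT-based features: keep the first k low-frequency coefficients, as in [18]
k = 10
dct_features = dct(response, norm='ortho')[:k]

# SVD-based features: reshape the response into a matrix and keep the
# leading singular values, as in [19]; the 20 x 50 shape is arbitrary here
matrix = response.reshape(20, 50)
singular_values = np.linalg.svd(matrix, compute_uv=False)
svd_features = singular_values[:k]

features = np.concatenate([dct_features, svd_features])
print(features.shape)  # (20,)
```

Note that both k and the reshape are manual decisions — exactly the experience-dependent step the text describes as the limitation of these methods.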
Moreover, the process of manually setting parameters in DCT is similar to that in DWT. In terms of SVD, the diagonal values are selected based on experience. This manner of features selection determines the limitations of these mathematical transformation methods.

In addition, scholars have tried to fuse electronic tongue data with electronic nose sensor data to enhance tea quality prediction accuracy. For example, Nur Zawatil Isqi Zakaria et al. adopted PCA to compress electronic tongue and electronic nose samples to assess a bio-inspired flavor [21]. The wavelet energy feature (WEF) has been extracted from the responses of an e-nose and an e-tongue for the classification of different grades of Indian black tea [22]. Furthermore, Ruicong Zhi et al. proposed a framework for a multi-level fusion strategy of electronic nose and electronic tongue, in which time-domain based features (mean value and max value of the sensor response) and frequency-domain based features (the energy of the DWT) were fused for classification [23]. Runu Banerjee et al. combined electronic tongue data with electronic nose data and then fused DWT with Bayesian statistical analysis to evaluate the artificial flavor of black tea [24]. Mahuya Bhattacharyya Banerjee et al. proposed a cross-perception model (fusing electronic nose and electronic tongue samples by PCA and multiple linear regression) for the detection of the aroma and taste of black tea [25].

In general, the manual features-based methods are difficult to adapt to changing scenes and have poor stability. The reason is that these methods adopt empirical parameters, and the empirical values are usually no longer effective when the external environment changes. Recently, the impressive achievements of deep architectures on computer vision tasks such as object recognition [26,27], object detection [28] and action recognition [29] have shown the significance of the convolutional neural network (CNN) in the image domain.
Most methods utilize deep networks as a features extraction strategy and then train the pattern recognition model. Inspired by the successful application of CNN in image processing, we propose an auto features extraction strategy based on CNN (CNN-AFE). The key idea behind our method is to transform the time series into pictures so as to make full use of the advantages of CNN. The significant contribution of this paper is to put forward a deep learning-based auto features extraction strategy in the e-tongue system for tea classification. Furthermore, the proposed model is an approximate end-to-end features extraction and pattern recognition method, which reduces manual operation and improves efficiency.

Figure 2 shows the implementation process of this work. First, the sensor response of the e-tongue was converted to time-frequency maps by STFT. Second, the CNN extracted features automatically with the time-frequency maps as input. Finally, features extraction and classification were carried out under a general shallow CNN architecture. Compared with other methods, the proposed method avoids manual features selection through the training of network parameters.

The proposed method has advantages in two aspects. On the one hand, after the sampling signal is transformed into time-frequency images with STFT, the remaining features extraction steps are completed automatically during network training. On the other hand, the CNN-AFE combines features extraction and pattern recognition into a whole. In traditional algorithms, features extraction and pattern recognition are two separate parts: first, an experience-based features extraction method is applied to obtain features, and then a classifier, such as SVM, k-NN or RF, is chosen to complete pattern recognition. In contrast, the CNN-AFE is an approximate end-to-end features extraction and pattern recognition method, which is conducive to improving the efficiency and accuracy of pattern recognition.
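The first step of this pipeline — turning a sampled response into a time-frequency map — can be sketched with SciPy's STFT. The window length and the toy signal below are illustrative placeholders, not the exact parameters of the authors' implementation; only the 1 kHz sampling rate comes from the text.

```python
import numpy as np
from scipy.signal import stft

fs = 1000                    # sampling rate in Hz (1 kHz, as in Section 2)
t = np.arange(10 * fs) / fs  # 10 s of signal, i.e. 10,000 points
# Toy stand-in for a sensor response: a 2.5 Hz pulse-like component plus noise
signal = (np.sign(np.sin(2 * np.pi * 2.5 * t))
          + 0.1 * np.random.default_rng(1).standard_normal(t.size))

# Short-time Fourier transform; nperseg=256 is an illustrative window length
f, tau, Z = stft(signal, fs=fs, nperseg=256)
tf_map = np.abs(Z)           # magnitude spectrogram, fed to the CNN as an image

print(tf_map.shape)          # (frequency bins, time frames)
```

The resulting 2-D magnitude array plays the role of the "time-frequency map" in Figure 2: each such map is treated as a single-channel image by the convolutional layers.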
Comprehensively, the proposed method not only has high accuracy, but also omits the inconvenience of manual data selection, which makes it applicable to most data scenarios. In this study, we adopt a tea database for classification; the tea samples were collected by an e-tongue system we designed. The proposed method was compared with other state-of-the-art approaches and showed superior performance, with nearly 99.9% classification accuracy. We infer that the CNN-AFE is suitable for liquid classification.

[Figure 2: Electronic tongue hardware system → Sensor response → Time-frequency feature map → Automatic feature extraction and classification with CNN]

Figure 2. The overview processing of convolutional neural network (CNN) based auto features extraction strategy.

In the rest of this paper, we arrange the content as follows: Section 2 provides a brief description of the e-tongue system and introduces the experiment settings and the details of the proposed method. In Section 3, we demonstrate the experimental results for evaluations upon our own database. Finally, we summarize this study in Section 4.

2. Materials and Methods

2.1. Electronic Tongue System

Electronic tongue systems can be divided into several types according to data measurement technology, such as potentiometry [30] and voltammetry [31]. In addition, spectrophotometry [32] is another very popular means of measurement. Each type of electronic tongue can utilize different types of working electrodes, such as bare electrodes, modified electrodes and biosensors [33]. Typically, bare electrodes are gold, silver and palladium; modified electrodes are carbon paste processed by double phthalocyanine compounds or conductive polymers; and biosensors are carbon biocomposite or graphene electrodes containing enzymes and different metal catalysts. Although different types of electronic tongue systems collect the sensor response in different ways, their features extraction and pattern recognition methods are similar. In this paper, to verify the validity of the features extraction and classification methods, the widely used voltammetry electronic tongue system is adopted to collect the sensor response.

As shown in the structure in Figure 3, we designed a voltammetry electronic tongue hardware system composed of a controller, a multi-frequency pulse voltage generation circuit, a constant potential circuit, a three-electrode module, a micro-current detection circuit and a low-pass filter. The three-electrode module consists of working electrodes (WE), reference electrodes (RE) and counter electrodes (CE), and is the core part of the electronic tongue hardware system. Different electrode materials have different response characteristics; the details of the electrode sensor array used in the proposed model are described in Table 1. Figure 4 shows the physical diagram of the electronic tongue hardware system we designed.


Figure 3. The structure of the designed voltammetry electronic tongue hardware system.

Table 1. The details of the electrode sensor array.

No. Electrode Material Role
1 WE gold, silver, palladium Produces a chemical reaction on the electrode surface.
2 CE platinum Makes measurement results more stable and reliable.
3 RE silver/silver chloride Provides a reference for the working electrode potential.


Figure 4. The designed voltammetry electronic tongue hardware system.

The micro-control unit of the electronic tongue hardware system is the STM32 controller, whose functions include generating scan signals, micro-current detection, noise filtering, signal sampling and signal storage. Specifically, the controller drives the external circuit through DA conversion to generate different voltage waveforms, and the analog voltage waveforms are added to the RE through the constant potential circuit under the feedback of the CE. Then, driven by the potential of the RE, a Faraday current is generated between the WE and the CE, and the micro-current detecting circuit connected to the WE converts the Faraday current into a voltage signal. Next, a filter circuit eliminates clutter in the converted signal. Finally, the amplified voltage signal is converted to a digital signal by the AD module, and the response signals from the different WE are stored.

The two images in Figure 5 represent typical sensor data for different kinds of tea and different electrodes, respectively. Figure 5a shows the sampled response of five kinds of tea from the silver WE, and Figure 5b displays the response of Tieguanyin tea from three different WE (gold, silver, palladium). It should be noted that for the same concentration of tea from the same brew, the three different WE work sequentially. In detail, the gold electrode is applied to collect samples first, the silver electrode then replaces it to get samples, and finally the palladium electrode is used for data sampling. All the sampling signals are stored on the SD card.


Response value Response /mv Show bud tea 100 Electrode Gold 100 Tieguanyin tea value /mvResponse Electrode Palladium tea Westlake tea 50 Electrode Silver 50 0 100 200 300 400 500 0 50 100 150 200Show250 bud tea300 350 400 450 500 MeasurementElectrode pointPalladium MeasurementBiluochun teapoint 50 50 0 100 200 300 400 500 0 50 100 150 200 (a250) 300 350 400 450 500 (b) Measurement point Measurement point Figure 5. (a) The response(a) of five kinds of tea upon silver working electrodes(b ()WE ); (b) the response of Tieguanyin tea upon different WE (Gold, Silver, Palladium). Figure 5.5. ((aa)) TheThe responseresponse ofof fivefive kinds kinds of of tea tea upon upon silver silver working working electrodes electrodes (WE); (WE ()b; )(b the) the response response of Tieguanyinof Tieguanyin tea tea upon upon diff differenterent WE WE (Gold, (Gold, Silver, Silver, Palladium). Palladium). 2.2. Experiment Settings 2.2. Experiment SettingsSettings 2.2.1. Electronic Tongue Settings 2.2.1. Electronic Tongue Settings 2.2.1. ElectronicAs mentioned Tongue above, Settings the e -tongue system consists of three working electrodes, one reference electrodeAs mentionedmentioned and one above, counter the electrode e-tonguee-tongue was system adopted consists for experiment.of threethree workingworking Electrodes electrodes,electrodes, have oneone been referencereference polished electrodeelectrodewith cloth and and and one one ground counter counter powder electrode electrode before was was adoptedfirst adopted measurement. for for experiment. experiment. 
Electrodes have been polished with cloth and ground powder before the first measurement. In the middle of any two measurement processes, the working electrode is placed in distilled water, cleaned with an electrochemical cleaning method for 1 min and dried with cloth. The counter electrode and the reference electrode are washed with distilled water and dried with filter paper.

A multi-frequency large pulse voltammetry (MLAPV) [34] method was utilized to generate multi-frequency and multi-scale scanning signals. Specifically, we make use of three frequency scanning signals, namely 1, 2 and 2.5 Hz. For each frequency, ten pulses are generated at the voltages of 1.0, 0.8, 0.6, 0.4, 0.2, 0, −0.2, −0.4, −0.6, −0.8, −1.0 V. To avoid the interference between signals of different frequencies in the reaction process, we stop the scanning signal for five seconds after the signal scanning of each frequency is finished. In the sampling step, we set the sampling rate at 1 kHz to get more detailed information. Thus, we collect 10,000 points during the 1 Hz scanning frequency, 5000 points during the 2 Hz scanning frequency and 4000 points during the 2.5 Hz scanning frequency.

Figure 6 illustrates the scanning voltage and the corresponding typical response. Figure 6a displays the scanning voltage of different frequency, and Figure 6b shows the typical response. In Figure 6a, the red, green and blue histograms represent the scanning signals of 1, 2 and 2.5 Hz, respectively. The response of each scanning signal is depicted in Figure 6b by a line of the same color as the scanning signal. The black horizontal line in Figure 6 represents the stop of the scanning signal for five seconds.
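For illustration, the scanning waveform described above can be sketched in a few lines of numpy. The pulse shape (each voltage level held for half a period, then returned to 0 V for the other half) and the use of all the listed levels are our assumptions; this is an illustrative sketch, not a reproduction of the acquisition code.

```python
import numpy as np

def mlapv_waveform(levels, freq_hz, fs=1000):
    """Hypothetical MLAPV pulse train: each voltage level is held for
    half a period, then the signal returns to 0 V for the other half."""
    half = int(fs / (2 * freq_hz))           # samples per half period
    segments = []
    for v in levels:
        segments.append(np.full(half, v))    # step to the pulse voltage
        segments.append(np.zeros(half))      # relax back to 0 V
    return np.concatenate(segments)

# The voltage levels listed above, swept at the slowest (1 Hz) frequency,
# sampled at the e-tongue's 1 kHz rate.
levels = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0, -0.2, -0.4, -0.6, -0.8, -1.0]
wave_1hz = mlapv_waveform(levels, 1.0)
```

At 2 and 2.5 Hz the same function produces proportionally shorter records, matching the pattern of the point counts reported above.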

Figure 6. (a) The scanning voltage of different frequency; (b) typical response of different scanning frequency.

2.2.2. Sample Preparation

Five kinds of tea picked from different provinces were applied in the study. They are Yuzhu tea (Chongqing), Show bud tea (Chongqing), Tieguanyin tea (), Biluochun tea () and Westlake tea (Hangzhou), respectively. It is worth mentioning that Biluochun tea, Westlake tea and Show bud are all . These five teas all contain nutrients such as tea polyphenols, catechins, chlorophyll, caffeine, amino acids and vitamins. In comparison, Yuzhu tea contains more vitamins, and Tieguanyin tea has higher levels of tea polyphenols, catechins, and various amino acids. In terms of flavors, these teas are fragrant, refreshing and a little bitter.

For each kind of tea leaf, we selected 1 g with a high-precision electronic scale and brewed it with 100 mL boiling water for 10 min. Then, the tea leaves were filtered through a sieve to obtain the original tea liquid. What's more, we brewed 34 times for the same type of tea, and each type of tea was sampled 10 times. Since electrode sampling is related to many factors, there are still slight differences among the ten groups, which can be understood as the diversity of samples. It should be emphasized that the time span of these 34 brews reached nearly three months. During the three months, we did not deliberately seal and refrigerate the tea; that is to say, changes occurred in the tea during these months (such as oxidation), which guarantees the diversity of the samples.

In the experiment, we obtained three kinds of tea liquid (including 100%, 50% and 25% concentrations) by mixing the original tea liquid with water. In order to ensure the reliability of the experiment, we collected 340 samples for each concentration of tea liquid and the sum of samples for each working electrode is 5100 (five kinds of tea leaves × three concentrations per tea × 340 samples per concentration). As mentioned above, we used three kinds of working electrodes (gold, silver and palladium). In other words, we got 5100 × 3 samples in total. To speed up the convergence of the classifier, a normalization between (0, 1) was implemented for each sample under the same working electrode.
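The paper states only that responses are normalized between (0, 1) per sample under the same electrode; the usual min-max scaling, which we assume here, can be sketched as:

```python
import numpy as np

def minmax_normalize(x):
    """Scale one sensor response into [0, 1]. Applying it per sample under
    the same working electrode follows the text; the exact formula is our
    assumption, since the paper does not give it."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

response = np.array([200.0, 220.0, 260.0])   # toy voltammetric readings, mV
normalized = minmax_normalize(response)       # [0.0, 1/3, 1.0]
```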

2.2.3. Software Platform

The features extraction and classification experiments of CNN-AFE are carried out on the same server with 32 GB RAM, an NVIDIA GeForce GTX Titan GPU (NVIDIA, Santa Clara, CA, USA) and a Linux 64-bit operating system (Canonical, Isle of Man, UK). The model is built with the PyTorch framework (Facebook, Menlo Park, CA, USA), and the programming language is Python.

2.3. Features Extraction and Classification Methods

In the electronic tongue system, since each sensor response consists of a large number of voltage measurements, features extraction for each response is particularly challenging. In addition, the validity of the extracted features further affects the accuracy of pattern recognition. It is generally believed that the features hidden in a time series include characteristics of both the time domain and the transformation domain. Time domain features are relatively intuitive, such as the size of the response signal and the location of the mathematical feature points. Features in the transform domain are relatively complex, such as features from the frequency domain or from matrix decomposition. It should be noted that most of the traditional methods adopted manual features extraction, which functions similarly to a filter. Manual extraction methods are difficult to adapt to various scenarios and lack stability. To overcome this bottleneck in the field of e-tongue, we proposed a novel features extraction method, which learns features through deep learning models automatically. The key idea behind our method is to transform the time series into a time-frequency map by an appropriate strategy so as to make full use of the advantages of CNN in image features extraction and pattern recognition.

The structure of the proposed features extraction method is shown in Figure 7 and the algorithm consists of two steps. The first key step is to represent complex time series with features images by an appropriate strategy. The commonly used time-frequency representations of non-stationary signals are the Wigner-Ville distribution (WVD) [35], the short-time Fourier transform (STFT) [36], and the Gabor transform [37]. However, considering the transient abruptness of the response signal of the e-tongue system, STFT was adopted to extract the time-frequency characteristics. The second step is to adopt CNN to learn the details hidden in the time-frequency maps and perform pattern recognition automatically. We innovatively combine STFT with CNN to build an automatic features extraction architecture. In other words, the proposed method based on CNN can be considered as a combination of a variety of representative or unknown but effective features extraction methods. Another point we want to emphasize is that the model unifies the features extraction and classification steps under a single architecture, whose purpose is to improve classification accuracy and the robustness of the algorithm.


Figure 7. The structure of proposed features extraction method. Figure 7. The structure of proposed features extraction method. 2.3.1. Short-Time Fourier Transform (STFT) 2.3.1. Short-Time Fourier Transform (STFT) STFT was proposed based on Fourier transform, which has been widely adopted to the analysis signalsSTFT jointly was in proposed time and based frequency on Fourier [36 transform,]. As shown which in has Figure been7 widely, in order adopted to compromise to the analysis the computationalsignals jointly complexity in time and and frequency consistency [36]. betweenAs shown adjacent in Figure window 7, in order signals, to compromise we choose partial the overlapcomputational between complexityconsecutive and windows. consistency The sliding between window adjacent length window is Ws signwhileals, the we movement choose partial distance overlap between consecutive windows. The sliding window length is Ws while the movement of each window is L, and L is equal to Ws/2 typically. distance of each window is L, and L is equal to Ws/2 typically. In detail, given sample s, s is multiplied by a window function which is non-zero only for a short In detail, given sample 풔, 풔 is multiplied by a window function which is non-zero only for a time. When the window slides along the time axis, Fourier Transform of the sample is obtained, short time. When the window slides along the time axis, Fourier Transform of the sample is resulting in a two-dimensional representation of the signal. In mathematics, this is written as: obtained, resulting in a two-dimensional representation of the signal. In mathematics, this is written as: X ∞ jwn STFT(m, n) = X(m, n) = ∞ s(n)w(n m)e− (1) − ( ) ( ) n= ( ) ( ) −푗푤푛 푆푇퐹푇 푚, 푛 = 푿 푚, 푛 = ∑−∞풔 푛 풘 푛 − 푚 푒 (1) 푛=−∞ where, s(n) is the sample and w(n) is the window function. Thus, the one-dimensional sample s is 풔(푛) 풘(푛) 풔 transformedwhere, into is the a two-dimensional sample and signal is theX windowcontaining function the time. 
Thus, and the frequency one-dimensional characteristics. sample Since is transformed into a two-dimensional signal 푿 containing the time and frequency characteristics. X is a two-dimensional complex number, we convert X into a two-dimensional features image using Since 푿 is a two-dimensional complex number, we convert 푿 into a two-dimensional features the square of the magnitude. image using the square of the magnitude.

 2 푠푝푒푐푡푟표푔푟푎푚spectrogram {X푿((m푚,,n푛))}== |X푿(m푚,, 푛n)| (2) (2) InIn this this study, study, di ffderentifferent kinds kinds of window of window function function and and window window sizes sizes have have been been applied applied to extract to time-frequencyextract time-frequency features. features Detailed. Detailed parameter parameter settings settings and and corresponding corresponding experimental experimental results results are are presented in Section 3. presented in Section3.
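Equations (1) and (2) can be sketched in a few lines of numpy. The Hann window and Ws = 256 follow the best settings reported in Section 3.1.1; the toy input signal is ours.

```python
import numpy as np

def stft_spectrogram(s, ws=256):
    """Spectrogram per Equations (1)-(2): slide a Hann window of length
    ws by L = ws // 2, FFT each frame, and square the magnitude."""
    window = np.hanning(ws)                  # Hann window (best in Sect. 3.1.1)
    hop = ws // 2                            # L = Ws / 2, as in Figure 7
    frames = [s[i:i + ws] * window
              for i in range(0, len(s) - ws + 1, hop)]
    X = np.fft.rfft(frames, axis=1)          # one-sided FFT per frame
    return np.abs(X) ** 2                    # spectrogram{X} = |X(m, n)|^2

# Toy input: 1 s of a 50 Hz tone at the e-tongue's 1 kHz sampling rate
# (the tone is illustrative, not a real sensor response).
fs = 1000
t = np.arange(fs) / fs
spec = stft_spectrogram(np.sin(2 * np.pi * 50 * t))  # shape (6, 129)
```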

2.3.2. Convolutional Neural Network (CNN)

CNN is a type of deep feedforward artificial neural network widely used to analyze visual images. Compared with the traditional artificial neural network, the neurons of the CNN and the neurons of the next layer are not fully connected, but locally connected. Furthermore, the parameters of the convolution kernel are weight shared. In general, CNN has the following advantages in image processing: (1) the shared convolution kernel gives fast speed when dealing with high-dimensional data; (2) model training replaces manual features extraction and improves stability; (3) excellent classification effect.

The structure of CNN varies with different application scenarios, but several modules are similar across models. For example, when dealing with classification problems, the typical models are AlexNet [27], GoogLeNet [38] and ResNet [39], and the three models all include a features extraction layer and a features mapping classification layer. The features extraction layer includes a plurality of convolution layers, activation layers, and pooling layers. The purpose of the convolution operation is to extract features; activation layers enhance the nonlinearity of both the decision function and the whole neural network, while the pooling layer continuously reduces the size of the data space, which can control over-fitting. In the features mapping classification layer, the loss function is applied to punish the difference between predicted and actual results. In addition, in order to improve the generalization ability of the network and prevent over-fitting, we apply the dropout layer in the structure and add regularization constraints to the loss function.

As illustrated in Figure 7, a multi-layer filter structure is utilized in the proposed method. The parameters of each filter are determined in the training process for features extraction. To summarize, three convolutional layers of different sizes, the Rectified Linear Unit (ReLU) activation function, and a maximizing pooling layer are applied in the proposed model. Moreover, the fully connected layer is utilized for features mapping and classification. Detailed parameter settings of CNN and corresponding experimental results are discussed in Section 3.
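The convolution, activation and pooling operations described above can be illustrated with a minimal single-channel forward pass in numpy. This sketches the operations only, not the trained model; the toy feature map and averaging kernel are ours.

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D convolution (cross-correlation, as CNNs implement it)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified Linear Unit activation."""
    return np.maximum(x, 0)

def maxpool2d(x, k=2):
    """Non-overlapping k x k maximum pooling."""
    h, w = x.shape
    x = x[:h - h % k, :w - w % k]            # drop any ragged edge
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

# One feature-extraction stage on a toy single-channel time-frequency map.
fmap = np.random.default_rng(0).random((129, 77))
feat = maxpool2d(relu(conv2d(fmap, np.ones((3, 3)) / 9)))  # shape (63, 37)
```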

3. Results and Discussion

To evaluate the effect and robustness of the proposed model, in Section 3.1, we discuss three parts: time-frequency features extraction, network structure, and network parameter optimization. In Section 3.2, we compare the proposed method with the best methods we know so far. Specifically, in Section 3.1, we discuss window functions of different types and sizes when applying STFT to convert samples to time-frequency features. In addition, we discuss the structure of CNN. Then, we optimize the network parameters containing regularization, batch size, and loss function types in the network. All experiment results in this section are based on a tea database collected by an e-tongue system we designed. The code of the proposed model is posted at the following link: https://github.com/Shunzhange/auto-feature-extraction-method-for-e-tongue.

3.1. Classification Performance of the Proposed Model

3.1.1. Time-Frequency Features Extraction

To define the appropriate window function and window size, the CNN network parameters should be set to defaults. Specifically, dropout is 0.5; the regularization function is L2; batch size is 64; the loss function is cross entropy loss [40]. In the validation part of time-frequency features extraction, different types (Hann window, Hamming window and Gaussian window) [41] and sizes (64, 128, 256, 512) of window functions were adopted. The reason why we choose these window functions is that they are the most widely used, and the choice of window function size is based on the relationship between the scan signal period and the sampling frequency.

Figure 8 shows the average classification accuracy of four window sizes in three window functions respectively, in the condition that experiment iterations are 100 and the evaluation approach is five-fold cross-validation. In Figure 8a, the best average classification accuracy achieved nearly 99.5% when the window function is Hann and the window size is 256. In terms of the Hamming window, the best average classification accuracy of 99.8% is acquired in Figure 8b when the window size is 128. As for the Gaussian window in Figure 8c, the best average classification accuracy is 98% when the window size is set to 256. From the three figures, the blue and green lines are always in the lower position. It can be seen that the optimal window size should be neither too large nor too small. For the blue lines, we conclude that the window size is too small to cover even half of the response of the scanning pulse, which leads to invalid features extraction. On the contrary, when the window size is 512, we infer that large amounts of redundant features were collected and the effective features are difficult to exploit. In addition, large windows may affect the performance of the short-time Fourier transform.


Figure 8. (a) Classification accuracy for Hann window with different window size; (b) classification accuracy for Hamming window with different window size; (c) classification accuracy for Gaussian window with different window size.

3.1.2. Network Structure

In general, in order to extract features comprehensively, the depth of the network structure needs to be sufficient. However, taking the actual application scenario into account, we hold the view that a complex network structure is unnecessary. The reason is that samples are two-dimensional, which indicates that features in the time-frequency map are not as rich as in real pictures. The experimental results confirm our conjecture that applying a deep network structure does not improve performance, but brings more time overhead. Based on the above considerations and experimental verification, we choose a network structure containing three convolutional layers. In detail, for convolutional layer 1, we adopt the kernel size 3 × 3 with 3 input channels and 32 output channels; in terms of convolutional layer 2, we adopt the kernel size 3 × 3 with 32 input channels and 64 output channels; as for convolutional layer 3, we adopt the kernel size 3 × 3 with 64 input channels and 64 output channels; moreover, the activation function and pooling layer function are Rectified Linear Units (ReLU) [42] and the maximizing function respectively; what's more, the fully connected layer is N × 64 (N is determined by the window size) to 64 for features mapping.
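The three-layer structure just described might be written in PyTorch (the framework the authors used) roughly as follows. The pooling placement after every convolution and the final five-way output layer are our assumptions, since the paper does not spell them out; this is a sketch, not the authors' released code.

```python
import torch
import torch.nn as nn

class CNNAFE(nn.Module):
    """Sketch of the three-conv-layer model described above."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3), nn.ReLU(), nn.MaxPool2d(2),   # layer 1
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),  # layer 2
            nn.Conv2d(64, 64, 3), nn.ReLU(), nn.MaxPool2d(2),  # layer 3
            nn.Dropout(0.5),
        )
        # N depends on the window size, so infer the flattened width lazily.
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = CNNAFE()
logits = model(torch.randn(2, 3, 129, 77))   # (batch, channel, freq, time)
```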

3.1.3. Network Parameter Optimization

In order to optimize the network parameters, we have conducted comparative experiments to compare and optimize the parameters of batch size, regularization options and loss function. Based on the experimental results of the time-frequency features extraction section, the window function is fixed to Hann, and the window size is fixed to 256. In this part, iterations are also 100 and the evaluation approach is still five-fold cross-validation.

As shown in Table 2, batch sizes are 16, 32, 64 respectively; loss functions are cross entropy loss (abbreviated as C) [40] and multi-class Hinge loss (abbreviated as H) [43]. In the column where L2 Regularization is located, 'Yes' means 'with regularization' and 'No' means 'without regularization'. The experimental results show that the test accuracy rate of the method with a regularization term is over 99.5% regardless of the batch size and the loss function. It is worth mentioning that the accuracy is 99.92% when the batch size is 32 and the loss function is cross entropy loss. Therefore, we believe that the regularization term is necessary. In terms of loss function, the cross entropy loss performs better than the multi-class Hinge loss as a whole.

After the parameter optimization, the trained model parameters can be saved. When the model is later applied to a new set of data, the model can predict the tea classification with the original tea sample as input.
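The five-fold cross-validation protocol used throughout these experiments can be sketched as an index split (our sketch of the standard procedure, not the authors' code):

```python
import numpy as np

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and split them into disjoint folds; each
    fold serves once as the test set."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test

# For the 5100-sample tea database, every fold holds 1020 test samples.
splits = list(five_fold_indices(5100))
```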


Table 2. Testing accuracy of different parameter selections in CNN.

Batch Size | Loss Function | L2 Regularization | Testing Accuracy
16         | C             | Yes               | 99.91%
16         | C             | No                | 97.15%
16         | H             | Yes               | 99.83%
16         | H             | No                | 97.05%
32         | C             | Yes               | 99.92%
32         | C             | No                | 96.90%
32         | H             | Yes               | 99.70%
32         | H             | No                | 96.50%
64         | C             | Yes               | 99.80%
64         | C             | No                | 97.82%
64         | H             | Yes               | 99.90%
64         | H             | No                | 98.35%

3.2. Comparison with Other Techniques

In this part, we compared the proposed method with other state-of-the-art techniques. In traditional methods, features extraction and classification recognition are usually divided into two separate parts; however, CNN-AFE is an approximate end-to-end model which integrates features extraction and classification recognition into the same architecture. We introduce the comparison methods as follows. As for features extraction, techniques such as features of raw response [15], peak-inflection point [10,16], discrete wavelet transform (DWT) [17], discrete cosine transform (DCT) [18], singular value decomposition (SVD) [19] and DCT fused with SVD (DCT + SVD) [20] were utilized as comparison methods. In terms of pattern recognition, several classifiers such as support vector machine (SVM) [9,12], the k-nearest neighbor (k-NN) [13], and random forest (RF) [14] were adopted to compare with the method we proposed. After parameter optimization, we show the best classification accuracy of the comparison algorithms in Table 3; the experimental results (average accuracy ± standard deviation) are based on five-fold cross-validation. The SVD features extraction method performs relatively better (about 98.8%) and the raw response features extraction performs relatively worse (about 80%).

Table 3. Classification accuracy of other techniques.

Classifier | Raw Response     | Peak-Inflection Point | DWT              | SVD              | DCT              | DCT + SVD
SVM        | 80.87% ± 0.0028  | 95.57% ± 0.0034       | 97.80% ± 0.0265  | 98.83% ± 0.0004  | 94.66% ± 0.0340  | 98.96% ± 0.0127
KNN        | 80.00% ± 0.0320  | 95.65% ± 0.0072       | 97.37% ± 0.0040  | 98.82% ± 0.0012  | 94.82% ± 0.0085  | 98.85% ± 0.0047
RF         | 81.16% ± 0.0376  | 95.87% ± 0.0056       | 94.77% ± 0.0056  | 98.81% ± 0.0008  | 94.77% ± 0.0192  | 98.87% ± 0.0002

It is noteworthy that the test accuracy rate of CNN-AFE with the L2 regularization term is over 99.5% regardless of the batch size and the loss function, and the highest average recognition rate achieved 99.9%, which demonstrates that CNN-AFE provides a more effective solution to the tea classification problem. Further, we believe that the proposed method would make more prominent contributions in the field of liquid classification and identification.

All the features obtained are high-dimensional, whether from the proposed method or the traditional methods. Therefore, we applied principal component analysis (PCA) to compress the features and observe the features distribution image. In detail, as shown in Figure 9, red diamonds, blue tones, blue-green squares, black circles and green plus signs represent Yuzhu tea, Show bud tea, Biluochun tea, Westlake tea, and Tieguanyin tea, respectively. Figure 9a–f show the two-dimensional features distribution for features of raw response, peak-inflection point, discrete wavelet transform (DWT), discrete cosine transform (DCT), singular value decomposition (SVD) and CNN-AFE, respectively. Compared with the other methods in terms of visual effects, in the PCA diagram of the proposed method, different types of tea are more dispersed from each other and the same type of tea is more compactly clustered, which brings the improvement of classification accuracy.
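The two-dimensional PCA projection used for this visualisation can be computed with an SVD of the mean-centred feature matrix. This is a sketch of the standard visualisation step; the toy random matrix stands in for the real high-dimensional feature matrices.

```python
import numpy as np

def pca_2d(X):
    """Project feature vectors onto their first two principal components
    via an SVD of the mean-centred data; returns the 2-D scores and the
    explained-variance ratios of the kind shown on PCA plot axes."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                   # (n_samples, 2) coordinates
    explained = S[:2] ** 2 / np.sum(S ** 2)  # fraction of total variance
    return scores, explained

# Toy stand-in for a high-dimensional feature matrix (100 samples x 10 dims).
X = np.random.default_rng(0).random((100, 10))
scores, explained = pca_2d(X)
```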


Figure 9. Features show for proposed method and compared methods. (a) Two-dimensional (2D) principal component analysis (PCA) features for raw response; (b) 2D PCA features for peak-inflection point; (c) 2D PCA features for discrete wavelet transform (DWT); (d) 2D PCA features for singular value decomposition (SVD); (e) 2D PCA features for discrete cosine transform (DCT); (f) 2D PCA features for proposed convolutional neural network-based auto features extraction (CNN-AFE) method.

4. Conclusions

In this study, a CNN-based auto features extraction strategy in the e-tongue system for tea classification was proposed. The proposed CNN structure integrates automatic features extraction and classification into a unified architecture, which makes the classification effect more robust. 99.9% classification accuracy was obtained based on a tea database collected by an e-tongue system we designed.
In conclusion, compared to these reference techniques, the proposed model in the paper has several advantages: (1) 99.9% classification accuracy; (2) an automatic features extraction method instead of manual features selection; (3) an approximate end-to-end features extraction and classification structure that makes the system more robust and accurate, which will contribute to more extensive applications of the e-tongue in the future.

Author Contributions: Y.h.Z. provided the direction and thought of the algorithm. S.Z. designed the experiments and wrote the manuscript. J.Z. and R.H. were involved in the algorithmic simulation of traditional methods, as well as in the literature review. All authors contributed to the interpretation and discussion of experimental results. Funding: Financial support for this work was provided by National Natural Science Foundation of China [grant number 61501069 and grant number 11873001], the Fundamental Research Funds for the central Universities [Grant number 106112016CDJXZ168815]. Conflicts of Interest: There are no conflicts to declare.

References

1. Zuo, Y.G.; Chen, H.; Deng, Y.W. Simultaneous determination of catechins, caffeine and gallic acids in green, oolong, black and pu-erh teas using HPLC with a photodiode array detector. Talanta 2002, 57, 307–316. [CrossRef]
2. Abozeid, A.; Liu, J.; Liu, Y.; Wang, H.; Tang, Z. Gas chromatography mass spectrometry-based metabolite profiling of two sweet-clover vetches via multivariate data analyses. Bot. Lett. 2017, 164, 385–391. [CrossRef]
3. Herrador, M.A.; Gonzalez, A.G. Pattern recognition procedures for differentiation of Green, Black and Oolong teas according to their metal content from inductively coupled plasma atomic emission spectrometry. Talanta 2001, 53, 1249–1257. [CrossRef]
4. Saha, P.; Ghorai, S.; Tudu, B.; Bandyopadhyay, R.; Bhattacharyya, N. Tea Quality Prediction by Autoregressive Modeling of Electronic Tongue Signals. IEEE Sens. J. 2016, 16, 4470–4477. [CrossRef]
5. Ingleby, P.; Gardner, J.W.; Bartlett, P.N. Effect of micro-electrode geometry on response of thin-film poly(pyrrole) and poly(aniline) chemoresistive sensors. Sens. Actuators B Chem. 1999, 57, 17–27. [CrossRef]
6. Ulmer, H.; Mitrovics, J.; Weimar, U.; Gopel, W. Sensor arrays with only one or several transducer principles? The advantage of hybrid modular systems. Sens. Actuators B Chem. 2000, 65, 79–81. [CrossRef]
7. Vlasov, Y.G.; Legin, A.V.; Rudnitskaya, A.M.; D’Amico, A.; Di Natale, C. “Electronic tongue”—New analytical tool for liquid analysis on the basis of non-specific sensors and methods of pattern recognition. Sens. Actuators B Chem. 2000, 65, 235–236. [CrossRef]
8. Winquist, F.; Holmin, S.; Krantz-Rulcker, C.; Wide, P.; Lundstrom, I. A hybrid electronic tongue. Anal. Chim. Acta 2000, 406, 147–157. [CrossRef]
9. Smyth, H.; Cozzolino, D. Instrumental Methods (Spectroscopy, Electronic Nose, and Tongue) As Tools to Predict Taste and Aroma in Beverages: Advantages and Limitations. Chem. Rev. 2013, 113, 1429–1440. [CrossRef]
10. Ghosh, A.; Bag, A.K.; Sharma, P.; Tudu, B.; Sabhapondit, S.; Baruah, B.D.; Tamuly, P.; Bhattacharyya, N.; Bandyopadhyay, R. Monitoring the Fermentation Process and Detection of Optimum Fermentation Time of Black Tea Using an Electronic Tongue. IEEE Sens. J. 2015, 15, 6255–6262. [CrossRef]
11. Ciosek, P.; Brzozka, Z.; Wroblewski, W.; Martinelli, E.; Di Natale, C.; D’Amico, A. Direct and two-stage data analysis procedures based on PCA, PLS-DA and ANN for ISE-based electronic tongue—Effect of supervised feature extraction. Talanta 2005, 67, 590–596. [CrossRef] [PubMed]
12. Kundu, P.K.; Kundu, M. Classification of tea samples using SVM as machine learning component of E-tongue. In Proceedings of the International Conference on Intelligent Control Power and Instrumentation (ICICPI), Kolkata, India, 21–23 October 2016; pp. 56–60.
13. Lu, L.; Deng, S.; Zhu, Z.; Tian, S. Classification of Rice by Combining Electronic Tongue and Nose. Food Anal. Methods 2015, 8, 1893–1902. [CrossRef]

Appl. Sci. 2019, 9, 2518

14. Liu, M.; Wang, M.; Wang, J.; Li, D. Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar. Sens. Actuators B Chem. 2013, 177, 970–980. [CrossRef]
15. Eriksson, M.; Lindgren, D.; Bjorklund, R.; Winquist, F.; Sundgren, H.; Lundstrom, I. Drinking water monitoring with voltammetric sensors. Procedia Eng. 2011, 25, 1165–1168. [CrossRef]
16. Wei, Z.; Wang, J. The evaluation of sugar content and firmness of non-climacteric pears based on voltammetric electronic tongue. J. Food Eng. 2013, 117, 158–164. [CrossRef]
17. Saha, P.; Ghorai, S.; Tudu, B.; Bandyopadhyay, R.; Bhattacharyya, N. A Novel Technique of Black Tea Quality Prediction Using Electronic Tongue Signals. IEEE Trans. Instrum. Meas. 2014, 63, 2472–2479. [CrossRef]
18. Saha, P.; Ghorai, S.; Tudu, B.; Bandyopadhyay, R.; Bhattacharyya, N. Sliding Window-based DCT Features for Tea Quality Prediction Using Electronic Tongue. In Proceedings of the 2nd International Conference on Perception and Machine Intelligence, Kolkata, India, 26–27 February 2015; Kundu, M.K., Mazumdar, D., Chaudhury, S., Pal, S.K., Mitra, S., Eds.; ACM: New York, NY, USA, 2015.
19. Saha, P.; Ghorai, S.; Tudu, B.; Bandyopadhyay, R.; Bhattacharyya, N. SVD Based Tea Quality Prediction Using Electronic Tongue Signal. In Proceedings of the International Conference on Intelligent Control Power and Instrumentation (ICICPI), Kolkata, India, 21–23 October 2016; pp. 79–83.
20. Saha, P.; Ghorai, S.; Tudu, B.; Bandyopadhyay, R.; Bhattacharyya, N. Feature Fusion for Prediction of Theaflavin and Thearubigin in Tea Using Electronic Tongue. IEEE Trans. Instrum. Meas. 2017, 66, 1703–1710. [CrossRef]
21. Zakaria, N.Z.I.; Masnan, M.J.; Zakaria, A.; Shakaff, A.Y.M. A Bio-Inspired Herbal Tea Flavour Assessment Technique. Sensors 2014, 14, 12233–12255. [CrossRef]
22. Banerjee, M.B.; Roy, R.B.; Tudu, B.; Bandyopadhyay, R.; Bhattacharyya, N. Black tea classification employing feature fusion of E-Nose and E-Tongue responses. J. Food Eng. 2019, 244, 55–63. [CrossRef]
23. Zhi, R.; Zhao, L.; Zhang, D. A Framework for the Multi-Level Fusion of Electronic Nose and Electronic Tongue for Tea Quality Assessment. Sensors 2017, 17. [CrossRef]
24. Banerjee, R.R.; Chattopadhyay, P.; Tudu, B.; Bhattacharyya, N.; Bandyopadhyay, R. Artificial flavor perception of black tea using fusion of electronic nose and tongue response: A Bayesian statistical approach. J. Food Eng. 2014, 142, 87–93. [CrossRef]
25. Banerjee, M.B.; Roy, R.B.; Tudu, B. Cross-Perception Fusion Model of Electronic Nose and Electronic Tongue for Black Tea Classification. In Proceedings of the Computational Intelligence, Communications, and Business Analytics: First International Conference (CICBA), Kolkata, India, 24–25 March 2017; pp. 407–415.
26. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. In Lecture Notes in Computer Science; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer Nature: Cham, Switzerland, 2016; Volume 9908, pp. 630–645.
27. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [CrossRef]
28. Feichtenhofer, C.; Pinz, A.; Zisserman, A. Convolutional Two-Stream Network Fusion for Video Action Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 26 June–1 July 2016; pp. 1933–1941.
29. Herath, S.; Harandi, M.; Porikli, F. Going deeper into action recognition: A survey. Image Vis. Comput. 2017, 60, 4–21. [CrossRef]
30. Legin, A.V.; Vlasov, Y.G.; Rudnitskaya, A.M.; Bychkov, E.A. Cross-sensitivity of chalcogenide glass sensors in solutions of heavy metal ions. Sens. Actuators B Chem. 1996, 34, 456–461. [CrossRef]
31. Barroso De Morais, T.C.; Rodrigues, D.R.; de Carvalho Polari Souto, U.T.; Lemos, S.G. A simple voltammetric electronic tongue for the analysis of coffee adulterations. Food Chem. 2019, 273, 31–38. [CrossRef] [PubMed]
32. Buratti, S.; Ballabio, D.; Benedetti, S.; Cosio, M.S. Prediction of Italian red wine sensorial descriptors from electronic nose, electronic tongue and spectrophotometric measurements by means of Genetic Algorithm regression models. Food Chem. 2007, 100, 211–218. [CrossRef]
33. Wei, Z.; Yang, Y.; Wang, J.; Zhang, W.; Ren, Q. The measurement principles, working parameters and configurations of voltammetric electronic tongues and its applications for foodstuff analysis. J. Food Eng. 2018, 217, 75–92. [CrossRef]
34. Zhong, H.J.; Deng, S.P.; Tian, S.Y. Multifrequency large amplitude pulse voltammetry: A novel electrochemical method for electronic tongue. Sens. Actuators B Chem. 2007, 123, 1049–1056.
35. Pachori, R.B.; Nishad, A. Cross-terms reduction in the Wigner-Ville distribution using tunable-Q wavelet transform. Signal Process. 2016, 120, 288–304. [CrossRef]
36. Auger, F.; Chassande-Mottin, E.; Flandrin, P. On Phase-Magnitude Relationships in the Short-Time Fourier Transform. IEEE Signal Process. Lett. 2012, 19, 267–270. [CrossRef]
37. Sejdic, E.; Djurovic, I.; Jiang, J. Time-frequency feature representation using energy concentration: An overview of recent advances. Digit. Signal Process. 2009, 19, 153–183. [CrossRef]
38. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
39. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
40. De Boer, P.T.; Kroese, D.P.; Mannor, S.; Rubinstein, R.Y. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67. [CrossRef]
41. Oppenheim, A.V.; Schafer, R.W. Discrete-Time Signal Processing, 3rd ed.; Pearson Higher Education: London, UK, 2010; pp. 493–582.
42. Hahnloser, R.; Sarpeshkar, R.; Mahowald, M.A.; Douglas, R.J.; Seung, S. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 2000, 408, 1012–1024. [CrossRef]
43. Dogan, U.; Glasmachers, T.; Igel, C. A Unified View on Multi-class Support Vector Classification. J. Mach. Learn. Res. 2016, 17.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).