DISCRETE COSINE TRANSFORM BASED CAUSAL CONVOLUTIONAL NEURAL NETWORK FOR DRIFT COMPENSATION IN CHEMICAL SENSORS
Diaa Badawi∗ Agamyrat Agambayev† Sule Ozev† A. Enis C¸etin∗
∗ Department of Electrical & Computer Engineering, University of Illinois at Chicago, Chicago, IL † School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ
ABSTRACT identify the gas analyte identity. Others have employed dif- ferent methods to counteract the sensor drift for the task of Sensor drift is a major problem in chemical sensors that re- analyte identification, such as domain adaptation [5,6], semi- quires addressing for reliable and accurate detection of chem- supervised learning [7], and deep stacked autoencoders [8]. ical analytes. In this paper, we develop a causal convolu- In the aforementioned papers, the features extracted from the tional neural network (CNN) with a Discrete Cosine Trans- sensors become unreliable as the sensor ages. On the other form (DCT) layer to estimate the drift signal. In the DCT hand, other approaches try to estimate the drift directly from module, we apply soft-thresholding nonlinearity in the trans- the time-series measurements of the sensors using iterative form domain to denoise the data and obtain a sparse represen- interpolation techniques [9], or by using Independent Com- tation of the drift signal. The soft-threshold values are learned ponent Analysis [10]. After the drift signal is estimated, it during training. Our results show that DCT layer-based CNNs can be subtracted from the sensor measurements to determine are able to produce a slowly varying baseline drift signal. We the actual sensor response. train the CNN on synthetic data and test it on real chemical The wide success of deep convolutional neural networks sensor data. Our results show that we can have an accurate (CNNs) has been demonstrated by their ability to achieve and smooth drift estimate even when the observed sensor sig- state-of-the-art performance in many time-series recogni- nal is very noisy. tion tasks. CNNs have been used in analyte classification Index Terms— chemical sensor drift, chemical sensor, problems [4, 11]. One particular variant of convolutional time series analysis, discrete cosine transform, convolutional neural networks, the temporal convolutional neural network neural networks (TCNN), has been shown to outperform recurrent neural net- works on many benchmark data sets [12, 13]. This motivates 1 Introduction us to employ a TCNN-based framework to address the prob- Chemical sensors have been used for detecting and identify- lem of drift correction in chemical sensor data. Given the ing chemical analytes in a wide array of industrial and safety fact that sensor drift is a slowly varying signal, we propose applications [1]. Despite the fact that chemical sensor tech- utilizing the Discrete Cosine Transform (DCT) to extract nology provides practical solutions, sensor responses may de- a baseline drift signal from the observed sensor signal. In grade with time, resulting in inconsistent results.This phe- particular, we propose a new type of layer to be integrated nomenon is known as sensor drift. Sensor drift arises be- into the conventional TCNN, which we call DCT-based sub- cause of internal factors such as sensor poisoning and ag- network. In the DCT-based sub-network, we compute the ing, and/or external factors such as humidity and temperature DCT over sliding short-time windows of the temporal feature changes [2]. The chemical sensory system becomes unreli- maps. We then apply the soft-thresholding nonlinearity in the able over time, if the sensor drift signal is not properly esti- transform domain and compute the inverse DCT to transform mated. the features back to time domain. In the transform domain, There has been extensive work to address the drift prob- we use soft-thresholding that is parametrized by threshold lem using signal processing and machine learning. One com- variables, which are learned during training by the stan- mon approach is to extract features carefully from the time dard backpropagation algorithm. Our results show that the series sensor measurements and optimize a machine learning the nonlinear denoising and smoothing in transform domain algorithm to recognize patterns from these features [3, 4]. In using the DCT-based structure removes the high-frequency [3], discriminative features are extracted from the responses features, i.e., induces smoothness in the features and, sub- of the sensors and then fed into support vector machines to sequently, the constructed output baseline drift signal. Any significant deviation from the sensor baseline signal indicates This work is being supported in part by NSF grants 1739396 (UIC) and 1739451 (ASU). The authors, Badawi and Cetin, additionally thank NVIDIA the existence of chemical vapors. ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 978-1-7281-7605-5/20/$31.00 ©2021 DOI: 10.1109/ICASSP39728.2021.9414512 for an equipment grant. The organization of this paper is as follows: First we ex-
978-1-7281-7605-5/21/$31.00 ©2021 IEEE 8012 ICASSP 2021
Authorized licensed use limited to: University of Illinois at Chicago Library. Downloaded on June 02,2021 at 22:23:21 UTC from IEEE Xplore. Restrictions apply. plain the drift problem, TCNN-framework and the DCT-based 1.000 0.995 structure for drift estimation in Section 2. In Section 3, we 0.990
present experimental results. Finally, in Section 4 we con- Response 0.985 clude our work. 0.980 0 200 400 600 800 1000 2 Causal DCT-based Real-Time Sen- Time (mins) sor Drift Estimation Framework Fig. 1: Overall sensor measurement signal (blue) and the un- derlying drift signal (red) The drift response of ChemFET sensors is modeled as a linear is used. The PG method is an iterative method involving se- combination of exponential decays [14–18]. In our ChemFET quential forward and inverse Fourier Transform computations sensors we can approximate the drift waveform using the fol- over blocks of sensor data. It requires some time-domain in- lowing model [19, 20] formation about the drift signal and it is assumed that the sen- t t d(t) = (R0 + R) + Rf exp − + Rs exp − sor is not exposed to gas vapor initially. To interpolate the τf + f τs + s “missing” drift portions where p(t) = 0, the authors also (1) assumed a low-pass bandwidth (BW)6 for the drift signal in where d(0) = R0 + R + Rf + Rs is the sensor conversion the Fourier domain [9]. In contrast, we do not assume any factor at the initial time after deployment and reset, Rf , Rs prior bandwidth information. The neural network automati- are fast and slow drift coefficients, τf , τs are fast and slow cally imposes the sparsity constraint on the drift signal in the drift time constants, and R, f , s are the corresponding er- transform domain during training by learning DCT domain ror terms for each of the model variables. It is not possible soft thresholds. Furthermore, we do not use future samples to know the values of the time constants and other parameters during the DCT computation to estimate the drift signal d(t). in Eq. 1 but this model enables us to generate a random pop- We use only the past and current samples of y(t) to estimate ulation of sensors with individual sensitivities and individual the drift signal d(t). As a result, we compute a real-time esti- time constants that can differ significantly. mate of d(t)1. Let a sensor time measurement be y(t) at time t. Let d(t) be the drift signal at time t. Let p(t) be the desired sensor re- 2.1 TCNN with DCT layer sponse without the drift component (drift-corrected response We developed a Temporal Convolutional Neural Network in absence of noise). The observed sensor measurement is (TCNN) with a DCT layer to extract the drift signal d[n] y(t) = d(t) + p(t) + v(t), where v(t) is the noise signal. from the observed sensor signal y[n] 2. In TCNN, the convo- Given y(t), we are interested in estimating the drift signal lutional layers implement causal convolution. Therefore, in d(t) so that we can estimate p(t). this approach, given a time series y[t] for t = 1, 2, ... n , When a sensor is exposed to a gas vapour, it starts absorb- { } one can determine a drift estimate dˆ[n] and, subsequently, ing the vapour and its response changes accordingly. Once the pˆ[n] = y[n] d[n]. TCNN implements the so-called dilated sensor is no longer exposed to the gas vapour, it starts exuding convolution− in their convolutional layers, which can be ex- the vapour it had absorbed earlier. Absorbing and exuding the K−1 pressed as y [n] := P h[k]x[n rk], where h is the one vapour depends on the sensor material, the analyte type and r k=0 dimensional temporal filter of length−K, and r is the dilation the environment. The ideal sensor response in the absence of rate. drift can be approximated analytically as follows [21] TCNN is made up of successive convolutional blocks. 0 t Ts Furthermore, residual connections are also used between suc- ≤ −1 t−Ts p(t) = βτ tan Ts t Ts + ∆T cessive blocks of convolutional layers. Our TCN design goes τ ≤ ≤ −1 t−Ts −1 t−Ts−∆T as follows: We first apply an ordinary convolutional layer βτ tan τ tan τ t Ts + ∆T − ≥ (2) with filters of length 5. Afterwards, we apply convolutional where Ts is the starting time of the exposure, and ∆T is blocks with dilated convolution and residual connections. the exposure duration. When the sensors are not exposed to Each of the convolutional blocks comprises two dilated con- any chemical analytes, the response will be simply y(t) = volutional layers with a spatial dropout layer in between. d(t)+v(t). Therefore, we can consider estimating the drift as The second layer in each block has residual connections with a baseline (background) estimation problem. An example is the output of the previous block. The dilation rate increases shown in Fig 1 in which the sensor is exposed to Volatile Or- by 2 from r = 2 in the first dilated convolutional block to ganic Compounds (VOC) three times (the blue curve). Since r = 27 = 128 in the last dilated convolutional block. The the drift waveform is decaying, it is not possible to set a number of feature maps is 64 throughout all the layers. The threshold without estimating the drift waveform to detect the spatial dropout rate is set to 0.2. Afterwards, we feed the 64 gas. 1We could have used DFT instead of DCT but we preferred the DCT over In [9], the drift estimation is posed as a missing data in- DFT because the DCT is a real-valued transform. terpolation problem and the Papoulis-Gerchberg (PG) method 2We use the discrete-time notation from now on
8013
Authorized licensed use limited to: University of Illinois at Chicago Library. Downloaded on June 02,2021 at 22:23:21 UTC from IEEE Xplore. Restrictions apply. 1 1 Convolution Output dˆ[n] is applied element-wise to the coefficients Fn,i(k). Each fea- × ture map fi will have an associated thresholding parameter bi.
ˆ ˆ fi[n ] fi[n + 1] This proposed soft-thresholding also induces sparsity in the DCT-regularized ∗ ∗ feature spectral representation. After thresholding we transform back ˆ fi to the time (feature) domain using the inverse discrete cosine IDCT IDCT at n = n at n = n + 1 ∗ ∗ transform (IDCT) and realize a smoothed feature point fˆi[n]...... σ(.) σ(.) ...... Notice that, for each feature map, we only need to evaluate N-point DCT N-point DCT IDCT for one single point at time n, that is the smoothed ver- intermediate feature sion of the original feature value at time n. The DCT based DCT-based block fi TCNN structure is shown in Fig. 2. fi[n N + 1] fi[n ] fi[n + 1] ∗ − ∗ ∗
Input y[n] Temporal CNN 3 Experimental Results In our experiments we used Electronic-nose (E-nose) mea- Fig. 2: Block diagram of the DCT-based NN structure. The surements from the data collected by the JPL [22] using 32 non-linearity represents to the soft-thresholding function. σ(.) different carbon-polymer sensors for air quality monitoring. feature maps of the last layer of TCNN module into the DCT We only considered recordings corresponding to single-gas layer, in which we carry out the transform domain processing (methanol) exposure experiments as in [9]. Some recording as explained in Sec. 2.2. The DCT window size is set to 64. examples are shown in Fig. 3. We also collected our own data Finally, we apply a 1 1 convolution without any nonlinearity using the commercially available MQ-137 ammonia sensor to estimate the drift signal× d[n]. The DCT domain processing which has detection range of 5-500 ppm. Three typical sen- is described in Subsection 2.2 in detail. sor recordings are shown at the bottom three rows of Fig. 3. The whole framework is trained to minimize the following We also generated synthetic data for training and validating cost function our network model. The real E-nose data was then used as our test set. All the data are resized to a length of 512. We N−1 N−1 X 2 X synthesized 10,000 recordings that resemble idealized sensor := d[n] dˆ[n] + λ dˆ[n] dˆ[n 1] , (3) responses for different analyte exposure profiles with slowly J n=0 − n=1 | − − | varying drift to train the CNN. We used Eq. 2 to synthesize where the first term in the cost criterion is the reconstruction sensor responses and we randomized the starting instances square errors. The second term is the Total Variation (TV) of gas vapors, the exposure period, the parameters β and τ, regularization term. and the number of exposure sessions in order to create as di- verse a data set as possible. We synthesized the drift signal 2.2 DCT-based Layers simply by sampling from a Gaussian process with covariance 2 t1−t2 Discrete cosine transform has been widely used in image E(x(t1) µ)(x(t2) µ) = exp where α is cho- − − − α compression and speech coding. In this work, we propose sen to be sufficiently large so that the realizations of the drift incorporating DCT in the TCNN framework to construct signal are slowly varying with time. We normalized both the smooth drift estimates from the observed signal y[n]. In par- synthetic and real data using global mean and standard devia- ticular, let fi[n] be the i-th feature arising from the TCNN at tion. We trained our network using the synthetic training data time instance n. The DCT coefficients over a sliding windows for 80 epochs3. We set the TV parameter λ in Eq. 3 to 0.1. of size N are given by We implemented the Papoulis-Gerchberg (PG) algorithm over the test data set as in [9] to compare it with our approach. We N−1 assumed that there is no gas vapor exposure to the sensor ini- X i πk 1 Fn,i(k) := f [n l]cos (N l + ) , (4) − N − 2 tially and at the end of the recording. We extrapolated the re- l=0 maining part of the drift signal using the PG algorithm. Drift signal estimation results are shown in Fig 3. We estimated where k 0, 1, ..., N 1 is the DCT index, and F n,i the drift (ground-truth signal) manually by identifying some is the transform∈ { of the feature− } segment f [t] for t n i points in which we only have drift signal and perform piece- N + 1, ..., n . After the DCT computation we apply∈ { sign-− preserving soft-thresholding} nonlinearity in the transform wise interpolation (curve-fitting). The proposed TCNN-DCT domain. For this purpose, we use the soft thresholding framework is superior to the PG algorithm in terms of cosine similarity and the MSE as shown in Tables 1 and 2, and Fig. function defined for a scalar input x as SoftTh(x) := 3. In Table 1, we have a range of bandwidth values for the sgn(x)max(0, x b). The threshold b 0 is analogous to the additive bias| | term − in regular time-domain≥ convolutional PG algorithm to get the best possible result and in Table 2 we layers. The parameter b is learned during training by standard 3We used mini batches of size 32. We used Adam Optimizer with a learn- −3 backpropagation. The soft-thresholding function SoftTh(.) ing rate equal to 10 , β1 = 0.9, and β2 = 0.99.
8014
Authorized licensed use limited to: University of Illinois at Chicago Library. Downloaded on June 02,2021 at 22:23:21 UTC from IEEE Xplore. Restrictions apply. Bandwidth used in the PG Algorithm ( 2π ) Metric 4096 TCNN-DCT 5 6 7 8 9 10 11 12 ×13 14 15 16 MSE (avg.) 1.47 0.91 0.79 1.23 1.34 1.16 0.99 0.86 1.03 1.25 1.54 1.70 0.05 MSE (median) 0.28 0.10 0.23 0.22 0.20 0.16 0.16 0.17 0.27 0.33 0.37 0.55 0.02 Cos sim (avg.) 0.81 0.87 0.89 0.77 0.74 0.83 0.87 0.89 0.85 0.82 0.79 0.72 0.97 Cos sim (median) 0.94 0.95 0.92 0.87 0.89 0.92 0.93 0.93 0.88 0.85 0.80 0.74 0.99
Table 1: Comparison of TCNN-DCT framework vs the PG algorithm over the 5,000 examples in the validation data set. We implemented the PG algorithm with different bandwidth values. We report the results for two metrics: MSE (mean square error between the estimate drift and the true drift) and Cos sim (cosine similarity between the estimated drift and the true drift).
Sensor PG Algorithm TCNN-DCT