Detection and Reconstruction of Random Telegraph Signals Using Machine Learning

Ben Hendrickson, Ralf Widenhorn, Paul R. DeStefano and Erik Bodegom

Abstract — Random Telegraph Signal (RTS) noise is classified as stochastic discrete jumps in what would otherwise be a constant electrical signal. This phenomenon has been known and studied in semiconductor devices for decades. More recently it has been shown that a new flavor of RTS occurs in the photosensitive area of silicon image sensors. Here, a method of RTS signal detection and reconstruction is presented. This method is built on machine learning techniques for classification and denoising of one-dimensional time series. The model is trained on a simulated dataset in order to provide certainty in the fidelity of corresponding 'clean' and 'noisy' signals, and tested on a similar set of signals. In addition, a set of real signals collected from a CCD sensor was used to provide a qualitative description of the model's efficacy.

Keywords—machine learning, random telegraph signal, convolutional neural network, denoising autoencoder, signal reconstruction

I. INTRODUCTION

A. Current state of art for RTS analysis

RTS noise is a well-studied phenomenon, typically the consequence of radiation damage in silicon devices [1-5]. A common manifestation of it comes in the form of discrete changes in the generation rate of leakage current, known in image sensors as dark current [6]. RTS noise in image sensors has been previously analyzed in a number of ways. Usually a lengthy time series is created for pixels of interest by collecting many (a few thousand) frames at regular intervals. This series is then analyzed to identify RTS behavior and extract characteristics of interest. A variety of strategies have been used to analyze RTS pixels, including visual inspection [7], histograms of the time series [8], and signal reconstruction by wavelet denoising [9]. The most popular strategy over the last decade was created by Goiffon et al. [10] and aims to detect and reconstruct an RTS signal via convolutional filtering. There, a step-shaped filter is convolved along a signal to suppress Gaussian noise and produce large spikes where RTS transitions occur. Mean signal values are collected between spike locations to reconstruct the signal without the white noise. In this paper we present a novel approach that takes advantage of recent progress in machine learning techniques.

B. Math of RTS noise and Reconstruction Goals

RTS is defined by stochastic transitions between one of two states, defined here as state 0 and state 1, the low and high states respectively. Represented mathematically, the state s at some given time t is either s(t) = 0 or s(t) = 1. It has been suggested that there may be RTS centers with more than two discrete levels [11], but we will not address them here. An RTS signal collected from image sensor dark frames will have two noise contributors, the RTS transitions and Gaussian white noise. Since these noise sources are assumed independent from one another in our model, their respective variances add together to determine the total noise of the signal such that:

σ_SIG² = σ_Gaussian² + σ_RTS²

The magnitude of the signal at some time t is written as:

x(t) = x₀ + ε(t) + A_RTS · s(t)

where x₀ is the dark current baseline, ε(t) is the Gaussian dark current noise contribution at time t, s(t) is the RTS state at time t, and A_RTS is the RTS amplitude.

In general, a single pixel can have many RTS defect centers. In this case the signal at some time t is described by:

x(t) = x₀ + ε(t) + Σ_{i=1}^{n} A_{RTS,i} · s_i(t)

where n is the number of RTS defect centers located in the pixel. For simplicity, this paper will assume that if a pixel displays RTS behavior, the number of RTS defect centers in that pixel is always equal to one. This is indeed the case for the majority of pixels that have RTS noise, except in cases of very high irradiation levels [12].

C. Machine Learning Classification

The goal in building a classification model is to take a set of data made of many categories and accurately separate it into its different types. This classification model was trained to differentiate RTS signals from non-RTS signals. A signal is represented as a vector and passed through various layers of operators or functions to produce, in this case, a single output (zero for RTS pixels or one for non-RTS pixels). This is similar to the way that image classification is performed, and similar to machine learning classification methods previously used for one-dimensional digital signals [13-17]. A typical convolutional classification model [18] will include convolutional, pooling, dropout, and fully-connected layers; here, each is addressed in turn.

1) Convolutional Layers

Convolutional layers apply filters to extract prominent features that are representative of distinctive characteristics, such as RTS transitions. As the signal is passed forward through the network, each neuron (or filter, or kernel) is convolved with the signal, creating a feature map that is the same size as the input [19]. Finally, an activation function is applied to each filter. This function ensures that each convolution is, in the end, a non-linear operation. The activation function used here is the rectified linear unit (ReLU) function [20], which returns zero for negative inputs, and the input value itself for positives [21]:

f(x)_ReLU = max{0, x}

Convolutional layers that are stacked after the initial layer operate upon the feature maps produced by the previous layers. The shapes of the filters, or weights of the neurons, are continuously changed during the training process by back-propagation, to be discussed later.

2) Pooling Layers

Pooling layers reduce the dimensionality of the vector by down-sampling the feature maps. Pooling layers typically appear directly following a convolutional layer. While there are a variety of pooling techniques, our classification scheme uses "max-pooling." Essentially, max-pooling is a form of compression that inspects a section of a feature map, say elements 7, 8, and 9, finds the largest value amongst the three, and tosses the other two values out. Pooling not only eases the computational stress of training a model by reducing the number of parameters, but also provides spatial invariance of important features [22].

3) Dropout Layers

Dropout layers turn off a percentage of neurons, or filters, during training. This prevents filters from becoming dependent on the presence of neighboring filters to optimize the model. This interdependence leads to overfitting. An overfit model will perform very well on the data it is trained on, but will perform poorly on data in general [23].

4) Fully Connected Layers

The final layer in a classification model is a fully connected layer. Each neuron in this layer, as the name suggests, is connected to every output from the previous layer. This layer forms a vector where each element represents a confidence score corresponding to a distinct class. This model has a final layer of size two, where one neuron represents the confidence of a signal containing RTS noise, and the other the confidence of a signal not containing RTS.

D. Classifier Training

When the model is first initialized for training, the coefficients of each filter, or the shape of each filter, are randomized. Then, one by one, members of the training set are passed through the network and assigned a confidence of RTS versus non-RTS. Because this is supervised training, the confidence score is checked against the given label for the signal, again 0 for RTS and 1 for non-RTS signals. The error of the confidence score is calculated by using the binary cross-entropy loss function, defined below, and improved by updating the filter and activation weights by means of back-propagation [24].

1) Binary Cross Entropy

The loss function used for classification is binary cross entropy, E. Here, t_i is the target label, 0 for RTS, 1 for non-RTS. y_i is the probability of the signal being non-RTS according to the model. Notice that if the target and the probability are close to one another, the error is close to zero [25].

E = − Σ_{i=1}^{n} [ t_i log(y_i) + (1 − t_i) log(1 − y_i) ]

E. Denoising Autoencoder

Once the signal is run through the classification model, and if it is determined to have RTS transitions, the signal has its white noise component suppressed by means of a denoising autoencoder (DAE). The autoencoder shares some features of the classifier, e.g., convolutional layers, pooling layers, etc. Rather than attempting to identify the kind of signal (RTS vs. non-RTS), it takes the 'incorrect' signal as an input and attempts to return the 'correct' one. In this case, the autoencoder takes an RTS signal with Gaussian noise, and returns a signal with suppressed noise.

In practice, a clean RTS signal, x, is created by simulation. Then, Gaussian noise is added on top to produce signal x̃. This signal is then encoded by running it through convolutional and pooling layers to extract pertinent features and compress it. The now encoded signal, or rather feature map, is then decoded by again running it through convolutional layers, but now using up-sampling rather than pooling. The up-sampling returns the signal to its original size by adding elements with value equal to zero. Adding these zeros forces the autoencoder to learn the relationships between non-zero values. Finally, the signal is passed through a fully connected layer that produces a denoised reconstruction of the input signal x̂, as seen in figure 1. Just like with the classifier, the result is measured against the ground truth, or in this case the original clean signal x, by again using a loss function. For the autoencoder the loss function is a simple mean squared error comparison between each element of the clean signal x and the denoised x̂ [26-30].
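The layer mechanics described above can be made concrete with a minimal NumPy sketch (not the trained model): a 'same'-size 1-D convolution, the ReLU activation, and non-overlapping max-pooling. The kernel values here are arbitrary placeholders standing in for learned weights.

```python
import numpy as np

def conv1d_same(signal, kernel):
    """'Same'-size 1-D convolution: the feature map matches the input length."""
    return np.convolve(signal, kernel, mode="same")

def relu(x):
    """Rectified linear unit: zero for negative inputs, identity for positives."""
    return np.maximum(0, x)

def max_pool(x, size=3):
    """Non-overlapping max-pooling: keep only the largest value in each window."""
    trimmed = x[: len(x) // size * size]       # drop any ragged tail
    return trimmed.reshape(-1, size).max(axis=1)

# A toy step edge (an idealized RTS transition) and an edge-sensitive kernel.
signal = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
kernel = np.array([1., 0., -1.])               # np.convolve flips the kernel,
                                               # so this responds at the rising edge
feature_map = relu(conv1d_same(signal, kernel))
pooled = max_pool(feature_map, size=2)
```

The ReLU keeps only the positive edge response; pooling then halves the feature map while preserving the spike that marks the transition.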

Figure 2: Topology of the RTS classification model

Figure 1: An RTS signal before (blue) and after (orange) passing through the AE. A significant increase in signal to noise is obvious.

II. MODEL TOPOLOGY AND ALGORITHM METHODOLOGY

This RTS signal detection and reconstruction scheme was developed in Python and MATLAB using the concepts outlined in the previous section. All machine learning modeling was performed in Python, while the data preparation and reconstruction finalization were performed in MATLAB. This section outlines the specific choices made with respect to model architecture and signal processing to carry out the goal of accurate detection and reconstruction.

A. Classifier Summary

The classification network, shown in figure 2, was developed in Python using Keras [31] as a wrapper over TensorFlow [32]. Its layers are structured as: Conv(32) → Pool(3) → Drop(0.5) → Conv(64) → Pool(3) → Drop(0.5) → Conv(128) → Pool() → Drop(0.5) → Fully Connected(1). The convolutional layers have 32, 64, and 128 filters respectively, with the size of each filter set to 12. Each uses the ReLU activation function. The first two pooling layers take the maximum value, while the last takes an average. The dropout rate is set to 50%. The final layer uses the sigmoid activation function. Training was carried out over five epochs.

B. Autoencoder Summary

The denoising autoencoder model, shown in figure 3, was likewise built in Python using Keras as a wrapper over TensorFlow. Its layers are structured as: Conv(64) → Pool(3) → Conv(32) → Pool(3) → Conv(32) → Upsample(3) → Conv(64) → Upsample(3) → Fully Connected(1500). The convolutional layers have 64, 32, 32, and 64 filters respectively, while the size of each filter is again set to 12. Each uses the rectified linear unit activation function. The final fully connected layer uses a linear activation function. Training was carried out over five epochs.
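The classifier topology above can be sketched in Keras roughly as follows. This is a reconstruction of the stated layer list, not the authors' exact code; the input length of 1500 samples and the size-3 final pooling window are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the stated classifier topology; input length 1500 is an assumption.
model = keras.Sequential([
    layers.Input(shape=(1500, 1)),
    layers.Conv1D(32, 12, padding="same", activation="relu"),
    layers.MaxPooling1D(3),
    layers.Dropout(0.5),
    layers.Conv1D(64, 12, padding="same", activation="relu"),
    layers.MaxPooling1D(3),
    layers.Dropout(0.5),
    layers.Conv1D(128, 12, padding="same", activation="relu"),
    layers.AveragePooling1D(3),        # the last pooling layer takes an average
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),  # confidence: 0 = RTS, 1 = non-RTS
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```

The single sigmoid output pairs naturally with the binary cross-entropy loss used for training.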

Figure 3: Topology of the denoising autoencoder.
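The topology in figure 3 can be sketched along the same lines. Note that Keras's `UpSampling1D` repeats values rather than inserting zeros, so it is only a close stand-in for the zero-insertion up-sampling described earlier; the input length of 1500 and the 'same' padding are likewise assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the stated autoencoder topology, not the authors' exact code.
autoencoder = keras.Sequential([
    layers.Input(shape=(1500, 1)),
    layers.Conv1D(64, 12, padding="same", activation="relu"),
    layers.MaxPooling1D(3),
    layers.Conv1D(32, 12, padding="same", activation="relu"),
    layers.MaxPooling1D(3, padding="same"),
    layers.Conv1D(32, 12, padding="same", activation="relu"),
    layers.UpSampling1D(3),
    layers.Conv1D(64, 12, padding="same", activation="relu"),
    layers.UpSampling1D(3),
    layers.Flatten(),
    layers.Dense(1500, activation="linear"),  # denoised reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")  # mean squared error vs. clean x
```

The final fully connected layer restores the exact 1500-sample length regardless of rounding in the pooling and up-sampling stages.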

C. Training Considerations

One of the more problematic aspects of DC-RTS noise is that there are no well-defined limits on amplitude or state lifetime. If either of these key characteristics is sufficiently small, it is difficult, even to the human eye, to distinguish whether or not a signal has RTS transitions, let alone attempt to reconstruct it without Gaussian noise. It then becomes necessary to create a training set with realistic RTS signals that feature a wide variety of amplitudes and state lifetimes. Simulated signals and noise augmentation have been used previously for training networks related to a variety of applications [33-37]. The training set created here has amplitudes from 1 to 450 arbitrary units (AU), and state lifetimes from 1 to 300 samples, as shown in figure 4. Transitions between RTS states are determined by a decaying exponential probability so that they remain stochastic, but average out to the appropriate state lifetime. Lifetimes for the high and low states were set equal to each other for all RTS signals.

Each signal then has Gaussian noise added, with a standard deviation of 75 (arbitrary units), as shown in figure 5. A new quantity is defined as an approximation of signal to noise, which is simply

SNR_RTS = A_RTS / σ_Gaussian

where A_RTS is the RTS amplitude and σ_Gaussian is the Gaussian noise standard deviation. The range of SNR_RTS for the training dataset spans from 1/75 to 6.

In addition to the RTS data set, a collection of non-RTS signals, Gaussian noise only, was produced. These are, of course, used to train the classifier to separate RTS from non-RTS signals. In total 180,000 signals were created, 90,000 with only Gaussian noise and 90,000 with Gaussian noise and RTS transitions featuring a variety of amplitudes and state lifetimes.

Figure 5: A simulated RTS signal before and after adding Gaussian noise
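A training signal of the kind shown in figure 5 can be generated with a short routine like the one below: exponentially distributed dwell times give stochastic transitions that average out to the chosen state lifetime, and Gaussian noise of σ = 75 AU is added on top. This is a sketch of the described procedure, not the authors' generator; the default parameter values are illustrative.

```python
import numpy as np

def simulate_rts(n_samples=1500, amplitude=300.0, lifetime=50.0,
                 sigma=75.0, baseline=0.0, rng=None):
    """Simulate a two-level RTS signal, returning (clean, noisy) arrays.

    Dwell times in each state are drawn from an exponential distribution
    whose mean is `lifetime` (equal for the high and low states).
    """
    rng = np.random.default_rng() if rng is None else rng
    state = np.empty(n_samples)
    i, s = 0, rng.integers(2)                  # random starting state
    while i < n_samples:
        dwell = max(1, int(rng.exponential(lifetime)))
        state[i:i + dwell] = s
        i += dwell
        s = 1 - s                              # toggle between state 0 and 1
    clean = baseline + amplitude * state
    noisy = clean + rng.normal(0.0, sigma, n_samples)
    return clean, noisy

clean, noisy = simulate_rts(rng=np.random.default_rng(0))
```

With amplitude 300 and σ = 75, this example sits at SNR_RTS = 4, comfortably inside the 1/75 to 6 training range.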

Figure 4: The structure of the training set

Finally, before training the models, the signals must be scaled, so that the shape of the signal, not its magnitude, determines the weights of the filters. It was determined that each signal should lie between zero and one, so each signal x is shifted by a value just below its minimum to create x_s:

x_s = x − s ;  s = 0.99 · min(x)

x_s is then divided by a value just above its maximum to create x_sd:

x_sd = x_s / d ;  d = 1.01 · max(x_s)

Since the model is trained on scaled signals, any real data processed by it must undergo the same scaling. In order to ensure the mean signal values remain unchanged, this scaling must be reversible, so a key is maintained that records s and d for each signal x.

Figure 7: The final reconstruction of an RTS signal. There are a total of 2 values for the entire signal, zero Gaussian noise.
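The scaling described above and its inversion reduce to a few lines; the function names here are illustrative.

```python
import numpy as np

def scale(x):
    """Map signal x into (0, 1), returning the (s, d) key needed to invert."""
    s = 0.99 * x.min()
    xs = x - s
    d = 1.01 * xs.max()
    return xs / d, (s, d)

def unscale(x_sd, key):
    """Recover the original signal from the scaled one using the stored key."""
    s, d = key
    return x_sd * d + s

x = np.array([120., 340., 95., 410.])   # example dark-current samples (AU)
x_sd, key = scale(x)
```

Keeping the (s, d) key per signal is what lets mean signal values, and hence amplitudes, be restored exactly after reconstruction.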

D. Gaussian fit level finding

Recall the total noise of an RTS signal is defined as:

σ_SIG² = σ_Gaussian² + σ_RTS²

which shows the Gaussian noise and RTS noise are uncorrelated to one another. Therefore, a histogram of an RTS signal, before and after the autoencoder denoising, will be composed of the sum of Gaussian peaks, one for each level. The reconstruction of an RTS signal is completed by taking a histogram of the autoencoder result, and fitting it [38] as a sum of two Gaussians, as shown in figure 6. The new clean signal, figure 7, is created by snapping each element of the autoencoder output to whichever peak value from the fitted histogram (see figure 6) is closest to that element. From here the RTS amplitudes and state lifetimes are simply collected.

Figure 6: The fitted histogram of the autoencoder results. The final reconstruction uses the values where peaks occur.

III. RESULTS AND DISCUSSION

A. Simulated Dataset

In order to measure the efficacy of the classification model, a validation test was carried out on two additional sets of simulated signals. This test inputs a sample signal to the model, and records the number of correct and incorrect inferences. Like the training sets, each is composed of 90,000 signals. The set of RTS signals has state lifetimes that span from 1 to 300 frames, and amplitudes such that the SNR_RTS runs from 1/75 to 6. The set of non-RTS signals has various Gaussian noise levels. All signals are scaled as described above before running them through the algorithm.

The algorithm detected 83.5% of the RTS signals, and recorded zero false positives from the non-RTS test set. It works remarkably well on RTS signals that have SNR_RTS > 1.5 and lifetimes longer than about 20 frames, as seen in figure 8.

Figure 8: The RTS detection map. Black areas are where the detection model failed.
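The level-finding and snapping steps can be sketched as follows. For brevity this stand-in estimates the two levels from the two tallest, well-separated histogram peaks rather than from a fitted sum of two Gaussians, which is enough to illustrate the snapping operation.

```python
import numpy as np

def snap_to_levels(denoised, bins=100):
    """Snap each sample to the nearer of two histogram peak locations.

    A simplified stand-in for the two-Gaussian fit: the levels are taken as
    the centers of the two most populated, well-separated histogram bins.
    """
    counts, edges = np.histogram(denoised, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    first = centers[np.argmax(counts)]
    # second peak: the tallest bin far enough away from the first
    far = np.abs(centers - first) > (edges[-1] - edges[0]) / 4
    second = centers[far][np.argmax(counts[far])]
    levels = np.sort([first, second])
    # snap each element to whichever level is closer
    return levels[np.argmin(np.abs(denoised[:, None] - levels), axis=1)]

# Toy "denoised" signal: two levels near 0 and 10 with small residual noise
rng = np.random.default_rng(1)
sig = np.where(rng.random(2000) < 0.5, 0.0, 10.0) + rng.normal(0, 0.5, 2000)
snapped = snap_to_levels(sig)
```

The snapped output contains exactly two values, from which the RTS amplitude (level separation) and state lifetimes (run lengths) follow directly.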

Each signal that passed detection was then scored on the quality of reconstruction by means of the sample correlation coefficient. This is a great advantage of testing on a simulated data set since each reconstruction can be directly compared to the original clean signal. The sample correlation coefficient is calculated as

C_xy = Σ(x_i − x̄)(y_i − ȳ) / ( √Σ(x_i − x̄)² · √Σ(y_i − ȳ)² )

where C_xy is the correlation coefficient or score, x_i is the i-th element of the reconstructed signal, x̄ is the mean of the reconstructed signal, y_i is the i-th element of the original clean signal, and ȳ is the mean of the original clean signal. The coefficient lies between −1 and 1, where −1 is perfectly anticorrelated, 0 is uncorrelated, and 1 is perfectly correlated. In practice negative scores are possible, but exceedingly rare. Nearly all RTS signal detections resulted in a highly accurate reconstruction, as seen in figures 9 and 10. The mean correlation score for detected RTS signals is 0.978 [39].

Figure 10: Distribution of Correlation Coefficients. The table of correlation score counts highlights the quality of reconstruction for a great majority of pixels.

B. RTS Image Sensor Data

In addition to testing on simulated signals, the algorithm was used to detect and reconstruct RTS signals collected from a CCD image sensor, shown in figure 11. The sensor is a SITe SI-033AF front side illuminated 1 mega-pixel (1024×1024) CCD [40]. Frames were taken in dark conditions with a 10 second integration time at 305 K. Qualitatively, the algorithm performed well, but was prone to false positive detections triggered by cosmic ray events, which show up as a large spike of dark current in a single frame. Additionally, low frequency noise, caused by temperature fluctuations, hampered the performance of the algorithm. This was mitigated by subtracting the median value across the sensor from each frame. The RTS amplitudes and state lifetimes are unaffected, so none of the key information is lost.

Figure 9: The Sample Correlation Coefficient map. For the vast majority of signals that passed detection, reconstruction is near perfect.
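The reconstruction score used here is the standard Pearson sample correlation, which in NumPy is a single call; the test vectors below are toy examples.

```python
import numpy as np

def reconstruction_score(reconstructed, clean):
    """Sample correlation coefficient C_xy between reconstruction and truth."""
    return np.corrcoef(reconstructed, clean)[0, 1]

clean = np.array([0., 0., 1., 1., 0., 0., 1., 1.])   # toy ground-truth RTS states
good  = clean.copy()                                  # a perfect reconstruction
```

Because the score is invariant to offset and positive scaling, it grades the recovered shape of the signal, which is exactly what matters after the reversible scaling step.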

Figure 11: Reconstructions of four randomly selected RTS pixels.
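The thermal-drift mitigation described for the sensor data (subtracting each frame's sensor-wide median) is a one-liner per frame. The sketch below assumes a hypothetical `frames` stack of shape (n_frames, height, width).

```python
import numpy as np

def remove_common_mode(frames):
    """Subtract each frame's sensor-wide median to suppress slow thermal drift.

    Per-pixel RTS amplitudes and state lifetimes are unaffected, since the
    same offset is removed from every pixel of a given frame.
    """
    medians = np.median(frames, axis=(1, 2), keepdims=True)
    return frames - medians

# Toy stack: 4 frames of a 3x3 sensor with a drifting common offset
rng = np.random.default_rng(2)
frames = rng.normal(100, 1, (4, 3, 3)) + np.arange(4)[:, None, None] * 5
corrected = remove_common_mode(frames)
```

The median, unlike the mean, is robust to the cosmic-ray spikes that affect isolated pixels in single frames.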

The algorithm performed well on image sensor data. This is remarkable, considering that it was trained completely on simulated data sets. There is no real characteristic shape of an RTS signal; the amplitudes can be essentially any size and the lifetimes are completely unbounded. It was expected, perhaps naively, that this stochastic nature of RTS signals would cause some issues, but none have arisen yet. Given the success of the technique, it is safe to expect similar strategies will begin to replace traditional methods of signal processing at an accelerated rate. Additionally, this direct time series reconstruction may work well as a supplementary technique in situations involving audio classification, like speech recognition and bird identification, where spectrograms are the primary source of feature extraction. Creating realistic simulated representations of time series requires careful thought, but is simple to automate once key characteristics are recognized.

IV. CONCLUSION

A machine learning based algorithm is presented for the reconstruction and analysis of RTS signals. The algorithm uses a convolutional classifier for the identification of RTS transitions, and a convolutional denoising autoencoder to increase the RTS signal to noise. A histogram of the decoded signal values is taken, and fit to the sum of two Gaussians. Finally, the signal is reconstructed by snapping each value of the decoded signal to the nearest peak location.

The algorithm was shown to be very successful quantitatively by running it on a set of simulated RTS and non-RTS signals. It detected over 83% of the RTS signals, and only consistently failed on signals with SNR_RTS < 1. Reconstruction for signals that passed detection is exceptional, reaching an almost perfect correlation coefficient of 0.99 or greater for nearly 50,000 signals.

Qualitatively, the algorithm performed well on a set of data collected by taking dark frames with a CCD image sensor. Additional steps need to be taken to address issues stemming from cosmic rays and thermal fluctuations, but once these are mitigated, near perfect reconstruction of an RTS signal is expected.

V. CITATIONS

[1] P. L. Leonard and S. V. Jaskolski, "An investigation into the origin and nature of "popcorn noise"," Proceedings of the IEEE, vol. 57, no. 10, pp. 1786-1788, Oct. 1969.

[2] J. Bogaerts, B. Dierickx and R. Mertens, "Random telegraph signals in a radiation-hardened CMOS active pixel sensor," IEEE Trans. Nucl. Sci., vol. 49, no. 1, pp. 249-257, Feb. 2002.

[3] W. H. Chard and P. K. Chaudhari, "Characteristics of burst noise," Proc. IEEE, vol. 53, p. 652, 1965.

[4] E. Simoen, B. Dierickx, C. L. Claeys, and G. J. Declerck, "Explaining the amplitude of RTS noise in submicrometer MOSFETs," IEEE Trans. Electron Devices, vol. 39, p. 422, Feb. 1992.

[5] V. Goiffon, P. Magnan, P. Martin-Gonthier, C. Virmontois, and M. Gaillardin, "New source of random telegraph signal in CMOS image sensors," presented at the International Image Sensor Workshop, Hokkaido, Japan, 2011.

[6] K. Ackerson et al., "Characterization of "blinking pixels" in CMOS image sensors," 2008 IEEE/SEMI Advanced Semiconductor Manufacturing Conference, Cambridge, MA, 2008, pp. 255-258.

[7] I. H. Hopkins and G. R. Hopkinson, "Further measurements of random telegraph signals in proton irradiated CCDs," IEEE Transactions on Nuclear Science, vol. 42, no. 6, pp. 2074-2081, Dec. 1995.

[8] D. R. Smith, A. D. Holland, and I. B. Hutchinson, "Random telegraph signals in charge coupled devices," Nucl. Instrum. Meth. A, vol. 530, no. 3, pp. 521–535, Sep. 2004.

[9] B. Hendrickson, R. Widenhorn, M. Blouke, D. Heidtmann, and E. Bodegom, "RTS Noise in CMOS Image Sensors Irradiated with High Energy Photons," submitted for publication.

[10] V. Goiffon, G. R. Hopkinson, P. Magnan, F. Bernard, G. Rolland and O. Saint-Pe, "Multilevel RTS in Proton Irradiated CMOS Image Sensors Manufactured in a Deep Submicron Technology," IEEE Transactions on Nuclear Science, vol. 56, no. 4, pp. 2132-2141, Aug. 2009.

[11] A. M. Chugg, R. Jones, M. J. Moutrie, J. R. Armstrong, D. B. S. King, and N. Moreau, "Single particle dark current spikes induced in CCDs by high energy neutrons," IEEE Trans. Nucl. Sci., vol. 50, no. 6, pp. 2011–2017, Dec. 2003.

[12] E. Martin, T. Nuns, C. Virmontois, J. David and O. Gilard, "Proton and γ-Rays Irradiation-Induced Dark Current Random Telegraph Signal in a 0.18-μm CMOS Image Sensor," IEEE Transactions on Nuclear Science, vol. 60, no. 4, pp. 2503-2510, Aug. 2013.

[13] B. Zhao, S. Xiao, H. Lu and J. Liu, "Waveforms classification based on convolutional neural networks," 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, 2017, pp. 162-165.

[14] T. Ölmez and Z. Dokur, "Classification of heart sounds using an artificial neural network," Pattern Recognition Letters, vol. 24, no. 1–3, pp. 617-629, 2003.

[15] K. Basu, V. Debusschere, A. Douzal-Chouakria, and S. Bacha, "Time series distance-based methods for non-intrusive load monitoring in residential buildings," Energy and Buildings, vol. 96, pp. 109-117, 2015.

[16] Y. Chen, G. Zhang, M. Bai, S. Zu, Z. Guan, and M. Zhang, "Automatic Waveform Classification and Arrival Picking Based on Convolutional Neural Network," Earth and Space Science, vol. 6, 2019.

[17] Z. Xiong, M. K. Stiles, and J. Zhao, "Robust ECG Signal Classification for Detection of Atrial Fibrillation Using a Novel Neural Network," Computing in Cardiology, vol. 44, 2017.

[18] T. Guo, J. Dong, H. Li and Y. Gao, "Simple convolutional neural network on image classification," 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), Beijing, 2017, pp. 721-724.

[19] J. Li, Y. Si, L. Lang, L. Liu, and T. Xu, "A Spatial Pyramid Pooling-Based Deep Convolutional Neural Network for the Classification of Electrocardiogram Beats," Applied Sciences, vol. 8, 1590, 2018.

[20] Z. Hu, Y. Li and Z. Yang, "Improving Convolutional Neural Network Using Pseudo Derivative ReLU," 2018 5th International Conference on Systems and Informatics (ICSAI), Nanjing, 2018, pp. 283-287.

[21] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in ECCV, Lecture Notes in Computer Science, vol. 8689, Springer, 2014, pp. 818–833.

[22] D. Ciresan, U. Meier, J. Masci, L. Gambardella, and J. Schmidhuber, "Flexible, High Performance Convolutional Neural Networks for Image Classification," International Joint Conference on Artificial Intelligence, 2011.

[23] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," CoRR abs/1207.0580, 2012.

[24] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.

[25] P. Sadowski, "Notes on backpropagation," 2016. [Online]. Available: https://www.ics.uci.edu/

[26] X. Wu, G. Jiang, X. Wang, P. Xie and X. Li, "A Multi-Level-Denoising Autoencoder Approach for Wind Turbine Fault Detection," IEEE Access.

[27] L. Gondara, "Medical Image Denoising Using Convolutional Denoising Autoencoders," 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, 2016, pp. 241-246.

[28] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," in Proceedings of the 27th International Conference on Machine Learning, ACM, 2010, pp. 3371–3408.

[29] G. Hinton et al., "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, Nov. 2012.

[30] X. Ye, L. Wang, H. Xing and L. Huang, "Denoising hybrid noises in image with stacked autoencoder," 2015 IEEE International Conference on Information and Automation, Lijiang, 2015, pp. 2720-2724.

[31] F. Chollet et al., Keras, www.keras.io, 2015.

[32] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[33] M. Carrillo et al., "Time series analysis of gravitational wave signals using neural networks," J. Phys.: Conf. Ser., vol. 654, 012001, 2015.

[34] M. Takadoya, M. Notake, M. Kitahara, J. D. Achenbach, Q. C. Guo and M. L. Peterson, "Crack-Depth Determination By A Neural Network With A Synthetic Training Data Set," Review of Progress in Quantitative Nondestructive Evaluation, vol. 12, pp. 803-810, 1993.

[35] N. J. Rodriguez-Fernandez, P. Richaume, Y. H. Kerr, F. Aires, C. Prigent and J. Wigneron, "Global retrieval of soil moisture using neural networks trained with synthetic radiometric data," 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, 2017, pp. 1581-1584.

[36] T. A. Le, A. G. Baydin, R. Zinkov and F. Wood, "Using synthetic data to train neural networks is model-based reasoning," 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 3514-3521.

[37] A. Witmer and B. Bhanu, "HESCNET: A Synthetically Pre-Trained Convolutional Neural Network for Human Embryonic Stem Cell Colony Classification," 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, 2018, pp. 2441-2445.

[38] T.C. O’Haver, Pragmatic Introduction to Signal Processing 2019: Applications in scientific measurement. Kindle Direct Publishing, 2019, p. 340.

[39] N. Bershad and A. Rockmore, "On estimating signal-to-noise ratio using the sample correlation coefficient (Corresp.)," in IEEE Transactions on Information Theory, vol. 20, no. 1, pp. 112-113, January 1974.

[40] SITe SI03xA 24 µm Charge-Coupled Device Family, SITe, Tigard, OR, http://www.not.iac.es/instruments/detectors/CCD1/S103xA_family.pdf, 2003.