arXiv:1901.00447v1 [eess.SP] 2 Jan 2019

Impulsive Noise Detection in OFDM-based Systems: A Deep Learning Perspective

Reza Barazideh†, Solmaz Niknam†, and Balasubramaniam Natarajan†
†Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS, USA.
Email: {rezabarazideh, slmzniknam, bala}@ksu.edu

Abstract: Efficient removal of impulsive noise (IN) from the received signal is essential in many communication applications. In this paper, we propose a two-stage IN mitigation approach for orthogonal frequency-division multiplexing (OFDM)-based communication systems. In the first stage, a deep neural network (DNN) is used to detect the instances of impulsivity. Then, the detected IN is blanked in the suppression stage to alleviate the harmful effects of outliers. Simulation results demonstrate the superior bit error rate (BER) performance of this approach relative to classic approaches, such as blanking and clipping, that use a threshold to detect the IN. We demonstrate the robustness of the DNN-based approach under (i) mismatch between the IN models considered for training and testing, and (ii) a bursty impulsive environment when the receiver is empowered with interleaving techniques.

Index Terms: Impulsive noise (IN), machine learning, deep neural network (DNN), orthogonal frequency-division multiplexing (OFDM).

I. INTRODUCTION

Impulsive noise (IN) can significantly degrade the performance of any communication system. Although orthogonal frequency-division multiplexing (OFDM) is inherently more resistant to IN than single-carrier modulation, IN can still degrade system performance if its power exceeds a certain threshold, as its effect spreads over all subcarriers [1]. Many prior efforts have sought to mitigate the effect of IN. For example, the high amplitude and short duration of IN are the main features exploited by threshold-based nonlinear mitigation approaches. Memoryless threshold-based approaches such as clipping [2], blanking [3], and multiple-threshold blanking/clipping [4] are the most common methods in this category. In [5], a threshold optimization based on the Neyman-Pearson criterion is proposed, and an analytical equation for quasi-optimal blanking and clipping thresholds is provided in [6]. The authors in [7], [8], [9], [10] take advantage of analog-domain processing, where the impulsive and broadband noise are still distinguishable. Note, however, that determining the threshold in the analog domain is not trivial. The performance of all these nonlinear threshold-based approaches is highly sensitive to the selected thresholds and, as shown in [11], degrades dramatically in severe impulsive environments.

Machine learning methods such as deep learning are becoming popular in a growing number of applications in image and signal processing [12], [13], and resource allocation in wireless networks [14], [15]. If appropriate network structures and processing strategies are employed, deep neural networks (DNNs) may be used as powerful tools for the efficient detection of impulsive noise, because of their capability to learn and their ability to account for the uncertainty that is common in most communication applications. In classical outlier detection approaches, by contrast, determining the optimum threshold is the main challenge, as this threshold varies in response to channel conditions and model mismatches. The high peak-to-average power ratio (PAPR) of OFDM signals can also degrade the performance of the classical methods. Moreover, traditional threshold-based methods always involve a compromise between detection and false-alarm probability.

To overcome the aforementioned drawbacks, we propose a machine learning based IN suppression strategy for an OFDM-based communication system. The proposed approach comprises two stages: (i) IN detection and (ii) IN mitigation. In the first stage, a DNN is used to detect the corrupted signal instances. Then, the detected IN samples can be either blanked or clipped in the suppression stage to alleviate the harmful effects of outliers. The proposed DNN-based IN detection can be used in conjunction with any noise removal strategy, as the operation of the proposed DNN is completely independent of the mitigation operator. The proposed DNN consists of multiple layers (input, hidden, output) with nodes in a fully connected structure that maps input data into appropriate outputs. Each node in the hidden layers has a nonlinear activation function that helps to distinguish data that are not linearly separable. Here, the DNN uses the current sample value, the Rank-Ordered Absolute Differences (ROAD) statistic [16], [17], and the median deviations filter output as inputs to determine whether the current sample is corrupted by IN. The bit error rate (BER) of an OFDM-based communication system is used to evaluate the proposed DNN-based IN mitigation approach and to compare it with conventional approaches such as blanking and clipping. The robustness of the proposed approach is highlighted by testing the model with IN models different from the one used for training. In addition, we evaluate the performance of our method in bursty IN when the receiver is equipped with time-domain interleaving. Simulation results show that the DNN-based approach offers gains of up to 2 dB relative to blanking and clipping at a BER of $10^{-3}$.
[Fig. 1: System model block diagram. Transmitter: input → coding & interleaving → QAM modulation → OFDM modulator (IDFT) → channel → additive noise $w_k$ and impulsive noise $i_k$ → $r_k$. Receiver: mitigation module (deep neural network (DNN) detection, blanking nonlinearity) → $\hat{r}_k$ → OFDM de-modulator (DFT) → channel equalization → de-modulation → de-interleaving, detection & decoding → output.]

The remainder of this paper is organized as follows. Section II describes the system and noise models. Section III presents the structure of the proposed DNN and its input features. The proposed algorithm for IN mitigation is detailed in Section IV. The performance of the IN detector is analyzed in Section V, and conclusions are drawn in Section VI.

II. SYSTEM MODEL

Consider the OFDM system shown in Fig. 1. At the transmitter, information bits are channel coded and the encoded bits are interleaved. Subsequently, the interleaved data is modulated and passed through an inverse discrete Fourier transform (IDFT) module to generate OFDM symbols over orthogonal subcarriers. In general, an OFDM symbol can be constructed with $M$ non-data subcarriers and $N - M$ data subcarriers. The non-data subcarriers are either pilots for channel estimation and synchronization, or nulled for spectral shaping and inter-carrier interference (ICI) reduction. Let the nonoverlapping sets of data, pilot, and null subcarriers be defined as $S_D$, $S_P$, and $S_N$, respectively. Then, after digital-to-analog conversion, the transmitted signal envelope in the time domain can be expressed as

$$s(t) = \frac{1}{\sqrt{N}} \sum_{k} S_k \, e^{j\frac{2\pi k t}{T_s}}, \quad 0 < t < T_s, \tag{1}$$

[…]

where $s_k = s(kT_s/N)$ and $n_k = w_k + i_k$ is the mixture of additive white Gaussian noise (AWGN) $w_k$ and IN $i_k$.

Here it is assumed that the noise samples $n_k$ are uncorrelated and that their distribution can be expressed in terms of a multi-component mixture-Gaussian model [11]. Under this model, the probability density function (PDF) of the noise samples $n_k$ is

$$P(n_k) = \sum_{j=0}^{J-1} p_j \, G\left(n_k \mid \sigma_j^2\right), \tag{4}$$

where $G(n_k \mid \sigma^2)$ is the PDF of a zero-mean complex Gaussian variable with variance $\sigma^2$, and $\{\sigma_0, \sigma_1, \ldots, \sigma_{J-1}\}$ and $\{p_0, p_1, \ldots, p_{J-1}\}$ are the model parameters such that $\sum_{j=0}^{J-1} p_j = 1$. The noise model (4) can support two commonly used IN models. The first is the two-component mixture-Gaussian, or Bernoulli-Gaussian (BG), noise model [1] with model parameters

$$J = 2, \quad p_0 = 1 - \epsilon, \quad p_1 = \epsilon, \quad \sigma_0^2 = \sigma_w^2, \quad \sigma_1^2 = \sigma_w^2 + \sigma_i^2. \tag{5}$$

Here $\epsilon$ is the probability of occurrence of impulsive noise, $\sigma_w^2$ is the variance of the AWGN component, and $\sigma_i^2$ is the variance of the IN. The expression in (4) can also be used to […]
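As a minimal sketch of the BG special case (5) of the mixture model (4), the snippet below draws complex noise samples $n_k = w_k + i_k$ together with the 0/1 impulse indicators that a detector would be trained to predict. It assumes circularly symmetric complex Gaussian components (power split equally between the real and imaginary parts) and independent Bernoulli($\epsilon$) impulse arrivals; the function name `bg_noise` is hypothetical.

```python
import math
import random


def bg_noise(num_samples, eps, var_w, var_i, rng=None):
    """Draw complex Bernoulli-Gaussian noise n_k = w_k + i_k, as in model (5).

    With probability 1 - eps a sample is pure AWGN (variance var_w); with
    probability eps it is AWGN plus an impulse (variance var_w + var_i).
    Returns the noise samples and the 0/1 impulse labels.
    """
    rng = rng or random.Random()
    noise, labels = [], []
    for _ in range(num_samples):
        is_impulse = rng.random() < eps              # Bernoulli(eps) arrival
        var = var_w + (var_i if is_impulse else 0.0)
        std = math.sqrt(var / 2.0)                   # split power over I and Q
        noise.append(complex(rng.gauss(0.0, std), rng.gauss(0.0, std)))
        labels.append(int(is_impulse))
    return noise, labels


if __name__ == "__main__":
    n, y = bg_noise(100_000, eps=0.06, var_w=1.0, var_i=100.0,
                    rng=random.Random(0))
    print("impulse rate:", sum(y) / len(y))                       # approximately eps
    print("mean power:", sum(abs(v) ** 2 for v in n) / len(n))   # approximately 1 + 0.06 * 100
```

The average noise power of this model is $\sigma_w^2 + \epsilon\sigma_i^2$, which is what the second printed value approaches.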

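[…] the location parameter; $\beta \in [-1, 1]$ is the skewness parameter, a measure of asymmetry ($\beta = 0$ for the SαS distribution); and $\gamma \in (0, \infty)$ is the scale parameter, a measure of the width of the distribution.

III. DEEP NEURAL NETWORK DESIGN

In order to deal with IN, a DNN is exploited to find the instances of impulsivity. A DNN is a black-box approach that can model any nonlinear system if properly trained. In this section, the structure of the DNN is introduced and then its input features are presented.

A. DNN Structure

As shown in Fig. 2, the considered neural network consists of two hidden layers with $n_1$ and $n_2$ neurons, respectively. Typically, there is no analytical method to choose the number of layers and neurons, so they are determined experimentally by trial and error. Here, $\mathbf{x} = [x_1, x_2, x_3]^T$ represents the input vector consisting of three features (discussed in the next subsection) and $\hat{y}$ denotes the output of the DNN. There is only one node in the output layer, which generates a binary sequence of zeros and ones: the soft outputs of the DNN are rounded to 0 or 1. An output of 1 indicates that the received sample $r_k$ is corrupted by IN, and an output of 0 implies that the $k$th received sample is uncorrupted. According to Fig. 2, the relation between layers can be expressed as

$$\begin{aligned}
\mathbf{A}^{[1]} &= g^{[1]}\left(\mathbf{W}^{[1]}\mathbf{x} + \mathbf{b}^{[1]}\right),\\
\mathbf{A}^{[2]} &= g^{[2]}\left(\mathbf{W}^{[2]}\mathbf{A}^{[1]} + \mathbf{b}^{[2]}\right),\\
\hat{y} &= g^{[3]}\left(\mathbf{W}^{[3]}\mathbf{A}^{[2]} + \mathbf{b}^{[3]}\right),
\end{aligned} \tag{8}$$

where $\mathbf{W}^{[l]}$, $\mathbf{b}^{[l]}$, and $g^{[l]}$ are the parameter matrix, bias vector, and activation function of the $l$th layer, applied to the output of the previous layer. The activation function is nonlinear in general, but can also be designed to retain linearity in the transformation process. In this paper, the rectified linear unit (ReLU) function is used for the hidden layers […]

$$\mathcal{L}(\mathbf{W}, \mathbf{b}) = -\frac{1}{m}\left[\sum_{i=1}^{m} y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{n_l}\sum_{j=1}^{n_{l+1}}\left(W_{ij}^{[l]}\right)^2, \tag{11}$$

where $m$ is the number of training samples, $n_l$ is the number of neurons in layer $l$, and $\lambda$ is the regularization hyperparameter used to prevent over-fitting in the training phase. The DNN aims to determine the weights $\mathbf{W}$ and the bias vector $\mathbf{b}$ that minimize the loss function, i.e.,

$$\min_{\mathbf{W}, \mathbf{b}} \mathcal{L}(\mathbf{W}, \mathbf{b}). \tag{12}$$

The proposed DNN is trained using the back-propagation algorithm along with the Adam optimization algorithm [20]. Adam is an extension of stochastic gradient descent that has recently seen broad adoption in deep learning applications. It computes adaptive learning rates for each parameter $\Theta$ at time instant $k$. According to the Adam algorithm, the update rule for each parameter $\Theta$ in layer $l$ is

$$\Theta_{k+1}^{[l]} = \Theta_{k}^{[l]} - \frac{\eta}{\sqrt{\hat{\upsilon}_k^{[l]}} + \varepsilon}\,\hat{m}_k^{[l]}, \tag{13}$$

where $\eta$ is the learning-rate hyperparameter and

$$m_k^{[l]} = \beta_1 m_{k-1}^{[l]} + (1 - \beta_1)\frac{\partial \mathcal{L}(\Theta)}{\partial \Theta^{[l]}}, \qquad \hat{m}_k^{[l]} = \frac{m_k^{[l]}}{1 - \beta_1^k}, \tag{14}$$

$$\upsilon_k^{[l]} = \beta_2 \upsilon_{k-1}^{[l]} + (1 - \beta_2)\left(\frac{\partial \mathcal{L}(\Theta)}{\partial \Theta^{[l]}}\right)^2, \qquad \hat{\upsilon}_k^{[l]} = \frac{\upsilon_k^{[l]}}{1 - \beta_2^k}, \tag{15}$$

with the proposed default values $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\varepsilon = 10^{-8}$; the initial values $m_0^{[l]}$ and $\upsilon_0^{[l]}$ are chosen randomly.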
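The forward pass (8) and the regularized cross-entropy loss (11) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation: it uses ReLU in both hidden layers as stated above, but the sigmoid output layer is our assumption (a common pairing with a cross-entropy loss; the output activation is not given in this section). The 3-20-10-1 layer sizes follow Section V, and the Glorot/Xavier uniform initialization follows [21]; all function names are hypothetical.

```python
import math
import random


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    """Numerically stable logistic function, applied element-wise."""
    out = []
    for a in v:
        if a >= 0:
            out.append(1.0 / (1.0 + math.exp(-a)))
        else:
            e = math.exp(a)
            out.append(e / (1.0 + e))
    return out


def affine(W, x, b):
    """Compute W x + b, with W stored as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def xavier_layer(n_out, n_in, rng):
    """Glorot/Xavier uniform initialization [21]; biases start at zero."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    W = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def init_params(sizes=(3, 20, 10, 1), seed=0):
    rng = random.Random(seed)
    return [xavier_layer(n_out, n_in, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def forward(params, x):
    """Eq. (8): ReLU hidden layers, sigmoid output (assumed); returns soft y_hat."""
    a = x
    for l, (W, b) in enumerate(params):
        z = affine(W, a, b)
        a = sigmoid(z) if l == len(params) - 1 else relu(z)
    return a[0]


def loss(params, X, Y, lam):
    """Eq. (11): mean binary cross-entropy plus an L2 penalty on all weights."""
    m = len(X)
    tiny = 1e-12  # keeps log() finite if y_hat saturates at 0 or 1
    ce = 0.0
    for x, y in zip(X, Y):
        p = forward(params, x)
        ce += y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny)
    l2 = sum(w * w for W, _ in params for row in W for w in row)
    return -ce / m + lam / (2 * m) * l2
```

A hard decision for sample $k$ is then `round(forward(params, x_k))`, matching the rounding of the soft output described above.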

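The Adam step (13)-(15) for a flat list of scalar parameters can be sketched as follows, with the default $\beta_1$, $\beta_2$, and $\varepsilon$ quoted above. One hedged deviation: the text states that $m_0$ and $\upsilon_0$ are chosen randomly, whereas this sketch uses the zero initialization of the original Adam paper [20], so it should be read as the standard Adam update rather than the authors' training code. The class name and the toy quadratic objective are illustrative.

```python
import math


class Adam:
    """Adam optimizer over a flat list of scalar parameters, eqs. (13)-(15)."""

    def __init__(self, n_params, eta=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.eta, self.beta1, self.beta2, self.eps = eta, beta1, beta2, eps
        # Zero initialization as in [20]; the text above states random init.
        self.m = [0.0] * n_params
        self.v = [0.0] * n_params
        self.k = 0  # time instant k

    def step(self, theta, grad):
        """Return the updated parameters given the gradient dL/dTheta."""
        self.k += 1
        new_theta = []
        for i, (t, g) in enumerate(zip(theta, grad)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g      # (14)
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g  # (15)
            m_hat = self.m[i] / (1 - self.beta1 ** self.k)                 # bias-corrected
            v_hat = self.v[i] / (1 - self.beta2 ** self.k)
            new_theta.append(t - self.eta * m_hat / (math.sqrt(v_hat) + self.eps))  # (13)
        return new_theta


if __name__ == "__main__":
    # Toy objective L(theta) = sum((theta - 3)^2), gradient 2 * (theta - 3).
    opt = Adam(n_params=2, eta=0.1)
    theta = [0.0, 10.0]
    for _ in range(500):
        theta = opt.step(theta, [2 * (t - 3.0) for t in theta])
    print(theta)  # both entries end up near 3.0
```

Because each step is normalized by $\sqrt{\hat{\upsilon}_k}$, the first updates move each parameter by roughly $\eta$ regardless of the gradient scale, which is what makes a single learning rate ($\eta = 0.01$ in Table I) workable across layers.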
B. DNN Input Features

Feature extraction is one of the most important aspects of machine learning because it turns raw data into information suitable for inference. It removes the redundancy present in many types of measured data and thereby facilitates generalization, which is critical to avoiding over-fitting during the learning phase. According to Fig. 2, the input layer has three nodes: (i) the current sample value, (ii) the Rank-Ordered Absolute Differences (ROAD) statistic, and (iii) the median deviations filter output. In the following, we briefly introduce the ROAD and median deviation features.

1) ROAD Value: The ROAD value is an efficient statistic for distinguishing between corrupted and uncorrupted samples, as its value is high for noisy samples and low for uncorrupted samples [17]. The ROAD factor is widely used in image processing for two-dimensional (2D) signals. Here, we compute the ROAD factor for the one-dimensional received signal as follows:

i. The absolute differences between the centre sample $r_k$ and the $2n$ remaining samples of its window are calculated and collected in the $(1 \times 2n)$ vector $\mathbf{d}(k)$:

$$\mathbf{d}(k) = \left| r_k - [r_{k-n}, \ldots, r_{k-1}, r_{k+1}, \ldots, r_{k+n}] \right|. \tag{16}$$

ii. The elements of $\mathbf{d}(k)$ are sorted in increasing order:

$$\mathbf{b}(k) = \mathrm{sort}(\mathbf{d}(k)). \tag{17}$$

iii. The ROAD factor is calculated by summing the first $n$ values of $\mathbf{b}(k)$:

$$\mathrm{ROAD}_k = \sum_{i=1}^{n} b_i(k). \tag{18}$$

2) Median Deviations Filter: The median deviations filter output $e_k$ can be expressed as

$$e_k = r_k - \mathrm{median}\left([r_{k-n}, \ldots, r_k, \ldots, r_{k+n}]\right), \tag{19}$$

where the median filter in (19) is a standard median filter operating on a moving window of $2n + 1$ samples.
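The three DNN inputs can be computed with a sliding window following (16)-(19), as sketched below. The sketch assumes a real-valued sequence `r` (the text does not say how complex baseband samples are handled; one option would be to apply it to magnitudes or to I and Q separately) and simply skips the first and last $n$ samples instead of padding. The function name `dnn_features` is hypothetical; the default $n = 5$ is the value in Table I.

```python
import statistics


def dnn_features(r, n=5):
    """Return (k, [r_k, ROAD_k, e_k]) for every k with a full window.

    ROAD_k, eqs. (16)-(18): sum of the n smallest absolute differences
    between r_k and its 2n neighbours r_{k-n..k-1}, r_{k+1..k+n}.
    e_k, eq. (19): r_k minus the median of the (2n + 1)-sample window.
    """
    feats = []
    for k in range(n, len(r) - n):
        neighbours = list(r[k - n:k]) + list(r[k + 1:k + n + 1])  # 2n samples
        d = [abs(r[k] - v) for v in neighbours]                    # (16)
        b = sorted(d)                                              # (17)
        road = sum(b[:n])                                          # (18)
        e = r[k] - statistics.median(r[k - n:k + n + 1])           # (19)
        feats.append((k, [r[k], road, e]))
    return feats


if __name__ == "__main__":
    r = [0.1, -0.2, 0.15, 0.05, -0.1, 4.0, 0.0, 0.2, -0.05, 0.1, -0.15]
    for k, x in dnn_features(r, n=2):
        print(k, [round(v, 2) for v in x])
```

In this toy run, the isolated impulse at index 5 gets ROAD $= 7.75$ and $e_5 = 3.95$, while every other sample has ROAD below 0.3: summing only the $n$ smallest differences keeps a single impulse in the window from inflating the statistic of its clean neighbours.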
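IV. IMPULSIVE NOISE MITIGATION

After the proposed DNN determines whether a received sample is contaminated with IN, a simple memoryless nonlinear preprocessor such as blanking can be used to alleviate the effect of IN. The output of the blanking nonlinearity can thus be expressed as

$$\hat{r}_k = \begin{cases} r_k, & \hat{y}_k = 0, \\ 0, & \hat{y}_k = 1, \end{cases} \tag{20}$$

where $\hat{y}_k$ is the output of the DNN. Other nonlinear preprocessors proposed in the literature can also be used to suppress the impact of IN; this extension is straightforward and is not the main focus of this paper. After IN mitigation, a discrete Fourier transform (DFT) module transforms the time-domain signal to the frequency domain. The DFT module is followed by frequency-domain equalization, which relies on channel estimation performed on the pilot subcarriers. Viterbi soft decoding is used to decode the demodulated signal, and detection is then performed according to the modulation scheme used.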
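Given the DNN soft outputs, the blanking rule (20) amounts to thresholding each output at 0.5 (i.e., rounding it) and zeroing the flagged samples. The sketch below shows this, plus a hypothetical detector-driven clipping variant that limits a flagged sample to amplitude `T` while keeping its phase, as one example of the other nonlinear preprocessors mentioned above. The function names and `T` are illustrative assumptions, not part of the paper.

```python
import cmath


def blank(r, y_soft):
    """Eq. (20): keep r_k if the rounded DNN output is 0, else replace it by 0."""
    return [0j if y >= 0.5 else complex(rk) for rk, y in zip(r, y_soft)]


def clip_flagged(r, y_soft, T):
    """Hypothetical variant: limit flagged samples to amplitude T, keep phase."""
    out = []
    for rk, y in zip(r, y_soft):
        rk = complex(rk)
        if y >= 0.5 and abs(rk) > T:
            rk = T * cmath.exp(1j * cmath.phase(rk))
        out.append(rk)
    return out


if __name__ == "__main__":
    r = [1 + 1j, 10 - 2j, -0.5j, 3]
    y = [0.1, 0.97, 0.4, 0.8]       # soft DNN outputs
    print(blank(r, y))              # samples 1 and 3 are zeroed
    print(clip_flagged(r, y, T=2.0))
```

Because the detector, not an amplitude threshold, decides which samples to touch, the same flags can drive either operator; only the replacement value changes.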

V. SIMULATION RESULTS

In this section, an OFDM-based communication system with QPSK modulation in the presence of channel fading, channel coding, and IN is studied. The BER is used to compare the proposed DNN-based IN mitigation with conventional approaches such as blanking and clipping. Since the received OFDM signal in the absence of IN can be considered Gaussian, the blanking and clipping thresholds in all scenarios are obtained using the approach provided in [5].

We set $n_1 = 20$ and $n_2 = 10$ as the number of neurons in the first and second hidden layers, respectively. With three input features, and according to Fig. 2, $\mathbf{W}^{[1]}$ is a $(20 \times 3)$ matrix and $\mathbf{b}^{[1]}$ is a $(20 \times 1)$ bias vector connecting the input layer to the first hidden layer. After the activation function $g^{[1]}$, the matrix $\mathbf{W}^{[2]}$ of size $(10 \times 20)$ and the bias vector $\mathbf{b}^{[2]}$ of size $(10 \times 1)$ connect the first hidden layer to the second. Finally, $\mathbf{W}^{[3]}$ is a $(1 \times 10)$ matrix and $b^{[3]}$ is a $(1 \times 1)$ bias connecting the second hidden layer to the output layer. Since standard gradient descent from random initialization performs poorly with DNNs, the initial values of all parameters are chosen using the Xavier initializer [21]. The DNN is trained based on the signal model in (3) and the noise model in (4). Specifically, the training set consists of 1000 OFDM symbols with a range of $E_b/N_0$ and SIR values that span the operating regions of interest. The samples with different $E_b/N_0$ and SIR values in the training set are randomly shuffled to remove any trend that may exist.

For quick reference, the simulation parameters for the considered coded OFDM system in a fading channel are listed in Table I. A total of 1024 subcarriers are used, with 672 data, 256 pilot, and 96 null subcarriers. Channel estimation is based on the pilot subcarriers, which are equally spaced among the 1024 subcarriers. A 10-path fading channel is considered, with path arrival times following a Poisson distribution with mean 1 ms. The path amplitudes are Rayleigh distributed with exponentially decreasing average power.

TABLE I
SIMULATION PARAMETERS

Parameter                              Value
Bandwidth (BW)                         6 kHz
No. of subcarriers (N)                 1024
Symbol duration (T)                    170.7 ms
Modulation scheme                      QPSK
Channel length (L)                     10
Convolutional code rate (CR)           1/2
Code constraint length                 7
Generator polynomial                   [171, 133]
Learning rate (η)                      0.01
Regularization hyperparameter (λ)      0.1
No. of samples (n)                     5
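A quick arithmetic check of the configuration above: the subcarrier allocation sums to $N$, the 256 equally spaced pilots fall on every 4th subcarrier, the symbol duration in Table I equals $N/\mathrm{BW}$ (our assumption is that no cyclic prefix is counted in $T$), and the 3-20-10-1 network has 301 trainable parameters.

```python
N, BW = 1024, 6e3                          # subcarriers, bandwidth in Hz (Table I)
n_data, n_pilot, n_null = 672, 256, 96

assert n_data + n_pilot + n_null == N      # every subcarrier is accounted for
pilot_spacing = N // n_pilot               # equally spaced pilots
T_ms = N / BW * 1e3                        # useful symbol duration, no cyclic prefix (assumed)

sizes = [3, 20, 10, 1]                     # inputs, n1, n2, output
n_params = sum(n_out * n_in + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(f"pilot spacing = {pilot_spacing}")    # 4
print(f"T = {T_ms:.1f} ms")                  # 170.7 ms
print(f"trainable parameters = {n_params}")  # 301
```

The parameter count is $(20 \cdot 3 + 20) + (10 \cdot 20 + 10) + (1 \cdot 10 + 1) = 80 + 210 + 11 = 301$, small enough to train on the 1000-symbol training set described above.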

10-1 10-1

-2 10-2 10 BER BER No Impulsive Noise -3 -3 CLP, = 0.06 10 BG, = 0.04 10 BG, = 0.06 CLP, = 0.1 BG, = 0.1 BLN, = 0.06 BG, = 0.2 BLN, = 0.1 -4 10-4 No Mitigation 10 DNN, = 0.06 DNN, = 0.1

-5 10-5 10 0 1 2 3 4 5 6 7 8 9 10 0 2 4 6 8 10 E /N (dB) E /N (dB) b 0 b 0 (a) BER in BG noise, SIR = 0 dB. (a) BER comparison in BG noise, SIR = 0 dB.

100 100

10-1 10-1

10-2 10-2 BER BER

10-3 10-3 MCA, A = 0.04 CLP, MCA, A = 0.06 MCA, A = 0.06 CLP, MCA, A = 0.1 MCA, A = 0.1 BLN, MCA, A = 0.06 -4 -4 10 MCA, A = 0.2 10 BLN, MCA, A = 0.1 DNN, MCA, A = 0.06 DNN, MCA, A = 0.1

10-5 10-5 -6 -4 -2 0 2 4 -6 -4 -2 0 2 4 SINR (dB) SINR (dB) (b) BER in MCA noise, Γ = 0.2. (b) BER comparison in MCA noise, Γ = 0.2.

Fig. 3: BER performance of DNN for different model of IN. Fig. 4: BER comparison of DNN, BLN, and CLP for different model of IN.

The BER performance of the proposed DNN-based IN mitigation approach under two different test settings, (i) BG noise with SIR = 0 dB and (ii) MCA noise with Γ = 0.2 and J = 10, is shown in Fig. 3a and Fig. 3b, respectively. As expected, the BER performance degrades as the frequency of IN occurrence increases. Fig. 4 compares the BER performance of the DNN with blanking (BLN) and clipping (CLP) for different IN models at various levels of impulsivity. From Fig. 4, it is evident that the DNN outperforms both blanking and clipping in all scenarios of both the BG and MCA noise models, with gains close to 2 dB at a BER of $10^{-3}$. Fig. 4b shows that at high SINR (signal to impulsive plus thermal noise ratio), blanking and clipping are very vulnerable as the level of peakedness decreases, since it becomes difficult to find a proper threshold to distinguish between desired and contaminated signals. On the other hand, a well-trained DNN can handle the IN detection process even when the signal and IN peakedness is low. Although the performance loss of the DNN with increasing frequency of IN occurrence is noticeable, it still outperforms the other approaches in all scenarios.

Fig. 5 illustrates the robustness of the proposed DNN approach under IN model mismatch. Although the proposed DNN is trained based on the noise model in (4), the DNN-based method is the most robust technique relative to blanking and clipping in the SαS noise model. The performance degradation of blanking and clipping comes from the fact that their threshold calculation is based on a Gaussian mixture assumption for the received signal, which does not hold in this scenario. Fig. 6 investigates the BER performance of the considered DNN-based method in a bursty IN environment when a time-domain interleaver is included in the receiver. In Fig. 6, the parameter Num denotes the number of consecutive samples contaminated by IN. As shown in Fig. 6, the DNN is able to find the IN instances, while the level of burstiness can be alleviated by time-domain interleaving. From Fig. 6, the best performance is achieved when the duration of IN is short.
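To illustrate why time-domain interleaving helps against bursty IN, consider a simple rows-by-columns block interleaver (an assumption; the interleaver used in the simulations is not specified in this section). Samples are written row-wise and read column-wise, so a burst of Num consecutive corrupted samples on the channel maps back, after de-interleaving, to isolated samples spaced `cols` apart. Isolated impulses are the regime in which the DNN detector does best. The sizes and function names below are hypothetical.

```python
def interleave(x, rows, cols):
    """Write row-wise into a rows x cols block, read column-wise."""
    assert len(x) == rows * cols
    return [x[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(y, rows, cols):
    """Inverse of interleave(): write column-wise, read row-wise."""
    assert len(y) == rows * cols
    x = [None] * (rows * cols)
    for i, v in enumerate(y):
        c, r = divmod(i, rows)
        x[r * cols + c] = v
    return x


if __name__ == "__main__":
    rows, cols, burst_len = 8, 16, 6          # hypothetical sizes, Num = 6
    tx = list(range(rows * cols))             # label each sample by its original index
    channel = interleave(tx, rows, cols)      # order in which samples hit the channel
    burst = channel[40:40 + burst_len]        # 6 consecutive channel samples are hit
    print(sorted(burst))                      # [5, 21, 37, 53, 69, 85]
    assert deinterleave(channel, rows, cols) == tx
```

The burst of six adjacent impulses ends up as six single impulses 16 samples apart, so each ROAD window of $2n + 1 = 11$ samples contains at most one of them. A burst longer than `rows` would place two impulses in adjacent positions, which is consistent with the degradation seen for larger Num.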

[Fig. 5: BER comparison of DNN, BLN, and CLP in SαS noise, β = 0, γ = 1, µ = 0; BER vs. SINR (dB) for α = 1.4, 1.6, 1.8.]

[Fig. 6: BER performance of DNN in bursty IN, SIR = 0 dB, $\epsilon$ = 0.06; BER vs. $E_b/N_0$ (dB) for Num = 4, 10, 20, 30, with and without interleaving.]

VI. CONCLUSIONS

In this work, a deep neural network (DNN) is proposed to determine whether a received sample is contaminated with impulsive noise (IN) in an OFDM-based communication system. The Rank-Ordered Absolute Differences (ROAD) statistic, along with the median deviations filter output, is used as an input feature for the DNN. In the next stage, a nonlinear preprocessor such as blanking is used to suppress the effect of IN in the corrupted samples. Simulation results show that the DNN-based approach offers a significant improvement in BER performance in the presence of a strong impulsive component. Moreover, the DNN-based IN mitigation outperforms conventional threshold-based outlier mitigation methods such as blanking and clipping, providing lower BER in IN environments. We also show that the DNN-based approach is robust to IN model mismatch and can effectively deal with bursty IN when the receiver includes time-domain interleaving. As an extension of this work, reinforcement learning could be exploited to accomplish impulsive noise mitigation.

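REFERENCES

[1] M. Ghosh, "Analysis of the effect of impulse noise on multicarrier and single carrier QAM systems," IEEE Trans. Commun., vol. 44, no. 2, pp. 145–147, Feb. 1996.
[2] D.-F. Tseng et al., "Robust clipping for OFDM transmissions over memoryless impulsive noise channels," IEEE Commun. Lett., vol. 16, no. 7, pp. 1110–1113, Jul. 2012.
[3] C.-H. Yih, "Iterative interference cancellation for OFDM signals with blanking nonlinearity in impulsive noise channels," IEEE Signal Process. Lett., vol. 19, no. 3, pp. 147–150, Mar. 2012.
[4] N. Rozic, P. Banelli, D. Begusic, and J. Radic, "Multiple-threshold estimators for impulsive noise suppression in multicarrier communications," IEEE Trans. Signal Process., vol. 66, no. 6, pp. 1619–1633, Mar. 2018.
[5] G. Ndo, P. Siohan, and M. H. Hamon, "Adaptive noise mitigation in impulsive environment: Application to power-line communications," IEEE Trans. Power Del., vol. 25, no. 2, pp. 647–656, Apr. 2010.
[6] H. Oh and H. Nam, "Design and performance analysis of nonlinearity preprocessors in an impulsive noise environment," IEEE Trans. Veh. Technol., vol. 66, no. 1, pp. 364–376, Jan. 2017.
[7] R. Barazideh, B. Natarajan, A. V. Nikitin, and R. L. Davidchack, "Performance of analog nonlinear filtering for impulsive noise mitigation in OFDM-based PLC systems," in IEEE Latin-American Conf. Commun., Nov. 2017, pp. 1–6.
[8] R. Barazideh, A. V. Nikitin, and B. Natarajan, "Practical implementation of adaptive analog nonlinear filtering for impulsive noise mitigation," in IEEE Int. Conf. on Commun. (ICC), May 2018, pp. 1–7.
[9] R. Barazideh, B. Natarajan, A. V. Nikitin, and S. Niknam, "Performance analysis of analog intermittently nonlinear filter in the presence of impulsive noise," accepted in IEEE Trans. Veh. Technol., available online: arxiv.org/abs/1811.08940, 2018.
[10] R. Barazideh, S. Niknam, B. Natarajan, and A. V. Nikitin, "Intermittently nonlinear impulsive noise mitigation and Doppler shift compensation in UWA-OFDM systems," submitted to IEEE J. Oceanic Eng., 2018.
[11] S. V. Zhidkov, "Analysis and comparison of several simple impulsive noise mitigation schemes for OFDM receivers," IEEE Trans. Commun., vol. 56, no. 1, pp. 5–9, Jan. 2008.
[12] G. Kaliraj and S. Baskar, "An efficient approach for the removal of impulse noise from the corrupted image using neural network based impulse detector," Image Vis. Comput., vol. 28, no. 3, pp. 458–466, 2010.
[13] I. Kauppinen, "Methods for detecting impulsive noise in speech and audio signals," in 14th Int. Conf. Digit. Signal Process., vol. 2, Jul. 2002, pp. 967–970.
[14] R. Amiri, H. Mehrpouyan, L. Fridman, R. K. Mallik, A. Nallanathan, and D. Matolak, "A machine learning approach for power allocation in HetNets considering QoS," in IEEE Int. Conf. on Commun. (ICC), May 2018, pp. 1–7.
[15] R. Amiri, H. Mehrpouyan, D. Matolak, and M. Elkashlan, "Joint power allocation in interference-limited networks via distributed coordinated learning," in IEEE Vehicular Technology Conference, available online: arXiv:1806.02449, 2018.
[16] H. Kong and L. Guan, "A noise-exclusive adaptive filtering framework for removing impulse noise in digital images," IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process., vol. 45, no. 3, pp. 422–428, Mar. 1998.
[17] R. Garnett, T. Huegerich, C. Chui, and W. He, "A universal noise removal algorithm with an impulse detector," IEEE Trans. Image Process., vol. 14, no. 11, pp. 1747–1754, Nov. 2005.
[18] D. Middleton, "Canonical and quasi-canonical probability models of class A interference," IEEE Trans. Electromagn. Compat., vol. EMC-25, pp. 76–106, May 1983.
[19] C. L. Nikias and M. Shao, Signal Processing with Alpha-Stable Distributions and Applications. New York: Chapman-Hall, 1996.
[20] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980v9.
[21] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, vol. 9. PMLR, 13–15 May 2010, pp. 249–256.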