
Cascade-Net: a New Deep Learning Architecture for OFDM Detection

Qisheng Huang¹, Chunming Zhao¹, Ming Jiang¹, Xiaoming Li¹, Jing Liang²
¹National Mobile Communications Research Lab., Southeast University, Nanjing 210096, China
²Huawei Technologies CO., LTD.
Email: ¹{qshuang, cmzhao, jiang ming, xmli}@seu.edu.cn, [email protected]

arXiv:1812.00023v1 [eess.SP] 30 Nov 2018

Abstract—In this paper, we use deep neural networks for OFDM symbol detection and show their performance advantage in combating large Doppler shift. In particular, we propose a new detection architecture named Cascade-Net. In Cascade-Net, a deep neural network is cascaded with a zero-forcing preprocessor, which prevents the network from getting stuck in a saddle point or a local minimum. In addition, we propose a sliding detection approach for OFDM symbols with a large number of subcarriers. We evaluate the new architecture and the sliding algorithm on a Rayleigh channel with large Doppler spread. Such a channel degrades detection performance in an OFDM system, and the effect is especially severe for high-frequency-band and mmWave communications. Numerical results of OFDM detection in the SISO scenario show that cascade-net outperforms the zero-forcing method while remaining robust against ill-conditioned channels. Numerical simulation also shows that the sliding cascade network (SCN) outperforms the sliding zero-forcing detector.

Index Terms—OFDM Detection; Neural networks; Rayleigh fading channel; Large Doppler spread.

I. INTRODUCTION

Orthogonal frequency division multiplexing (OFDM) is widely applied in modern communication. Because it is implemented with the fast Fourier transform (FFT), this technique supports high data-rate transmission and achieves high spectral efficiency in wireless communications. Recently, its use in the fifth-generation (5G) wireless communication system has been confirmed [1]. However, OFDM is very sensitive to carrier frequency offset, phase noise, timing offset and Doppler spread [2]. Each of these can break the orthogonality between subcarriers and cause inter-carrier interference (ICI). Carrier frequencies for future broadband wireless access keep rising, and modern vehicles keep getting faster. As a result, the Doppler spread of the wireless channel increases strongly, which leads to more severe ICI. To deal with this emerging equalization problem, we propose a deep-learning-based method. It usually performs better than classical methods such as zero-forcing and MMSE detection.

During the past few years, deep learning approaches and algorithms have developed rapidly. Neural network architectures have been used successfully in computer vision and language processing, thanks to their expressive capacity and convenient optimization [3]. In particular, many practical deep learning models have emerged for physical-layer communication. Examples include deep channel estimation [4], deep channel decoding [5] [6], deep MIMO detection [7] and autoencoders [8] [9]. These applications can be roughly divided into two types:
- The first type uses deep unfolding [10] to add trainable parameters to a classical method. Data-driven training then yields an improved algorithm.
- The second type replaces suitable parts of a communication system with modified dense, convolutional or residual neural networks.

In this paper, we mainly focus on the first type and improve the performance of OFDM systems by improving classical detection. Because higher transmission rates are the primary requirement, the neural networks are usually trained off-line and applied directly online on reconfigurable hardware.

The main contribution of this paper is a new architecture for OFDM detection. As mentioned before, Doppler spread can cause severe ICI between subcarriers, especially when information is transmitted at high carrier frequencies. Inspired by deep MIMO detection [7], we use a similar deep unfolding method to build a trainable network from the ML algorithm in order to combat ICI. However, training a deep neural network can easily get stuck in a saddle point or local minimum [11], which leads to poor performance at high signal-to-noise ratio (SNR). In this paper, we propose a cascade structure to handle this issue. In the cascade network, the neural network is cascaded to a zero-forcing preprocessor, and the two are trained and used as a whole. The idea is similar to transfer learning: adding a parameter-fixed network to a new net sharply reduces the difficulty of training a new high-dimensional network to convergence.

For the multi-subcarrier scenario, we propose a sliding structure [12] [13]. It consists of two parts: an output area (OA) and a guarding area (GA). By analysing the ICI that adjacent subcarriers cause to the subcarrier being detected, we derive an empirical formula for the lengths of the GA and OA. This sliding structure preserves the detection performance of our cascade-net without adding much computational complexity.

[Fig. 1: The kth layer flowchart of DNT. Layer k concatenates H^T Y, X̂_k and H^T H X̂_k, applies the weights w_k and bias b_k, and passes the result through the activation with parameter t_k to produce X̂_{k+1}.]

II. SYSTEM MODEL AND NEURAL NETWORK FOR DETECTION

A. SYSTEM MODEL

In this paper we mainly focus on OFDM detection in the single-input single-output (SISO) scenario. For convenience, we use a frequency-domain model to describe the whole system. We assume the cyclic prefix (CP) is long enough to eliminate inter-symbol interference, so each OFDM symbol can be transmitted and detected independently. Consider an OFDM system with N subcarriers, and let $x(n)$ be the transmitted time-domain signal in a time-varying multipath channel. When the transmitted signal passes through the channel $h(n,l)$, the received signal can be represented as [14]

$$y(n) = h(n,l) * x(n) + w(n) = \sum_{l=0}^{L-1} h(n,l)\,x(n-l) + w(n) \tag{1}$$

Here $*$ denotes convolution and L is the number of discrete multipaths. $h(n,l)$ is the time-varying complex gain of the lth path at the nth sample instant, generated from the Jakes model [15] [16]. $w(n)$ is additive white Gaussian noise (AWGN).
Assuming perfect synchronization at the receiver, the demodulated signal on the mth subcarrier in the frequency domain is

$$Y[m] = W[m] + \left(\sum_{l=0}^{L-1} H_l^{0}\, e^{-j2\pi l m/N}\right) X[m] + \sum_{\substack{k=0\\ k\neq m}}^{N-1}\sum_{l=0}^{L-1} X[k]\, H_l^{m-k}\, e^{-j2\pi l k/N} \tag{2}$$

Here $X[k]$ is the signal transmitted on the kth subcarrier in the frequency domain. $H_l^{m-k}$ is the FFT of the time-varying multipath channel tap l, and it also characterizes the ICI between subcarriers:

$$H_l^{m-k} = \frac{1}{N}\sum_{n=0}^{N-1} h(n,l)\, e^{-j2\pi n(m-k)/N} \tag{3}$$

The second term of (2) is the fading coefficient caused by the multipath, excluding interference from other subcarriers. The third term is the ICI component on the mth subcarrier. Let

$$X = [X[1], X[2], \cdots, X[N]]^T,\qquad Y = [Y[1], Y[2], \cdots, Y[N]]^T,\qquad W = [W[1], W[2], \cdots, W[N]]^T \tag{4}$$

and let the element of H in the mth row and kth column be $\sum_{l=0}^{L-1} H_l^{m-k} e^{-j2\pi lk/N}$. The transmission of an OFDM symbol with N subcarriers can then be expressed as

$$Y = HX + W \tag{5}$$

In our SISO scenario, H is the frequency-domain channel matrix. It describes the interference and fading that affect the subcarriers of one OFDM symbol.

B. NEURAL NETWORK FOR DETECTION

In this section, we build our deep detection network (DNT) by adding trainable parameters to a traditional detection algorithm [5]. Inspired by the deep MIMO detector of [7], we also unfold the ML detection algorithm into a trainable network. The goal of our detection can be expressed as

$$\hat{X}_\theta(H,Y) = \arg\min_{X\in\{\text{symbol set}\}^N} \|Y - HX\|^2 \tag{6}$$

where θ denotes the trainable weights and biases. However, X takes discrete values, so the objective is non-differentiable and cannot be optimized directly. We therefore enlarge the value set of X to $\mathbb{C}$ and use hard decisions to obtain the estimate $\hat{X}$ of the transmitted signal X. Our network architecture is obtained by deep unfolding [10]:

$$\hat{X}_0 = 0,\qquad z_k = w_k \begin{bmatrix} H^T Y \\ \hat{X}_k \\ H^T H \hat{X}_k \end{bmatrix} + b_k,\qquad \hat{X}_{k+1} = \varphi_{t_k}(z_k) \tag{7}$$

Here $\hat{X}_k$ is the estimate of X in the kth iteration, and $w_k$, $b_k$ and $t_k$ are trainable parameters. Intuitively, each iteration is a linear combination followed by a nonlinearity. The kth iteration can be seen as the forward propagation from the kth layer to the (k+1)th layer (see Fig. 1). $\varphi_{t_k}$ is a piecewise linear soft-sign activation function taken from [7]:

$$\varphi_{t_k}(x) = -1 + \frac{\rho(x + t_k)}{|t_k|} - \frac{\rho(x - t_k)}{|t_k|} \tag{8}$$

where $\rho(x) = \max(x, 0)$ is the ReLU function. Our network uses the normalized multi-loss function of [7]:

$$\mathrm{loss}\big(X; \hat{X}_\theta(H,Y)\big) = \sum_{k=1}^{L} \log(k)\, \frac{\|X - \hat{X}_k\|^2}{\|X - \tilde{X}\|^2} \tag{9}$$

where L is the total number of layers and $\tilde{X}$ is the zero-forcing result:

$$\tilde{X} = (H^T H)^{-1} H^T Y \tag{10}$$

This loss function [7] uses the zero-forcing detector as the reference for training, and its multi-loss form prevents the network from overfitting. The number of layers equals the number of iterations of the original ML algorithm. DNT therefore uses its trainable parameters to find the best detection algorithm for ML detection within a limited number of iterations. However, DNT initializes $\hat{X}_k$ with the all-zero vector, so its training has great difficulty converging.
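The frequency-domain model (2)–(5) can be checked numerically. The sketch below builds H from the taps $h(n,l)$ using (3) and the element formula given after (4). It then confirms that the FFT of the time-domain output of (1) equals $HX$.

The sketch assumes:
- NumPy's FFT convention, with $x = \mathrm{ifft}(X)$.
- A cyclic convolution in place of the CP.
- Random taps drawn independently per sample, which is only for the check and is not a realistic channel.

Both function names are hypothetical.

```python
import numpy as np


def freq_domain_channel(h):
    """Frequency-domain matrix H of (5) built from time-varying taps h(n, l).

    Element (m, k) is sum_l H_l^{m-k} exp(-j 2 pi l k / N), with H_l^{m-k}
    the normalized DFT over time of tap l, as in (3).
    """
    N, L = h.shape
    F = np.fft.fft(h, axis=0) / N                 # F[d, l] = H_l^{d}, d = 0..N-1
    m = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    d = (m - k) % N                               # (3) is N-periodic in m - k
    phase = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(L)) / N)  # phase[k, l]
    return np.einsum("mkl,kl->mk", F[d], phase)


def received_time_domain(x, h):
    """Noise-free eq. (1) with cyclic indexing (CP already removed)."""
    N, L = h.shape
    n = np.arange(N)
    return sum(h[:, l] * x[(n - l) % N] for l in range(L))


# Consistency check: the FFT of the time-domain output equals H X.
rng = np.random.default_rng(1)
N, L = 16, 3
h = (rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))) / np.sqrt(2 * L)
X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
Y = np.fft.fft(received_time_domain(np.fft.ifft(X), h))
H = freq_domain_channel(h)
print(np.allclose(Y, H @ X))
```

The off-diagonal entries of H are the ICI terms of (2). When the taps do not vary within the symbol, every $H_l^{d}$ with $d \neq 0$ vanishes and H becomes diagonal.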

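The DNT of (7)–(10) can be sketched as a plain forward pass. The sketch covers the soft-sign activation (8), the unfolded layers starting from $\hat{X}_0 = 0$, and the normalized multi-loss (9) with the ZF reference (10).

The sketch assumes:
- A real-valued model, such as the stacked real/imaginary equivalent model used in [7], so that `H.T` is an ordinary transpose as the text writes $H^T$.
- Random, untrained weights, so the printed loss is not meaningful.
- No training loop; the paper trains with TensorFlow and Adam, which is not reproduced here.

The function names are hypothetical.

```python
import numpy as np


def soft_sign(x, t):
    """Soft sign (8) with rho = ReLU: equals x / |t| on [-t, t], clipped to +-1 outside."""
    return -1.0 + (np.maximum(x + t, 0.0) - np.maximum(x - t, 0.0)) / abs(t)


def dnt_forward(H, y, weights, biases, ts, x0=None):
    """Unfolded iterations (7); returns the per-layer estimates [X_1, ..., X_L].

    Assumes a real-valued model, so H.T is an ordinary transpose.
    DNT starts from X_0 = 0 unless x0 is given.
    """
    Hty, HtH = H.T @ y, H.T @ H
    x = np.zeros(H.shape[1]) if x0 is None else x0
    outputs = []
    for w, b, t in zip(weights, biases, ts):
        z = w @ np.concatenate([Hty, x, HtH @ x]) + b   # w_k has shape (n, 3n)
        x = soft_sign(z, t)
        outputs.append(x)
    return outputs


def normalized_multi_loss(x_true, outputs, H, y):
    """Loss (9): sum_k log(k) ||X - X_k||^2 / ||X - X_zf||^2, with X_zf from (10)."""
    x_zf = np.linalg.solve(H.T @ H, H.T @ y)            # (H^T H)^{-1} H^T y
    denom = np.sum((x_true - x_zf) ** 2)
    return sum(np.log(k) * np.sum((x_true - xk) ** 2) / denom
               for k, xk in enumerate(outputs, start=1))


# Example with random (untrained) parameters on a small real-valued problem.
rng = np.random.default_rng(2)
n, L = 6, 5
H = rng.standard_normal((n, n)) + 3 * np.eye(n)
x_true = rng.choice([-1.0, 1.0], n)
y = H @ x_true + 0.1 * rng.standard_normal(n)
weights = [0.1 * rng.standard_normal((n, 3 * n)) for _ in range(L)]
outs = dnt_forward(H, y, weights, [np.zeros(n)] * L, [1.0] * L)
x_hat = np.sign(outs[-1])                               # hard decision
print(normalized_multi_loss(x_true, outs, H, y))
```

Note that the $\log(k)$ weighting gives the first layer zero weight and the later layers increasing weight.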
III. CASCADE-NET AND SLIDING DETECTION ALGORITHM

A. CASCADE-NET

As mentioned before, a DNT initialized with the zero vector may converge slowly. To solve this problem, we propose a cascade structure in which the DNT is cascaded to a ZF data preprocessor (see Fig. 2). The DNT's initial vector is then no longer all-zero; instead, it is the rough estimate provided by the ZF detector. A single DNT detector has to estimate the transmitted signal from scratch. In cascade-net, its task becomes completing the detection based on the result of the ZF detector. In other words, the first part of the cascade-net performs coarse detection and the second part performs fine detection.

This idea is inspired by transfer learning. In transfer learning, learning on a target problem is sped up by using the weights of a network trained on a related source task [17]; the parameter-fixed network completes part of the work for the new learning target. Similarly, every detector in OFDM symbol detection shares the same task: recovering the transmitted signal from the received signal. Cascade-net should therefore converge better. The forward propagation of the cascade-net (CN) can be expressed as:

$$\hat{X}_0 = (H^T H)^{-1} H^T Y,\qquad z_k = w_k \begin{bmatrix} H^T Y \\ \hat{X}_k \\ H^T H \hat{X}_k \end{bmatrix} + b_k,\qquad \hat{X}_{k+1} = \varphi_{t_k}(z_k) \tag{11}$$

where $\hat{X}_0$ is the output of the data preprocessor and the input of the subsequent DNT.

[Fig. 2: The cascade structure. Y and H enter the ZF data preprocessor, whose output X̂_0 is fed, together with H^T Y and H^T H X̂_0, into Layer 1 of DNT (parameters w_1, b_1, t_1).]

In fact, a cascade-net performs no worse than a single zero-forcing detector. To see this, let the OFDM symbol have N subcarriers and set $w_1 = [I_N, I_N, -I_N]$, where $I_N$ is the N×N identity matrix. Since $H^T H \hat{X}_0 = H^T Y$, substituting $\hat{X}_0$ into (11) gives

$$\hat{X}_1 = \varphi_{t_1}\left(w_1 \begin{bmatrix} H^T Y \\ \hat{X}_0 \\ H^T H \hat{X}_0 \end{bmatrix} + b_1\right) = \varphi_{t_1}\big((H^T H)^{-1} H^T Y + b_1\big) \tag{12}$$

Now assume that $w_k$ (k > 1) acts as the identity on $\hat{X}_k$, i.e. $w_k = [0, I_N, 0]$, and that all $b_k$ are zero vectors. The forward propagation of the cascade-net becomes

$$\hat{X}_k = \varphi_{t_k}\Big(\varphi_{t_{k-1}}\big(\cdots \varphi_{t_2}\big(\varphi_{t_1}\big((H^T H)^{-1} H^T Y\big)\big)\big)\Big) \tag{13}$$

Suppose further that $t_k = t_{k-1} = \cdots = t_1 = 1$, so that $\varphi_{t_k}(x) = x$ in the linear region $|x| \le 1$ of the activation. The constellation points are normalized before transmission, so their real and imaginary parts are at most 1 in magnitude. The clipping nonlinearity therefore does not change the final hard decision, and the cascade-net output equals that of a single zero-forcing detector. In other words, the zero-forcing detector is one of the solutions that training can reach. We conclude that cascading a zero-forcing detector as a data preprocessor turns training into a search for a detector that improves on zero-forcing detection.

DNT uses a normalized loss function. In cascade-net this normalization is redundant: the first level is already a zero-forcing detector, so the cascade-net natively takes the zero-forcing result as its reference. Its optimization continues the learning from zero-forcing detection toward ML detection. Thus, in cascade-net we use the Euclidean distance directly as the loss function:

$$\mathrm{loss}\big(X; \hat{X}_\theta(H,Y)\big) = \sum_{k=1}^{L} \log(k)\, \|X - \hat{X}_k\|^2 \tag{14}$$

TABLE I
THE PARAMETERS USED FOR DETECTION
Scenario: N = 32, $f_{N_d}$ = 0.16 or 0.18
  Modulation      QPSK
  Learning rate   0.005
  Layers          20
  Batch size      500

B. SLIDING STRUCTURE

In modern communication systems, one OFDM symbol can contain a very large number of subcarriers. Using one cascade-net to learn the detection of such a symbol would require computational resources that are hard to provide. However, the ICI terms between subcarriers are strongly correlated, which suggests that the detection can be performed step by step. In fact, the ICI on a given subcarrier is caused mainly by a limited number of adjacent subcarriers. We therefore propose a sliding detection structure [12] for OFDM symbols with a large number of subcarriers.

Unlike the classical ICI cancellation method [18], our sliding window (SW) has two parts: a guarding area (GA) and an output area (OA). The GA helps the OA complete the ICI cancellation, and its own detection results are not output. The SW keeps sliding until the whole symbol has been detected. Fig. 3 shows the structure of the SW and the detection process. The two parts are designed based on the signal-to-interference power ratio and on computational feasibility.

The previous section introduced cascade-net for OFDM detection. The sliding cascade-net (SCN) applies the sliding window to the cascade structure. Each SCN therefore consists of two parts: a zero-forcing data preprocessor and a deep detection network.
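The cascade forward pass (11) can be sketched by starting the DNT layers of (7) from the ZF output. Each SCN contains this same ZF + DNT pair. The sketch also checks the argument behind (12)–(13): with
- $w_1 = [I, I, -I]$,
- $w_k = [0, I, 0]$ for k > 1,
- $b_k = 0$ and $t_k = 1$,

the output equals the ZF estimate clipped to [−1, 1], so its hard decisions coincide with ZF.

As before, the sketch assumes a real-valued model so that `H.T` matches $H^T$. The function names are hypothetical, and no training is performed.

```python
import numpy as np


def soft_sign(x, t):
    # Soft sign (8): x / |t| on [-t, t], clipped to +-1 outside.
    return -1.0 + (np.maximum(x + t, 0.0) - np.maximum(x - t, 0.0)) / abs(t)


def zf_preprocessor(H, y):
    """Eq. (10)/(11): X_0 = (H^T H)^{-1} H^T y, via a linear solve."""
    return np.linalg.solve(H.T @ H, H.T @ y)


def cascade_forward(H, y, weights, biases, ts):
    """Eq. (11): the DNT layers of (7), started from the ZF output instead of 0."""
    Hty, HtH = H.T @ y, H.T @ H
    x = zf_preprocessor(H, y)
    outputs = []
    for w, b, t in zip(weights, biases, ts):
        x = soft_sign(w @ np.concatenate([Hty, x, HtH @ x]) + b, t)
        outputs.append(x)
    return outputs


def euclidean_multi_loss(x_true, outputs):
    """Eq. (14): sum_k log(k) ||X - X_k||^2."""
    return sum(np.log(k) * np.sum((x_true - xk) ** 2)
               for k, xk in enumerate(outputs, start=1))


# Check of (12)-(13) with the special parameter choice described above.
rng = np.random.default_rng(3)
n, L = 8, 4
H = rng.standard_normal((n, n)) + 2 * np.eye(n)
x_true = rng.choice([-1.0, 1.0], n)
y = H @ x_true + 0.3 * rng.standard_normal(n)
I, O = np.eye(n), np.zeros((n, n))
weights = [np.hstack([I, I, -I])] + [np.hstack([O, I, O])] * (L - 1)
outs = cascade_forward(H, y, weights, [np.zeros(n)] * L, [1.0] * L)
x_zf = zf_preprocessor(H, y)
print(np.allclose(outs[-1], np.clip(x_zf, -1, 1)))
print(np.array_equal(np.sign(outs[-1]), np.sign(x_zf)))
```

Because this parameter point reproduces ZF, training can only move away from it if doing so lowers the loss (14). This is the sense in which CN starts from ZF.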

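The sliding window (SW) is made of a GA and an OA and keeps sliding over the symbol. Its bookkeeping can be sketched as follows.

The sketch assumes:
- The window length is $l_T = l_O + 2l_G$, with the OA in the middle.
- The OA advances by $l_O$ per slide, giving $N/l_O$ slides.
- Subcarrier indices wrap around modulo N at the band edges.
- Interference from outside the window is ignored.

These assumptions are our own. A plain ZF solve on each square window stands in for a trained SCN, and all names are hypothetical.

```python
import numpy as np


def sliding_windows(N, l_out, l_guard):
    """(window, output-area) index arrays for each slide of the SW.

    Assumptions: l_T = l_O + 2 l_G with the OA in the middle, an OA step of
    l_O per slide, and indices wrapping around modulo N at the edges.
    """
    if N % l_out:
        raise ValueError("N must be a multiple of the output-area length l_O")
    l_total = l_out + 2 * l_guard
    slides = []
    for s in range(N // l_out):
        window = (s * l_out - l_guard + np.arange(l_total)) % N
        slides.append((window, window[l_guard:l_guard + l_out]))
    return slides


def detect_sliding(H, Y, l_out, l_guard, detector):
    """Detect window by window on sub-matrices of H and sub-vectors of Y,
    keeping only the OA part of each output (GA outputs are discarded)."""
    X_hat = np.zeros(len(Y), dtype=complex)
    for window, oa in sliding_windows(len(Y), l_out, l_guard):
        H_sub = H[np.ix_(window, window)]    # ICI from outside the window is ignored
        X_win = detector(H_sub, Y[window])
        X_hat[oa] = X_win[l_guard:l_guard + l_out]
    return X_hat


def zf_detector(H_sub, Y_sub):
    """ZF on a square window (a trained SCN would take its place)."""
    return np.linalg.solve(H_sub, Y_sub)


wins = sliding_windows(256, l_out=16, l_guard=8)   # OA/GA lengths of Table II
print(len(wins), len(wins[0][0]))  # 16 32
```

Every subcarrier appears in exactly one OA, so the outputs of all slides together cover the whole symbol once.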
[Fig. 3: The structure and application of SCN. Channel matrices H are screened, and sub-matrices taken from a fixed position are used with label values to train the cascade-net. The resulting sliding cascade-net, with guarding areas around an output area, slides over the N-subcarrier OFDM symbol in the frequency domain.]

Training SCN means finding parameters that optimize detection performance, using sub-matrices of H and sub-vectors of Y as input data. We treat each SCN as a CN that detects an OFDM symbol with $l_T$ subcarriers, where $l_T$ is the length of the SCN. After training, the SCN detects $l_T$ subcarriers in each slide and keeps sliding until it has detected the whole received OFDM symbol, as shown in Fig. 3. Let $\hat{X}^{scn}_\theta(H,Y)$ denote the trained SCN detector. At the $n_{\mathrm{slip}}$th slide, the received symbols to be detected, $Y_{n_{\mathrm{slip}}}$, and the SCN output, $\hat{X}_{n_{\mathrm{slip}}}$, can be expressed as:

$$Y_{n_{\mathrm{slip}}} = \big[Y[(n_{\mathrm{slip}}-1)l_T],\ Y[(n_{\mathrm{slip}}-1)l_T+1],\ \ldots,\ Y[n_{\mathrm{slip}}\, l_T]\big],\qquad \hat{X}_{n_{\mathrm{slip}}} = \hat{X}^{scn}_\theta\big(H_{n_{\mathrm{slip}}}, Y_{n_{\mathrm{slip}}}\big) \tag{15}$$

Here $l_G$ denotes the length of the GA in SCN, and $H_{n_{\mathrm{slip}}}$ is the sub-matrix used in the $n_{\mathrm{slip}}$th detection. The GA contains the subcarriers that cause the main interference to the subcarriers in the OA, and it therefore helps the OA complete the ICI cancellation.

Next, we describe the GA design. The interference that the kth subcarrier causes on the mth subcarrier decreases with their distance, so the interference on a given subcarrier comes mainly from a limited number of adjacent subcarriers. Based on this observation, we propose the empirical formula

$$x_l = \min\left\{\arg_{x_l > f_{N_d}} \left\{\frac{1}{N^2}\cdot\frac{\sin^2(\pi x_l)}{\sin^2(\pi x_l/N)} = \alpha\beta\right\}\right\},\qquad l_G = \lceil x_l - f_{N_d} \rceil \tag{16}$$

where β is the lowest interference power considered within one SW and α compensates for the unequal amplitudes of the modulation. The length of the output area depends mainly on the available computational resources.

In SCN, the signal in the GA only assists the detection of the signal in the OA, so the output of the GA does not matter. The loss function should therefore focus only on the performance of the OA. Accordingly, the loss function used to train SCN becomes

$$\mathrm{loss}\big(X_{out}; \hat{X}^{scn}_\theta(H,Y)\big) = \sum_{k=1}^{L} \log(k)\, \big\|X_{out} - \hat{X}_{out,k}\big\|^2,\qquad X_{out} = \big[X^{l_G+1}_{n_{\mathrm{slip}}}, X^{l_G+2}_{n_{\mathrm{slip}}}, \ldots, X^{l_T-l_G}_{n_{\mathrm{slip}}}\big] \tag{17}$$

where $X_{out}$ is the transmitted signal in the output area and $\hat{X}_{out,k}$ is its estimate at the kth layer.

IV. LEARNING TO DETECT AND NUMERICAL RESULTS

In this section, we compare the detection performance of CN with that of DNT and the classical ZF detector. The deep detectors are trained off-line and applied online. Our simulation assumes that the receiver has accurate channel information. To prevent misadjustment of the network, we discard channel matrices with a condition number larger than 10000 during training. The SNR of the training data depends on the SNR range used for detection: if the detection range is 15 dB to 35 dB, the training SNR should be (15 + 35)/2 = 25 dB.

The trainable parameter $w_k$ is initialized from a truncated normal distribution with mean 0 and variance 1. $b_k$ is initialized to 0.01, and $t_k$ is initialized to 1. We build our network in TensorFlow and use the Adam algorithm as the optimizer. TABLE I lists the parameters for detecting OFDM symbols with 32 subcarriers transmitted over a 4-path fading channel [19]. The channel is constructed with the Jakes model and has normalized Doppler shift $f_{N_d}$ = 0.16 or 0.18.

[Fig. 4: BER versus SNR of 32-subcarrier OFDM symbol detection on QPSK with $f_{N_d}$ = 0.16 or 0.18. Curves: DNT, ZF and CN for each $f_{N_d}$; x-axis SNR (dB) from 14 to 34, y-axis BER from $10^{-3}$ to $10^{-1}$.]

TABLE II
THE PARAMETERS USED FOR DETECTION
Scenario: N = 256, $f_{N_d}$ = 0.16 or 0.18
  Modulation      QPSK
  Output area     16
  Guarding area   8
  Learning rate   0.005
  Layers          20
  Batch size      1500
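The training-set preparation described at the start of Section IV can be sketched in a few lines. It covers:
- condition-number screening (discard $H$ with cond(H) > 10000),
- the mid-range training SNR,
- the initial values of $w_k$, $b_k$ and $t_k$.

The sketch assumes unit-energy symbols when converting SNR to a noise level. It also assumes the truncated normal redraws samples beyond two standard deviations, which is how TensorFlow's truncated-normal sampler behaves. The helper names are hypothetical, and the random channels are placeholders rather than Jakes-model matrices.

```python
import numpy as np


def keep_channel(H, max_cond=1e4):
    """Training-set screening: keep H only if cond(H) <= 10000."""
    return np.linalg.cond(H) <= max_cond


def training_snr_db(snr_min_db, snr_max_db):
    """Single training SNR at the middle of the detection range."""
    return (snr_min_db + snr_max_db) / 2


def noise_std(snr_db, symbol_energy=1.0):
    """Complex AWGN standard deviation for a given SNR (unit-energy symbols assumed)."""
    return np.sqrt(symbol_energy / 10 ** (snr_db / 10))


def init_layer(n, rng, stddev=1.0):
    """Initial (w_k, b_k, t_k): truncated normal w_k, b_k = 0.01, t_k = 1.

    Samples farther than two standard deviations from 0 are redrawn.
    """
    w = rng.normal(0.0, stddev, size=(n, 3 * n))
    bad = np.abs(w) > 2 * stddev
    while bad.any():
        w[bad] = rng.normal(0.0, stddev, size=int(bad.sum()))
        bad = np.abs(w) > 2 * stddev
    return w, np.full(n, 0.01), 1.0


rng = np.random.default_rng(5)
snr = training_snr_db(15, 35)
channels = [rng.standard_normal((8, 8)) for _ in range(100)]   # placeholder channels
kept = [H for H in channels if keep_channel(H)]
print(snr, noise_std(snr), len(kept))
```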

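The GA design rule (16) from Section III-B can be evaluated numerically. Because the leakage term oscillates with zeros at integer x, the literal "min arg" is ambiguous. This sketch therefore adopts one reading, which is our own assumption:
- $x_l$ is searched only on the points $f_{N_d} + d$, d = 1, 2, …, where the leakage from a subcarrier at integer distance d is sampled.
- The first such point whose leakage is at most αβ gives $l_G = \lceil x_l - f_{N_d} \rceil = d$.

The text gives no numerical α or β, so αβ = $10^{-3}$ is purely illustrative. The function names are hypothetical.

```python
import numpy as np


def ici_power(x, N):
    """Leakage (1/N^2) sin^2(pi x) / sin^2(pi x / N), the left-hand side in (16)."""
    return np.sin(np.pi * x) ** 2 / (N ** 2 * np.sin(np.pi * x / N) ** 2)


def guard_length(N, f_nd, alpha_beta):
    """One reading of (16): smallest integer d >= 1 whose leakage at
    x_l = d + f_Nd is at most alpha * beta; then l_G = ceil(x_l - f_Nd) = d.
    Returns N // 2 if the threshold is never reached.
    """
    for d in range(1, N // 2):
        if ici_power(d + f_nd, N) <= alpha_beta:
            return d
    return N // 2


for f_nd in (0.16, 0.18):
    print(f_nd, guard_length(256, f_nd, alpha_beta=1e-3))
```

For N = 256 this illustrative threshold gives $l_G$ = 5 at $f_{N_d}$ = 0.16 and $l_G$ = 6 at 0.18. A larger Doppler shift, or a smaller αβ, leads to a longer guarding area.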
Fig. 4 shows that at an SNR of around 20 dB, DNT performs much better than the classical zero-forcing detector. As the SNR increases, however, the limited number of iterations causes DNT's performance to decline. CN, thanks to its first-level data preprocessor, always outperforms the zero-forcing detector.

In SCN, inverting an S×S matrix has complexity $\Theta(S^3)$, while the complexity of the sliding detector is linear. The output part should therefore be as short as possible. As a compromise between complexity and detection delay, we suggest an output-part length $l_O$ of 8 or 16. The total length of the SCN is $l_T = l_O + 2l_G$, and each detection requires $N/l_O$ slides.

SCN training also requires choosing the sub-matrices. We take the sub-matrices $H_T$ from a fixed location, p to p + $l_T$, in the channel matrices H (shown in Fig. 3). This is practical because the channel statistics are ergodic.

TABLE II lists the hyper-parameters for detecting OFDM symbols with N = 256 subcarriers over the same fading channel as above. The GA length is designed with the method of equation (16). Fig. 5 shows that both the classical sliding detector and the deep sliding detectors have an error floor. The deep sliding detectors clearly improve detection performance, and SCN improves it further, with the lowest error floor.

[Fig. 5: BER versus SNR of 256-subcarrier OFDM symbol detection on QPSK with $f_{N_d}$ = 0.16 or 0.18. Curves: DNT, ZF and SCN for each $f_{N_d}$; x-axis SNR (dB) from 14 to 34, y-axis BER from $10^{-3}$ to $10^{-1}$.]

V. CONCLUSION

In this paper, we propose a cascade network for detecting OFDM symbols transmitted over channels with large Doppler shift, together with a sliding structure for detecting OFDM symbols with many subcarriers. Simulations with QPSK show that our network performs better than the classical zero-forcing detector. It also outperforms a single deep detection network at high SNR. Although our simulation assumes perfect channel information, cascade-net is robust against inaccurate channel estimation. The sliding structure could also be applied to the first level (data preprocessor) of SCN to further reduce the preprocessor's computational complexity.

VI. ACKNOWLEDGMENT

REFERENCES
[1] G. Berardinelli, K. Pajukoski, E. Lähetkangas, R. Wichman, O. Tirkkonen, and P. E. Mogensen, "On the potential of OFDM enhancements as 5G waveforms," in VTC Spring, 2014, pp. 1–5.
[2] W. Hou, W. Ye, S. Feng, and F. Ke, "Iterative channel estimation and successive ICI cancellation for OFDM systems over doubly selective channels," Wireless Personal Communications, vol. 55, no. 2, pp. 289–303, 2010.
[3] T. Wang, C.-K. Wen, H. Wang, F. Gao, T. Jiang, and S. Jin, "Deep learning for wireless physical layer: Opportunities and challenges," China Communications, vol. 14, no. 11, pp. 92–111, 2017.
[4] H. Ye, G. Y. Li, and B. H. Juang, "Power of deep learning for channel estimation and signal detection in OFDM systems," IEEE Wireless Communications Letters, vol. 7, no. 1, pp. 114–117, 2018.
[5] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, "On deep learning-based channel decoding," in Information Sciences and Systems, 2017 51st Annual Conference on. IEEE, 2017, pp. 1–6.
[6] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Beery, "Deep learning methods for improved decoding of linear codes," IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 119–131, 2018.
[7] N. Samuel, T. Diskin, and A. Wiesel, "Deep MIMO detection," arXiv preprint arXiv:1706.0115, 2018.
[8] M. Kim, N. I. Kim, W. Lee, and D. H. Cho, "Deep learning-aided SCMA," IEEE Communications Letters, vol. 22, no. 4, pp. 720–723, 2018.
[9] S. Li, C. Häger, N. Garcia, and H. Wymeersch, "Achievable information rates for nonlinear fiber communication via end-to-end autoencoder learning," arXiv preprint arXiv:1804.07675, 2018.
[10] J. R. Hershey, J. L. Roux, and F. Weninger, "Deep unfolding: Model-based inspiration of novel deep architectures," Computer Science, 2014.
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.
[12] W. G. Jeon, K. H. Chang, and Y. S. Cho, "An equalization technique for orthogonal frequency-division multiplexing systems in time-variant multipath channels," IEEE Transactions on Communications, vol. 47, no. 1, pp. 27–32, 1999.
[13] N. Farsad and A. Goldsmith, "Neural network detection of data sequences in communication systems," arXiv preprint arXiv:1802.02046, 2018.
[14] M. Kim, W. Lee, and D.-H. Cho, "A novel PAPR reduction scheme for OFDM system based on deep learning," IEEE Communications Letters, vol. 22, no. 3, pp. 510–513, 2018.
[15] R. Clarke, "A statistical theory of mobile- reception," Bell System Technical Journal, vol. 47, no. 6, pp. 957–1000, 1968.
[16] W. C. Jakes and D. C. Cox, Mobile Communications. IEEE Press, 1993.
[17] L. Y. Pratt, "Discriminability-based transfer between neural networks," in Advances in Neural Information Processing Systems, 1993, pp. 204–211.
[18] Y. Zhao and S. G. Haggman, "Intercarrier interference self-cancellation scheme for OFDM mobile communication systems," IEEE Transactions on Communications, vol. 49, no. 7, pp. 1185–1191, 2002.
[19] R. Balraj, "Techniques for determining covariance measures based on correlation criteria," Jun. 30 2015, US Patent 9,071,318.