arXiv:1711.00379v4 [eess.SP] 5 Oct 2018 si a ac h efrac fteplnma non-linear polynomial the complexit of computational performance lower significantly the with match simp canceler can and exceptionally it small works a canceler as that network demonstrate neural re testbed feed-forward Measurement full-duplex functions. a basis polynomial from on cancellati based n t non-linear is neural traditional that a in the of to use alternative proposed the an investigate as been we suc work, have this In As literature. algorithms soph impairments. and cancellation sufficient transceiver not non-linear usually is various an alone no cancellation to correctly by linear due is operate created signal self-interference to effects the of order part in significant cancellation interference eurdi re oflysprs h It h ee fthe of polynomial level on the based to are SI methods the These suppress floor. fully ar to receiver methods order cancellation in non-linear sign required SI complicated the that distort [9], means [8], ampli [5], power [4], and [7], non-linearities (PA) [6], [4 phase-noise [5], (ADC)) [4], converter imbalance IQ analog-to-digital non-lineari and transceiver (DAC) several digital-t converter (e.g., as non-linearities case baseband pra various i the as in such Unfortunately, it not known. is since fully this domain, is tice that digital signal the a s in by SI caused cancelable residual aft easily this receiver be the principle, meaning should at In achieve, present stage. still to cancellation is analog signal costly analo SI and the residual a front-end challenging in that analog very cancellation the is perfect However, i saturating domain receiver. and avoid the signal) to cancellation of acti order a or in of receiver) injection necessary the physi the Ana- and through through floor. transmitter (i.e., (i.e., noise passive the receiver either between the be isolation of can level cancellation the log suppress to to down order signal in necessary SI usually is domain digital the to node nee FD transmitter canceled. own effectively an its by be receiver for to sign node’s the order (SI) at self-interference produced In is strong band. simultaneously the frequency correctly, data operate same receiving the and communicatio in transmitting current of by efficiency systems spectral the increase to Abstract obnto fS aclaini h nlgadi the in and analog the in cancellation SI of combination A method promising a is [3] [2], [1], (FD) full-duplex In-band o-ierDgtlSl-nefrneCancellation Self-Interference Digital Non-Linear Fl-ulxssesrqievr togself- strong very require systems —Full-duplex o nBn ulDpe aisUsing Radios Full-Duplex In-Band for .I I. cl oyehiu ´deaed asne H11 Lausa CH-1015 Lausanne, f´ed´erale de polytechnique Ecole NTRODUCTION ´ erlNetworks Neural eeomnctosCrut Laboratory Circuits Telecommunications lxo Balatsoukas-Stimming Alexios alexios.balatsoukas@epfl.ch nmethod on o-analog isticated n-linear l This al. etwork lthat al y. rthe er ignal well, sults ties, a d is t fier the cal ve he ds ns c- h, le ], g e s 1] h nietasevr nldn h rnmsinc transmission the we including transceiver, [1 literature, entire In the applied. for successfully [11], the been networks have in networks ar we neural neural communications where radios layer As of physical other full-duplex [10]. some applications outline layer briefly in any physical cancellation of various the SI aware on to not focus networks are recently particular neural renewed a been of with has which application scenarios, the communications in interest etdadi osntrqietecmuaino n non- any of computation the require function. not basis impl linear does be bee to it has multiplications and real training mented fewer non-lin after 36% based requires (i.e., network canceler neural step the inference Specifically, learnable performed). the co of computational for lower number plexity significantly for a same with model the but polynomial parameters, with state-of-the-art match cancellation a already non-linear simple of can a performance canceler that non-linear the demonstrate based testbed network hardware neural mea a a using results from as experimental used samples signal, initial are Our that cancellation models literature. digital polynomial the construct standard the to the of to network alternative neural part a non-linear uses the that method cancellation hne infiaty h C prto ed ob re-run. be to needs chann operation self-interference PCA cancellation the the significantly, whenever the changes Moreover mention, generate complexity. authors additional to the multiplied the introducing matrix be method thus to transformation signal, this need with a samples However, (PCA) with baseband [9]. analysis digital in component transmitted presented non principal also significant using was non-linear most the terms of identifies complexit linearity number effective that An large technique computed. be reduction a to a have because functions have basis and non- also considered order estimated maximum can the of linearity they with number rapidly but the grows practice, as parameters complexity in been implementation have well high models work Polynomial to was n imbalance. shown PA model IQ both incorporates Hammerstein and that parallel cancellation linearities SI a digital for where used [9], was model in comprehensive and presented recent most the and expansions eae Work: Related Contribution: nti ok epooeannlna SI non-linear a propose we work, this In vrteyas hr a ensignificant been has there years, the Over n,Switzerland nne, hannel sured as , on- eas ear m- 0], in e- el n n y - where K1,K2 ∈ R and typically K1 ≫ K2. The output IQ Mixer BP Filter signal of the mixer is amplified by the PA, which introduces x(n) xIQ(n) xPA(n) further non-linearities that can be modeled using a parallel DAC PA h SI Hammerstein model as [9] P M Local x n h m x n m x n m p−1, Oscillator PA( )= PA,p( ) IQ( − )| IQ( − )| p=1, m=0 pXodd X y(n) ADC LNA (2)

where hPA,p is the impulse response for the p-th order non- IQ Mixer BP Filter linearity and M is the memory length of the PA. The xPA Fig. 1. Basic model of a full-duplex transceiver with a shared local oscillator SI signal arrives at the receiver through an SI channel with where some components have been omitted for simplicity. A more detailed impulse response hSI(l), l = 0, 1,..., (L − 1). Assuming model can be found in, e.g., [9]. that the ADC and potential baseband amplifiers are ideal, the downconverted and digitized received SI signal y(n) can be and transceiver non-idealities, was treated as an auto-encoder modeled as neural network which can, in some cases, learn an end-to-end L−1 signal processing algorithm that results in better error rate per- y(n)= hSI(l)xPA(n − l). (3) l=0 formance than traditional signal processing algorithms. Detec- X tion for molecular communications using neural networks was By substituting (1) and (2) in (3) and performing some considered in [12]. The work of [13] considered a modification arithmetic manipulations [8], [9], y(n) can be re-written as of the well-known belief propagation (BP) decoding algorithm P p M+L−1 for LDPC codes where weights are assigned to each message ∗ − y(n)= h (m)x(n − m)qx (n − m)p q, in the Tanner graph of the code that is being decoded and deep p,q p=1, q=0 m=0 learning techniques are used in order to learn good values for pXodd X X these weights. A similar approach was taken in [14], where (4) the offset parameter of the offset min-sum (OMS) decoding where hp,q(m) is a channel containing the combined effects algorithm are learned by using deep learning techniques. The of K1, K2, hPA,p, and hSI. work of [15] considered using a neural network in order to By adapting the expression of [9, Eq. (19)] to the case of a decode polar codes. In [16], [17], [18], neural networks were single antenna, we can calculate the total number of complex employed in order to perform detection and intra-user (and parameters hp,q(m) as mostly linear) successive interference cancellation in multi- user CDMA systems. Finally, the work of [19] considered P +1 P +1 npoly = (M + L) +1 , (5) using neural networks for wireless resource management. 2 2     which grows quadratically with the PA non-linearity order II. POLYNOMIAL NON-LINEAR CANCELER P . The task of the non-linear digital canceler is to compute ˆ In this section we briefly review a state-of-the-art polyno- estimates of all hp,q, which we denote by hp,q, and then mial model for non-linear digital cancellation that can mitigate construct an estimate of the SI signal, which we denote by the effects of both IQ imbalance and PA non-linearities [8], yˆ(n), using (4) and subtract it from the received signal in the [9], which are usually the dominant non-idealities, while the digital domain. The amount of SI cancellation over a window remaining transceiver components are assumed to be ideal. of length N, expressed in dB, is This model will serve as the baseline for our comparison in N−1 2 n=0 |y(n)| CdB = 10 log . (6) Section V. In Fig. 1 we show a simple full-duplex transceiver 10 N−1 2 architecture with a shared local oscillator, which is useful n=0P |y(n) − yˆ(n)| ! for the description of the polynomial non-linear cancellation III. NEURAL NETWORKP NON-LINEAR CANCELER model that follows. In this section, we first provide a brief background on neural Let the complex digital transmitted signal at time instant networks and then we describe our proposed neural network n be denoted by x(n). This digital signal is first converted based non-linear cancellation method. to an analog signal by the digital-to-analog converter (DAC) and then upconverted by an IQ mixer. The digital baseband A. Feed-forward Neural Networks equivalent of the signal after the IQ imbalance introduced by Feed-forward neural networks are directed graphs that con- the IQ mixer and assuming that the DAC is ideal can be tain three types of nodes, namely input nodes, hidden nodes, modeled as [9] and output nodes, which are organized in layers. An example

∗ of a feed-forward neural network with 6 input nodes, 5 hidden xIQ(n)= K1x(n)+ K2x (n), (1) nodes, and 2 output nodes is depicted in Fig. 2. Each edge in InputlayerHidden layer Outputlayer on real numbers, we split all complex baseband signals into their real and imaginary parts. We note that, in principle the ℜ{x(n)} neural network could learn to cancel both the linear and the non-linear part of the signal. However, because the non-linear ℑ{x(n)} part of the SI signal is significantly weaker than the linear

ℜ{x(n−1)} ℜ{yˆnl(n)} part, in practice our experiments indicate that the noise in the gradient computation due to the use of mini-batches essentially ℑ{x(n−1)} ℑ{yˆnl(n)} hides the non-linear structure from the learning algorithm. ℜ{x(n−2)} We use a single layer feed-forward neural network as depicted in Fig. 2. The neural network has 2(L + M) inputs ℑ{x(n−2)} nodes, which correspond to the real and imaginary parts of the (M + L) delayed versions of x in (4), and two output nodes, Fig. 2. A neural network with 6 input, 5 hidden, and 2 output nodes. which correspond to the real and imaginary parts of the target ynl(n) sample. The number of hidden nodes is denoted by nh the graph is associated with a weight. The input to each node and is a parameter that can be chosen freely. For the neurons of the graph is a weighted sum of the outputs of nodes in the in the hidden layer, we use a rectified linear unit (ReLU) previous layer, while the output of each node is obtained by activation function, defined as ReLU(x) = max(0, x), while applying a non-linear activation function to its input. the output neurons use an identity activation function. The weights can be optimized through supervised learning We note that, apart from the connections that are visible in by using training samples that contain known inputs and Fig. 2, each node has also has a bias input, which we have corresponding expected outputs. To this end, a cost function is omitted from the figure for simplicity. Thus, the total number associated with the output nodes, which measures the distance of (real-valued) weights that need to be estimated is between the outputs of the neural network using the current weights and the expected outputs. The derivative of the cost nw = (2M +2L + 1)nh + 2(nh + 1). (10) function with respect to each of the weights in the neural Moreover, the linear cancellation stage that precedes the neural network can be efficiently computed using back-propagation network has 2(M + L) real parameters that need to be and it can then used in order to minimize the cost function estimated. Thus, the total number of learnable parameters for using some gradient descent variant. Training is performed by our proposed neural network canceler is splitting the data into mini-batches and performing a gradient descent update after processing each mini-batch. One pass nNN = nw + 2(M + L). (11) through the entire training set is called a training epoch. IV. COMPUTATIONAL COMPLEXITY B. Neural Network Non-Linear Canceler In this section, we analyze the computational complexity of The SI signal of (4) can be decomposed as the polynomial and the neural network canceler in terms of the required number of real additions and multiplications for y(n)= ylin(n)+ ynl(n), (7) the inference step (i.e., after training has been performed). We note that, the computational complexity of the training phase where ylin(n) is the linear part of (4) (i.e., the term of the sum is also an important aspect that should be considered, but it is with p =1 and q =1) and ynl(n) contains all remaining (non- linear) terms. We propose to use standard linear cancellation beyond the scope of this paper due to space limitations. to construct an estimate of ylin(n), denoted by yˆlin(n), while A. Polynomial Canceler considering the much weaker y (n) signal as noise, and then nl In order to derive the computational complexity of the reconstruct ynl(n) using a neural network. Specifically, the polynomial canceler, we ignore the terms in (4) for p = 1 linear canceler first computes hˆ1,1 using standard least-squares and q = 1, since these correspond to the linear cancellation channel estimation [8], [9], and then uses hˆ1 1 to construct , which is also performed verbatim for the neural network yˆlin(n) as follows canceler. This means that there remain npoly −M −L complex M+L−1 parameters in (4). Moreover, in order to perform a best-case ˆ yˆlin(n)= h1,1(m)x(n − m). (8) complexity analysis for the polynomial canceler, we assume m=0 X that the calculation of the basis functions in (4) comes at no The linear cancellation signal is then subtracted from the SI computational cost. For the non-linear part of (4), npoly−M−L signal in order to obtain complex parameters need to be summed, so the minimum required total number of real additions is ynl(n) ≈ y(n) − yˆlin(n), (9) nADD,poly = 2(npoly − M − L − 1). (12) The goal of the neural network is to reconstruct each ynl(n) sample based on the subset of x that this ynl(n) sample Moreover, assuming that each complex multiplication is depends on (cf. (4)). Since neural networks generally operate implemented optimally using three real multiplications, the number of real multiplications between the complex param- −110 eters hp,q(m) and the complex basis functions is −120 nMUL,poly = 3(npoly − M − L). (13) B. Neural Network Canceler −130

For each of the nh hidden neurons, 2M +2L +1 incoming − real values need to be summed, which requires a total of at 140 least (2M +2L)nh real additions. Moreover, at each of the two −150 output neurons, nh +1 real values need to be summed, which requires a total of at least 2nh real additions. The computation −160

of each of the nh ReLU activation functions requires one Power Spectral Density (dBm/Hz) multiplexer (and one comparator with zero, which can be −170 trivially implemented by looking at the MSB). Assuming a −10 −8 −6 −4 −20 2 4 6 810 worst case where a multiplexer has the same complexity as Frequency (MHz) an addition, the total number of real additions required by the neural network canceler is SI Signal (−42.7 dBm) Linear DC (−80.6 dBm) Polynomial DC (−87.5 dBm) Neural Network DC (−87.7 dBm) −90 8 nADD,NN = (2M +2L + 3)nh. (14) ( . dBm)

Excluding the biases which are not involved in multiplications, Fig. 3. Power spectral densities of the SI signal, the SI signal after linear there are (2M +2L)nh real weights that are multiplied with cancellation, as well as the SI signal after non-linear cancellation using both the polynomial model and the proposed neural network. We also show the the real input values and 2nh real weights that are multiplied measured noise floor for reference. with the real output values from the hidden nodes. Thus, the total number of real multiplications required by the neural 8 network canceler is 7 nMUL,NN = (2M +2L + 2)nh. (15) 6

V. EXPERIMENTAL RESULTS 8 5 7 In this section, we first briefly describe our experimental 6 4 5 setup and then we present results to compare the digital 4 cancellation achieved by the standard polynomial non-linear 3 3 2 cancellation method and our proposed neural network based 1 0 200 400 600 800 1000 method. We note that all results are obtained using actual 2 Non-Linear SI Cancellation (dB) Training Epoch measured baseband samples and not simulated waveforms. Non-Linear SI Cancellation1 (dB) Training Frames Test Frames A. Experimental Setup 0 2 4 6 8 10 12 14 16 18 20 Full-Duplex Testbed: Our full-duplex hardware testbed, which is described in more detail in [20], [21], [4], uses a Training Epoch National Instruments FlexRIO device and two FlexRIO 5791R Fig. 4. Achieved non-linear SI cancellation on the training frames and the RF transceiver modules. We use a QPSK-modulated OFDM test frames as a function of the number of training epochs. signal with a passband bandwidth of 10 MHz and Nc = 1024 carriers. We sample the signal with a sampling frequency of 20 MHz so that we can also observe the signal side-lobes. Each achieved SI suppression, and after some point even decreased transmitted OFDM frame consists of approximately 20, 000 performance on the test frames due to overfitting. The total baseband samples, out of which 90% are used for training number of complex parameters hp,q(m) is npoly = 260, and the remaining 10% are used to calculate the achieved SI meaning that a total of 2npoly = 520 real parameters have to cancellation, both for the polynomial model and for the neural be estimated. As in [8], [9], we use a standard least-squares ˆ network. We use an average transmit power of 10 dBm and formulation in order to compute all hp,q(m). our two-antenna FD testbed setup provides a passive analog Neural Network: The neural network was implemented suppression of 53 dB. We note that we do not perform active using the Keras framework with a TensorFlow backend. More- analog cancellation as, for the results presented in this paper, over, we use the Adam optimization algorithm for training the achieved passive suppression is sufficient. with a mean-squared error cost function, a learning rate Polynomial Model: For the polynomial model, we present of λ = 0.004, and a mini-batch size of B = 32. All results for M + L = 13 taps for the equivalent SI channel and remaining parameters have their default values. In order to for a maximum non-linearity order of P = 7, since further provide a fair comparison with the polynomial model, we use increasing these parameters results in very limited gains in the 2(M + L)=26 input units and nh = 17 hidden units so that nw = 495 weights have to be learned by the neural network REFERENCES and the total number of weights and real parameters that need [1] M. Jain, J. I. Choi, T. Kim, D. Bharadia, S. Seth, K. Srinivasan, to be estimated is nNN = 521. P. Levis, S. Katti, and P. Sinha, “Practical, real-time, full duplex wireless,” in Proc. 17th International Conference on Mobile Computing B. Experimental Self-Interference Cancellation Results and Networking. ACM, 2011, pp. 301–312. [2] M. Duarte, C. Dick, and A. Sabharwal, “Experiment-driven characteri- In Fig. 3 we present SI cancellation results using the zation of full-duplex wireless systems,” IEEE Trans. Wireless Commun., polynomial model of Section II and our proposed neural vol. 11, no. 12, pp. 4296–4307, Dec. 2012. network. We can observe that digital linear cancellation pro- [3] D. Bharadia, E. McMilin, and S. Katti, “Full duplex radios,” in ACM SIGCOMM, 2013, pp. 375–386. vides approximately 38 dB of cancellation, while both non- [4] A. Balatsoukas-Stimming, A. C. M. Austin, P. Belanovic, and A. Burg., linear cancelers can further decrease the SI signal power by “Baseband and RF hardware impairments in full-duplex wireless sys- approximately 7 dB, bringing it very close to the receiver noise tems: experimental characterisation and suppression,” EURASIP Journal on Wireless Communications and Networking, vol. 2015, no. 142, 2015. floor. The residual SI power for both cancelers is slightly above [5] D. Korpi, L. Anttila, V. Syrjala, and M. Valkama, “Widely linear the noise floor, but this is mainly due to the peaks close to the digital self-interference cancellation in direct-conversion full-duplex DC frequency, for which we currently do not have a consistent transceiver,” IEEE J. Sel. Areas Commun., vol. 32, no. 9, pp. 1674– 1687, Sep. 2014. explanation, and not due to an actual residual signal. [6] A. Sahai, G. Patel, C. Dick, and A. Sabharwal, “On the impact of phase In Fig. 4 we observe that after only 4 training epochs the noise on active cancelation in wireless full-duplex,” IEEE Trans. Veh. neural network can already achieve a non-linear SI cancel- Technol., vol. 62, no. 9, pp. 4494–4510, Nov. 2013. [7] V. Syrjala, M. Valkama, L. Anttila, T. Riihonen, and D. Korpi, “Analysis lation of over 6 dB on both the training and the test frames. of oscillator phase-noise effects on self-interference cancellation in full- After 20 training epochs the non-linear SI cancellation reaches duplex OFDM radio transceivers,” IEEE Trans. Wireless Commun., approximately 7 dB, which is the same level of cancellation vol. 13, no. 6, pp. 2977–2990, June 2014. [8] L. Anttila, D. Korpi, E. Antonio-Rodr`ıguez, R. Wichman, and that the polynomial model can achieve, and there is no obvious M. Valkama, “Modeling and efficient cancellation of nonlinear self- indication of overfitting since the SI cancellation on the interference in MIMO full-duplex transceivers,” in Globecom Work- training and on the test data is very similar. Moreover, in the shops, 2014, pp. 777–783. [9] D. Korpi, L. Anttila, and M. Valkama, “Nonlinear self-interference can- inset figure we observe that allowing for significantly more cellation in MIMO full-duplex transceivers under crosstalk,” EURASIP training epochs does not improve the performance further. Journal on Wireless Comm. and Netw., vol. 2017, no. 1, p. 24, Feb. 2017. C. Computational Complexity [10] T. J. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. on Cogn. Commun. and Networking, vol. 3, For P = 7, M + L = 13, and nh = 17, the poly- no. 4, Dec. 2017. nomial non-linear canceler requires nADD,poly = 492 real [11] T. J. O’Shea, K. Karra, and T. C. Clancy, “Learning to communicate: Channel auto-encoders, domain specific regularizers, and attention,” in additions and the neural network non-linear canceler requires IEEE International Symposium on Signal Processing and Information nADD,NN = 493 real additions, which is practically identical. Technology, Dec. 2016, pp. 223–228. However, the polynomial canceler requires nMUL,poly = 741 [12] N. Farsad and A. Goldsmith, “Detection algorithms for commu- nication systems using deep learning,” ArXiv e-prints, May 2017, real multiplications while the neural network canceler only https://arxiv.org/abs/1705.08044. requires nMUL,NN = 476 real multiplications, which is a [13] E. Nachmani, Y. Be’ery, and D. Burshtein, “Learning to decode linear reduction of approximately 36%. We note that, in reality the codes using deep learning,” in Allerton Conf. on Comm., Control, and Computing, Sep. 2016, pp. 341–346. reduction is much more significant since the calculation of [14] L. Lugosch and W. J. Gross, “Neural offset min-sum decoding,” in IEEE the basis functions in (4) also requires a large number of real International Symposium on Information Theory (ISIT), Jun. 2017. multiplications. [15] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning- based channel decoding,” in Annual Conf. on Inf. Sciences and Syst., VI. CONCLUSION Mar. 2017, pp. 1–6. [16] B. Aazhang, B. P. Paris, and G. C. Orsak, “Neural networks for In this paper, we have demonstrated through experimental multiuser detection in code-division multiple-access communications,” measurements that a small feed-forward neural network with IEEE Trans. Commun., vol. 40, no. 7, pp. 1212–1222, Jul 1992. [17] M.-H. Yang, J.-L. Chen, and P.-Y. Cheng, “Successive interference a single hidden layer containing nh = 17 hidden nodes, a cancellation receiver with neural network compensation in the CDMA ReLU activation function, and 20 training epochs can achieve systems,” in Asilomar Conference on Signals, Systems and Computers, the same non-linear digital cancellation performance as a vol. 2, Oct 2000, pp. 1417–1420. [18] B. Geevarghese, J. Thomas, G. Ninan, and A. Francis, “CDMA in- polynomial-based non-linear canceler with a maximum non- terference cancellation techniques using neural networks in rayleigh linearity order of P =7 while at the same time requiring 36% channels,” in International Conference on Information Communication fewer real multiplications to be implemented. and Embedded Systems (ICICES), Feb 2013, pp. 856–860. [19] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, ACKNOWLEDGMENT “Learning to optimize: Training deep neural networks for wireless resource management,” in IEEE Int. Workshop on Sig. Proc. Advances The author gratefully acknowledges the support of NVIDIA in Wireless Commun. (SPAWC), Jul. 2017, pp. 1–6. Corporation with the donation of the Titan Xp GPU used for [20] A. Balatsoukas-Stimming, P. Belanovic, K. Alexandris, and A. Burg, “On self-interference suppression methods for low-complexity full- this research. The author would also like to thank Mr. Orion duplex MIMO,” in Asilomar Conference on Signals, Systems and Com- Afisiadis for his aid in carrying out the full-duplex testbed puters, Nov. 2013, pp. 992–997. measurements and Prof. Andreas Burg for useful discussions. [21] P. Belanovic, A. Balatsoukas-Stimming, and A. Burg, “A multipurpose testbed for full-duplex wireless communications,” in IEEE International Conference on Electronics, Circuits, and Systems, Dec. 2013, pp. 70–71.