Achievable Information Rates for Nonlinear Fiber Communication via End-to-End Autoencoder Learning
Shen Li(1), Christian Häger(1,2), Nil Garcia(1), Henk Wymeersch(1)
(1) Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden
(2) Department of Electrical and Computer Engineering, Duke University, Durham, USA
[email protected]

Citation for the original published paper (version of record): Li, S., Häger, C., Garcia, N. et al (2018), "Achievable Information Rates for Nonlinear Fiber Communication via End-to-end Autoencoder Learning," 2018 European Conference on Optical Communication (ECOC). http://dx.doi.org/10.1109/ECOC.2018.8535456. N.B. When citing this work, cite the original published paper.

Abstract: Machine learning is used to compute achievable information rates (AIRs) for a simplified fiber channel. The approach jointly optimizes the input distribution (constellation shaping) and the auxiliary channel distribution to compute AIRs without explicit channel knowledge in an end-to-end fashion.
Introduction

Fiber transmission rates can be increased by multi-level quadrature amplitude modulation (M-QAM) formats, which require higher input power and are thus more susceptible to nonlinear impairments such as nonlinear signal-noise interaction (NLSNI). Conventional techniques to deal with NLSNI include improved detector designs [1-3] and optimized modulation formats [3-5]. The achievable transmission rates are themselves upper-bounded by the channel capacity, which is unknown for optical channels with NLSNI, even for simplified nondispersive scenarios, though upper [6] and lower [6-8] capacity bounds have been established.

A different approach for constellation or detector design is to rely on machine learning and deep learning [9-15]. Recently, autoencoders (AEs) have emerged as a promising tool for end-to-end design and have been shown to lead to good performance for wireless [9,10], noncoherent optical [14], as well as visible light communication [15].

In this paper, we develop an AE for a simplified memoryless fiber channel model. It is shown that the AE approach can be used to establish tight lower bounds on the channel capacity by computing achievable information rates (AIRs) [16-19]. Moreover, the AE can approach maximum-likelihood (ML) performance and leads to optimized constellations that are more robust against NLSNI than conventional QAM formats.

Simplified Fiber Channel Model

Similar to [1-4,6-8], we consider a simplified memoryless channel for fiber-optic communication which is obtained from the nonlinear Schrödinger equation by neglecting dispersion. The resulting per-sample model is defined by the recursion

    x_{k+1} = x_k e^{jLγ|x_k|²/K} + n_{k+1},   0 ≤ k < K,   (1)

where x_0 = x is the (complex-valued) channel input, y = x_K is the channel output, n_{k+1} ~ CN(0, P_N/K), L is the total link length, P_N is the noise power, and γ is the nonlinearity parameter. The model assumes ideal distributed amplification and K → ∞. The channel input x is drawn randomly from an M-point constellation with E{|X|²} = P_in, where P_in is the input power. Even though dispersive effects are ignored, the model still captures some of the nonlinear effects encountered during realistic transmission over optical fiber, in particular nonlinear phase noise (NLPN). The main interest in this channel model lies in the fact that the channel probability density function (PDF) p(y|x) is known analytically [1,7,8]. This allows us to compare the AE performance to an ML detector and to benchmark the obtained AIRs using known capacity bounds.

Proposed Autoencoder Structure

In machine learning, an AE is a neural network (NN) which consists of two parts: an encoder maps an input s (e.g., an image) to a lower-dimensional representation or code, and a decoder attempts to reconstruct the input from the code. It has recently been proposed to interpret all components of a communication system, consisting of a transmitter, channel, and receiver, as an AE [9]. This allows for end-to-end learning of good transmitter and receiver structures.

The AE structure used in this paper is shown in Fig. 1 and will be described in the following. The goal is to transmit a message s chosen from a set M = {1, 2, ..., M} of M possible messages. Following [9], the messages are first mapped to M-dimensional "one-hot" vectors, where the s-th element is 1 and all other elements are 0. The one-hot vectors, denoted by u, are the inputs to a transmitter NN, which consists of multiple dense layers of neurons. Each neuron takes inputs from the previous layer and generates an output according to z_out = f(wᵀz_in + b), where w is a vector of weights, b ∈ R is a bias, and f(·) is an activation function, here considered to be a sigmoid or tanh function.

[Fig. 1: Autoencoder structure assuming 2 hidden layers in both the transmitter and receiver neural network.]

Tab. 1: Autoencoder parameters

            |     transmitter       |      receiver
  layer     |  1     2-6     7      |  1     2-7     8
  neurons   |  M     M       2      |  2     M       M
  f(·)      |  -     tanh    linear |  -     tanh    sigm.

The values of the two transmitter output neurons (z_r and z_i in Fig. 1) are used to form the channel input. To meet the average power constraint, a normalization is applied using M different training inputs to the NN. The normalized output is then sent over the channel, leading to an observation y. The real and imaginary parts of y are taken as the input to a receiver NN, whose output we denote by f_y(s') ∈ [0, 1], s' ∈ M, where we assume a sigmoid in the last layer and then normalize the sum of the outputs to 1. Finally, we set ŝ = arg max_{s'} f_y(s').

The AE is trained using many batches of training data, averaging over different messages and channel noise configurations. In particular, the weights and biases of all neurons in both the transmitter and receiver NN are optimized with respect to (1/N) Σ_{i=1}^{N} ℓ(u_s^{(i)}, f_y(s')^{(i)}), where

    ℓ(u_s^{(i)}, f_y(s')^{(i)}) = −u_s^{(i)} log f_y(s')^{(i)}   (2)

is the cross-entropy loss, N is the batch size (a multiple of M), the superscript refers to different training data realizations, and the subscript s refers to the s-th element of u^{(i)}. The optimization is performed using a variant of stochastic gradient descent with an appropriate learning rate.

Achievable Information Rates

The AE can be used to determine lower bounds on the mutual information

    I(X; Y) = Σ_x ∫ p(x, y) log₂ [ p(y|x) / p(y) ] dy   (3)

as follows [16-19]. We normalize f_y(s') with respect to s' and consider it as a distribution over x. Then, f_y(x) p(y) is a valid joint distribution over x and y, so that, due to the non-negativity of the Kullback-Leibler divergence, KL(p(x, y) || p(y) f_y(x)) ≥ 0. Straightforward manipulations then yield

    I(X; Y) ≥ Σ_x ∫ p(x, y) log₂ [ f_y(x) / p(x) ] dy,   (4)

which can easily be evaluated via Monte Carlo integration. The right-hand side of (4) is the AIR of the AE. Both the mutual information and the AIR are lower bounds on the channel capacity.

Performance Analysis

For the numerical results, we assume L = 5000 km, γ = 1.27, and P_N = −21.3 dBm. The number of iterations to simulate the fiber model (1) is set to K = 50, which is sufficient to approximate the true asymptotic channel PDF [1]. The AE is trained separately for different values of P_in using the Adam optimizer in TensorFlow. The AE parameters are summarized in Tab. 1.

We start by comparing the symbol error rate (SER), i.e., p(s ≠ ŝ), of the AE to the SER of an ML detector applied to (a) standard 16-QAM and (b) the signal constellation optimized by the AE. The results are shown in Fig. 2. The optimal input power for 16-QAM under ML detection is around −2 dBm, after which the SER increases due to NLPN. The SER of the AE decreases with input power, showing that the AE can find more suitable

[Fig. 2: SER as a function of P_in for M = 16; curves: 16-QAM with ML detector, AE constellation with AE detector, AE constellation with ML detector.]

[Residue of a truncated figure legend in the source: capacity upper and lower bounds [6,8], AE with M = 256, AE with M = 16, and 16-QAM.]
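To make the channel model concrete, the recursion (1) can be simulated sample by sample. The following sketch is illustrative rather than the authors' code; it assumes γ in 1/(W·km), L in km, and powers in watts (so that Lγ|x_k|² is in radians), conventions not stated explicitly in the paper, and draws CN(0, P_N/K) noise at each of the K steps:

```python
import numpy as np

def fiber_channel(x, L=5000.0, gamma=1.27, P_N=10 ** (-21.3 / 10) * 1e-3, K=50, rng=None):
    """Per-sample nondispersive fiber model of Eq. (1):
    x_{k+1} = x_k * exp(j * L * gamma * |x_k|^2 / K) + n_{k+1},
    with n_{k+1} ~ CN(0, P_N / K). Defaults follow the paper's numerical
    setup (L = 5000 km, gamma = 1.27, P_N = -21.3 dBm, K = 50); the unit
    conventions are an assumption made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    xk = np.asarray(x, dtype=complex)
    sigma2 = P_N / K  # total (complex) noise variance per step
    for _ in range(K):
        xk = xk * np.exp(1j * L * gamma * np.abs(xk) ** 2 / K)
        noise = rng.standard_normal(xk.shape) + 1j * rng.standard_normal(xk.shape)
        xk = xk + np.sqrt(sigma2 / 2) * noise
    return xk
```

With the noise switched off (P_N = 0), the model reduces to a deterministic phase rotation by Lγ|x|², which serves as a convenient sanity check.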
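The AE building blocks described above reduce to a few standard primitives: one-hot mapping, dense layers computing z_out = f(wᵀz_in + b), and the cross-entropy loss (2). A minimal NumPy sketch, with hypothetical helper names and no trained weights, might look as follows:

```python
import numpy as np

def one_hot(s, M):
    """Map message s in {0, ..., M-1} to an M-dimensional one-hot vector u."""
    u = np.zeros(M)
    u[s] = 1.0
    return u

def dense(z_in, W, b, f):
    """One dense layer: z_out = f(W z_in + b), cf. the neuron model in the text."""
    return f(W @ z_in + b)

def cross_entropy(u, f_y):
    """Eq. (2): -u_s log f_y(s'), summed against the one-hot vector u,
    so only the entry of f_y at the true message contributes."""
    return -np.sum(u * np.log(f_y))
```

For a uniform receiver output f_y = (1/M, ..., 1/M), the loss equals log M regardless of the transmitted message.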
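The AIR in (4) lends itself to simple Monte Carlo estimation: draw pairs (x, y) from p(x, y), evaluate the normalized receiver output f_y(x), and average log₂(f_y(x)/p(x)). The sketch below is a hypothetical stand-in, not the paper's implementation: it accepts any channel simulator and any normalized f_y; when f_y is the exact posterior the estimate approaches the mutual information, and otherwise it is a lower bound.

```python
import numpy as np

def air_monte_carlo(constellation, p_x, channel, f_y, n_samples=100_000, rng=None):
    """Monte Carlo estimate of the AIR in Eq. (4): E[log2 f_y(x) / p(x)].

    constellation : array of constellation points
    p_x           : input distribution over the constellation points
    channel       : callable (x, rng) -> y, one channel use per sample
    f_y           : callable y -> array of shape (n_samples, M), each row a
                    normalized distribution over constellation points
                    (the role of the normalized receiver output in the text)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    idx = rng.choice(len(constellation), size=n_samples, p=p_x)
    x = constellation[idx]
    y = channel(x, rng)
    q = f_y(y)
    fy_x = q[np.arange(n_samples), idx]  # f_y evaluated at the true input
    return np.mean(np.log2(fy_x / p_x[idx]))
```

As a sanity check, for a 4-point real constellation over a low-noise AWGN channel with the exact posterior plugged in for f_y, the estimate comes out close to log₂ 4 = 2 bits.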