End-To-End Deep Learning for Phase Noise-Robust Multi-Dimensional Geometric Shaping
MITSUBISHI ELECTRIC RESEARCH LABORATORIES
https://www.merl.com

End-to-End Deep Learning for Phase Noise-Robust Multi-Dimensional Geometric Shaping
Talreja, Veeru; Koike-Akino, Toshiaki; Wang, Ye; Millar, David S.; Kojima, Keisuke; Parsons, Kieran
Mitsubishi Electric Research Labs., 201 Broadway, Cambridge, MA 02139, USA
TR2020-155, December 11, 2020
European Conference on Optical Communication (ECOC)

Abstract: We propose an end-to-end deep learning model for phase noise-robust optical communications. A convolutional embedding layer is integrated with a deep autoencoder for multi-dimensional constellation design to achieve shaping gain. The proposed model offers a significant gain of up to 2 dB.

(c) 2020 MERL. This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.
Introduction

Due to laser linewidth and fiber nonlinearity, phase noise (PN) has been one of the major issues in coherent optical communications. To tackle the PN issue, there has been a lot of research related to carrier phase estimation and PN-robust techniques[1],[2]. For example, modified closed-form log-likelihood ratio (LLR) calculations[2],[3] have been proposed to deal with the residual PN. High-dimensional modulation (HDM)[4],[5] has also shown benefit in mitigating nonlinear PN.

Recently, deep learning (DL) has garnered a lot of attention in the field of optical communications. DL has been used for various tasks such as mitigation of fiber nonlinearity[6]-[9], modulation classification, link quality monitoring, resource allocation, and end-to-end (E2E) design[10]-[13]. In many of these advancements, deep neural networks (DNN) have been applied to optimize an individual function of the optical communication sub-systems, e.g., coding, modulator, demodulator, and equalizer. However, such an approach may be sub-optimal, and an E2E design could therefore be more beneficial for next-generation optical communications. The E2E method is a novel concept that optimizes the transmitter and receiver jointly in interaction with the communication channel. It was shown that E2E approaches[10]-[13] can achieve a significant shaping gain in various fiber-optic systems by jointly optimizing an HDM constellation paired with a DNN demodulator.

However, in most existing E2E literature, one-hot encoding of the input message is considered, which limits the practical application to only small codeword lengths. In this paper, we propose an E2E framework optimized for optical communications, enabling scalability to larger codelengths as well as robustness against PN. Our E2E framework employs tail-biting convolutional embedding layers integrated with a deep autoencoder to deal with codelength scalability and residual PN. The main contributions are summarized below:
1. We apply DL to the joint optimization of both the HDM constellations and the DNN demapper in an E2E framework for optical communication channels.
2. We develop a PN-robust E2E model using convolutional embedding layers, which enables scaling to larger code lengths.
3. We verify that the E2E model can achieve a high shaping gain close to Polyanskiy's bound[14].
4. We demonstrate that the PN-robust E2E model can offer a 2 dB gain in the presence of strong PN.

Phase Noise-Robust End-to-End System

We implement a typical optical communications system, including the encoder, the channel, and the decoder, as a complete E2E deep neural network with a focus on PN channels, as shown in Fig. 1. Unlike typical E2E methods, the input message to our embedding layers is not one-hot encoded, but is instead represented as a K-bit vector x ∈ {0, 1}^K. This representation of the input enables scalability to larger block lengths, while we introduce a tail-biting convolutional embedding layer to retain rich encoding capability with reasonable computational efficiency. This embedding layer, parameterized by a dictionary of 2^m embedding vectors of length L, maps each segment of m consecutive bits to an embedding vector of size L. The embedding layer is cyclically applied across the K message bits with a stride of one. All of the embeddings are then concatenated vertically to form a vector x_e of size KL, which is given as input to the encoder. In our experiments, we use m = 3 and L = 8.

The encoder is implemented as a feed-forward MLP, which consists of an input layer, one hidden layer with tanh activation, and an output layer, followed by a power normalization layer. The input layer is of size KL, equal to the size of the embedding layer output x_e. The output layer size is N, which effectively yields encoding with the parameters (N, K). For simplicity, we use a hidden layer size equal to the sum of the input and output layer sizes. The power normalization is performed with batch normalization (BN), while disabling the scale and shift operations.

[Fig. 1: Proposed architecture of the end-to-end model: convolutional embedding of m-bit segments, encoder MLP with power normalization, channel, and decoder MLP.]

The PN channel is modeled as r = exp(jθ)s + w, where r is the received symbol vector at the demapper, s is the transmitted symbol vector, θ is the residual PN, which follows a Gaussian distribution of zero mean and variance σ_ρ², and w is an additive white Gaussian noise (AWGN) vector whose elements follow a circularly symmetric complex-Gaussian distribution of zero mean and variance σ². In coherent optical communications, the PN may come from the laser spectrum linewidth, fiber nonlinearity, imperfect phase recovery, etc. In the presence of a laser linewidth Δν, the effective PN variance is expressed as σ_ρ² = 2πΔνT_s, where T_s is the symbol duration.

The decoder is also implemented as a feed-forward MLP, which consists of an input layer, one hidden layer with tanh activation, and an output layer. The input and output layer sizes are equal to N and K, respectively. The hidden layer size is equal to the sum of the input and output layer sizes. The output layer of the decoder uses a sigmoid activation to output a vector x' representing the likelihoods of each bit. We use binary cross entropy (BCE) loss to train the E2E network, and hence the DNN output can be directly fed into a soft-decision FEC without relying on an external LLR converter.

Performance Analysis

We evaluated the performance of the proposed E2E system for the AWGN and PN channels. Fig. 2 compares the word error rate (WER) of our proposed method for a (7, 4) code in the AWGN channel against BCH (7, 4) maximum likelihood decoding (MLD). Note that there is no better linear code than this BCH code in terms of minimum Hamming distance for (7, 4) codes. Here, we also plot the Polyanskiy normal approximation (NA)[14]. From the figure, it can be observed that our proposed model outperforms the BCH-MLD by nearly 1 dB at a WER of 10^-3. This suggests that our E2E design can enjoy the geometric shaping gain over the best-known linear coded hyper-cube modulation. The proposed E2E model also outperforms Polyanskiy's bound, which is due to the NA being loose for small codeword lengths.

[Fig. 2: AWGN performance for the (4,7) codeword: WER vs. SNR (dB) for E2E, BCH MLD, and the Polyanskiy bound, showing about 1 dB gain.]

Fig. 3 shows the performance for a (15, 7) code in AWGN channels. As observed, our proposed model consistently outperforms the MLD performance of the BCH code across all SNRs. We can also observe that the performance of our E2E model approaches the Polyanskiy NA.

[Fig. 3: AWGN performance for the (7,15) codeword: WER vs. SNR (dB) for E2E, BCH MLD, and the Polyanskiy bound.]

We verified that our E2E method can achieve excellent performance close to Polyanskiy's NA in the AWGN channels. We now show the benefit of the E2E design in the PN channels. Fig. 4 shows the performance of the E2E shaping methods with and without residual PN. We assume a residual PN variance of σ_ρ² = 0.05, which corresponds to an effective linewidth of 239 MHz at 30 GBd. When the E2E model is trained on AWGN channels without dealing with the PN, the optimized E2E works well for the AWGN channel as expected, whereas it can suffer from a significant degradation in the pres-
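The tail-biting convolutional embedding described above can be sketched in a few lines of NumPy. This is an illustrative version only: the function name, the random dictionary, and the example message are assumptions for exposition, and in the actual model the dictionary entries would be trainable parameters learned end-to-end.

```python
import numpy as np

def tail_biting_embedding(bits, dictionary, m=3):
    """Map each cyclic window of m consecutive bits to an embedding vector.

    bits: binary message vector of length K.
    dictionary: array of shape (2**m, L), one embedding per m-bit pattern.
    Returns the concatenated vector x_e of length K * L.
    """
    K = len(bits)
    embeddings = []
    for i in range(K):  # stride of one, wrapping around cyclically (tail-biting)
        window = [bits[(i + j) % K] for j in range(m)]
        index = int("".join(str(int(b)) for b in window), 2)  # m bits -> row index
        embeddings.append(dictionary[index])
    return np.concatenate(embeddings)

# Example with the paper's settings m = 3, L = 8 and a K = 4 message
rng = np.random.default_rng(0)
D = rng.standard_normal((2**3, 8))   # stands in for the trained dictionary
x = np.array([1, 0, 1, 1])
x_e = tail_biting_embedding(x, D)
print(x_e.shape)  # (32,) = K * L
```

Note how the last window wraps around to the first bits of the message, which is what makes the embedding tail-biting and keeps the number of windows equal to K.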
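The PN channel model r = exp(jθ)s + w and the effective-variance formula σ_ρ² = 2πΔνT_s can be simulated directly; the snippet below is a minimal sketch (function and variable names are ours, not from the paper) and reproduces the paper's numerical example of σ_ρ² ≈ 0.05 for a 239 MHz linewidth at 30 GBd.

```python
import numpy as np

def pn_channel(s, sigma_rho2, sigma2, rng):
    """Apply residual phase noise and AWGN: r = exp(j*theta) * s + w."""
    theta = rng.normal(0.0, np.sqrt(sigma_rho2), size=s.shape)    # residual PN ~ N(0, sigma_rho2)
    w = (rng.normal(0.0, np.sqrt(sigma2 / 2), s.shape)
         + 1j * rng.normal(0.0, np.sqrt(sigma2 / 2), s.shape))    # AWGN ~ CN(0, sigma2)
    return np.exp(1j * theta) * s + w

# Effective PN variance from laser linewidth: sigma_rho^2 = 2*pi*dv*Ts
dv = 239e6        # linewidth of 239 MHz
Ts = 1 / 30e9     # symbol duration at 30 GBd
sigma_rho2 = 2 * np.pi * dv * Ts   # approximately 0.05, as assumed in the paper

rng = np.random.default_rng(0)
s = np.array([1 + 0j, -1 + 0j, 1j, -1j])      # illustrative QPSK-like symbols
r = pn_channel(s, sigma_rho2, 0.1, rng)
```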
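The encoder/decoder layer sizes described above (encoder KL -> KL+N -> N with power normalization; decoder N -> N+K -> K with sigmoid output) can be checked with a shape-level forward pass. This is only a sketch under our own assumptions: weights are random rather than trained, and the unit-power rescaling stands in for batch normalization with scale and shift disabled.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, L = 4, 7, 8          # an (N, K) = (7, 4) code with embedding length L = 8
d_in, d_out = K * L, N

def mlp(x, W1, W2):
    return np.tanh(x @ W1) @ W2   # one tanh hidden layer, linear output

# Encoder: KL -> (KL + N) hidden -> N, then power normalization to unit average power
W1_enc = 0.1 * rng.standard_normal((d_in, d_in + d_out))
W2_enc = 0.1 * rng.standard_normal((d_in + d_out, d_out))
x_e = rng.standard_normal(d_in)            # embedded message (see embedding sketch)
s = mlp(x_e, W1_enc, W2_enc)
s = s / np.sqrt(np.mean(s**2))             # stands in for BN without scale/shift

# Decoder: N -> (N + K) hidden -> K, sigmoid output as per-bit likelihoods
W1_dec = 0.1 * rng.standard_normal((N, N + K))
W2_dec = 0.1 * rng.standard_normal((N + K, K))
x_hat = 1.0 / (1.0 + np.exp(-mlp(s, W1_dec, W2_dec)))
print(s.shape, x_hat.shape)  # (7,) (4,)
```

Because the decoder output is a sigmoid probability per bit trained with BCE, its logits are already LLR-like, which is why the paper can feed it to a soft-decision FEC without an external LLR converter.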