Joint Learning of Geometric and Probabilistic Constellation Shaping
Maximilian Stark*§¶, Fayçal Ait Aoudia‡§, and Jakob Hoydis‡
*Hamburg University of Technology, Institute of Communications, Hamburg, Germany, Email: [email protected]
‡Nokia Bell Labs, Paris, France, Email: {faycal.ait_aoudia, jakob.hoydis}@nokia-bell-labs.com
§Equally contributed. ¶Work carried out at Nokia Bell Labs France.

Abstract—The choice of constellations largely affects the performance of communication systems. When designing constellations, both the locations and the probabilities of occurrence of the points can be optimized. These approaches are referred to as geometric and probabilistic shaping, respectively. Usually, the geometry of the constellation is fixed, e.g., quadrature amplitude modulation (QAM) is used. In such cases, the achievable information rate can still be improved by probabilistic shaping. In this work, we show how autoencoders can be leveraged to perform probabilistic shaping of constellations. We devise an information-theoretical description of autoencoders, which allows learning of capacity-achieving symbol distributions and constellations. Recently, machine learning techniques to perform geometric shaping were proposed. However, probabilistic shaping is more challenging as it requires the optimization of discrete distributions. Furthermore, the proposed method enables joint probabilistic and geometric shaping of constellations over any channel model. Simulation results show that the learned constellations achieve information rates very close to capacity on an additive white Gaussian noise (AWGN) channel and outperform existing approaches on both AWGN and fading channels.

Index Terms—Probabilistic shaping, Geometric shaping, Autoencoders

I. INTRODUCTION

Various constellation schemes were developed in digital communications, including quadrature amplitude modulation (QAM), phase-shift keying (PSK), amplitude-shift keying (ASK), etc. Shaping of constellations involves either optimizing the locations of the constellation points in the complex plane, i.e., geometric shaping, or optimizing the probabilities of occurrence of the constellation points, i.e., probabilistic shaping. In either case, the focal aim is to maximize the mutual information I(X;Y) of the channel input X and output Y by optimizing the constellation. This approach follows directly from the definition of the channel capacity C:

C = max_{p(x)} I(X;Y)    (1)

where p(x) denotes the marginal distribution of X. Usually, finding the optimal p(x) is a difficult problem as it requires knowledge of the channel distribution p(y|x). Moreover, even if p(y|x) is known, solving (1) is often intractable.
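Even for a fixed constellation and input distribution, the objective I(X;Y) in (1) is rarely available in closed form and is typically estimated numerically. The following minimal NumPy sketch (our illustration, not code from the paper; the function name and its arguments are hypothetical) estimates I(X;Y) by Monte Carlo for a discrete constellation with given point locations and symbol probabilities over an AWGN channel:

```python
import numpy as np

def mi_awgn_mc(points, probs, snr_db, num_samples=100_000, seed=0):
    """Monte Carlo estimate of I(X;Y) in bits for a discrete constellation
    over a complex AWGN channel, assuming unit average signal energy."""
    rng = np.random.default_rng(seed)
    points = points / np.sqrt(np.sum(probs * np.abs(points) ** 2))  # unit energy
    n0 = 10 ** (-snr_db / 10)  # noise variance, since E[|X|^2] = 1
    s = rng.choice(len(points), size=num_samples, p=probs)
    y = points[s] + np.sqrt(n0 / 2) * (rng.standard_normal(num_samples)
                                       + 1j * rng.standard_normal(num_samples))
    # Gaussian likelihoods p(y|x') for every candidate point x'; the common
    # 1/(pi*n0) factor cancels in the ratio below.
    lik = np.exp(-np.abs(y[:, None] - points[None, :]) ** 2 / n0)
    # I(X;Y) = E[ log2( p(y|x) / sum_x' p(x') p(y|x') ) ]
    return np.mean(np.log2(lik[np.arange(num_samples), s] / (lik @ probs)))

# Example: uniform 16-QAM at 10 dB SNR; the estimate stays below the
# AWGN capacity log2(1 + 10) ~ 3.46 bit.
qam16 = np.array([a + 1j * b for a in (-3, -1, 1, 3) for b in (-3, -1, 1, 3)])
print(mi_awgn_mc(qam16, np.full(16, 1 / 16), snr_db=10.0))
```

Such an estimator presumes a known Gaussian likelihood and only evaluates a given pair of constellation and distribution; the approach presented next instead learns both end-to-end, without requiring such a tractable channel model.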
In this work, we present how the recently proposed idea of end-to-end learning of communication systems by leveraging autoencoders [1] can be used to design constellations which maximize I(X;Y), without requiring any tractable model of the channel. Autoencoders have been used in the past to perform geometric shaping [1], [2]. However, to the best of our knowledge, leveraging autoencoders to achieve probabilistic shaping has not been explored. In this paper, probabilistic shaping is learned by leveraging the recently proposed Gumbel-Softmax trick [3] to optimize discrete distributions. Afterwards, joint geometric and probabilistic shaping of the constellation is performed. The presented results show that the achieved mutual information outperforms state-of-the-art systems over a wide range of signal-to-noise ratios (SNRs) and on both additive white Gaussian noise (AWGN) and fading channels, whereas current approaches are typically optimized to perform well only on specific channel models and small SNR ranges [4].

The rest of this paper is organized as follows. Section II provides background on autoencoder-based communication systems and motivates their use for constellation shaping. Section III details the considered neural network (NN) architecture and how the Gumbel-Softmax trick is leveraged to achieve probabilistic shaping. Section IV provides results on the mutual information achieved by different schemes. Finally, Section V concludes this paper.

Notations

Random variables are denoted by capital italic font, e.g., X, Y, with realizations x ∈ 𝒳, y ∈ 𝒴, respectively. I(X;Y), p(y|x) and p(x,y) represent the mutual information, conditional probability and joint probability distribution of the two random variables X and Y. Vectors are represented using lower case bold font, e.g., y; upper case bold font letters denote matrices, e.g., C.

II. AUTOENCODER-BASED COMMUNICATION SYSTEMS

The key idea of autoencoder-based communication systems is to regard the transmitter, channel, and receiver as a single NN such that the transmitter and receiver can be optimized in an end-to-end manner. This idea was pioneered in [1] and has led to many extensions [5]–[8]. Fig. 1 shows the end-to-end communication system considered in this work. The system takes as input a bit sequence denoted by b, which is mapped onto hypersymbols s ∈ 𝒮 such that the symbols appear with frequencies corresponding to a parametric distribution p_{θ_S}(s) with parameters θ_S. Here, 𝒮 = {1, ..., N} is the event space of the random variable S, N being the modulation order. The sequence of hypersymbols is fed into a symbol modulator which maps each symbol s into a constellation point x ∈ ℂ. The modulator is implemented as an NN f_{θ_M} with trainable parameters θ_M.

Fig. 1: Trainable end-to-end communication system. Components on which this work focuses are indicated by thicker outlines.

The demodulator is also implemented as an NN with trainable parameters θ_D, which maps each received sample y ∈ ℂ to a probability vector over the set of symbols 𝒮. The mapping defined by the demodulator is denoted by p̃_{θ_D}(s|y) and defines, as will be seen below, an approximation of the true posterior distribution p_{θ_S,θ_M}(s|y). Finally, the sent bits are reconstructed by the symbols-to-bits mapper from p̃_{θ_D}(s|y).
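To make the two trainable blocks of Fig. 1 concrete, here is a compact PyTorch sketch (ours, not the authors' implementation; the layer sizes, the batch-based unit-energy normalization, and the AWGN helper are illustrative assumptions) of the modulator f_{θ_M}, which turns a one-hot encoded symbol into an I/Q pair, and of the demodulator, which outputs the logits of p̃_{θ_D}(s|y):

```python
import torch
import torch.nn as nn

N = 16  # modulation order, i.e., |S|

class Modulator(nn.Module):
    """f_theta_M: one-hot symbol -> constellation point (Re, Im)."""
    def __init__(self, n=N):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, s_onehot):
        x = self.net(s_onehot)
        # Normalize to unit average energy (batch estimate of E[|X|^2] = 1).
        return x / torch.sqrt(x.pow(2).sum(-1).mean() + 1e-12)

class Demodulator(nn.Module):
    """Maps a received sample y to the logits of p~_theta_D(s|y)."""
    def __init__(self, n=N):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, n))

    def forward(self, y):
        return self.net(y)  # softmax of these logits approximates the posterior

def awgn(x, snr_db):
    """AWGN channel on (Re, Im) pairs, unit signal energy assumed."""
    n0 = 10 ** (-snr_db / 10)
    return x + (n0 / 2) ** 0.5 * torch.randn_like(x)
```

Training only these two blocks with a cross-entropy loss recovers geometric shaping as in [1], [2]; probabilistic shaping additionally requires making the symbol distribution p_{θ_S}(s) itself trainable, as discussed next.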
A. Mutual Information Perspective on Autoencoders

In this work, it is assumed that a bits-to-symbols mapper exists which maps the bits from b to symbols s ∈ 𝒮 according to the distribution p_{θ_S}(s). This can be done, e.g., using the algorithm presented in [9]. Therefore, in the rest of this work, the transmitter directly outputs the transmit symbols sampled from p_{θ_S}(s), and the receiver aims to reconstruct the transmitted symbols by approximating the posterior distribution p_{θ_S,θ_M}(s|y). Thus, only the signal processing blocks surrounded by thicker outlines in Fig. 1 are of interest in this work. The distribution of X equals

p_{θ_S,θ_M}(x) = Σ_{s=1}^{N} δ(x − f_{θ_M}(s)) p_{θ_S}(s)    (2)

where δ(·) denotes the Dirac distribution. Please recall that, as defined in (1), the target of constellation shaping is to find p_{θ_S}(s) such that I(X;Y) is maximized. One performs constellation shaping by optimizing p_{θ_S} (probabilistic shaping) or f_{θ_M} (geometric shaping) so that I(X;Y) is maximized.

As the demodulator performs a classification task, the categorical cross entropy L(θ_S, θ_M, θ_D) = E_{s,y}{−log p̃_{θ_D}(s|y)} is used as loss function for training. Rewriting the loss function yields

L(θ_S, θ_M, θ_D) = H_{θ_S}(S) − I_{θ_S,θ_M}(X;Y) + E_y{ D_KL( p_{θ_S,θ_M}(x|y) ‖ p̃_{θ_D}(x|y) ) }    (5)

where D_KL is the Kullback–Leibler (KL) divergence. A more detailed derivation is given in the Appendix. Notice that if only geometric shaping is performed, no optimization with respect to (w.r.t.) θ_S is done and therefore the first term in (5) is a constant. However, when performing probabilistic shaping, minimizing L leads to the minimization of H_{θ_S}(S). To avoid this unwanted effect, we define the loss function

L̂(θ_S, θ_M, θ_D) ≜ L(θ_S, θ_M, θ_D) − H_{θ_S}(S).    (6)

Training the end-to-end system by minimizing L̂ corresponds to maximizing the mutual information of the channel inputs X and outputs Y, while minimizing the KL divergence between the true posterior distribution p_{θ_S,θ_M}(x|y) and the one learned by the receiver p̃_{θ_D}(x|y). Moreover, the NN implementing the receiver should approximate the posterior distribution p_{θ_S,θ_M}(x|y) of a constellation maximizing the mutual information with high precision. This avoids learning a constellation whose posterior distribution is well approximated but which does not maximize the mutual information. In practice, this is ensured by choosing the NN implementing the demodulator complex enough that the trainable receiver is capable of approximating a wide range of posterior distributions with high precision.

Recently, [10] proposed to leverage the mutual information neural estimator (MINE) approach described in [11] to train the transmitter such that the estimated mutual information of the channel input and output is maximized. Therefore, training a transmitter with this approach is similar to training a transmitter as part of an autoencoder, as in both cases the transmitter is trained to maximize the mutual information. Optimization using MINE does not require training the receiver. However, the autoencoder approach jointly learns the transmitter and the receiver, including the corresponding posterior distribution. The soft information output by the learned receiver can then be used in subsequent units, e.g., a channel decoder. Moreover, whereas MINE requires an additional NN only to approximate the mutual information, with an autoencoder the mutual information can be estimated from the loss, as −L̂ provides a tight lower bound, assuming a sufficiently complex NN implementing the receiver.

As training an autoencoder-based communication system maximizes the mutual information of X and Y, it can be used to perform constellation shaping. Although geometric shaping using autoencoders has been done in the past [1], [2], [12], performing probabilistic shaping is less straightforward, as it requires optimizing the sampling mechanism for symbols s drawn from 𝒮.
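How this sampling mechanism is made trainable is the subject of Section III. As a preview, the sketch below continues the previous one and, like it, is only our illustration of the generic mechanism of [3], not the authors' architecture: trainable logits θ_S define p_{θ_S}(s), differentiable approximately one-hot samples are drawn with the Gumbel-Softmax trick, and θ_S, θ_M and θ_D are updated jointly by minimizing L̂ from (6):

```python
# Continues the previous sketch (torch, Modulator, Demodulator, awgn in scope).
import torch.nn.functional as F

logits_s = torch.zeros(N, requires_grad=True)  # theta_S, parametrizes p_theta_S(s)
mod, demod = Modulator(), Demodulator()
opt = torch.optim.Adam([logits_s, *mod.parameters(), *demod.parameters()], lr=1e-3)

for step in range(10_000):
    # Differentiable (straight-through) one-hot samples s ~ p_theta_S(s) [3].
    s_hot = F.gumbel_softmax(logits_s.expand(512, -1), tau=1.0, hard=True)
    y = awgn(mod(s_hot), snr_db=10.0)
    ce = F.cross_entropy(demod(y), s_hot.argmax(-1))   # cross-entropy loss L, in nats
    p_s = F.softmax(logits_s, dim=-1)
    entropy_s = -(p_s * torch.log(p_s + 1e-12)).sum()  # H_theta_S(S)
    loss = ce - entropy_s                              # L_hat, eq. (6)
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, -loss / ln(2) lower-bounds I(X;Y) in bits,
# softmax(logits_s) is the learned symbol distribution, and
# mod(torch.eye(N)) gives the learned constellation points.
```

With hard=True, exactly one constellation point is transmitted per channel use, while gradients still flow to θ_S through the softmax relaxation; this is what allows the discrete distribution p_{θ_S}(s) to be optimized by gradient descent.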