SUBMITTED TO JOURNAL OF LIGHTWAVE TECHNOLOGY

Exploring and Experimenting with Shaping Designs for Next-Generation Optical Communications

Fanny Jardel, Tobias A. Eriksson, Cyril Méasson, Amirhossein Ghazisaeidi, Fred Buchali, Wilfried Idler, and Joseph J. Boutros

Abstract: A class of circular 64-QAM that combines 'geometric' and 'probabilistic' shaping aspects is presented. It is compared to square 64-QAM in back-to-back, single-channel, and WDM transmission experiments. First, for the linear AWGN channel model, it allows operation close to the Shannon limits for a wide range of signal-to-noise ratios. Second, WDM simulations over several hundreds of kilometers show that the obtained signal-to-noise ratios are equivalent to, or slightly exceed, those of probabilistically shaped 64-QAM. Third, for real-life validation, an experimental comparison with unshaped 64-QAM is performed where 28% distance gains are recorded when using 19 channels at 54.2 GBd. This again is in line with, or slightly exceeds, the gains generally obtained with probabilistic shaping. Depending upon implementation requirements (the core forward-error correcting scheme, for example), the investigated modulation schemes may be key alternatives for next-generation optical systems.

Index Terms: Communications theory, coded modulation, non-uniform signaling, probabilistic amplitude shaping, non-binary codes, BICM, optical networks, nonlinear optics.

F. Jardel, C. Méasson, and A. Ghazisaeidi are with Nokia Bell Labs, Paris-Saclay, F-91620 Nozay, France. T. Eriksson is with Quantum ICT Advanced Development Center, NICT, 4-2-1 Nukui-kita, Koganei, Tokyo 184-8795, Japan. F. Buchali and W. Idler are with Nokia Bell Labs, D-70435 Stuttgart, Germany. J.J. Boutros is with Texas A&M University, 23874 Doha, Qatar. Part of this paper was presented at the European Conference on Optical Communication (ECOC), Gothenburg, Sweden, 2017.

arXiv:1803.02206v4 [cs.IT] 19 Sep 2018

I. INTRODUCTION

A. Historical Notes

In communication theory, shaping is the art of adapting a mismatched input signaling to a channel model by modifying the per-channel-use distribution of its modulation points. Efficient information transmission schemes may use various shaping methods in order to increase spectral efficiency. Many of them have been investigated over the years, from nonlinear mapping over asymmetric channel models or many-to-one mapping [2] to optical experiments involving non-uniformly shaped QAM signaling. In particular, research efforts from the 70s towards the 90s derived conceptual methods to achieve shaping gains in communication systems. Following the advent of trellis coded modulation [3], a sequence of works [4]–[10] presented operational methods and achieved a large fraction of the ultimate shaping gain associated with square lattices. Trellis shaping or shell mapping are implemented in applications such as the ITU V.34 modem standard. Non-uniform input signaling for the Gaussian channel is further investigated in [11], [12]. While several shaping schemes are based on the structural properties of lattices [13]–[17], the interest in randomized schemes rose in the late 90s after the rediscovery of probabilistic decoding [18], [19]. Multilevel schemes such as bit-interleaved coded modulation [20] offer flexible and low-complexity solutions [21], [22]. In the 2000s, several schemes were investigated or proved to achieve the fundamental communication limits in different scenarios, as discussed in [23]–[27].

In recent years, practice-oriented works related to optical transmissions have successfully implemented different shaping methods, from many-to-one and geometrically-shaped formats to non-uniform signaling. The latter, more often called probabilistic shaping in the optical community [28]–[32], has perhaps received most attention. Various transmission demonstrations and record experiments using shaped modulation formats have indeed been reported, e.g., in [33]–[40], [42]. As an illustration, 65 Tb/s of operational achievable rate using state-of-the-art dual-band WDM technologies, partial nonlinear interference cancellation, and non-uniform signaling is reported in [37].

B. Implementations Constraints and Future Optical Systems

This work is motivated in part by the use of advanced QAM formats and in part by non-binary information processing. The investigated formats are neither restricted to non-binary architecture, nor specific to any information representation, nor even constrained by any coding/modulation method. Depending upon the application, different design criteria might be considered. In particular, despite the induced complexity, several advanced channel models envisioned for next-generation optical systems require the use of circular and possibly high-dimensional constellations. In one example, nonlinear particularities of the optical fiber channel should be addressed. Due to the third-order nonlinear Kerr effect, the fiber channel becomes nonlinear at optimum launch power for WDM transmission [44]. The perturbation-based model [45]–[47], [49], [50] shows that specific characteristics such as the 4-th or 6-th order moments of the random input may be taken into consideration. In another important example, non-unitary and multi-dimensional channel characteristics may be addressed. In particular, the work in [51]–[53] shows that rotation-invariant formats are instrumental whenever polarization-dependent loss happens. Rotation invariance indeed attenuates or even eliminates the angle dependency when dimensional imbalance occurs, hence removing capacity loss due to angle fluctuation. In addition, spherical constellations may facilitate implementations of MMA-type (multi-modulus algorithm) MIMO blind equalization. Various other system criteria may also enter the picture. A matching between the physical channel model and the transceiver architecture (in particular, receiver algorithms) is key to enable the ultimate transmission performance. A conventional receiver chain (comprising sampling, chromatic dispersion post-processing, MIMO equalization, phase and channel estimation, and channel decoding) that operates in a sequential manner is quite often sub-optimal. Joint processing may be required to preserve the sufficient statistics and improve the receiver performance. An implementation solution consists of using conventional non-binary information processing associated with matching signaling. As various digital communication schemes requiring non-square-QAM-based constellations are candidates for next-generation optical applications, this paper aims at providing design guidelines for modulation formats.

C. Outline of the Paper

This paper presents results originally reported in [42]. It deals with an experimental study on the use of specific modulation formats with high spectral efficiency for long-haul communications. The WDM fiber channel has been historically approximated in the linear regime, or in the limit of short reach communications with short to mid-size constellations, by the standard additive white Gaussian noise (AWGN) channel model encountered in communications theory [1], [2]. This paper investigates efficient modulation formats defined on the complex plane that operate very close to the fundamental communication limits of the Gaussian model. They are further tested in more complete scenarios, including the simulation of long reach cases, and, finally, experiments that validate the modulation proposals. Note that, because this work deals with first guidelines for advanced signaling and multi-dimensional optical systems, it does not, at first, consider system-dependent optical models such as the enhanced Gaussian noise (EGN) model [47].

II. SHAPING AND OPTICAL COMMUNICATIONS

A. Setup and Notations

1) Channel Model: A crude approximation of the fiber channel under current coherent WDM technologies (involving PDM and mismatched architecture) is represented by the complex-valued AWGN channel model. This model is valid in ideal back-to-back scenarios and short-range transmissions. For characterizing future optical systems, the performance in the linear regime remains central at the first order. In most real-life scenarios however, long range communications create different types of (intra, extra, noise) nonlinear interference for which perturbations on the solution of the Manakov equation [45], [46], [49], [50] may provide some insight into the design and analysis of efficient constellations. See also [44], [47], [48]. In this paper, shaping tradeoffs are first addressed in the idealized linear regime. They are later tested in the nonlinear regime by simulations and experiments. Formally, the receiver is assumed to see, independently at each channel use, an overall additive white noise equivalent to a complex-valued random noise $Z = Z_1 + \sqrt{-1}\, Z_2$ where the independent $Z_1, Z_2$ obey a real-valued zero-mean half-unit-variance Gaussian distribution. We model the random channel output by

$$Y = \sqrt{\mathrm{SNR}}\, X + Z,$$

whereby $X \in \mathcal{X}$ is the random input with probability $p_X$ and $\mathrm{SNR}$ the signal-to-noise ratio. In case of a continuous and power-constrained input alphabet, the capacity of the model is achieved by the Gaussian distribution and equals $\log(1+\mathrm{SNR})$.

2) Coding and Modulation: This paper investigates simple but efficient time-invariant modulation formats. A format is defined by the pair $(\mathcal{X}, p_X)$ composed of the input alphabet $\mathcal{X}$ (constellation of points in the complex plane) and the input distribution $p_X$. The input alphabet is a codebook with indexes formed by letters (denoted by $B$ or $S$) of the original information alphabet. Shaping in this paper is seen as the art of optimizing the transmission performance of a format with bounded entropy. Recall that, if the resulting constellations asymptotically sample a Gaussian density that achieves the capacity $\log(1+\mathrm{SNR})$, then the spectral efficiency gets optimized. Non-uniform signaling is obtained in [12] by letting $p_X$ follow the Maxwell-Boltzmann envelope (or any other distribution). It is called probabilistic shaping and sometimes probabilistic constellation shaping in the optical literature, which leads to a distinction between geometric and probabilistic shaping aspects of a format $(\mathcal{X}, p_X)$. Optimal system performance is measured in terms of achievable rates. The mutual information between $X$ and $Y$ is denoted by $I(X; Y)$. This quantity operationally corresponds to coded modulation: it is termed the CM information rate. For practical (often mismatched) systems, we may operationally refer to the achievable rate associated with conventional estimation of the representation letter (bit or symbol). This quantity, corresponding to bit (or symbol) MAP estimation, is termed the B-CM (or S-CM, respectively) information rate. In many instances, it coincides with the classical bit-interleaved (or symbol-interleaved) coded-modulation BICM (or SICM, respectively) framework of [20], [22], [58]–[60] and is a particular case of generalized mutual information (GMI) [61], [62]. Unless stated otherwise, the information source is represented by the random binary variable $B$. Random binary vectors can be equivalently represented as random symbols $S_i = S_i(B_1, \cdots, B_{m'})$. Random symbol vectors can be equivalently represented as random channel inputs $X = X(S_1, \cdots, S_m)$. The S-CM information rate is then given as $H(S_1, \cdots, S_m) - \sum_{i=1}^{m} H(S_i \mid Y)$ (and similarly for the B-CM rate). In practice, simple Riemann-based integration methods are used to compute the different information rates. More details on achievable rates are given in Appendix A.

B. Square Quadrature Amplitude Modulation

1) Definition: Popular modulation formats are based on Pulse Amplitude Modulation (PAM) per quadrature, for which $P = 2^m$ real-valued points $x_1, \cdots, x_P$ are equally spaced and centered around 0. The alphabet set is denoted by $P$-PAM and

$$2^m\text{-PAM} \propto \{-(2^m - 1), \cdots, -3, -1, 1, 3, \cdots, 2^m - 1\}.$$

In current practice, the $P$ points are associated with equal probabilities $p_X = \frac{1}{2^m}$. The choice $P = 2^m$ enables simple bit labeling. For notational convenience, we use

$$\mathcal{X} = 2^m\text{-PAM} \stackrel{\text{def}}{=} \left\{ \frac{-2^m+1}{\sqrt{E(m)}}, \cdots, \frac{-1}{\sqrt{E(m)}}, \frac{1}{\sqrt{E(m)}}, \cdots, \frac{2^m-1}{\sqrt{E(m)}} \right\}$$

where $E(m) \stackrel{\text{def}}{=} \frac{2^{2m}-1}{3}$ is a constant that normalizes the total power to one. The Cartesian product of two $P$-PAM alphabets is called (square) Quadrature Amplitude Modulation and denoted by $P^2$-QAM.

[Fig. 1 plot: achievable rate versus SNR (dB), from 0 to 20 dB. Curves: CM, B-CM, Gaussian bound, square lattice bound, CM with shaping, B-CM with shaping. Annotations: ultimate SNR loss of $\frac{\pi e}{6} \approx 1.53$ dB; information loss due to square QAM input mismatch; shaping target SNR.]

Fig. 1: Achievable information rates for square 64-QAM. The shaped distribution is optimized for a target SNR region around 10 dB, which reduces the maximal transmitted entropy from 6 bits per channel use down to 5.45.

2) Properties: QAM formats are the constellations of choice in various communication systems. PAM enables a natural Gray labeling of the information bits, which increases performance at mid-to-large signal-to-noise ratios (SNR). Because I/Q QAM components remain independent in the presence of standard Gaussian noise, the statistical separation leads to individualized demodulation schemes. Practical individual demodulation in this regime is enabled by the max-log approximation. Despite such important practical aspects, square QAM formats suffer from a noticeable drawback when associated with uniformly distributed codebooks. Geometric arguments on square lattices show that the overall transmission rate is generally bounded away from the Shannon limit [5]. In the case of additive Gaussian noise, shaping reduces this gap and asymptotically achieves up to $\frac{\pi e}{6} \approx 1.53$ dB of signal-to-noise ratio (SNR) gain. This is shown in Fig. 1 for the example of 64-QAM. It can be observed that, when associated with an example of Gray mapping, the B-CM capacity deviates from the CM capacity at low SNR. As summarized in Section I-A, various shaping methods involving multi-dimensional geometric considerations have been devised in the past, e.g., shell mapping and trellis coded modulation for wire-line communications. In this paper, we focus on time-invariant non-uniform signaling [6], [12] as recently investigated in optical research (in particular in combination with Probabilistic Amplitude Shaping (PAS) [29], [36], [37]). This is exemplified for QAM in Fig. 1 where non-uniform signaling is obtained using the Maxwell-Boltzmann distribution [12]. In the target SNR region, the capacity of the shaped system approaches the ultimate limit given by the continuous Gaussian input distribution. Beyond shaping loss, one may list additional drawbacks of square QAM that are specific to optical systems. Those include suboptimal equalization in case of non-unitary impairments [51], [52] or other mismatches as discussed in Section I-B. Investigations on QAM-based variations are therefore critical to envision alternative engineering designs.

C. Circular Quadrature Amplitude Modulation

1) Definition: Within this paper, we define a $q \times q'$-CQAM constellation to be a circular QAM format that is rotation-invariant in the I/Q plane. More precisely, by rotation of angle $\frac{2\pi}{q}$, the $P = q \times q'$ constellation points are mapped onto constellation points with the same associated probabilities. Examples include APSK formats as in [39], or other constructions as in [64]. If $q = q'$, then a $q^2$-circular quadrature amplitude modulation ($q^2$-CQAM) is a two-dimensional constellation that includes $q$ shells (circles containing points of the same amplitude) with $q$ points per shell [32]. We write

$$q^2\text{-CQAM} \propto \bigcup_{i=0}^{q-1} e^{i \frac{2\pi}{q} \sqrt{-1}}\, \mathcal{B},$$

where $\mathcal{B}$ is a fundamental (connected or not) discrete set of $q$ points with distinct amplitudes.

2) Properties: One interesting aspect of CQAM-like formats is that they are naturally adapted to $q$-ary PAS coding. There are obviously many possible $q^2$-CQAM constructions. Depending upon the design criteria, e.g., the figure of merit [5] (minimum distance) as in [32], different properties and performance are obtained. In the sequel, we investigate different criteria options for CQAM constellations and perform specific optimization. We eventually focus on the CQAM construction of [32]. This particular CQAM construction, which originates from an exercise on the generalization of the PAS method, turns out to be particularly efficient with respect to CM capacity.

D. Non-uniform QAM Signaling and the PAS method

1) Background: The reasons PAS [29]–[31] has been experimented with in optical communications are twofold. First, non-uniform signaling is obtained after shaping the source up front of a (possibly legacy) coding system: this offers backward compatibility. Second, the distribution matcher (DM) provides an additional degree of freedom for rate adaptation: this may be a useful feature. The general PAS framework is found in Appendix B.

2) Case with Square QAM: In its original binary instance, PAS is based on the antipodal symmetry of $2^m$-PAM. Indeed, up to power normalization, $2^m\text{-PAM} \propto \bigcup_{s=\pm 1} s\mathcal{B}$, where $\mathcal{B} = \{1, 3, \ldots, 2^m - 1\}$. Referring to Appendix B, the isomorphic representation

$$2^m\text{-PAM} \equiv \{-1, +1\} \times \mathcal{B}$$

[Fig. 2 plots: (a) 'star', (b) 'hybrid', and (c) '2-dist' constellations in the I/Q plane (axes from -1.5 to 1.5), each point labeled by a pair of symbols in {0, ..., 7}; (d) and (e) achievable rate versus SNR (dB). Legend entries: Gaussian bound; CM 'star', 'hybrid', '2-dist', and 'true', with and without uniform signaling; S-CM and B-CM for 'star', 'hybrid', and '2-dist'.]

Fig. 2: Types of CQAM-like constructions, information rates, and CM vs S-CM tradeoffs. The CM rate characterizes the optimal communication limits. The S-CM rate is a relevant operational quantity when working with $q = 2^p$-ary-based architectures. The B-CM rate is given for the sake of completeness. Shaping parameters have been independently optimized for the 10 dB SNR region while keeping an input entropy close to the shaped QAM of Fig. 1.
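The role of the shaping parameter and of the input entropy mentioned in the captions of Fig. 1 and Fig. 2 can be made concrete with a small sketch. It assumes the usual Maxwell-Boltzmann form $p_X(x) \propto e^{-\lambda |x|^2}$ with $\lambda \geq 0$, applied here to an unnormalized square 64-QAM. The function names and the bisection search are illustrative assumptions, not the optimization procedure used by the authors; the target of 5.45 bits is the entropy quoted in the caption of Fig. 1.

```python
import math


def square_qam(m):
    """Unnormalized square 2^(2m)-QAM built from odd-integer 2^m-PAM per quadrature."""
    pam = range(-(2 ** m - 1), 2 ** m, 2)
    return [complex(i, q) for i in pam for q in pam]


def maxwell_boltzmann(points, lam):
    """Probabilities proportional to exp(-lam * |x|^2); lam = 0 gives uniform signaling."""
    weights = [math.exp(-lam * abs(x) ** 2) for x in points]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(pmf):
    """Shannon entropy H(X) in bits per channel use."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def lam_for_entropy(points, target_bits, lo=0.0, hi=1.0, iters=60):
    """Bisection on lam; relies on H(X) being non-increasing in lam for lam >= 0.

    Assumption: the entropy at lam = hi is already below target_bits.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy_bits(maxwell_boltzmann(points, mid)) > target_bits:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    qam64 = square_qam(3)
    lam = lam_for_entropy(qam64, 5.45)
    print(lam, entropy_bits(maxwell_boltzmann(qam64, lam)))
```

The value of $\lambda$ found this way depends on the (here unnormalized) scaling of the constellation, so it is not directly comparable to parameter values quoted elsewhere.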

makes it possible to distinguish between signal amplitudes in $\mathcal{B}$ and their sign in $\{-1, +1\}$. PAS in [29] is based on the mapping $\{1, 0\} \equiv \{-1, +1\}$ that encodes the sign while, independently, binary vectors label points in $\mathcal{B}$. PAS is a layered coding scheme. The central channel coding layer uses a linear code with systematic encoding and rate $R_C$ where parity bits encode amplitude signs. The systematic information is, for example, Maxwell-Boltzmann-shaped in a layer up front via prefix-free source coding or similar methods: this is further used to encode the amplitudes at the end layer. For 64-QAM-based systems, the code rate constraint is $R_C \geq \frac{2}{3}$.

3) Case with Circular QAM: A linear dense combination of $q$-ary symbols tends to asymptotically admit a uniform distribution [19], [32] (see footnote 1). If PAS (with, e.g., standard LDPC, Turbo, or polar codes) tends to map uniformly-distributed parity bits into the signs of PAM points, then parity bits do not perturb amplitude shaping. The generalization of this property to alternative (non-binary) information representations is enabled by specific QAM formats, among others $q \times q$-CQAM as in [32] when the underlying alphabet is assumed to be a finite field $\mathbb{F}_q$ with a prime $q > 2$. A generalized PAS framework is presented in Appendix B where it is observed that the new schemes relax the code rate constraint to $R_C \geq \frac{1}{2}$ for any $q$. Referring to Appendix B, we use the isomorphic representation

$$q^2\text{-CQAM} \equiv \mathbb{F}_q \times \mathcal{B},$$

where $\mathcal{B}$ represents the fundamental region. In [32], the main goal is to explore the use of $q$-ary codes by generalizing PAS and the binary sign flipping technique to the $q$-ary case. In this paper, the goal is slightly different. For practical reasons, we are restricted to computation fields of characteristic 2 and, in particular, $q = 2^3$. As this field is an extension of the binary field, the nature of the code constraint is less stringent and, even for PAS, suboptimal schemes with binary codes could be envisioned. The code rate constraint of the generalized framework however remains valid and of interest. For 64-CQAM-based systems, it is $R_C \geq \frac{1}{2}$.

Footnote 1: The proof [19], [32] over $\mathbb{F}_q$ involves the $q$ roots of unity as a generalization of the sign symmetry over $\mathbb{F}_2$. This generalization motivates the construction of CQAM over $\mathbb{F}_q$ with the use of circular symmetry [32].

4) Perspective: Notice that the non-uniform signaling formats presented in this paper may serve as general baseline performance guidelines. They are not PAS-specific guidelines. The obtained results and insights can be used in a variety of coding system models and coded modulation schemes. As discussed in the previous section, this paper is an experimental

[Fig. 3 plots: (a) achievable rate versus SNR [dB], from 0 to 20 dB, with curves: Gaussian bound, CM CQAM H(X)=5.25, S-CM CQAM H(X)=5.25, (S-)CM QAM H(X)=5.25, (S-)CM QAM H(X)=6.00, CM CQAM H(X)=6.00; (b) experimental 'true' 8 × 8 CQAM [32], [42] in the I/Q plane, each point labeled by a pair of symbols in {0, ..., 7}; (c) SNR [dB] versus distance [km] for optimum shaped 64-QAM and optimum shaped 64-CQAM.]

Fig. 3: (a) & (b) Experimental $2^3 \times 2^3$-CQAM (CM-optimized in the SNR region below 12 dB). Information rate as a function of SNR over the linear AWGN model. (c) Obtained SNR as a function of transmission reach for shaped 64-QAM and shaped 64-CQAM.
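To make the notion of CM information rate over the linear AWGN model more tangible, here is a minimal sketch that estimates $I(X;Y)$ for the model $Y = \sqrt{\mathrm{SNR}}\,X + Z$ of Section II-A. Note the swap of technique: the paper states that Riemann-based integration is used, whereas this sketch uses a plain Monte Carlo average, which is simpler to write but noisier. The function name `cm_rate`, the sample count, and the seed are assumptions made for illustration.

```python
import math
import random


def entropy_bits(pmf):
    """Shannon entropy H(X) in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def cm_rate(points, pmf, snr_db, n_samples=4000, seed=0):
    """Monte Carlo estimate of I(X;Y) in bits for Y = sqrt(SNR) X + Z.

    Z is complex Gaussian with independent real and imaginary parts of
    variance 1/2. The constellation is rescaled to unit average power
    under pmf, so that snr_db is the actual signal-to-noise ratio.
    """
    rng = random.Random(seed)
    snr = 10.0 ** (snr_db / 10.0)
    power = sum(p * abs(x) ** 2 for x, p in zip(points, pmf))
    scaled = [math.sqrt(snr / power) * x for x in points]
    sigma = math.sqrt(0.5)
    acc = 0.0
    for _ in range(n_samples):
        k = rng.choices(range(len(points)), weights=pmf)[0]
        y = scaled[k] + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        # p(x) * p(y|x) up to the common factor 1/pi, which cancels in the ratio.
        joint = [p * math.exp(-abs(y - s) ** 2) for s, p in zip(scaled, pmf)]
        acc += math.log2(joint[k] / sum(joint))  # log2 of the posterior p(x|y)
    # I(X;Y) = H(X) - H(X|Y) = H(X) + E[log2 p(X|Y)]
    return entropy_bits(pmf) + acc / n_samples


if __name__ == "__main__":
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    for snr_db in (0.0, 5.0, 10.0):
        print(snr_db, cm_rate(qpsk, [0.25] * 4, snr_db), math.log2(1 + 10 ** (snr_db / 10)))
```

The same function accepts a non-uniform `pmf`, so a shaped 64-point constellation can be compared against the Gaussian bound $\log_2(1+\mathrm{SNR})$ in the same way, at the cost of a longer run time.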

validation of efficient modulation formats, taking first into account advanced core linear model constraints, in particular multi-dimensional non-unitary polarization- and multi-ary DSP implementations. Non-linear interference or other mismatches that depend on the transmission distance are treated in a second phase. Because optimizing the Euclidean distance is sufficient in the simplified high SNR linear scenarios and because we target low-complexity practical solutions, we use Maxwell-Boltzmann distributed amplitudes as the implied distortion is negligible. Notice that alternative rate-distortion methods (e.g., Blahut-Arimoto) to optimize the input pair $(\mathcal{X}, p_X)$ have been recently investigated. In [54] and [55], the mutual information is optimized within the framework of the EGN model [47], [56], [57] (the classical one-dimensional Gaussian model with non-linear noise discussed in the optical literature) or the split-step Fourier transform.

III. MODULATION TRADEOFFS

A. Geometric Shaping via Circular QAM Formats

The general construction of $q^2$-CQAM constellations consists in populating $q$ shells with $q$ points that are uniformly distributed on each circle. A construction criterion controls the spacing and phase offset between shells. The selection of a particular criterion may be motivated by geometric considerations. It may also be combined with the further optimization of the transceiver design at a given target SNR under non-uniform signaling. Hence, for standard receiver architectures, the construction and the choice of Maxwell-Boltzmann parameters may aim at maximizing the B-CM capacity under suboptimal bit-based estimation. For evolved architectures or advanced fiber channel models, they may be chosen to maximize the CM capacity. Recall that the latter case is considered in [32] and experimented with in [42]; it is indicated as the 'true' CQAM reference in Fig. 2 where several CQAM-like examples are depicted. Let us describe them.

Example 1: 'Star' Construction. Fig. 2a depicts 'star-like' CQAM. Such constellations are similar to APSK constellations used in [39]. Their geometry combined with Gray mapping makes them perform well in practice. In Fig. 2 we use a shell spacing equal to $d_{\min}$. A Gray mapping is represented using pairs ($m = 2$) of labels in $\mathbb{F}_{2^3} = \{0, 1, \cdots, 7\}$. Respective labels of $\{0, 1, \cdots, 7\}$ are represented using the binary alphabet $\{111, 110, 100, 101, 001, 000, 010, 011\}$. This translates into a (non-optimized) binary Gray code for the CQAM constellation where each point is now labeled by 6 bits.

Example 2: '2-dist' Construction. Fig. 2c represents a CQAM constellation that has been constructed based on a 'two-distance' criterion. The greedy construction is performed with respect to $d_{\min}$ and the second minimum Euclidean distance given the first two shells (the second at $d_{\min}$ being a scaled version of the first). Compared with a 'star' design, this naturally increases the CM capacity while attempting to maintain good properties for Gray labels.

Example 3: 'Hybrid' Construction. Fig. 2b represents a balance between the previous two examples. Angular regions have been preserved in addition to the two-distance criterion. Depending upon receiver design and available information, angular region and mapping can be adapted for optimizing bit-based estimation performance, see also [63].

From these examples, we see that, if conventional (mismatched) BICM estimation were to be assumed, then a tradeoff may have to be made as the respective behaviors of CM and S-CM capacities are reversed in the operational SNR region. Indeed, while similar shaping and maximal input entropy have been represented, it appears that the performance behavior is first conditioned by the initial geometric properties of the constellation, then enhanced by a particular set of shaping parameters. For the CM capacity, it is known that minimum Euclidean distance maximization leads to remarkable CM capacity at high SNR. When based on that criterion, CQAM appears to be very efficient. Notice however that, if conventional (mismatched) BICM estimation is to be used for technical reasons, this cannot be fully exploited and a tradeoff may have to be made. In the remainder of this paper, we focus on CQAM constructions that, when shaped, maximize the CM capacity.
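The construction by rotation of a fundamental set, as used for the 'star' example above, can be sketched as follows. The sketch assumes a fundamental set of $q$ points placed on a single ray with equal shell spacing, and it chooses the inner radius so that the chord between neighbors on the inner shell equals the shell spacing; that choice of inner radius is an assumption made here so that the shell spacing equals $d_{\min}$, and it is not stated in the paper. All function names are hypothetical.

```python
import cmath
import math


def cqam_from_fundamental(fundamental, q):
    """Union of the q rotations, by multiples of 2*pi/q, of the fundamental set."""
    return [cmath.exp(2j * math.pi * i / q) * b for i in range(q) for b in fundamental]


def star_fundamental(q, spacing=1.0):
    """q points with distinct amplitudes on the positive real axis.

    Assumption: the inner radius makes the inner-shell chord equal to the
    shell spacing, i.e. 2 * r0 * sin(pi / q) = spacing.
    """
    inner_radius = spacing / (2.0 * math.sin(math.pi / q))
    return [complex(inner_radius + k * spacing, 0.0) for k in range(q)]


def min_distance(points):
    """Minimum Euclidean distance between two distinct constellation points."""
    return min(abs(a - b) for i, a in enumerate(points) for b in points[i + 1:])


def normalized_min_distance_sq(points):
    """Squared minimum distance after scaling to unit average power (uniform signaling)."""
    mean_power = sum(abs(x) ** 2 for x in points) / len(points)
    return min_distance(points) ** 2 / mean_power


if __name__ == "__main__":
    q = 8
    star = cqam_from_fundamental(star_fundamental(q), q)
    print(len(star), min_distance(star), normalized_min_distance_sq(star))
```

Replacing `star_fundamental` by another set of $q$ points with distinct amplitudes (for instance with per-shell phase offsets) yields other rotation-invariant constellations, whose normalized minimum distance can then be compared with that of the star design.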

Fig. 4: Experimental Setup.
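The label mapping quoted in Example 1 of Section III-A can be checked mechanically. The short sketch below verifies that the binary alphabet listed there is a cyclic Gray sequence (neighboring labels, including the last and the first, differ in exactly one bit) and forms 6-bit point labels from pairs of 8-ary symbols. The concatenation order of the two 3-bit labels and the function names are assumptions made for illustration.

```python
# Binary labels of the symbols 0, 1, ..., 7 as listed in Example 1.
GRAY_LABELS = ["111", "110", "100", "101", "001", "000", "010", "011"]


def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_cyclic_gray(labels):
    """True if labels are distinct and cyclically adjacent labels differ in one bit."""
    n = len(labels)
    distinct = len(set(labels)) == n
    adjacent = all(hamming(labels[i], labels[(i + 1) % n]) == 1 for i in range(n))
    return distinct and adjacent


def point_label(s1, s2):
    """6-bit label of the point indexed by the symbol pair (s1, s2).

    Assumption: the two 3-bit labels are simply concatenated in this order.
    """
    return GRAY_LABELS[s1] + GRAY_LABELS[s2]


if __name__ == "__main__":
    print(is_cyclic_gray(GRAY_LABELS))
    print(point_label(0, 5))
```

The cyclic property matters for a circular format because the symbol that indexes the rotation wraps around after a full turn.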

B. Optimization for CM Capacity and Linear AWGN Model

Let us present in more detail the type of CQAM-like constructions introduced in [32]. It is solely based on the minimum Euclidean distance and is referred to as $q^2$-CQAM in the sequel. The chosen criterion maximizes the figure of merit, i.e., the ratio between $|\mathcal{X}|\, d_{\min}^2(\mathcal{X})$ and $\sum_{x \in \mathcal{X}} |x|^2$, where $d_{\min}^2(\mathcal{X}) = \min_{(x,x') \in \mathcal{X}^2,\, x \neq x'} |x - x'|^2$ denotes the minimum squared Euclidean distance. In practice, the minimum distance of the power-normalized constellation $\mathcal{X}$ is first maximized via a greedy procedure. Then, Maxwell-Boltzmann shaping is performed such that, for a given SNR, the gap between the CM information rate denoted by $I(X; Y)$ and the ultimate limit denoted by $\log(1+\mathrm{SNR})$ is minimized. An optional stretching step may be performed, see [32]. This optimization procedure is sufficient to devise optimized constellations very close to the Shannon bounds of the core model. More importantly, this simple optimization is guided by operational constraints, i.e., the construction of circular constellations. The $q^2$-CQAM format that supports the experimental work conveyed in this paper is represented in Fig. 3b. In terms of CM capacity, the stretched version achieves performance that is less than 0.1 dB away from the Gaussian capacity $\log(1 + \mathrm{SNR})$. This is illustrated by Fig. 3a where the optimization has been done for target SNRs around 10 dB or slightly above. It can be seen that shaped 64-CQAM and shaped 64-QAM have similar performance at the operating point. Notice that these observations concern the CM capacity. The receiver architecture may require specific demapping nodes and estimation methods to take full benefit of the circularly-symmetric format.

C. Optimization for Nonlinear Long-Haul Communications

The values of spectral efficiency of shaped constellations (whether square QAM or circular QAM) are very close to the Shannon capacity for the AWGN channel. For long-distance transmissions however, their performance is tested and reevaluated in the presence of the fiber nonlinear impairments. As previously mentioned, this is justified as the 4-th and 6-th moments of the constellations appear in the expressions of the total effective SNR. By total effective SNR, we mean the SNR of the sampled received signal assuming a symbol-by-symbol coherent detector, without nonlinear equalization. In this case, the nonlinear distortions are effectively considered as additive Gaussian noise, but with a variance that scales cubically with the channel average launched power, and depends on constellation moments [50]. We compared the performance of 64-CQAM and shaped 64-QAM formats for a nonlinear fiber channel using the theory presented in [50] (see Eq. (123) therein). We assumed a system with 19 channels, modulated at 54.2 GBd, and spaced at 62.5 GHz. The link consisted of 80 km spans of SMF fiber. We theoretically computed the SNR of the central (i.e., the 10-th) channel at the receiver side at the nonlinear threshold (i.e., the optimum launched power that maximizes the SNR) as a function of the number of spans. This is given for each of the two modulation schemes, and for a given span count. Referring to [12], [29], the parameter of the Maxwell-Boltzmann PMF of each scheme was varied between 0 and 4, and the probability distribution that maximized the optimum SNR was found for each constellation by exhaustive search. Fig. 3c illustrates the optimized SNR vs distance for both formats. We observe that the two modulation schemes have very similar performance in the nonlinear regime. This observation permits us to assert that the shaping scheme proposed in this work is at least as robust to fiber nonlinear impairments as the existing solutions.

IV. EXPERIMENTAL SETUP

The experimental setup is shown in Fig. 4a. The transmitter is based on two four-channel digital-to-analog converters (DACs) running at 88 GS/s generating 54.2 Gbaud polarization multiplexed 64-QAM or 64-CQAM, using raised cosine pulses with a roll-off factor of 0.08. The length of the random transmitted sequences is 184320 symbols. In total, we modulate 19 WDM channels with a channel separation of 62.5 GHz using external cavity lasers (ECLs) with linewidths of around 100 kHz. One DAC is used to generate the channel under test and its two second nearest neighboring channels. The second DAC generates the remaining 16 channels. Independent symbol patterns are used for the two DACs. After the dual-polarization I/Q-modulators, we use erbium doped fiber amplifiers (EDFAs) to boost the signal. In the loading channel arm, we use a wavelength selective switch to remove the in-band amplified spontaneous emission noise for the channel under test before combining the signals from the two transmitters. The signals are either noise loaded and detected in a back-to-back scenario, or transmitted over the recirculating loop depicted in Fig. 4b. The recirculating loop consists of three spans of conventional single mode fiber (SSMF), EDFAs and a polarization scrambler (PS). A programmable gain equalizer is used to equalize the power of the WDM channels and to filter out the ASE noise that is outside of the total channel count. The signals are detected using a conventional polarization diverse coherent receiver, shown in Fig. 5, and digitized using a 33 GHz 80 GS/s real-time sampling oscilloscope.

Fig. 5: Receiver.

For a fair comparison between 64-CQAM and 64-QAM without penalty due to potential suboptimal equalization, we use a genie-aided digital signal processing (DSP) solution. Notice that, in practice, for future system implementations, pilot-aided DSP solutions have proved to be efficient [41]. Phase recovery follows from simple inverse mapping and standard DSP techniques are applied. The DSP starts with resampling to 2 samples/symbol followed by electronic dispersion compensation (EDC). Timing estimation, as well as polarization demultiplexing and adaptive equalization using a multi-modulus algorithm (MMA), is applied where knowledge of the transmitted data is used to calculate the error function. The signals are then sent to a frequency offset estimation and phase estimation stage. Finally, in this experimental demonstration, a symbol-spaced real-valued decision-directed least mean square (DD-LMS) equalizer is used independently on the signals in the x- and y-polarization to compensate for any remaining imperfections such as transmitter side timing skew. The parameters of the genie-aided DSP are adapted such that the performance is close to that of blind DSP for 64-QAM. To ensure a fair comparison, the same parameters are then used for 64-CQAM.

... to 64-QAM. This is most likely due to a more efficient use of the DAC resolution when shaping is applied, see, e.g., [43]. Notice that the experimental CM values have been determined knowing the transmitted sequence of the channel under test. This makes it possible to build the signal statistic (estimated input distribution) and the channel model (estimated conditional distribution) associated with the experimental results up to some negligible (quantization) errors while the nonlinear Kerr effects are treated as white Gaussian noise.

Fig. 8a shows the information rate $I(X; Y)$ (CM) as a function of the launch power at 1440 km for the two formats. We observe no apparent difference in the optimal launch power for the two formats in either single-channel or WDM transmission. The optimal launch power per channel was around 2 dBm for single channel and 0 dBm for WDM. The transmission results are depicted in Fig. 8b for the optimal launch power. Assuming CM at 4.5 bits/symb., 54.2 Gbaud 64-CQAM can be transmitted up to 1750 km in single-channel transmission at the optimal launch power, and 1100 km with 19 WDM channels. Considering 19 WDM channels, if the formats are compared at CM = 4 bits/symb., the transmission distance can be increased by 480 km by using 64-CQAM, which corresponds to an increase of 28%.

In the experiments, 64-CQAM has a slightly lower implementation penalty compared to 64-QAM. For the shaped format, clipping is performed at the DAC level. Both formats suffer equally from hardware restrictions due in particular to the non-optimized evaluation board. In order to verify the gains we see in experiments, without being influenced by the implementation penalties, we computed the mutual information of both formats, using formulas for the total variance of nonlinear distortions, which includes the impact of modulation format in the nonlinear regime, see Eq. (123) in [50]. The transmitter and receiver are assumed ideal without implementation penalty, and the receiver DSP consists only of the matched filter. The modeling transmission results are depicted in Fig. 7. At each distance, first the maximum SNR at optimum launch power is computed, then the corresponding optimum mutual information is computed. Fig. 7 illustrates the optimum mutual information vs distance for 64-CQAM and 64-QAM. 64-CQAM has a clear advantage over 64-QAM beyond 1500 km. At CM = 4 bits/symb., the transmission reach can be increased by 14% by switching from 64-QAM to 64-CQAM.

VI. CONCLUSION
Interest in circular QAM emanates from a better matching to the polarization-multiplexed WDM fiber model, from the V. RESULTS adaptation to non-binary processing, or from other evolved The back-to-back results for 54.2 Gbaud are shown in Fig. 6 design constraints such as, potentially, flexible rate adaptation together with theoretical results. At a target mutual information for PAS. (CM) of 4.5 bits/symb., we measure a 1.25 dB gain for 64- Long-haul transmission simulations for shaped CQAM have CQAM over 64-QAM. The target CM has been chosen for indeed been compared to simulations for shaped 64-QAM illustration purpose but still lies in the same region as the target in both the linear and nonlinear regime. Importantly, ad- CM of 4 bits/symb. of the previous sections. We note that 64- vanced simulations show that the new schemes have similar CQAM has a 0.7 dB lower implementation penalty compared performance to the state-of-the-art schemes based on shaped SUBMITTED TO JOURNAL OF LIGHTWAVE TECHNOLOGY 8

Fig. 6: Exp. back-to-back results. [Plot: MI [bit/2D-symbol] versus OSNR [dB] in 0.1 nm; curves: 64-CQAM and 64-QAM theoretical results, 64-CQAM Exp., 64-QAM Exp.]

Fig. 7: Modeling transmission distance results. [Plot: MI at optimum SNR [bit/2D-symbol] versus transmission distance [km]; curves: 64-CQAM Model, 64-QAM Model.]

[Fig. 8 panels: (a) Exp. launch power results: MI [bit/2D-symbol] versus launch power [dBm], single channel and 19 WDM channels, 64-CQAM and 64-QAM. (b) Exp. transmission distance results: MI [bit/2D-symbol] versus transmission distance [km], single channel (loop) and 19 WDM channels, 64-CQAM and 64-QAM.]

Fig. 8: (a) CM information rate as a function of the launch power at 1440 km. (b) CM information rate as a function of the transmission distance at the optimal launch power for single-channel and WDM transmissions.
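The theoretical CM curves that the experimental points are compared against can be approximated numerically. The following is a minimal sketch, not the authors' code: it estimates $I(X;Y)$ by Monte Carlo for a discrete constellation over the complex AWGN model, using a uniform square 64-QAM as a stand-in because the CQAM point coordinates are not given here. The function names `cm_rate_awgn` and `square_qam`, the sample count, and the seed are illustrative assumptions.

```python
import math
import random


def cm_rate_awgn(points, probs, snr_db, n_samples=4000, seed=1):
    """Monte Carlo estimate of I(X;Y) in bits/symbol over complex AWGN.

    points, probs: constellation points (complex) and their probabilities.
    snr_db: ratio of average constellation power to noise variance, in dB.
    (Hypothetical helper, written for illustration only.)
    """
    rng = random.Random(seed)
    # Average constellation power and total complex noise variance.
    power = sum(p * abs(x) ** 2 for x, p in zip(points, probs))
    sigma2 = power / (10 ** (snr_db / 10))
    std = math.sqrt(sigma2 / 2)  # noise standard deviation per real dimension
    total = 0.0
    for _ in range(n_samples):
        x = rng.choices(points, weights=probs)[0]
        y = x + complex(rng.gauss(0, std), rng.gauss(0, std))
        # log p(x') + log p(y|x'), up to a constant that cancels in the ratio.
        terms = [math.log(p) - abs(y - xp) ** 2 / sigma2
                 for xp, p in zip(points, probs) if p > 0]
        peak = max(terms)  # log-sum-exp for numerical stability
        log_py = peak + math.log(sum(math.exp(t - peak) for t in terms))
        log_py_given_x = -abs(y - x) ** 2 / sigma2
        total += (log_py_given_x - log_py) / math.log(2)
    return total / n_samples


def square_qam(side):
    """Square QAM grid with side*side points on odd-integer levels."""
    levels = [2 * k - (side - 1) for k in range(side)]
    return [complex(i, q) for i in levels for q in levels]


if __name__ == "__main__":
    qam64 = square_qam(8)
    uniform = [1 / 64] * 64
    for snr_db in (10, 15, 20):
        print(snr_db, round(cm_rate_awgn(qam64, uniform, snr_db), 2))
```

Passing a non-uniform `probs` list gives the corresponding estimate for a probabilistically shaped format; the experimental CM values described in Section V rely instead on the measured samples and the known transmitted sequence.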

QAM. Transmission experiments and comparisons with standard (unshaped) 64-QAM have validated this design and the use of CQAM for practical purposes. For example, in WDM transmission of 54.2 Gbaud signals, 64-CQAM achieved a 28% gain in transmission reach over conventional 64-QAM.

This work demonstrates that advanced shaping schemes such as combined geometric-probabilistic CQAM could be used and may have very interesting performance in practice. Assuming that significant performance gains result from advanced channel modeling and particular constellation geometry, and assuming that coding and modulation can be efficiently translated into high-speed transceivers, this may turn out to be key for the next generation of optical systems.

ACKNOWLEDGMENTS

The authors would like to thank L. Schmalen, A. Dumenil, and R.J. Essiambre for valuable comments and suggestions on an early version of this work. The authors are also grateful to the anonymous reviewers for their insightful and valuable comments.

APPENDIX

A. Achievable Information Rates

For a memoryless channel model with random input letters $X$ taking on discrete values $x \in \mathcal{X}$ with probability $p_X(x)$ at each channel use, the channel capacity is given by the information rate $I(X;Y)$. For the sake of simplicity, the term CM (Coded Modulation) capacity is employed in this paper to refer to $I(X;Y)$ when the input alphabet $\mathcal{X}$ is fixed. For the complex-valued AWGN model, it is a function of the SNR (ratio between the average constellation power and the additive noise power). Modern error-correcting codes closely approach the achievable bounds in practical setups and for large blocklengths. See [61], [62] for capturing potential additional transceiver mismatches as well as [20], [22], [58]–[60] for operational characterizations. For the modulation

Fig. 9: Concatenation chain for PAS encoding. All operations up to final modulation are performed in the original symbol alphabet (thin lines) of size $|\mathcal{V}| = q$. The task of the left-most encoding layer is shaping. To this aim, a standard source code (called Distribution Matcher) with rate $R_{DM} \in (0,1)$ is used after an initial split of the information sequence at rate $r \in [0,1)$ to further let the next layer be constraint-free. A peculiarity of PAS is to perform the shaping task ahead of conventional error-correction encoding. Error-correction encoding with rate $R_C \in (0,1)$ is then performed at the middle layer and generates a redundancy sequence (e.g., from linear parities). The right-most encoding layer is the modulation step: the general principle of PAS is to select a region according to the (typically uniform) distribution of the $S$s and combine it with a fundamental point labeled at rate $R_{AM} \geq 1$ with the (previously shaped) information sequence. Notice $U_2 = V_2$.

schemes of our running examples involving square or circular QAM constellations, each letter modulates $m$ symbols of the original information alphabet represented by $S_1, S_2, \cdots, S_m$. In other words, there is a one-to-one mapping $\mu$ such that $X = \mu(S_1, \cdots, S_m)$. Notice that $m = 3$ for 8-PAM using binary labels or $m = 2$ for $8 \times 8$-QAM constellations using $q = 2^3$-ary labels. Using the chain rule and because conditioning reduces entropy, we see that

$$I(X;Y) = H(X) - H(X|Y) \geq H(S_1, \cdots, S_m) - \sum_{i=1}^{m} H(S_i|Y),$$

i.e., the CM capacity is never less than the rate $\left(H(S_1, \cdots, S_m) - \sum_{i=1}^{m} H(S_i|Y)\right)^{+}$ that indicates the system capacity when a maximum a posteriori (MAP) estimator operating at the symbol level is implemented (S-CM capacity). Notice that this expression encompasses the general case of correlated $S_i$s. By iterating the decomposition with $S_i = S_i(B_1^i, \cdots, B_\ell^i)$ for example, we also see that the capacity associated with symbol-MAP decoding (S-CM capacity) is not less than the capacity associated with bit-MAP decoding (B-CM capacity) provided that a symbol is labeled by a group of bits. Let us make a couple of observations. First, in the specific example $m = 2$ of two symbols, the difference between the two is given by $I(S_1; S_2|Y)$ which, in the CQAM case or in [63], differentiates between amplitude and phase. Second, for independent symbols, the capacity associated with symbol-MAP decoding (sometimes called bit-metric decoding [59]) can be written as $\sum_{i=1}^{m} I(S_i; Y)$. Hence, in this case, we may as well use the framework of ICM (Interleaved Coded Modulation) to define the notion of achievable rates for conventional processing. The conceptual view of an infinite interleaver [20], [22] before any alphabet mapping and in conjunction with uniform signaling indeed makes it possible to characterize different system capacities. More generally, it may be convenient to use the GMI framework in [61], [62] where the achievable rates characterize conventional processing mismatches and therefore have an operational meaning. For the sake of clarity, we explicitly characterize the rate as CM or S-CM depending upon the choice of system architecture among those considered in this paper.

B. Probabilistic Amplitude Shaping

PAS originally stands for Probabilistic Amplitude Shaping [31], a method devised in [6], [29] to implement non-uniform signaling [12]. Although, in the presented generalized version, PAS no longer refers to modulating “amplitudes” as such, the original name is conserved for simplicity.

Assume that we want to communicate messages through $N$ independent uses of a channel. More precisely, let us denote by $(\mathcal{X}, p_X)$ the channel input where a random variable $X$ takes on values $x \in \mathcal{X}$ according to $p_X(x)$. For a source of independent symbols $V \in \mathcal{V}$ distributed uniformly at random ($|\mathcal{V}| = q$), the number of messages scales as $M_M \stackrel{\text{def}}{=} |\mathcal{V}|^M = q^M$ where $M$ is the information length. PAS [29] is a layered coding scheme that maps the information symbols into the $X$s. The overall coding rate² is $R_T \stackrel{\text{def}}{=} \frac{\log_q(M_M)}{N} = \frac{M}{N}$ where $N$ is the encoder output length.

The basic principle (amplitude sign flipping triggered by a bit when $q = 2$) of the PAS method as devised in [29] relies on binary channel symmetry. It is then tailored to binary-input real-valued-output symmetric channels such as the $2^m$-PAM AWGN channel (or product of it such as $2^{2m}$-QAM AWGN). It can be extended to $q$-ary channel symmetry and the generalized scheme is summarized³ in Fig. 9.

²When not stated otherwise, coding rate, information rate, or entropy are defined using $\log_q$ where $q$ is the original field characteristic.
³The notation $\bar{\rho}$ for $\rho \in [0,1]$ indicates the complement to one, $\bar{\rho} \stackrel{\text{def}}{=} 1 - \rho$.

1) Sufficient Constellation: PAS relies on the subdivision of the input alphabet $\mathcal{X}$ into $J$ constellation regions $\{\mathcal{C}_j\}_{j \in \{1,\cdots,J\}}$ such that $\mathcal{X} = \cup_{j \in \{1,\cdots,J\}} \mathcal{C}_j$. Various constellations and subdivisions are PAS-compatible. For simplicity, assume that all constellation regions have the same cardinality, i.e., $|\mathcal{C}_j| = |\mathcal{C}_1|$, with $\Pr\{\mathcal{C}_j\} = \frac{1}{J}$ for any $j$. Assume further that $q = |\mathcal{V}|$ divides $J$ and that $q = |\mathcal{U}_1|$ divides the region cardinality $|S_1|$. For each region, the points are identically distributed. PAS is eventually performed on a reduced fundamental region $\mathcal{B}$ chosen for example to be $\mathcal{B} = \mathcal{C}_1$. PAS coding consists in mapping labels obtained from the sequence of (uniformly distributed) symbols of type $S$ into regions. Independently, PAS coding maps labels obtained from the (shaped) information sequence of type $U_1$ into fundamental points. In other words, there is an $m$ such that $\mathcal{X} \equiv \{0, 1, \cdots, q^m - 1\} \times \mathcal{B}$, with $\mathcal{B} \stackrel{\text{def}}{=} \mathcal{C}_1$.

2) State-of-the-Art and Legacy Systems: Let us provide here some background on coding in actual optical applications. In practice, forward error-correction (FEC) is typically performed via a (systematic) linear code of rate $R_C$. PAS is said to be compatible with legacy systems because it can be built around a standard (or pre-existing) FEC coding engine (e.g., an LDPC-based system). PAS first focuses on shaping the distribution of points inside the fundamental region using the distribution matcher (DM). To exemplify this, let us use the binary case $q = 2$. The distribution of the $2^m$-PAM amplitudes is shaped to let the distribution of the full constellation behave like the capacity-achieving Gaussian [12]. If the standard PAM modulation rate is $R_M = m$, then PAS modulates the signal amplitudes at the output of the distribution matcher at rate $R_{AM} = m - 1$. PAS uses (up to very few operational changes) a conventional coding and modulation chain. After the DM, the information sequence is parsed to modulate the point amplitudes while the (uniformly distributed) parity bits (as well as the unshaped information fraction) encode the sign of the PAM amplitudes. The binary case is used for example in [37].

3) General Framework: As depicted in Fig. 9, PAS is seen as a layered coding system. The concatenation chain is divided into three main layers and encoding operations are done in sequential order. Practical decoding is envisioned to occur in the reverse order. A fraction $\bar{r} \stackrel{\text{def}}{=} 1 - r$ (with in some cases $r = 0$) of the information stream is first encoded into a sequence (seen as a sequence of symbol packets or labels) with a given (required) distribution (typically Maxwell-Boltzmann as in [12]). Hence, independent identically uniformly distributed symbols are encoded into a symbol sequence which (from parsing) labels the modulated regions at rate $R_{AM}$. The rate $R_{AM}$ is equal to the number of symbols in an alphabet of size $q$ needed to label a region (for example, $R_{AM} = m - 1$ in the binary case of [29] where a region is an amplitude, or $R_{AM} = 1$ in the non-binary case of [32]). Second, a sequence of redundant symbols, generally obtained from linear combinations of information symbols, is then generated by a linear channel encoder. Dense linear combinations of symbols make the distribution of the resulting sum symbols tend to be asymptotically uniform. Third, the final encoding layer modulates symbols in $\mathcal{X}$ by selecting a pair composed of a point (for example representing an amplitude) in the fundamental region, according to the label sequence, and a region (for example representing a quadrant or an angular region).

4) Compatible Rates: A compatibility criterion for the set of rates $\{r, R_{DM}, R_C, R_{AM}\}$ is easily obtained from Fig. 9. Consider the layered encoding flow. We see that the system is constrained at the selective node when the end (modulation) layer is processed. The constraint reads

$$N = \frac{\bar{r}}{R_{DM} R_{AM}} M = \frac{\bar{r}\bar{R}_C + r R_{DM}}{R_{DM} R_C} M.$$

Its satisfaction implies a dependency between the rates as

$$R_C - r R_C - R_{AM} + R_{AM} R_C + r R_{AM} - r R_{AM} R_C - r R_{AM} R_{DM} = 0.$$

When solved for $R_C$, it shows that

$$R_C = \frac{R_{AM}}{1 + R_{AM}} \left(1 + \frac{r}{1 - r} R_{DM}\right),$$

i.e., the choice of the core channel code may be restricted to particular code rates. A first example is the binary case with $2^m$-PAM for which it is required to have $R_C = \frac{m-1}{m}\left(1 + \frac{r}{\bar{r}} R_{DM}\right) \geq \frac{m-1}{m}$ (achieved for $r = 0$). This translates as $R_C \geq \frac{1}{2}$ for 16-QAM or $R_C \geq \frac{2}{3}$ for 64-QAM (bit-triggered region selection [29]). A second example is the $q$-ary case with $q^2$-CQAM for which $R_C = \frac{1}{2}\left(1 + \frac{r}{\bar{r}} R_{DM}\right) \geq \frac{1}{2}$ for any $q^2$-CQAM (symbol-triggered region selection [32]).

5) PAS Information Rate: The splitting rate $r$ provides the designer with the degree of freedom that is necessary to satisfy the rate constraint. When solved for $r$, the compatibility constraint gives

$$r = 1 - \frac{R_{AM} R_{DM}}{R_C - R_{AM} + R_{AM} R_C + R_{AM} R_{DM}}.$$

Therefore the overall PAS coding rate is

$$R_T = R_C - R_{AM} + R_{AM} R_C + R_{AM} R_{DM}.$$

In our binary and non-binary running examples, this gives $R_T = m R_C - (m - 1)(1 - R_{DM})$ for $2^m$-PAM-based schemes and $R_T = 2 R_C - 1 + R_{DM}$ for $q^2$-CQAM-based schemes, respectively. Expressed in binary units, those rates express the operational spectral efficiency of the respective coding systems. For example, for the constellations of two real dimensions and $2^{2m}$ points of our running examples, the respective system capacities in bits per channel use are

$$R_T^b = 2 m R_C - 2(m - 1)(1 - R_{DM})$$

for $2^m \times 2^m$-QAM (bit-triggered) and $R_T^b = 2 \log_2(q) R_C - \log_2(q)(1 - R_{DM})$ for $q^2$-CQAM (phase-triggered), i.e.,

$$R_T^b = 2 m R_C - m(1 - R_{DM})$$

for $(2^m)^2$-CQAM. Notice that the maximal transmitted entropy is $H(X) = H(V_1) + H(S)$ as the region and points within the fundamental region are independent. For our running examples, we see that the binary entropy becomes $H(X) \leq \log_2(q) R_{DM} + \log_2(q) \leq 2 \log_2(q)$. This represents the maximal amount of information that PAS may transmit.

REFERENCES

[1] C. E. Shannon, “A Mathematical Theory of Communication,” The Bell System Technical Journal, vol. 27, no. 3, July 1948.
[2] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[3] G. Ungerboeck, “Channel coding with multilevel/phase signals,” IEEE Trans. Inf. Theory, vol. 28, pp. 55–67, Jan. 1982.
[4] A. R. Calderbank and N. J. A. Sloane, “New trellis codes based on lattices and cosets,” IEEE Trans. Inf. Theory, vol. 33, no. 2, pp. 177–195, Mar. 1987.
[5] G. D. Forney and L.-F. Wei, “Multidimensional constellations – Part I: Introduction, figures of merit, and generalized cross constellations,” IEEE J. Sel. Areas Commun., vol. 7, no. 6, pp. 877–892, Aug. 1989.
[6] A. R. Calderbank and L. H. Ozarow, “Non-equiprobable signaling on the Gaussian channel,” IEEE Trans. Inf. Theory, vol. 36, no. 4, pp. 726–740, Jul. 1990.

[7] P. Fortier, A. Ruiz, and J. M. Cioffi, “Multidimensional signal sets through the shell construction for parallel channels,” IEEE Trans. Commun., vol. 40, no. 3, pp. 500–512, Mar. 1992.
[8] G. D. Forney, “Trellis shaping,” IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 281–300, Mar. 1992.
[9] A. K. Khandani and P. Kabal, “Shaping multidimensional signal spaces – Part 1. Optimum shaping, shell mapping,” IEEE Trans. Inf. Theory, vol. 39, no. 6, pp. 1799–1808, Nov. 1993.
[10] R. Laroia, N. Farvardin, and S. A. Tretter, “On optimal shaping of multi-dimensional constellations,” IEEE Trans. Inf. Theory, vol. 40, no. 4, pp. 1044–1056, Jul. 1994.
[11] W. Betts, A. R. Calderbank, and R. Laroia, “Performance of Nonuniform Constellations on the Gaussian Channel,” IEEE Trans. Inf. Theory, vol. 40, pp. 1633–1638, Sep. 1994.
[12] F. R. Kschischang and S. Pasupathy, “Optimal Nonuniform Signaling for Gaussian Channels,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 913–929, May 1993.
[13] G. D. Forney, M. D. Trott, and S.-Y. Chung, “Sphere-bound-achieving coset codes and multilevel coset codes,” IEEE Trans. Inf. Theory, vol. 46, no. 3, pp. 820–850, May 2000.
[14] U. Erez, S. Litsyn, and R. Zamir, “Lattices which are good for (almost) everything,” IEEE Trans. Inf. Theory, vol. 51, no. 10, pp. 3401–3416, Oct. 2005.
[15] R. de Buda, “Some optimal codes have structure,” IEEE J. Sel. Areas Commun., vol. 7, no. 6, pp. 893–899, Aug. 1989.
[16] J. Boutros, E. Viterbo, C. Rastello, and J.-C. Belfiore, “Good lattice constellations for both Rayleigh fading and Gaussian channels,” IEEE Trans. Inf. Theory, vol. 42, no. 2, pp. 502–518, Mar. 1996.
[17] H.A. Loeliger, “Averaging bounds for lattices and linear codes,” IEEE Trans. Inf. Theory, vol. 43, no. 6, pp. 1767–1773, Nov. 1997.
[18] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error correcting coding and decoding,” ICC, Geneve, Switzerland, pp. 1064–1070, May 1993.
[19] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[20] G. Caire, G. Taricco, and E. Biglieri, “Bit-Interleaved Coded Modulation,” IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 927–946, May 1998.
[21] H. Imai and S. Hirakawa, “A multilevel coding method using error-correcting codes,” IEEE Trans. Inf. Theory, vol. 23, pp. 371–377, 1977.
[22] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans. Commun., vol. 40, no. 5, pp. 873–884, May 1992.
[23] R. J. McEliece, “Are turbo-like codes effective on nonstandard channels?” IEEE Inform. Theory Soc. Newslett., vol. 51, no. 4, pp. 1–8, Dec. 2001.
[24] J. B. Soriaga and P. H. Siegel, “On distribution shaping codes for partial-response channels,” Allerton Conf. on Commun., Control, and Computing, Monticello, USA, Oct. 2003.
[25] R. Gabrys and L. Dolecek, “Coding for the Binary Asymmetric Channel,” Int. Conf. on Computing Networking and Communications, pp. 461–465, 2012.
[26] C. Ling and J.C. Belfiore, “Achieving AWGN channel capacity with lattice Gaussian coding,” IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 5918–5929, Oct. 2014.
[27] M. Mondelli, S.H. Hassani, and R. Urbanke, “How to Achieve the Capacity of Asymmetric Channels,” Allerton Conf. on Commun., Control, and Computing, Monticello, pp. 789–796, Oct. 2014.
[28] N. Palgy and R. Zamir, “Dithered probabilistic shaping,” in IEEE 27th Convention of Electrical Electronics Engineers in Israel, Nov. 2012.
[29] G. Böcherer, F. Steiner, and P. Schulte, “Bandwidth Efficient and Rate-Matched Low-Density Parity-Check Coded Modulation,” IEEE Trans. Commun., vol. 63, no. 12, pp. 4651–4665, Dec. 2015.
[30] P. Schulte and G. Böcherer, “Constant Composition Distribution Matching,” IEEE Trans. Inf. Theory, vol. 62, no. 1, pp. 430–434, Jan. 2016.
[31] G. Kramer, “Probabilistic amplitude shaping applied to fiber-optic communication systems,” Int. Symp. on Turbo Codes and Iterative Inf. Proc., Oct. 2016.
[32] J.J. Boutros, F. Jardel, and C. Méasson, “Probabilistic Shaping and Non-Binary Codes,” ISIT, pp. 2308–2312, Jun. 2017.
[33] M.P. Yankov, D. Zibar, K.J. Larsen, L.P.B. Christensen, and S. Forchhammer, “Constellation Shaping for Fiber-Optic Channels with QAM and High Spectral Efficiency,” IEEE Photon. Technol. Lett., vol. 26, no. 23, pp. 2407–2410, Dec. 2014.
[34] L. Beygi, E. Agrell, J.M. Kahn, and M. Karlsson, “Rate-Adaptive Coded Modulation for Fiber-Optic Communications,” IEEE J. Lightwave Technol., vol. 32, no. 2, pp. 333–343, Jan. 2014.
[35] T. Fehenberger, G. Böcherer, A. Alvarado, and N. Hanik, “LDPC coded modulation with probabilistic shaping for optical fiber systems,” OFC, paper Th2A.23, Mar. 2015.
[36] F. Buchali, G. Böcherer, W. Idler, L. Schmalen, P. Schulte, F. Steiner, “Experimental Demonstration of Capacity Increase and Rate-Adaptation by Probabilistically Shaped 64-QAM,” ECOC, Sep. 2015.
[37] A. Ghazisaeidi, I. Fernandez de Jauregui, R. Rios-Mueller, L. Schmalen, P. Tran, P. Brindel, A. Carbo Meseguer, Q. Hu, F. Buchali, G. Charlet, and J. Renaudier, “65Tb/s Transoceanic Transmission using Probabilistic Shaping,” ECOC, Sep. 2016.
[38] S. Chandrasekhar, B. Li, J. Cho, X. Chen, E. Burrows, G. Raybon, P. Winzer, “High-spectral-efficiency transmission of PDM 256-QAM with Parallel Probabilistic Shaping at Record Rate-Reach Trade-offs,” ECOC, Sep. 2016.
[39] S. Zhang, F. Yaman, Y.K. Huang, J.D. Downie, D. Zou, W. A. Wood, A. Zakharian, R. Khrapko, S. Mishra, V. Nazarov, J. Hurley, I.B. Djordjevic, E. Mateo, and Y. Inada, “Capacity-Approaching Transmission over 6375 km at Spectral Efficiency of 8.3 bit/s/Hz,” OFC, paper Th5C.2, Mar. 2016.
[40] Q. Hu, F. Buchali, L. Schmalen, and H. Buelow, “Experimental Demonstration of Probabilistically Shaped QAM,” Advanced Photonics 2017, OSA Technical Digest (online), paper SpM2F.6, 2017.
[41] I. F. de Jauregui Ruiz, A. Ghazisaeidi, R. Rios-Muller, and P. Tran, “Performance Comparison of Advanced Modulation Formats for Transoceanic Coherent Systems,” OFC, paper Th4D.6, 2017.
[42] F. Jardel, T. Eriksson, F. Buchali, W. Idler, A. Ghazisaeidi, C. Méasson, and J. Boutros, “Experimental Comparison of 64-QAM and Combined Geometric-Probabilistic Shaped 64-QAM,” ECOC, Tu.1.D.5, Sep. 2017.
[43] F. Buchali, et al., “Flexible Optical Transmission close to the Shannon Limit by Probabilistically Shaped QAM,” in Proc. OFC, paper M3C.3, Mar. 2017.
[44] G.P. Agrawal, “Nonlinear Fiber Optics,” 5th Edition, Academic Press, Oct. 2012.
[45] A. Mecozzi, C.B. Clausen, and M. Shtaif, “System impact of intrachannel nonlinear effects in highly dispersed optical pulse transmission,” IEEE Photon. Tech. Lett., vol. 12, no. 12, pp. 1633–1635, Dec. 2000.
[46] R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Properties of nonlinear noise in long dispersion-uncompensated fiber links,” Optics Express, vol. 21, no. 22, pp. 25685–25699, Oct. 2013.
[47] A. Carena, G. Bosco, V. Curri, Y. Jiang, P. Poggiolini, and F. Forghieri, “EGN model of nonlinear fiber propagation,” Optics Express, vol. 22, no. 13, pp. 16335–16362, May 2014.
[48] T. Eriksson, T. Fehenberger, P. Andrekson, M. Karlsson, N. Hanik, and E. Agrell, “Impact of 4D Channel Distribution on the Achievable Rates in Coherent Optical Communication Experiments,” IEEE J. Lightwave Technol., vol. 34, pp. 2256–2266, May 2016.
[49] R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Pulse Collision Picture of Inter-Channel Nonlinear Interference in Fiber-Optic Communications,” IEEE J. Lightwave Technol., vol. 34, no. 2, pp. 593–607, Jan. 2016.
[50] A. Ghazisaeidi, “A Theory of Nonlinear Interactions between Signal and Amplified Spontaneous Emission Noise in Coherent Wavelength Division Multiplexed Systems,” IEEE J. Lightwave Technol., vol. 35, no. 23, pp. 5150–5175, Dec. 2017.
[51] E. Awwad, Y. Jaouën, and G. Rekaya-Ben Othman, “Polarization-time coding for PDL mitigation in long-haul PolMux OFDM systems,” Optics Express, OSA, vol. 21, no. 19, pp. 22773–22790, 2013.
[52] A. Dumenil, E. Awwad, C. Méasson, “Polarization Dependent Loss: Fundamental Limits and How to Approach Them,” Signal Processing in Photonic Commun. Conf., New Orleans, Louisiana, USA, Jul. 2017.
[53] A. Dumenil, E. Awwad, C. Méasson, “Low-Complexity Polarization Coding for PDL-Resilience,” Accepted for publication, Th1F.5., 4071998, ECOC, Sep. 2018.
[54] M.P. Yankov, F. Da Ros, E.P. da Silva, S. Forchhammer, K.J. Larsen, L.K. Oxenløwe, M. Galili, and D. Zibar, “Constellation Shaping for WDM Systems Using 256QAM/1024QAM With Probabilistic Optimization,” IEEE J. Lightwave Technol., vol. 34, no. 22, pp. 5146–5156, Nov. 2016.
[55] J. Renner, T. Fehenberger, M.P. Yankov, F. Da Ros, S. Forchhammer, G. Böcherer, and N. Hanik, “Experimental Comparison of Probabilistic Shaping Methods for Unrepeated Fiber Transmission,” IEEE J. Lightwave Technol., vol. 35, no. 22, pp. 4871–4879, Nov. 2017.
[56] C. Pan and F. R. Kschischang, “Probabilistic 16-QAM shaping in WDM systems,” IEEE J. Lightwave Technol., vol. 34, no. 18, pp. 4285–4292, Jul. 2016.
[57] T. Fehenberger, A. Alvarado, G. Böcherer, and N. Hanik, “On probabilistic shaping of quadrature amplitude modulation for the nonlinear fiber channel,” IEEE J. Lightwave Technol., vol. 34, no. 21, pp. 5063–5073, Jul. 2016.

[58] U. Wachsmann, R. Fischer, and J.B. Huber, “Multilevel codes: Theoretical concepts and practical design rules,” IEEE Trans. Inf. Theory, vol. 45, no. 5, pp. 1361–1391, Jul. 1999.
[59] A. Guillén i Fàbregas, A. Martinez, and G. Caire, “Bit-Interleaved Coded Modulation,” Foundations and Trends in Communications and Information Theory, vol. 5, no. 1–2, pp. 1–153, 2008.
[60] A. Martinez, A. Guillén i Fàbregas, G. Caire, and F. Willems, “Bit-Interleaved Coded Modulation Revisited: A Mismatched Decoding Perspective,” IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2756–2765, Jun. 2009.
[61] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai (Shitz), “On information rates for mismatched decoders,” IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 1953–1967, Nov. 1994.
[62] A. Ganti, A. Lapidoth, and E. Telatar, “Mismatched decoding revisited: General alphabets, channels with memory, and the wideband limit,” IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2315–2328, Nov. 2000.
[63] R.J. Essiambre, G. Kramer, P.J. Winzer, G.J. Foschini, B. Goebel, “Capacity limits of optical fiber networks,” IEEE J. Lightwave Technol., vol. 28, no. 4, pp. 662–701, 2010.
[64] P. Larsson, “Golden ,” submitted for pub. in IEEE Wireless Comm. Let., Sep. 2017.
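
As a supplementary numerical check of the PAS rate relations of Appendix B (compatible rates and PAS information rate), the sketch below evaluates the splitting rate $r$ and the overall rate $R_T$ from a given triple $(R_C, R_{DM}, R_{AM})$ using exact rational arithmetic. It is an illustration under stated assumptions, not part of the original design flow: the function names `pas_rates` and `code_rate_from_split` and the example values $R_C = 5/6$, $R_{DM} = 3/4$ are hypothetical choices.

```python
from fractions import Fraction


def pas_rates(r_c, r_dm, r_am):
    """Return (r, R_T) implied by the compatibility constraint.

    R_T = R_C - R_AM + R_AM*R_C + R_AM*R_DM
    r   = 1 - R_AM*R_DM / R_T
    Raises ValueError when the rates are not compatible (r outside [0, 1)).
    """
    r_c, r_dm, r_am = Fraction(r_c), Fraction(r_dm), Fraction(r_am)
    r_t = r_c - r_am + r_am * r_c + r_am * r_dm
    if r_t <= 0:
        raise ValueError("incompatible rates: non-positive overall rate")
    r = 1 - r_am * r_dm / r_t
    if not 0 <= r < 1:
        raise ValueError("incompatible rates: splitting rate outside [0, 1)")
    return r, r_t


def code_rate_from_split(r, r_dm, r_am):
    """R_C = R_AM / (1 + R_AM) * (1 + r / (1 - r) * R_DM)."""
    r, r_dm, r_am = Fraction(r), Fraction(r_dm), Fraction(r_am)
    return r_am / (1 + r_am) * (1 + r / (1 - r) * r_dm)


if __name__ == "__main__":
    # Binary case with 8-PAM per real dimension (m = 3, R_AM = m - 1 = 2).
    m = 3
    r, r_t = pas_rates(Fraction(5, 6), Fraction(3, 4), m - 1)
    print("r =", r, "R_T =", r_t, "bits per real dimension")
    # Closed form for 2^m-PAM: R_T = m*R_C - (m - 1)*(1 - R_DM).
    print("closed form:", m * Fraction(5, 6) - (m - 1) * (1 - Fraction(3, 4)))
```

With $r = 0$ the function reproduces the lower bound $R_C = (m-1)/m$ stated for the binary case, and inconsistent triples are rejected rather than silently returning a splitting rate outside $[0, 1)$.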