AI Coding: Learning to Construct Error Correction Codes
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. XX, NO. X, OCTOBER 2019

arXiv:1901.05719v2 [cs.IT] 30 Oct 2019

Lingchen Huang, Member, IEEE, Huazi Zhang, Member, IEEE, Rong Li, Member, IEEE, Yiqun Ge, Member, IEEE, Jun Wang, Member, IEEE

Abstract—In this paper, we investigate an artificial-intelligence (AI) driven approach to the design of error correction codes (ECC). Classic error correction code design based upon coding-theoretic principles typically strives to optimize some performance-related code property such as minimum Hamming distance, decoding threshold, or subchannel reliability ordering. In contrast, AI-driven approaches, such as reinforcement learning (RL) and genetic algorithms, rely primarily on optimization methods to learn the parameters of an optimal code within a certain code family. We employ a constructor-evaluator framework, in which the code constructor can be realized by various AI algorithms and the code evaluator provides code performance metric measurements. The code constructor keeps improving the code construction to maximize the code performance evaluated by the code evaluator. As examples, we focus on RL and genetic algorithms to construct linear block codes and polar codes. The results show that comparable code performance can be achieved with respect to the existing codes. It is noteworthy that our method can provide superior performance to classic constructions in certain cases (e.g., under list decoding for polar codes).

Index Terms—Machine learning, Code construction, Artificial intelligence, Linear block codes, Polar codes.

Fig. 1: Error correction code design logic. (The figure links Code Performance and Code Construction through two branches: Coding Theory on the left and AI Techniques on the right.)

I. INTRODUCTION

Error correction codes (ECC) have been widely used in communication systems for data transmission over unreliable or noisy channels. In [1], Shannon provided the definition of channel capacity and proved the channel coding theorem:

"All rates below capacity C are achievable, that is, for arbitrarily small ε > 0 and rate R < C, there exists a coding system with maximum probability of error

λ ≤ ε    (1)

for sufficiently large code length n. Conversely, if λ → 0, then R ≤ C."

Following Shannon's work, great effort has been continuously devoted to designing ECC and their decoding algorithms to achieve or approach the channel capacity. Specifically, with a code rate smaller than the channel capacity, code construction and decoding algorithms are designed to improve the code performance. Equivalently, given a target error rate, we optimize the code design to maximize the achievable code rate, i.e., to approach the channel capacity.

A. Code design based on coding theory

Classical code construction design is built upon coding theory, in which code performance is analytically derived in terms of various types of code properties. To tune these properties is to control the code performance, so that code design problems are translated into code property optimization problems.

Hamming distance is an important code property for linear block codes of all lengths. For short codes, it is the dominant factor in performance when maximum-likelihood (ML) decoding is feasible. For long codes, it is also important for error performance in the high signal-to-noise ratio (SNR) regime.

A linear block code can be defined by a generator matrix G or the corresponding parity check matrix H over finite fields. Guided by the knowledge of finite field algebra, the distance profile of linear block codes can be optimized, and in particular the minimum distance. Examples include Hamming codes, Golay codes, Reed-Muller (RM) codes, quadratic residue (QR) codes, Bose-Chaudhuri-Hocquenghem (BCH) codes, and Reed-Solomon (RS) codes.

Similar to the Hamming distance profile, free distance, another code property, is targeted for convolutional codes.

L. Huang, H.
Zhang, R. Li and J. Wang are with the Hangzhou Research Center, Huawei Technologies, Hangzhou, China. Y. Ge is with the Ottawa Research Center, Huawei Technologies, Ottawa, Canada. (Email: {huanglingchen, zhanghuazi, lirongone.li, yiqun.ge, justin.wangjun}@huawei.com).

Convolutional codes [2] are characterized by the code rate and the memory order of the encoder. By increasing the memory order and selecting proper polynomials, a larger free distance can be obtained at the expense of encoding and decoding complexity. Turbo codes [3], formed by concatenating convolutional codes in parallel, are the first capacity-approaching codes under iterative decoding.

In addition to the distance profile, the code properties of decoding threshold and girth are adopted for the design of low density parity check (LDPC) codes. First investigated in 1962 [4], they are defined by low density parity check matrices or, equivalently, Tanner graphs. Three decades later, LDPC codes were re-discovered [5] and shown to approach the capacity with belief propagation (BP) decoding on sparse Tanner graphs. Code structures, such as cyclic and quasi-cyclic (QC) [6], not only provide minimum distance guarantees but also simplify hardware implementation.

In communication systems, AI-driven transceivers have been studied. By treating an end-to-end communication system as an autoencoder, researchers proposed to optimize an entire transceiver jointly, given a channel model, without any expert knowledge about channel coding and modulation [20]. In addition, for the AWGN channel with feedback, a recurrent neural network (RNN) was used to jointly optimize the encoding and decoding [21]. By regarding a channel decoder as a classifier, it was reported that one-shot NN-based decoding could approach maximum a posteriori (MAP) performance for short codes [22]. It was also observed that for structured codes, the NN-based decoder can be generalized to some
untrained codewords, even though no knowledge about the code structure is explicitly taught. However, this NN-based decoder is a classifier in nature, and its complexity is proportional to the number of codewords to be classified. As the code space expands exponentially with the code length, the NN-based decoder may not satisfy the stringent latency constraint of the physical layer.

The most relevant code property, however, is the decoding threshold [7]. Assuming BP decoding on a cycle-free Tanner graph, it can accurately predict the asymptotic performance. The decoding threshold can be obtained by the extrinsic information transfer (EXIT) chart [8] technique, as well as by density evolution (DE) [7], [9] and its Gaussian approximation (GA) [10], as a function of the check node and variable node degree distributions of the parity check matrix. Thus, the degree distributions of a code ensemble can be optimized. In addition, the girth, defined as the minimum cycle length in the Tanner graph, is maximized to best satisfy the cycle-free assumption.

The code property of synthesized subchannel reliabilities can be targeted for the design of polar codes [11], the first class of capacity-achieving codes with successive cancellation (SC) decoding. For polar codes, physical channels are synthesized into N polarized subchannels, with the K most reliable ones selected to carry information bits. As N increases, the subchannels polarize to either purely noiseless or completely noisy, where the fraction of noiseless subchannels approaches the channel capacity [11]. For the binary erasure channel (BEC), subchannel reliabilities can be efficiently calculated by Bhattacharyya parameters. For general binary-input memoryless channels, DE was applied to calculate subchannel reliabilities [12], [13], and was then improved in [14] and analyzed in [15] in terms of complexity. For AWGN channels, GA was proposed [16] to further reduce the complexity with negligible performance loss. Recently, a polarization weight (PW) method [17], [18] was proposed to generate a universal reliability order for all code rates, lengths and channel conditions. Such a channel-independent design principle is adopted by 5G in the form of a length-1024 reliability sequence [19].

As concluded in Fig. 1 (left branch), the classical code

In contrast, our work focuses on using AI techniques to help design codes rather than to directly encode and decode signals. Code design can be done offline, where the latency constraint is significantly relaxed. Moreover, we can continue to use legacy encoding and decoding algorithms, as they admit efficient and flexible hardware or software implementations.

As shown in Fig. 1 (right branch), we explore alternative AI techniques for code construction, in addition to the expert knowledge from coding theory. There are numerous AI algorithms, which cannot all be covered in this paper. We will focus on reinforcement learning (RL) and genetic algorithms as representatives. Nevertheless, we work within a general framework that may accommodate various AI algorithms.

Specifically, we hope to answer the following two questions:
• Q1: Can AI algorithms independently learn a code construction (or part of it) within a given general encoding/decoding framework?
• Q2: Can the learned codes achieve comparable or better error correction performance with respect to those derived in classic coding theory?

For the first question, we try to utilize AI techniques to learn linear code constructions. By code construction we mean the degree of freedom we have in determining a set of codes beyond the minimum constraints required to specify a general scope of codes. The input to AI is restricted
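As a concrete illustration of the classical BEC polar construction mentioned earlier in this section (subchannel reliabilities efficiently calculated by Bhattacharyya parameters), the sketch below applies the standard recursion Z⁻ = 2Z − Z² and Z⁺ = Z² for the binary erasure channel and selects the K most reliable subchannels as the information set. This is a minimal sketch for background only; the function and parameter names are our own illustration, not from the paper.

```python
def bec_bhattacharyya(n_stages: int, erasure_prob: float) -> list[float]:
    """Bhattacharyya parameters Z of the N = 2**n_stages synthesized
    subchannels of a polar code over BEC(erasure_prob)."""
    z = [erasure_prob]
    for _ in range(n_stages):
        nxt = []
        for zi in z:
            nxt.append(2 * zi - zi * zi)  # degraded subchannel: Z(W-) = 2Z - Z^2
            nxt.append(zi * zi)           # upgraded subchannel: Z(W+) = Z^2
        z = nxt
    return z

def information_set(n_stages: int, k: int, erasure_prob: float) -> list[int]:
    """Indices of the K subchannels with the smallest Z (most reliable),
    i.e. the positions selected to carry information bits."""
    z = bec_bhattacharyya(n_stages, erasure_prob)
    return sorted(sorted(range(len(z)), key=lambda i: z[i])[:k])

# Example: N = 8, K = 4 over BEC(0.5); the most reliable subchannels
# cluster toward the high indices, as polarization predicts.
print(information_set(3, 4, 0.5))  # → [3, 5, 6, 7]
```

For the BEC the recursion is exact and conservative (Z⁻ + Z⁺ = 2Z), so the sketch reproduces the reliability ordering that DE [12], [13] and GA [16] approximate for general channels.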