arXiv:1901.05719v2 [cs.IT] 30 Oct 2019

AI Coding: Learning to Construct Error Correction Codes

Lingchen Huang, Member, IEEE, Huazi Zhang, Member, IEEE, Rong Li, Member, IEEE, Yiqun Ge, Member, IEEE, Jun Wang, Member, IEEE

Abstract—In this paper, we investigate an artificial-intelligence (AI) driven approach to design error correction codes (ECC). Classic code design based upon coding-theoretic principles typically strives to optimize some performance-related code property, such as minimum Hamming distance, decoding threshold, or subchannel reliability ordering. In contrast, AI-driven approaches, such as reinforcement learning (RL) and genetic algorithms, rely primarily on optimization methods to learn the parameters of an optimal code within a certain code family. We employ a constructor-evaluator framework, in which the code constructor can be realized by various AI algorithms and the code evaluator provides code performance metric measurements. The code constructor keeps improving the code construction to maximize the code performance that is evaluated by the code evaluator. As examples, we focus on RL and genetic algorithms to construct linear block codes and polar codes. The results show that comparable code performance can be achieved with respect to the existing constructions. It is noteworthy that our method can provide superior performances to the classic constructions in certain cases (e.g., list decoding for polar codes).

Index Terms—Machine learning, Code construction, Artificial intelligence, Linear block codes, Polar codes.

I. INTRODUCTION

Error correction codes (ECC) have been widely used in communication systems for data transmission over unreliable or noisy channels. In [1], Shannon provided the definition of channel capacity and proved the channel coding theorem. Specifically, with a code rate smaller than the channel capacity, reliable transmission is achievable:

"All rates below capacity C are achievable; that is, for arbitrarily small ε > 0 and rate R < C, there exists a coding system with maximum probability of error λ ≤ ε for sufficiently large code length n. Conversely, if λ → 0, then R ≤ C."   (1)

Following Shannon's work, great effort has been continuously devoted to designing ECC and their decoding algorithms to approach or achieve the channel capacity.

(L. Huang, H. Zhang, R. Li and J. Wang are with the Hangzhou Research Center, Huawei Technologies, Hangzhou, China. Y. Ge is with the Ottawa Research Center, Huawei Technologies, Ottawa, Canada. Email: {huanglingchen, zhanghuazi, lirongone.li, yiqun.ge, justin.wangjun}@huawei.com.)

Fig. 1: Error correction code design logic (code construction and code performance, bridged by coding theory on the left branch and by AI techniques on the right branch).

A. Code design based on coding theory

Classical code construction design is built upon coding theory, in which code performance is analytically derived in terms of various types of code properties. To design codes, the derived code properties are tuned, so that the code design problems are translated into code property optimization problems. Equivalently, given a target error rate, we optimize the code rate, i.e., maximize the achievable rate to approach the channel capacity.

Hamming distance is an important code property for linear block codes of all lengths. For short codes, it is the dominant factor in performance when maximum-likelihood (ML) decoding is feasible. For long codes, it is also important for the performance in the high signal-to-noise ratio (SNR) regime. A linear block code can be defined by a generator matrix G or the corresponding parity check matrix H over finite fields. Directed by the knowledge of finite field algebra, the distance profile of linear block codes can be optimized, in particular, the minimum distance. Examples include Golay codes, Hamming codes, quadratic residue (QR) codes, Reed-Muller (RM) codes, Bose-Chaudhuri-Hocquenghem (BCH) codes, Reed-Solomon (RS) codes, etc.

Similar to the Hamming distance profile, free distance, another code property, is targeted for the design of convolutional codes [2]. Convolutional codes are characterized by the code rate and the memory order of the encoder. By selecting proper polynomials and increasing the memory order, a larger free distance can be obtained at the expense of encoding and decoding complexity. Turbo codes [3], by concatenating convolutional codes in parallel, are the first capacity-approaching codes under iterative decoding.

In addition to the distance profile, the code properties of decoding threshold and girth are adopted for the design of low density parity check (LDPC) codes. First investigated in 1962 [4], they are defined by low density parity check matrices, or equivalently, Tanner graphs. Three decades later, LDPC codes were re-discovered [5] and shown to approach the capacity with belief propagation (BP) decoding on sparse Tanner graphs. Code structures, such as cyclic and quasi-cyclic (QC) [6], not only provide a minimum distance guarantee but also simplify hardware implementation. The most relevant code property, however, is the decoding threshold [7]. Assuming BP decoding on a cycle-free Tanner graph, it can accurately predict the asymptotic performance. The decoding threshold can be obtained by the extrinsic information transfer (EXIT) chart [8] technique, as well as density evolution (DE) [7], [9] and its Gaussian approximation (GA) [10], as a function of the check node and variable node degree distributions of the parity check matrix. Thus, the degree distributions of a code ensemble can be optimized. In addition, the girth, defined as the minimum cycle length in the Tanner graph, is maximized to maximally satisfy the cycle-free assumption.

The code property of synthesized subchannel reliabilities can be targeted for the design of polar codes [11], the first class of capacity-achieving codes with successive cancellation (SC) decoding. For polar codes, physical channels are synthesized to N polarized subchannels, with the K most reliable ones selected to carry information bits. As N increases, subchannels polarize to either purely noiseless or completely noisy, where the fraction of noiseless subchannels approaches the channel capacity [11]. For the binary erasure channel (BEC), subchannel reliabilities can be efficiently calculated by Bhattacharyya parameters. For general binary-input memoryless channels, DE was applied to calculate subchannel reliabilities [12], [13], and then improved in [14] and analyzed in [15] in terms of complexity. For AWGN channels, GA was proposed [16] to further reduce complexity with negligible performance loss. Recently, a polarization weight (PW) method [17], [18] was proposed to generate a universal reliability order for all code rates, lengths and channel conditions. Such a channel-independent design principle is adopted by 5G in the form of a length-1024 reliability sequence [19].

As concluded in Fig. 1 (left branch), the classical code design philosophy relies on coding theory (e.g., finite field theory, density evolution) as a bridge between code performance and code construction.

B. Code design based on AI

Recently, AI techniques have been widely applied to many industry and research domains, thanks to advances in algorithms, an abundance of data, and improvements in computational capabilities.

In communication systems, AI-driven transceivers have been studied. By treating an end-to-end communication system as an autoencoder, people proposed to optimize an entire transceiver jointly, given a channel model, without any expert knowledge about channel coding and modulation [20]. In addition, for the AWGN channel with feedback, a recurrent neural network (RNN) was used to jointly optimize the encoding and decoding [21]. By regarding a channel decoder as a classifier, it was reported that a one-shot NN-based decoding could approach maximum a posteriori (MAP) performance for short codes [22]. It was also observed that for structured codes, the NN-based decoder can be generalized to some untrained codewords, even though no knowledge about the code structure is explicitly taught. However, this NN-based decoder is a classifier in nature, and its complexity is proportional to the number of codewords to be classified. As the code space expands exponentially with the code length, the NN-based decoder may not satisfy the stringent latency constraint in communications.

In contrast, our work focuses on using AI techniques to help design codes rather than to directly encode and decode signals. Code design can be done offline, where the latency constraint is significantly relaxed. Moreover, we can continue to use legacy encoding and decoding algorithms, as they admit efficient and flexible hardware or software implementations. As shown in Fig. 1 (right branch), we explore alternative AI techniques for code construction, in addition to the expert knowledge from coding theory. There are numerous AI algorithms out there, which cannot all be covered in this paper. We will focus on reinforcement learning (RL) and genetic algorithms as representatives. Nevertheless, we work within a general framework that may accommodate various AI algorithms.

Specifically, we hope to answer the following two questions:
• Q1: Can AI algorithms independently learn a code construction (or part of it) within a given general encoding/decoding framework?
• Q2: Can the learned codes achieve comparable or better error correction performance with respect to those derived in classic coding theory?

For the first question, we try to utilize AI techniques to learn code constructions. By code construction we mean the degree of freedom we have in determining a set of codes beyond the minimum constraints required to specify a general scope of codes. The input to AI is restricted to the code performance metric measured by an evaluator (viewed as a black box) that implements off-the-shelf decoders under a specific channel condition. Therefore, AI knows neither the internal mechanism of the decoders nor the code properties, so that code construction beyond the encoding/decoding constraints is learned without the expert knowledge in coding theory.

For the second question, we demonstrate that AI-driven techniques provide solutions that perform better when expert knowledge fails to guarantee optimal code performance. These cases include (i) a target decoder is impractical, or (ii) the relation between code performance and code properties is not theoretically analyzable (incomplete or inaccurate). For example, although the minimum distance of short linear block codes can be optimized, the resulting optimality of code performance is only guaranteed under maximum likelihood (ML) decoding at high SNRs. In reality, ML decoders may be impractical to implement, and a high SNR may not always be available. As for polar codes, existing theoretical analysis mainly focuses on SC decoders. To the best of our knowledge, the widely deployed successive cancellation list (SCL) decoders still lack a rigorous performance analysis, even though there have been some heuristic constructions optimized for SCL [23], [24], and its variants CRC-aided SCL (CA-SCL) [19] and parity-check SCL [25]–[27]. In the absence of a theoretical guarantee, the open question is whether the current code constructions are optimal for a specific decoder. We will address these cases and show that AI techniques can deliver comparable or better performances.

It is worth noting that a recent approach [28] also focuses on using AI techniques for code design rather than decoding. In their work, the off-the-shelf polar decoder is embedded in the code optimization loop. Only the code construction was optimized while the encoding and decoding remain the same, which allows efficient implementation of legacy encoding and decoding algorithms. Specifically, the proposed code optimization method is based on the genetic algorithm, where code constructions evolve via evolutionary transformations based on their error performance.

In this paper, we employ a general constructor-evaluator framework to design error correction codes. AI techniques are investigated under this framework. The performance of the constructed codes is compared with the state-of-the-art classical ones, and the results are discussed. The structure of this paper is as follows. Section II introduces the constructor-evaluator framework. As instances, the constructor is implemented by RL and genetic algorithms. The evaluator implements the decoder and channel conditions of interest and provides performance metric measurements for given code constructions. Section III shows examples of designing linear block codes and polar codes under the proposed framework. Section IV concludes the paper and discusses some future works.

II. CODE CONSTRUCTION BASED ON LEARNING

A. The constructor-evaluator framework

We advocate a code design framework, as shown in Fig. 2. The framework consists of two parts, i.e., a code constructor and a code evaluator. The code constructor iteratively learns a series of valid code constructions based on the performance metric feedback from the code evaluator. The code evaluator provides an accurate performance metric calculation/estimation for a code construction under the decoder and channel condition of interest. The code construction keeps improving through the iterative interactions between the constructor and evaluator until the performance metric converges.

Fig. 2: Constructor-evaluator framework (the constructor sends a code construction to the evaluator; the evaluator feeds a performance measure back).

The constructor-evaluator framework in Fig. 2 is quite general. The code constructor knows neither the internal mechanism nor the channel condition adopted by the code evaluator, but requests the code evaluator to feed back an accurate performance metric of its current code construction under the evaluator-defined environment, by which the exploration of possible code constructions opens for a wide range of decoding algorithms and channel conditions. Similar ideas have been proposed in existing works. For example, the differential evolution algorithm was used to optimize the degree distribution of LDPC codes under both the erasure channel [29] and the AWGN channel [7]. That algorithm also treats the optimization problem as a black box, and merely relies on the cost function (e.g., decoding threshold) as feedback.

In most cases, the code constructor is trained offline, because both coded bits and decoding results can be efficiently generated, and training computation is not an issue in an offline simulator. Once constructed, e.g., when the performance metric converges, the resultant codes can be directly implemented in a low-complexity practical system with legacy encoding and decoding algorithms, which is applicable from the industry's point of view.

Note that the constructed codes are closely related to the code evaluator through the performance metric. Different code evaluators, e.g., with different decoding parameters, channel condition assumptions, or decoding algorithms, may result in different code constructions in the end. During the training procedure, we choose some realistic decoding algorithms and channel conditions. In theory, online training is a natural extension, by collecting performance metric samples from a real-world decoder and continuing to improve the code construction.

B. Constructor

The code constructor generates code constructions based on the performance metric feedback from the code evaluator. To fit the code design problem into the AI algorithms, a code construction is defined under the constraints of code representations. For example,
• Binary matrix: the generator matrix for any linear block codes, or the parity-check matrix for LDPC codes. This is the most general form of definition.

• Binary vector: a more succinct form of definition for some codes, including the generator polynomials for convolutional codes, or whether a synthesized subchannel is frozen for polar codes.
• Nested representation: defining a set of codes in a set of nested matrices or vectors. This bears practical importance due to low implementation complexity and rate compatibility. Examples include LTE-RM codes and 5G Polar codes.

According to the code representations and how the construction improvement procedure is modeled, there are several approaches to implement the constructor:

1) Reinforcement learning approach: the RL approach can be used because we model the construction procedure as a Markov decision process (MDP). An MDP is defined by a 4-tuple (S, A, P_a, R):
• S is the state space,
• A is the action space,
• P_a(s, s′) = Pr(s_{t+1} = s′ | s_t = s, a_t = a) is the probability that action a in state s at time t will lead to state s′ at time t + 1,
• R is the immediate reward fed back by the code evaluator after transitioning from state s to state s′, triggered by action a.

Code construction can be viewed as a decision process in general. For the binary matrix and binary vector code representations, the decisions correspond to which positions are 0 or 1. For the nested code representation, the decisions correspond to how to evolve from a set of subcodes to their supercodes (or vice versa). In our setting, a state s corresponds to a (valid) code construction, and an action a that leads to the state transition (s → s′) corresponds to a modification to the previous construction s. The state transition (s → s′) herein is deterministic given the action a and the previous state s. The reward function is the performance metric measurement with respect to the decoder and channel condition of interest. In the end, a desired code construction can be obtained from the final state of the MDP.

There are several classical algorithms to solve an MDP that are model-free, i.e., they do not require a model of the code evaluator. These include:

• Q-learning [30]: given the architecture of a code construction, its potential code construction schemes correspond to a finite set of discrete states and action spaces. In Q-learning, a table Q(s,a) is maintained and updated to record an expected total reward metric for taking action a at state s. At the learning stage, an ε-greedy approach is often used to diversify the exploration in the state and action space. When a state transition (s, a, s′, R) has been explored, the table Q(s,a) can be updated by:

ΔQ(s,a) = α_Q · [R + γ · max_{a′} Q(s′,a′) − Q(s,a)],   (2)

where α_Q is the learning rate, γ is the reward discount factor, and R is the reward from the evaluator. After sufficient exploration, the table Q(s,a) is then used to guide the MDP to maximize the total reward.

• Policy Gradient (PG) [31]: if the architecture of a code construction translates into an immense set of states and actions, we consider continuous state and action spaces. PG defines a differentiable policy function π_{θ_PG}(s,a), parameterized by θ_PG, to select the action at each state. The policy function π_{θ_PG}(s,a) outputs a probability density/mass function of taking each action a at state s according to the policy π_{θ_PG}. Then the next state s′ is determined and the reward R is evaluated by the code evaluator. When a complete episode (s_0, a_0, R_0, s_1, a_1, R_1, ..., s_t, a_t, R_t, ..., s_{T−1}, a_{T−1}, R_{T−1}) is explored, where t is the time stamp and T is the time horizon length, the policy function is updated:

Δθ_PG = α_PG · Σ_{t=0}^{T−1} [ ∇_{θ_PG} log π_{θ_PG}(s_t, a_t) · Σ_{t′=t}^{T−1} R_{t′} ].   (3)

After sufficient exploration, the policy function π_{θ_PG} can be used to lead the MDP to maximize the total reward.

• Advantage Actor Critic (A2C) [32]: A2C merges the idea of a state value function into PG to take advantage of stepwise updates, and speeds up the convergence. In addition to the policy (actor) function π_{θ_A}(s,a), A2C defines a differentiable value (critic) function V_{θ_C}(s). The interaction between A2C and the code evaluator is similar to that of PG. For A2C, the policy (actor) update can be more frequent, i.e., in a stepwise manner, since the cumulative reward from s_t, Σ_{t′=t}^{T−1} R_{t′}, is estimated by the critic function. At each state transition exploration (s, a, s′, R), the advantage value is calculated:

Adv(s, s′, R) = R + γ · V_{θ_C}(s′) − V_{θ_C}(s).   (4)

Then the actor function π_{θ_A} can be adjusted by:

Δθ_A = α_A · Adv(s, s′, R) · ∇_{θ_A} log π_{θ_A}(s,a).   (5)

The critic function V_{θ_C} can be updated by:

Δθ_C = α_C · Adv(s, s′, R) · ∇_{θ_C} V_{θ_C}(s).   (6)

By viewing code construction as a decision process, its influence on code performance can be modeled (or approximated) as differentiable functions that can be realized by neural networks. A code construction (to be optimized) is embedded in the coefficients of the neural networks. Due to the excellent function approximation capability of neural networks, these coefficients can be learned through optimization techniques such as gradient descent.
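To make the tabular case concrete, a minimal sketch of the Q-learning update (2) with ε-greedy exploration is given below. The state encoding (e.g., a tuple describing a code construction), the finite action set, and the evaluator producing the reward are hypothetical placeholders, not the implementation used in this paper.

```python
# Minimal sketch of the Q-learning update (2); states are assumed hashable
# and `actions` is a finite set of allowed construction modifications.
import random
from collections import defaultdict

alpha_q, gamma, eps = 0.1, 0.9, 0.1      # learning rate, discount, exploration
Q = defaultdict(float)                   # Q[(state, action)], zero-initialized

def q_update(s, a, s_next, reward, actions):
    # Equation (2): move Q(s,a) toward R + gamma * max_a' Q(s',a')
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha_q * (reward + gamma * best_next - Q[(s, a)])

def epsilon_greedy(s, actions):
    # Explore a random action with probability eps, otherwise exploit the table
    if random.random() < eps:
        return random.choice(list(actions))
    return max(actions, key=lambda a: Q[(s, a)])
```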
2) Genetic algorithm approach: we observe that a code construction can usually be decomposed into many discrete decisions. For the binary matrix and binary vector code representations, the decisions on which positions are 0 or 1 can be made individually. These decisions may collaboratively contribute to the overall code performance. They resemble the "chromosomes" of a code construction, where a set of good decisions is likely to produce good code constructions. The process of refining these decisions can be defined in iterative steps, where each step produces a better construction based on several candidate constructions. Genetic algorithm is well-suited for this purpose.

Below, we briefly describe how a genetic algorithm can be applied to the code design.
1) A number of code constructions {C1, C2, ...} are randomly generated, defined as the initial population.
2) A subset of (good) code constructions are selected as parents, e.g., Cp1, Cp2.
3) New constructions (offspring) are produced from crossover among parents, e.g., {Cp1, Cp2} → Co1.
4) Offspring go through a random mutation to introduce new features, e.g., Co1 → Cm1.
5) Finally, good offspring replace bad ones in the population, and the process repeats.

The above operations are defined in the context of error correction codes. Regarding a code definition C, it may boil down to a set of chromosomes (binary vectors and matrices) accordingly.

Crossover is defined as taking part of the chromosomes from each parent, and combining them into an offspring. This step resembles "reproduction" in biological terms, in which offspring are expected to inherit some good properties from their parents. Subsequently, mutation randomly alters some of the chromosomes to encourage exploration during evolution. A fitness function is defined to indicate whether the newly produced offspring are good or bad. In this work, the fitness is defined as the code performance.

C. Evaluator

The evaluator provides performance metric measurements for code constructions. If the performance metric of the decoder is analyzable, it can be directly calculated. In most cases, the error correction performance estimation in terms of block error rate (BLER) can be performed based on sampling techniques, such as the Monte-Carlo (MC) method. To ensure that the estimation is accurate and thereby does not mislead the constructor, sufficient MC sampling (simulation) should be performed to control the estimation confidence level. If the designed code is to work within a range of BLER levels, the performance metric measurements can merge the error correction performance at several SNR points. Measuring more than one SNR point provides better control over the slope of the BLER curve, at the cost of longer evaluation time. In addition, power consumption and implementation complexity for the encoding and the corresponding decoding can also be factored into the performance metric measurements.

Intuitively, the evaluator can be stationary. The decoding algorithm, including its parameters, and the channel statistics can be static. Then the code design is preferably realized offline, and is not very sensitive to the code design time consumption. On the other hand, the evaluator can be non-stationary. For example, the channel statistics can be time-varying. Then online design may be required. In this case, a feedback link is required for performance measurement of each code construction. The communication cost and code design time consumption should be considered as well.
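As an illustration of such a sampling-based evaluator, a minimal Monte-Carlo BLER estimator is sketched below. `encode` and `decode` stand in for the legacy encoder and the off-the-shelf decoder under test (both hypothetical here); the channel is BPSK over AWGN at a given Es/N0, and the loop runs until enough block errors are collected to control the confidence level.

```python
# A minimal Monte-Carlo BLER evaluator sketch (BPSK over AWGN).
import numpy as np

def bler_monte_carlo(encode, decode, k, esn0_db, min_errors=1000, max_blocks=10**7):
    sigma = np.sqrt(0.5 * 10 ** (-esn0_db / 10))   # noise std per real dimension
    rng = np.random.default_rng()
    errors = blocks = 0
    while errors < min_errors and blocks < max_blocks:
        info = rng.integers(0, 2, k)               # random information bits
        tx = 1.0 - 2.0 * encode(info)              # BPSK mapping: 0 -> +1, 1 -> -1
        llr = 2.0 * (tx + sigma * rng.normal(size=tx.size)) / sigma**2
        errors += int(not np.array_equal(decode(llr), info))
        blocks += 1
    return errors / blocks                         # BLER estimate
```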
III. LEARNING CODE REPRESENTATIONS

In this section, we present several code design examples in which the code constructions are automatically generated by the constructor-evaluator framework. Specifically, we propose three types of AI algorithms to learn the code constructions under the three definitions mentioned in II-B:
• Binary matrix → policy gradient: we provide an example of short linear block codes.
• Binary vector → genetic algorithm: we focus on polar codes with a fixed length and rate.
• Nested representation → A2C algorithm: we design nested polar codes within a range of code rates with the proposed scheme.

Although we provide three specific code examples, the AI algorithms are generic. That means all codes defined by the above three representations can be learned by the proposed methods.

We use the following hardware platform and software environment. For the reinforcement learning approach (including policy gradient and A2C), we use one Tesla V100 GPU and one 28-thread Xeon Gold CPU to accomplish the learning within the Tensorflow framework. For the genetic algorithm approach, the program is written in Matlab and C/C++ and runs on a server that contains 4 Intel Xeon(R) E5-4627v2 (16M Cache, 3.30 GHz) CPUs with 12 cores and 256 GB RAM. For all the AI algorithms, we did not pay extra attention to the optimization of hyperparameters, as they seem not to affect the results very much. For a proof of concept, binary codes and the AWGN channel are considered in this work. Codewords are encoded from randomly generated information bits. It is shown that these learned codes can achieve at least as good error correction performance as the state-of-the-art codes.

A. Binary matrix: linear block codes

The definition of a linear block code of dimension K and length N is a binary generator matrix G in a standard form (i.e., G = [I, P], where I is an identity matrix of size K × K and P is a matrix of size K × (N − K)). For the decoder, a class of most-reliable-basis (MRB) reprocessing based decoders can achieve near-ML performance. This class of decoders uses the reliability measures at the decoder input to determine an information set consisting of the K most reliable independent positions. One efficient MRB reprocessing based decoder is the ordered statistic decoding (OSD) [33]. By incorporating the box and match algorithm (BMA) into OSD [34], the computational complexity can be reduced at the cost of additional memory for a realistic implementation. Therefore, our code evaluator deploys a BMA decoder in this example.
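The MRB stage shared by OSD and BMA can be sketched as follows: sort positions by reliability, then use Gaussian elimination over GF(2) to pick the K most reliable independent columns of G. This is a generic illustration of the idea, assuming an LLR vector `llr`; it is not the BMA implementation used by our evaluator.

```python
# Sketch of the most-reliable-basis (MRB) selection behind OSD/BMA.
import numpy as np

def mrb_information_set(G, llr):
    K, N = G.shape
    order = np.argsort(-np.abs(llr))          # column indices, most reliable first
    work = (G[:, order] % 2).astype(np.uint8)
    basis, row = [], 0
    for col in range(N):
        if row == K:                          # K independent positions found
            break
        pivots = row + np.flatnonzero(work[row:, col])
        if pivots.size == 0:                  # dependent on columns chosen so far
            continue
        work[[row, pivots[0]]] = work[[pivots[0], row]]   # bring a pivot row up
        for r in range(K):
            if r != row and work[r, col]:
                work[r] ^= work[row]          # eliminate over GF(2)
        basis.append(int(order[col]))
        row += 1
    return basis                              # the K most reliable independent positions
```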

The linear block code construction is modeled by a single-step MDP with a PG algorithm, as shown in Fig. 3. The state, action and reward are introduced in Section II-B1, and detailed as follows.

• The input state s_0 is a code construction defined as a binary generator matrix of size K × N. To impose a standard form of the generator matrix, the input state is always set as s_0 = [I_K, 0], where I_K is an identity matrix of size K × K and the parity part P = 0 is an all-zero matrix of size K × (N − K).
• A Gaussian policy function π_{θ_PG} is implemented by a multilayer perceptron (MLP) neural network, defined by NeuralNet(θ_PG), with two hidden layers and sigmoid output nonlinearity:
  – The hidden layer width is 2K(N − K), a function of the code dimension and code length. Here, we set it as twice the size of the neural network output.
  – The coefficients in the neural network are defined by θ_PG.
  – The output of the neural network is a real-valued matrix μ of size K × (N − K), which defines the policy function π_{θ_PG}. It is used to determine the parity part P of the generator matrix, to be described shortly.
• An action a is sampled from the Gaussian policy function π_{θ_PG} as follows. Specifically, a is a real-valued matrix of size K × (N − K), where each a_{i,j} is drawn from a Gaussian distribution with mean μ_{i,j} and variance σ² = 0.1. Then, the action a is quantized to a binary-valued matrix P. The probability of taking this action is recorded as π′_{θ_PG}(s_0, a).
• The output state s_1 is updated as [I_K, P], which is the generator matrix of the constructed code.
• The reward R is defined as −EsN0, the required EsN0 to achieve BLER = 10⁻² under BMA decoding. It is also the feedback to the policy function.
  – The BLER performance of code s_1 is defined by BMA(s_1, EsN0, BMA_o, BMA_s), where EsN0 is the SNR point, BMA_o is the order, and BMA_s is the control band size. See details in [34].

Fig. 3: Framework of learning linear block codes by policy gradient (the RL policy function maps the initial construction to a designed construction; the BMA decoding based evaluator feeds the performance measure back to the constructor).

The PG algorithm for linear block code construction is also described in Algorithm 1, with parameters listed in Table I.

Algorithm 1 Policy gradient based linear block code design
// Initialization:
Randomly initialize the policy function π_{θ_PG};
Set initial state s_0 = [I_K, 0];
// Loop:
while 1 do
  (s_1, π′_{θ_PG}(s_0, a)) ← Constructor(s_0)
  R ← Evaluator(s_1)
  Δθ_PG ← SGD(π_{θ_PG}, π′_{θ_PG}(s_0, a), R)
  θ_PG ← (θ_PG + Δθ_PG)
end while
// Constructor:
function (s_1, π′_{θ_PG}(s_0, a)) = Constructor(s_0)
  μ ← NeuralNet(θ_PG)
  for i = {1, ..., K} do
    for j = {1, ..., N − K} do
      a_{i,j} ∼ N(μ_{i,j}, σ²), where σ² = 0.1;
      P_{i,j} = (a_{i,j} > 0.5) ? 1 : 0;
    end for
  end for
  s_1 ← [I_K, P];
  π′_{θ_PG}(s_0, a) ← f_N(a | μ, σ²);
  return s_1, π′_{θ_PG}(s_0, a)
end function
// Evaluator:
function R = Evaluator(s_1)
  Obtain EsN0 such that BLER = 0.01 ← BMA(s_1, EsN0, BMA_o, BMA_s);
  R = −EsN0;
  return R
end function

TABLE I: Policy Gradient algorithm parameters
Policy function hidden layer number: 2
Policy function hidden layer width: 2K(N − K)
Batch size: 1024
Learning rate: 10⁻⁵
Reward: −EsN0
Decoder: BMA

To optimize the policy function, the coefficients θ_PG in the neural network are trained by mini-batch based stochastic gradient descent (SGD) according to (3). Here, we directly apply "AdamOptimizer", a method integrated in Tensorflow, to update θ_PG. For each policy function update step, after exploring a batch of action-reward pairs, the rewards are first normalized before equation (3) is applied. Such processing reduces gradient estimation noise and enables a faster convergence speed. Fig. 4 shows the evolution of the average EsN0 w.r.t. the BLER of 10⁻² per iteration during the learning procedure. We observe that the average EsN0 improves in a stair-like manner. This is because the SGD-based PG algorithm tends to pull the mean of the normal distribution, from which the action is sampled, towards a local optimal point. In the vicinity of this local optimal point, explorations are conducted randomly, which accounts for the performance fluctuations as if it has converged. Due to the high dimension of the action space, a small-probability exploration would be helpful to avoid local optima.
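The constructor step of Algorithm 1 can be sketched compactly as below, assuming a hypothetical `policy_net` that returns the K × (N − K) mean matrix μ. The log-probability of the sampled action is kept (up to an additive constant, which does not affect the gradient) for use in the update (3).

```python
# Sketch of the Gaussian-policy constructor in Algorithm 1.
import numpy as np

def construct(policy_net, K, sigma=np.sqrt(0.1), rng=np.random.default_rng()):
    mu = policy_net()                                   # K x (N-K) means, NeuralNet(theta_PG)
    a = rng.normal(mu, sigma)                           # a_{i,j} ~ N(mu_{i,j}, 0.1)
    P = (a > 0.5).astype(int)                           # quantize to the binary parity part
    G = np.hstack([np.eye(K, dtype=int), P])            # standard-form generator [I_K, P]
    log_prob = -np.sum((a - mu) ** 2) / (2 * sigma**2)  # log pi(a|mu), up to a constant
    return G, log_prob
```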

Fig. 4: Evolution of the average reward per batch (average EsN0 at BLER = 10⁻² vs. iteration; the roughly 0.11 dB improvement arrives in a stair-like manner).

The error correction performance comparison between the learned code constructions, RM codes and extended BCH (eBCH) codes is plotted in Figs. 5 and 6. In Fig. 5, the learned linear block codes show similar performance to RM codes for the cases of (N = 32, K = 16) and (N = 64, K = 22). In Fig. 6, the learned linear block codes show similar performance to eBCH codes for the cases of (N = 32, K = 16) and (N = 64, K = 36).

Fig. 5: Performance comparison between learned linear block codes and RM codes (BLER vs. Es/N0: RM(32,16,8) and Learned (32,16,7) under BMA(1,5); RM(64,22,16) and Learned (64,22,12) under BMA(1,10)).

Fig. 6: Performance comparison between learned linear block codes and eBCH codes (BLER vs. Es/N0: eBCH(32,16,8) and Learned (32,16,7) under BMA(1,5); eBCH(64,36,12) and Learned (64,36,7) under BMA(1,8)).

It is interesting that for the case of N = 32, K = 16, though the minimum code distance of the learned code (D = 7) is smaller than that of the RM code or eBCH code (D = 8), there is no obvious performance difference within the practical SNR range (BLER within 10⁻⁴ ∼ 10⁻¹). In this SNR range for the considered case, the error correction performance of linear block codes is determined by the code distance spectrum, not only the minimum distance.

Alternatively, the design of linear block codes can also be modeled as a multi-step MDP. For example, from an initial state, an action can be defined as determining one column (or row) of the matrix P per step, or sequentially flipping each entry of the matrix P per step. Furthermore, Monte Carlo tree search (MCTS) can be incorporated into reinforcement learning to potentially enhance code performance [35].

B. Binary vector: polar codes with a fixed length and rate

Polar codes can be defined by c = uG [36]. A code construction is defined by a binary vector s of length N, in which 1 denotes an information subchannel and 0 denotes a frozen subchannel. Denote by I, the support of s, the set of information subchannel indices. The K information bits are assigned to the subchannels with indices in I, i.e., u_I. The remaining N − K subchannels, constituting the frozen set F, are selected for frozen bits (zero-valued by default). The generator matrix consists of the K rows, indicated by I, of the polar transformation matrix G = F^{⊗n}, where F = [1 0; 1 1] is the kernel, ⊗ denotes the Kronecker power, and c is the codeword.

For the decoders, both SC and SCL type decoders are considered. An SC decoder recursively computes the transition probabilities of the polarized subchannels, and sequentially develops a "path", i.e., hard decisions up to the current (i-th) information bit: û_i ≜ (û_1, û_2, ..., û_i). At finite code length, an SCL decoder brings significant performance gain, which is briefly described below.
1) Run L instances of SC in parallel, keeping L paths;
2) Extend the L paths (with both 0 and 1 decisions) to obtain 2L paths, and evaluate their path metrics (PMs);
3) Preserve the L most likely paths, i.e., those with the smallest PMs.
Upon reaching the last bit, only one path is selected as the decoding output. We consider two types of SCL decoders, characterized as follows.
• SCL-PM: select the first path, i.e., the one with the smallest PM;
• SCL-Genie: select the correct path, as long as it is among the L surviving paths.
In practice, SCL-Genie can be implemented by CA-SCL [37], [38], which selects the path that passes the CRC check. With a moderate number of CRC bits (e.g., 24), CA-SCL yields almost identical performance to SCL-Genie.
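For concreteness, the polar transform and the encoding step c = uG with G = F^{⊗n} can be sketched as below. This is an illustrative numpy construction, not the implementation used in our evaluator.

```python
# Polar encoding c = u * F^{kron n} over GF(2).
import numpy as np

def polar_transform_matrix(n):
    F = np.array([[1, 0], [1, 1]], dtype=int)   # the 2x2 polarization kernel
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F)                       # n-fold Kronecker power, 2^n x 2^n
    return G

def polar_encode(info_bits, info_set, n):
    u = np.zeros(2 ** n, dtype=int)
    u[sorted(info_set)] = info_bits             # frozen positions remain 0
    return (u @ polar_transform_matrix(n)) % 2  # codeword c = u G (mod 2)
```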

Fig. 7: Framework of learning polar codes by genetic algorithm (selection, crossover and mutation produce offspring from the initial population; an SC/SCL decoding based evaluator feeds the performance measure of each code construction back to the constructor).

Genetic algorithm is applied to construct polar codes for various types of decoders. We observe that the information subchannels in a code construction play the same role as chromosomes in a genetic algorithm, because they both individually and collaboratively contribute to the fitness of a candidate solution. The key insight is that good code constructions are constituted by good subchannels. Therefore, a pair of good parent code constructions is likely to produce good offspring code constructions. This suggests that a genetic algorithm may ultimately converge to a good code construction. The framework is shown in Fig. 7, and the algorithm is detailed below.

During the initialization of the genetic algorithm, the population (code constructions) is randomly generated. Specifically, the information subchannel set I is randomly and uniformly selected from {1, ..., N} for each code construction, without given prior knowledge about existing polar code construction techniques such as Bhattacharyya [11] and DE/GA [14], [16]. The purpose is to test whether the genetic algorithm can learn a good code construction without this expert knowledge.

A population of M code constructions is randomly initialized and sorted according to ascending BLER performance BLER_{I_i} ≜ BLER_1 × BLER_2, which is defined as the product of the BLERs at two SNR points. Typically, M should be sufficiently large to store all the good chromosomes (subchannel indices) to ensure an efficient convergence. A polar decoder, denoted by BLER_x ← PolarDecoder(I_i, SNR_x), returns the BLER performance of the constructed codes at a specified SNR point. At least 1000 block error events are collected to ensure an accurate estimate.

After initialization, the algorithm enters a loop consisting of four steps, as shown in Fig. 8.

1) Select parents I_{p1}, I_{p2} from the population. The i-th code construction is selected according to the probability distribution e^{−αi} / Σ_{j=1}^{M} e^{−αj} (normalized), where α is called the sample focus. Specifically, the corresponding cumulative distribution function cdf is sampled uniformly at random to determine the index p to be selected. This is implemented in Matlab with a command: [∼,p]=histc(rand,cdf). In this way, a better code construction will be selected with a higher probability (see the Python sketch following this step list). Two distinct code constructions, denoted by their information subchannel sets I_{p1} and I_{p2}, are selected as parents. By adjusting the parameter α, we can trade off between exploitation (a larger α) and exploration (a smaller α).
2) Merge the subchannels from the parents by I_merge = I_{p1} ∪ I_{p2}. Note that |I_{p1}| = |I_{p2}| = K and their union set contains more than K subchannels. This ensures that an offspring code construction contains subchannels from both parents. In fact, it implements the "crossover" function described in Section II-B2.
3) Mutate the code construction by including a few mutated subchannels I_mutate from Ī_merge = {1, ..., N} \ I_merge, that is, the remaining subchannels. Specifically, ⌊β × |Ī_merge|⌉ indices are randomly and uniformly selected from Ī_merge, where β is called the mutation rate. Finally, the K information subchannels of the offspring I_o are randomly and uniformly selected from I_merge ∪ I_mutate. Note that the offspring may or may not contain the mutated subchannels, depending on whether those in I_mutate are finally selected into I_o. The mutation rate β provides a way to control exploration. The larger β is, the more likely a mutated subchannel is included in the offspring.
4) Insert the offspring I_o back into the population according to ascending BLER performance. If its BLER performance is worse than all existing ones in the population, it is simply discarded.

The while loop can be terminated after a maximum number of iterations T_max is reached, or when a desired performance is obtained by the best code construction in the population. The algorithm is described in Algorithm 2.
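The rank-based parent selection of step 1) can be sketched in Python as below, a numpy analogue of the Matlab histc line; the population is assumed to be sorted by ascending BLER. A larger α concentrates the sampling on the best members (exploitation), while a smaller α flattens the distribution (exploration).

```python
# Sample a parent index with probability proportional to exp(-alpha * i).
import numpy as np

def select_parent_index(M, alpha=0.03, rng=np.random.default_rng()):
    w = np.exp(-alpha * np.arange(1, M + 1))   # better-ranked codes get more weight
    return int(rng.choice(M, p=w / w.sum()))   # index into the BLER-sorted population
```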

We compare the learned code constructions with the baseline schemes below:
• I_DE/GA, a close-to-optimal construction under SC decoding that is obtained via GA [16]: we search for a design SNR (starting from −20 dB, increasing with step size 0.25 dB) to obtain the construction that requires the lowest SNR to achieve BLER = 10⁻².
• I_PW, an SNR-independent construction obtained by PW [17], [18].
• I_RM-Polar, a heuristic construction [23] that yields better performance under the SCL-PM decoder. The only difference from [23] is that we use PW [17], [18] as the reliability metric to make the design SNR-independent.

We use them as baselines, and observe the learning process through two metrics: (1) the BLER performance of the learned codes at two SNR points, which correspond to the required SNRs to achieve BLER = 10⁻¹ and 10⁻² for the baseline scheme; (2) the difference between the learned information subchannels I_learned and those of the baseline schemes.

Fig. 8: Genetic algorithm for polar code construction in four steps. I. Select: parents I_{p1}, I_{p2} are drawn from the BLER-ordered population with selection probability ∝ e^{−αi} (e.g., α = 0.03 for exploitation, α = 0.01 for exploration). II. Merge: I_merge = I_{p1} ∪ I_{p2}. III. Mutate: subchannels from Ī_merge = {1, ..., N} \ I_merge are included with probability β, and K subchannels are randomly selected from I_merge ∪ I_mutate to form I_o. IV. Insert: the offspring I_o is evaluated and inserted back into the population, which is kept ordered by ascending BLER.

TABLE II: Convergence time for the learning process
N | K | design EsN0 (dB) | nchoosek(N, K) | # iterations
16 | 8 | 4.50 | 12870 | 126
32 | 16 | 4.00 | 6.0 × 10⁸ | 623
64 | 32 | 3.75 | 1.8 × 10¹⁸ | 2100
128 | 64 | 3.50 | 2.4 × 10³⁷ | 5342
256 | 128 | 3.25 | 5.8 × 10⁷⁵ | 19760
512 | 256 | 3.00 | 4.7 × 10¹⁵² | 56190

To demonstrate the effectiveness of the genetic algorithm, we record the first time (iteration number) a code construction converges to the existing optimal construction, i.e., I_learned = I_DE/GA under SC decoding. The results for different code lengths are shown in Table II. Note that a brute-force search would check all nchoosek(N, K) possible candidates, which is prohibitively costly, as shown in Table II. By contrast, a reasonable converging time is observed with the genetic algorithm at medium code lengths. There is no big difference in terms of learning efficiency between the SC and SCL decoders. For the SC decoder, there already exists an explicit optimum way of constructing polar codes, and the learned construction could not outperform it. However, the optimal code constructions for both SCL-PM and SCL-Genie are still open problems. The above results imply that we can apply the genetic algorithm to SCL decoders to obtain good performance within a reasonable time.

We first consider SCL-PM with L = 8. The learning process for N = 128, K = 64 is shown in Fig. 9. Note that DE/GA is derived assuming an SC decoder. We adopt RM-Polar [23] to obtain the baseline performance. In this case, the minimum distance is 16 for RM-Polar and 8 for DE/GA and PW, and the PW construction coincides with DE/GA, i.e., I_PW = I_DE/GA. In the upper subfigure, the performance (product of BLERs measured at EsN0 = [1.74, 2.76] dB) of the learned codes quickly converges and outperforms that of the RM-Polar codes at iteration 3100. In the lower subfigure, unlike the case of the SC decoder, the difference between I_learned and I_DE/GA (design EsN0 at 3.5 dB) stops decreasing after reaching 8. The learned codes outperform DE/GA by 0.8 dB and RM-Polar by 0.2 dB at BLER = 10⁻³, as shown in Fig. 10. These results demonstrate that, in the current absence of an optimal coding-theoretic solution, learning algorithms can potentially play important roles in code construction.

The same observation holds for SCL-Genie, where the optimum construction is also unknown. The BLER curves for N = 256, K = 128 under SCL-Genie with L = 8 are shown in Fig. 11. We take the DE/GA (design EsN0 at 3.25 dB) and PW constructions as the baseline schemes, since they perform better than RM-Polar under SCL-Genie. In this case, the minimum distance is 16 for RM-Polar and 8 for DE/GA and PW. In the genetic algorithm, the evaluator measures performance as the product of BLERs at EsN0 = [1.17, 1.88] dB. As seen, the learned codes perform better than those generated by DE/GA and PW, with a slightly better slope.
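The nchoosek(N, K) column of Table II can be reproduced directly, which illustrates how quickly brute-force search becomes infeasible:

```python
# Brute-force search space |{I : |I| = K}| = C(N, K) for the rows of Table II.
import math
for N in (16, 32, 64, 128, 256, 512):
    print(f"C({N},{N // 2}) = {math.comb(N, N // 2):.3e}")
# C(16,8) = 1.287e+04, C(32,16) = 6.011e+08, ..., C(512,256) = 4.724e+152
```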

Fig. 9: Learning process for N = 128, K = 64 with the SCL-PM evaluator: product of the two BLERs of the learned codes vs. RM-Polar per iteration (upper), and subchannel difference of the learned construction to DE/GA and PW per iteration (lower).

Algorithm 2 Genetic algorithm based polar code design
// Parameters:
population size M = 1000, sample focus α = 0.03, mutation rate β = 0.01, SNR_1 and SNR_2;
// Initialization:
for i = {1, 2, ..., M} do
  I_i ← randomly and uniformly select K indices from {1, ..., N}
  BLER_{I_i} ← BLER_1 × BLER_2, where BLER_x ← PolarDecoder(I_i, SNR_x)
end for
Sort {I_1, ..., I_M} such that BLER_{I_i} < BLER_{I_j}, ∀ i < j;
t = 0
while t < T_max do
  // Select:
  for k = {1, 2} do
    I_pk ← select I_pk from {I_1, ..., I_M}
  end for
  // Merge:
  I_merge = I_p1 ∪ I_p2
  // Mutate:
  Ī_merge = {1, ..., N} \ I_merge
  I_mutate ← randomly and uniformly select ⌊β × |Ī_merge|⌉ indices from Ī_merge
  I_o ← randomly and uniformly select K indices from I_merge ∪ I_mutate
  // Insert:
  {I_1, ..., I_M, I_{M+1}} = {I_1, ..., I_M} ∪ I_o, where I_{M+1} = I_o
  BLER_{I_o} ← BLER_1 × BLER_2, where BLER_x ← PolarDecoder(I_o, SNR_x)
  Sort {I_1, ..., I_{M+1}} such that BLER_{I_i} < BLER_{I_j}, ∀ 1 ≤ i < j ≤ M + 1
  t = t + 1
end while
// Output:
I_1 and BLER_{I_1}

Fig. 10: BLER comparison between learned polar code constructions (information subchannels) and {DE/GA, PW, RM-Polar} under the SCL-PM (L = 8) decoder, for N = 128, K = 64, QPSK/AWGN. The DE/GA baseline is constructed with design EsN0 at 3.5 dB.
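A condensed Python sketch of Algorithm 2 is given below; `evaluate(I)` is a placeholder for the product of BLERs returned by PolarDecoder at the two design SNRs, and subchannel indices are 0-based.

```python
# Condensed sketch of Algorithm 2 (genetic polar code design).
import numpy as np

def genetic_polar(N, K, evaluate, M=1000, alpha=0.03, beta=0.01, T_max=10000,
                  rng=np.random.default_rng()):
    pop = [frozenset(rng.choice(N, K, replace=False).tolist()) for _ in range(M)]
    pop = sorted(((evaluate(I), I) for I in pop), key=lambda x: x[0])
    w = np.exp(-alpha * np.arange(1, M + 1)); w /= w.sum()   # sample-focus weights
    for _ in range(T_max):
        i, j = rng.choice(M, size=2, replace=False, p=w)     # I. select two parents
        merged = pop[i][1] | pop[j][1]                       # II. merge (crossover)
        rest = sorted(set(range(N)) - merged)
        n_mut = int(round(beta * len(rest)))                 # III. mutate
        mutated = set(rng.choice(rest, n_mut, replace=False).tolist())
        child = frozenset(rng.choice(sorted(merged | mutated), K,
                                     replace=False).tolist())
        pop.append((evaluate(child), child))                 # IV. insert by fitness,
        pop.sort(key=lambda x: x[0])                         #     drop the worst
        pop.pop()
    return pop[0]                                            # best (BLER, info set)
```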

However, the performance gain (0.06 dB between the learned codes and DE/GA) is much smaller than in the case with SCL-PM.

Fig. 11: BLER comparison between learned polar code constructions (information subchannels) and {DE/GA, PW} under the SCL-Genie (L = 8) decoder, for N = 256, K = 128, QPSK/AWGN. The DE/GA baseline is constructed with design EsN0 at 3.25 dB; the inset (BLER around 1–2 × 10⁻³, 2.26–2.36 dB) contrasts the learned construction with the human-knowledge baselines.

It is of interest to observe the difference between the learned information subchannels and those selected by DE/GA. In Fig. 12, the subchannel differences between the learned constructions and DE/GA under various code lengths and rates are plotted, where the positive positions are I_learned and the negative ones are I_DE/GA (the design EsN0 is labeled in each subfigure). The evaluator is SCL-Genie with L = 8. The first observation is that all the learned constructions prefer subchannels with smaller indices. A closer look reveals that the learned constructions may violate the universal partial order (UPO) [39], [40]. For the case of N = 128, K = 64 (the genetic algorithm evaluates performance as the product of BLERs at EsN0 = [0.94, 1.89] dB), the only difference is that I_learned preferred the 15-th subchannel over the 43-th. It is easy to verify that this choice violates the UPO, because 15 = [0,0,0,1,1,1,1] and 43 = [0,1,0,1,0,1,1] in binary form. Note that the UPO applies in theory to the SC decoder, rather than the SCL decoder.

Fig. 12: Difference of information subchannels between learned constructions (I_learned) and DE/GA (I_DE/GA, design EsN0 labeled in each subfigure) under the SCL-Genie (L = 8) decoder: (N, K) = (64, 16) at 0.50 dB, (64, 32) at 3.75 dB, (128, 64) at 3.50 dB (positions 15 and 43 highlighted), and (256, 128) at 3.25 dB.

In this subsection, we demonstrated that good polar codes can be learned for SC, SCL-PM and SCL-Genie decoders using the genetic algorithm. For SCL decoders, the learned codes may even outperform existing ones. Note that a recent independent work [28] proposed approaches very similar to this subsection. The main difference is that prior knowledge such as the Bhattacharyya construction [11] and the RM-Polar construction [23] is utilized by [28] during the population initialization to speed up the learning process, while no such knowledge was exploited in our work. The detailed differences are summarized below:

• The initialization of the learning algorithms is different. The work [28] initializes the (code construction) population based on the Bhattacharyya construction obtained for BECs with various erasure probabilities, and the RM-Polar construction. In this way, the convergence time is significantly reduced. In our experiment, we randomly initialize the population to test whether the genetic algorithm can learn a good code construction without this prior knowledge. One of our motivations is to answer Q1 in Section I-B.
• The new population is generated in a different way. In [28], the T best members in the population are always secured in the next-iteration population, the next nchoosek(T, 2) members are generated through crossover of the T best members, and the final T members are generated by mutation. In this work, we select a pair of parents according to a certain sample probability (a better member is selected with higher probability), which provides a flexible tradeoff between exploration and exploitation.
• The crossover operation is not exactly the same. In [28], the offspring takes half of the subchannels from each of the parents. In our work, the subchannels of both parents are first merged, followed by a mutation step that includes a few mutated subchannels. The resulting offspring randomly takes K subchannels from among these subchannels.
• The cost function of [28] is the error rate at a single SNR point, whereas our work allows choosing a set of SNR points, for the potential benefit of controlling the slope of the error rate.
• The work [28] also tried a belief propagation decoder, whereas we focus on the SC-based decoders due to their superior performances.

C. Nested representation: polar codes within a range of code rates

We evaluate another type of polar code construction with the nested property, which bears practical significance due to its description and implementation simplicity. Polar codes for code length N and dimension range [K_l, K_h] are defined by a reliability ordered sequence of length N. The corresponding polar code can be determined by reading out the first (or last) K entries from the ordered sequence to form the information position set.

The code design procedure is modeled by a multi-step MDP. Specifically, for each design step, with a given (N, K) polar code (the current state), a new subchannel (action) is selected to obtain the (N, K+1) polar code (an updated state). The reliability ordered sequence is constructed by sequentially appending the actions to the end of the initial polar code construction.

Note that the optimal constructions for (N, K) polar codes and (N, K+1) polar codes may not be nested, i.e., the information set of the (N, K) polar code may not be a subset of the information set of the (N, K+1) polar code. As a result, the optimal constructions for different (N, K) polar codes do not necessarily constitute a nested sequence. In other words, the performance of some (N, K) codes needs to be compromised for the nested property. Therefore, during the construction of the reliability ordered sequence, a tradeoff exists between the short-term reward (from the next state construction) and the long-term reward (from the construction of a state that is a few steps away). The problem of maximizing the total reward, which consists of both short-term and long-term rewards, can be naturally solved by the A2C algorithm.
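The UPO violation noted above can be checked mechanically. The snippet below uses one common characterization of the UPO (the order generated by adding 1s and moving 1s to more significant bit positions): index j universally dominates i when every k-th largest one-position of i is no larger than that of j. This characterization is an assumption of the sketch, stated here for illustration.

```python
# Verify the UPO relation between subchannels 15 and 43 (7-bit indices, N = 128):
# 15 = 0001111 and 43 = 0101011 in binary.
def one_positions(x, n=7):
    return sorted((k for k in range(n) if (x >> k) & 1), reverse=True)

def upo_dominates(j, i):
    pi, pj = one_positions(i), one_positions(j)
    return len(pi) <= len(pj) and all(a <= b for a, b in zip(pi, pj))

print(upo_dominates(43, 15))   # True: 43 should be at least as reliable as 15,
                               # so preferring 15 over 43 violates the UPO
```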

The framework is shown in Fig. 13. The state, action and reward are introduced in Section II-B1, and detailed as follows.
• The input state s_t is a code construction defined as a binary vector of length N, in which 1 denotes an information subchannel and 0 denotes a frozen subchannel. Denote by I_{s_t}, the support of s_t, the set of information subchannel indices.
• Both the actor function π_{θ_A}(s_t) and the critic function V_{θ_C}(s_t) are implemented by MLP neural networks, defined by NeuralNet(θ_A) and NeuralNet(θ_C), with two hidden layers and sigmoid output nonlinearity:
  – The input to both the actor and critic functions is the state s_t (code construction).
  – The hidden layer width is 4N.
  – The coefficients in the neural networks are defined by θ_A and θ_C, respectively.
  – The output of NeuralNet(θ_A) is a probability mass function π_{θ_A}(s_t) over all possible actions a_t ∈ {1, ..., N} taken at state s_t.
  – The output of NeuralNet(θ_C) is a state value estimation V_{θ_C}(s_t) for the state s_t.
• An action a_t denotes the next information subchannel position to be selected, which is sampled according to the probability mass function π_{θ_A}(s_t).
• The output state s_{t+1} is updated by setting the a_t-th position in s_t to 1, i.e., s_{t+1} ← s_t[a_t] = 1.
• The reward R_t is defined as log(BLER) at a given SNR point. It is also the feedback to the A2C function block.
  – The BLER performance of the code construction s_{t+1} is evaluated by the SCL decoding (either SCL-PM or SCL-Genie decoding).

Fig. 13: Framework of learning polar codes by the advantage actor critic algorithm (the actor proposes a code construction, the SCL decoding based evaluator returns the performance measure, and the advantage computed with the critic drives both the actor and critic updates).

The A2C based polar code reliability ordered sequence design is also described in Algorithm 3, with parameters listed in Table III.

Algorithm 3 A2C based polar code reliability ordered sequence design
Initialize the coefficients θ_A and θ_C randomly;
while 1 do
  s_0 = zeros(1, N);
  s_0 ← s_0[N−1, N−2, ..., N−K_l+1] = 1;
  for t = 0 to (K_h − K_l) do
    V_{θ_C} ← NeuralNet(θ_C)
    π_{θ_A} ← NeuralNet(θ_A)
    a_t ∼ π_{θ_A}(s_t); s_{t+1} ← s_t[a_t] = 1;
    I_{s_{t+1}} ← support(s_{t+1});
    BLER ← PolarDecoder(I_{s_{t+1}}, EsN0);
    R_t = log(BLER);
    Δθ_C ← SGD(V_{θ_C}, s_t, s_{t+1}, R_t); θ_C ← (θ_C + Δθ_C)
    Δθ_A ← SGD(π_{θ_A}, s_t, a_t, s_{t+1}, R_t); θ_A ← (θ_A + Δθ_A)
  end for
  Sequence is {N−1, N−2, ..., N−K_l+1, a_0, a_1, ..., a_{K_h−K_l}}.
end while

TABLE III: A2C algorithm parameters
Actor function hidden layer number: 2
Actor function hidden layer width: 4N
Critic function hidden layer number: 2
Critic function hidden layer width: 4N
Batch size: 32
Actor learning rate: 1.0 × 10⁻³
Critic learning rate: 2.0 × 10⁻³
Reward discount factor: 0.2
Reward: log(BLER)
Decoder: SCL-PM and SCL-Genie with L = 8

To optimize the A2C function block, the coefficients θ_C and θ_A in the neural networks are trained by mini-batch based SGD according to (6) and (5), respectively. Specifically, the next state s_{t+1} and the reward R_t are fed back to the A2C function block. Note that the advantage value Adv(s_t, s_{t+1}, R) in (6) and (5) is calculated according to (4), which requires the critic function and the current reward. The "AdamOptimizer" in Tensorflow is applied to update θ_C and θ_A.
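A minimal TensorFlow 2.x-style sketch of one A2C update, following (4)–(6), is given below. The actor/critic MLPs, the state encoding and the reward are placeholders standing in for the blocks of Fig. 13 (the implementation described above used the TF1-era AdamOptimizer).

```python
# One A2C update step per state transition, implementing eqs. (4)-(6).
import tensorflow as tf

def a2c_step(actor, critic, opt_actor, opt_critic, s, a, s_next, reward, gamma=0.2):
    with tf.GradientTape(persistent=True) as tape:
        v, v_next = critic(s), critic(s_next)
        adv = tf.stop_gradient(reward + gamma * v_next - v)    # eq. (4)
        actor_loss = -adv * tf.math.log(actor(s)[0, a])        # ascent along eq. (5)
        critic_loss = -adv * v                                 # ascent along eq. (6)
    opt_actor.apply_gradients(
        zip(tape.gradient(actor_loss, actor.trainable_variables),
            actor.trainable_variables))
    opt_critic.apply_gradients(
        zip(tape.gradient(critic_loss, critic.trainable_variables),
            critic.trainable_variables))
    del tape
```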

N=64, Evaluator={SCL-PM}, L=8 learning and genetic evolution) and the flexibility of the code evaluator is analyzed. 1 EsN0 (Learned - DE/GA) We have provided three detailed AI-driven code construc- 0.5 Loss to DE/GA tion algorithms for three types of code representations. In essence, the framework is able to iteratively refine code 0 construction without being taught explicit knowledge in coding theory. For proof of concept, we show that, for linear -0.5 block codes and polar codes in our examples, the learned Gain over DE/GA codes can achieve a comparable performance to the state-of- relative EsN0 at BLER=1E-2 -1 the-art ones. For certain cases, the learned codes may even

In the first experiment, the code performance evaluator deploys SCL-PM with L = 8. Fig. 14 compares the relative SNR needed to achieve a target BLER level of 10^-2. A better overall performance can be observed for the learned polar codes, with the largest gain over DE/GA exceeding 0.5 dB, and almost no cases of loss.

Fig. 14: Relative performance between polar codes constructed by reinforcement learning and DE/GA under SCL-PM. [Plot: relative EsN0 (learned − DE/GA) at BLER = 10^-2 versus code dimension K, for N = 64, Evaluator = {SCL-PM}, L = 8; positive values are losses to DE/GA, negative values gains.]

In the second experiment, the code performance evaluator deploys SCL-Genie with L = 8. Fig. 15 compares the SNR required to achieve a target BLER level of 10^-2. A slightly better overall performance can be observed for the learned polar codes, with the largest gain of 0.16 dB over DE/GA at K = 15, and the largest loss of 0.09 dB at K = 58.

Fig. 15: Relative performance between polar codes constructed by reinforcement learning and DE/GA under SCL-Genie. [Plot: relative EsN0 (learned − DE/GA) at BLER = 10^-2 versus code dimension K, for N = 64, Evaluator = {SCL-Genie}, L = 8; positive values are losses to DE/GA, negative values gains.]

Some discussion of why good nested polar codes can be learned by the A2C algorithm is as follows. Compared with the fixed (N, K) case, the nested code constraint makes the optimization complicated for classical coding theory, because there is no theory for optimizing the overall performance across a range of code rates. The problem of selecting subchannels in a nested manner, however, is essentially a multi-step decision problem: it naturally fits the multi-step MDP model, and can thus be solved by existing RL algorithms.
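Concretely, the nested constraint means a single learned sequence determines the code for every dimension in [Kl, Kh]: the information set of the (N, K) code is simply the first K entries of the sequence, in the selection order produced by Algorithm 3. A minimal sketch, with a made-up sequence prefix for illustration:

def info_set(sequence, K):
    # Information subchannel indices of the (N, K) polar code induced by a
    # reliability ordered sequence: its first K entries.
    return sorted(sequence[:K])

# Nested property: the set for K is a subset of the set for K + 1, so one
# sequence covers the whole dimension range [K_lo, K_hi].
seq = [63, 62, 61, 59, 60, 55, 47, 58, 31, 57]   # illustrative prefix only
assert set(info_set(seq, 4)) <= set(info_set(seq, 5))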

IV. CONCLUSION AND FUTURE WORKS

In this paper, we have designed error correction codes with AI techniques. We employed a constructor-evaluator framework, in which the code constructor is realized by AI algorithms whose target function depends only on the performance metric feedback from the code evaluator. The implementation of the code constructor is illustrated (e.g., by reinforcement learning and genetic evolution), and the flexibility of the code evaluator is analyzed. We have provided three detailed AI-driven code construction algorithms for three types of code representations. In essence, the framework is able to iteratively refine code constructions without being taught explicit knowledge of coding theory. As a proof of concept, we have shown that, for the linear block codes and polar codes in our examples, the learned codes achieve performance comparable to the state-of-the-art constructions; in certain cases, they may even outperform existing ones.

For future work, both the constructor and the evaluator design need to be explored, either to solve more general problems or to further improve efficiency. Some code construction problems that were intractable under classical coding-theoretic approaches may be revisited with AI approaches. Moreover, more realistic settings, such as online code construction, should be studied.

REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication", Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, Jul. 1948.
[2] P. Elias, "Error-free coding", Transactions of the IRE, vol. 4, no. 4, pp. 29–37, Sep. 1954.
[3] C. Berrou and A. Glavieux, "Near optimum error correcting coding and decoding: turbo-codes", IEEE Transactions on Communications, vol. 44, no. 10, pp. 1261–1271, Oct. 1996.
[4] R. Gallager, "Low-density parity-check codes", IRE Transactions on Information Theory, vol. 8, no. 1, pp. 21–28, Jan. 1962.
[5] D. MacKay, "Good error-correcting codes based on very sparse matrices", IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.
[6] M. Fossorier, "Quasicyclic low-density parity-check codes from circulant permutation matrices", IEEE Transactions on Information Theory, vol. 50, no. 8, pp. 1788–1793, Aug. 2004.
[7] T. Richardson, M. Shokrollahi and R. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes", IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.
[8] S. ten Brink, G. Kramer and A. Ashikhmin, "Design of low-density parity-check codes for modulation and detection", IEEE Transactions on Communications, vol. 52, no. 4, pp. 670–678, Apr. 2004.
[9] T. Richardson and R. Urbanke, "Modern coding theory", Cambridge University Press, Oct. 2007.
[10] S. Chung, T. Richardson and R. Urbanke, "Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation", IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 657–670, Feb. 2001.
[11] E. Arikan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels", IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
[12] R. Mori and T. Tanaka, "Performance of polar codes with the construction using density evolution", IEEE Communications Letters, vol. 13, no. 7, pp. 519–521, Jul. 2009.
[13] R. Mori and T. Tanaka, "Performance and construction of polar codes on symmetric binary-input memoryless channels", in IEEE International Symposium on Information Theory, pp. 1–5, Jun. 2009.
[14] I. Tal and A. Vardy, "How to construct polar codes", IEEE Transactions on Information Theory, vol. 59, no. 10, pp. 6562–6582, Jul. 2013.
[15] R. Pedarsani, S. Hassani, I. Tal and E. Telatar, "On the construction of polar codes", in IEEE International Symposium on Information Theory, pp. 1–5, Jul. 2011.
[16] P. Trifonov, "Efficient design and decoding of polar codes", IEEE Transactions on Communications, vol. 60, no. 11, pp. 3221–3227, Nov. 2012.
[17] R1-167209, "Polar code design and rate matching", 3GPP TSG RAN WG1 Meeting #86, Gothenburg, Sweden, Aug. 2016.

[18] G. He, J. Belfiore, I. Land, G. Yang, X. Liu, Y. Chen, R. Li, J. Wang, Y. Ge, R. Zhang and W. Tong, "β-expansion: A theoretical framework for fast and recursive construction of polar codes", in IEEE Global Communications Conference, pp. 1–6, Dec. 2017.
[19] R1-1712174, "Summary of email discussion [NRAH2-11] polar code sequence", 3GPP TSG RAN WG1 Meeting #90, Prague, Czech Republic, Aug. 2017.
[20] T. O'Shea and J. Hoydis, "An introduction to deep learning for the physical layer", arXiv:1702.00832, Feb. 2017.
[21] H. Kim, Y. Jiang, S. Kannan, S. Oh and P. Viswanath, "Deepcode: feedback codes via deep learning", arXiv:1807.00801, Jul. 2018.
[22] T. Gruber, S. Cammerer, J. Hoydis and S. ten Brink, "On deep learning-based channel decoding", in 51st Annual Conference on Information Sciences and Systems (CISS), pp. 1–6, Mar. 2017.
[23] B. Li, H. Shen and D. Tse, "A RM-polar codes", arXiv:1407.5483, Jul. 2014.
[24] M. Qin, J. Guo, A. Bhatia, A. Fabregas and P. Siegel, "Polar code constructions based on LLR evolution", IEEE Communications Letters, vol. 21, no. 6, pp. 1221–1224, Jan. 2017.
[25] P. Trifonov and V. Miloslavskaya, "Polar subcodes", IEEE Journal on Selected Areas in Communications, vol. 34, no. 2, pp. 254–266, Feb. 2016.
[26] T. Wang, D. Qu and T. Jiang, "Parity-check-concatenated polar codes", IEEE Communications Letters, vol. 20, no. 12, pp. 2342–2345, Dec. 2016.
[27] H. Zhang, R. Li, J. Wang, S. Dai, G. Zhang, Y. Chen, H. Luo and J. Wang, "Parity-check polar coding for 5G and beyond", in IEEE International Conference on Communications, pp. 1–7, May 2018.
[28] A. Elkelesh, M. Ebada, S. Cammerer and S. ten Brink, "Decoder-tailored polar code design using the genetic algorithm", IEEE Transactions on Communications (Early Access), Apr. 2019.
[29] M. Shokrollahi and R. Storn, "Design of efficient erasure codes with differential evolution", in IEEE International Symposium on Information Theory, pp. 1–5, Jun. 2000.
[30] C. Watkins and P. Dayan, "Q-learning", Machine Learning, vol. 8, pp. 279–292, May 1992.
[31] R. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning", Machine Learning, vol. 8, pp. 229–256, May 1992.
[32] V. Konda and J. Tsitsiklis, "Actor-critic algorithms", in Advances in Neural Information Processing Systems, pp. 1008–1014, 2000.
[33] M. Fossorier and S. Lin, "Soft-decision decoding of linear block codes based on ordered statistics", IEEE Transactions on Information Theory, vol. 41, no. 5, pp. 1379–1396, Sep. 1995.
[34] A. Valembois and M. Fossorier, "Box and match techniques applied to soft-decision decoding", IEEE Transactions on Information Theory, vol. 50, no. 5, pp. 796–810, May 2004.
[35] M. Zhang, Q. Huang, S. Wang and Z. Wang, "Construction of LDPC codes based on deep reinforcement learning", in IEEE International Conference on Wireless Communications and Signal Processing (WCSP), pp. 1–6, Oct. 2018.
[36] E. Arikan, "Systematic polar coding", IEEE Communications Letters, vol. 15, no. 8, pp. 860–862, Aug. 2011.
[37] I. Tal and A. Vardy, "List decoding of polar codes", IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
[38] K. Niu and K. Chen, "CRC-aided decoding of polar codes", IEEE Communications Letters, vol. 16, no. 10, pp. 1668–1671, Oct. 2012.
[39] C. Schürch, "A partial order for the synthesized channels of a polar code", in IEEE International Symposium on Information Theory, pp. 1–5, Jul. 2016.
[40] M. Bardet, V. Dragoi, A. Otmani and J. Tillich, "Algebraic properties of polar codes from a new polynomial formalism", in IEEE International Symposium on Information Theory, pp. 230–234, Jul. 2016.

Lingchen Huang is a research engineer at Huawei Technologies Co., Ltd. His current research interests are channel coding schemes, with a focus on algorithm design and hardware implementations.

Huazi Zhang is a research engineer at Huawei Technologies Co., Ltd. His current research interests are channel coding schemes, with a focus on algorithm design and hardware implementations.

Rong Li is a research expert at Huawei Technologies Co., Ltd. His current research interests are channel coding schemes, with a focus on algorithm design and hardware implementations. He has been the Technical Leader on Huawei 5G air interface design, focusing mainly on channel coding.

Yiqun Ge is a distinguished research engineer at Huawei Technologies Co., Ltd. His research areas include low-power chip design for wireless applications, polar codes, and the related 5G standardization.

Jun Wang is a senior research expert at Huawei Technologies Co., Ltd. His research areas include wireless communications, systems design and implementations.