Deep Learning for Decoding of Linear Codes - A Syndrome-Based Approach
Amir Bennatan*, Yoni Choukroun* and Pavel Kisilev
Huawei Technologies Co. Email: {amir.bennatan, yoni.choukroun, [email protected]}
arXiv:1802.04741v1 [cs.IT] 13 Feb 2018

* Both authors contributed equally to this work.

Abstract— We present a novel framework for applying deep neural networks (DNN) to soft decoding of linear codes at arbitrary block lengths. Unlike other approaches, our framework allows unconstrained DNN design, enabling the free application of powerful designs that were developed in other contexts. Our method is robust to the overfitting that inhibits many competing methods, which follows from the exponentially large number of codewords required for their training. We achieve this by transforming the channel output before feeding it to the network, extracting only the syndrome of the hard decisions and the channel output reliabilities. We prove analytically that this approach does not involve any intrinsic performance penalty, and guarantees the generalization of performance obtained during training. Our best results are obtained using a recurrent neural network (RNN) architecture combined with simple preprocessing by permutation. We provide simulation results that demonstrate performance that sometimes approaches that of the ordered statistics decoding (OSD) algorithm.

I. INTRODUCTION

Interest in applying neural networks to decoding has existed since the 1980's [6], [7], [8]. These early works, however, did not have a substantial impact on the field due to the limitations of the networks that were available at the time. More recently, deep neural networks were studied in [4], [5], [2].

A major challenge facing applications of deep networks to decoding is the avoidance of overfitting to the codewords encountered during training. Specifically, training data is typically produced by randomly selecting codewords and simulating the channel transitions. Due to the large number of codewords (exponential in the block length), it is impossible to account for even a small fraction of them during training, leading to poor generalization of the network to new codewords. This issue was a major obstacle in [4], [5], constraining their networks to very short block lengths.

Nachmani et al. [2], [3] proposed a deep learning framework which is modeled on the LDPC belief propagation (BP) decoder, and is robust to overfitting. A drawback of their design, however, is that to preserve symmetry, the design is constrained to closely mimic the message-passing structure of BP. Specifically, the connections between neurons are controlled to resemble BP's underlying Tanner graph, as are the activations at the neurons. This severely limits the freedom available to the neural network design, and precludes the application of powerful architectures that have emerged in recent years [15].

In this paper, we present a method which overcomes this drawback while maintaining the resilience to overfitting of [2]. Our framework allows unconstrained neural network design, paving the way to the application of powerful neural network designs that have emerged in recent years. Central to our approach is a preprocessing step, which extracts from the channel output only the reliabilities (absolute values) and the syndrome of its hard decisions, and feeds them into the neural network. The network's output is later combined with the channel output to produce an estimate of the transmitted codeword.

Decoding methods that focus on the syndrome are well known in the literature on algebraic decoding (see e.g. [12, Sec. 3.2]). The approach decouples the estimation of the channel noise from that of the transmitted codeword. In our context of deep learning, its potential lies in the elimination of the need to simulate codewords during training, thus overcoming the overfitting problem. A few early works that used shallow neural networks have employed syndromes (e.g. [6]). However, these works did not discuss their potential in terms of overfitting, presumably because the problem was not as acute in their relatively simple networks. Importantly, their approach does not apply in cases where the channel output includes reliabilities (mentioned above).

Our approach in this paper extends syndrome decoding to include channel reliabilities, and applies it to overcome the overfitting issue mentioned above. We provide a rigorous analysis which proves that our framework incurs no loss in optimality, in terms of bit error rate (BER) and mean square error (MSE). Our analysis utilizes some techniques developed by Burshtein et al. [14], Wiechman and Sason [13], and Richardson and Urbanke [11, Sec. 4.11].

Building on our analysis, we propose two deep neural network architectures for decoding of linear codes. The first is a vanilla multilayer network, and the second a more elaborate recurrent neural network (RNN) based architecture. We also develop a preprocessing technique (beyond the above-mentioned computation of the syndrome and reliabilities), which applies a permutation (an automorphism) to the decoder input to facilitate the operation of the neural network. This technique builds on ideas by Fossorier et al. [9], [10] and Dimnik and Be'ery [17], but does not involve list decoding. Finally, we provide simulation results for decoding of BCH codes which, for the case of BCH(63,45), demonstrate performance that approaches that of the ordered statistics decoding (OSD) algorithm of [9], [10].

Summarizing, our main contributions are:
1) A novel deep neural network training framework for decoding, which is robust to overfitting. It is based on the extension of syndrome decoding that is described below.
2) An extension of syndrome decoding which accounts for reliabilities. As with legacy syndrome decoding, we define a generic framework, leaving room for a noise-estimation algorithm which is specified separately. We provide analysis that proves that the framework involves no loss of optimality, and that regardless of the noise-estimation algorithm, performance is invariant to the transmitted codeword.
3) Two neural network designs for the noise-estimation algorithm, including an elaborate RNN-based architecture.
4) A simple preprocessing technique that applies permutations to boost the decoder's performance.

Our work is organized as follows. In Sec. II we introduce some notations, and in Sec. III we provide some background on neural networks. In Sec. IV we describe our syndrome-based framework and provide an analysis of it. In Sec. V we discuss deep neural network architectures as well as preprocessing by permutation. In Sec. VI we present simulation results. Sec. VII concludes the paper.

II. NOTATIONS

We will often use the superscripts b and s (e.g., x^b and x^s) to denote binary values (i.e., over the alphabet {0, 1}) and bipolar values (i.e., over {±1}), respectively. We define the mapping from the binary to the bipolar alphabet, denoted bipolar(·), by 0 → 1, 1 → −1, and let bin(·) denote the inverse mapping. Note that the following identity holds:

    bin(x^s · y^s) = bin(x^s) ⊕ bin(y^s),   for all x^s, y^s ∈ {±1},    (1)

where ⊕ denotes XOR. sign(x) and |x| denote the sign and absolute value of any real-valued x, respectively. In our analysis below, when applied to a vector, the operations bipolar(·), bin(·), sign(·) and |·| are applied independently to each of the vector's components.
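For illustration only, the two mappings and identity (1) can be spot-checked numerically; the following minimal numpy sketch uses our own helper names (bipolar and bin_, the latter renamed to avoid Python's built-in bin) and is not part of the framework itself:

```python
import numpy as np

def bipolar(x_b):
    """Map binary {0,1} values to bipolar {+1,-1} values: 0 -> 1, 1 -> -1."""
    return 1 - 2 * np.asarray(x_b)

def bin_(x_s):
    """Inverse mapping, bipolar {+1,-1} back to binary {0,1}: 1 -> 0, -1 -> 1."""
    return ((1 - np.asarray(x_s)) // 2).astype(int)

# Spot-check identity (1): bin(x^s * y^s) = bin(x^s) XOR bin(y^s) for bipolar x^s, y^s.
x_s = np.array([ 1, -1,  1, -1])
y_s = np.array([ 1,  1, -1, -1])
assert np.array_equal(bin_(x_s * y_s), bin_(x_s) ^ bin_(y_s))
```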
III. BRIEF OVERVIEW OF DEEP NEURAL NETWORKS

We now briefly describe the essentials of neural networks. Our discussion is in no way comprehensive, and there are many variations on the simple setup we describe. For an elaborate discussion, see e.g. [15].

A neural network is defined by a directed graph (Fig. 1), which serves as a template for an associated computational algorithm. The nodes are called neurons, and each performs (i.e., is associated with) a simple computation on inputs to produce a single output. The algorithm inputs are fed into the first-layer neurons, whose outputs are fed as inputs into the next layer, and so forth. Finally, the outputs of the last-layer neurons become the algorithm output. Deep networks are simply neural networks with many layers.

[Fig. 1: a layered neural network, mapping the algorithm inputs through Layer 1, Layer 2, Layer 3, Layer 4 and Layer 5 to the algorithm outputs.]

The inputs and output at each neuron are real-valued numbers. To compute its output, each neuron first computes an affine function of its inputs (a weighted sum plus a bias). It then applies a predefined activation function, which is typically nonlinear (e.g., sigmoid or hyperbolic tangent), to render the output.

The power of neural networks lies in their configurability. Specifically, the weights and biases at each neuron are parameters which can be tailored to produce a diverse range of computations. Typically, the network is configured by a training procedure. This procedure relies on a sample dataset, consisting of inputs and, in the case of supervised training, of desired outputs (known as labels). There are several training paradigms available, most of which are variations of gradient descent.

Overfitting typically occurs when the training dataset is insufficiently large or diverse to be representative of all valid network inputs. The resulting network does not generalize well to inputs that were not encountered in the training set.
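As a concrete (if simplified) illustration of this description, a single layer of such a network amounts to an affine map followed by an activation. The numpy sketch below uses our own names and randomly chosen parameters, and is not taken from the paper:

```python
import numpy as np

def dense_layer(inputs, weights, biases, activation=np.tanh):
    """One layer of neurons: an affine map (weighted sum plus bias) followed by
    a nonlinear activation, applied to a vector of real-valued inputs."""
    return activation(weights @ inputs + biases)

# Example: 3 inputs feeding a layer of 4 neurons with arbitrary parameters.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
W = rng.standard_normal((4, 3))   # weights: one row per neuron
b = rng.standard_normal(4)        # biases: one per neuron
print(dense_layer(x, W, b))
```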
IV. THE PROPOSED SYNDROME-BASED FRAMEWORK AND ITS ANALYSIS

A. Framework Definition

We begin by briefly discussing the encoder. We assume standard transmission that uses a linear code C. We let m ∈ {0, 1}^K denote the input message, which is mapped to a codeword x^b ∈ {0, 1}^N (recall that the superscript b denotes binary vectors). We assume that x^b is mapped to a bipolar vector x using the mapping defined in Sec. II, and x is transmitted over the channel. We let y denote the channel output. For convenience, we define m, x^b, x and y to be column vectors.

Our decoder framework is depicted in Fig. 2.

[Fig. 2. Decoder framework: the channel output y is passed through sign(·) and bin(·) and multiplied by H to produce the syndrome Hy^b, and through abs(·) to produce the reliabilities |y|; both are fed to the noise-estimation algorithm F, whose output is combined with y, and a final sign(·) produces the estimate x̂. The final sign operation is omitted when the system is designed to produce soft decisions. In Sec. V, we will implement F using a neural network.]
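The pipeline of Fig. 2 can be sketched as follows. This is our own illustrative code, with noise_estimate standing in for the algorithm F (realized as a neural network in Sec. V) and H denoting the code's parity-check matrix; the element-wise multiplication is one reading of the combining stage in Fig. 2:

```python
import numpy as np

def decode(y, H, noise_estimate):
    """Illustrative sketch of the syndrome-based decoder framework of Fig. 2.

    y: real-valued channel output (length-N vector)
    H: binary parity-check matrix of the code, shape (N-K, N)
    noise_estimate: stand-in for the algorithm F, mapping the syndrome and the
        reliabilities to an estimate of the (multiplicative) channel noise.
    """
    y_s = np.sign(y)                      # hard decisions, bipolar alphabet
    y_b = ((1 - y_s) // 2).astype(int)    # bin(.): map {+1,-1} to {0,1}
    syndrome = H.dot(y_b) % 2             # syndrome of the hard decisions
    reliabilities = np.abs(y)             # channel output reliabilities
    noise = noise_estimate(syndrome, reliabilities)
    x_soft = y * noise                    # combine estimate with channel output
    return np.sign(x_soft)                # omitted when soft decisions are desired

# Toy usage with a dummy F that guesses "no noise" (an all-ones vector).
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])
y = np.array([0.9, -1.1, 0.8, -0.7, 1.2, 0.4])
print(decode(y, H, lambda s, r: np.ones_like(r)))
```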